I am encountering an issue with the TextReuseCorpus function when I feed in a vector of texts (using the "text = " option in the function): (1) I receive warnings that texts were skipped for insufficient length, even though the character strings should be long enough; and (2) I get a different number of skip warnings each time. I am reading in a large vector (>300,000) of texts, ranging from 155 to 9,900 characters, and usually 30k to 150k are skipped for being too short. I can take these same skipped strings, run TextReuseCorpus on them again, and they'll be fine the second time around. Perhaps I'm simply doing something wrong?
Hello,
I've been working on that, but that seems to be part of the issue:
the problem is consistent, but the cases are not reproducible. I'll keep toying with it and
see if I can home in.
Tyler
On Thu, Jan 16, 2020, 6:05 PM Lincoln Mullen wrote:
Can you please provide a reproducible example?
Following up -- I can't seem to generate a reproducible example, as the behavior is different every time, but I suspect that might point to an issue outside the package? The behavior occurs when the number of texts is above a certain threshold. For instance, I consistently get skip notices when n = 50k, but never when n = 25k.
However, I can run the same code twice at 50k and get different sets of skipped values:
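The output of the two runs is not preserved in this thread. As a rough sketch of how such a comparison could be set up (not the code actually used here), the helpers below check word counts against the n-gram size and diff the skipped IDs from two runs. The names `count_words`, `is_too_short`, `skipped_ids`, and `compare_runs` are hypothetical, the "fewer than n + 1 words" threshold is an assumption about how the package decides a text is too short, and `skipped_ids` assumes textreuse is installed and that `skipped()` returns the IDs of skipped documents.

```r
# Count whitespace-separated words per text. This is a rough stand-in for
# the package's own tokenizer, which may count words differently.
count_words <- function(texts) {
  lengths(strsplit(trimws(texts), "\\s+"))
}

# Assumption: a text is "too short" when it has fewer than n + 1 words,
# i.e. too few words to form at least two n-grams.
is_too_short <- function(texts, n = 5) {
  count_words(texts) < n + 1
}

# Build a corpus once and return the IDs it skipped.
# Assumes textreuse is installed and that skipped() returns those IDs.
skipped_ids <- function(texts, n = 5) {
  corpus <- textreuse::TextReuseCorpus(
    text = texts,
    tokenizer = textreuse::tokenize_ngrams,
    n = n,
    progress = FALSE
  )
  textreuse::skipped(corpus)
}

# Compare the skipped IDs from two runs over identical input.
compare_runs <- function(ids_a, ids_b) {
  list(
    only_first  = setdiff(ids_a, ids_b),
    only_second = setdiff(ids_b, ids_a),
    both        = intersect(ids_a, ids_b)
  )
}
```

With a named character vector `texts`, `compare_runs(skipped_ids(texts), skipped_ids(texts))` would list the IDs skipped in only one of the two runs, and `texts[!is_too_short(texts)]` would show which of those texts look long enough by word count.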