Crash if no non-special characters are present #23

Open
muebau opened this issue Jan 3, 2019 · 4 comments

Comments

@muebau

muebau commented Jan 3, 2019

It looks like I found a problem:

> inaugLIWCanalysis <- liwcalike(c("Hello"), liwc2015dict)
> inaugLIWCanalysis <- liwcalike(c("Hello..."), liwc2015dict)
> inaugLIWCanalysis <- liwcalike(c("123 :-)"), liwc2015dict)

The above works perfectly.
Now try an input without any non-special characters:

> inaugLIWCanalysis <- liwcalike(c("..."), liwc2015dict)
... snip ...
Browse[5]> where
where 1: textstat_readability.corpus(corpus(x), measure, remove_hyphens, 
    min_sentence_length, max_sentence_length, ...)
where 2: textstat_readability(corpus(x), measure, remove_hyphens, min_sentence_length, 
    max_sentence_length, ...)
where 3: textstat_readability.character("...")
where 4: quanteda::textstat_readability("...")

Browse[5]> n
debug: x <- char_trim(x, "sentences", min_ntoken = min_sentence_length, 
    max_ntoken = max_sentence_length)
Browse[5]> x
text1 
"..." 
Browse[5]> n
debug: n_sent <- nsentence(x)
Browse[5]> x
named character(0)
Browse[5]> n
Error in x[[length(x)]] : 
  attempt to select less than one element in integerOneIndex
In addition: Warning messages:
1: In nsentence.character(x) :
  nsentence() does not correctly count sentences in all lower-cased text
2: In structure(unlist(x, recursive = FALSE), class = "tokens", names = attrs$names,  :
  Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
  Consider 'structure(list(), *)' instead.

So the characters are trimmed, but there is no check for whether the result is empty. A simple ":-)" in the input (not that uncommon) might therefore become a problem.
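
Until such a check exists, a caller-side guard might be one option. This is only a minimal sketch in base R, assuming that "no non-special characters" can be approximated as "no letters or digits". `has_word_chars` and `split_analysable` are hypothetical helper names, not part of quanteda or quanteda.dictionaries:

```r
# Hypothetical helper: TRUE for texts that contain at least one letter or digit.
has_word_chars <- function(x) {
  grepl("[[:alnum:]]", x)
}

# Hypothetical helper: separate texts that look analysable from those to skip.
split_analysable <- function(texts) {
  ok <- has_word_chars(texts)
  list(keep = texts[ok], skip = texts[!ok])
}

parts <- split_analysable(c("Hello", "Hello...", "123 :-)", "...", ":-)"))
parts$keep  # texts that would be passed on to liwcalike()
parts$skip  # texts held back because they contain no letters or digits
```

Only `parts$keep` would then go to `liwcalike()`, and the skipped texts could be reported separately.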

What do you think?

@kbenoit
Owner

kbenoit commented Jan 6, 2019

Thanks for the note. Can you please provide a reproducible example? I don't need the debugging output, just the error message that I can reproduce. Thanks!

@muebau
Author

muebau commented Jan 7, 2019

library(quanteda)
library(quanteda.dictionaries)

okString <- "Hello"
output <- liwcalike(okString, dictionary = data_dictionary_NRC)
head(output)

This works perfectly.

library(quanteda)
library(quanteda.dictionaries)


errorString <- "..."
output <- liwcalike(errorString, dictionary = data_dictionary_NRC)

This will crash.

@muebau
Author

muebau commented Jan 7, 2019

library(quanteda)
library(quanteda.dictionaries)

okString <- "ThisWordExceeds65CharsThisWordExceeds65CharsThisWordExceeds65Chars"
output <- liwcalike(okString, dictionary = data_dictionary_NRC)
head(output)

This works perfectly.

library(quanteda)
library(quanteda.dictionaries)

errorString <- "ThisWordExceeds80CharsThisWordExceeds80CharsThisWordExceeds80CharsThisWordExceeds80Chars"
output <- liwcalike(errorString, dictionary = data_dictionary_NRC)
head(output)

This will crash (the "word" is 80 characters or longer).

The cause is this trim:

char_trim(x, "sentences", min_ntoken = min_sentence_length, max_ntoken = max_sentence_length)

in

textstat_readability.corpus(corpus(x), measure, remove_hyphens, min_sentence_length, max_sentence_length, ...)
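
As a stopgap, the call could be wrapped so that one failing text does not abort a whole batch. This is a minimal sketch in base R, not a fix for the trim itself. `try_each` and `demo_analyse` are hypothetical names, and `analyse` stands in for a call such as `function(x) liwcalike(x, dictionary = data_dictionary_NRC)`:

```r
# Hypothetical helper: apply `analyse` to each text separately and
# record a failure as NULL instead of stopping at the first error.
try_each <- function(texts, analyse) {
  lapply(texts, function(txt) {
    tryCatch(analyse(txt), error = function(e) {
      message("skipped: ", conditionMessage(e))
      NULL
    })
  })
}

# Stand-in analysis function for demonstration only: it raises an error
# on empty input, loosely mimicking a call that fails on some texts.
demo_analyse <- function(x) {
  if (!nzchar(x)) stop("nothing left to analyse")
  nchar(x)
}

results <- try_each(c("Hello", ""), demo_analyse)
```

Entries of `results` that are `NULL` mark the texts that failed, so they can be inspected afterwards.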

@kbenoit
Owner

kbenoit commented Jan 7, 2019

Thanks, will investigate.
