crash if there is no non-special characters present #23

muebau · 2019-01-03T18:40:46Z

It looks like I found a problem:

> inaugLIWCanalysis <- liwcalike(c("Hello"), liwc2015dict)

> inaugLIWCanalysis <- liwcalike(c("Hello..."), liwc2015dict)

> inaugLIWCanalysis <- liwcalike(c("123 :-)"), liwc2015dict)

All of the above work perfectly.
Now try an input that contains only special characters:

> inaugLIWCanalysis <- liwcalike(c("..."), liwc2015dict)
... snip ...
Browse[5]> where
where 1: textstat_readability.corpus(corpus(x), measure, remove_hyphens, 
    min_sentence_length, max_sentence_length, ...)
where 2: textstat_readability(corpus(x), measure, remove_hyphens, min_sentence_length, 
    max_sentence_length, ...)
where 3: textstat_readability.character("...")
where 4: quanteda::textstat_readability("...")

Browse[5]> n
debug: x <- char_trim(x, "sentences", min_ntoken = min_sentence_length, 
    max_ntoken = max_sentence_length)
Browse[5]> x
text1 
"..." 
Browse[5]> n
debug: n_sent <- nsentence(x)
Browse[5]> x
named character(0)
Browse[5]> n
Error in x[[length(x)]] : 
  attempt to select less than one element in integerOneIndex
In addition: Warning messages:
1: In nsentence.character(x) :
  nsentence() does not correctly count sentences in all lower-cased text
2: In structure(unlist(x, recursive = FALSE), class = "tokens", names = attrs$names,  :
  Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
  Consider 'structure(list(), *)' instead.

So the input is trimmed, but there is no check for whether the trim leaves an empty result. A simple ":-)" as the input (which is not that uncommon) might therefore become a problem.

What do you think?
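
To illustrate the missing check, here is a minimal base R sketch. It does not call quanteda; `trimmed` stands in for the empty vector that `char_trim()` returned in the debug session above, and `last_or_na()` is a hypothetical helper, not an existing quanteda function. It only shows the failure mode and one possible guard, not how the package should actually be fixed.

```r
# Hypothetical helper (not part of quanteda): return the last element of x,
# or NA when x is empty instead of failing on x[[0]].
last_or_na <- function(x) {
  if (length(x) == 0L) {
    return(NA_character_)
  }
  x[[length(x)]]
}

# Stand-in for what the trim step returned for the input "..."
trimmed <- character(0)

# Unguarded access raises "attempt to select less than one element"
unguarded <- tryCatch(
  trimmed[[length(trimmed)]],
  error = function(e) "error"
)

# Guarded access returns NA instead of raising an error
guarded <- last_or_na(trimmed)
```

With a non-empty vector such as `c("a", "b")`, `last_or_na()` behaves like the plain `x[[length(x)]]` lookup.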

kbenoit · 2019-01-06T07:26:34Z

Thanks for the note. Can you please provide a reproducible example? I don't need the debugging output, just the error message that I can reproduce. Thanks!

muebau · 2019-01-07T15:43:21Z

library(quanteda)
library(quanteda.dictionaries)

okString <- "Hello"
output <- liwcalike(okString, dictionary = data_dictionary_NRC)
head(output)

Works perfectly

library(quanteda)
library(quanteda.dictionaries)


errorString <- "..."
output <- liwcalike(errorString, dictionary = data_dictionary_NRC)

This will crash.

muebau · 2019-01-07T16:23:18Z

library(quanteda)
library(quanteda.dictionaries)

okString <- "ThisWordExceeds65CharsThisWordExceeds65CharsThisWordExceeds65Chars"
output <- liwcalike(okString, dictionary = data_dictionary_NRC)
head(output)

Works perfectly

library(quanteda)
library(quanteda.dictionaries)

errorString <- "ThisWordExceeds80CharsThisWordExceeds80CharsThisWordExceeds80CharsThisWordExceeds80Chars"
output <- liwcalike(errorString, dictionary = data_dictionary_NRC)
head(output)

This will crash (the "word" is 80 characters or longer).

The cause is this trim:

char_trim(x, "sentences", min_ntoken = min_sentence_length, max_ntoken = max_sentence_length)

in

textstat_readability.corpus(corpus(x), measure, remove_hyphens, min_sentence_length, max_sentence_length, ...)
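
Until this is fixed, one possible workaround on the caller's side is to process each text separately and catch the error. The sketch below is base R only and rests on assumptions: `analyse_each()` is a hypothetical wrapper, and `fake_analyse()` is a stand-in that merely imitates a function failing on input without letters or digits. It is not how `liwcalike()` works internally.

```r
# Hypothetical wrapper: apply `analyse` to each text on its own and return
# NULL for inputs that raise an error, instead of aborting the whole run.
analyse_each <- function(texts, analyse) {
  lapply(texts, function(txt) {
    tryCatch(
      analyse(txt),
      error = function(e) {
        warning("skipping input: ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
}

# Stand-in for the real analysis: fails when the text has no letters or digits
fake_analyse <- function(txt) {
  if (!grepl("[[:alnum:]]", txt)) {
    stop("nothing left after trimming")
  }
  nchar(txt)
}

results <- suppressWarnings(analyse_each(c("Hello", "..."), fake_analyse))
```

In real use the second argument would be something like `function(txt) liwcalike(txt, dictionary = data_dictionary_NRC)`. The `NULL` entries then mark the inputs that need a closer look.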

kbenoit · 2019-01-07T20:15:08Z

Thanks, will investigate.
