This corpus consists of around 30 000 sentences from around 10 000 YouTube comments. Each sentence is manually annotated as either containing a threat of violence (or sympathy with violence) or not.
The corpus is described in the paper "THREAT: A Large Annotated Corpus for Detection of Violent Threats" (Hammer et al. 2019). A previous version of the corpus was thoroughly evaluated in "Threat detection in online discussions" (Wester et al. 2016), and those results represent a natural benchmark for future research. The version of the corpus used in Wester et al. (2016) is not publicly available, but can be obtained by contacting the authors. Both articles are included in the folder 'Articles', along with the bib files for referencing them. Please cite both papers in any work using the corpus.
The corpus can be downloaded by following the link below. By clicking the download link, you accept that the corpus is for academic use only and that you will delete the dataset on request.
>>I accept terms of use, proceed to download<<
Sincerely, Hugo Lewi Hammer, Michael Riegler, Lilja Øvrelid and Erik Velldal