Serendip


A modification of the tool described in this paper: https://graphics.cs.wisc.edu/Papers/2014/AKVWG14/Preprint.pdf

We expect to implement it for these datasets:

  1. Religious Texts: https://www.kaggle.com/metron/public-files-of-religious-and-spiritual-texts
  2. Wikipedia Articles: https://meta.wikimedia.org/wiki/Data_dump_torrents#English_Wikipedia
  3. Hillary Clinton Emails: https://www.kaggle.com/kaggle/hillary-clinton-emails
  4. News articles: https://www.kaggle.com/snapcrack/all-the-news

We are using LDA and Word2Vec for topic modelling and text comparison.
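
As a rough illustration of the LDA half of that pipeline, here is a minimal sketch, not this project's actual code. It assumes a small collapsed Gibbs sampler written in plain Python instead of a library implementation. The function names `fit_lda` and `cosine`, the hyperparameters `alpha` and `beta`, and the toy corpus are all hypothetical, chosen only for the example. It fits topic mixtures for a handful of documents and then compares documents by the cosine similarity of those mixtures.

```python
import random
from collections import Counter
from math import sqrt


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, topic_word, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional weight of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed per-document topic mixtures (each row sums to 1).
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab


def cosine(u, v):
    """Cosine similarity between two topic-mixture vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


# Hypothetical toy corpus: two "religious" and two "political" documents.
docs = [
    "god faith prayer scripture faith".split(),
    "prayer god scripture faith god".split(),
    "election senate vote email policy".split(),
    "vote policy senate election email".split(),
]
theta, topic_word, vocab = fit_lda(docs, num_topics=2)
for d, mix in enumerate(theta):
    print(d, [round(p, 3) for p in mix])
print("similarity(0, 1):", round(cosine(theta[0], theta[1]), 3))
print("similarity(0, 2):", round(cosine(theta[0], theta[2]), 3))
```

On real corpora such as the datasets listed above, a library implementation would likely be the more practical choice; the sampler here is only meant to show how per-document topic mixtures arise and how they can feed a document comparison.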


The previous implementation can be found here: https://github.com/uwgraphics/SerendipSlim, or you can visit its website: http://vep.cs.wisc.edu/serendip/