Serendip

A modification of the tool described in this paper: https://graphics.cs.wisc.edu/Papers/2014/AKVWG14/Preprint.pdf

We expect to implement it for these datasets:

Religious Texts: https://www.kaggle.com/metron/public-files-of-religious-and-spiritual-texts
Wikipedia Articles: https://meta.wikimedia.org/wiki/Data_dump_torrents#English_Wikipedia
Hillary Clinton Emails: https://www.kaggle.com/kaggle/hillary-clinton-emails
News articles: https://www.kaggle.com/snapcrack/all-the-news

We are using LDA and Word2Vec for topic modelling and text comparison.
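
As a rough illustration of the topic-modelling half of that plan, here is a minimal sketch of LDA fitted with a toy collapsed Gibbs sampler in pure Python, followed by a cosine comparison of the resulting per-document topic mixtures. It is not this project's implementation: the function names `fit_lda` and `cosine`, the hyperparameter values, and the tiny example documents are all assumptions made for the example, and a real run would more likely rely on an established library. Word2Vec is not covered here.

```python
import math
import random


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (hypothetical helper).

    docs is a list of token lists. Returns one topic-proportion
    vector per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k overall
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document; each row sums to 1.
    return [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Made-up, pre-tokenised documents used only for illustration.
docs = [
    "prayer faith scripture faith prayer".split(),
    "scripture faith prayer scripture".split(),
    "email server meeting email schedule".split(),
    "meeting schedule server email".split(),
]
theta = fit_lda(docs, num_topics=2)
similarity = cosine(theta[0], theta[1])
```

The topic mixtures in `theta` are the kind of per-document representation that can then be compared across texts. Cosine similarity is used here only as one simple option.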

The previous implementation is available here: https://github.com/uwgraphics/SerendipSlim
The original project's website is here: http://vep.cs.wisc.edu/serendip/