Use these NLP, text mining, and machine learning code samples and tools to solve real-world text data problems.
Links in the first column take you to the subfolder/repository with the source code.
Task | Related Article | Source Type | Description |
---|---|---|---|
Large Scale Phrase Extraction | phrase2vec article | python script | Extract phrases from large amounts of data using PySpark. Annotate text using these phrases or use the phrases for other downstream tasks. |
Word Cloud for Jupyter Notebook and Python Web Apps | word_cloud article | python script + notebook | Visualize top keywords using word counts or TF-IDF. |
Gensim Word2Vec (with dataset) | word2vec article | notebook | How to work correctly with Word2Vec to get desired results |
Reading files and word count with Spark | spark article | python script | How to read files of different formats using PySpark with a word count example |
Extracting Keywords with TF-IDF and SKLearn (with dataset) | tfidf article | notebook | How to extract interesting keywords from text using TF-IDF and Python's scikit-learn. |
Text Preprocessing | text preprocessing article | notebook | A few code snippets on how to perform text preprocessing. Includes stemming, noise removal, lemmatization and stop word removal. |
TFIDFTransformer vs. TFIDFVectorizer | tfidftransformer and tfidfvectorizer usage article | notebook | How to use TFIDFTransformer and TFIDFVectorizer correctly, how the two differ, and when to use each. |
Accessing Pre-trained Word Embeddings with Gensim | Pre-trained word embeddings article | notebook | How to access pre-trained GloVe and Word2Vec embeddings using Gensim, with an example of how these embeddings can be leveraged for text similarity. |
Text Classification in Python (with news dataset) | Text classification with Logistic Regression article | notebook | Get started with text classification. Learn how to build and evaluate a text classifier for news classification using Logistic Regression. |
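
For a quick taste of the kind of task these samples cover, here is a minimal sketch of noise removal, stop word removal, and word counting using only the Python standard library. It is not taken from the notebooks above: the `preprocess` and `top_keywords` helpers, the tiny `STOP_WORDS` set, and the sample sentences are all made up for illustration, and the linked notebooks may use different libraries and steps.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on"}


def preprocess(text):
    """Lowercase the text, strip simple noise, and drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # noise removal: HTML-like tags
    text = re.sub(r"[^a-z\s]", " ", text)  # noise removal: digits and punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def top_keywords(docs, n=2):
    """Return the n most frequent tokens across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts.most_common(n)


# Hypothetical sample documents.
docs = [
    "<p>The cat sat on the mat.</p>",
    "The dog chased the cat!",
    "A cat and a dog: 2 pets.",
]

keywords = top_keywords(docs)
print(keywords)  # [('cat', 3), ('dog', 2)]
```

Raw counts like these are the simplest way to rank keywords; the TF-IDF samples listed above weight terms by how distinctive they are across documents instead.
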
- For more articles, please see this list.
- If you would like to receive articles via email, subscribe to my mailing list.
This repository is maintained by Kavita Ganesan. Connect with me on LinkedIn or Twitter.