ANLP-assignment-2

Accelerated Natural Language Processing (2020) assignment 2: Exploring distributional similarity in Twitter

1. What did we research?

In this report, we investigate how different context vector weightings and similarity measures affect the computed similarity between words.

2. Experiment design

We picked six groups of words. Each group has one reference word and nine comparison words: three for each of three categories (similar, moderately similar and not similar to the reference word).
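As an illustration of this layout, one group could be represented as below. This is only a sketch: the `WordGroup` class, its field names and the placeholder words are assumptions made for the example, not the structure or the words used in asgn2.py or the report.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple

CATEGORIES = ("similar", "moderately_similar", "not_similar")


@dataclass(frozen=True)
class WordGroup:
    """One reference word plus three comparison words per category (hypothetical layout)."""
    reference: str
    similar: Tuple[str, ...]
    moderately_similar: Tuple[str, ...]
    not_similar: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Enforce the "three words per category" design described above.
        for category in CATEGORIES:
            if len(getattr(self, category)) != 3:
                raise ValueError(f"{category} must contain exactly three words")

    def pairs(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (reference, comparison word, category) for every comparison word."""
        for category in CATEGORIES:
            for word in getattr(self, category):
                yield self.reference, word, category


# Placeholder words only; the real word lists are in the report.
example_group = WordGroup(
    reference="ref",
    similar=("sim_1", "sim_2", "sim_3"),
    moderately_similar=("mod_1", "mod_2", "mod_3"),
    not_similar=("not_1", "not_2", "not_3"),
)
```

Iterating over `example_group.pairs()` gives the nine word pairs to score for that group, each labelled with its expected category.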

Four methods were evaluated. Two of them, PPMI and t-test, are association measures used to compute the context vectors of words; the other two, cosine similarity and Euclidean distance, are measures used to compare the context vectors of two words.
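The two weightings can be sketched from raw counts as follows. This is a minimal sketch using the standard textbook definitions, not necessarily the exact code in asgn2.py: the function names `ppmi_weight` and `t_test_weight` and their count-based arguments (`c_xy` for the co-occurrence count, `c_x` and `c_y` for the individual counts, `n` for the total number of observations) are assumptions made for illustration.

```python
from math import log2, sqrt


def ppmi_weight(c_xy: float, c_x: float, c_y: float, n: float) -> float:
    """Positive PMI: max(0, log2(P(x, y) / (P(x) * P(y))))."""
    if c_xy == 0 or c_x == 0 or c_y == 0:
        return 0.0  # no co-occurrence evidence, so no positive association
    pmi = log2((c_xy * n) / (c_x * c_y))
    return max(0.0, pmi)


def t_test_weight(c_xy: float, c_x: float, c_y: float, n: float) -> float:
    """t-test association: (P(x, y) - P(x)P(y)) / sqrt(P(x)P(y))."""
    expected = (c_x / n) * (c_y / n)  # joint probability under independence
    if expected == 0:
        return 0.0
    return (c_xy / n - expected) / sqrt(expected)
```

Unlike PPMI, the t-test weight can be negative when a word and a context co-occur less often than independence would predict.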

Combining each of the two ways of computing word vectors with each of the two ways of comparing them gives four methods for computing the similarity between two words. We apply all four methods to a given word list.
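The four combinations can be enumerated as in the sketch below. It assumes sparse vectors stored as dictionaries from context id to weight; the function names, the toy vectors and the `score_all` helper are hypothetical and chosen only to make the example self-contained.

```python
from itertools import product
from math import sqrt


def cosine_similarity(v0: dict, v1: dict) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * v1[key] for key, weight in v0.items() if key in v1)
    norm0 = sqrt(sum(w * w for w in v0.values()))
    norm1 = sqrt(sum(w * w for w in v1.values()))
    if norm0 == 0 or norm1 == 0:
        return 0.0
    return dot / (norm0 * norm1)


def euclidean_distance(v0: dict, v1: dict) -> float:
    """Straight-line distance between two sparse vectors; missing keys count as 0."""
    keys = set(v0) | set(v1)
    return sqrt(sum((v0.get(k, 0.0) - v1.get(k, 0.0)) ** 2 for k in keys))


MEASURES = {"cosine": cosine_similarity, "euclidean": euclidean_distance}


def score_all(vectors_by_weighting: dict, word0: str, word1: str) -> dict:
    """Score one word pair under every (weighting, measure) combination."""
    return {
        (weighting, measure): MEASURES[measure](
            vectors_by_weighting[weighting][word0],
            vectors_by_weighting[weighting][word1],
        )
        for weighting, measure in product(vectors_by_weighting, MEASURES)
    }


# Toy vectors with made-up weights, not results from the experiment.
toy_vectors = {
    "ppmi": {"a": {0: 3.0, 1: 4.0}, "b": {0: 3.0, 1: 4.0}},
    "t_test": {"a": {0: 1.0}, "b": {1: 1.0}},
}
scores = score_all(toy_vectors, "a", "b")
```

Note that Euclidean distance is a dissimilarity: a smaller value means the two words are closer, whereas a larger cosine value means they are more similar, so the two measures have to be read in opposite directions when ranking word pairs.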

3. Files

asgn2.py contains the code for the experiment. In it, we implemented the functions t_test, euclid_sim, cos_sim, create_ppmi_vectors and create_t_test_vectors.

cw2_report.pdf contains our findings and analysis of how the choice of similarity calculation method influences the resulting similarities.


Victoria-Qiao/ANLP-assignment-2
