mukayese/spell-checking at main · alisafaya/mukayese

Spell Checking

Requirements

- cyhunspell==2.0.2
- zemberek-python==0.1.2

trspell-10 dataset

Download and extract data from the v1.0 release of mukayese-datasets.

cp /data/to/mukayese-datasets/trspell-10.zip .
unzip trspell-10.zip

Hunspell Based Turkish Spell Checkers

A Hunspell spell-checking model consists of two files: .dic and .aff. The .dic file contains the word roots, and the .aff file contains the affix rules that can be applied to those roots.

python predict_hunspell.py hrzafer trspell-10.csv hrzafer_preds.jsonl
python predict_hunspell.py ours trspell-10.csv ours_preds.jsonl
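To illustrate how the .dic and .aff files work together, here is a minimal sketch in plain Python. It is not how Hunspell or predict_hunspell.py is implemented: it handles only unconditional-style suffix (SFX) rules with single-character flags, and the tiny dictionary, the flags A and E, and the helper names parse_aff and expand are made up for this example.

```python
import re

# Toy .aff content (assumed for illustration): each flag adds one plural suffix.
AFF_TEXT = """\
SFX A Y 1
SFX A 0 lar .
SFX E Y 1
SFX E 0 ler .
"""

# Toy .dic content: first line is the entry count, then root/flags.
DIC_TEXT = """\
2
kitap/A
ev/E
"""


def parse_aff(aff_text):
    """Collect suffix rules per flag from simplified .aff text."""
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        # Rule lines have five fields: SFX <flag> <strip> <add> <condition>.
        # Header lines (SFX <flag> Y <count>) have four fields and are skipped.
        if len(parts) == 5 and parts[0] == "SFX":
            _, flag, strip, add, cond = parts
            strip = "" if strip == "0" else strip
            add = "" if add == "0" else add
            rules.setdefault(flag, []).append((strip, add, cond))
    return rules


def expand(dic_text, rules):
    """Return every surface form the roots and suffix rules can produce."""
    forms = set()
    for entry in dic_text.splitlines()[1:]:  # skip the entry-count line
        entry = entry.strip()
        if not entry:
            continue
        root, _, flags = entry.partition("/")
        forms.add(root)
        for flag in flags:  # single-character flags only
            for strip, add, cond in rules.get(flag, []):
                # The condition is matched against the end of the root.
                if root.endswith(strip) and re.search(cond + "$", root):
                    forms.add(root[: len(root) - len(strip)] + add)
    return forms


forms = expand(DIC_TEXT, parse_aff(AFF_TEXT))
print(sorted(forms))
print("kitaplar" in forms, "kitapler" in forms)
```

A word is accepted when it is either a root or a root combined with one of its allowed affixes, so the quality of the root list and of the affix rules together determines which words get flagged as errors.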

Zemberek Based Turkish Spell Checkers

python predict_zemberek.py trspell-10.csv zemberek_preds.jsonl

Evaluation

You can evaluate predictions using the evaluate.py script.

Evaluating the Hunspell-based hrzafer predictions:

$ python evaluate.py --input-file hrzafer_preds.jsonl 

Error Detection Scores:
	Precision = 76.40
	Recall = 99.73
	F1-Score = 86.52

Error Correction Accuracy = 25.52

Evaluating the Hunspell-based ours predictions:

$ python evaluate.py --input-file ours_preds.jsonl

Error Detection Scores:
	Precision = 100.00
	Recall = 99.25
	F1-Score = 99.62

Error Correction Accuracy = 71.72

Evaluating the Zemberek-based predictions:

$ python evaluate.py --input-file zemberek_preds.jsonl

Error Detection Scores:
	Precision = 94.31
	Recall = 98.93
	F1-Score = 96.56

Error Correction Accuracy = 62.12
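The F1-Score values above are consistent with the usual harmonic mean of precision and recall. The sketch below recomputes them from the reported precision and recall; it assumes the standard F1 definition and does not reproduce the internals of evaluate.py, and the helper name f1_score is introduced only for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the outputs above.
reported = {
    "hrzafer": (76.40, 99.73),
    "ours": (100.00, 99.25),
    "zemberek": (94.31, 98.93),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.2f}")
```

Note that detection scores and correction accuracy measure different things: hrzafer detects almost every error (high recall) yet its top suggestion is correct for only about a quarter of the cases.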
