Data and code from the paper "Generation of Training Data for Named Entity Recognition of Artworks"

HPI-Information-Systems/art-ner-dataset

Generation of Training Data for Named Entity Recognition of Artworks

Data and pre-trained models from the paper "Generation of Training Data for Named Entity Recognition of Artworks", published in a 2023 issue of the Semantic Web Journal.

Data

Pending approval/license by the owner of the corpus.

Models

The models can be downloaded from here

SpaCy

The pre-trained spaCy model 'en_core_web_md' was used as a baseline for further training with domain-related annotations. The spaCy version is 3.3.0. The related documentation is available here.

To use the spaCy model to annotate a file of texts (see spacy_model/example_file.csv), download the model folder and run the script spacy_model/run_spacy.py as follows:

python run_spacy.py model_location example_file.csv
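
For readers who want to call the model from their own code, the following is a minimal sketch of what such an annotation step could look like. It is not the contents of run_spacy.py. It assumes the input CSV has no header row and holds one text per row in its first column, and the helper names read_texts, annotate and annotate_file are hypothetical. The only spaCy calls used are spacy.load (with a path to the downloaded model folder) and nlp.pipe.

```python
import csv


def read_texts(csv_path, column=0):
    """Read one text per row from the given column of a CSV file.

    Assumption: no header row, texts in the first column.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[column] for row in csv.reader(handle) if len(row) > column]


def annotate(nlp, texts):
    """Run a spaCy pipeline over the texts and collect the predicted entity spans."""
    results = []
    for doc in nlp.pipe(texts):
        entities = [
            (ent.text, ent.label_, ent.start_char, ent.end_char)
            for ent in doc.ents
        ]
        results.append((doc.text, entities))
    return results


def annotate_file(model_location, csv_path):
    """Load the model folder and annotate every text in the CSV file."""
    # Imported here so the helpers above can be used without spaCy installed.
    import spacy

    # spacy.load accepts a path to a saved pipeline directory.
    nlp = spacy.load(model_location)
    return annotate(nlp, read_texts(csv_path))


# Example call (paths are placeholders):
# for text, entities in annotate_file("model_location", "example_file.csv"):
#     print(text, entities)
```

Each result pairs the input text with a list of (entity text, label, start offset, end offset) tuples. The label names depend on how the downloaded model was trained.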

Flair

The Flair model was trained using GloVe embeddings (en-glove) and forward and backward Flair embeddings (news-X). More information on these embedding models can be found in Flair's documentation.

To run the model on a single sentence, execute the script flair_model/RunNER.py with the following command:

python RunNER.py final-model.pt "This is a sentence"
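
The same prediction can be made from Python. The sketch below is not the contents of RunNER.py; it is an illustration under stated assumptions: that the model is a Flair SequenceTagger whose label type is "ner", and that the installed Flair version provides get_label on spans (recent releases do; older ones expose the tag differently). The helper names load_tagger, spans_to_tuples and tag_text are hypothetical.

```python
def load_tagger(model_path):
    """Load a trained Flair sequence tagger from a .pt file."""
    # Imported here so spans_to_tuples can be used without Flair installed.
    from flair.models import SequenceTagger

    return SequenceTagger.load(model_path)


def spans_to_tuples(sentence, label_type="ner"):
    """Collect (text, label, confidence) for each predicted span.

    Assumption: the tagger writes its predictions under the "ner" label type.
    """
    results = []
    for span in sentence.get_spans(label_type):
        label = span.get_label(label_type)
        results.append((span.text, label.value, label.score))
    return results


def tag_text(tagger, text):
    """Tokenize the text, run the tagger on it and return the predicted spans."""
    from flair.data import Sentence

    sentence = Sentence(text)
    tagger.predict(sentence)  # annotates the sentence in place
    return spans_to_tuples(sentence)


# Example call (the model path is a placeholder):
# tagger = load_tagger("final-model.pt")
# print(tag_text(tagger, "This is a sentence"))
```

Loading the tagger once and reusing it for many sentences avoids paying the model loading cost on every call.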
