ReproducibilityStandards
PeterAdolphs edited this page Jul 5, 2010 · 6 revisions
To make experiments that use DELPH-IN data and tools easier to compare and reproduce, this page documents standard training and test data sets for each grammar, along with standard evaluation metrics and terminology. We encourage everyone to use the standards listed here, or to describe any deviations in terms of these standards.
- observed coverage: percentage of items that received at least one parse
- verified coverage: percentage of items for which a gold standard analysis was found during treebanking
It is important to specify whether these metrics are calculated over:
- all items in a data set
- all items that have a gold standard analysis
- all items that received a parse
- the intersection of the last two
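
As a rough illustration of why the choice of item set matters, the two coverage metrics can be computed over each of the four item sets listed above. This is only a minimal sketch: the `Item` record and its fields `n_parses` and `has_gold` are hypothetical names chosen for this example and do not correspond to any particular DELPH-IN tool or profile format.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record for one test item; field names are assumptions.
    n_parses: int   # number of analyses the parser produced for the item
    has_gold: bool  # True if treebanking found a gold standard analysis


def percentage(count, total):
    # Convention chosen for this sketch: an empty item set yields 0.0.
    return 100.0 * count / total if total else 0.0


def observed_coverage(items):
    # Percentage of items that received at least one parse.
    return percentage(sum(1 for i in items if i.n_parses > 0), len(items))


def verified_coverage(items):
    # Percentage of items for which a gold standard analysis was found.
    return percentage(sum(1 for i in items if i.has_gold), len(items))


def item_sets(items):
    # The four item sets over which a metric may be calculated.
    items = list(items)
    return {
        "all": items,
        "gold": [i for i in items if i.has_gold],
        "parsed": [i for i in items if i.n_parses > 0],
        "gold_and_parsed": [i for i in items if i.has_gold and i.n_parses > 0],
    }


if __name__ == "__main__":
    data = [Item(2, True), Item(1, False), Item(0, False), Item(3, True)]
    for name, subset in item_sets(data).items():
        print(name, observed_coverage(subset), verified_coverage(subset))
```

The same metric gives different figures depending on the denominator, which is why a reported number should always state the item set it was calculated over.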
- exact match: percentage of items for which the top analysis was the gold analysis
- DMRS
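
The exact match metric can be sketched in the same way. The function below is an illustration under stated assumptions, not a reference implementation: it assumes each analysis can be represented by a comparable identifier, and the parameter names `item_ids`, `top` and `gold` are hypothetical. The item set is passed in explicitly so that the denominator is always stated.

```python
def exact_match(item_ids, top, gold):
    """Percentage of the given items whose top analysis is the gold analysis.

    item_ids: the item set the metric is calculated over
    top:      dict mapping item id -> top-ranked analysis (parsed items only)
    gold:     dict mapping item id -> gold analysis (items with gold only)
    """
    item_ids = list(item_ids)
    if not item_ids:
        # Convention chosen for this sketch: an empty item set yields 0.0.
        return 0.0
    hits = sum(
        1 for i in item_ids
        if i in top and i in gold and top[i] == gold[i]
    )
    return 100.0 * hits / len(item_ids)


if __name__ == "__main__":
    top = {1: "a", 2: "b", 3: "c"}
    gold = {1: "a", 2: "x", 4: "d"}
    # Over all items versus over items with both a parse and a gold analysis.
    print(exact_match([1, 2, 3, 4], top, gold))
    print(exact_match(sorted(top.keys() & gold.keys()), top, gold))
```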