FrancisBond edited this page Aug 31, 2012 · 14 revisions

SC corpus sense annotation alignment

The SC corpus has now been automatically aligned to the SemCor sense annotations. The alignment process found realpred or gpred matches for 96.3% of SemCor word forms. The remaining word forms either mapped to elements that the ERG treats as semantically empty (e.g., copulas), or were treated as multiword expressions (MWEs) by the ERG but not by WordNet (‘such+as’, ‘right+then’, ‘not+even’).

The alignment program generated modified DMRS files, in which each node may contain an optional <sense> element:

<node nodeid='10002' cfrom='0' cto='6'>
   <realpred lemma='first' pos='a' sense='1'/>
   <sortinfo cvarsort='e' sf='prop' tense='untensed' mood='indicative' prog='minus' perf='minus'/>
   <sense wn='2' lexsn='5:00:00:ordinal:00' wn_lemma='first'/>
</node>
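As an illustration of how the extended files might be consumed, here is a minimal Python sketch that uses the standard-library xml.etree.ElementTree to collect the <sense> annotations. It assumes, as in DMRS XML generally, that a node's predicate is either a <realpred> with a lemma attribute or a <gpred> whose name is the element's text. The <dmrs> wrapper around the sample node and the function name sense_annotations are illustrative, not part of the released files.

```python
import xml.etree.ElementTree as ET

# The example node above, wrapped in an illustrative <dmrs> root so that it
# parses as a standalone document. Real files may nest nodes differently;
# root.iter("node") finds them at any depth.
SAMPLE = """<dmrs>
<node nodeid='10002' cfrom='0' cto='6'>
   <realpred lemma='first' pos='a' sense='1'/>
   <sortinfo cvarsort='e' sf='prop' tense='untensed' mood='indicative' prog='minus' perf='minus'/>
   <sense wn='2' lexsn='5:00:00:ordinal:00' wn_lemma='first'/>
</node>
</dmrs>"""


def sense_annotations(root):
    """Yield (nodeid, cfrom, cto, pred, sense_attrs) for nodes that have a <sense>.

    Hypothetical helper: pred is the realpred lemma, or the gpred name
    (assumed to be stored as the <gpred> element's text).
    """
    for node in root.iter("node"):
        sense = node.find("sense")
        if sense is None:  # <sense> is optional; skip unannotated nodes
            continue
        real = node.find("realpred")
        if real is not None:
            pred = real.get("lemma")
        else:
            gpred = node.find("gpred")
            pred = gpred.text if gpred is not None else None
        yield (
            node.get("nodeid"),
            int(node.get("cfrom")),
            int(node.get("cto")),
            pred,
            dict(sense.attrib),  # e.g. wn, lexsn, wn_lemma
        )


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for nodeid, cfrom, cto, pred, sense in sense_annotations(root):
        print(nodeid, cfrom, cto, pred, sense["wn"], sense["lexsn"])
        # prints: 10002 0 6 first 2 5:00:00:ordinal:00
```

Keeping cfrom and cto in the output makes it possible to relate each sense back to the characters of the input sentence that the node covers.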

The sense-annotated DMRS output is available here.

There is also an updated dmrs.dtd, along with SemCoreMapping.csv, which maps each SC corpus item to its annotated SemCor 3.0 concordance, context, and sentence number.
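To look up where an SC item came from, the mapping file could be loaded into a dictionary. This is a hedged Python sketch only: the column order (item, concordance, context, sentence number), the absence of a header row, and the names load_mapping and SemCorLocation are all assumptions, so check the actual layout of SemCoreMapping.csv before relying on it.

```python
import csv
from typing import Dict, Iterable, NamedTuple


class SemCorLocation(NamedTuple):
    """Position of an SC corpus item in the SemCor 3.0 concordance."""
    concordance: str
    context: str
    sentence: int


def load_mapping(lines: Iterable[str]) -> Dict[str, SemCorLocation]:
    """Build {SC item id: SemCorLocation} from the rows of the mapping file.

    ASSUMPTION (hypothetical layout): each row is
    item, concordance, context, sentence number, with no header row.
    """
    mapping: Dict[str, SemCorLocation] = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        item, concordance, context, sentence = row[:4]
        mapping[item] = SemCorLocation(concordance, context, int(sentence))
    return mapping
```

Typical use would be `with open("SemCoreMapping.csv", newline="") as f: mapping = load_mapping(f)`, after which `mapping[item_id]` gives the SemCor location for that item. Accepting any iterable of lines, rather than a file path, keeps the function easy to test.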
