RmrsPos
For compatibility with RMRS and software designed to integrate deep and shallow processing, all MRS pred values should conform to the following templates:
_lemma_pos_sense_rel (lexically introduced predicates)
sense_rel (abstract predicates introduced by constructions)
The lemma component is a string corresponding to the (stem) orthography of the lexical entry, at least for all open-class words, and typically also for closed-class words. By using the stem orthography, we ensure that predicate names used in MRSs produced by deep grammars will be interoperable with predicate names used in MRSs produced by robust shallow processors, which name the predicates based on (lemmatized) forms in the input. The leading underscore is used to distinguish predicate names introduced by specific lexical entries from those introduced by constructions or by lexical types supplying a common predicate for a class of lexical entries.
The mapping from string to lemma is not necessarily one-to-one; for example, some spelling variants are collapsed into a single lemma. In one version of the ERG, both color and colour produce the predicate _color_n_1_rel.
The pos component is one of a closed set of single lowercase letters interpreted as described in the following section. This POS-based information (such as might be accessible from a POS-tagger) is used for coarse-grained sense distinctions.
Finer-grained distinctions can be made (in a precision grammar) via the sense component. The sense component can consist of any sequence of characters (letters, numbers, etc.) except the underscore, which separates the components of the name. In the ERG, verb-particle constructions are handled semantically by having the verb contribute a relation particular to that combination; these relations are distinguished by placing the particle's orthography in the sense field. Unlike the other components, the sense component is optional, and if it is omitted, its separating underscore is also omitted. By convention, a predicate name with no sense component is interpreted as underspecified for sense, so if the lexicon contains more than one sense for a given orthography and part of speech, each of the corresponding predicate names should have a sense component.
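The template rules so far (leading underscore, underscore-separated components, a single-letter POS from the closed set, an optional sense) can be checked with one regular expression. The following is a minimal sketch, not part of any DELPH-IN tool; `parse_pred`, `Pred` and `POS_LABELS` are hypothetical names, and the POS letters are taken from the label table on this page.

```python
import re
from typing import NamedTuple, Optional

# Single-letter POS labels from the RMRS table: n v a j r s c p q x u.
POS_LABELS = "nvajrscpqxu"

# _lemma_pos_sense_rel; the sense component and its underscore are optional.
# No component may itself contain an underscore, hence [^_]+.
LEXICAL_PRED = re.compile(r"_([^_]+)_([" + POS_LABELS + r"])(?:_([^_]+))?_rel")


class Pred(NamedTuple):
    lemma: str
    pos: str
    sense: Optional[str]  # None: no sense component, i.e. underspecified for sense


def parse_pred(name: str) -> Optional[Pred]:
    """Split a lexically introduced predicate name into lemma, pos and sense.

    Returns None if the name does not fit the _lemma_pos_sense_rel template,
    e.g. for abstract predicates (no leading underscore) or unknown POS letters.
    """
    m = LEXICAL_PRED.fullmatch(name)
    if m is None:
        return None
    lemma, pos, sense = m.groups()
    return Pred(lemma, pos, sense)


print(parse_pred("_look_v_up_rel"))   # Pred(lemma='look', pos='v', sense='up')
print(parse_pred("_aardvark_n_rel"))  # Pred(lemma='aardvark', pos='n', sense=None)
```

Because no component may contain an underscore, the match is unambiguous: the optional group can only capture a sense when a third component is actually present.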
Note that for the ERG, the decision was made to always put something in the sense field, even where there is currently no ambiguity. The main motivation is to reduce future reworking of the SEM-I, since later enrichment might add further sense distinctions.
Every relation and predicate name ends in _rel, for the convenience of the grammar writer, particularly to avoid possible namespace collisions. This suffix (and the leading underscore) can of course be suppressed by MRS display methods if desired.
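Suppressing the suffix and the leading underscore for display is a simple string operation. A minimal sketch, with `display_name` as a hypothetical helper name; the shortened form it produces matches the style used in the Example column of the POS table (e.g. banana_n_1).

```python
def display_name(pred: str) -> str:
    """Drop the trailing _rel and the leading underscore, for display only."""
    if pred.endswith("_rel"):
        pred = pred[: -len("_rel")]
    if pred.startswith("_"):
        pred = pred[1:]
    return pred


print(display_name("_bank_v_turn_rel"))  # bank_v_turn
```

Only the display form changes; the stored predicate name keeps both the underscore and the suffix, so lexical and abstract predicates remain distinguishable internally.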
Some examples of names are given below:
Orthography | Predicate name |
aardvark | _aardvark_n_rel |
bank | _bank_n_2_rel |
bank | _bank_v_turn_rel |
look | _look_v_up_rel |
Finally, one further formatting detail should be mentioned: words with single lexical entries whose orthography is conventionally spelled with a space, such as English ad hoc, appear with the whole orthography in the lemma component, but with each space replaced by a plus sign. So the following example is also correct:
ad hoc | _ad+hoc_j_rel |
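Going the other way, a predicate name can be assembled from its components, including the space-to-plus rule. A minimal sketch, assuming the caller already supplies the stem orthography (lemmatization and the collapsing of spelling variants are outside its scope); `make_pred` is a hypothetical helper name.

```python
from typing import Optional

# Single-letter POS labels from the RMRS table.
POS_LABELS = set("nvajrscpqxu")


def make_pred(orth: str, pos: str, sense: Optional[str] = None) -> str:
    """Build a lexical predicate name of the form _lemma_pos[_sense]_rel."""
    lemma = orth.replace(" ", "+")  # e.g. "ad hoc" -> "ad+hoc"
    if pos not in POS_LABELS:
        raise ValueError(f"unknown POS label: {pos!r}")
    if "_" in lemma or (sense and "_" in sense):
        raise ValueError("the underscore is reserved as the component separator")
    parts = ["", lemma, pos]  # the empty first part yields the leading underscore
    if sense:  # an omitted sense also omits its separating underscore
        parts.append(sense)
    parts.append("rel")
    return "_".join(parts)


print(make_pred("ad hoc", "j"))        # _ad+hoc_j_rel
print(make_pred("bank", "v", "turn"))  # _bank_v_turn_rel
```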
Note: it is best not to use single alphabetic characters in the sense field, as in _dog_n_a_rel. Doing so would make it much harder to identify all adjectives and adverbs with the single regular expression /_a_/. Instead, use a number or a string of two or more characters.
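The problem with a single-letter sense can be seen directly. The sketch below applies /_a_/ to three names, one of them the _dog_n_a_rel case from the note. The stricter, position-anchored pattern at the end is only our own illustration of what a search would otherwise need; the convention in the note avoids the issue altogether.

```python
import re

preds = ["_fast_a_1_rel", "_dog_n_a_rel", "_dog_n_1_rel"]

# The simple pattern: it also fires when "a" appears as a sense, not as the POS.
naive = [p for p in preds if re.search(r"_a_", p)]
print(naive)  # ['_fast_a_1_rel', '_dog_n_a_rel']

# A positional pattern: "a" must fill the second component (the POS slot).
POS_A = re.compile(r"_[^_]+_a(?:_[^_]+)?_rel")
strict = [p for p in preds if POS_A.fullmatch(p)]
print(strict)  # ['_fast_a_1_rel']
```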
These are the POS labels in the RMRS. They form a rough set of POS tags, suitable for exchange between different POS inventories.
Label | Explanation | Example | Comment |
n := u | noun | banana_n_1 | WordNet n |
v := u | verb | bark_v_1 | WordNet v |
a := u | adjective or adverb (i.e. supertype of j and r) | fast_a_1 | |
j := a | adjective | | WordNet a |
r := a | adverb | | WordNet r |
s := n, s := v | verbal noun (used in Japanese and Korean) | benkyou_s_1 | |
c := u | conjunction | and_c_1 | |
p := u | adposition (preposition, postposition) | from_p_1, kara_p_1 (から_p_1) | |
q := u | quantifier (needs to be distinguished for scoping code) | this_q_1 | |
x := u | other closed class | ahem_x_1 | |
u | unknown | | |
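The labels form a small type hierarchy rooted in u, so a consumer can ask whether one tag is compatible with another (for example, whether a j-tagged word satisfies a request for a). A minimal sketch, assuming `x := y` in the table is read as "x is a subtype of y"; `POS_PARENTS` and `subsumes` are hypothetical names.

```python
# Each label mapped to its direct supertypes, reading "x := y" as "x is a subtype of y".
POS_PARENTS = {
    "u": [],
    "n": ["u"], "v": ["u"], "a": ["u"],
    "j": ["a"], "r": ["a"],
    "s": ["n", "v"],  # verbal noun: subtype of both noun and verb
    "c": ["u"], "p": ["u"], "q": ["u"], "x": ["u"],
}


def subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its (transitive) supertypes."""
    if general == specific:
        return True
    return any(subsumes(general, parent) for parent in POS_PARENTS[specific])


print(subsumes("a", "j"))  # True
print(subsumes("j", "a"))  # False
```

Because s has two parents, the check walks every path upward rather than following a single chain.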