parsing TFS from tokens #349

Open
arademaker opened this issue Aug 6, 2022 · 4 comments

@arademaker
Member

arademaker commented Aug 6, 2022

Do we have any method to parse the TFS from tokens?

token [
  +FORM "cats"
  +FROM "4"
  +TO "8"
  +ID *diff-list* [ LIST *cons* [ FIRST "1" REST *list* ] LAST *list* ]
  +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG "NNS" +PRB "1.0" ] ]
  +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL - ]
  +TRAIT token_trait [
    +UW -
    +IT italics
    +LB bracket_null [ LIST *list* LAST *list* ]
    +RB bracket_null [ LIST *list* LAST *list* ]
    +LD bracket_null [ LIST *list* LAST *list* ]
    +RD bracket_null [ LIST *list* LAST *list* ]
    +HD token_head [ +TI "<4:8>"
                     +LL ctype [ -CTYPE- string ]
                     +TG string ] ]
  +PRED predsort
  +CARG "cats"
  +TICK +
  +ONSET c-or-v-onset ]
@oepen

oepen commented Aug 7, 2022

> Do we have any method to parse the TFS from tokens?

Please see lkb::read-dag() in

http://svn.delph-in.net/lkb/trunk/src/glue/dag.lsp

In [incr tsdb()], this is invoked by tsdb::reconstruct(), which recreates the full feature structure associated with the derivation, including any information 'infused' into the lexical entries from the underlying token feature structures, e.g. characterization.

@arademaker
Member Author

Hi @goodmami and @oepen,

[t.to_dict() for t in result.derivation().preterminals()]
[{'entity': 'the_1',
  'id': 149,
  'score': -1.639588,
  'start': 0,
  'end': 1,
  'type': 'd_-_the_le',
  'form': 'the',
  'tokens': [{'id': 91,
    'tfs': 'token [ +FORM \\"the\\" +FROM \\"0\\" +TO \\"3\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"0\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"DT\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<0:3>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"the\\" +TICK + +ONSET c-or-v-onset ]'}]},...

So I tried the LKB code with the string from the tfs field above. Am I using it correctly, @oepen?

LKB> (read-dag "token [ +FORM \"the\" +FROM \"0\" +TO \"3\" +ID *diff-list* [ LIST *cons* [ FIRST \"0\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \"DT\" +PRB \"1.0\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \"<0:3>\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \"the\" +TICK + +ONSET c-or-v-onset ]")
NIL
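Judging from the repr printed above, each `tfs` string still carries backslash-escaped double quotes (`\\"` in the repr is one backslash followed by a quote in the actual string), whereas the string passed to read-dag uses plain quotes once the Lisp reader has processed it. A minimal sketch, assuming exactly the dictionary layout shown above, that collects every token's TFS and undoes those escapes; `token_tfs_strings` is a hypothetical helper, not a PyDelphin function.

```python
import re
from typing import Any, Dict, Iterator, List, Tuple


def token_tfs_strings(preterminals: List[Dict[str, Any]]) -> Iterator[Tuple[int, str]]:
    """Yield (token id, TFS string) for each token of each preterminal dict.

    Assumes the layout printed in this thread: every preterminal dict has a
    'tokens' list whose items carry 'id' and 'tfs'. Backslash escapes in
    'tfs' (a backslash before each double quote) are undone, so the result
    uses plain double quotes.
    """
    for preterminal in preterminals:
        for token in preterminal.get("tokens", []):
            # Replace each backslash+character pair with the character itself.
            yield token["id"], re.sub(r'\\(.)', r'\1', token["tfs"])
```

With `pts = [t.to_dict() for t in result.derivation().preterminals()]` as in the snippet above, `dict(token_tfs_strings(pts))` maps token ids such as 91 to strings beginning `token [ +FORM "the" ...`.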

@goodmami
Member

@arademaker, addressing your initial question: no, I don't think I ever got around to adding support for parsing those token structures, though I had thought about it. The delphin.tfs.TypedFeatureStructure class should be capable of holding one once it's parsed, but this TFS format is slightly different from TDL (notice, e.g., that there are no commas between feature-value pairs), so we can't just use the TDL parser.
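Because the main difference from TDL is the missing commas, a small recursive-descent reader appears to be enough for the token structures in this thread. This is a sketch, not PyDelphin code: `Node`, `tokenize`, `parse_tfs` and `paths` are hypothetical names. It assumes plain double quotes (not the backslash-escaped form in the `tfs` field), no coreference tags, and that features and values simply alternate inside brackets; it does no type checking against a grammar.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple, Union

# Lexical units of the bracketed token-TFS format: "[" and "]", double-quoted
# strings (backslash escapes allowed) and bare symbols such as "+FORM",
# "*diff-list*", "non_capitalized+lower", "+" or "-".
_TOKEN_RE = re.compile(r'\[|\]|"(?:[^"\\]|\\.)*"|[^\s\[\]"]+')


@dataclass
class Node:
    """A typed node: a type name plus feature -> value pairs in input order."""
    type: str
    features: Dict[str, "Value"] = field(default_factory=dict)


# A value is either a string literal (quotes removed) or a typed node.
Value = Union[str, Node]


def tokenize(text: str) -> List[str]:
    return _TOKEN_RE.findall(text)


def parse_tfs(text: str) -> Value:
    """Parse one TFS such as 'token [ +FORM "cats" ... ]' into a Node tree."""
    tokens = tokenize(text)
    value, pos = _parse_value(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r} at token {pos}")
    return value


def _parse_value(tokens: List[str], pos: int) -> Tuple[Value, int]:
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok in ("[", "]"):
        raise ValueError(f"expected a type or string at token {pos}, got {tok!r}")
    if tok.startswith('"'):
        # String literal: drop the quotes and undo backslash escapes.
        return re.sub(r'\\(.)', r'\1', tok[1:-1]), pos + 1
    node = Node(tok)
    pos += 1
    if pos < len(tokens) and tokens[pos] == "[":
        pos += 1
        # Unlike TDL there are no commas: features and values just alternate
        # until the matching "]".
        while True:
            if pos >= len(tokens):
                raise ValueError("unclosed '['")
            if tokens[pos] == "]":
                return node, pos + 1
            feature = tokens[pos]
            if feature == "[" or feature.startswith('"'):
                raise ValueError(f"expected a feature at token {pos}, got {feature!r}")
            value, pos = _parse_value(tokens, pos + 1)
            node.features[feature] = value
    return node, pos


def paths(value: Value, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted feature path, leaf) pairs depth-first.

    A leaf is a string literal or the type of a node without features; the
    types of inner nodes (e.g. 'alphabetic' under +CLASS) are not reported.
    """
    if isinstance(value, str):
        yield prefix, value
    elif not value.features:
        yield prefix, value.type
    else:
        for feature, sub in value.features.items():
            yield from paths(sub, f"{prefix}.{feature}" if prefix else feature)
```

On the `cats` token from the first comment, `dict(paths(parse_tfs(s)))` should contain entries such as `'+TRAIT.+HD.+TI': '<4:8>'` and `'+TNT.+MAIN.+TAG': 'NNS'`. PyDelphin's `delphin.tfs` classes also address features with dotted paths, so mapping the `Node` tree onto a `TypedFeatureStructure` looks feasible, but that conversion is not shown here.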

@oepen

oepen commented Aug 13, 2022

> So I tried the LKB code with the string from the tfs field above,

Do you have the right grammar loaded? Recreating the token feature structure requires the type hierarchy and constraints to be available, i.e. a complete unifier.
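One way to see what the grammar contributes: every bare symbol in value position is a type name that the unifier has to know. A rough diagnostic sketch with hypothetical helpers `type_names` and `missing_types` (not LKB or PyDelphin functions) lists the types a TFS string mentions and reports those absent from a given set, which here only stands in for the loaded grammar's type hierarchy. It performs no unification and ignores constraints, so an empty result does not mean read-dag will succeed.

```python
import re
from typing import Iterable, List, Set

# Same lexical units as the token TFS format: brackets, quoted strings, symbols.
_TOKEN_RE = re.compile(r'\[|\]|"(?:[^"\\]|\\.)*"|[^\s\[\]"]+')


def type_names(text: str) -> Set[str]:
    """Collect the type names (bare symbols in value position) in a TFS string."""
    types: Set[str] = set()
    expect_feature = False  # the very first symbol is the top-level type
    for tok in _TOKEN_RE.findall(text):
        if tok in ("[", "]"):
            # After "[" or a closing "]", the next symbol (if any) is a feature.
            expect_feature = True
        elif expect_feature:
            expect_feature = False  # tok is a feature; its value comes next
        else:
            if not tok.startswith('"'):  # string literals are not looked up here
                types.add(tok)
            expect_feature = True  # after a value, the next symbol is a feature
    return types


def missing_types(text: str, known_types: Iterable[str]) -> List[str]:
    """Type names used in text but absent from known_types, sorted."""
    return sorted(type_names(text) - set(known_types))
```

With an empty `known_types`, which roughly mirrors having no grammar loaded, every type in the string is reported as missing.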
