ItsdbTop
itsdb is a hard-to-pronounce but powerful tool for profiling and treebanking.
The automated installation procedure explained on the LkbInstallation page includes itsdb. To invoke itsdb if you've installed the system this way:
M-x lkb RET
M-x itsdb RET
When you annotate an item, the old unannotated entry for that item is not deleted from the database. Instead, the database is augmented with another entry recording the updated information about that item, along with a version indicator showing that the annotated entry is more recent than the original one. This version annotation is not queried dynamically when you impose conditions, so to make the version information usable you have to "normalize" the database periodically.
You normalize by selecting Trees | Normalize and giving a name for the new normalized database (the old one is not overwritten). This step should not be too time-consuming as long as your database has fewer than 3000 items (recommended). In Hinoki, we find that a database with 2000 items and a maximum of 5,000 results is quite slow, taking several hours (2005-03-25).
NOTE: Remember to set Options | TSQL Condition to no condition; otherwise only some trees will be normalized.
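The append-then-normalize behaviour described above can be pictured with a small Python sketch. It is only a model, not itsdb code: the record layout (`item_id`, `version`, `decision`) and the `normalize` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeRecord:
    item_id: int   # hypothetical: identifier of the test-suite item
    version: int   # hypothetical: a higher number means a more recent entry
    decision: str  # hypothetical: free-form annotation payload


def normalize(records):
    """Return a NEW list that keeps only the most recent record per item.

    The input list is left untouched, mirroring the fact that the old
    database is not overwritten.
    """
    latest = {}
    for rec in records:
        current = latest.get(rec.item_id)
        if current is None or rec.version > current.version:
            latest[rec.item_id] = rec
    return [latest[item_id] for item_id in sorted(latest)]


# Annotating item 1 appends a newer entry instead of replacing the old one.
records = [
    TreeRecord(1, 0, "unannotated"),
    TreeRecord(2, 0, "unannotated"),
    TreeRecord(1, 1, "tree 3 chosen"),
]
normalized = normalize(records)
```

After this, `normalized` holds one entry per item, while `records` still contains all three entries. A condition evaluated against `records` could still see the stale entry for item 1, which is why normalizing is needed before the version information becomes usable.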
This saves only the results for good trees, making a much smaller profile.
It is possible to save MRSs for the treebanked sentences by setting (setf tsdb::*redwoods-semantix-hook* "mrs::mrs-get-string").
If for some reason you wish to delete all the trees (e.g., you made a false start with the update process for Run 2), you can (verrrrryyyy carefully) discard the new annotations by selecting Trees | Clear Cut. Be certain (as in positive) that you are removing these annotations for Run 2, not the hand-coded real annotations you constructed painstakingly for Run 1, as clear cutting fells all the trees.
One of the defining properties of the Redwoods treebanks is that they are dynamic: the treebank can be updated when the grammar changes.
Because the discriminants are saved for each parse forest, even when the grammar changes, re-annotation is only necessary in cases where either the parse has become more ambiguous, so new decisions have to be made, or changes in rules or lexical items have made the parse so different that the earlier discriminants are not applicable.
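As a rough illustration of why saved discriminants make most re-annotation unnecessary, here is a Python sketch. It assumes a deliberately simplified picture in which each tree is a set of named properties and each recorded decision says whether a property must be present; the function name, the data layout and the property strings are all hypothetical and do not reflect how the LKB represents discriminants internally.

```python
def apply_discriminants(forest, decisions):
    """Filter a parse forest by previously recorded yes/no decisions.

    forest    -- dict: tree name -> set of properties (hypothetical layout)
    decisions -- dict: property -> True (must be present) / False (must be absent)

    Returns (remaining, stale): the trees that survive, and the decisions
    that refer to properties no tree in the new forest has any more.
    """
    known = set().union(*forest.values())
    remaining = dict(forest)
    stale = []
    for prop, wanted in decisions.items():
        if prop not in known:
            stale.append(prop)  # the grammar change made this decision inapplicable
            continue
        remaining = {name: props for name, props in remaining.items()
                     if (prop in props) == wanted}
    return remaining, stale


# One decision recorded with grammar version A.
gold_decisions = {"PP attaches to NP": True}

# Forest for the same sentence under grammar version B, which now
# distinguishes two noun readings.
forest_b = {
    "t1": {"PP attaches to NP", "noun reading 1"},
    "t2": {"PP attaches to VP"},
    "t3": {"PP attaches to NP", "noun reading 2"},
}
remaining, stale = apply_discriminants(forest_b, gold_decisions)
```

Here the old decision still applies and removes `t2`, but two trees survive, so this is the "more ambiguous" case that needs one new decision. Had only one tree survived, the old annotation would carry over without any manual work.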
Updating is a two-step process: Fully Automatic (which will annotate all trees that are uniquely determined) and Interactive (which will present the annotator with any new decisions that need to be made).
- Select the gold standard profile (middle button) [something.n]
- Select the target profile (left button) [something/grammar]
- Load the same grammar as the target profile [(rsa "japanese")]
- Set Trees | Switches | Automatic Update, and nothing else.
- Select Trees | Update
- Wait for a tree annotation window to pop up ...
- The updates are color coded:
  - Magenta: A single correct parse was found
  - Blue: There was only one parse but it was not one that was determined by the gold annotations
  - Black: There are still remaining ambiguities
You do your annotation on Run 1 of some test suite TS with version A of the grammar. Then change the grammar to produce version B. Create a new instance of the test suite TS for Run 2, and run Process | All Items using grammar version B. Next, select Run 1 as your gold standard (middle click), select Run 2 as the current database (left click), make sure version B of the grammar is loaded into the LKB, make sure that Trees | Switches | Automatic Update is selected, and then select Trees | Update. This will cause the tree annotation window to appear and begin zooming through your items, incorporating all the annotations that it can from the original treebank in Run 1 and adding those annotations to Run 2.
This will give all the sentences that satisfy the update-match-p() predicate (defined in lkb/src/tsdb/lisp/redwoods.lisp). By default, these are inputs with more than one reading for which the recorded discriminants fully disambiguate, or inputs with only one reading where that reading is the same as before.
;; during updates, a `save' match is indicated by the following conditions:
;;
;; - the current item has not been tree annotated already;
;; - the number of active trees in the current set equals the number of
;; active trees in the gold set;
;; - either the current item has more than one reading, or that single one
;; reading has the exact same derivation as the preferred tree from the
;; gold set.
;; - also, when in `exact-match' update mode, be content if there is one
;; unique result.
;;
Note that "has not been tree annotated" does not mean the same as "unannotated". The former means that the item has not been touched before (e.g., there is no entry in the tree file); the latter means that it has been touched (e.g., there is an entry in the tree file with the value -1).
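The conditions in the comment, and the distinction between "not tree annotated" and "unannotated", can be restated as a small Python model. This is one possible reading of the comment, not a translation of the actual update-match-p() code: all names and the dictionary layout of the tree file are hypothetical, and the way the `exact-match` clause combines with the other conditions is an assumption.

```python
def annotation_state(tree_file, item_id):
    """Classify an item by its entry in the tree file.

    tree_file -- dict: item id -> recorded value (hypothetical layout)
    """
    if item_id not in tree_file:
        return "not tree annotated"  # never touched: no entry at all
    if tree_file[item_id] == -1:
        return "unannotated"         # touched, but recorded with the value -1
    return "annotated"


def update_match(already_tree_annotated, n_active, n_gold_active,
                 n_readings, same_derivation_as_gold, exact_match_mode=False):
    """Model of the `save' match conditions listed in the comment."""
    if already_tree_annotated:
        return False
    # ASSUMPTION: in exact-match mode a unique result is enough on its own.
    if exact_match_mode and n_active == 1:
        return True
    if n_active != n_gold_active:
        return False
    # Either the item is ambiguous, or its single reading must have the
    # same derivation as the preferred tree from the gold set.
    return n_readings > 1 or same_derivation_as_gold


tree_file = {7: -1}
touched = annotation_state(tree_file, 7) != "not tree annotated"
saved = update_match(touched, n_active=1, n_gold_active=1,
                     n_readings=3, same_derivation_as_gold=False)
```

In this example item 7 already has an entry in the tree file (with the value -1), so it counts as tree annotated and `saved` is false, even though the remaining conditions would be satisfied.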
Treebank only those parses that have changed
- Select the gold standard profile (middle button) something.n
- Select the target profile (left button) something/grammar
- Load the same grammar as the target profile (rsa "japanese")
- Unset Trees | Switches | Automatic Update
- Set Options | TSQL Condition | Unannotated
- Select Trees | Update
In this stage, the annotator can annotate any trees that have changed, exploiting any relevant existing decisions.
If your profile has MRSs stored, you can check whether they scope or not by setting the switch Options | Result Filter | Mrs Scoping and then browsing the results: Browse | Results. Helpful diagnostic messages should be output in the *common-lisp* buffer.
Note: if you have too many results stored, this will either be very slow or crash. You should only really do it for profiles with 1 or 2 results per item.