ItsdbTop

itsdb is a hard-to-pronounce but powerful tool for profiling and treebanking.

Installation

The automated installation procedure explained on the LkbInstallation page includes itsdb. If you installed the system this way, invoke itsdb from Emacs as follows:

   M-x lkb RET
   M-x itsdb RET

Treebanking

Normalizing

When you annotate an item, the old unannotated entry for that item in the database is not deleted. Instead, the database is augmented with another entry recording the updated information about that item, along with a version indicator showing that the annotated entry is more recent than the original one. This version indicator is not consulted dynamically when you impose conditions, so to make the version information usable you have to periodically "normalize" the database.

You normalize by selecting Trees | Normalize and giving a name for the new normalized database (the old one will not be overwritten). This step should not be too time-consuming as long as each database has fewer than 3,000 items (recommended). In Hinoki, we find that normalizing a database with 2,000 items and a maximum of 5,000 results is quite slow, taking several hours (2005-03-25).

NOTE: Remember to set the Options | TSQL Condition to no condition, otherwise only some trees will be normalized.
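
As a rough illustration of what normalization achieves, the following Python sketch collapses an append-only list of annotation records down to the most recent record per item. The record layout (item_id, version, annotated) is hypothetical and is not the actual itsdb relation schema; it is only meant to show why the version indicator has to be resolved in a separate pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeRecord:
    """One annotation record; field names are illustrative, not the itsdb schema."""
    item_id: int
    version: int     # higher means more recent
    annotated: bool  # whether this record carries a tree annotation


def normalize(records):
    """Keep only the most recent record for each item."""
    latest = {}
    for rec in records:
        current = latest.get(rec.item_id)
        if current is None or rec.version > current.version:
            latest[rec.item_id] = rec
    # Return the survivors in a stable order, sorted by item id.
    return [latest[item_id] for item_id in sorted(latest)]


# Item 1 was annotated after the initial run, so it has two records.
history = [
    TreeRecord(item_id=1, version=0, annotated=False),
    TreeRecord(item_id=2, version=0, annotated=False),
    TreeRecord(item_id=1, version=1, annotated=True),
]
normalized = normalize(history)
```

After this pass each item has exactly one record, so a condition such as "unannotated" can be evaluated without looking at versions.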

Thinning Normalizing

This saves only the results for good trees, making a much smaller profile.

It is possible to save MRSs for the treebanked sentences by setting:

   (setf tsdb::*redwoods-semantix-hook* "mrs::mrs-get-string")

Clear Cutting

If for some reason you wish to delete all the trees (e.g., you made a false start with the update process for Run 2), you can (verrrrryyyy carefully) discard the new annotations by selecting Trees | Clear Cut. Be certain (as in positive) that you are removing the annotations for Run 2, not the real hand-coded annotations you painstakingly constructed for Run 1, as clear cutting fells all the trees.

Updating

One of the defining properties of the Redwoods treebanks is that they are dynamic: the treebank can be updated when the grammar changes.

Because the discriminants are saved for each parse forest, even when the grammar changes, re-annotation is only necessary in cases where either the parse has become more ambiguous, so new decisions have to be made, or changes in rules or lexical items have made the parse so different that the earlier discriminants are not applicable.
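
To make the idea concrete, here is a minimal Python sketch of replaying saved discriminant decisions against a re-parsed forest. It models each parse as a set of property strings and each decision as a mapping from a property to keep (True) or reject (False). This representation, the property names, and the function name replay are assumptions for illustration, not the data structures itsdb uses.

```python
def replay(decisions, forest):
    """Apply saved yes/no discriminant decisions to a new parse forest.

    decisions: dict mapping a property string to True (must be present)
               or False (must be absent).
    forest:    dict mapping a parse id to the set of properties of that parse.
    Returns (remaining parse ids, decisions that no longer apply).
    """
    # Properties that still occur somewhere in the new forest.
    known = set().union(*forest.values())
    # Decisions about properties the new grammar no longer produces.
    stale = sorted(prop for prop in decisions if prop not in known)
    remaining = []
    for parse_id, props in forest.items():
        consistent = all((prop in props) == keep
                         for prop, keep in decisions.items()
                         if prop in known)
        if consistent:
            remaining.append(parse_id)
    return sorted(remaining), stale


# Hypothetical decisions recorded in Run 1.
decisions = {"hd-cmp": True, "pp-vp": False, "old-rule": True}
# Hypothetical forest produced by the changed grammar in Run 2.
forest = {
    1: {"hd-cmp", "pp-np"},
    2: {"hd-cmp", "pp-vp"},
    3: {"hd-adj", "pp-np"},
}
remaining, stale = replay(decisions, forest)
```

In this model, one remaining parse means the old decisions still disambiguate the item; several remaining parses or a non-empty list of stale decisions correspond to the cases where re-annotation is needed.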

Updating is a two-step process: Fully Automatic (which will annotate all trees that are uniquely determined) and Interactive (which will present the annotator with any new decisions that need to be made).

Fully Automatic Update

Select the gold standard profile (middle button) [something.n]
Select the target profile (left button) [something/grammar]
Load the same grammar as the target profile [(rsa "japanese")]
Set Trees | Switches | Automatic Update, and nothing else.
Select Trees | Update
Wait for a tree annotation window to pop up ...
The updates are color coded:
- Magenta: A single correct parse was found
- Blue: There was only one parse but it was not one that was determined by the gold annotations
- Black: There are still remaining ambiguities
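
The colour legend above can be read as a small decision function. The Python sketch below is one possible reading of the three cases, not a transcription of the itsdb code; the function name and its parameters are hypothetical, and the case where no tree remains active is not covered by the legend, so the sketch returns None for it.

```python
def update_colour(n_readings, n_active, single_matches_gold=False):
    """Map the outcome of an automatic update to the colour shown for an item.

    n_readings:          number of parses the current grammar produced
    n_active:            number of parses still active after applying the
                         gold decisions
    single_matches_gold: for one-reading items, whether that reading is the
                         one selected in the gold profile
    """
    if n_readings == 1 and not single_matches_gold:
        return "blue"     # only one parse, but not the one the gold picked
    if n_active == 1:
        return "magenta"  # a single correct parse was found
    if n_active > 1:
        return "black"    # ambiguity remains
    return None           # not described by the legend
```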

Suppose you do your annotation on Run 1 of some test suite TS with version A of the grammar, and then change the grammar to produce version B. Create a new instance of the test suite TS for Run 2 and run Process | All Items using grammar version B. Next, select Run 1 as your gold standard (middle click), select Run 2 as the current database (left click), make sure version B of the grammar is loaded into the LKB, make sure that Trees | Switches | Automatic Update is selected, and then select Trees | Update. This will cause the tree annotation window to appear and begin zooming through your items, incorporating all the annotations it can from the original treebank in Run 1 and adding those annotations to Run 2.

This will automatically annotate all the sentences that satisfy the update-match-p() predicate (defined in lkb/src/tsdb/lisp/redwoods.lisp). By default, these are inputs with more than one reading for which the recorded discriminants fully disambiguate, and inputs with only one reading where that reading is the same as before. The comment in the source describes the conditions as follows:

  ;; during updates, a `save' match is indicated by the following conditions:
  ;;
  ;;   - the current item has not been tree annotated already;
  ;;   - the number of active trees in the current set equals the number of
  ;;     active trees in the gold set;
  ;;   - either the current item has more than one reading, or that single one
  ;;     reading has the exact same derivation as the preferred tree from the
  ;;     gold set.
  ;;   - also, when in `exact-match' update mode, be content if there is one
  ;;     unique result.
  ;;

Note that "has not been tree annotated" does not mean the same as "unannotated". The former means that the item has not been touched before (e.g. there is no entry in the tree file); the latter means that it has been touched (e.g. there is an entry in the tree file with the value -1).
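
The conditions in the comment above can be paraphrased as a small Python predicate. This is a sketch of one reading of the comment, not a port of update-match-p: the function name and all parameter names are hypothetical, and treating "one unique result" in exact-match mode as "exactly one active tree" is an assumption.

```python
def is_save_match(already_annotated, n_active, n_gold_active,
                  n_readings, same_derivation_as_gold,
                  exact_match_mode=False):
    """Model of the save-match conditions quoted from redwoods.lisp.

    already_annotated:       the item already has an entry in the tree file
    n_active:                active trees in the current set
    n_gold_active:           active trees in the gold set
    n_readings:              readings of the current item
    same_derivation_as_gold: the single reading has the gold derivation
    """
    if already_annotated:
        return False
    # Assumed reading of the last bullet: in exact-match mode a single
    # remaining result is accepted on its own.
    if exact_match_mode and n_active == 1:
        return True
    if n_active != n_gold_active:
        return False
    return n_readings > 1 or same_derivation_as_gold
```

For example, an ambiguous item whose four readings are narrowed to the same single active tree as in the gold profile is saved, while a one-reading item is saved only if its derivation is identical to the gold tree.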

Interactive Update

Treebank only those parses that have changed:

Select the gold standard profile (middle button) something.n
Select the target profile (left button) something/grammar
Load the same grammar as the target profile (rsa "japanese")
Unset Trees | Switches | Automatic Update
Set Options | TSQL Condition | Unannotated
Select Trees | Update

In this stage, the annotator can annotate any trees that have changed, exploiting any relevant existing decisions.

Debugging Aids

MRS Checking

If your profile has MRSs stored, you can check whether they scope by setting the switch Options | Result Filter | Mrs Scoping and then browsing the results with Browse | Results. Helpful diagnostic messages should be output in the *common-lisp* buffer.

Note that if you have too many results stored, this will either be very slow or crash. You should only really do it for profiles with 1 or 2 results per item.
