
LexDbInternals

BenjaminWaldron edited this page May 17, 2005 · 17 revisions

The LexDB uses a PostgreSQL database to provide a source of lexical items for client applications such as the LKB.

The ''fld'' table

The fld table stores the user-defined field definitions used in constructing the rev table below. Its contents are set by the script install-lexdb. Field definitions cannot be altered once the LexDB has been created.

The ''rev'' table

The rev table stores revisions of lexical items. The first 4 fields, which are hard-coded into every LexDB, have the following definitions:

name TEXT NOT NULL,
userid TEXT DEFAULT user NOT NULL,
modstamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
dead BOOLEAN DEFAULT 'f' NOT NULL

Following these are the user-defined fields specific to a particular LexDB. These are obtained from the .fld file provided to the script install-lexdb. They are also stored in the public.fld table of the LexDB for later reference. The following are the user-defined fields used by the ERG LexDB:

 type TEXT
 orthography TEXT
 keyrel TEXT
 altkey TEXT
 alt2key TEXT
 keytag TEXT
 altkeytag TEXT
 compkey TEXT
 ocompkey TEXT
 pronunciation TEXT
 complete TEXT
 semclasses TEXT
 preferences TEXT
 classifier TEXT
 selectrest TEXT
 jlink TEXT
 comments TEXT
 exemplars TEXT
 usages TEXT
 lang TEXT
 country TEXT
 dialect TEXT
 domains TEXT
 genres TEXT
 register TEXT
 confidence real DEFAULT 1
 source TEXT

The fields (name, userid, modstamp) provide the primary key. The field dead marks dead revisions.

The ''rev_key'' table

This table provides keys for the lookup of lexical items by component words. E.g. a revision with orthography 'a few' will be keyed on both 'a' and 'few'. Keys are in normalized (lower case) form as provided by the client application. (We do not use the PostgreSQL lower() function as it may differ from the equivalent function used in the client application.)

name TEXT NOT NULL,
userid TEXT DEFAULT user NOT NULL,
modstamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
key text NOT NULL
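
The keying described above can be sketched from the client side as follows. This is only an illustration: the function name rev_keys is hypothetical, and it assumes that component words are separated by whitespace and that the client normalizes with Python's str.lower(); the real client application may tokenize or fold case differently.

```python
def rev_keys(orthography: str) -> list[str]:
    """Return one lookup key per component word of an orthography.

    Assumptions (not taken from the LexDB sources): words are split on
    whitespace and normalized with str.lower() on the client side.
    """
    return [word.lower() for word in orthography.split()]


# Each key would become one rev_key row for the revision in question.
print(rev_keys("a few"))  # ['a', 'few']
```

Doing the normalization in the client, rather than in SQL, keeps the stored keys consistent with whatever the client later uses at lookup time.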

The ''dfn'' table

The dfn table stores the mapping used to construct TDL lexical entries from fields of a revision in rev (the entry can then be processed by the client application in the same manner as entries obtained from a textual TDL lexicon file). See LexDbFieldMappings.

mode TEXT NOT NULL,
slot TEXT NOT NULL,
field TEXT NOT NULL,
path TEXT,
type TEXT

The ''meta'' table

This table stores assorted configuration settings and other data.

A sample public.meta is shown below:

          var          |                    val
-----------------------+-------------------------------------------
 lexdb-version         | 4.00
 supported-psql-server | 7.4
 supported-psql-server | 8.0
 filter                | TRUE
 pub-fn                | check_psql_server_version
...
 pub-fn                | dump_public_rev_rev_key_to_tmp_tmp_key
 user                  | bmw20
 mod_time              | 2005-05-17 09:48:59.415022+01

A sample private meta is shown below:

    var     |              val
------------+-------------------------------
 filter     | TRUE
 mod_time   | 2005-05-17 09:48:59.422795+01
 build_time | 2005-05-17 10:04:37.91322+01

  • user is set for each user for whom a private schema has been initialized;

  • filter is an SQL WHERE-clause which determines which rev entries are visible to a user's lex table;

  • mod_time stores the time at which data in the schema was last modified;

  • build_time stores the time at which the (private schema) lex table was last rebuilt.
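
Note that var is not unique in the sample public.meta: supported-psql-server appears once per supported server version. A client reading the table therefore has to collect all values for a given var rather than expect a single one. The sketch below shows one way to do that in plain Python; the helper name meta_values is hypothetical, and the rows are a subset copied from the sample above (elided rows are left out).

```python
from collections import defaultdict

# Subset of the sample public.meta rows shown above, as (var, val) pairs.
META_ROWS = [
    ("lexdb-version", "4.00"),
    ("supported-psql-server", "7.4"),
    ("supported-psql-server", "8.0"),
    ("filter", "TRUE"),
    ("user", "bmw20"),
]


def meta_values(rows):
    """Group (var, val) rows into a dict mapping each var to a list of vals."""
    grouped = defaultdict(list)
    for var, val in rows:
        grouped[var].append(val)
    return dict(grouped)


meta = meta_values(META_ROWS)
print(meta["supported-psql-server"])  # ['7.4', '8.0']
```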

Public and private schemas

The PostgreSQL database is created and owned by the database user lexdb. This user is also the owner of the schema public. In order to make use of the LexDB, a client must log on as a separate user. The first time such a user connects to the LexDB, a private database schema is initialized.

A private schema contains private rev, rev_key and meta tables. When the user modifies lexical items (or creates new ones), the changes are stored in the private schema. When the user is happy with the changes, the new rev (and associated rev_key) entries are transferred to the public schema. (This requires a lexdb login authorization.)

A private schema also contains lex and lex_key tables. These have the same structure as the rev and rev_key tables and provide a cache of the user's current lexicon: we take the union of public.rev with the user's private rev, pass the entries through the user's filter, and take the head (most recent) revisions. (The views filt and head correspond to stages in this process, but should not generally be accessed directly.)
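
The union, filter and head stages can be sketched in plain Python as below. This is not the LexDB implementation, which does this work inside PostgreSQL. The function name build_lex and the sample rows are hypothetical; the sketch assumes that "head" means the revision with the latest modstamp for each name, and it models the filter as a Python predicate standing in for the SQL WHERE-clause. How the dead flag interacts with these stages is not specified on this page, so it is left out.

```python
from datetime import datetime, timezone


def build_lex(public_rev, private_rev, keep):
    """Return {name: head revision} from the union of both rev tables.

    Assumption: the head revision of a lexical item is the row with the
    latest modstamp among the rows that pass the filter.
    """
    head = {}
    for row in public_rev + private_rev:      # union of public and private rev
        if not keep(row):                     # the user's filter
            continue
        current = head.get(row["name"])
        if current is None or row["modstamp"] > current["modstamp"]:
            head[row["name"]] = row           # keep the most recent revision
    return head


def ts(hour):
    """Build a timestamp on 2005-05-17 (UTC) for the sample rows."""
    return datetime(2005, 5, 17, hour, tzinfo=timezone.utc)


# Hypothetical sample rows; only the fields needed here are included.
public_rev = [
    {"name": "few_det", "userid": "lexdb", "modstamp": ts(9)},
    {"name": "cat_n1", "userid": "lexdb", "modstamp": ts(9)},
]
private_rev = [
    {"name": "few_det", "userid": "bmw20", "modstamp": ts(10)},
]

lex = build_lex(public_rev, private_rev, keep=lambda row: True)
print(lex["few_det"]["userid"])  # bmw20
```

With a filter of TRUE, as in the sample meta tables, every revision passes and the private revision of few_det supersedes the older public one.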

... TO DO
