Data Serialization Format
Spec 0.1
This is a data serialization standard. It is composed of two specification layers: that of the data file itself (layer 1), and that of the schema file that is used to provide further interpretation of the data file (layer 2).
The original intent of this specification is to provide a consistent and flexible way to create technical configurations and data documents. As such, it is more akin to the win.ini format than to JSON or XML.
Humans will read the documents, so they should be easy for a typical human to read and understand.
Please note that the emphasis is on ‘readable’. Not necessarily on ‘writable’. Because of the ‘Deterministic’ goal (see next section), it is a very strict format. There should be only exactly one way to do things. Because humans are prone to error, it is not expected that humans will easily manually create or edit strict documents.
To help, there is also a “loose” incoming interpretation that accounts for some human error. It is expected that a utility will be available on all platforms that ‘compiles’ the loose form to the strict form.
This document, when it introduces specifications, will first document the STRICT specification. It then will make suggestions for a LOOSE interpretation.
A goal of the strict version of MARDS is predictable determinism at all levels.
If two different programs output the same data to MARDS files, the resulting files should be absolutely identical. If a program imports the data from a MARDS data file, makes no changes to the data, and then outputs the same data to a new MARDS data file, the content of that new file should be absolutely identical to the original.
To make this possible, MARDS only allows there to be one way to represent data.
The schema layer, which is technically optional, greatly eases the use of MARDS. There is a general expectation that a schema will be used from the beginning.
A data file consists of a UTF8 text file with:
- A header
- One or more lists of name/value tuples
When strict, the following three lines are required at the top of the file:
%MARDS data 0.1\n
%schema [url]\n
\n
The [url] is to be replaced by the URL referencing the document’s schema file. Alternatively, [url] can be ‘*’, which means there is no schema file of reference. The ‘\n’ sequences are to be replaced with newline (line feed) characters separating the lines.
For the schema document:
%MARDS schema 0.1\n
\n
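As an illustration of the data-file header rule, here is a minimal sketch of a strict header check. The function name `parse_data_header` is hypothetical, and the sketch assumes that each header line ends with a single line feed and carries no leading spaces.

```python
def parse_data_header(text):
    """Return the schema URL from a strict data-file header, or None for '*'.

    Assumption: lines are separated by a single line feed character.
    """
    lines = text.split("\n")
    if len(lines) < 3:
        raise ValueError("header needs three lines")
    if lines[0] != "%MARDS data 0.1":
        raise ValueError("bad first header line")
    prefix = "%schema "
    if not lines[1].startswith(prefix) or len(lines[1]) == len(prefix):
        raise ValueError("bad schema line")
    if lines[2] != "":
        raise ValueError("third header line must be empty")
    url = lines[1][len(prefix):]
    # '*' means there is no schema file of reference.
    return None if url == "*" else url
```

A caller would pass the full file content and receive either the schema URL or `None` when the file declares no schema.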
Everything in the body consists of named lists.
Named Lists
First, let’s describe precisely what a named list is.
A named list is an ordered series of tuples. Each tuple contains a name and value, where the name is a description of the value.
So, for example:
color “blue”
length “17”
is a named list. In this example, “blue” is a value with a name of “color” and “17” is a value with a name of “length”.
In some ways, a named list is similar to a dictionary, but with a major distinction: the name need not be unique. So, the following is also a named list:
color “blue”
color “red”
length “17”
In this example, “blue” then “red” have a name of “color” and “17” has a name of “length”.
Format
The format of a single name/value tuple is:
tabbing name value \n
Each of these elements is described in the following sections.
tabbing
Each line starts with 0 or more space characters. This represents the structural level of indentation represented by the line. The number of spaces is n*4, where n represents the depth level of the line.
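The n*4 rule can be sketched as a small helper that recovers the depth level from a line's leading spaces. The name `depth_of` is hypothetical; rejecting indentation that is not a multiple of four is an assumption about strict mode.

```python
def depth_of(line):
    """Return the depth level n for a line indented by n*4 spaces."""
    stripped = line.lstrip(" ")
    spaces = len(line) - len(stripped)
    if spaces % 4 != 0:
        # Assumption: strict mode rejects any other indentation width.
        raise ValueError(f"indentation of {spaces} spaces is not a multiple of 4")
    return spaces // 4
```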
name
The name of a line represents a ‘label’ to be associated with the value. A name is made up of UTF8 Unicode characters. The minimum length of a name is 1 character. The maximum length is 80 characters. The name can only include letters, numbers, and the underscore character. Other punctuation is not permitted. True spaces are not permitted. Specifically, XX and XX and the variants under XXXX are forbidden. However, the non-progressing XX character used for non-breaking separation in some non-Latin languages is permitted. Also permitted is the ‘%’ percent sign, as long as it is NOT the first character.
If string data is not to have a name, the name is represented with an asterisk (code xx). This is common for list-like information.
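The naming rules could be checked roughly as follows. This is only a sketch under several assumptions: length is counted in Unicode code points, Python's `str.isalnum` stands in for "letters and numbers", and the specific forbidden and permitted space-like characters that the text leaves unspecified are not handled. The name `is_valid_name` is hypothetical.

```python
def is_valid_name(name):
    """Approximate check of the strict naming rules."""
    if name == "*":
        # The asterisk is the conventional name for "unnamed" values.
        return True
    if not 1 <= len(name) <= 80:
        return False
    if name[0] == "%":
        # '%' is permitted anywhere except the first position.
        return False
    # Assumption: str.isalnum approximates "letters and numbers".
    return all(ch.isalnum() or ch in "_%" for ch in name)
```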
value
A value is a sequence of unicode characters enclosed by quotes.
The actual length of a string is unlimited. (However, see the line continuation section for long strings.)
In loose mode, if the quotes are missing, the rest of the line after the name is simply interpreted as the string:
name joe 2 3 slsl 92
becomes:
name “joe 2 3 slsl 92”
If additional quotes are found within the outer quotes, then those quotes are not interpreted. They are simply part of the string. There is no 'escaping' as seen in other formats. So, to represent Larry "Jim" Jones, one can simply do:
name "Larry "Jim" Jones"
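The quoting rule can be sketched as below: only the outermost pair of quotes is removed, and everything between them is kept verbatim. The function name `parse_value` and its `loose` flag are hypothetical, and the sketch assumes the quote character is the ASCII double quote.

```python
def parse_value(raw, loose=False):
    """Interpret the text that follows the name on a line."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        # Strip only the outer quotes; inner quotes are part of the string.
        return raw[1:-1]
    if loose:
        # Loose mode: an unquoted remainder is taken as the string itself.
        return raw
    raise ValueError("strict mode requires a quoted value")
```

Because there is no escaping, no scan of the interior of the string is needed. Only the first and last characters matter.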
More Rules
All values are strings
All values in the name/value tuples are strings. These strings often have greater meaning in the context of the schema. However, at layer 1, all values are strings.
Value order is not arbitrary
The order of values is part of the information stored.
So, for example:
color “blue”
color “red”
length “17”
is not the same as:
color “red”
color “blue”
length “17”
Name order is not arbitrary
The order of names is part of the information stored and is not arbitrary.
So, for example,
color “blue”
color “red”
length “17”
is different than:
length “17”
color “blue”
color “red”
Ironically, this ordering requirement is difficult to represent with the native dictionary types of many programming languages.
It should be noted that, if a schema is used, the name order is strictly defined by the schema.
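One way to hold a named list in memory without losing order or duplicate names is a plain list of (name, value) pairs. The sketch below is an illustration, not a prescribed representation. It also shows why an ordinary dictionary is a poor fit.

```python
# Three named lists with the same tuples in different orders.
a = [("color", "blue"), ("color", "red"), ("length", "17")]
b = [("color", "red"), ("color", "blue"), ("length", "17")]
c = [("length", "17"), ("color", "blue"), ("color", "red")]

# Lists compare element by element, so both value order and name order count.
a_differs_from_b = a != b
a_differs_from_c = a != c

# A plain dict keeps only the last value for each repeated name.
as_dict = dict(a)
```

Here `as_dict` ends up holding a single `color` entry, which is why the pair list is used instead.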
Names at the same level of hierarchy must be grouped
Like names must be grouped together.
So, for example:
color “blue”
length “17”
color “red”
is not valid MARDS. The ‘color’ values must be sequentially grouped together.
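The grouping rule amounts to saying that once a name has been followed by a different name, it may not appear again at that level. A minimal sketch of such a check, with the hypothetical name `names_are_grouped`:

```python
def names_are_grouped(names):
    """Return True if every name occupies one contiguous run."""
    seen = set()
    previous = None
    for name in names:
        if name != previous:
            if name in seen:
                # The name already had a run that was closed by another name.
                return False
            seen.add(name)
            previous = name
    return True
```

The check is applied to the names of one level of hierarchy at a time.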
“Unnamed” values are named with an asterisk
By convention, a name/value tuple can be “unnamed” by using an asterisk as the name.
For example:
* “front-left”
* “part 3”
However, this is strictly a convention. Those tuples are named. They are named with an asterisk.
Values can be omitted from a tuple
A name/value tuple can have a ‘missing’ value. This is represented by simply not having anything following the name other than a line termination.
For example:
color “blue”
length
In this example, the value “blue” has a name of color. However, there is no value assigned to the label length.
Please note that a missing value is NOT the same thing as an empty string.
length
and
length “”
are very different. In the first case, the value is missing. In the second case, the value is an empty string.
Another example:
length “4”
length “62.3”
length
length
length “19”
In that example, the first two values of length are 4 and 62.3. Then two values are missing, followed by a value of 19.
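The difference between a missing value and an empty string maps naturally onto `None` versus `""`. The sketch below splits one line into a (name, value) pair. The name `split_tuple` is hypothetical, and the sketch assumes ASCII double quotes and a single space between name and value.

```python
def split_tuple(line):
    """Split one line into (name, value); value is None when missing."""
    body = line.lstrip(" ")
    name, sep, rest = body.partition(" ")
    if not sep:
        # Nothing follows the name: the value is missing, not empty.
        return (name, None)
    if len(rest) >= 2 and rest[0] == '"' and rest[-1] == '"':
        return (name, rest[1:-1])
    raise ValueError("value must be enclosed in quotes")


lines = ['length "4"', 'length "62.3"', 'length', 'length', 'length "19"']
values = [split_tuple(line)[1] for line in lines]
```

With this mapping, `values` holds two strings, then two `None` entries, then a final string, matching the description above.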
Name/value tuples can subtend tuples.
A name/value tuple can also be a reference to another series of name/value tuples. Each subtending series is called a ‘level’ of hierarchy.
The subtended series is indicated by a tab level increase of four spaces. When the tab level increase is seen, then the previous line is the reference to this new series.
For example:
addresses
    ip “192.168.2.33”
    ip “192.168.2.19”
    domain “example.com”
In that example, the label of “addresses” has no value. That label (with no value) is a reference to another series containing three more tuples.
Another example:
house “14”
    siding “brick”
    doors “3”
house “7”
    siding “stucco”
    mailbox “2932”
    stories “2”
In the example, the first label of “house” has a value of “14”. It also references a series with two tuples. Then, a second label of “house” has a value of “7”. It also references a series with three tuples.
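The hierarchy rule can be sketched with a stack that tracks which series is currently open at each depth. This is an illustrative sketch, not a reference parser: the name `parse_body` and the `[name, value, children]` node shape are hypothetical, values are assumed to be well-formed and wrapped in ASCII double quotes, and the header lines are assumed to have been removed already.

```python
def parse_body(text):
    """Parse tuple lines into nested [name, value, children] nodes."""
    root = []
    stack = [root]  # stack[d] is the list that receives nodes at depth d
    for line in text.split("\n"):
        if not line:
            continue
        stripped = line.lstrip(" ")
        spaces = len(line) - len(stripped)
        if spaces % 4 != 0:
            raise ValueError("indentation must be a multiple of 4 spaces")
        depth = spaces // 4
        if depth >= len(stack):
            raise ValueError("indentation increased by more than one level")
        name, sep, rest = stripped.partition(" ")
        # Assumption: a present value is always wrapped in one pair of quotes.
        value = rest[1:-1] if sep else None
        node = [name, value, []]
        del stack[depth + 1:]      # close any deeper series
        stack[depth].append(node)  # attach to the series at this depth
        stack.append(node[2])      # this node may subtend a new series
    return root


houses = parse_body(
    'house "14"\n'
    '    siding "brick"\n'
    '    doors "3"\n'
    'house "7"\n'
    '    siding "stucco"\n'
    '    mailbox "2932"\n'
    '    stories "2"\n'
)
```

In `houses`, each top-level node carries its own value and its own list of subtended tuples, so the two `house` entries stay distinct and ordered.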