Translation Graph Definition
by Tom Veatch, 2018
A Translation Graph is definable in multiple interconvertible forms,
for various purposes, including:
(A) Mathematically: as a certain type of set-theoretic object.
(B) TGML: in one or more text files using a translation graph markup
language, TGML, for long-term storage.
(C) DB: in a set of structured database tables, for server-side processing
and access
(D) JSON: in a JSON encoded data structure, for transmission
(E) JS: in a set of populated data structures created and manipulated in
JavaScript or other implementation language, for client-side and
other processing.
These are given in order below.
----------------------------------------------------------------------
(A) Mathematical Form:
----------------------------------------------------------------------
A well-formed Translation Graph or "Document" is a 6-tuple
(C, Types, Tiers, N, A, L) with C, a set of (document names or) classes;
Types, a set of (tier or data) types; Tiers, a set of tiers; N, a set
of contentless nodes; A, a set of labelled arcs; and L, a set of
labels.
Classes are the title and other names and properties of the document
or its tiers. A document has at least one class, its name or title. A
tier inherits the classes of the document of which it is a part but
may have classes of its own. A set of classes applying to tiers can
show the hierarchy or other structure of relationships among related
but different documents, such as different translations, received
versions, renditions. Unlike the Annotation Graph formalism, here it
is not required that one tier represent the unique primary document.
If two or more tiers share a given class (and all within a TG share
its name class), the nodes shared by the tiers represent the
sequential structure of that shared class. A hierarchy of classes
and their progressively refined segmentations emerges from this.
Types are the data types and perhaps methods of access or display
associated with a tier, for example, 'Roman alphabet encoded Words in
Hindi' or Roman_sentences or youtube video segments, etc. A type of a
tier applies to the content of every arc in that tier.
Contentless Nodes and optionally content-bearing Arcs are the vertices and
edges of a directed acyclic graph that makes up the document.
A document has a single first node and a single last node, which are
the first and last nodes of every tier. (Thus, each Tier exhausts,
or is the same length as, the entire document.)
A document contains one or more tiers. Each tier comprises a subset
of the nodes and of the arcs, with the constraint that within a tier
there is a unique total ordering of all nodes in the tier from first
to last.
Every Arc links a single predecessor node to a single successor node.
The content of each Arc is an empty or non-empty Label or string,
which may contain arbitrary data.
Every Tier contains a single path of alternating nodes and arcs from
the first to the last node: No branching within Tiers. (This
constraint may prove impractical and be removed later, but seems
helpful to begin with, for reasoning about TGs. Obviously this is not
a minimized graph structure, which implementations may potentially
derive.)
It follows that within a Tier, the number of Nodes minus one is the
number of Arcs.
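The single-path constraint and the arc/node count relation can be checked mechanically. The sketch below walks one tier from the document's first node to its last. The arc shape {name, p, s} and the name validateTierPath are assumptions for illustration, not part of the definition.

```javascript
// Hypothetical helper: checks that one tier is a single unbranched path
// from firstNode to lastNode. Each arc is assumed to look like {name, p, s},
// where p and s are the predecessor and successor node names.
function validateTierPath(arcs, firstNode, lastNode) {
  const outgoing = new Map(); // predecessor node name -> the arc leaving it
  for (const arc of arcs) {
    if (outgoing.has(arc.p)) {
      return { ok: false, error: `branch: two arcs leave node ${arc.p}` };
    }
    outgoing.set(arc.p, arc);
  }
  // Walk from the first node, always following the single outgoing arc.
  const path = [firstNode];
  let node = firstNode;
  while (node !== lastNode) {
    const arc = outgoing.get(node);
    if (arc === undefined) {
      return { ok: false, error: `path breaks at node ${node}` };
    }
    node = arc.s;
    path.push(node);
    // More steps than arcs means some arc was reused, i.e. a cycle.
    if (path.length > arcs.length + 1) {
      return { ok: false, error: "cycle: walk revisits nodes" };
    }
  }
  // Every arc must lie on the path, so #arcs = #nodes - 1.
  if (path.length - 1 !== arcs.length) {
    return { ok: false, error: "some arcs are not on the first-to-last path" };
  }
  return { ok: true, nodes: path };
}

const words = [
  { name: "t0.a0", p: "A", s: "B" },
  { name: "t0.a1", p: "B", s: "C" },
  { name: "t0.a2", p: "C", s: "D" },
];
console.log(validateTierPath(words, "A", "D").nodes.join(" ")); // A B C D
```

Three arcs yield four nodes, matching the rule that a tier has one more node than it has arcs.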
----------------------------------------------------------------------
(B) Translation Graph Markup Language
----------------------------------------------------------------------
Summary:
* A TGML document defines a TG and comprises one or more text
files each containing a header, and one or more tiers within
optional .. tags.
* Classes are specified in header tags.
* A tier and its types are specified in .. tags.
* Nodes within a tier are enumerated with tags.
* Arcs within a tier are enumerated with .. tag pairs.
* Labels are simply the arc content between .. tag pairs.
Discussion:
The "in" specified in a tag is used to specify the
list of file names or URIs that the TG is to be found in. A
well-formed TG can incorporate a pre-existing TG without modifying
the latter by specifying the latter's file name, along with its own
file name in the in=".." tag specifier (a comma separated list of
file names). In parsing a TG from TGML starting from a given file,
the list of files it is in is expanded by unification with other
in=".." tags in the various files referenced, starting with the
initial file. If inconsistency of naming or segmentation is
discovered, an error should be thrown and the inconsistency fixed by
the author(s).
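The expansion of the in=".." file list amounts to computing a closure over the files reachable from the initial one. The minimal sketch below assumes the in=".." values have already been extracted into a map from file name to its raw value. The function name and that map are hypothetical, and the naming and segmentation consistency checks described above are not shown.

```javascript
// Hypothetical helper: computes the full list of files a TG is "in",
// starting from the initial file and following each reached file's
// in=".." list. inLists is assumed to be already extracted by a parser:
// file name -> raw comma-separated in=".." value.
function expandFileList(startFile, inLists) {
  const found = [];
  const seen = new Set();
  const queue = [startFile];
  while (queue.length > 0) {
    const file = queue.shift();
    if (seen.has(file)) continue;
    seen.add(file);
    found.push(file);
    if (!Object.prototype.hasOwnProperty.call(inLists, file)) {
      throw new Error(`referenced file not available: ${file}`);
    }
    const listed = inLists[file].split(",").map(s => s.trim()).filter(s => s !== "");
    for (const f of listed) if (!seen.has(f)) queue.push(f);
  }
  return found;
}

// A new TG that incorporates poem.tg without modifying it:
const inLists = { "notes.tg": "notes.tg,poem.tg", "poem.tg": "poem.tg" };
console.log(expandFileList("notes.tg", inLists).join(", ")); // notes.tg, poem.tg
```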
Headers specify the classes of the document. The headers of a single
TG, which may be distributed tier by tier across multiple TGML
files, must all share the document name/title/id and may share or
not share other document classes. Headers may also refer to other
files in the TG. A TG file without a header is interpreted with two
default classes: first the document title class, which defaults to
the file name itself, and second the document author class, which
defaults to the file owner's user name, if one is provided by the
operating system, or "anonymous" if not. A header is specified by a
single tag. class=".."
may be specified within the header tag, where .. is a
comma-separated list of key:value pairs, where each key defines a
kind of classification of the document, and the associated value
defines the particular classification of that kind.
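The class=".." value, and the type=".." value of tiers, use the same comma-separated key:value syntax. The sketch below parses it and applies the two header defaults. It assumes that keys and values never contain commas themselves, and that the default classes are keyed "title" and "author". Both function names are hypothetical.

```javascript
// Hypothetical helper: splits "title:A Poem,LG:en" on commas, then each
// item on its first colon. Assumes no commas inside keys or values.
function parseKeyValueList(spec) {
  const result = new Map();
  for (const item of spec.split(",")) {
    if (item.trim() === "") continue;
    const i = item.indexOf(":");
    if (i < 0) throw new Error(`expected key:value, got "${item}"`);
    result.set(item.slice(0, i).trim(), item.slice(i + 1).trim());
  }
  return result;
}

// With no header at all, the two default classes apply: the title is the
// file name and the author is the OS user name, or "anonymous" if none.
// The key names "title" and "author" are assumptions.
function headerClasses(classSpec, fileName, userName) {
  if (classSpec === undefined) {
    return new Map([["title", fileName], ["author", userName || "anonymous"]]);
  }
  return parseKeyValueList(classSpec);
}

console.log(headerClasses(undefined, "poem.tg").get("author")); // anonymous
console.log(parseKeyValueList("ref:auto,charset:utf-8").get("charset")); // utf-8
```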
Each tier specifies one or more types applicable to the entire tier,
followed by a tier body.
Multiple tiers may be specified overlappingly ("simultaneously")
with multiple nested opening and closing tier tags, so long as the
tier bodies use all the same nodes in the same order, and there is one
arc for each tier linking each adjacent pair of nodes. (T T N A A N A
A N A A /T /T suggests the pattern.)
If .. tags are not found, then the entire file
remainder is considered as a single tier with tier name tn=0. If no
type is specified explicitly in a tag, then
type="ref:auto,charset:utf-8" is inferred.
Note: ref:auto is intended to mean that the data in the label,
instead of being interpretable as references to other things,
simply refers to itself, or is auto-referential. This is a
fancy way of saying the characters represent themselves; the
text represents itself; the content is automatically there;
you don't need to get it from somewhere else that the content
references. If instantiated by a program that goes and gets
the content referred to by the data in an arc, well, that
program can simply stick in the very content itself, as the
referenced data.
Other tier types may guide the interpretation of arc content. For
example, ref:http means the arc content is a URI concatenatable with
http:// to create a URL which when accessed will deliver the content
to be inserted for that arc. Similarly, ref:file means the arc
content is a file name.
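The sketch below shows how a consumer might dispatch on the tier's ref type before displaying or fetching an arc's content. It handles only the three ref values named in the text and deliberately fetches nothing. The {ref} input object (as if parsed from type="..") and the {kind, value} return shape are hypothetical.

```javascript
// Hypothetical helper: interprets an arc's data according to its tier's
// ref type. Nothing is fetched; the caller gets back what to display or
// what to retrieve.
function resolveArcContent(tierType, data) {
  const ref = tierType.ref === undefined ? "auto" : tierType.ref; // default type
  switch (ref) {
    case "auto": // the characters represent themselves
      return { kind: "content", value: data };
    case "http": // a URI to be concatenated with http://
      return { kind: "url", value: "http://" + data };
    case "file": // a file name whose contents supply the arc
      return { kind: "file", value: data };
    default:
      throw new Error(`unsupported ref type: ${ref}`);
  }
}

console.log(resolveArcContent({ ref: "http" }, "youtube.com/aaaaaaaa").value);
// http://youtube.com/aaaaaaaa
```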
Note: an example referring to http://youtube.com/aaaaaaaa is:
youtube.com/aaaaaaaa
The tn of a tier is its name, which may be a number, which is
identically given in all the arcs of the tier.
The tier body is a first node (if absent, then is
inferred, though nn can be any string) followed by a sequence of
optionally content-bearing arcs marked for that tier, and
contentless nodes possibly also occurring in other tiers, ending
with a last node.
Note: If unnamed, the last node is named -1 by default. -1 is
used because the largest integer of a given unsigned numeric
data type often has the same bit pattern as -1 in the
corresponding signed numeric data type (as in two's complement).
Note: node names are strings and may contain numeric strings. A
naming convention uses the linguistic boundary symbol, #,
before or after the name of an adjacent arc. Thus #a1 is the
node preceding arc a1, while a1# is the node after arc a1.
Thus a plain text file without any tags for TGML, header, tier,
nodes, or arcs is equivalent to the following explicit markup, with
$variables reconstituted in the obvious way.
Each node is a tag with optional node name, nn=".." item(s)
giving the node's name(s), each of which is unique within the entire
document. No is expected (will be ignored), since nodes
span no content. If nn=".." is not given, a parsing system will
construct a unique node name string, and no segmentation boundary
shared across tiers at that point is implied. Nodes on different
tiers with the same name are the same node; that is, shared naming
of nodes across tiers is the way shared segmentation is encoded. If
no shared node name is explicitly given, no sharing is to be
inferred; whereas, if a node name appears in two tiers, then the
arcs after it and the arcs before it in those tiers share an
alignment of preceding or following segmentation boundaries,
respectively. Nodes may be named by numbers or string names,
including multiple comma-separated names. Consistent renaming of
nodes preserves the shared segmentation structure, but might
obliterate theoretical distinctions: One or another linguistic
theory may demand specific multiple segmentation namings such as
word boundary, sentence boundary, etc. Although the constellation
of segmentation points organizing all the content in all the tiers
would remain constant under a node renaming operation, the
significance of a node as both sentence boundary and word boundary,
for example, could be lost unless the node name retained that
information, as exemplified in , which would be
a way to name a node that precedes word 25 and at the same time
sentence 3. If a node has multiple names, which may show up
independently in various tiers, that node is the same node no matter
its name. If it has been declared to have multiple names by those
names occurring, comma-separated in a node name string, those
different node names for it are simply synonyms from the perspective
of TGML node structure; the node named is a single node.
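Treating comma-separated node names as synonyms for a single node is naturally handled with a union-find (disjoint-set) structure. In that structure, any name resolves to one canonical representative, even when the synonyms are declared in different tiers. The sketch below is a minimal version. The class name and the sample names "w25,s3" and "s3,p1" are hypothetical.

```javascript
// Hypothetical union-find over node names: every name maps to a parent,
// and names with the same root denote the same node.
class NodeNames {
  constructor() { this.parent = new Map(); }
  find(name) {
    if (!this.parent.has(name)) this.parent.set(name, name);
    let root = name;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every name on the way directly at the root.
    let cur = name;
    while (this.parent.get(cur) !== root) {
      const next = this.parent.get(cur);
      this.parent.set(cur, root);
      cur = next;
    }
    return root;
  }
  union(a, b) {
    const ra = this.find(a), rb = this.find(b);
    if (ra !== rb) this.parent.set(rb, ra);
  }
  // Register an nn=".." value; all its comma-separated names become synonyms.
  declare(nnValue) {
    const names = nnValue.split(",").map(s => s.trim()).filter(s => s !== "");
    for (const n of names.slice(1)) this.union(names[0], n);
    return this.find(names[0]);
  }
  same(a, b) { return this.find(a) === this.find(b); }
}

const names = new NodeNames();
names.declare("w25,s3"); // a node preceding word 25 and sentence 3
names.declare("s3,p1");  // the same node, named again from another tier
console.log(names.same("w25", "p1")); // true
```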
A node may have any number of predecessor and successor arcs, up to
the number of tiers, (in case it is shared across all the tiers),
except the start and end nodes have zero predecessors and successors
respectively. A live data structure may deterministically construct
a list of the predecessor or successor arcs of a node but in TGML
the association of arc with node is done on the arc where the
association is one to one, by the labelling of P and S nodes in
arcs.
Each arc must have one P (predecessor) and one S (successor) node,
and lives on one or more tier(s) as named by the tier name tn="..", and
may have zero or more arc names, where each arc name must be unique
within the tier. An arc may be given in an implicit or explicit
form. An arc may be re-used on another tier by adding the other tier
name to the arc.
There are some subclasses of arcs: implicit, explicit, referring,
derived (by merging, not splitting), and empty arcs.
* An implicit arc is an arc constructed during the parsing of a TGML
Tier upon encountering one node followed by arbitrary content,
comprising zero or more characters of text, followed by another
node: the Arc is constructed to receive a unique, created name or
number; its tier is the one being parsed; its predecessor and
successor are the preceding and succeeding nodes; and its content is
the content occurring between the nodes.
* An explicit arc, by contrast, is defined with explicit information
elements in an tag, and may or may not be given in the text of
the file in the order implied by the P and S node names in the
explicit tag, which override the textual ordering. The tier name if
unspecified within an explicit arc defaults to the tn of the enclosing
...
* A referring arc, A, is an arc which has the information in it needed
to replace it with a dereferenced arc or arc/node sequence, whose
content is defined as the join of the contents of a sequence of arcs
between A's start node, #A, and end node, A#, on another compatible
tier which also shares A's start and end nodes. It refers to the
content by tier and node names, thus:
If joinc is '' or unspecified, then the arcs from $P to $S are
copied into the referring/receiving tier, using the same nodes. If
joinc is specified as a nonempty string, then a single arc replaces
A. The literal construction of the arc's contents seems entirely
optional; a constructor would insert the join, using the joinc
string, of the contents of the other tier's arcs from #A to A# into
a replacement arc with the same name, tier, predecessor and
successor nodes.
Therefore, to edit parts of a document into a different order, just
create a new tier with referring arcs for the moved sections, which
can be dereferenced from the arcs in the un-reordered tier. This lets
you skip creating an entire copy; a little indexing data does the
whole job.
* A (merger-) derived arc is an arc derived by algorithm from a
sequence of arcs on a named (source) tier, spanning from the derived
arc's start node to its end node; both of those nodes must of course
be on both tiers. Exemplified as:
This asserts that in the $derivedtier, the contents of the
$sourcetier would be taken between the same nodes, all the
intermediate nodes removed and their arc labels joined into a single
arc label string by a join() operation on space. (joinc=" " could
replace the JavaScript integration implied here by the attribute
by="join(' ')".)
------ Start Footnote: ------
A merging derivation is one kind of thing, a splitting
derivation another. fromt=$tn and by=$function provide for merging
derivation from a sequence of arcs to a single merged arc. A
splitting derivation process might be expressed in code by telling the
code to take a given tier and make another derived from it by
splitting it, say, on space, or end of sentence or using a
morphological analyser to split on morpheme boundaries, etc. The
space of such algorithms includes morphological analysis and
syntactic parsing as subsets, and so cannot be expected to be simple,
though "split on space" is simple enough that TGML could imaginably,
though non-transparently, be twisted into a form that could do
something like this:
`generate_arcs_and_nodes(fromt,100,101,"a100.","n100.")`
If TGML were extended to understand backquotes as execution contexts
and arbitrary function calls, such as what a JavaScript integration
might provide, then a TGML parsing system could look at tier fromt
from node fromnode (here, 100) to node tonode (here, 101), constructing
nodes and arcs to go into the current tier with names constructed by
prepending arcnameprefix (here "a100.") to arc names 0,1, .. and
prepending nodenameprefix (here "n100.") to node names 0,1, .. and
with arc contents constructed by whatever algorithm is given in the
generate_arcs_and_nodes() function. That way a seemingly adjacent
pair of nodes can be specified to have a sequence of arcs and nodes
between them, generated by a splitting process from the data in the
other tier between those nodes. It could be as simple as split(" ")
to divide the content up based on space characters, but I say just
write a simple segmenter in that case, and keep it out of the TGML
definition. For now.
------ End Footnote: ------
* Finally, an empty arc may also be useful especially in partial
translations/annotations. Thus,
blablabla
may be filled in with empty (contentless) arcs between start and end
nodes and the given annotation (i.e., from 0 to 10 and 11 to -1).
This lets us maintain the concept that every tier shares the same
start and end nodes, if only trivially and by courtesy since the
non-empty content of the tier is in the middle somewhere. A
different specification for TGs might instead enable tiers to start
and end in the middle and not share the requirement of
document-global start and end nodes. We can wait to see if that
would be better or not; for now we have a way to do something
roughly equivalent, and indeed more general since empty arcs after
all can be interior arcs as well as initial & final arcs in a tier.
Also if a tier can start and end anywhere, then a simple lookup is
turned into a potentially large search problem.
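The referring and merge-derived arcs described above can both be dereferenced by one routine. It locates the source tier's arcs from #A to A#, then either copies them onto the receiving tier (joinc empty or absent) or joins their contents into a single replacement arc (joinc nonempty). A derived arc with by="join(' ')" behaves as joinc=" ". The arc shape and the fields fromTier and joinc are hypothetical stand-ins for the TGML attributes.

```javascript
// Hypothetical sketch. Assumed shapes: tiers maps a tier name to its arcs in
// path order, each {name, tier, p, s, content}; a referring arc also has
// fromTier (the source tier) and optionally joinc.
function sourceSpan(sourceArcs, fromNode, toNode) {
  const start = sourceArcs.findIndex(a => a.p === fromNode);
  if (start < 0) throw new Error(`node ${fromNode} is not on the source tier`);
  const span = [];
  for (let i = start; i < sourceArcs.length; i++) {
    span.push(sourceArcs[i]);
    if (sourceArcs[i].s === toNode) return span;
  }
  throw new Error(`node ${toNode} is not reached on the source tier`);
}

function dereference(refArc, tiers) {
  const span = sourceSpan(tiers[refArc.fromTier], refArc.p, refArc.s);
  if (refArc.joinc === undefined || refArc.joinc === "") {
    // Copy the source arcs into the receiving tier, reusing the same nodes.
    return span.map(a => ({ ...a, tier: refArc.tier }));
  }
  // Otherwise a single arc replaces the referring arc: same name, tier,
  // predecessor and successor, with the source contents joined by joinc.
  return [{
    name: refArc.name, tier: refArc.tier, p: refArc.p, s: refArc.s,
    content: span.map(a => a.content).join(refArc.joinc),
  }];
}

const tiers = {
  Words: [
    { name: "t0.a0", tier: "Words", p: "A", s: "B", content: "Tom" },
    { name: "t0.a1", tier: "Words", p: "B", s: "C", content: "lvs" },
    { name: "t0.a2", tier: "Words", p: "C", s: "D", content: "Liz" },
  ],
};
// A merge-derived Sentences arc, i.e. joinc=" ":
const merged = dereference(
  { name: "t1.a0", tier: "Sentences", p: "A", s: "D", fromTier: "Words", joinc: " " }, tiers);
console.log(merged[0].content); // Tom lvs Liz
```

A reordered tier would hold only such referring arcs, each pointing at a section of the original tier, so no copy of the content is stored.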
Further work:
TGML being a file format, it would be convenient to have a parser to
verify, and possibly (offer to) correct, the required properties of
a well-formed, valid TGML file. For example, if two arcs claim the
same tier and the same predecessor node, there is an error, since
each tier is a single sequence of arcs and nodes. A TGML editor
could be written perhaps as an EMACS mode or perhaps within a web
browser.
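The example error above (two arcs claiming the same tier and predecessor) can be detected in one pass over the arcs. The same pass catches a shared successor. The sketch below assumes arcs of the shape {name, tier, p, s}, with an arc that lives on several tiers listed once per tier. The function name is hypothetical.

```javascript
// Hypothetical parser check: within a tier, no two arcs may share a
// predecessor node and no two may share a successor node.
function findDuplicateLinks(arcs) {
  const errors = [];
  const firstSeen = new Map(); // "tier|end|node" -> name of first arc using it
  for (const arc of arcs) {
    for (const end of ["p", "s"]) {
      const key = `${arc.tier}|${end}|${arc[end]}`;
      if (firstSeen.has(key)) {
        const role = end === "p" ? "predecessor" : "successor";
        errors.push(`tier ${arc.tier}: arcs ${firstSeen.get(key)} and ${arc.name} share ${role} node ${arc[end]}`);
      } else {
        firstSeen.set(key, arc.name);
      }
    }
  }
  return errors;
}

const sample = [
  { name: "a0", tier: "0", p: "A", s: "B" },
  { name: "a1", tier: "0", p: "A", s: "C" }, // same tier, same predecessor
  { name: "b0", tier: "1", p: "A", s: "C" }, // other tier: no conflict
];
console.log(findDuplicateLinks(sample).join("\n"));
// tier 0: arcs a0 and a1 share predecessor node A
```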
----------------------------------------------------------------------
(C) DB
----------------------------------------------------------------------
TGs in the Database world have a life cycle. A set of tables is
constructed following the structure below. Raw files or TGML encoded
TGs in files may be parsed and INSERTED into the TG tables. UIs may
request, modify, update, or delete data in the TG tables via SELECT
(with JOIN, ORDER BY, etc.), UPDATE, and DELETE statements. Typically
there will be a server-side SQL engine taking requests from a client
UI, making the appropriate SELECT requests, packaging up the results as
JSON, and sending that to the client for display, reference, editing,
etc. As it is worked on, a TG may develop more and more tiers.
Language and translation resource data can be derived from TG data and
provided to support the learner; SQL structures and protocols for this
are closely related and needed.
SQL TABLE structure for representing a set of TGs follows:
--------------------
TABLES
MEMBERS
--------------------
docs
id: integer primary key
name/title: text
author: text
URL: text
// content: text NONOJESUS
// checksum: text (MD5 of content)
classes // (can name/classify docs and tiers with this as also via name/author, types, etc.)
// This creates a classification structure within the tiers of the document.
id: integer primary key
doc_id: integer foreign key to docs(id)
tier_id: integer foreign key to tiers(id) // 0 means none, i.e., a doc class
key: text (such as "date of publication", or "LG")
value: text (such as "February 6, 2018" or "en")
tiers
id: integer primary key
doc_id: integer foreign key to docs(id)
tiertypes
// Specify interpretation and data handling methods for tiers.
tier_id: integer foreign key to tiers(id)
// type: text // e.g. file http direct mp4 video audio 32khz 16bit
key: text // e.g., method mediatype rate fileformat sourcetier derivationmethod
value: text // e.g. http video 30hz mp4 $tn lookup(L2,L1,$data)
nodes
idx: integer primary key
id: text
doc_id: integer foreign key to docs(id)
// offset: integer NONOJESUS.
// predarcs: NO, derive that from arc information
// succarcs: NO, derive that from arc information
// A node is not a pointer referring to a text offset,
// but an anchor that the content arcs refer to. Nodes are the
// primary data in a TG, along with their sequential
// pattern as realized in the structure of arc linkages between them.
// Text and content are hung on this primary structure, not the reverse.
arcs
id: integer primary key
name: text
doc_id: integer foreign key to docs(id)
tier_id: integer foreign key to tiers(id)
pred_id: integer foreign key to nodes(idx) // no other arc in the same doc+tier can have the same pred_id
succ_id: integer foreign key to nodes(idx) // no other arc in the same doc+tier can have the same succ_id
data: text (utf-8)
data is parseable according to the arc's tier's types.
It is a sequence of zero or more identifiers encoded as utf8 text
If the tier's type is ref:auto, data is read directly as the sequence of characters themselves.
Or it can be a delimited sequence of references according to
the tier's type, such as id's of lemmas, wordtypes, wordtokens,
or URIs or other references to outside content.
If prefixed with an access method like http: or file: etc.,
and suffixed by access data such as a URI, the method must be
consistent (redundant) with the tier's types.
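Loading a parsed TG into these tables is mostly a matter of flattening, plus mapping node names to integer keys. The sketch below takes a TG in the arctiers/nodes shape of the JSON form (section D) and produces row objects for the tiers, nodes and arcs tables, ready to be INSERTED by whatever SQL client is in use. Assigning ids sequentially from 1 is an assumption, as is filling pred_id and succ_id with the integer idx of the node rows. The docs, classes and tiertypes rows are not produced.

```javascript
// Hypothetical loader step: TG object -> rows for tiers, nodes, arcs.
function tgToRows(tg, docId) {
  const tiers = [], nodes = [], arcs = [];
  const nodeIdx = new Map(); // node name -> integer idx
  for (const name of Object.keys(tg.nodes)) {
    const idx = nodes.length + 1;
    nodeIdx.set(name, idx);
    nodes.push({ idx, id: name, doc_id: docId });
  }
  const idxOf = name => {
    if (!nodeIdx.has(name)) throw new Error(`arc refers to unknown node ${name}`);
    return nodeIdx.get(name);
  };
  tg.arctiers.forEach((tierArcs, t) => {
    const tierId = t + 1; // tier_id 0 means "no tier" in the classes table
    tiers.push({ id: tierId, doc_id: docId });
    for (const [name, arc] of Object.entries(tierArcs)) {
      arcs.push({
        id: arcs.length + 1, name, doc_id: docId, tier_id: tierId,
        pred_id: idxOf(arc.p), succ_id: idxOf(arc.s), data: arc.txt,
      });
    }
  });
  return { tiers, nodes, arcs };
}

const poem = {
  arctiers: [
    { "t0.a0": { txt: "Tom", p: "A", s: "B" },
      "t0.a1": { txt: "lvs", p: "B", s: "C" },
      "t0.a2": { txt: "Liz", p: "C", s: "D" } },
    { "t1.a0": { txt: "Tom lvs Liz", p: "A", s: "D" } },
  ],
  nodes: { A: {}, B: {}, C: {}, D: {} },
};
const rows = tgToRows(poem, 1);
console.log(rows.arcs[3].pred_id, rows.arcs[3].succ_id); // 1 4
```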
---- a useful SQL table set for processing monolingual language data: ----
lemmas -- one row per distinct lexical element (dictionary entry)
id
pos -- part of speech
citation -- orthographic string representing the "citation form" for the lemma
... -- other stuff as needed
wordtypes -- one row per distinct word form or space-separated punctuation string in the corpus
id
orth -- character string; how this word/punct type appears in text
-- the remaining fields apply only to word forms (not free-standing punctuation):
lemma_id -- integer foreign key to lemmas(id)
type_label -- morphological categorization
type_segs -- morphologically segmented rendering of the orthographic form (optional)
... -- other stuff as needed
wordtokens -- one row per space-separated token in the corpus
id
type_id -- integer foreign key to wordtypes(id)
doc_id -- integer foreign key to docs(id)
seqno -- relative position of the token within the doc
prev_punc -- punctuation string (if any) attached at beginning of orthographic word
foll_punc -- punctuation string (if any) attached at end of orthographic word
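Populating wordtokens starts from a space-separated split with attached punctuation peeled off each end. The sketch below does that for one document. It treats Unicode category P (the regex class \p{P}) as punctuation, which is an assumption. It keeps an all-punctuation token whole as a free-standing punctuation type, and it returns the orth string that a later (not shown) lookup in wordtypes would turn into type_id.

```javascript
// Hypothetical tokenizer producing wordtoken-like rows.
function tokenize(text, docId) {
  const rows = [];
  const tokens = text.split(/\s+/).filter(t => t !== "");
  tokens.forEach((tok, seqno) => {
    // leading punctuation, the word itself (shortest), trailing punctuation
    const m = tok.match(/^(\p{P}*)(.*?)(\p{P}*)$/u);
    let [, prev, orth, foll] = m;
    if (orth === "") { orth = tok; prev = ""; foll = ""; } // all punctuation
    rows.push({ doc_id: docId, seqno, orth, prev_punc: prev, foll_punc: foll });
  });
  return rows;
}

console.log(tokenize('"Tom lvs Liz!"', 1)[2].foll_punc); // !"
```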
---- Multilingual data: ----
Incidentally, dictionary.org provides a protocol for storage and
access of dictionary data.
This section is a placeholder for SQL or other systems for collecting
language translation data from a growing corpus of worked-up TGs into
dictionaries, concordances, references to audio & video, etc.
Citation phonetics and phonology are provided to the learner in
integrated but separate presentation, available any time relative to
the symbols of the script. Pronunciation tutorials are accessed by a
menu item found by selecting a character in a tier of that language.
Fast speech and dialect phonetics are learned by reading phonological
or citation-form transcripts associated with segmented audio. Once
the learner sees the words in their strange but given L2 order and
with the grammatical tags plus explanations of the tags, they can put
together the syntax options and interpretations themselves; only a
little explicit instruction might be needed to assert syntax
requirements that aren't obvious from positive examples.
But mainly, the learner uses TGs and TG-derived data resources to
approach L2 through word based learning. (I use "word" loosely.) A
word in L2 corresponds to a list of words in L1, all those that it has
been translated as. There may be an inferrable core meaning that
explains the many L1 options, in context; the learner is encouraged to
pull out and remember that core as the meaning of a somewhat
untranslatable L2 word. A word translation can also include some
grammatical tags like 3rd Person Singular or 3Sg so that the grammar
can be understood. There may be a bunch of contexts in which it
occurs in L2, in text, audio, and video, which might be of interest
and helpful to the learner to solidify their understanding of its
meaning and usage by seeing it used in various contexts. A learning
system should keep track of all these different kinds of
correspondences or translation relationships, and be able to present
them effectively at appropriate times, so the learner can integrate
these different kinds of knowledge into their own growing L2
competence.
TG-related software capturing all this would need types for each tier
specifying the tier's language, perhaps as a hierarchy, up to include
language family, and down to include dialect or for that matter
speaker/actor. We would also like to collect, store, and make
accessible the translations implied by the correspondence of
translation tiers. If L1 and L2 tiers both contain a given pair of
nodes, then the corresponding content in the two tiers has a
translation relationship. The minimum corresponding segment pair
would be typically the unit of study. If words, the correspondence
can be copied into a dictionary store. If sentences, it might be
useful in building concordances. In two languages, corresponding
sentences will have different word ordering; the mapping between the
words does not respect ordering. Thus an interlinear glossed
translation (IGT) at the word level is not symmetric; translating
from L2 to L1, the IGT presentation retains the word order of the L2
form, while inserting L1 glosses plus grammatical tags. It could be
worth the effort to build an IGT in one direction but not in the other
direction, and a learner might have to make do. Constructing a
similar result in the other direction tags L1 not L2 forms, and uses
L1 not L2 word ordering. If a grammatical tagger is built for each
language, it may then be easy to go both ways. We have work and
leverage here for linguists: each word translated becomes a dictionary
entry; each dictionary entry becomes an offered translation for each
word to be translated; other summary statistics are also to be
updated, and machine translation and other models depending thereon
periodically recalculated.
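Extracting the corresponding segment pairs follows directly from the shared-node rule. The nodes that appear in both tiers cut each tier into aligned segments, and the content between consecutive shared nodes on each tier forms one translation pair. Because the graph is acyclic, shared nodes occur in the same order on both tiers. The sketch below assumes each tier is a non-empty array of arcs {p, s, content} in path order. The function name and the sample French/English data are illustrative only.

```javascript
// Hypothetical sketch: minimal corresponding segment pairs of two tiers.
function alignedSegments(tierL2, tierL1, joinc = " ") {
  const nodesOf = tier => [tier[0].p, ...tier.map(a => a.s)];
  const inL1 = new Set(nodesOf(tierL1));
  const shared = nodesOf(tierL2).filter(n => inL1.has(n));
  // Content of one tier between two of its nodes, joined with joinc.
  const between = (tier, from, to) => {
    const out = [];
    let on = false;
    for (const a of tier) {
      if (a.p === from) on = true;
      if (on) out.push(a.content);
      if (on && a.s === to) break;
    }
    return out.join(joinc);
  };
  const pairs = [];
  for (let i = 0; i + 1 < shared.length; i++) {
    const from = shared[i], to = shared[i + 1];
    pairs.push({ from, to, l2: between(tierL2, from, to), l1: between(tierL1, from, to) });
  }
  return pairs;
}

const l2 = [
  { p: "A", s: "B", content: "le" },
  { p: "B", s: "C", content: "chat" },
  { p: "C", s: "D", content: "dort" },
];
const l1 = [
  { p: "A", s: "C", content: "the cat" },
  { p: "C", s: "D", content: "sleeps" },
];
for (const pair of alignedSegments(l2, l1)) console.log(pair.l2, "=", pair.l1);
// le chat = the cat
// dort = sleeps
```

Pairs that are single words on both sides could go to a dictionary store, and sentence-sized pairs to a concordance, as described above.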
----------------------------------------------------------------------
(D) JSON
----------------------------------------------------------------------
JavaScript Object Notation as a form for a TG is exemplified below.
JSON is convenient for storage and transmission of individual
documents and tiers, such as between server and client. A client may
request a segment of a TG based on a set of node names between which,
in the different tiers, content is requested, and the JSON form is able
to equally represent such multi-tier segments and entire TGs.
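A server answering such a request might cut the segment as sketched below. It walks each tier, in the arctiers shape used by the JSON example in this section, from one node name to the other. A tier on which the walk does not reach the end node contributes an empty object. The function name is hypothetical, and the sketch omits the nodes member, which a client could rebuild from the arcs.

```javascript
// Hypothetical helper: multi-tier segment of a TG between two node names.
function extractSegment(tg, from, to) {
  const arctiers = tg.arctiers.map(tierArcs => {
    const leaving = new Map(); // predecessor node -> [arc name, arc]
    for (const [name, arc] of Object.entries(tierArcs)) leaving.set(arc.p, [name, arc]);
    const segment = {};
    let node = from;
    while (node !== to && leaving.has(node)) {
      const [name, arc] = leaving.get(node);
      segment[name] = arc;
      node = arc.s;
    }
    return node === to ? segment : {}; // tier lacks one of the nodes
  });
  return { header: tg.header, arctiers };
}

const tg = {
  header: { title: "A Poem" },
  arctiers: [
    { "t0.a0": { txt: "Tom", p: "A", s: "B" },
      "t0.a1": { txt: "lvs", p: "B", s: "C" },
      "t0.a2": { txt: "Liz", p: "C", s: "D" } },
    { "t1.a0": { txt: "Tom lvs Liz", p: "A", s: "D" } },
  ],
};
const part = extractSegment(tg, "B", "D");
console.log(Object.keys(part.arctiers[0]).join(" ")); // t0.a1 t0.a2
```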
/* an example of a tg in JSON, representing a poem, "Tom lvs Liz" with
* a segmentation thereof into words
*/
tg = { // array of tier names, array of arrays of named arc objects, named-array of nodes
header: { title: "A Poem",
author: "Tom Veatch",
nTiers: 2,
read_encoding: "utf-8", // a default to be minimally used; captures ASCII files.
write_encoding: "utf-8", // consider using UTF-32 internally and for writing.
tiernames: [ "Words", "Sentences" ],
baseurl: "https://tomveatch.com/tg/",
tierlocs: [ "poem.tg#words", "poem.tg#sentences" ]
},
// p for predecessor, s for successor.
arctiers: [ // Should arcs be numbered or tiernamed or both? both, consistently, after renumbering.
// arc names are sortable only within tiers: no total ordering across tiers.
{ "t0.a0": {txt:"Tom",p:"A",s:"B"},
"t0.a1": {txt:"lvs",p:"B",s:"C"},
"t0.a2": {txt:"Liz",p:"C",s:"D"}
},
{
"t1.a0": {txt:"Tom lvs Liz",p:"A",s:"D" }
}
],
// here each node contains a list (one per tier) of preds and
// successor arcs. Note that in TGML the nodes do not need to name
// their P and S arcs, since they may be implicit, and the arcs do
// that work, but in JSON or other fully-realized data structures the
// referencing is worked out explicitly since it may be convenient to
// be able to refer in either direction between connected arcs and nodes.
nodes: {
"A": {p: ["",""], s: ["t0.a0","t1.a0"] },
"B": {p: ["t0.a0","" ], s: ["t0.a1","" ] },
"C": {p: ["t0.a1","" ], s: ["t0.a2","" ] },
"D": {p: ["t0.a2","t1.a0"], s: ["" ,"" ] }
}
};
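Because every arc names its own p and s nodes, the nodes member above is redundant and can be rebuilt from arctiers alone, for example by a receiver that was sent only the arcs. The sketch below does that. For every node, p[i] and s[i] name the arc entering and leaving it on tier i, or "" if none. The function name is hypothetical.

```javascript
// Hypothetical helper: derive the JSON nodes table from arctiers.
function buildNodeIndex(arctiers) {
  const nTiers = arctiers.length;
  const nodes = {};
  const entry = name =>
    nodes[name] ?? (nodes[name] = { p: Array(nTiers).fill(""), s: Array(nTiers).fill("") });
  arctiers.forEach((tierArcs, t) => {
    for (const [arcName, arc] of Object.entries(tierArcs)) {
      entry(arc.p).s[t] = arcName; // the arc leaves its predecessor node
      entry(arc.s).p[t] = arcName; // and enters its successor node
    }
  });
  return nodes;
}

const arctiers = [
  { "t0.a0": { txt: "Tom", p: "A", s: "B" },
    "t0.a1": { txt: "lvs", p: "B", s: "C" },
    "t0.a2": { txt: "Liz", p: "C", s: "D" } },
  { "t1.a0": { txt: "Tom lvs Liz", p: "A", s: "D" } },
];
const nodes = buildNodeIndex(arctiers);
console.log(JSON.stringify(nodes.A)); // {"p":["",""],"s":["t0.a0","t1.a0"]}
```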
----------------------------------------------------------------------
(E) JS
(Try to maintain consistency between theory and code)
----------------------------------------------------------------------
A TG may also be realized in active computer memory in the form of a
set of populated data structures, created and manipulated in
JavaScript or other implementation language, for client-side or other
processing. A software library with convenient functions to
read/write/traverse/modify a TG would be helpful.
Sample JS code to create and initialize/serialize various TG data
objects follows:
function At(from, to) { // "At" a span: nodes define start/end within each tier.
  this.p = from.slice(); // one start node per tier, copied from the caller
  this.s = to.slice();   // one end node per tier
}
function Arc(label, p, s, content) {
  this.label = label;
  this.p = p;
  this.s = s;
  this.content = content;
  this.writeArc = function () {
    // we can assume arcs print only between the nodes they are At,
    // so the caller emits the surrounding node tags
    return this.content;
  };
  this.readArc = function () {
    // we should know the previous Node,
    // we should parse through into the next Node, so that we also know that Node.
    // then assign
    //   this.label   = "t" + i + ".a" + j;
    //   this.p       = prev node label;
    //   this.s       = succ node label;
    //   this.content = content read up to the next node tag;
    throw new Error("readArc: TGML parsing not yet implemented");
  };
  this.readNode = function (tierN) {
    // in arctxt, reading we must store it for the following arc,
    // also we might get called for pre-reading while setting up that arc.
    // we should pre-parse through into the next Node, so that we also know that Node.
    // Here we have to keep the tier straight:
    //   scan the node tag to get its label, then
    //   nodes[label].p[tierN] = the previous arc label;
    //   nodes[label].s[tierN] = the successor arc label;
    // also we don't know where we are in the node array until we pull out the id label,
    // with which we can index to the right node's p[] and s[] properties.
    throw new Error("readNode: TGML parsing not yet implemented (tier " + tierN + ")");
  };
}
function TG(title, author, nTiers) {
  this.arctiers = []; // array of tiers, each being an associative array of arcs
  this.nodes = {};    // associative array of nodes, each with preds/succs for each tier
  this.docinfo = { "doctitle": title,
                   "author": author,
                   "nTiers": nTiers,
                   "encodings": [],
                   "tiernames": [],
                   "tierlocs": []
                 };
  this.specifyTier = function (tierName = "unstructured",
                               tierLoc = "file:tg.txt",
                               tierEncoding = "utf8") { // could map to UTF-32 on read
    this.docinfo.tiernames.push(tierName);
    this.docinfo.tierlocs.push(tierLoc);
    this.docinfo.encodings.push(tierEncoding);
  };
  this.copyTier = function (i) {
    const nt = this.docinfo.nTiers++; // index of the new tier
    // shallow-copy each arc object of tier i into the new tier
    const at = this.arctiers[nt] = {};
    for (const [key, arc] of Object.entries(this.arctiers[i])) at[key] = { ...arc };
    // every node gets an (initially empty) pred/succ slot for the new tier
    for (const n of Object.values(this.nodes)) { n.s.push(""); n.p.push(""); }
    for (const [key, arc] of Object.entries(at)) {
      this.nodes[arc.p].s[nt] = key;
      this.nodes[arc.s].p[nt] = key;
    }
  };
this.serialize = function(io) {
if (io==="read") {
baseTier = empty(this.nodes)?true:false;
for (i=0;i
if (ok && baseTier) { this.firstNode = n; }
while (ok) {
ok=ok&&readArc(); // expect any text until
ok=ok&&n=readNode(); // expect any text until
}
if (ok && baseTier) { this.lastNode = n; }
// confirm the final node is the global final node.
}
} else { // assume io==="write"
writeDocInfo(this.docinfo);
for (i=0;i\n'
;
};
It would be convenient, certainly, to have JS code that reads/writes
TGML, JSON, and SQL, such that a TG in any one format can be converted
into any other, and back again, and such code can be tested by making
sure the information and structure survive transcoding in each
direction.
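The transcoding test suggested above can be phrased generically. The sketch encodes a TG, decodes it again, and compares canonical forms. It shows only a JSON codec. TGML and SQL codecs would plug in as further (encode, decode) pairs, and none is assumed to exist yet. The helper names are hypothetical. The canonical() step sorts object keys so that key order alone does not cause a false mismatch.

```javascript
// Hypothetical round-trip check for any (encode, decode) pair.
function canonical(value) {
  if (Array.isArray(value)) return value.map(canonical);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const k of Object.keys(value).sort()) out[k] = canonical(value[k]);
    return out;
  }
  return value;
}

function survivesRoundTrip(tg, encode, decode) {
  const before = JSON.stringify(canonical(tg));
  const after = JSON.stringify(canonical(decode(encode(tg))));
  return before === after;
}

const jsonCodec = { encode: tg => JSON.stringify(tg), decode: text => JSON.parse(text) };
const sampleTg = {
  header: { title: "A Poem", author: "Tom Veatch" },
  arctiers: [{ "t0.a0": { txt: "Tom lvs Liz", p: "A", s: "D" } }],
};
console.log(survivesRoundTrip(sampleTg, jsonCodec.encode, jsonCodec.decode)); // true
```

A codec that silently drops a tier or a header class would make the comparison fail, which is the property the test is meant to catch.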