Translation Graphs

by Tom Veatch, 2003, 2008, 2017-18

-------------------------------------------------------------------
Motivation
-------------------------------------------------------------------

Suppose you are competent in one language (call that language L1) and you are interested in a document (whether as text, audio, or video) in another language which you don't know (call that language L2). Wouldn't it be nice if someone had done some work on that document to make it accessible to you in that other language?

Of course a normal paragraph-by-paragraph translation of the whole, such as you may find in a published translation of that work, would do something like this, but then you wouldn't be getting it in the other language at all; you'd be getting it in your own. You'd learn the translated content, but you wouldn't learn the language.

I have in mind, instead, something that opens the language itself to you, so that as you, the learner, go through the document as analysed and worked up by a linguist/translator, you gradually become able to recognize and understand the elements of the other language and ultimately to understand the original itself. This worked-up form of the document would have to provide a variety of more-accessible forms of the parts of the document: translations that are word-by-word, not just paragraph-by-paragraph, and links to audio playback of pronunciations of the symbols and words, perhaps of larger sections. Using it should teach you and enable you to get a lot of its content; the second or fifth time you encounter a word in that language, you might not have to look at its dictionary entry to begin to understand it yourself. Ultimately, with enough such documents, and enough time devoted to working through them, you would become a competent reader and even listener in that other language.

To enable this vision of language learning through a mediated, supported, but direct encounter with the original L2 document, I have had to envision a whole architecture of language data representation, markup, storage, lookup tools, editing systems, display systems, and the like, which would be needed to take that original L2 document, add the needed L1 resources, and then make them accessible to you as you read through the document. This is my draft description of that system.

The key idea, of course, is multilinearity. Consider the original document as a line of text. Maybe a very, very long line, but in the abstract, just a sequence of symbols on a single line. Then any additional representation that supports or makes accessible to you any piece of the original document can be considered a translation of a piece of that first line, written onto some additional second (or Nth) line in a way that lets you tell which part of the first line it is a translation of. Such a representation is multilinear. You might have a large number of lines: lines that show pronunciation, others that gloss vocabulary, lines with big gaps in them, lines that refer to data outside the workup (in a dictionary, an audio library, or on the internet), lines that call your attention to syntax or dialect features, lines that link to clear pronunciations from an audio dictionary, and lines that link to live, vernacular, or fast-speech recordings of whole sentences or turns, so that you can learn what fast speech sounds like in that language, not just careful pronunciations from a dictionary.
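As one way to picture multilinearity concretely, here is a minimal sketch in Python. The names, sample text, and glosses are my own illustrations, not part of TGML: each additional "line" is modeled as a set of annotations anchored to spans of the original line.

    # Illustrative sketch only: names and sample data are mine, not TGML's.
    from dataclasses import dataclass

    @dataclass
    class Annotation:
        start: int    # index into the original line where this piece begins
        end: int      # index just past the end of that piece
        content: str  # the supporting material: a gloss, a pronunciation,
                      # a link to audio, a syntax note, ...

    # Line 1: the original L2 document, abstractly one long sequence of symbols.
    original = "namo tassa bhagavato"

    # Line 2: a word-by-word gloss, each entry aligned to a span of line 1.
    gloss_line = [
        Annotation(0, 4, "homage"),
        Annotation(5, 10, "to him"),
        Annotation(11, 20, "to the blessed one"),
    ]

    # Line 3: a sparser line (big gaps are fine) pointing to external audio.
    audio_line = [Annotation(0, 20, "recordings/opening_phrase.wav")]

    def covering(lines, start, end):
        """All annotations, on any line, that overlap the span [start, end)."""
        return [a for line in lines for a in line
                if a.start < end and start < a.end]

    print(covering([gloss_line, audio_line], 0, 4))

The point is only the shape of the data: every supporting line is addressable by where it attaches to the original, so a reader (or a display tool) can ask for everything that explains a given piece of the first line.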
And the system that ultimately presents it to you could have an intelligent model of what you know as a learner of this new language, and it could display for you, at a suitable pace and with the right amount of repetition and testing, the easiest next elements for you to learn, as you gradually acquire competence with all the many things in that document. Such a trainer is one step beyond the scope of this current discussion; it is something that, after we achieve the technical requirements discussed here, we could then aim to build. The Translation Graph infrastructure enables the workup of documents, structurally and multilingually, and enables their access for learners. Mediating to an individual learner/reader is a further step: a tool that serves as an instructor, carrying on an ongoing dialog with the individual to find out what they got out of it, and learning how comfortable they are with the elements of L2 as they work with it.

Possible application examples and features:

For example, an Apple TV remote control that you can click when you don't understand something in the recent history of the current media playback. If the media is annotated with this kind of data, and the remote understands the meaning of the click as "Explain that to me", then the video can pause, an Interlinear Glossed Text (IGT) analysis can be displayed, and the user can browse until they learn what they want, then click onward to continue.

Dave Graff's idea: this could be applied to multi-lingual Twitter feeds. People who want to understand the L2 Twitter data (and authors who want to be understood) might contribute a lot of data checking and editing to such a system. Once some critical mass is achieved and the community usefulness gets going, with live dictionaries, live algorithms, and humans involved, it could become quite useful to all.

Tom continues: I want anyone to be able to go into an L2 situation and be maximally supported in learning and understanding what they don't know. This is far more ambitious than the Star Trek universal translator. It applies equally to multi-lingual Twitter, to foreign watchers of previously unsubtitled English movies, to audio concordances for learning dialect features, to archiving and study of ancient religious texts, to any form of language, whether text, audio, scans, slides, or video, that is of enough interest to be worth the work of making it accessible to speakers of another language.

Put an app on your iPad and watch the TV with it. When it recognizes a place in the film for which someone has made a TG workup, that's a language tutorial you can use: the app lets the user click a button and go through an IGT to learn -- on the iPad, if the TV isn't smart enough to show it on the TV. Or have it know you well enough to pause once in a while and give you a translation of something it thinks will be helpful to you. And you can click the "Huh?" button here or there to ask what something meant, and it can offer contextualized help. Even partly-understanding native speakers can use similar controls over the presentation of the Translation Graphs, to turn the subtitles on and off.
A mouse click or enabled pointer over a part of the text could show a reverse-video circle or speech bubble with the corresponding L1 translation of that part of the L2 content, adjusted continuously as the mouse moves around the document, so that the learner can point at what they don't know, see it in translation, point away to see the original again, and so on, until they don't have to see the translation any more because they understand it in its original L2 form.

* Presentation for learning may be computer controlled, based on a model of the reader/learner's knowledge, or manually parameterized. In any case, with TGs we now have data to support the learner.

* A map to the meanings of the grammatical tags found in the markup should be a click away.

* A map to a concordance for any morpheme should be a click away.

* A map to an IPA reference, a pronunciation guide, and a script description should be a click away.

* An audio rendering, where the text is performed in a recording with a bouncing-ball display, should be a click away.

All this may be hidden while the image/video/audio media is (dis-)played, with the display of all tiers, or of a parameterized, selected subset, a click away during playback whenever the audience is puzzled and wants to understand the part they just heard but didn't understand.

A YouTube app with a TG from the video's original L2 language to the L1 language of various learner populations would show the video with IGTs scrolling below. A reader app would show the L2 original, allowing pointer-guided selection of parts the reader wants to learn more about, and deselection to indicate they got it. With use, the original becomes understandable to the learner.

-------------------------------------------------------------------
Introduction
-------------------------------------------------------------------

TGML is a markup language for Translation Graphs. A Translation Graph (TG) represents a single language event or document in its multiple translations, forms, layers, levels, segmentations, etc. For example, a Prakrit Buddhist text with sentence-by-sentence English translations could be represented with three levels: 1) the unsegmented original, 2) the sentence-by-sentence segmented original, and 3) the sentence-by-sentence translated version. The levels share an abstract, consistent, partial ordering, but possibly no other data, though they correspond with one another. The correspondences are defined by sharing boundaries across levels in the internal segmentation of each level.

The TGML concept enables multiple-file representation of TGs such that data of any type representing segments may be aligned together as they correspond with one another sequentially in a document. Segmentation and grouping are defined by nodes and arcs in a simple DAG ("directed acyclic graph"; ignore this name if it is puzzling), lattice, or tiered structure, with arcs carrying content and nodes, optionally shared across tiers, representing alignment. By definition, the start and end nodes are required to be shared across all tiers. Each tier's arc contents are of a single type of data or translation. No tier need be more privileged than another.

TGs are typically works in progress, as there are as many different translations, renderings, commentaries, and analyses for any linguistic form as the mind of a linguist can conceive. Believe me, that is infinite. The arcs of a single tier together form a single pass through the entire document and represent a single type of data.
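To make that structure concrete, here is a small sketch in Python. The names (Node, Arc, Tier) and tier labels are my own, not the TGML element names: each tier is a single pass of arcs from the shared start node to the shared end node, and alignment between tiers is expressed simply by reusing boundary nodes, as in the three-level Prakrit example above.

    # Illustrative sketch of the node/arc/tier structure; names are mine.
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        label: str     # a boundary in the document's shared partial ordering

    @dataclass
    class Arc:
        src: Node      # boundary where this segment begins
        dst: Node      # boundary where this segment ends
        content: str   # the data carried on this tier for this segment

    @dataclass
    class Tier:
        kind: str                                # the single data type of this tier
        arcs: list = field(default_factory=list)

    # Start and end nodes are shared by all tiers by definition; the sentence
    # boundary is shared only by the two sentence-segmented tiers.
    start, s12, end = Node("start"), Node("sentence 1|2"), Node("end")

    tiers = [
        # 1) the unsegmented original: one arc spanning the whole document
        Tier("original/unsegmented", [Arc(start, end, "<entire Prakrit text>")]),
        # 2) the sentence-by-sentence segmented original
        Tier("original/sentences", [Arc(start, s12, "<Prakrit sentence 1>"),
                                    Arc(s12, end, "<Prakrit sentence 2>")]),
        # 3) the sentence-by-sentence English translation, aligned to tier 2
        #    by sharing its boundary nodes
        Tier("english/sentences", [Arc(start, s12, "<English sentence 1>"),
                                   Arc(s12, end, "<English sentence 2>")]),
    ]

    def aligned_with(arc, tiers):
        """Segments on other tiers that share both boundaries with this arc."""
        return [(t.kind, a.content) for t in tiers for a in t.arcs
                if a is not arc and a.src is arc.src and a.dst is arc.dst]

    print(aligned_with(tiers[1].arcs[0], tiers))  # English rendering of sentence 1

Nothing in this sketch privileges one tier over another; adding a fourth tier (say, a word-segmented phonetic transcription) would just be another pass of arcs, reusing whatever boundary nodes it happens to share.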
Examples of the type of data within a given tier might be:

* original images of palm-leaf sheets;
* OCR-generated word hypotheses;
* manually corrected phonetic transcription of a certain actor's verbal rendering;
* a segment of a video (defined by data type, time-stamps, and perhaps player or other information);
* an audio recording of one of a thousand instances of the word "OK", by a study's many speakers in various natural contexts of occurrence.
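Continuing the sketch above (again with purely illustrative names and assumptions, not TGML itself), heterogeneous tier contents like these can be carried as typed payloads on the arcs, whether inline text or references to external media.

    # Illustrative payload types for arc contents; reuses the Arc/Tier sketch
    # above, relaxing Arc.content to hold any of these values.
    from dataclasses import dataclass

    @dataclass
    class ImageRef:
        uri: str            # e.g. a scan of one palm-leaf sheet

    @dataclass
    class WordHypotheses:
        candidates: list    # alternative OCR readings for one word position

    @dataclass
    class MediaSegment:
        uri: str            # audio or video file
        start_sec: float    # time-stamps delimiting the segment
        end_sec: float

    # Example tiers, one payload type per tier (boundary nodes n0, n1 omitted):
    #   Tier("scan/palm-leaf",  [Arc(n0, n1, ImageRef("scans/leaf_001.png"))])
    #   Tier("ocr/words",       [Arc(n0, n1, WordHypotheses(["dharma", "dhamma"]))])
    #   Tier("audio/instances", [Arc(n0, n1, MediaSegment("ok_0423.wav", 0.0, 0.4))])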