NLTK Functions:

    s = stem.{Porter,etc.}; s.stem(token)
    list(tokenize.whitespace(text))
    t = tag.{Default,Regexp(patterns),Unigram(backoff=t)}; t.tag(tokens); t.train(corpus);
    g = cfg.parse_grammar(gmr); chart.ChartParse(g,METHOD);

TGTK:

    tier = TG.readTier(fn1);       // on a plain text file, makes a tier split on whitespace
    tier.write(fn2);               // output the tier into a file, saving it
    doc = readDoc(fn3);            // initial tier
    for (fnI;;) doc.AddTier(fnI);  // returns T if consistent & well-formed

    for (seg = doc.FirstSegment(tiername);
         seg && seg.hasNext();
         seg = seg.NextSegment(tiername)) {
        // A segment is an object that selects a segment, or arc between nodes, on a tier.
        // It thus defines the predecessor and successor nodes at that tier,
        // as well as the including arcs on larger-segment tiers
        // and the included arc sequences on shorter-segment tiers.
        // handle the segment
    }

Each tier has a name and an access method, and it holds data for each arc between adjacent nodes within the tier. The underlying data is returned to a caller by the access method. So a tier might be:

    Yes Father Yes Father

Then calling code could call

    doc.FirstSegment("WordsTier").access("WordsTier");

to retrieve the ASCII string "Yes", which we thereby take as the actual first word, or it could call

    doc.FirstSegment("WordsTier").NextSegment("WordsTier").access("WordsTier")

to retrieve the ASCII string "Father", the actual second word.

As another example:

    IMG1.jpg IMG2.jpg

Then calling code could call

    doc.FirstSegment("ImagesTier").access()

to retrieve the filename "IMG1.jpg".

To automatically process a tier of data, generating a second tier of data to be incorporated into the same Document, code along these lines might do:

    stemstier = doc.AddTier(CopyNodesFromTier="WordsTier", "StemsTier");
    s = doc.FirstSegment("WordsTier");
    stemstier.replace_with(s, stemmer(s.access("WordsTier")));
    s = s.NextSegment("WordsTier");
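
The tier model above can be tried out with a small, self-contained Python sketch. This is not the TGTK API: the names Tier, Segment, Document, add_tier, first_segment, next_segment, derive_tier and toy_stem are all hypothetical, chosen only to mirror the notes, and toy_stem is a crude stand-in for a real stemmer such as NLTK's Porter stemmer. It assumes the simplest case, where the derived tier copies the nodes of the source tier, so both tiers have one arc per word.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Tier:
    """A named tier holding one data item per arc between adjacent nodes."""
    name: str
    data: List[str] = field(default_factory=list)


@dataclass
class Segment:
    """Selects one arc (by position) on one tier of a document."""
    doc: "Document"
    tier_name: str
    index: int

    def access(self) -> str:
        # Return the data stored on the selected arc.
        return self.doc.tiers[self.tier_name].data[self.index]

    def next_segment(self) -> Optional["Segment"]:
        # Move to the following arc on the same tier, or None at the end.
        nxt = self.index + 1
        if nxt < len(self.doc.tiers[self.tier_name].data):
            return Segment(self.doc, self.tier_name, nxt)
        return None


class Document:
    """A collection of tiers, looked up by name (hypothetical, not TGTK)."""

    def __init__(self) -> None:
        self.tiers: Dict[str, Tier] = {}

    def add_tier(self, name: str, data: List[str]) -> Tier:
        self.tiers[name] = Tier(name, list(data))
        return self.tiers[name]

    def first_segment(self, name: str) -> Optional[Segment]:
        return Segment(self, name, 0) if self.tiers[name].data else None

    def derive_tier(self, source: str, target: str,
                    fn: Callable[[str], str]) -> Tier:
        # Walk the source tier segment by segment and store fn(data)
        # on the matching arc of a new tier with the same nodes.
        derived: List[str] = []
        seg = self.first_segment(source)
        while seg is not None:
            derived.append(fn(seg.access()))
            seg = seg.next_segment()
        return self.add_tier(target, derived)


def toy_stem(word: str) -> str:
    """Crude suffix stripper; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


doc = Document()
doc.add_tier("WordsTier", "walking dogs barked".split())  # tier split on whitespace
stems = doc.derive_tier("WordsTier", "StemsTier", toy_stem)
print(stems.data)
```

Walking the source tier with first_segment and next_segment, rather than indexing the list directly, keeps the sketch close to the segment-based loop in the notes; a fuller model would also have to handle tiers whose nodes do not line up one to one.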