Knowledge-Lite Induction of Underlying Morphologies

Problem: Allomorphic Variation

Allomorphic variation, or differences in form among morphs with the same semantic interpretation, impedes unsupervised morphological learning methods. Since unsupervised methods learn from surface words, when they discover allomorphic variants of the same morpheme, they treat each one as symbolically (semantically) distinct. For example, plural es in glasses will be treated as distinct from plural s in cheeks, even though they are orthographic variants, or allomorphs, of the same plural morpheme.
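As a minimal illustration of the problem, the Python sketch below counts suffix morphs from a few hypothetical surface segmentations. It assumes a segmenter has already split each word; the data and variable names are made up for illustration and do not come from KLIUM or Morfessor.

```python
from collections import Counter

# Hypothetical surface segmentations, as an unsupervised segmenter
# working only on surface strings might produce them.
surface_segmentations = [
    ["glass", "es"],
    ["cheek", "s"],
    ["box", "es"],
    ["cat", "s"],
]

# Each distinct suffix string becomes its own lexicon symbol: nothing
# tells the learner that "es" and "s" realize the same plural morpheme.
suffix_counts = Counter(seg[-1] for seg in surface_segmentations)

print(suffix_counts)  # Counter({'es': 2, 's': 2})
```

Because the learner keys on surface strings, the plural morpheme ends up split across two lexicon entries, each holding only part of the evidence.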

Morpheme Induction: Using Some Linguistic Knowledge

The KLIUM approach extends a family of unsupervised induction techniques, collectively known as Morfessor, developed by Mathias Creutz and Krista Lagus. KLIUM builds a small amount of linguistic knowledge, in the form of context-sensitive orthographic rewrite rules, into a Morfessor-style approach. Using this knowledge, it can learn when surface morphs, like es and s, should be treated as allomorphs of a single underlying morpheme. The goal is to produce word segmentations consisting of underlying morphemes.
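To make the idea of a context-sensitive orthographic rewrite rule concrete, here is a minimal Python sketch. It assumes a single, deliberately simplified English e-insertion rule (underlying plural s surfaces as es after a stem ending in s, x, z, ch, or sh). The rule, the function names `realize` and `underlying_analysis`, and the toy lexicon are hypothetical illustrations, not KLIUM's actual rule format or implementation.

```python
import re

# Hypothetical, deliberately simplified rule (not KLIUM's actual rule set):
# underlying plural "s" surfaces as "es" when the stem ends in a sibilant
# spelling (s, x, z, ch, sh); otherwise it surfaces unchanged.
SIBILANT_END = re.compile(r"(s|x|z|ch|sh)$")


def realize(stem: str, underlying_suffix: str) -> str:
    """Return the surface form of a suffix in the context of its stem."""
    if underlying_suffix == "s" and SIBILANT_END.search(stem):
        return "es"
    return underlying_suffix


def underlying_analysis(
    word: str, stems: set[str], suffixes: set[str]
) -> list[tuple[str, str]]:
    """Find (stem, underlying_suffix) pairs whose surface realization spells word."""
    analyses = []
    for stem in stems:
        if not word.startswith(stem):
            continue
        surface_suffix = word[len(stem):]
        for suffix in suffixes:
            # Accept the analysis only if the rule maps the underlying
            # suffix to exactly the surface material left after the stem.
            if realize(stem, suffix) == surface_suffix:
                analyses.append((stem, suffix))
    return sorted(analyses)


stems = {"glass", "cheek", "box"}
suffixes = {"s"}
print(underlying_analysis("glasses", stems, suffixes))  # [('glass', 's')]
print(underlying_analysis("cheeks", stems, suffixes))   # [('cheek', 's')]
```

Both words receive the same underlying suffix s even though their surface suffixes differ. Spelling alone is not always sufficient context: this toy rule would fail to analyze stomachs, for example.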

Benefits

Since the KLIUM approach segments words at the morpheme layer, it produces more semantically consistent segmentations than unsupervised approaches that segment at the allomorphic, or surface, layer.

Since it assigns each allomorph of a particular morpheme the same likelihood, it increases the chance that low-frequency allomorphs will be discovered.
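The frequency argument can be illustrated with a small worked example. This Python sketch assumes a simple unigram model in which a morph's cost is -log2 of its relative frequency, a simplification of a Morfessor-style objective rather than KLIUM's actual one. The counts and the `morpheme_of` mapping are hypothetical.

```python
import math
from collections import Counter

# Hypothetical token counts for surface suffix morphs (illustrative only).
surface_counts = Counter({"s": 90, "es": 10, "ed": 100})

# Hypothetical allomorph-to-morpheme mapping that rewrite rules would license.
morpheme_of = {"s": "PL", "es": "PL", "ed": "PAST"}


def cost_bits(count: int, total: int) -> float:
    """Code length in bits, -log2 P, under a simple unigram model."""
    return -math.log2(count / total)


total = sum(surface_counts.values())

# Surface layer: each allomorph is scored on its own count.
surface_cost = {m: cost_bits(c, total) for m, c in surface_counts.items()}

# Morpheme layer: counts are pooled per morpheme, and every allomorph
# inherits its morpheme's probability.
morpheme_counts = Counter()
for m, c in surface_counts.items():
    morpheme_counts[morpheme_of[m]] += c
underlying_cost = {
    m: cost_bits(morpheme_counts[morpheme_of[m]], total) for m in surface_counts
}

print(f"es, surface layer:  {surface_cost['es']:.2f} bits")     # 4.32 bits
print(f"es, morpheme layer: {underlying_cost['es']:.2f} bits")  # 1.00 bits
```

Under these assumptions, the rare allomorph es costs about 4.32 bits when scored on its own, but only 1 bit once it shares the plural morpheme's pooled count, which makes it cheaper for the learner to posit.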

Demo

See KLIUM in action.