Knowledge-Lite Induction of Underlying Morphologies
Problem: Allomorphic Variation
Allomorphic variation, or differences in form among morphs with the same semantic interpretation, impedes unsupervised morphological learning methods. Because unsupervised methods learn from surface words, they treat allomorphic variants of the same morpheme as symbolically (semantically) distinct. For example, plural "es" in "glasses" will be treated as distinct from plural "s" in "cheeks", even though they are orthographic variants, or allomorphs, of the same plural morpheme.
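The problem can be pictured with a small Python sketch. The segmentations below are invented to illustrate the point and are not real Morfessor output. When the morph inventory is built directly from surface segmentations, the plural marker ends up as two unrelated symbols.

```python
# Hypothetical surface-level segmentations, of the kind an unsupervised
# segmenter might produce (illustrative only, not real Morfessor output).
surface_segmentations = {
    "glasses": ["glass", "es"],
    "cheeks": ["cheek", "s"],
    "boxes": ["box", "es"],
    "dogs": ["dog", "s"],
}

def morph_lexicon(segmentations):
    """Collect the set of distinct morph symbols used across all segmentations."""
    return {m for morphs in segmentations.values() for m in morphs}

lexicon = morph_lexicon(surface_segmentations)
# The plural marker occupies two unrelated lexicon entries: "es" and "s".
print(sorted(lexicon))  # ['box', 'cheek', 'dog', 'es', 'glass', 's']
```

Nothing in this surface inventory records that "es" and "s" mean the same thing. That missing link is what the approach described below tries to learn.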
Morpheme Induction: Using Some Linguistic Knowledge
The KLIUM approach is an extension of unsupervised induction techniques, collectively known as Morfessor, developed by Mathias Creutz and Krista Lagus.
The KLIUM approach builds a small amount of linguistic knowledge, in the form of context-sensitive orthographic rewrite rules, into a Morfessor-style approach. With this knowledge, it can learn when surface morphs such as "es" and "s" should be treated as allomorphs of a single underlying morpheme. The goal is to produce word segmentations made up of underlying morphemes.
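This page does not give the rule format KLIUM actually uses. The following Python is a minimal sketch that assumes one hypothetical rule of the English "e-insertion" kind: underlying plural "s" surfaces as "es" after a stem ending in s, x, z, ch, or sh. It shows how a single context-sensitive rule can relate surface "es" in "glasses" to an underlying "s". The rule table `RULES` and the functions `surface_form` and `underlying_form` are illustrative names. They are not part of KLIUM or Morfessor.

```python
import re

# Hypothetical rule table: (underlying morph, surface morph, context that the
# preceding stem must match). Here "s" surfaces as "es" after a sibilant-final stem.
RULES = [
    ("s", "es", re.compile(r"(s|x|z|ch|sh)$")),
]

def surface_form(stem, suffix):
    """Realize an underlying suffix after a stem, applying the first matching rule."""
    for underlying, surface, context in RULES:
        if suffix == underlying and context.search(stem):
            return surface
    return suffix

def underlying_form(stem, surface_suffix):
    """Map a surface suffix back to its underlying morpheme if a rule licenses it."""
    for underlying, surface, context in RULES:
        if surface_suffix == surface and context.search(stem):
            return underlying
    return surface_suffix

print(surface_form("glass", "s"))      # es
print(underlying_form("glass", "es"))  # s
print(underlying_form("cheek", "s"))   # s
print(underlying_form("cheek", "es"))  # es  (context not met, so no mapping)
```

The context check does the real work here. "es" is mapped to "s" only after a sibilant-final stem, so a segmentation like "cheek" + "es" is not silently treated as plural.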
Benefits
Because the KLIUM approach segments words at the morpheme layer, it produces more semantically consistent segmentations than unsupervised approaches that segment at the allomorphic, or surface, layer.
Because it assigns every allomorph of a particular morpheme the same likelihood, a rare allomorph is not penalized for its own low surface frequency, which increases the chances that low-frequency allomorphs will be discovered.
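The effect on likelihood can be shown with a simplified Python sketch. It assumes a plain maximum-likelihood morph cost, -log2 P(m), estimated from token counts. This is a deliberate simplification: it is not Morfessor's full objective, which also includes lexicon costs, and it is not necessarily how KLIUM scores morphs. The corpus and the `UNDERLYING` mapping are hypothetical.

```python
import math
from collections import Counter

# Hypothetical segmented corpus in which "es" is the rarer plural allomorph.
surface_tokens = [
    ["cheek", "s"], ["dog", "s"], ["cat", "s"], ["book", "s"],
    ["glass", "es"],
]

# Hypothetical allomorph-to-morpheme mapping, e.g. licensed by a rewrite rule.
UNDERLYING = {"es": "s"}

def cost(counts, morph):
    """Simplified morph cost: negative log2 of its maximum-likelihood probability."""
    total = sum(counts.values())
    return -math.log2(counts[morph] / total)

surface_counts = Counter(m for word in surface_tokens for m in word)
underlying_counts = Counter(UNDERLYING.get(m, m) for word in surface_tokens for m in word)

# Surface layer: "es" is estimated from its single occurrence out of 10 tokens.
print(round(cost(surface_counts, "es"), 3))                  # 3.322
# Underlying layer: "es" is scored as morpheme "s", pooling all 5 plural tokens.
print(round(cost(underlying_counts, UNDERLYING["es"]), 3))  # 1.0
```

Under this assumption, the rare allomorph drops from about 3.3 bits to 1 bit. That makes segmentations using it much cheaper, so they are less likely to be rejected.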
Demo
See KLIUM in action.