Morpho-Guessing

A new feature of version 4.0 is "morpho-guessing" of unknown words. While the old version was able to guess the part of speech of a word based only on its context, the new version also considers its spelling. Words that end in "-s" are assumed to be plural nouns or singular verbs; those ending in "-ed", past-tense (or passive) verbs; those ending in "-ing", present participles; those ending in "-ly", adverbs. This greatly improves the ability of the parser to handle sentences containing multiple unknown words. Words that have been treated in this way are marked with a "[!]". (The parser's old system of "unknown words" is still in place, for handling words whose spelling does not match any of the categories listed above; as before, these are marked with a "[?]".)

For example, consider the sentence

        Overhyped megamergers underperform horrifically

None of these four words is in the parser's dictionary. Without morpho-guessing, the parser would have no way of deciding between many possible interpretations. With morpho-guessing, however, the parser is able to correctly identify the first as a passive verb form (used here as an adjective), the second as a noun, and the last as an adverb; from context, it is then able to guess that the third is a verb.

            +-------A-------+-------Sp-------+-------MVa-------+
            |               |                |                 |
     overhyped[!].v megamergers[!].n underperform[?].v   horrifically[!].e 

     (S (NP overhyped megamergers)
            (VP underperform
                (ADVP horrifically)))

(Given the same words in a different order--"overhyped megamergers horrifically underperform"--it can be seen that they are still correctly identified.)

Morpho-guessing is controlled by a sequence of regular expressions, specified in the file 4.0.regex. This allows for rather fine-grained guessing. For example, the current file includes rules for guessing Latin nouns and adjectives, as these occur frequently in biology texts, as well as rules for common chemical names, as they would appear in (bio-)chemistry texts.
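
The idea of an ordered sequence of regular expressions can be illustrated with a minimal Python sketch. It is only an illustration of the suffix categories described above: the rule table, the function name guess_category, and the category labels are all hypothetical, and neither the patterns nor the layout are taken from the 4.0.regex file.

```python
import re

# Hypothetical rule table: (pattern, guessed category), tried in order.
# The patterns mirror the suffix categories described in the text; they
# are NOT copied from the parser's 4.0.regex file.
GUESS_RULES = [
    (re.compile(r".+ly$"), "adverb"),
    (re.compile(r".+ing$"), "present participle"),
    (re.compile(r".+ed$"), "past-tense or passive verb"),
    (re.compile(r".+s$"), "plural noun or singular verb"),
]


def guess_category(word):
    """Return (category, marker) for a word assumed to be out of dictionary.

    "[!]" means a spelling rule matched; "[?]" means no rule matched, so
    the category would have to be guessed from context alone.
    """
    lowered = word.lower()
    for pattern, category in GUESS_RULES:
        if pattern.match(lowered):
            return category, "[!]"
    return "unknown", "[?]"


# Apply the sketch to the example sentence from the text.
for w in "Overhyped megamergers underperform horrifically".split():
    category, marker = guess_category(w)
    print(f"{w.lower()}{marker}: {category}")
```

In this sketch the first matching rule wins, so more specific patterns would need to be listed before more general ones. "underperform" matches no rule and falls through to the "[?]" case, which corresponds to the parser having to rely on context for that word.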

Spell guessing

In addition, recent versions of link-grammar include an optional spelling checker. If it is enabled and an unknown word is encountered, the spelling checker is consulted for alternative spellings. The use of the spelling checker can be toggled on and off with the flag !spell; use the flag !vars to see a listing of all variables, as well as the current value of each.

Thus, for example (in the output, words substituted by the spelling checker are marked with a "[~]"):

linkparser> This is a gest
Found 6 linkages (6 had no P.P. violations)
	Linkage 1, cost vector = (CORP=6.3879 UNUSED=0 DIS=0 AND=0 LEN=5)

         +---Ost---+
   +-Ss*b+  +--Ds--+
   |     |  |      |
this.p is.v a guest[~].n 

	Linkage 2, cost vector = (CORP=6.3879 UNUSED=0 DIS=0 AND=0 LEN=5)

         +---Ost--+
   +-Ss*b+  +--Ds-+
   |     |  |     |
this.p is.v a nest[~].n 

	Linkage 3, cost vector = (CORP=6.3879 UNUSED=0 DIS=0 AND=0 LEN=5)

         +---Ost--+
   +-Ss*b+  +--Ds-+
   |     |  |     |
this.p is.v a vest[~].n 

	Linkage 4, cost vector = (CORP=6.3879 UNUSED=0 DIS=0 AND=0 LEN=5)

         +---Ost--+
   +-Ss*b+  +--Ds-+
   |     |  |     |
this.p is.v a test[~].n