flicka, ros
pojke, spik
bild, sko
byte
hus
To combine these five transducers into one, the simple procedure used here
was to include all states and transitions in one large transducer for all noun
declensions, giving every FST the same start and end state. This larger
FST for all nouns contains 36 states.
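The merging step described above can be sketched roughly as follows. This is a minimal Python illustration, not the author's Perl program: the function name combine, the dictionary layout (state mapped to a list of input/output/target triples) and the two toy transducers are assumptions made for the example, and each FST is assumed to start in state 0 and to have exactly one accepting state.

```python
START = 0  # assumption: every individual FST begins in state 0


def combine(fsts):
    """Merge FSTs so that they share one start and one end state.

    fsts is a list of (transitions, accepting_state) pairs, where
    transitions maps a state to a list of (input, output, target) triples.
    Returns (merged_transitions, shared_end_state).
    """
    merged = {START: []}
    next_free = 1
    renamings = []
    # First pass: give every private (non-start, non-accepting) state a fresh number.
    for transitions, accepting in fsts:
        states = set(transitions)
        for arcs in transitions.values():
            states.update(target for _, _, target in arcs)
        rename = {START: START}
        for state in sorted(states - {START, accepting}):
            rename[state] = next_free
            next_free += 1
        renamings.append(rename)
    end = next_free  # the shared end state gets the last number
    # Second pass: copy all transitions using the new state numbers.
    for (transitions, accepting), rename in zip(fsts, renamings):
        rename[accepting] = end
        for state, arcs in transitions.items():
            merged.setdefault(rename[state], []).extend(
                (symbol, out, rename[target]) for symbol, out, target in arcs)
    merged.setdefault(end, [])
    return merged, end


# Two deliberately tiny, hypothetical transducers (not the real .fst files).
ros_fst = ({0: [("ros", "ros", 1)], 1: [("#", " NOM", 2)]}, 2)
hus_fst = ({0: [("hus", "hus", 1)], 1: [("#", " NOM", 2)]}, 2)
merged, end = combine([ros_fst, hus_fst])
```

In this sketch the shared start state collects the first transition of every declension, and all paths end in the same accepting state, which is the property the procedure above relies on.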
string. The states in the first column are marked with a colon (:) if they
are accepting states.
State   Legal transitions
0       ros:ros:1
1       E: N UTR SG:2
2       en: DEF:4
3       na: DEF:4
4       #: NOM:6
5       #: NOM:6
6:
Table 1. The FST for ros
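The entries in Table 1 appear to follow the pattern input:output:target state. As an illustration of how such entries might be read into a data structure, here is a small Python sketch. The function name parse_transition and the dictionary layout are assumptions for this example, the actual format of the .fst files is not shown here, and only the rows visible in Table 1 are used.

```python
def parse_transition(entry):
    """Split an 'input:output:target' entry into its three parts."""
    symbol, rest = entry.split(":", 1)    # the input label ends at the first colon
    output, target = rest.rsplit(":", 1)  # the target state follows the last colon
    # The output keeps its leading space (an assumption), so that outputs
    # can simply be concatenated along a path.
    return symbol, output, int(target)


# The rows of Table 1 as they appear above, one transition per state.
rows = {
    0: "ros:ros:1",
    1: "E: N UTR SG:2",
    2: "en: DEF:4",
    3: "na: DEF:4",
    4: "#: NOM:6",
    5: "#: NOM:6",
}
table_1 = {state: [parse_transition(entry)] for state, entry in rows.items()}
accepting = {6}  # state 6 is marked with a colon in Table 1
```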
The description of finite state transducers for the five noun declensions in
Swedish can be found here:
ros.fst, flicka.fst, spik.fst, pojke.fst, bild.fst, sko.fst, byte.fst, hus.fst,
and the combined FST here:
noun.fst
Implementation
The program for transforming a noun from morphological to lexical level
was implemented in Perl and can be found here. When the user specifies a
word, the program will output the lexical description(s) of the word or else
produce Failed.
The FST-description is supplied to the program as a command line
argument.
> perl fst.pl noun.fst
Write a word and press Enter. (q = quit):
spik
spik N UTR SG INDEF NOM
Ambiguous input
The program will produce all possible morphological analyses of
ambiguous word forms like ros and hus:
ros
hus
hus N NEU SG INDEF NOM
hus N NEU SG INDEF GEN
hus N NEU PL INDEF NOM
hus N NEU PL INDEF GEN
#, ros, 1
After two E-transitions our stack still contains only one alternative:
Stack:
We pop this transition from the stack and for each alternative path given
the remaining input # we create a new transition and put it on the stack.
Stack:
In the next round we pop the first transition and find that the newly
produced states are accepting states and that there is no input left to
process. The analysis was a success and the output strings are printed.
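The stack-based search described above can be sketched along the following lines. This is a hedged Python illustration rather than the author's Perl code. It assumes that each stack entry holds the remaining input, the output so far and the current state, that E marks a transition consuming no input, and that # is appended to the word as an end marker. The transition table contains only the rows visible in Table 1, which appear to cover just part of the original transducer, so it accepts rosen but not every form of ros.

```python
EPSILON = "E"  # label of transitions that consume no input (assumed from Table 1)
END = "#"      # end-of-word marker, assumed to be appended to the input


def analyze(word, transitions, accepting):
    """Return every output string for word, or an empty list on failure."""
    results = []
    stack = [(word + END, "", 0)]  # (remaining input, output so far, state)
    while stack:
        remaining, output, state = stack.pop()
        if not remaining and state in accepting:
            results.append(output)  # all input consumed in an accepting state
            continue
        # Push one new entry for each transition that fits the remaining input.
        for symbol, out, target in transitions.get(state, []):
            if symbol == EPSILON:
                stack.append((remaining, output + out, target))
            elif remaining.startswith(symbol):
                stack.append((remaining[len(symbol):], output + out, target))
    return results


# Only the transitions visible in Table 1.
TABLE_1 = {
    0: [("ros", "ros", 1)],
    1: [("E", " N UTR SG", 2)],
    2: [("en", " DEF", 4)],
    3: [("na", " DEF", 4)],
    4: [("#", " NOM", 6)],
    5: [("#", " NOM", 6)],
}

analyses = analyze("rosen", TABLE_1, {6})
print("\n".join(analyses) if analyses else "Failed")
```

Because every alternative path is pushed onto the stack instead of committing to the first match, ambiguous forms yield one output string per accepting path. Note that this sketch would loop forever on a transducer with a cycle of E-transitions, so it assumes there are none.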
References
Daniel Jurafsky and James H. Martin (2000). Speech and language
processing.