satabin / lingua

Lexical and grammatical tools for natural languages

Lingua is a set of linguistic tools written in Scala. It is divided into several modules.

Module lexikon

The lexikon module provides tools to generate morphological lexica from a dedicated description language. For example dictionaries, see the .dico files in the resources directory.

To run the lexicon generator:

$ sbt
> project lexikon
> runMain lingua.lexikon.DikoMain compile lexikon/src/main/resources/français.dico -N /tmp/nfst.dot -F /tmp/fst.dot

You can then render the generated (N)Fst from the .dot files with Graphviz tools.

This command also produces a compiled version of the dictionary in a file named dikoput.diko. The compiled version can be queried as follows (from the same sbt session):

> runMain lingua.lexikon.DikoMain query dikoput.diko -q mange

This returns:

Set(DikoEntry(manger,Set(+Sg, @V, +3ème, +G1, +Prés, +Ind)), DikoEntry(manger,Set(+Sg, +1ère, @V, +G1, +Prés, +Ind)))

This means that, according to this dictionary, mange has the lemma manger, a verb (@V category), and is either the first person singular or the third person singular of the present indicative.
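To make the reading of such a result concrete, here is a minimal Scala sketch that splits each entry's tags into a category and a set of features. It is not lingua's actual API: the DikoEntry case class below is a hypothetical stand-in that only mirrors the printed output, the field names lemma and tags are invented for illustration, and treating every +-prefixed tag as a feature is an assumption based on the sample above.

```scala
// Hypothetical stand-in mirroring the printed query result.
// This is NOT the class defined in lingua; field names are assumptions.
final case class DikoEntry(lemma: String, tags: Set[String])

object QueryResultSketch {

  // The README states that the '@' tag carries the category (e.g. @V for verb).
  def category(entry: DikoEntry): Option[String] =
    entry.tags.find(_.startsWith("@")).map(_.drop(1))

  // Assumption: every tag starting with '+' is a morphological feature.
  def features(entry: DikoEntry): Set[String] =
    entry.tags.filter(_.startsWith("+")).map(_.drop(1))

  // The two analyses returned for the query "mange", copied from the output above.
  val results: Set[DikoEntry] = Set(
    DikoEntry("manger", Set("+Sg", "@V", "+3ème", "+G1", "+Prés", "+Ind")),
    DikoEntry("manger", Set("+Sg", "+1ère", "@V", "+G1", "+Prés", "+Ind"))
  )

  def main(args: Array[String]): Unit =
    for (entry <- results) {
      val cat = category(entry).getOrElse("unknown")
      val feats = features(entry).toList.sorted.mkString(", ")
      println(entry.lemma + ": category=" + cat + ", features=" + feats)
    }
}
```

Both entries share the lemma and the category and differ only in the person feature, which is why the query returns two analyses for a single surface form.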

For more details on the available options, run this main class with the -h option.

Module fst

This module is a generic Fst module that can be used independently.