- One can write finite state transducers for natural language processing (translation, declension, analysis) using lt-toolbox technologies.
- One creates dix files defining these tranducers, compiles them to bin files and then uses them for the final ends.
- Lt-toolbox libraries and tools are available in C++ and java.
- Write scala/ java wrappers for invoking FST-s in lt-toolbox bin files.
- html-2-rest provides a REST API front for some of these tools. That repo also has the scl bin files.
- PS: Install the lttoolbox package beforehand on the computer you will release from - else tests will fail.
- Use sbt command
release
to publish to maven repos. - You should be able to use it roughly immediately; and after many hours you should see at maven repo listings here.
- Smt Ambaa KulkarNi has led the creation of several such FST based tools. These are very useful for word generation and analysis.
- They're hosted on http://scl.samsaadhanii.in and mirrors.
- Only the lttoolbox compatible bin files are provided without any documentation (by request, not on any public site).
- This is usually a part of the scl website code (under the GNU GENERAL PUBLIC LICENSE), which is antiquated and mostly useless for further development (while still serving as a useful reference for how the core lttoolbox bin files are to be invoked) as it:
- relies on CGI technology.
- is written in perl.
- has a poor build system.
- These lttoolbox compatible bin files don't work with the Java libraries as of July 2017 - see thread.
- This is usually a part of the scl website code (under the GNU GENERAL PUBLIC LICENSE), which is antiquated and mostly useless for further development (while still serving as a useful reference for how the core lttoolbox bin files are to be invoked) as it:
- Smt Ambaa does not provide the dix files (including to this author).
- So there is an outsider cannot consider using them to grok how the bin files are to be invoked, to understand the underlying lttoolbox technology by example or to develop the FST-s further.
The below are sourced mostly from communication with smt ambaa and from experiments.
- "To know the input for tin / krt / taddhita generator, you give a tinanta / krdanta / taddhitaanta to the all_morf.bin, and it will produce the analysis. That analysis is the input for the generator."
- Regarding the level parameter:
- level 1 is used for the avyutpanna praatipadikas and dhaatus (corresponds to inflectional morphology, with sup and tin)
- level 0 is for the vyutpanna praatipadikas.
- level 2: vyutpanna krdanta subanta (a noun form derived by adding a krt suffix)
- level 3: vyutpanna taddhita subantas
- level 4: vyutpanna uttarapadas of a compound
Regarding the level parameter:
- In the case of verb forms, I have only level 1. The Nijantas, though derived are assigned the same level.