lib-nlp

A Simple Natural Language Processing Library for Scala.

Word Inflector

The Inflector can be used to generate plural or singular forms for input words.

val word = "world"
val plural = Inflector.pluralize(word)       // plural form of the input word
val singular = Inflector.singularize(plural) // singular form of the pluralized word
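
To illustrate the general idea behind inflection, here is a minimal rule-based sketch. It is not lib-nlp's Inflector: the `SimpleInflector` object, its irregular-word table, and its suffix rules are all assumptions made for this example, and real English inflection needs many more rules and exceptions.

```scala
// Hypothetical sketch: NOT lib-nlp's Inflector, only an illustration of suffix rules.
object SimpleInflector {
  // A tiny, hand-picked sample of irregular nouns handled by direct lookup.
  private val irregular = Map("child" -> "children", "person" -> "people", "mouse" -> "mice")
  private val irregularReverse = irregular.map(_.swap)

  private def isVowel(c: Char): Boolean = "aeiou".contains(c)

  def pluralize(word: String): String =
    irregular.getOrElse(word, {
      if (word.matches(".*(s|x|z|ch|sh)$")) word + "es"
      else if (word.length > 1 && word.endsWith("y") && !isVowel(word(word.length - 2)))
        word.dropRight(1) + "ies" // consonant + y becomes ies
      else word + "s"
    })

  def singularize(word: String): String =
    irregularReverse.getOrElse(word, {
      if (word.length > 3 && word.endsWith("ies")) word.dropRight(3) + "y"
      else if (word.matches(".*(s|x|z|ch|sh)es$")) word.dropRight(2)
      else if (word.endsWith("s") && !word.endsWith("ss")) word.dropRight(1)
      else word
    })
}
```

Under these rules, `SimpleInflector.pluralize("dress")` gives `"dresses"` and `SimpleInflector.singularize("cities")` gives `"city"`. The rules are heuristics and will mis-handle many words that a full inflector covers.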

Noun Phrase Extraction

Given some input English text, the Extractor generates relevant noun phrases. It does this by processing the input text with the Apache OpenNLP utilities.

val extractor = new OpenNlpExtractor()

val inputText = "This is a piece of text. There are several sentences. There may be something about a cotton turquoise dress"

extractor.extractInterestingPhrases(inputText).foreach { phrase =>
  println(phrase)
}
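
As a rough picture of what noun phrase extraction involves, the sketch below groups runs of adjectives and nouns from tokens that already carry part-of-speech tags. It is a simplified stand-in, not how OpenNlpExtractor or OpenNLP's statistical chunker works: the `Tagged` and `ToyNounPhrases` names are hypothetical, and the Penn Treebank style tags in the usage example are assigned by hand.

```scala
// Hypothetical sketch: a toy chunker over pre-tagged tokens, not OpenNLP's chunker.
final case class Tagged(token: String, tag: String)

object ToyNounPhrases {
  private def isNoun(tag: String): Boolean = tag.startsWith("NN")
  private def isModifier(tag: String): Boolean = tag.startsWith("JJ")

  def extract(tokens: Seq[Tagged]): Seq[String] = {
    val phrases = Seq.newBuilder[String]
    var current = Vector.empty[Tagged]

    // Emit the collected run, trimming trailing adjectives so each phrase ends in a noun.
    def flush(): Unit = {
      val trimmed = current.reverse.dropWhile(t => !isNoun(t.tag)).reverse
      if (trimmed.nonEmpty) phrases += trimmed.map(_.token).mkString(" ")
      current = Vector.empty
    }

    tokens.foreach { t =>
      if (isNoun(t.tag) || isModifier(t.tag)) current :+= t
      else flush() // any other tag ends the current run
    }
    flush()
    phrases.result()
  }
}

// Hand-tagged tokens for the end of the sample text above.
val tagged = Seq(
  Tagged("something", "NN"), Tagged("about", "IN"), Tagged("a", "DT"),
  Tagged("cotton", "NN"), Tagged("turquoise", "JJ"), Tagged("dress", "NN")
)
val toyPhrases = ToyNounPhrases.extract(tagged)
```

With these hand-assigned tags, `toyPhrases` contains `"something"` and `"cotton turquoise dress"`. A real extractor also has to split sentences, tokenize, and tag the text, which is the work the library delegates to OpenNLP.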

Synonyms

This library provides a very simple synonym generator based on WordNet. It requires that the appropriate WordNet dictionary files be downloaded and installed separately.

Synonyms are generated for a given word when it is used as a particular part of speech.

val synonymProvider = new WordnetSynonymProvider(new java.io.File("./wordnet/dict"))
val synonyms = synonymProvider.getSynonyms("pullover", PartOfSpeech.Noun)
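
The reason the part of speech matters is that the same word can have unrelated synonyms as a noun and as a verb. The sketch below shows this with a small in-memory table. The `Pos` and `ToySynonyms` names and the table entries are assumptions written for this example, not WordNet data and not part of lib-nlp.

```scala
// Hypothetical sketch: a hand-written lookup table, not WordNet and not lib-nlp's API.
sealed trait Pos
object Pos {
  case object Noun extends Pos
  case object Verb extends Pos
}

object ToySynonyms {
  // Synonyms are keyed by (word, part of speech), so each sense is looked up separately.
  private val table: Map[(String, Pos), Set[String]] = Map(
    ("dress", Pos.Noun) -> Set("frock", "gown"),
    ("dress", Pos.Verb) -> Set("clothe", "garb")
  )

  def getSynonyms(word: String, pos: Pos): Set[String] =
    table.getOrElse((word.toLowerCase, pos), Set.empty[String])
}
```

Here `ToySynonyms.getSynonyms("dress", Pos.Noun)` and `ToySynonyms.getSynonyms("dress", Pos.Verb)` return different sets, and an unknown word returns an empty set.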

Building on this generator, and in conjunction with Noun Phrase Extraction, a class is provided to generate synonym phrases:

val synonymPhraseGenerator = new OpenNlpSynonymPhraseGenerator(synonymProvider)
synonymPhraseGenerator.generateSynonyms("cotton summer dress")
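
One plausible way to build synonym phrases is to substitute synonyms word by word and combine the alternatives. The sketch below does that with a caller-supplied lookup function. It is an assumption about the general technique, not a description of OpenNlpSynonymPhraseGenerator, whose actual output is not shown in this README. The `ToySynonymPhrases` object and the sample lookup map are hypothetical.

```scala
// Hypothetical sketch: per-word substitution, not lib-nlp's implementation.
object ToySynonymPhrases {
  def generate(phrase: String, synonymsOf: String => Set[String]): Seq[String] = {
    val words = phrase.split("\\s+").filter(_.nonEmpty).toList
    // For each position, the candidates are the original word followed by its synonyms.
    val candidates: List[List[String]] =
      words.map(w => w :: (synonymsOf(w) - w).toList.sorted)
    // Cartesian product across positions; the first combination is the original phrase.
    val combos = candidates.foldRight(List(List.empty[String])) { (options, acc) =>
      for (o <- options; rest <- acc) yield o :: rest
    }
    combos.drop(1).map(_.mkString(" "))
  }
}

// Hand-written sample synonyms; words without an entry have no synonyms.
val sampleLookup: Map[String, Set[String]] =
  Map("summer" -> Set("summertime"), "dress" -> Set("frock", "gown"))
    .withDefaultValue(Set.empty[String])

val variants = ToySynonymPhrases.generate("cotton summer dress", sampleLookup)
```

With this sample lookup, `variants` holds five phrases, starting with `"cotton summer frock"` and ending with `"cotton summertime gown"`. The number of combinations grows multiplicatively with the number of synonyms per word, so a practical generator would likely cap or rank the results.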

License

Copyright 2015 Gilt Groupe, Inc.

Licensed under the MIT License.