## What is it?
Just a way to convert `agiga` data into a `processors` `Document`. Why spend the time and resources parsing and annotating over 183 million sentences when it has already been done?
## Reading an annotated English Gigaword
```scala
import org.clulab.agiga

// build a processors.Document
val doc = agiga.toDocument("path/to/agiga/xml/ltw_eng_200705.xml.gz")
```
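Once loaded, the resulting `Document` can be inspected with the usual `processors` API. A minimal sketch (the `sentences`, `words`, and `lemmas` fields are assumed from the `processors` library, where `lemmas` is an `Option`):

```scala
import org.clulab.agiga

object Inspect extends App {
  // build a Document from one compressed agiga file
  val doc = agiga.toDocument("path/to/agiga/xml/ltw_eng_200705.xml.gz")

  // walk each sentence and print every token's word form next to its lemma
  for (sentence <- doc.sentences) {
    val words  = sentence.words
    val lemmas = sentence.lemmas.getOrElse(words) // fall back to words if no lemmas
    words.zip(lemmas).foreach { case (w, l) => println(s"$w\t$l") }
  }
}
```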
## Example 1: dump a lemmatized form of the English Gigaword
Everything is configured in the project's configuration file:

- Change the `view` property to `"lemmas"`.
- Change the `inputDir` property to wherever your copy of `agiga` is nestled on your disk.
- Change the `outputDir` property to wherever you want your compressed copy of the lemmatized English Gigaword to be written.
- (Optional) Change the `nthreads` property to the maximum number of threads you prefer to use for parallelization.
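Put together, the four properties above might look like this in the config file (a sketch; the paths and thread count are placeholders, not defaults from the project):

```
view      = "lemmas"
inputDir  = "/data/agiga"
outputDir = "/data/agiga-lemmas"
nthreads  = 4
```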
All that's left is to run:

```shell
sbt "runMain sem.AgigaReader"
```
## Options for `view`
|"words"||word form of each token|
|"lemmas"||lemma form of each token|
|"tags"||PoS tag of each token|
|"entities"||NE labels of each token|
- Add output options for dependencies using the DFS ordering described in "Higher-order Lexical Semantic Models for Non-factoid Answer Reranking"