clulab / eidos

Machine reading system for World Modelers

Eidos

Eidos is an open-domain machine reading system designed by the Computational Language Understanding (CLU) Lab at the University of Arizona for DARPA's World Modelers program. Eidos uses a cascade of Odin grammars to extract causal events from free text.

Currently we extract entities such as "food insecurity" (and increases/decreases/quantifications of those entities) and directed causal events that occur between entities such as "food insecurity causes increased migration". In the near future we plan to expand this to also extract correlation, same-as, and is-a relations.
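
To make the shape of these extractions concrete, here is a minimal Scala sketch of entities, the changes attached to them, and directed causal relations between them. The classes (Entity, Attachment, Increase, Decrease, Quantification, CausalRelation) and the CausalModelSketch object are hypothetical and heavily simplified for illustration; Eidos itself returns Odin mentions, as the examples under Usage show.

```scala
// Hypothetical, simplified model of Eidos-style extractions.
// These are NOT Eidos classes; Eidos itself returns Odin mentions.
sealed trait Attachment
final case class Increase(trigger: String) extends Attachment
final case class Decrease(trigger: String) extends Attachment
final case class Quantification(quantifier: String) extends Attachment

// An entity such as "food insecurity", plus any changes attached to it.
final case class Entity(text: String, attachments: Seq[Attachment] = Nil)

// A directed causal relation: cause -> effect.
final case class CausalRelation(cause: Entity, effect: Entity)

object CausalModelSketch {
  // Render a relation as "cause -> effect", listing any attachments in brackets.
  def describe(rel: CausalRelation): String = {
    def show(e: Entity): String =
      if (e.attachments.isEmpty) e.text
      else s"${e.text} ${e.attachments.mkString("[", ", ", "]")}"
    s"${show(rel.cause)} -> ${show(rel.effect)}"
  }

  def main(args: Array[String]): Unit = {
    // Built by hand from the example sentence used throughout this README:
    // "Water trucking has decreased due to the cost of fuel."
    val rel = CausalRelation(
      cause = Entity("cost of fuel"),
      effect = Entity("Water trucking", Seq(Decrease("decreased")))
    )
    println(describe(rel))
  }
}
```

Running main prints `cost of fuel -> Water trucking [Decrease(decreased)]`, which mirrors the cause, effect, and Decrease attachment of the causal mention in the pretty display shown later in this README.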

Usage

How to compile the source code

This is a standard sbt project, so use the usual commands, e.g., sbt compile to compile or sbt assembly to create a jar file. sbt runMain can be used to run some of the example applications directly, as described below. To access Eidos from Java, add the assembled jar file(s) under target/ to your $CLASSPATH. A file like eidos-assembly-0.1.6-SNAPSHOT.jar may suffice, depending on the build. If necessary, see build.sbt for a list of runtime dependencies.
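
If Java or Scala code cannot find Eidos at runtime, one quick check is to ask the JVM whether it can load the EidosSystem class from the current classpath. This is a generic JVM technique, not part of Eidos; the ClasspathCheck object and its method name are hypothetical, and only the class name comes from this README.

```scala
object ClasspathCheck {
  // Returns true if the named class can be located on the current classpath.
  // initialize = false avoids running the class's static initializers.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      // Thrown if the class is found but something it depends on is missing.
      case _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    val name = "org.clulab.wm.eidos.EidosSystem"
    if (isOnClasspath(name)) println(s"Found $name")
    else println(s"$name not found; check that the assembled jar is on the classpath")
  }
}
```

Run it with the same classpath you plan to use for your application; if it reports that the class is missing, the assembled jar under target/ is probably not on that classpath.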

How to use it

The Eidos system is designed to be used in several ways:

Using the Scala API

The Scala API can produce three distinct output formats:

  • a pretty display
  • a JSON-LD export of the causal graph extracted from the text
  • a JSON serialization (in case you want to later load all of the mentions, including mentions that are not part of the causal graph)

(see src/main/scala/org/clulab/wm/eidos/apps/examples/ExtractFromText.scala)

To produce a pretty display of the extracted mentions

import org.clulab.wm.eidos.EidosSystem
import org.clulab.wm.eidos.utils.DisplayUtils.displayMention

object DisplayExample extends App {
  val text = "Water trucking has decreased due to the cost of fuel."

  // Initialize the reader
  val reader = new EidosSystem()

  // Extract the mentions
  val annotatedDocument = reader.extractFromText(text)

  // Display in a pretty way
  annotatedDocument.odinMentions.foreach(displayMention)
}

This produces the following output (mentions may appear in a different order):

List(NounPhrase, Entity) => Water trucking
	------------------------------
	Rule => simple-np++Decrease_ported_syntax_2_verb
	Type => TextBoundMention
	------------------------------
	NounPhrase, Entity => Water trucking
	  * Attachments: Decrease(decreased,None)
	------------------------------

List(NounPhrase, Entity) => cost of fuel
	------------------------------
	Rule => simple-np
	Type => TextBoundMention
	------------------------------
	NounPhrase, Entity => cost of fuel
	------------------------------

List(Causal, DirectedRelation, EntityLinker, Event) => Water trucking has decreased due to the cost of fuel
	------------------------------
	Rule => dueToSyntax2-Causal
	Type => EventMention
	------------------------------
	trigger => due
	cause (NounPhrase, Entity) => cost of fuel
	effect (NounPhrase, Entity) => Water trucking
	  * Attachments: Decrease(decreased,None)
	------------------------------

To export extractions as JSON-LD

(For information about the JSON-LD export format, please look here.)

import scala.collection.Seq
import org.clulab.serialization.json.stringify
import org.clulab.wm.eidos.EidosSystem
import org.clulab.wm.eidos.serialization.json.JLDCorpus

object JsonLdExample extends App {
  val text = "Water trucking has decreased due to the cost of fuel."

  // Initialize the reader
  val reader = new EidosSystem()

  // Extract the mentions
  val annotatedDocument = reader.extractFromText(text)

  // Export to JSON-LD
  val corpus = new JLDCorpus(Seq(annotatedDocument), reader)
  val mentionsJSONLD = corpus.serialize()
  println(stringify(mentionsJSONLD, pretty = true))
}

This produces the JSON-LD output shown here.

To serialize to JSON

import org.clulab.serialization.json.stringify
import org.clulab.wm.eidos.EidosSystem
import org.clulab.wm.eidos.serialization.json.WMJSONSerializer

object JsonExample extends App {
  val text = "Water trucking has decreased due to the cost of fuel."

  // Initialize the reader
  val reader = new EidosSystem()

  // Extract the mentions
  val annotatedDocument = reader.extractFromText(text)

  // Serialize to regular JSON
  // (e.g., if you want to reload the mentions later for post-processing)
  val mentionsJSON = WMJSONSerializer.jsonAST(annotatedDocument.odinMentions)
  println(stringify(mentionsJSON, pretty = true))
}

This produces the JSON serialization shown here (mentions may appear in a different order).

Command line usage

Extracting causal events from documents in a directory

sbt "runMain org.clulab.wm.eidos.apps.ExtractFromDirectory /path/to/input/directory /path/to/output/directory"

Files in the input directory should have the extension .txt; the mentions extracted from each file are saved in a corresponding JSON-LD file in the output directory.
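
The selection rule above can be sketched in Scala: only regular files whose names end in .txt are picked up, and each one is paired with an output file in the output directory. This is an illustration only, assuming ExtractFromDirectory follows that pattern; the DirectoryPlanSketch name and the `<name>.jsonld` output naming are hypothetical and may differ from the file names Eidos actually writes.

```scala
import java.io.File

object DirectoryPlanSketch {
  // Pair each .txt file in inputDir with a hypothetical output file in outputDir.
  // The "<name>.jsonld" naming is an assumption made for illustration.
  def plan(inputDir: File, outputDir: File): Seq[(File, File)] = {
    // listFiles returns null if inputDir is not a readable directory.
    val files = Option(inputDir.listFiles()).getOrElse(Array.empty[File])
    files.toSeq
      .filter(f => f.isFile && f.getName.endsWith(".txt"))
      .sortBy(_.getName)
      .map { f =>
        val base = f.getName.stripSuffix(".txt")
        f -> new File(outputDir, base + ".jsonld")
      }
  }

  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("usage: DirectoryPlanSketch <inputDir> <outputDir>")
      sys.exit(1)
    }
    // Pass full paths: java.io.File does not expand a leading "~".
    plan(new File(args(0)), new File(args(1))).foreach { case (src, dst) =>
      println(s"${src.getPath} -> ${dst.getPath}")
    }
  }
}
```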

Note: Tildes (~) are not expanded inside the quoted sbt invocation, so write out the full path to your home directory instead.

Running an interactive shell

The EidosShell is an interactive shell for testing the output of Eidos. To run it, do

./shell

Running the webapp

To run the webapp version of EidosShell locally, do:

sbt webapp/run

and then navigate to localhost:9000 in a web browser.

How to use Eidos output

Visualizing Eidos output

Eidos reading output can be visualized using INDRA and Jupyter notebooks. See below for an example.

[Image: example visualization of Eidos output, produced with INDRA in a Jupyter notebook]

Using Eidos output for modeling

Events extracted using Eidos can be converted to INDRA Influence statements, which are bespoke data structures designed for modeling causal networks.

Example usage:

>>> from indra.sources import eidos
>>> ep = eidos.process_text("Water trucking has decreased due to the cost of fuel.")
>>> ep.statements
[Influence(cost of fuel(), Water trucking(negative))]

Delphi is a framework built on top of INDRA that assembles causal fragments extracted by Eidos into a causal analysis graph. This causal analysis graph is then converted to a dynamic Bayes network and used to make probabilistic predictions.

License

Eidos will soon be licensed under the Apache License, but one dependency currently has a GPL license. That dependency will be removed very soon, and the license will be updated.

Related resources

If you are working on this project, you may be interested in additional materials stored in the cloud. Access may be limited by permission settings. Other documents are included in the /doc directory of the repository.

There is one large file of vectors which is useful at runtime if you are interested in ontological grounding. To use this file, download it and place it in the project's src/main/resources/org/clulab/wm/eidos/w2v directory. Then indicate to Eidos that it should be used by setting useW2V = true in src/main/resources/eidos.conf.

Notes

The default size of the memory allocation pool for the JVM is 1/4 of your physical memory, but Eidos may require more RAM than that. It is currently being developed and tested with a 6GB limit.

For those using sbt, the file .jvmopts is included with the source code to arrange for more memory. No other changes should be necessary.

IDEs and other development tools are generally unaware of .jvmopts, but can be configured via an environment variable instead.

JAVA_TOOL_OPTIONS=-Xmx6g

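If a tool passes its own -Xmx option on the command line, that option takes precedence over JAVA_TOOL_OPTIONS. In that case, the more general _JAVA_OPTIONS variable, which HotSpot JVMs apply after the command-line options, may be needed.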

_JAVA_OPTIONS=-Xmx6g

How to define these variables depends on your operating system and shell.
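
To confirm that a memory setting actually took effect, you can print the maximum heap size the JVM reports, launching it the same way and with the same environment variables that you use for Eidos. This is a generic JVM check, not part of Eidos, and the HeapCheck name is hypothetical.

```scala
object HeapCheck {
  // Maximum heap the JVM will attempt to use, converted from bytes to MiB.
  def maxHeapMiB: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit = {
    // With -Xmx6g this should report roughly 6144 MiB, possibly somewhat less.
    println(s"Max heap: $maxHeapMiB MiB")
  }
}
```

The reported value can be somewhat below the -Xmx setting, because some garbage collectors exclude part of the heap from Runtime.maxMemory.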