reach-assembly

What is it?

reach-assembly is the assembly arm of reach. This project provides a sieve-based system for the assembly of event mentions. While still under development, the system currently supports (1) exact deduplication of both entity and event mentions, (2) unification of mentions through coreference resolution, (3) reporting of intra- and inter-sentence causal precedence relations (e.g., A causally precedes B) using linguistic features, and (4) a feature-based classifier for causal precedence. Future versions will include additional sieves for causal precedence and improved approximate deduplication.
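To make the idea of a sieve pipeline concrete, here is a minimal sketch in Scala. It is not the reach-assembly API: `Mention`, `Precedence`, `Sieve`, `dedup`, `runSieves`, and `intraSentenceOrder` are all hypothetical names invented for illustration, and the single sieve uses a placeholder rule rather than the linguistic features described above. The sketch only shows the general shape: deduplicate mentions exactly, then let each sieve in turn add the precedence relations it can find.

```scala
// Illustrative sketch only. None of these names come from reach-assembly.
object SieveSketch {

  // A simplified mention: an event label, its argument text, and a sentence index.
  final case class Mention(label: String, args: Seq[String], sentence: Int)

  // A directed relation: `before` causally precedes `after`; `foundBy` names the sieve.
  final case class Precedence(before: Mention, after: Mention, foundBy: String)

  // A sieve sees all mentions plus the relations found so far and returns the updated set.
  type Sieve = (Seq[Mention], Set[Precedence]) => Set[Precedence]

  // Exact deduplication: mentions with the same label and arguments are duplicates.
  // The first occurrence is kept and the input order is preserved.
  def dedup(mentions: Seq[Mention]): Seq[Mention] = {
    val seen = scala.collection.mutable.LinkedHashMap.empty[(String, Seq[String]), Mention]
    mentions.foreach(m => seen.getOrElseUpdate((m.label, m.args), m))
    seen.values.toSeq
  }

  // Apply the sieves in order, accumulating relations.
  def runSieves(mentions: Seq[Mention], sieves: Seq[Sieve]): Set[Precedence] =
    sieves.foldLeft(Set.empty[Precedence])((found, sieve) => sieve(mentions, found))

  // Placeholder rule (an assumption, not the real system's logic):
  // within one sentence, treat list order as precedence.
  val intraSentenceOrder: Sieve = (ms, found) => {
    val indexed = ms.zipWithIndex
    val pairs = for {
      (a, i) <- indexed
      (b, j) <- indexed
      if i < j && a.sentence == b.sentence
    } yield Precedence(a, b, "intraSentenceOrder")
    found ++ pairs
  }

  def main(args: Array[String]): Unit = {
    // Toy data, not output from reach.
    val raw = Seq(
      Mention("Phosphorylation", Seq("MEK", "ERK"), 0),
      Mention("Phosphorylation", Seq("MEK", "ERK"), 2), // exact duplicate in a later sentence
      Mention("Binding", Seq("ERK", "ELK1"), 0)
    )
    val unique = dedup(raw)
    val relations = runSieves(unique, Seq(intraSentenceOrder))
    relations.foreach(r => println(s"${r.before.label} precedes ${r.after.label} (${r.foundBy})"))
  }
}
```

Because every sieve has the same signature, adding a new one (for example, a classifier-backed sieve) would only mean appending it to the sequence passed to `runSieves`.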

For more details on the sieve-based assembly system, please refer to the following paper:

@inproceedings{GHP+:2016aa,
  author       = {Gus Hahn-Powell and Dane Bell and Marco A. Valenzuela-Esc\'{a}rcega and Mihai Surdeanu},
  title        = {This before That: Causal Precedence in the Biomedical Domain},
  booktitle    = {Proceedings of the 2016 Workshop on Biomedical Natural Language Processing},
  organization = {Association for Computational Linguistics},
  year         = {2016},
  note         = {Paper available at \url{https://arxiv.org/abs/1606.08089}}
}

Licensing

All our own code is licensed under the Apache License, Version 2.0. However, some of the libraries used here, most notably CoreNLP, are licensed under GPL v2. Unless BioNLPProcessor is removed from this package, our whole codebase technically becomes GPL v2, since BioNLPProcessor builds on Stanford's CoreNLP functionality. We plan to split the code into multiple components soon so that licensing becomes less ambiguous.

Changes

  • 0.0.1 - Assembly system from reach v1.3.2
  • more...

Authors

The assembly system was created by the following members of the CLU lab at the University of Arizona:

Citations

If you use reach-assembly, please cite this paper:

@inproceedings{GHP+:2016aa,
  author       = {Gus Hahn-Powell and Dane Bell and Marco A. Valenzuela-Esc\'{a}rcega and Mihai Surdeanu},
  title        = {This before That: Causal Precedence in the Biomedical Domain},
  booktitle    = {Proceedings of the 2016 Workshop on Biomedical Natural Language Processing},
  organization = {Association for Computational Linguistics},
  year         = {2016},
  note         = {Paper available at \url{https://arxiv.org/abs/1606.08089}}
}

More publications from the Reach project are available here.

Including reach-assembly in your code

This software requires Java 1.8.

The jar is available on Maven Central. To use it, add the following dependency to your build.sbt:

libraryDependencies ++= Seq(
    "org.clulab" %% "reach" % "1.3.2"
)

How to compile the source code

This is a standard sbt project, so use the usual commands (e.g., sbt compile, sbt assembly) to compile. Add the generated jar files under target/ to your $CLASSPATH, along with the other necessary dependency jars. See build.sbt for the dependencies required at runtime.

Running things

The interactive Assembly shell

You can interactively explore the assembly output for various snippets of text using the assembly shell:

sbt "runMain org.clulab.assembly.AssemblyShell"

Modifying the code

Reach builds upon our Odin event extraction framework. If you want to modify the event and entity grammars, please refer to Odin's Wiki page and the included Odin manual for details on the rule language and the Odin API.

Funding

The development of Reach was funded by the DARPA Big Mechanism program under ARO contract W911NF-14-1-0395.