globalnamesarchitecture / gnmatcher   0.1.1

Fuzzy matching library for scientific names with emphasis on performance and scalability

Scala versions: 2.11 2.10

Global Names Matcher

Global Names Matcher, or gnmatcher, is a Scala 2.10.3+ library for very fast fuzzy matching of a query string against a given set of strings.

Installation

The artifacts for gnmatcher live on Maven Central.

To install the dependency, add the following line to your SBT build definition:

libraryDependencies += "org.globalnames" %% "gnmatcher" % "0.1.0"

The corresponding Maven dependencies, for Scala 2.11 and Scala 2.10 respectively:

<dependency>
    <groupId>org.globalnames</groupId>
    <artifactId>gnmatcher_2.11</artifactId>
    <version>0.1.0</version>
</dependency>

<dependency>
    <groupId>org.globalnames</groupId>
    <artifactId>gnmatcher_2.10</artifactId>
    <version>0.1.0</version>
</dependency>

Matching

gnmatcher implements sophisticated heuristic algorithms to match the semantic parts of scientific biological names, as follows:

  • authors matching answers the question: how similar is the authorship string "Linnaeus, Muller 1767" to "Muller and Linnaeus"?

Authors Matching

The entire algorithm is ported from the Ruby implementation developed by Patrick Leary of uBio and EOL fame. To answer the question above, run the following code:

$ sbt matcher/console
scala> import org.globalnames._
scala> AuthorsMatcher.score(Seq(Author("Linnaeus"), Author("Muller")), Some(1767),
     |                      Seq(Author("Muller"), Author("Linnaeus")), None)
res0: Double = 0.5
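To make the idea of order-insensitive authorship comparison concrete, here is a minimal, self-contained sketch. It is not gnmatcher's algorithm: the object AuthorsSketch, its methods, the Jaccard overlap, and the year weights are all assumptions invented for illustration, and the library's actual scoring may differ.

```scala
// Hypothetical, simplified stand-in; NOT gnmatcher's AuthorsMatcher.
object AuthorsSketch {
  // Jaccard overlap of normalized author names, so author order is ignored.
  def nameOverlap(a: Seq[String], b: Seq[String]): Double = {
    val sa = a.map(_.trim.toLowerCase).toSet
    val sb = b.map(_.trim.toLowerCase).toSet
    if (sa.isEmpty || sb.isEmpty) 0.0
    else (sa intersect sb).size.toDouble / (sa union sb).size
  }

  // Assumed weighting: 1.0 when both years are known and equal,
  // 0.0 when both are known and differ, 0.5 when either year is missing.
  def yearFactor(y1: Option[Int], y2: Option[Int]): Double = (y1, y2) match {
    case (Some(x), Some(y)) => if (x == y) 1.0 else 0.0
    case _                  => 0.5
  }

  // Combined score in the range [0.0, 1.0].
  def score(a: Seq[String], y1: Option[Int],
            b: Seq[String], y2: Option[Int]): Double =
    nameOverlap(a, b) * yearFactor(y1, y2)
}
```

Under these assumed weights, two author lists that contain the same names in a different order overlap fully, and a year missing on one side lowers the score without rejecting the match outright.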

Contributors

License

Released under the MIT license.