Global Names Matcher, or gnmatcher, is a Scala 2.10.3+ library for very fast
fuzzy matching of a query string against a given set of strings.
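
To illustrate what fuzzy matching a query against a set of strings means, here is a minimal, self-contained sketch based on plain Levenshtein edit distance. It is not gnmatcher's algorithm or API: the names `FuzzyMatchSketch`, `editDistance`, and `bestMatches` are hypothetical, and a linear scan like this one is far slower than what the library aims for.

```scala
// Illustrative sketch only: NOT gnmatcher's algorithm or API.
// All names below are hypothetical and exist only for this example.
object FuzzyMatchSketch {

  /** Levenshtein edit distance, computed with two-row dynamic programming. */
  def editDistance(a: String, b: String): Int = {
    // prev(j) holds the distance between the first i-1 chars of a and the first j chars of b.
    var prev: Array[Int] = (0 to b.length).toArray
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i // deleting i characters turns a prefix of a into the empty string
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        val insertion = curr(j - 1) + 1
        val deletion = prev(j) + 1
        val substitution = prev(j - 1) + cost
        curr(j) = math.min(math.min(insertion, deletion), substitution)
      }
      prev = curr
    }
    prev(b.length)
  }

  /** Candidates within maxDistance edits of the query, closest first. */
  def bestMatches(query: String,
                  candidates: Seq[String],
                  maxDistance: Int): Seq[(String, Int)] =
    candidates
      .map(c => (c, editDistance(query, c)))
      .filter(_._2 <= maxDistance)
      .sortBy(_._2)
}
```

For example, `FuzzyMatchSketch.bestMatches("Homo sapins", Seq("Homo sapiens", "Homo habilis", "Pan troglodytes"), 2)` keeps only `("Homo sapiens", 1)`, because the misspelled query is one insertion away from that candidate and more than two edits away from the others.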
The artifacts for gnmatcher live on Maven Central.
To install the dependency with SBT, add the following line to your build:
libraryDependencies += "org.globalnames" %% "gnmatcher" % "0.1.0"
The corresponding Maven configuration (use the artifact that matches your Scala version, 2.11 or 2.10):
<dependency>
    <groupId>org.globalnames</groupId>
    <artifactId>gnmatcher_2.11</artifactId>
    <version>0.1.0</version>
</dependency>
<dependency>
    <groupId>org.globalnames</groupId>
    <artifactId>gnmatcher_2.10</artifactId>
    <version>0.1.0</version>
</dependency>
gnmatcher implements sophisticated heuristic algorithms to match the semantic parts of
scientific biological names as follows:
- Authors matching answers the question: how similar is the authors string
"Linnaeus, Muller 1767" to the string "Muller and Linnaeus"?
The entire algorithm is ported from a Ruby implementation developed by Patrick Leary of uBio and EOL fame. To answer the question above, run the following:
$ sbt matcher/console
scala> import org.globalnames._
scala> AuthorsMatcher.score(Seq(Author("Linnaeus"), Author("Muller")), Some(1767),
| Seq(Author("Muller"), Author("Linnaeus")), None)
res0: Double = 0.5
- Alexander Myltsev http://myltsev.com alexander-myltsev@github
- Dmitry Mozzherin dimus@github
Released under the MIT license.