saucam / shiva   0.1.1

Apache License 2.0 GitHub

A library for Simple High dimensional Indexed Vector search Algorithms

Scala versions: 3.x 2.13

shiva [WIP]

shiva is a library for Simple High dimensional Indexed Vector search Algorithms.

Overview

The guiding principles are to:

  • Stay simple (non-distributed, single-threaded indexing, easy to use)
  • Support high-dimensional vectors, optimizing memory use for speed
  • Support many distance metrics
  • Scale out to different indices and algorithms

Installation

To use shiva, add the following to your build.sbt.

For release versions:

resolvers +=
  "Sonatype OSS Releases" at "https://s01.oss.sonatype.org/content/repositories/releases"

libraryDependencies ++= Seq(
  "io.github.saucam" %% "shiva-core" % "<version>"
)

For snapshot versions:

resolvers +=
  "Sonatype OSS Snapshots" at "https://s01.oss.sonatype.org/content/repositories/snapshots"

libraryDependencies ++= Seq(
  "io.github.saucam" %% "shiva-core" % "<version>"
)
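
As a concrete illustration, a complete build.sbt for the release dependency might look like the sketch below. The version 0.1.1 is taken from the version shown at the top of this page, and the Scala version is an assumption picked from the listed 2.13 / 3.x support; check the published artifacts before relying on either.

```scala
// build.sbt (sketch): scalaVersion and the shiva version are assumptions
ThisBuild / scalaVersion := "2.13.12"

resolvers +=
  "Sonatype OSS Releases" at "https://s01.oss.sonatype.org/content/repositories/releases"

libraryDependencies ++= Seq(
  "io.github.saucam" %% "shiva-core" % "0.1.1"
)
```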

Usage

The following is a simple example of how to use the HNSW index after adding the dependency:

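// Build an HNSW index with Int ids and Double vector components
val index = HnswIndexBuilder[Int, Double, IntDoubleIndexItem](
  dimensions = 3,          // length of every vector in the index
  maxItemCount = 1000000,  // capacity of the index
  m = 32,                  // HNSW M parameter (links per node)
  distanceCalculator = new EuclideanDistanceDouble
).build()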

val item1 = IntDoubleIndexItem(1, Vector(4.05d, 1.06d, 7.8d))
val item2 = IntDoubleIndexItem(2, Vector(8.01d, 2.06d, 1.8d))
val item3 = IntDoubleIndexItem(3, Vector(9.34d, 3.06d, 3.1d))

index.add(item1)
index.add(item2)
index.add(item3)

// Query for the 10 items most similar to item1 and print each result
val results = index.findKSimilarItems(item1.id, 10)
results.foreach(println)
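
On a dataset this small, the results of an approximate index can be sanity-checked against an exact brute-force search. The sketch below is plain Scala and does not use shiva: the `Item` case class, `kNearest`, and the choice to exclude the query item from its own results are all assumptions made for illustration, and may not match how `findKSimilarItems` behaves.

```scala
object BruteForceKnn {
  // Hypothetical item type for this sketch; not shiva's IntDoubleIndexItem
  final case class Item(id: Int, vector: Vector[Double])

  // Euclidean distance between two vectors of equal length
  def euclidean(a: Vector[Double], b: Vector[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
  }

  // Exact k nearest neighbours of `query`, excluding the query item itself,
  // returned as (id, distance) pairs sorted by increasing distance
  def kNearest(items: Seq[Item], query: Item, k: Int): Seq[(Int, Double)] =
    items
      .filter(_.id != query.id)
      .map(item => (item.id, euclidean(item.vector, query.vector)))
      .sortBy(_._2)
      .take(k)

  // The same three vectors as in the usage example above
  val items: Seq[Item] = Seq(
    Item(1, Vector(4.05, 1.06, 7.8)),
    Item(2, Vector(8.01, 2.06, 1.8)),
    Item(3, Vector(9.34, 3.06, 3.1))
  )
}
```

Calling `BruteForceKnn.kNearest(BruteForceKnn.items, BruteForceKnn.items.head, 10)` ranks item 2 ahead of item 3 for these vectors. A linear scan like this costs time proportional to the number of items per query, which is the cost an HNSW index is meant to avoid on large datasets.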

Distance Metrics

Currently supported distance metrics are:

  • Inner Product
  • Euclidean Distance
  • Cosine Distance
  • Manhattan Distance
  • Minkowski Distance
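
For reference, the textbook definitions of these metrics can be sketched in plain Scala as follows. This is an illustration only, not shiva's implementation: the `DistanceSketch` object and its function names are hypothetical, and the library may define some metrics differently (for example, how an inner product is turned into a distance).

```scala
object DistanceSketch {
  type Vec = Vector[Double]

  private def sameDim(a: Vec, b: Vec): Unit =
    require(a.length == b.length, "vectors must have the same dimension")

  // Inner (dot) product; larger values mean more similar vectors
  def innerProduct(a: Vec, b: Vec): Double = {
    sameDim(a, b)
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // Euclidean distance: square root of the summed squared differences
  def euclidean(a: Vec, b: Vec): Double = {
    sameDim(a, b)
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
  }

  // Cosine distance: 1 - cosine similarity; undefined for zero vectors
  def cosineDistance(a: Vec, b: Vec): Double = {
    val norms = math.sqrt(innerProduct(a, a)) * math.sqrt(innerProduct(b, b))
    require(norms > 0.0, "cosine distance is undefined for zero vectors")
    1.0 - innerProduct(a, b) / norms
  }

  // Manhattan distance: sum of absolute coordinate differences
  def manhattan(a: Vec, b: Vec): Double = {
    sameDim(a, b)
    a.zip(b).map { case (x, y) => math.abs(x - y) }.sum
  }

  // Minkowski distance of order p (p >= 1); p = 1 gives Manhattan, p = 2 Euclidean
  def minkowski(a: Vec, b: Vec, p: Double): Double = {
    sameDim(a, b)
    require(p >= 1.0, "p must be at least 1")
    math.pow(a.zip(b).map { case (x, y) => math.pow(math.abs(x - y), p) }.sum, 1.0 / p)
  }
}
```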

Contributing

See the contributor's guide