Production-ready K-Means clustering for Apache Spark with pluggable Bregman divergences (KL, Itakura-Saito, L1, etc.). Includes 6 algorithms, 740 tests, and cross-version persistence. A drop-in replacement for MLlib with mathematically correct distance functions for probability distributions, spectral data, and count data.
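
To illustrate what "pluggable Bregman divergences" means, here is a minimal Python sketch of the general definition D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>, with three standard generator functions. All names below (`bregman`, `sq_F`, `kl_F`, `is_F`, and so on) are hypothetical and chosen for illustration; they are not this library's API, and the library itself runs on Spark rather than plain Python.

```python
import math

def bregman(F, grad_F, x, y):
    """Bregman divergence D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    g = grad_F(y)
    inner = sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))
    return F(x) - F(y) - inner

# Squared Euclidean distance: F(x) = sum(x_i^2)
def sq_F(x):
    return sum(v * v for v in x)

def sq_grad(x):
    return [2.0 * v for v in x]

# Generalized KL divergence: F(x) = sum(x_i * log x_i), for positive x.
# It reduces to the usual KL divergence when x and y each sum to 1.
def kl_F(x):
    return sum(v * math.log(v) for v in x)

def kl_grad(x):
    return [math.log(v) + 1.0 for v in x]

# Itakura-Saito divergence: F(x) = -sum(log x_i), for positive x
def is_F(x):
    return -sum(math.log(v) for v in x)

def is_grad(x):
    return [-1.0 / v for v in x]
```

Swapping the `(F, grad_F)` pair changes the notion of distance without touching the rest of a clustering loop, for example `bregman(kl_F, kl_grad, p, q)` for probability vectors versus `bregman(is_F, is_grad, s, t)` for positive spectral data.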
- bregman-divergence
- clustering
- euclidean-distance
- entropy
- cosine-similarity
- itakura-saito-divergence
- k-means
- embeddings
- spark-mllib
- spark
- similarity-search
- kullback-leibler-divergence
Scala versions: 2.10
[Latest version](https://index.scala-lang.org/derrickburns/generalized-kmeans-clustering/massivedatascience-clusterer)