Production-ready K-Means clustering for Apache Spark with pluggable divergences: Bregman divergences such as Kullback-Leibler (KL) and Itakura-Saito, plus L1 (which is not a Bregman divergence) and others. 6 algorithms, 740 tests, cross-version persistence. Drop-in replacement for MLlib K-Means, with distance functions that are mathematically appropriate for probability distributions, spectral data, and count data.
- kullback-leibler-divergence
- clustering
- spark-mllib
- cosine-similarity
- k-means
- itakura-saito-divergence
- bregman-divergence
- euclidean-distance
- embeddings
- spark
- entropy
- similarity-search
Scala versions: 2.10
massivedatascience-clusterer 0.7.0
Group ID: com.massivedatascience
Artifact ID: massivedatascience-clusterer_2.12
Version: 0.7.0
Release Date: Feb 12, 2026
Full Scala Version: 2.12.18
sbt:
libraryDependencies += "com.massivedatascience" %% "massivedatascience-clusterer" % "0.7.0"
Mill:
ivy"com.massivedatascience::massivedatascience-clusterer:0.7.0"
Scala CLI:
//> using dep "com.massivedatascience::massivedatascience-clusterer:0.7.0"
Ammonite:
import $ivy.`com.massivedatascience::massivedatascience-clusterer:0.7.0`
Maven:
<dependency>
  <groupId>com.massivedatascience</groupId>
  <artifactId>massivedatascience-clusterer_2.12</artifactId>
  <version>0.7.0</version>
</dependency>
Gradle:
implementation group: 'com.massivedatascience', name: 'massivedatascience-clusterer_2.12', version: '0.7.0'
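The description refers to pluggable Bregman divergences. The Scala sketch below is a hypothetical illustration of that idea and does not use the massivedatascience-clusterer API; every name in it (`BregmanSketch`, `generalizedKL`, `lloydStep`, and so on) is invented for this example. A Bregman divergence generated by a strictly convex, differentiable function φ is D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩. Choosing φ(x) = ‖x‖² gives squared Euclidean distance, φ(x) = Σ xᵢ log xᵢ gives generalized KL divergence, and φ(x) = −Σ log xᵢ gives Itakura-Saito. Because the arithmetic mean of a cluster minimizes the total divergence to its points for every Bregman divergence, one Lloyd-style update serves all three; L1 needs a coordinate-wise median instead, so it is left out here.

```scala
// Hypothetical sketch of Bregman k-means; NOT the massivedatascience-clusterer API.
object BregmanSketch {
  type Vec = Array[Double]

  /** phi(x) = ||x||^2  =>  D(x, y) = ||x - y||^2 */
  def squaredEuclidean(x: Vec, y: Vec): Double =
    x.indices.map { i => val d = x(i) - y(i); d * d }.sum

  /** phi(x) = sum x_i log x_i  =>  D(x, y) = sum (x_i log(x_i / y_i) - x_i + y_i).
    * Assumes x_i >= 0 and y_i > 0; uses the convention 0 * log 0 = 0. */
  def generalizedKL(x: Vec, y: Vec): Double =
    x.indices.map { i =>
      val xi = x(i); val yi = y(i)
      (if (xi > 0) xi * math.log(xi / yi) else 0.0) - xi + yi
    }.sum

  /** phi(x) = -sum log x_i  =>  D(x, y) = sum (x_i / y_i - log(x_i / y_i) - 1).
    * Assumes strictly positive entries. */
  def itakuraSaito(x: Vec, y: Vec): Double =
    x.indices.map { i => val r = x(i) / y(i); r - math.log(r) - 1.0 }.sum

  /** Assignment step: index of the center with the smallest divergence D(x, center). */
  def assign(x: Vec, centers: Seq[Vec], d: (Vec, Vec) => Double): Int =
    centers.indices.minBy(j => d(x, centers(j)))

  /** Coordinate-wise arithmetic mean: the optimal center for any Bregman divergence. */
  def mean(points: Seq[Vec]): Vec = {
    val dim = points.head.length
    Array.tabulate(dim)(i => points.map(_(i)).sum / points.size)
  }

  /** One Lloyd iteration: assign every point, then replace each non-empty
    * cluster's center with its mean (empty clusters keep their old center). */
  def lloydStep(points: Seq[Vec], centers: Seq[Vec], d: (Vec, Vec) => Double): Seq[Vec] = {
    val grouped = points.groupBy(p => assign(p, centers, d))
    centers.indices.map(j => grouped.get(j).map(mean).getOrElse(centers(j)))
  }
}
```

Swapping the divergence only changes the assignment step: passing `BregmanSketch.itakuraSaito` as `d` to `lloydStep` runs an Itakura-Saito iteration on strictly positive data such as power spectra, while the mean-based update stays the same.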