Production-ready K-Means clustering for Apache Spark with pluggable Bregman divergences (KL, Itakura-Saito, L1, etc.). Six algorithms, 740 tests, and cross-version persistence. A drop-in replacement for MLlib with mathematically correct distance functions for probability distributions, spectral data, and count data.
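To illustrate what "pluggable Bregman divergences" means, here is a minimal sketch in plain Scala 2.13 with no Spark dependency. It is not this library's API: `BregmanSketch`, `Divergence`, `assign`, `mean`, and `lloyd` are hypothetical names chosen for illustration. Each divergence is derived from a strictly convex generator function `phi`, and a Lloyd-style k-means loop takes the divergence as a parameter.

```scala
// Illustrative sketch in plain Scala 2.13 (no Spark). This is NOT the
// massivedatascience-clusterer API; every name below is hypothetical.
object BregmanSketch {
  type Vec = Array[Double]

  // A Bregman divergence is generated by a strictly convex function phi:
  //   D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
  final case class Divergence(name: String, phi: Vec => Double, gradPhi: Vec => Vec) {
    def apply(x: Vec, y: Vec): Double = {
      val g = gradPhi(y)
      phi(x) - phi(y) - x.indices.map(i => g(i) * (x(i) - y(i))).sum
    }
  }

  // phi(x) = sum x_i^2 gives the squared Euclidean distance sum (x_i - y_i)^2.
  val squaredEuclidean: Divergence = Divergence("squaredEuclidean",
    x => x.map(v => v * v).sum,
    x => x.map(v => 2.0 * v))

  // phi(x) = sum x_i log x_i gives generalized KL: sum x_i log(x_i / y_i) - x_i + y_i.
  // Entries must be strictly positive in this sketch (no smoothing is applied).
  val kl: Divergence = Divergence("kl",
    x => x.map(v => v * math.log(v)).sum,
    x => x.map(v => math.log(v) + 1.0))

  // phi(x) = -sum log x_i gives Itakura-Saito: sum x_i / y_i - log(x_i / y_i) - 1.
  val itakuraSaito: Divergence = Divergence("itakuraSaito",
    x => -x.map(v => math.log(v)).sum,
    x => x.map(v => -1.0 / v))

  // Assignment step: each point goes to the center with the smallest D(point, center).
  def assign(points: Seq[Vec], centers: Seq[Vec], d: Divergence): Seq[Int] =
    points.map(p => centers.indices.minBy(j => d(p, centers(j))))

  // Update step: for every Bregman divergence, the c minimizing sum_x D(x, c)
  // is the arithmetic mean of the cluster, so this step is the same for all of them.
  def mean(points: Seq[Vec]): Vec =
    Array.tabulate(points.head.length)(i => points.map(p => p(i)).sum / points.size)

  // Lloyd-style iterations with a pluggable divergence; empty clusters keep their center.
  def lloyd(points: Seq[Vec], init: Seq[Vec], d: Divergence, iters: Int): Seq[Vec] =
    (1 to iters).foldLeft(init) { (centers, _) =>
      val labels = assign(points, centers, d)
      centers.indices.map { j =>
        val members = points.zip(labels).collect { case (p, l) if l == j => p }
        if (members.isEmpty) centers(j) else mean(members)
      }
    }
}
```

For probability vectors (entries summing to 1), the generalized KL form reduces to ordinary KL divergence because the `- x_i + y_i` terms cancel. Since the mean is the optimal center for every Bregman divergence, swapping in a different divergence in this sketch only changes the assignment step.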
- bregman-divergence
- cosine-similarity
- embeddings
- itakura-saito-divergence
- k-means
- similarity-search
- entropy
- spark-mllib
- euclidean-distance
- kullback-leibler-divergence
- clustering
- spark
Scala versions:
2.10
massivedatascience-clusterer 0.7.0
Group ID:
com.massivedatascience
Artifact ID:
massivedatascience-clusterer_2.13
Version:
0.7.0
Release Date:
Feb 12, 2026
Full Scala Version:
2.13.14
sbt:
libraryDependencies += "com.massivedatascience" %% "massivedatascience-clusterer" % "0.7.0"
Mill:
ivy"com.massivedatascience::massivedatascience-clusterer:0.7.0"
Scala CLI:
//> using dep "com.massivedatascience::massivedatascience-clusterer:0.7.0"
Ammonite:
import $ivy.`com.massivedatascience::massivedatascience-clusterer:0.7.0`
Maven:
<dependency>
  <groupId>com.massivedatascience</groupId>
  <artifactId>massivedatascience-clusterer_2.13</artifactId>
  <version>0.7.0</version>
</dependency>
Gradle:
implementation group: 'com.massivedatascience', name: 'massivedatascience-clusterer_2.13', version: '0.7.0'