// hosted on Maven Central, for Scala 2.11 only at the moment
libraryDependencies += "com.github.jongwook" %% "spark-ranking-metrics" % "0.0.1"
Ranking is an important component of recommender systems, and there are various methods for evaluating the offline performance of ranking algorithms.
Users are usually shown a fixed number of recommended items, so it is common to measure performance on only the top-k recommended items; this is why the metrics are often suffixed with "@k".
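To make the "@k" suffix concrete, here is a minimal sketch in plain Scala (no Spark) of precision@k and recall@k for a single user. The object `TopKSketch` and its method signatures are hypothetical and are not part of SparkRankingMetrics; the library's own computation may differ in details such as how short recommendation lists are handled.

```scala
object TopKSketch {
  /** Fraction of the top-k slots filled with relevant items.
    * Assumption: the denominator is k even if fewer than k items were recommended. */
  def precisionAt(k: Int, recommended: Seq[Int], relevant: Set[Int]): Double = {
    require(k > 0, "k must be positive")
    val hits = recommended.take(k).count(relevant.contains)
    hits.toDouble / k
  }

  /** Fraction of all relevant items that appear in the top-k recommendations. */
  def recallAt(k: Int, recommended: Seq[Int], relevant: Set[Int]): Double = {
    require(k > 0, "k must be positive")
    if (relevant.isEmpty) 0.0
    else recommended.take(k).count(relevant.contains).toDouble / relevant.size
  }
}

object TopKSketchDemo extends App {
  val recommended = Seq(10, 20, 30, 40, 50) // item ids, best first
  val relevant = Set(20, 50, 60)            // ground-truth items for this user

  println(TopKSketch.precisionAt(3, recommended, relevant))
  println(TopKSketch.recallAt(5, recommended, relevant))
}
```

Only the first k entries of the ranked list are inspected, so an item ranked below position k contributes nothing to either metric.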
Currently, Spark's implementations of ranking metrics are quite limited, and its NDCG implementation assumes that relevance is always binary, so it produces different numbers from an NDCG computed over graded relevance. Meanwhile, RiVal aims to be a toolkit for reproducible recommender system evaluation and provides robust implementations of a few ranking metrics, but it is written as a single-threaded application and is therefore not applicable at the scale that Spark users typically encounter.
To complement this, SparkRankingMetrics contains scalable implementations of NDCG, MAP, Precision, and Recall, using Spark's DataFrame/Dataset idioms.
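The effect of assuming binary relevance in NDCG can be illustrated with a small sketch in plain Scala (no Spark). Everything here is hypothetical: the object `NdcgSketch`, its helpers, and in particular the graded gain function `2^r - 1`, which is one common choice and not necessarily the one used by Spark, RiVal, or SparkRankingMetrics.

```scala
object NdcgSketch {
  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  /** DCG over gains listed in ranked order, truncated at k. */
  def dcgAt(k: Int, gains: Seq[Double]): Double =
    gains.take(k).zipWithIndex.map { case (g, i) => g / log2(i + 2.0) }.sum

  /** NDCG@k: DCG of the predicted order divided by DCG of the ideal order. */
  def ndcgAt(k: Int, ranked: Seq[Int], relevance: Map[Int, Double],
             gain: Double => Double): Double = {
    // Items missing from the ground truth contribute zero gain.
    val actual = dcgAt(k, ranked.map(item => relevance.get(item).map(gain).getOrElse(0.0)))
    // The ideal ordering lists the ground-truth gains from largest to smallest.
    val ideal = dcgAt(k, relevance.values.toSeq.map(gain).sorted.reverse)
    if (ideal == 0.0) 0.0 else actual / ideal
  }

  // Assumed gain functions, for illustration only.
  val gradedGain: Double => Double = r => math.pow(2.0, r) - 1.0
  val binaryGain: Double => Double = r => if (r > 0.0) 1.0 else 0.0
}

object NdcgSketchDemo extends App {
  val ranked = Seq(1, 2, 3)                          // predicted order, best first
  val relevance = Map(1 -> 1.0, 2 -> 5.0, 3 -> 3.0)  // ground-truth ratings

  println(NdcgSketch.ndcgAt(3, ranked, relevance, NdcgSketch.binaryGain))
  println(NdcgSketch.ndcgAt(3, ranked, relevance, NdcgSketch.gradedGain))
}
```

With binary gains every relevant item counts the same, so this ranking looks perfect; with graded gains the same ranking scores below 1 because the lowest-rated item was placed first.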
 Gunawardana, A., & Shani, G. (2015). Evaluating recommendation systems. In Recommender systems handbook (pp. 265-308). Springer US.
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD
import com.github.jongwook.SparkRankingMetrics

// A recommendation model obtained by Spark's ALS
val model = MatrixFactorizationModel.load(sc, "path/to/model")
val result: RDD[(Int, Array[Rating])] = model.recommendProductsForUsers(20)

// A flattened RDD that contains all Ratings in the result
val prediction: RDD[Rating] = result.values.flatMap(ratings => ratings)
val groundTruth: RDD[Rating] = /* the ground-truth dataset */

// create DataFrames using the RDDs above
val predictionDF = spark.createDataFrame(prediction)
val groundTruthDF = spark.createDataFrame(groundTruth)

// instantiate using either DataFrames or Datasets
val metrics = SparkRankingMetrics(predictionDF, groundTruthDF)

// override the non-default column names
metrics.setItemCol("product")
metrics.setPredictionCol("rating")

// print the metrics
println(metrics.ndcgAt(10))
println(metrics.mapAt(15))
println(metrics.precisionAt(5))
println(metrics.recallAt(20))
This repository contains a test case that checks that the numbers produced by SparkRankingMetrics are identical to those produced by RiVal's corresponding implementations, using the MovieLens 100K dataset. The result is: