SparkDetective simplifies the collection and analysis of Spark performance metrics.
Use sparkDetective for troubleshooting interactive and batch Spark workloads.
You can also use it to collect metrics for long-term retention or as part of a CI/CD pipeline.
SparkDetective is also intended as a working example of how to use Spark Listeners for collecting Spark task metrics data.
- Main author and contact: Kumar Sambhav
- For Spark 2.x and 3.x
  - Tested on Spark 2.4, 3.0, 3.1
  - For Spark 2.1 and 2.2, use sparkDetective version 0.16
- Note: sparkDetective is available on Maven
- Spark 3.x and 2.4 with Scala 2.12:
  - Scala: `bin/spark-shell --packages io.github.sambhav37:sparkDetective_2.12:0.17`
  - Python: `bin/pyspark --packages io.github.sambhav37:sparkDetective_2.12:0.17`
    - Note: run `pip install sparkmeasure` to get the Python wrapper API.
- Spark 2.x with Scala 2.11:
  - Scala: `bin/spark-shell --packages io.github.sambhav37:sparkDetective_2.11:0.17`
  - Python: `bin/pyspark --packages io.github.sambhav37:sparkDetective_2.11:0.17`
    - Note: run `pip install sparkmeasure` to get the Python wrapper API.
- Bleeding edge: build the sparkDetective jar using sbt with `sbt +package`, then use `--jars` with the jar just built instead of `--packages`.
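
The version notes above can be condensed into a small lookup. The following is only a sketch: `package_coordinate` and `launch_command` are hypothetical helper names, not part of sparkDetective, and the code assumes that release 0.16 is published under the same `io.github.sambhav37:sparkDetective_<scala>` coordinate pattern as 0.17.

```python
# Hypothetical helpers (not part of sparkDetective): pick the --packages
# coordinate from the Spark and Scala versions, following the list above.

GROUP_ID = "io.github.sambhav37"
ARTIFACT = "sparkDetective"


def package_coordinate(spark_version: str, scala_version: str) -> str:
    """Return the Maven coordinate to pass to --packages.

    spark_version must contain at least major.minor, e.g. "3.1" or "2.4.8".
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if scala_version not in ("2.11", "2.12"):
        raise ValueError(f"unsupported Scala version: {scala_version}")
    if major == 3 and scala_version != "2.12":
        raise ValueError("Spark 3.x is listed with Scala 2.12 only")
    # Spark 2.1 and 2.2 need release 0.16; other listed versions use 0.17.
    # Assumption: 0.16 follows the same coordinate naming as 0.17.
    release = "0.16" if (major, minor) in ((2, 1), (2, 2)) else "0.17"
    return f"{GROUP_ID}:{ARTIFACT}_{scala_version}:{release}"


def launch_command(spark_version: str, scala_version: str, python: bool = False) -> str:
    """Build the spark-shell or pyspark command line shown in the list above."""
    launcher = "bin/pyspark" if python else "bin/spark-shell"
    return f"{launcher} --packages {package_coordinate(spark_version, scala_version)}"


if __name__ == "__main__":
    print(launch_command("3.1.2", "2.12"))
    print(launch_command("2.4.8", "2.11", python=True))
```

A lookup like this may be convenient in a CI/CD script that runs against several Spark versions; for a single interactive session, copying the matching command from the list is simpler.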