absaoss / spark-data-standardization   0.2.1

Apache License 2.0 GitHub

A library for Spark that helps standardize any input data (DataFrame) so that it adheres to the provided schema.

Scala versions: 2.13 2.12 2.11

Spark Data Standardization Library

  • DataFrame in
  • Standardized DataFrame out


Needed Provided Dependencies

The library needs the following dependencies to be included in your project:

"org.apache.spark" %% "spark-core" % SPARK_VERSION,
"org.apache.spark" %% "spark-sql" % SPARK_VERSION,
"za.co.absa" %% s"spark-commons-spark${SPARK_MAJOR}.${SPARK_MINOR}" % "0.6.1",

Usage in SBT:

"za.co.absa" %% "spark-data-standardization" % VERSION 
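
Put together, the dependency lines above might look roughly like this in a build.sbt. This is only a sketch: the value names `sparkVersion` and `sparkMajorMinor` and the Scala patch version are assumptions made for illustration, the Spark version is taken from the Scala 2.12 entry of the compatibility table, and 0.2.1 is the library version shown at the top of this page.

```scala
// build.sbt (sketch; value names and the Scala patch version are illustrative assumptions)
val sparkVersion    = "3.2.1" // Spark version listed for Scala 2.12 in the compatibility table
val sparkMajorMinor = "3.2"   // selects the matching spark-commons artifact name

scalaVersion := "2.12.15"     // assumed patch version; any supported 2.12.x should do

libraryDependencies ++= Seq(
  // Dependencies the library expects your project to supply
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion,
  "za.co.absa"       %% s"spark-commons-spark$sparkMajorMinor" % "0.6.1",
  // The library itself
  "za.co.absa"       %% "spark-data-standardization" % "0.2.1"
)
```

If Spark is supplied by the cluster at runtime, the two Spark lines would typically also be scoped with `% Provided`.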

Usage in Maven

Artifacts are available on Maven Central for Scala 2.11, Scala 2.12 and Scala 2.13.

Spark and Scala compatibility

         Scala 2.11    Scala 2.12    Scala 2.13
Spark    2.4.7         3.2.1         3.2.1

How to Release

Please see this file for more details.

How to generate a code coverage report

sbt ++<scala.version> jacoco

Code coverage will be generated on path: