Spark Data Standardization Library
- Dataframe in
- Standardized Dataframe out
Usage
Needed Provided Dependencies
The library needs the following dependencies to be included in your project:
"org.apache.spark" %% "spark-core" % SPARK_VERSION,
"org.apache.spark" %% "spark-sql" % SPARK_VERSION,
"za.co.absa" %% s"spark-commons-spark${SPARK_MAJOR}.${SPARK_MINOR}" % "0.3.1",
Usage in SBT:
"za.co.absa" %% "spark-data-standardization" % VERSION
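The dependency lines above refer to the values SPARK_VERSION, SPARK_MAJOR, SPARK_MINOR and VERSION without defining them. A minimal build.sbt sketch of how they could fit together is shown below. The concrete Scala and Spark versions are illustrative assumptions taken from the compatibility table, `sparkParts` is a hypothetical helper name, and VERSION is a placeholder to be replaced with the release you actually need.

```scala
// build.sbt (sketch; the version values are assumptions, adjust to your environment)
ThisBuild / scalaVersion := "2.12.15"      // assumed Scala 2.12 patch release

val SPARK_VERSION = "3.2.1"                // assumed: the Spark version you build against
val sparkParts    = SPARK_VERSION.split('.')  // hypothetical helper: "3.2.1" -> Array("3", "2", "1")
val SPARK_MAJOR   = sparkParts(0)          // "3"
val SPARK_MINOR   = sparkParts(1)          // "2"

val VERSION = "x.y.z"                      // placeholder: the library release you want

libraryDependencies ++= Seq(
  // Dependencies the library expects your project to supply.
  // Append `% Provided` to the Spark entries if your cluster ships Spark itself.
  "org.apache.spark" %% "spark-core" % SPARK_VERSION,
  "org.apache.spark" %% "spark-sql"  % SPARK_VERSION,
  "za.co.absa"       %% s"spark-commons-spark${SPARK_MAJOR}.${SPARK_MINOR}" % "0.3.1",
  // The library itself
  "za.co.absa"       %% "spark-data-standardization" % VERSION
)
```

Deriving SPARK_MAJOR and SPARK_MINOR from SPARK_VERSION keeps the spark-commons artifact name in step with the Spark version, so only one value needs to change when switching Spark releases.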
Usage in Maven:
For Scala 2.11:
<dependency>
    <groupId>za.co.absa</groupId>
    <artifactId>spark-data-standardization_2.11</artifactId>
    <version>${latest_version}</version>
</dependency>
For Scala 2.12:
<dependency>
    <groupId>za.co.absa</groupId>
    <artifactId>spark-data-standardization_2.12</artifactId>
    <version>${latest_version}</version>
</dependency>
Spark and Scala compatibility
        Scala 2.11   Scala 2.12
Spark   2.4.X        3.2.1
How to Release
Please see this file for more details.
How to generate a code coverage report
sbt jacoco
The code coverage report will be generated at:
{project-root}/target/scala-{scala_version}/jacoco/report/html