absaoss / spark-data-standardization 0.3.0

License: Apache License 2.0

A library for Spark that helps to standardize any input data (DataFrame) to adhere to the provided schema.

Scala versions: 2.13 2.12 2.11

Spark Data Standardization Library


  • DataFrame in
  • Standardized DataFrame out
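To make the "DataFrame in, standardized DataFrame out" idea concrete, here is a minimal, Spark-free model of conforming raw records to a target schema. It is a sketch under stated assumptions, not the library's API or algorithm: `StandardizationModel`, `Field`, `FieldType`, `StandardizedRow` and `standardizeRow` are all hypothetical names, rows are modelled as `Map[String, String]`, and it is assumed that values failing conversion become `None` and are reported as error messages.

```scala
import scala.util.Try

// Hypothetical, Spark-free model of "DataFrame in, standardized DataFrame out".
// None of these names are part of the spark-data-standardization API.
object StandardizationModel {
  // Simplified stand-ins for column types in a target schema.
  sealed trait FieldType
  case object IntType extends FieldType
  case object DoubleType extends FieldType
  case object StringType extends FieldType

  final case class Field(name: String, dataType: FieldType)

  // Typed values keyed by field name, plus messages for values that failed conversion.
  final case class StandardizedRow(values: Map[String, Option[Any]], errors: List[String])

  // Attempt to convert a raw string into the requested type; None on failure.
  private def convert(raw: String, dataType: FieldType): Option[Any] = dataType match {
    case IntType    => Try(raw.trim.toInt).toOption
    case DoubleType => Try(raw.trim.toDouble).toOption
    case StringType => Some(raw)
  }

  // Conform one raw record to the schema: absent fields become None without an error,
  // unparseable fields become None and add an error message.
  def standardizeRow(raw: Map[String, String], schema: Seq[Field]): StandardizedRow = {
    val results: Seq[(String, Option[Any], Option[String])] = schema.map { field =>
      raw.get(field.name) match {
        case None => (field.name, None, None)
        case Some(r) =>
          convert(r, field.dataType) match {
            case None  => (field.name, None, Some(s"${field.name}: cannot convert '$r' to ${field.dataType}"))
            case value => (field.name, value, None)
          }
      }
    }
    StandardizedRow(
      results.map { case (name, value, _) => name -> value }.toMap,
      results.flatMap(_._3).toList
    )
  }
}
```

For example, standardizing `Map("id" -> "42", "score" -> "oops")` against an `IntType` field `id` and a `DoubleType` field `score` yields `Some(42)` for `id`, `None` for `score`, and one error message for `score`. The real library operates on Spark DataFrames and its error handling may differ.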

Usage

Required Provided Dependencies

The library requires the following dependencies, which your project must provide:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % SPARK_VERSION % Provided,
  "org.apache.spark" %% "spark-sql" % SPARK_VERSION % Provided,
  "za.co.absa" %% s"spark-commons-spark${SPARK_MAJOR}.${SPARK_MINOR}" % "0.6.3" % Provided
)
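The `spark-commons` artifact name embeds `SPARK_MAJOR` and `SPARK_MINOR`, the first two components of your Spark version. A small helper like the one below can derive that name from a full version string; `SparkCommonsCoordinates` and `sparkCommonsArtifact` are hypothetical names introduced for illustration only, not part of spark-commons or this library.

```scala
// Hypothetical helper, not part of spark-commons or this library: derives the
// spark-commons artifact name from a full Spark version string.
object SparkCommonsCoordinates {
  def sparkCommonsArtifact(sparkVersion: String): String =
    sparkVersion.split('.') match {
      // SPARK_MAJOR and SPARK_MINOR are the first two version components.
      case Array(major, minor, _*) => s"spark-commons-spark$major.$minor"
      case _ => throw new IllegalArgumentException(s"Unexpected Spark version: $sparkVersion")
    }
}
```

For Spark 3.5.1 this yields `spark-commons-spark3.5`, which `%%` then suffixes with your Scala binary version when resolving the dependency.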

Usage in sbt:

libraryDependencies += "za.co.absa" %% "spark-data-standardization" % VERSION

Usage in Maven:

Scala 2.12:

<dependency>
   <groupId>za.co.absa</groupId>
   <artifactId>spark-data-standardization_2.12</artifactId>
   <version>${latest_version}</version>
</dependency>

Scala 2.13:

<dependency>
   <groupId>za.co.absa</groupId>
   <artifactId>spark-data-standardization_2.13</artifactId>
   <version>${latest_version}</version>
</dependency>
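The `_2.12` and `_2.13` suffixes in the Maven `artifactId` are the Scala binary version, which sbt's `%%` operator appends automatically. The sketch below is a simplified model of that naming rule for Scala 2.x only (Scala 3 uses a `_3` suffix instead); `ArtifactNaming`, `scalaBinaryVersion` and `mavenArtifactId` are hypothetical names, not sbt APIs.

```scala
// Simplified model of how sbt's %% maps a module name to a Maven artifactId.
// Covers Scala 2.x, where the binary version is the major.minor prefix.
object ArtifactNaming {
  def scalaBinaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  def mavenArtifactId(name: String, scalaVersion: String): String =
    s"${name}_${scalaBinaryVersion(scalaVersion)}"
}
```

With `scalaVersion := "2.13.12"`, the sbt line above therefore resolves the same artifact as the Scala 2.13 Maven snippet.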

Spark and Scala compatibility

          Scala 2.12   Scala 2.13
Spark     3.5.x        3.5.x
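If a build needs to guard against untested combinations, the table can be encoded as data. This is a hypothetical sketch (`Compatibility` and `isSupported` are illustrative names, not part of the library); it returns `false` for any combination the table does not list, which does not necessarily mean that combination fails.

```scala
// Hypothetical helper encoding the compatibility table: Spark 3.5.x with Scala 2.12 or 2.13.
object Compatibility {
  // (Spark major.minor, Scala binary version) pairs listed in the table.
  private val supported: Set[(String, String)] = Set(("3.5", "2.12"), ("3.5", "2.13"))

  private def majorMinor(version: String): String =
    version.split('.').take(2).mkString(".")

  // True only for combinations explicitly listed in the table.
  def isSupported(sparkVersion: String, scalaVersion: String): Boolean =
    supported.contains((majorMinor(sparkVersion), majorMinor(scalaVersion)))
}
```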

How to Release

Please see this file for more details.

How to Generate a Code Coverage Report

sbt ++<scala.version> jacoco

The code coverage report is generated at:

{project-root}/target/scala-{scala_version}/jacoco/report/html
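Note that `++` takes a full Scala version (for example `2.12.18`), while sbt names the target directory after the Scala binary version (`scala-2.12`). The hypothetical helper below (`CoverageReport`, `command` and `reportPath` are illustrative names, not part of the build) shows how the two placeholders relate for Scala 2.x.

```scala
// Hypothetical helper relating the coverage command to the report location (Scala 2.x).
object CoverageReport {
  // The sbt invocation from the section above, with a full Scala version.
  def command(scalaVersion: String): String = s"sbt ++$scalaVersion jacoco"

  // sbt's target directory uses the Scala binary version, e.g. scala-2.12.
  def reportPath(projectRoot: String, scalaVersion: String): String = {
    val binary = scalaVersion.split('.').take(2).mkString(".")
    s"$projectRoot/target/scala-$binary/jacoco/report/html"
  }
}
```

For instance, running `sbt ++2.12.18 jacoco` from the project root should place the HTML report under `target/scala-2.12/jacoco/report/html`.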