tupol / spark-tools

Executable Apache Spark Tools: Format Converter & SQL Processor


Spark Tools


Description

This project contains a few basic runnable tools that help with common tasks in a Spark-based project.

The main tools available:

  • FormatConverter: converts any supported file format into a different file format, with partitioning support.
  • SimpleSqlProcessor: applies a given SQL query to the input files, which are mapped into tables.
  • StreamingFormatConverter: converts any supported data stream format into a different data stream format, with partitioning support.
  • SimpleFileStreamingSqlProcessor: applies a given SQL query to the input file streams, which are mapped into file output streams.
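To make the batch tools more concrete, here is a minimal sketch of the underlying idea written against the plain Spark API. It is not the implementation of FormatConverter or SimpleSqlProcessor and does not use their configuration; the object name, table name, paths and sample data are all assumptions made for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Illustrative only: plain Spark code, not the spark-tools API.
object SqlOverFilesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-over-files-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input and output locations in a temporary directory.
    val dir        = Files.createTempDirectory("spark-tools-sketch").toString
    val inputPath  = s"$dir/orders.json"
    val outputPath = s"$dir/totals.parquet"

    // Sample data written as JSON, standing in for an existing input file.
    Seq(("a", 10), ("b", 5), ("a", 7))
      .toDF("customer", "amount")
      .write.json(inputPath)

    // Map the input file to a table, then apply a SQL query to it.
    spark.read.json(inputPath).createOrReplaceTempView("orders")
    val totals = spark.sql(
      "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer")

    // Write the result in a different format, partitioned by a column.
    totals.write.partitionBy("customer").parquet(outputPath)

    spark.read.parquet(outputPath).show()
    spark.stop()
  }
}
```

The sketch reads JSON, exposes it as a table, runs a query and writes partitioned Parquet, which is roughly the combination of format conversion and SQL processing described above.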

This project also aims to create and encourage a friendly yet professional environment where developers help each other, so please do not be shy and join through gitter, twitter, issue reports or pull requests.

Prerequisites

  • Java 8 or higher (required by Apache Spark 2.3.X)
  • Scala 2.11 or 2.12
  • Apache Spark 2.3.X or higher

Getting Spark Tools

Spark Tools is published to Maven Central and Spark Packages, where the latest artifacts can be found.

  • Group id / organization: org.tupol
  • Artifact id / name: spark-tools
  • Latest version is 0.3.0

To use Spark Tools with SBT, add a dependency on the latest version to your sbt build definition file:

libraryDependencies += "org.tupol" %% "spark-tools" % "0.3.0"
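For context, a slightly fuller build definition might look like the sketch below. The Scala and Spark versions shown are assumptions chosen to match the prerequisites, not values prescribed by this project. The `%%` operator appends the Scala binary version to the artifact name, so with Scala 2.11 the dependency resolves to `spark-tools_2.11`.

```scala
// build.sbt (illustrative sketch; versions other than spark-tools 0.3.0 are assumptions)
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Spark is usually supplied by the cluster at runtime, hence the Provided scope.
  "org.apache.spark" %% "spark-sql"   % "2.3.2" % Provided,
  // With scalaVersion 2.11.x this is equivalent to
  // "org.tupol" % "spark-tools_2.11" % "0.3.0".
  "org.tupol"        %% "spark-tools" % "0.3.0"
)
```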

To include this package in your Spark applications, pass it to spark-shell or spark-submit with the --packages option:

$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.11:0.3.0

What's new?

0.4.0-SNAPSHOT

  • Added StreamingFormatConverter
  • Added FileStreamingSqlProcessor, SimpleFileStreamingSqlProcessor
  • Bumped spark-utils dependency to 0.4.1

0.3.0

  • Package processors was renamed to tools
  • SqlProcessor.registerSqlFunctions now takes implicit parameters: the Spark session and the application context
  • Added StreamingFormatConverter
  • Added FileStreamingSqlProcessor, SimpleFileStreamingSqlProcessor

For previous versions, please consult the release notes.

License

This code is open source software licensed under the MIT License.