
A type class for data of all sizes.


data-tc


A unifying typeclass describing collections and higher-order data transformation and manipulation actions common to a wide variety of data processing tasks. Inspired by the Scala collections API.

Why Use data-tc?

Write an algorithm once against the Data type class and watch it work on every container that has a Data instance! Data describes higher-order functions that manipulate and transform generic collections. It uses the typeclass design pattern for ad-hoc polymorphism, in contrast to the inheritance-based subtype polymorphism familiar from nearly every OO language. Typeclasses offer duck-typing-like developer productivity with the safety of strong static type checking, which makes them a powerful and flexible mechanism for describing generic behaviors.
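The following is a minimal sketch of the typeclass pattern in plain Scala, included to make the contrast with inheritance concrete. MiniData, Generic, and sumOfSquares are hypothetical names made up for this illustration. They are not part of data-tc and are far simpler than datatc.Data. List and Vector gain the behavior through separately defined implicit instances instead of a shared supertype, and the generic function is written once against the type class.

```scala
// Hypothetical, simplified type class for illustration only (NOT datatc.Data).
trait MiniData[D[_]] {
  def map[A, B](data: D[A])(f: A => B): D[B]
  def foldLeft[A, B](data: D[A])(zero: B)(op: (B, A) => B): B
}

object MiniData {
  // Instances live in the companion object, so they are found implicitly.
  // List and Vector are not modified and share no new supertype.
  implicit val listData: MiniData[List] = new MiniData[List] {
    def map[A, B](data: List[A])(f: A => B): List[B] = data.map(f)
    def foldLeft[A, B](data: List[A])(zero: B)(op: (B, A) => B): B =
      data.foldLeft(zero)(op)
  }

  implicit val vectorData: MiniData[Vector] = new MiniData[Vector] {
    def map[A, B](data: Vector[A])(f: A => B): Vector[B] = data.map(f)
    def foldLeft[A, B](data: Vector[A])(zero: B)(op: (B, A) => B): B =
      data.foldLeft(zero)(op)
  }
}

object Generic {
  // Written once against the type class; works for any D[_] with evidence.
  def sumOfSquares[D[_]: MiniData](data: D[Int]): Int = {
    val ev = implicitly[MiniData[D]]
    ev.foldLeft(ev.map(data)(x => x * x))(0)(_ + _)
  }
}
```

Both Generic.sumOfSquares(List(1, 2, 3)) and Generic.sumOfSquares(Vector(1, 2, 3)) evaluate to 14. Supporting another container means writing one more instance, and neither the container nor sumOfSquares has to change. The same idea lets a single algorithm written against Data run on Scala collections, Spark RDDs, and Flink DataSets.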

Implementations of the Data typeclass for concrete types include Scala collections, Spark RDDs, and Flink DataSets.

Installation

Add the following to your build.sbt file:

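libraryDependencies ++= Seq("io.malcolmgreaves" %% "data-tc-{scala,spark,flink,extra}" % "X.Y.Z")

Replace {scala,spark,flink,extra} with the sub-project you need (for example, data-tc-scala), and X.Y.Z with the most recent version published on Sonatype.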

Examples

We strive for high code coverage. Check out all of the tests.

For a small use case, check out Sum, which shows how to implement summation over any container that has a Data type class instance and Numeric elements:

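import scala.reflect.ClassTag

object Sum extends Serializable {

  import datatc.Data
  // Brings implicits into scope for things like `map`, `flatMap`, etc.
  // as object oriented style infix notation. These are still using
  // the type class method definitions!
  import Data.ops._

  // Sums the elements of any container D that has Data type class evidence.
  // aggregate takes one function to fold elements into partial sums and
  // another to merge partial sums; for addition, both are the same function.
  def apply[N: Numeric: ClassTag, D[_]: Data](data: D[N]): N = {
    val add = implicitly[Numeric[N]].plus _
    data.aggregate(implicitly[Numeric[N]].zero)(add, add)
  }

  // Convenience overload: sums one or more numbers passed directly.
  def apply[N: Numeric](first: N, numbers: N*): N = {
    val add = implicitly[Numeric[N]].plus _
    numbers.foldLeft(first)(add)
  }
}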

With this Sum object, we can sum the elements of a Traversable instance, or a few numbers passed directly:

// Brings into scope all Data typeclass evidence for Scala collections.
import datatc.scala._
Sum(Traversable(1.0, 2.0, 3.0)) == 6.0
Sum(1, 2, 3) == 6
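Sum passes add twice because aggregate takes two functions: one that folds elements into a partial result and one that merges partial results, as with aggregate in the Scala collections and Spark RDD APIs. The sketch below simulates that two-step evaluation in plain Scala, modeling each partition as an inner Seq. AggregateSketch and aggregateOverPartitions are hypothetical names, and this illustrates the general contract rather than data-tc's implementation.

```scala
object AggregateSketch {
  // Hypothetical helper: each inner Seq plays the role of one partition
  // (as in a Spark RDD). Illustration only, not data-tc's implementation.
  def aggregateOverPartitions[A, B](partitions: Seq[Seq[A]])(zero: B)(
      seqOp: (B, A) => B,
      combOp: (B, B) => B
  ): B = {
    // Step 1: fold each partition independently, starting from zero.
    val partials: Seq[B] = partitions.map(_.foldLeft(zero)(seqOp))
    // Step 2: merge the per-partition results with combOp.
    partials.foldLeft(zero)(combOp)
  }

  def sum[N](partitions: Seq[Seq[N]])(implicit num: Numeric[N]): N = {
    val add: (N, N) => N = num.plus
    // Element type and result type coincide for a sum, so the same
    // function serves as both seqOp and combOp, mirroring Sum above.
    aggregateOverPartitions(partitions)(num.zero)(add, add)
  }
}
```

Because zero is used at the start of every partition and again when merging, it should be an identity for both functions (0 for addition). Otherwise the result would depend on how the data happens to be partitioned.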

Repository Structure

This project is organized into several sbt sub-projects:

  • data-tc-scala

    • Typeclass definition as datatc.Data
    • Implementations using Scala collections under datatc.scala._
    • Only depends on Scala standard library
  • data-tc-spark

    • Implementation using Spark RDD under datatc.spark._
  • data-tc-flink

    • Implementation using Flink DataSet under datatc.flink._
  • data-tc-extra

    • Additional functionality using the Data typeclass with 3rd party libraries.

Contributing

We <3 contributions! We want this code to be useful and used! We use pull requests to review and discuss changes, fixes, and improvements.

License

Copyright 2015-2018 Malcolm Greaves

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.