elyast / wookie

Library for building data products


Wookie - building data products

  • Reuse components using Sparkles - data processing monads
  • Map over generic lists of functions
  • Provides base classes for writing:
    • CLI applications
    • Spark / Spark Streaming jobs
  • Provides collector API and sample collectors
  • Spark SQL Server - automatically registers tables found in a given directory (supports JSON, Parquet, CSV, JDBC and Cassandra)
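
To illustrate what "data processing monads" can mean in practice, here is a minimal sketch of a reader-style monad that lets small processing steps be composed and reused. It is not Wookie's actual Sparkle type: `Step`, `Env`, `readLines`, `tokenize` and `countWords` are hypothetical names made up for this example, and `Env` merely stands in for whatever runtime context (for example a Spark context) a real job would receive.

```scala
// Hypothetical sketch, NOT Wookie's real API: a tiny reader-style monad.
object SparkleSketch {
  // Stand-in for the environment a real job would be given (assumption).
  final case class Env(lines: Seq[String])

  // A computation that needs an Env in order to produce an A.
  final case class Step[A](run: Env => A) {
    // Transform the result without touching the environment.
    def map[B](f: A => B): Step[B] = Step(env => f(run(env)))
    // Chain a second step that depends on the first step's result;
    // both steps see the same environment.
    def flatMap[B](f: A => Step[B]): Step[B] = Step(env => f(run(env)).run(env))
  }

  // Small reusable components.
  val readLines: Step[Seq[String]] = Step(_.lines)

  def tokenize(lines: Seq[String]): Step[Seq[String]] =
    Step(_ => lines.flatMap(_.split("\\s+").toSeq).filter(_.nonEmpty))

  def countWords(words: Seq[String]): Step[Map[String, Int]] =
    Step(_ => words.groupBy(identity).map { case (w, ws) => w -> ws.size })

  // Components composed with a for-comprehension; nothing runs until
  // `run` is called with a concrete Env.
  val wordCount: Step[Map[String, Int]] =
    for {
      lines  <- readLines
      words  <- tokenize(lines)
      counts <- countWords(words)
    } yield counts
}
```

Calling `SparkleSketch.wordCount.run(SparkleSketch.Env(Seq("a b a", "b c")))` evaluates the whole pipeline against that input. Because each step is only a description until it is run, the same components can be reused across different jobs or exercised in tests with a small in-memory environment.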

Modules

  • app-api - base classes/objects that help with writing basic command-line applications
  • collector-api - base classes to write collectors
  • spark-api - utility classes/objects for writing Spark / Spark Streaming applications
  • spark-api-kafka - utility classes/objects for writing Spark Streaming applications using Kafka input streams
  • sqlserver - Spark SQL server that automatically registers and refreshes tables found under a given root directory; supports local file formats such as JSON, CSV and Parquet as well as remote sources such as Cassandra or Elasticsearch
  • examples/
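
The directory-based table registration described for sqlserver can be pictured roughly as follows. This is a sketch under assumptions, not the module's implementation: `TableDiscoverySketch`, `TableSpec` and `discover` are hypothetical names, the table name is assumed to be the file name without its extension, and only the local file formats (JSON, CSV, Parquet) are covered. Remote sources such as Cassandra would need separate configuration.

```scala
import java.nio.file.Path

// Hypothetical sketch, NOT the sqlserver module's real code.
object TableDiscoverySketch {
  // What would be needed to register one table (assumed shape).
  final case class TableSpec(name: String, format: String, location: String)

  // Local file formats named in the module description.
  private val knownFormats = Set("json", "parquet", "csv")

  // Scan the direct children of `root` and derive one TableSpec per entry
  // whose extension is a known format. Other entries are ignored.
  def discover(root: Path): List[TableSpec] = {
    // listFiles() returns null if root is not a readable directory.
    val entries = Option(root.toFile.listFiles()).map(_.toList).getOrElse(Nil)
    entries.flatMap { f =>
      val fileName = f.getName
      val dot = fileName.lastIndexOf('.')
      if (dot <= 0) None // no extension, or a hidden file such as ".tmp"
      else {
        val ext = fileName.substring(dot + 1).toLowerCase
        if (knownFormats(ext)) Some(TableSpec(fileName.substring(0, dot), ext, f.getPath))
        else None
      }
    }.sortBy(_.name)
  }
}
```

In a Spark application, each discovered `TableSpec` could then be registered with something like `spark.read.format(spec.format).load(spec.location).createOrReplaceTempView(spec.name)`, and refreshing would amount to re-running the scan periodically. How the actual module schedules refreshes is not stated here.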

Dependencies