catalystcode / streaming-instagram

A library for reading social data from Instagram using Spark Streaming.

Usage example

Run a demo via:

# set up all the requisite environment variables
export INSTAGRAM_AUTH_TOKEN="..."

# compile Scala, run tests, build the fat jar
sbt assembly

# run locally
java -cp target/scala-2.11/streaming-instagram-assembly-0.0.7.jar InstagramDemo standalone

# run on Spark
spark-submit --class InstagramDemo --master local[2] target/scala-2.11/streaming-instagram-assembly-0.0.7.jar spark

How does it work?

Instagram doesn't expose a firehose API, so we resort to polling. The InstagramReceiver polls the Instagram API every few seconds and pushes any new images into Spark Streaming for further processing.
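
The polling idea can be illustrated with a minimal sketch in plain Scala. It is not the library's actual InstagramReceiver: the `Image` and `Poller` names, the `fetch` and `emit` parameters, and the choice to de-duplicate by image id are all assumptions made for illustration. `fetch` stands in for the Instagram API call and `emit` for the hand-off to Spark Streaming (in a Spark `Receiver`, that would be a call to `store`).

```scala
// Hypothetical model of one Instagram media item; the field names are assumptions.
final case class Image(id: String, url: String)

// Polls `fetch` and forwards only images whose id has not been seen before.
// `fetch` and `emit` are placeholders for the API call and the Spark hand-off.
final class Poller(fetch: () => Seq[Image], emit: Image => Unit) {
  // Ids of every image already forwarded (assumed de-duplication strategy).
  private val seen = scala.collection.mutable.Set.empty[String]

  // Runs a single polling round and returns the number of new images emitted.
  def pollOnce(): Int = {
    // Set.add returns true only when the id was not present yet.
    val fresh = fetch().filter(image => seen.add(image.id))
    fresh.foreach(emit)
    fresh.size
  }

  // Repeats pollOnce `rounds` times, sleeping `intervalMillis` between rounds.
  def run(rounds: Int, intervalMillis: Long): Unit =
    (1 to rounds).foreach { round =>
      pollOnce()
      if (round < rounds) Thread.sleep(intervalMillis)
    }
}
```

In this sketch the set of seen ids grows without bound; a long-running receiver would more likely remember only the most recent id or timestamp, but the source does not say which approach the library takes.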

Currently, the following ways to read images are supported:

Release process

  1. Configure your credentials via the SONATYPE_USER and SONATYPE_PASSWORD environment variables.
  2. Update version.sbt
  3. Enter the SBT shell: sbt
  4. Run sonatypeOpen "enter staging description here"
  5. Run publishSigned
  6. Run sonatypeRelease