A library for reading public search results from Reddit using Spark Streaming.
Before you start using the Reddit stream for Spark, you will need to make sure you have registered a Reddit app in your account. If you have not done so, you can follow these steps:
- Log in and click on the "preferences" link on the top right.
- Click the "apps" tab.
- Click the "Create an app" button and fill in the form.
Run a demo via:
    # set up all the requisite environment variables
    export REDDIT_APPLICATION_ID="..."
    export REDDIT_APPLICATION_TOKEN="..."

    # compile scala, run tests, build fat jar
    sbt assembly

    # run locally
    java -cp target/scala-2.11/streaming-reddit-assembly-0.0.1.jar RedditDemo standalone

    # run on spark
    spark-submit --class RedditDemo --master local target/scala-2.11/streaming-reddit-assembly-0.0.1.jar spark
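The demo reads its credentials from the two environment variables exported above. If you want your own application to fail fast when they are missing, a check along the following lines may help. This is a minimal sketch, not code from this library: the `RedditCredentials` object, the `Credentials` case class, and the `fromEnv` helper are hypothetical names chosen for illustration. Only the two variable names come from the demo instructions.

```scala
// Hypothetical helper, not part of streaming-reddit: validates that the
// environment variables used by the demo are present and non-empty.
object RedditCredentials {
  final case class Credentials(applicationId: String, applicationToken: String)

  private val IdVar = "REDDIT_APPLICATION_ID"
  private val TokenVar = "REDDIT_APPLICATION_TOKEN"

  // The environment is passed in as a Map (defaulting to the real process
  // environment) so the check can be exercised without touching the shell.
  def fromEnv(env: Map[String, String] = sys.env): Either[String, Credentials] = {
    val missing = Seq(IdVar, TokenVar).filterNot(name => env.get(name).exists(_.trim.nonEmpty))
    if (missing.nonEmpty) Left(s"Missing environment variables: ${missing.mkString(", ")}")
    else Right(Credentials(env(IdVar), env(TokenVar)))
  }
}
```

Calling `RedditCredentials.fromEnv()` at startup returns a `Left` with a readable message when a variable is unset, which you can print before exiting.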
Add the library to your own project by declaring this dependency in your build.sbt:
    libraryDependencies ++= Seq(
      //...
      "com.github.catalystcode" %% "streaming-reddit" % "0.0.1",
      //...
    )
How does it work?
Currently, this streaming library polls Reddit's /r/all/search.json endpoint at an interval that conforms to Reddit's API guidelines (http://github.com/reddit/reddit/wiki/API). The plan is to migrate it in the near future to Reddit live, using its websockets support.
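The polling approach described above can be illustrated with a small, dependency-free sketch. It is not this library's implementation: `PollingSketch`, `searchUrl`, and `startPolling` are hypothetical names, and the base URL, the `OR`-joined query, and the `sort` and `limit` parameters are assumptions about how a search request might be built. The caller supplies the polling interval, so respecting Reddit's API guidelines remains the caller's responsibility.

```scala
import java.net.URLEncoder
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import scala.concurrent.duration.FiniteDuration
import scala.util.control.NonFatal

// Hypothetical sketch of interval-based polling; not the library's own code.
object PollingSketch {

  // Builds a search URL for the /r/all/search.json endpoint.
  // Assumptions: keywords are combined with OR, results are sorted by "new".
  def searchUrl(keywords: Seq[String],
                limit: Int = 25,
                baseUrl: String = "https://www.reddit.com"): String = {
    val query = URLEncoder.encode(keywords.mkString(" OR "), "UTF-8")
    s"$baseUrl/r/all/search.json?q=$query&sort=new&limit=$limit"
  }

  // Runs `poll` immediately and then again after each `interval` has elapsed
  // since the previous run finished. The interval must be greater than zero.
  // Returns the scheduler so the caller can shut it down.
  def startPolling(interval: FiniteDuration)(poll: () => Unit): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task = new Runnable {
      override def run(): Unit =
        // An uncaught exception would cancel all future runs, so log and continue.
        try poll()
        catch { case NonFatal(e) => System.err.println(s"poll failed: ${e.getMessage}") }
    }
    scheduler.scheduleWithFixedDelay(task, 0L, interval.toMillis, TimeUnit.MILLISECONDS)
    scheduler
  }
}
```

With `import scala.concurrent.duration._` in scope, a call such as `PollingSketch.startPolling(30.seconds)(() => println(PollingSketch.searchUrl(Seq("spark"))))` would print the request URL every 30 seconds until `shutdown()` is called on the returned scheduler. The 30-second interval is only an illustrative value. A fixed delay, rather than a fixed rate, is used so that a slow request never causes back-to-back polls.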
- Configure your credentials via the
    sbt sonatypeOpen "enter staging description here"