crealytics / spark-google-adwords   0.9.2

Apache License 2.0

A library for querying Google AdWords data with Apache Spark, for Spark SQL and DataFrames

Scala versions: 2.11, 2.10

Spark Google AdWords Library

Join the chat at https://gitter.im/spark-google-adwords/Lobby

Requirements

This library is tested with Spark 2.1+. It may work on older versions, but those are not supported.

Linking

You can link against this library in your program at the following coordinates:

Scala 2.11

groupId: com.crealytics
artifactId: spark-google-adwords_2.11
version: 0.9.2
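If you manage dependencies with sbt, the coordinates above can be expressed as a single libraryDependencies entry. This is a sketch that assumes an sbt project whose scalaVersion is 2.11.x, so that the %% operator resolves to the spark-google-adwords_2.11 artifact:

```scala
// build.sbt (assumes scalaVersion := "2.11.x")
// %% appends the Scala binary version, resolving to spark-google-adwords_2.11
libraryDependencies += "com.crealytics" %% "spark-google-adwords" % "0.9.2"
```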

Using with Spark shell

This package can be added to Spark using the --packages command-line option. For example, to include it when starting the Spark shell:

Spark compiled with Scala 2.11

$SPARK_HOME/bin/spark-shell --packages com.crealytics:spark-google-adwords_2.11:0.9.2

Features

This package lets you query Google AdWords reports as Spark DataFrames. The data source accepts the following options (see the Google AdWords developer docs for details):

  • clientId, clientSecret: an OAuth2 client ID and secret, which you can generate in the Google API Console.
  • developerToken: your AdWords API developer token, which identifies your application's API activity.
  • refreshToken: an OAuth2 refresh token, which you can generate with AdWordsAuthHelper as shown in the Scala API section. It represents the user's consent to grant access to a specific set of APIs and is used to obtain short-lived access tokens, which are what actually authenticate calls to the AdWords API. See the official documentation for more information.
  • reportType: the report type you want to query. Use the same CAPITALS_WITH_UNDERSCORE spelling as in the official report type listing (e.g. SHOPPING_PERFORMANCE_REPORT).
  • clientCustomerId: the ID of the AdWords account whose data you want to query.
  • userAgent (optional, default = Spark): an arbitrary user-agent string sent with each API request.
  • during (optional, default = LAST_30_DAYS): the time range to query. Check the official documentation for the predefined values, or pass StartDate,EndDate for a custom date range.
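Because six of these options are required and a custom date range is just a string, it can help to check the options before calling load(). The following is a minimal, self-contained sketch: AdWordsOptions and its methods are hypothetical helpers, not part of this library, and formatting the custom range as two yyyyMMdd dates separated by a comma is an assumption based on the AdWords DURING syntax.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helpers (NOT part of spark-google-adwords) for preparing reader options.
object AdWordsOptions {
  // Options without defaults; userAgent and during are optional.
  val required: Seq[String] = Seq(
    "clientId", "clientSecret", "developerToken",
    "refreshToken", "reportType", "clientCustomerId")

  // Returns the required option names that are absent or blank, in declaration order.
  def missing(opts: Map[String, String]): Seq[String] =
    required.filter(k => opts.get(k).forall(_.trim.isEmpty))

  // Builds a custom "StartDate,EndDate" value.
  // Assumption: AdWords expects both dates as yyyyMMdd (BASIC_ISO_DATE).
  def customRange(start: LocalDate, end: LocalDate): String = {
    require(!end.isBefore(start), s"end date $end is before start date $start")
    val fmt = DateTimeFormatter.BASIC_ISO_DATE
    s"${start.format(fmt)},${end.format(fmt)}"
  }

  // Checks the CAPITALS_WITH_UNDERSCORE spelling used for report types.
  def looksLikeReportType(s: String): Boolean =
    s.matches("[A-Z0-9]+(_[A-Z0-9]+)*")
}
```

A Map that passes these checks could then be handed to the reader in one call with DataFrameReader.options(...) instead of a chain of .option calls.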

Scala API

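Spark 1.4+ (the DataFrameReader API used below exists since Spark 1.4; the library itself is tested with Spark 2.1+):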

Generate a refresh token (if you don't have one yet):

import com.crealytics.google.adwords._
val clientId = "123456789123-yourclientid.apps.googleusercontent.com"
val clientSecret = "yourclientsecret-1"
val authHelper = new AdWordsAuthHelper(clientId, clientSecret)

// Open the URL printed by the next line in a browser, grant access, and copy the authentication code it displays
println(authHelper.authorizationUrl)

// Paste the authentication code from the browser window here to get the refresh token
println(authHelper.getRefreshToken("TheAuthenticationTokenFromTheBrowser"))

Create a DataFrame from an AdWords report:

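import org.apache.spark.sql.SQLContext

// clientId and clientSecret are the values defined in the previous snippet.
// On Spark 2.x, spark.read on the shell's SparkSession works the same way.
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
    .format("com.crealytics.google.adwords")
    .option("clientId", clientId)
    .option("clientSecret", clientSecret)
    .option("developerToken", "YourDeveloperToken")
    .option("refreshToken", "1/YourRefreshToken")   // e.g. from authHelper.getRefreshToken
    .option("reportType", "SHOPPING_PERFORMANCE_REPORT")
    .option("clientCustomerId", "1234567890")
    .option("userAgent", "Spark")                    // optional, defaults to Spark
    .option("during", "LAST_30_DAYS")                // optional, defaults to LAST_30_DAYS
    .load()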

Building From Source

This library is built with sbt. To build a JAR file, run sbt assembly from the project root. The build is configured for Scala 2.11.