Spark Google AdWords Library
A library for querying Google AdWords data with Apache Spark, for Spark SQL and DataFrames.
Requirements
This library is tested with Spark 2.1+. It may work with older versions, but these are not supported.
Linking
You can link against this library in your program at the following coordinates:
Scala 2.11
groupId: com.crealytics
artifactId: spark-google-adwords_2.11
version: 0.9.2
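For sbt builds, the coordinates above could be declared as in the following sketch. It assumes the artifact can be resolved from a repository your build already uses; the explicit _2.11 suffix simply mirrors the artifactId listed above.

```scala
// build.sbt (sketch): groupId % artifactId % version, exactly as listed above
libraryDependencies += "com.crealytics" % "spark-google-adwords_2.11" % "0.9.2"
```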
Using with Spark shell
This package can be added to Spark using the --packages command line option. For example, to include it when starting the Spark shell:
Spark compiled with Scala 2.11
$SPARK_HOME/bin/spark-shell --packages com.crealytics:spark-google-adwords_2.11:0.9.2
Features
This package allows querying Google AdWords reports as Spark DataFrames. The API accepts several options (see the Google AdWords developer docs for details):
clientId, clientSecret: a client identifier and secret that you generate for your application.
developerToken: a token that identifies your API activity.
refreshToken: a token that you can generate using the AdWordsAuthHelper as shown below. This token represents the user's consent to grant access to a certain set of APIs. It is used to generate further, shorter-lived access tokens, which are what actually authenticate calls to the AdWords API. For more information, see the official documentation.
reportType: the report type you want to query. Use the same CAPITALS_WITH_UNDERSCORE spelling as in the official listing of report types.
clientCustomerId: the ID of the account for which you want to query data.
userAgent (optional, default = Spark): an arbitrary user agent that will be used when querying the API.
during (optional, default = LAST_30_DAYS): the time range for which you want to query data. Check the official documentation for allowed values, or use StartDate,EndDate for a custom date range.
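To keep these options in one place, a small helper along the following lines may be useful. This is only a sketch: AdWordsOptions, resolve and customRange are hypothetical names, not part of this library. It assumes that every option not marked optional above is required, and that custom date ranges use the YYYYMMDD date format of AdWords.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper (not part of the library): gathers reader options in one place.
object AdWordsOptions {
  // Options that the list above does not mark as optional (assumed to be required).
  val required: Seq[String] = Seq(
    "clientId", "clientSecret", "developerToken",
    "refreshToken", "reportType", "clientCustomerId")

  // Defaults as documented in the list above.
  val defaults: Map[String, String] = Map(
    "userAgent" -> "Spark",
    "during" -> "LAST_30_DAYS")

  // Assumption: custom ranges are written as YYYYMMDD,YYYYMMDD.
  private val adWordsDate = DateTimeFormatter.ofPattern("yyyyMMdd")

  // Builds a StartDate,EndDate value for the `during` option.
  def customRange(start: LocalDate, end: LocalDate): String = {
    require(!end.isBefore(start), "end date must not be before start date")
    s"${start.format(adWordsDate)},${end.format(adWordsDate)}"
  }

  // Left lists the missing required keys; Right holds the user's options merged over the defaults.
  def resolve(user: Map[String, String]): Either[Seq[String], Map[String, String]] = {
    val missing = required.filterNot(user.contains)
    if (missing.isEmpty) Right(defaults ++ user) else Left(missing)
  }
}
```

The resolved map could then be passed to the DataFrame reader in one call through options(...), which accepts a Map[String, String], instead of chaining individual option(...) calls.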
Scala API
Spark 2.1+ (the DataFrame reader API used here has existed since Spark 1.4):
Generate a refresh token (if you don't have one yet):
import com.crealytics.google.adwords._
val clientId = "123456789123-yourclientid.apps.googleusercontent.com"
val clientSecret = "yourclientsecret-1"
val authHelper = new AdWordsAuthHelper(clientId, clientSecret)
// The next line prints a URL. Open it in a browser and copy the authentication code that is displayed.
println(authHelper.authorizationUrl)
// Paste the authentication code from the browser window here to get the refresh token.
println(authHelper.getRefreshToken("TheAuthenticationTokenFromTheBrowser"))
Create a DataFrame from an AdWords report:
import org.apache.spark.sql.SQLContext
// `sc` is the SparkContext that the Spark shell provides.
val sqlContext = new SQLContext(sc)
// clientId and clientSecret are the values defined in the refresh token example above.
val df = sqlContext.read
  .format("com.crealytics.google.adwords")
  .option("clientId", clientId)
  .option("clientSecret", clientSecret)
  .option("developerToken", "YourDeveloperToken")
  .option("refreshToken", "1/YourRefreshToken")
  .option("reportType", "SHOPPING_PERFORMANCE_REPORT")
  .option("clientCustomerId", "1234567890")
  .option("userAgent", "Spark")        // optional, shown with its default value
  .option("during", "LAST_30_DAYS")    // optional, shown with its default value
  .load()
Building From Source
This library is built with SBT. To build a JAR file, run sbt assembly from the project root. The build configuration includes support for Scala 2.11.