A Spark toolkit, not exactly like jq but inspired by it
- Works on RDD[String] whose elements are JSON objects or JSON arrays
- JSON values map to Scala types:
  - Number -> Scala Int, Long, Double
  - String -> Scala String
  - Object -> Scala Map
  - Array -> Scala List
  - Boolean -> Scala Boolean
- Composite fields such as "map1.map2.intField" resolve to the types above
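For illustration, a record like the following would resolve to the Scala types listed above (a made-up sample, not taken from the library's docs):

```scala
// Hypothetical JSON line; field names are illustrative only
val line = """{"name": "abc", "age": 30, "score": 1.5, "tags": ["a", "b"], "meta": {"flag": true}}"""
// name -> String, age -> Int/Long, score -> Double,
// tags -> List, meta -> Map, composite "meta.flag" -> Boolean
```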
Add the dependency with sbt:

```scala
libraryDependencies += "com.magicsoho" %% "spark-jq" % "0.1.0"
```
or with Maven:

```xml
<dependency>
    <groupId>com.magicsoho</groupId>
    <artifactId>spark-jq_${your_scala_binary_version}</artifactId>
    <version>0.1.0</version>
</dependency>
```
First of all, import the RDD extensions:

```scala
import sjq.RDDLike._
```
- `rdd.parseJson`
  - parse a JSON string RDD into a JSONObject RDD
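A minimal sketch of the call above, assuming a local SparkContext; the sample JSON and field names are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import sjq.RDDLike._

val sc = new SparkContext(new SparkConf().setAppName("sjq-demo").setMaster("local[*]"))

// RDD[String] where each element is one JSON document
val rdd = sc.parallelize(Seq(
  """{"field1": 1, "field2": "a", "obj": {"nested": 2.5}}""",
  """{"field1": 3, "field2": "b", "obj": {"nested": 4.5}}"""
))

// Parse the JSON strings into a JSONObject RDD
val rddJson = rdd.parseJson
```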
- `rddJson.fields("field1", "field2")`
  - returns an RDD whose elements are List(field1Value, field2Value), typed as above
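Continuing the sketch (the field names come from the sample data, not the library):

```scala
// RDD of List(field1Value, field2Value), one list per record
val rddFields = rddJson.fields("field1", "field2")
rddFields.collect().foreach(println)   // e.g. List(1, a), List(3, b)
```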
- `rddFields(n)`
  - returns an RDD of element n of each list
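For example, selecting the first field from the lists produced above:

```scala
// RDD of just the first selected field ("field1") from each record
val firstField = rddFields(0)
```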
- `rddJson.key[T]("field1")` or `rdd.field(fieldFoo)`
  - returns RDD[T]
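A sketch of the typed accessor; choosing T = Int here is an assumption based on the sample data:

```scala
// RDD[T] of the "field1" values, here T = Int
val field1 = rddJson.key[Int]("field1")
```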
- `rddJson.jsonObject("objKey")`
  - returns a JSONObject RDD for the object under "objKey"
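Drilling into a nested object (the key names "obj" and "nested" come from the made-up records above):

```scala
// JSONObject RDD for the object stored under "obj"
val objRdd = rddJson.jsonObject("obj")
// which can then be queried like any other JSONObject RDD
val nested = objRdd.key[Double]("nested")
```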
- `rddField.[Int|Long|Double|Boolean|List[T]|Map[T1,T2]|JSONObject]`
  - maps an RDD[Any] into an RDD of the specified type
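A sketch of the conversion, assuming the untyped field RDD from earlier:

```scala
// firstField is an RDD[Any]; .Int maps it to RDD[Int]
val field1AsInt = firstField.Int
// .Long, .Double, .Boolean, etc. work the same way for the other supported types
```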
For the reduceByKey helpers, first import:

```scala
import sjq.Lambda._
```
- `addInt` | `addDouble`
  - for an RDD[(Key, (Int, Int))], use with reduceByKey(addInt); addDouble is the Double counterpart
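A sketch assuming addInt sums the paired counters element-wise per key, as described above; the key/value data is made up and the SparkContext is reused from the earlier sketch:

```scala
import sjq.Lambda._

// RDD[(Key, (Int, Int))]
val pairCounts = sc.parallelize(Seq(
  ("a", (1, 10)),
  ("a", (2, 20)),
  ("b", (3, 30))
))

// Sum both Int counters per key
val summed = pairCounts.reduceByKey(addInt)
summed.collect().foreach(println)   // e.g. (a,(3,30)), (b,(3,30))
```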
- `addTuple2`
  - for an RDD[(Key, ((AnyNumber, AnyNumber), (AnyNumber, AnyNumber)))], use with reduceByKey(addTuple2)
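Similarly for nested pairs (again a made-up sketch):

```scala
// RDD[(Key, ((Int, Int), (Int, Int)))]
val nestedPairs = sc.parallelize(Seq(
  ("a", ((1, 1), (2, 2))),
  ("a", ((3, 3), (4, 4)))
))

// Sum both inner pairs element-wise per key
val reduced = nestedPairs.reduceByKey(addTuple2)
```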
TODO:
- support regex fields
- support more formats: csv, xml, ...
- support other input sources: sql, kafka, flume, ...
- support other RDD reduce function utils
License: MIT