
Solr Connector For Almaren Framework

Version Matrix

Solr Connector


libraryDependencies += "com.github.music-of-the-ainur" %% "solr-almaren" % "0.2.9-2-4"

The Solr Connector is implemented on top of https://github.com/lucidworks/spark-solr and works only with Solr Cloud. For all the options available for the connector, see the spark-solr documentation.

spark-shell --master "local[*]" --packages "com.github.music-of-the-ainur:almaren-framework_2.11:0.2.8-$SPARK_VERSION,com.github.music-of-the-ainur:solr-almaren_2.11:0.2.7-2-4" --repositories https://repo.boundlessgeo.com/main/

Source and Target

Source

Parameters

Parameters               Description
-----------------------  ------------------------------------------------------------------------------
collection               collection name
ZookeeperHost (zkhost)   ZooKeeper connection string (localhost:9983)

options                  Description (Value)
-----------------------  ------------------------------------------------------------------------------
query                    Solr query that limits the rows loaded into Spark ("body_t:solr")
fields                   subset of fields to load ("id,author_s,favorited_b")
filters                  filters applied to the values in documents ("firstName:Sam,lastName:Powell")
rows                     number of rows fetched per request, i.e. the page size (100)
max_rows                 limits the result set to a maximum number of rows (5000)
request_handler          Solr request handler used for queries ("/export", "/select")
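
The options in the table above are passed as a plain map of strings. As a minimal sketch, the hypothetical `SolrSourceOptions` case class below (not part of solr-almaren) collects typed values and renders them into that map, assuming the connector expects a `Map[String, String]` keyed by the option names listed above:

```scala
// Hypothetical helper, not part of solr-almaren: gathers the documented
// source options and renders them as a Map[String, String].
final case class SolrSourceOptions(
  query: Option[String] = None,
  fields: Seq[String] = Nil,
  filters: Seq[String] = Nil,
  rows: Option[Int] = None,
  maxRows: Option[Int] = None,
  requestHandler: Option[String] = None
) {
  def toMap: Map[String, String] =
    Seq(
      query.map(q => "query" -> q),
      // List-valued options are comma-separated, as in the table above.
      if (fields.nonEmpty) Some("fields" -> fields.mkString(",")) else None,
      if (filters.nonEmpty) Some("filters" -> filters.mkString(",")) else None,
      rows.map(n => "rows" -> n.toString),
      maxRows.map(n => "max_rows" -> n.toString),
      requestHandler.map(h => "request_handler" -> h)
    ).flatten.toMap // unset options are simply left out
}

val sourceOptions: Map[String, String] = SolrSourceOptions(
  query = Some("body_t:solr"),
  fields = Seq("id", "author_s", "favorited_b"),
  rows = Some(100),
  maxRows = Some(5000)
).toMap
```

The resulting `sourceOptions` map could then be supplied as the third argument of `sourceSolr`, in place of an inline `Map(...)`.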

Example

import com.github.music.of.the.ainur.almaren.Almaren
import com.github.music.of.the.ainur.almaren.solr.Solr.SolrImplicit

val almaren = Almaren("App Name")

almaren.builder.sourceSolr("collection","zkHost1:2181,zkHost2:2181",Map("fields" -> "first_name,last_name","rows" -> "100"))

almaren.builder.targetSolr("collection","zkHost1:2181,zkHost2:2181",options)

Target

Parameters

Parameters               Description
-----------------------  ----------------------------------------------------------------
collection               collection name
ZookeeperHost (zkhost)   ZooKeeper connection string (localhost:9983)
SaveMode                 Spark save mode (SaveMode.ErrorIfExists)

options                  Description (Value)
-----------------------  ----------------------------------------------------------------
soft_commit_secs         soft commit interval in seconds (10)
commit_within            forces a commit within the specified time in milliseconds (5000)
batch_size               number of documents sent per HTTP call (1000)
gen_uniq_key             generates a unique key for each document (true)
solr_field_types         field types to use in Solr ("rating:string,title:text_en")
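
The target options are also plain strings, and `solr_field_types` uses a `name:type` list separated by commas. A minimal sketch of assembling them follows; `solrFieldTypes` is a hypothetical helper (not part of solr-almaren), and the values are the sample values from the table above:

```scala
// Hypothetical helper: renders (field, type) pairs in the
// "name:type,name:type" form shown for solr_field_types.
def solrFieldTypes(types: Seq[(String, String)]): String =
  types.map { case (field, solrType) => s"$field:$solrType" }.mkString(",")

// All option values are passed as strings.
val targetOptions: Map[String, String] = Map(
  "batch_size"       -> 1000.toString,
  "commit_within"    -> 5000.toString, // milliseconds
  "gen_uniq_key"     -> true.toString,
  "solr_field_types" -> solrFieldTypes(Seq("rating" -> "string", "title" -> "text_en"))
)
```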

Example

import com.github.music.of.the.ainur.almaren.solr.Solr.SolrImplicit
import com.github.music.of.the.ainur.almaren.builder.Core.Implicit
import com.github.music.of.the.ainur.almaren.Almaren
import org.apache.spark.sql.SaveMode

val almaren = Almaren("App Name")

almaren.builder
    .sourceSql("""SELECT sha2(concat_ws("",array(*)),256) as id,*,current_timestamp from deputies""")
    .coalesce(30)
    .targetSolr("deputies","cloudera:2181,cloudera1:2181,cloudera2:2181/solr",Map("batch_size" -> "100000","commit_within" -> "10000"),SaveMode.Overwrite)
    .batch
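
In the example above, the `id` column is a SHA-256 hash over the concatenated column values, so the same row content always yields the same document id. The plain-Scala sketch below illustrates that idea outside Spark. `rowId` is a hypothetical helper, it assumes every column has already been rendered as a string, and it is not Spark's own implementation:

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper: mimics sha2(concat_ws("", ...), 256) for one row.
// None stands for a null column, which is skipped, as concat_ws does.
def rowId(values: Seq[Option[String]]): String = {
  val joined = values.flatten.mkString("")
  val digest = MessageDigest
    .getInstance("SHA-256")
    .digest(joined.getBytes(StandardCharsets.UTF_8))
  // Render the 32 digest bytes as a 64-character lowercase hex string.
  digest.map(b => "%02x".format(b)).mkString
}

val id = rowId(Seq(Some("a"), None, Some("bc")))
```

Because the separator is empty, rows whose values concatenate to the same string (for example `"a","bc"` and `"ab","c"`) share an id; a non-empty separator would avoid that if it matters for the data set.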