The library provides a Spark Source for efficient streaming from object stores of files saved under dynamically changing paths,
e.g. s3://my_bucket/my-data/2022/06/09/, which can be described by the pattern s3://my_bucket/my-data/{YYYY}/{MM}/{DD}/.
The library is available on Maven Central.
If you use sbt, add it to your build.sbt:

```scala
libraryDependencies += "ai.hunters" %% "spark-adaptive-file-connector" % "1.0.0"
```

For Maven:

```xml
<dependency>
    <groupId>ai.hunters</groupId>
    <artifactId>spark-adaptive-file-connector_2.12</artifactId>
    <version>1.0.0</version>
</dependency>
```

For any other dependency manager, use the equivalent coordinates.
This is the simplest example of using the connector in a Scala application:
```scala
val readFileJsonStream = spark.readStream
  .format("dynamic-paths-file")
  .option("fileFormat", "json")
  .schema(yourSchema)
  .load("s3://my-bucket/prefix/{YYYY}/{MM}/{DD}")
```

To build and test the project, run:

```
sbt "clean;compile"
sbt "clean;test"
```
The easiest way to start playing with this library is the integration test DynamicPathsFileStreamSourceProviderITest.
It shows how to:
- pass parameters to Spark so that this source connector can be used
- check which paths have been read and how fast
- recover from a checkpoint
- see the output

When you use this library in your application, remember to register
DynamicPathsFileStreamSourceProvider in your DataSourceRegister services file. For an example of what was needed for the tests, see
src/test/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister.
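Such a services file is plain text containing the fully qualified name of the provider class, one entry per line. A minimal sketch; the package below is an assumption, so copy the provider's actual fully qualified name from the sources:

```
# META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
# The package below is an assumption; use the provider's real fully qualified name.
ai.hunters.spark.connector.DynamicPathsFileStreamSourceProvider
```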
See also the talk: An Advanced S3 Connector for Spark to Hunt for Cyber Attacks (Data + AI Summit, San Francisco 2022).