This is a set of Source and Sink classes that can be used to read and write JSON documents to a Couchbase cluster.
The library is provided as-is without any guarantees and is considered to be in alpha-testing stage.
The updated connector that supports Flink 1.7.1 and its new APIs has been published as a snapshot into Maven snapshot repository and is in its final testing stage.
To add the connector, add snapshots repository into your pom.xml
:
<repositories>
<repository>
<id>oss.sonatype.org-snapshot</id>
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
And use the following maven dependency:
<dependency>
<groupId>com.couchbase.client</groupId>
<artifactId>flink-connector-couchbase_2.12</artifactId>
<version>0.4.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.couchbase.client</groupId>
<artifactId>flink-connector-couchbase_2.12</artifactId>
<version>${connectorVersion}</version>
</dependency>
${connectorVersion}
= the version of this connector. See the git tags for valid values.
The examples
directory contains basic examples of how to use the connector.
This source connects to DCP port on every node on the cluster and receives document changes as they are processed by the cluster.
Received changes are parsed from the binary format into CouchbaseDocumentChange
objects and grouped into splits according to received DCP snapshots
(that is, each split corresponds to a single DCP snapshot).
The sink supports restoring DCP stream from previously stored vbucket offsets, however returned to the sink splits that were not processed by Flink workers do not roll DCP stream offsets back.
This source connects to QUERY service of a Couchbase cluster, executes provided to the sink query and streams fetched by the query documents in splits of configurable size.
Query source supports providing positional and named arguments to the query. If both named and positional arguments provided, only named arguments will be used. Neither arguments nor query cannot be changed after sink starts streaming documents.
Before starting, source enumerates all matching documents. Then it streams selected documents by wrapping the provided query
into a pagination query SELECT META().id, data.* FROM ($QUERY) as data LIMIT $SPLI_SIZE OFFSET $SPLIT_OFFSET
.
The sink does not detect concurrent modification of the result set and does not account for that while streaming the documents.
This source has been deprecated as it uses deprecated Flink API.
The source works similarly to the CouchbaseDcpSource
.
This sink accepts JsonDocument
objects and upserts them into a collection.
Received documents are first buffered into an ArrayList
and then flushed upon Flink's request in a transaction.
This sink has been deprecated in favor of CouchbaseCollectionSink as it uses deprecated Flink APIs.
Please use Github issues for questions and support. The project is maintained by [email protected]
.