GeoSpark is a cluster computing system for processing large-scale spatial data. It extends Apache Spark and SparkSQL with out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) and SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.
GeoSpark contains several modules:
| Module | API | Spark compatibility | Dependencies |
|---|---|---|---|
| GeoSpark-SQL | SQL/DataFrame | SparkSQL 2.1 and later | Spark-core, Spark-SQL, GeoSpark-core |
| GeoSpark-Viz | RDD, SQL/DataFrame | RDD: Spark 2.X/1.X; SQL: Spark 2.1 and later | Spark-core, Spark-SQL, GeoSpark-core, GeoSpark-SQL |
| GeoSpark-Zeppelin | Apache Zeppelin | Spark 2.1+, Zeppelin 0.8.1+ | Spark-core, Spark-SQL, GeoSpark-core, GeoSpark-SQL, GeoSpark-Viz |
- Core: GeoSpark SpatialRDDs and Query Operators.
- SQL: SQL interfaces for GeoSpark core.
- Viz: Visualization extension for GeoSpark SpatialRDDs and DataFrames.
- Zeppelin: GeoSpark visualization plugin for Apache Zeppelin.
Please visit the GeoSpark website for details and documentation.
- GeoSpark 1.2.0 is released.
- This release contains many bug fixes and new functions. Please read the GeoSpark release notes.
- GeoSparkViz now supports the DataFrame API. Please read Visualize Spatial DataFrame/RDD.
- GeoSpark-Zeppelin can connect GeoSpark to Apache Zeppelin. Please read Interact with GeoSpark via Zeppelin.
- The GeoSparkViz Maven coordinates have changed. Please read Maven coordinate.
- This release includes pull requests from 13 contributors. Please read the GeoSpark release notes to learn their names.
- The full research paper on GeoSpark has been accepted by the GeoInformatica journal. The paper spends over 40 pages dissecting GeoSpark in detail and comparing it with other existing systems such as Magellan, Simba, and SpatialHadoop.
The GeoSpark development team has published four papers about GeoSpark. Please read Publications.
GeoSpark was evaluated in the PVLDB 2018 paper "How Good Are Modern Spatial Analytics Systems?" by Varun Pandey, Andreas Kipf, Thomas Neumann, and Alfons Kemper (Technical University of Munich), which states:
GeoSpark comes close to a complete spatial analytics system. It also exhibits the best performance in most cases.