apache / gravitino   0.9.1

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

Scala versions: 2.13 2.12

Apache Gravitino™

Introduction

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.

Figure: Gravitino architecture diagram.

🚀 Key Features

  • Unified Metadata Management: Manage diverse metadata sources through a single model and API (e.g., Hive, MySQL, HDFS, S3).
  • End-to-End Data Governance: Apply access control, auditing, and discovery across all metadata assets.
  • Direct Metadata Integration: Changes in underlying systems are immediately reflected via Gravitino’s connectors.
  • Geo-Distribution Support: Share metadata across regions and clouds to support global architectures.
  • Multi-Engine Compatibility: Seamlessly integrates with query engines without modifying SQL dialects.
  • AI Asset Management (WIP): Support for AI model and feature tracking.

🌐 Common Use Cases

  • Federated metadata discovery across data lakes and data warehouses
  • Multi-region metadata synchronization for hybrid or multi-cloud setups
  • Data and AI asset governance with unified audit and access control
  • Plug-and-play access for engines like Trino or Spark
  • Support for evolving metadata standards, including AI model lineage

📚 Documentation

The latest Gravitino documentation is available at gravitino.apache.org/docs/latest.

This README provides a basic overview; visit the site for full installation, configuration, and development documentation.

🧪 Quick Start

Use Gravitino Playground (Recommended)

Gravitino provides a Docker Compose–based playground for a full-stack experience.
Clone or download the Gravitino Playground repository and follow its README.

Run Gravitino Locally

  1. Download and extract a binary release.
  2. Edit conf/gravitino.conf to configure settings.
  3. Start the server:
     ./bin/gravitino.sh start
  4. Stop the server:
     ./bin/gravitino.sh stop

If the server is running in the foreground instead, press CTRL+C to stop it.
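
After starting the server as described above, it can help to confirm that it answers HTTP requests before pointing a client at it. The following is a minimal sketch, not part of the Gravitino distribution: the helper names are hypothetical, and the port 8090 and the /api/version path are assumptions that should be checked against conf/gravitino.conf and the official documentation for your release.

```shell
# Sketch: wait for a locally started Gravitino server to respond.
# ASSUMPTION: port 8090 and the /api/version path are illustrative values;
# verify them against conf/gravitino.conf and the docs for your release.

# Build the URL to probe (hypothetical helper).
gravitino_version_url() {
  gv_host="${1:-localhost}"
  gv_port="${2:-8090}"
  printf 'http://%s:%s/api/version\n' "$gv_host" "$gv_port"
}

# Poll the URL once per second until it responds or the attempts run out
# (hypothetical helper). Returns 0 on success, 1 on timeout.
wait_for_gravitino() {
  gv_url="$1"
  gv_attempts="${2:-30}"
  gv_i=0
  while [ "$gv_i" -lt "$gv_attempts" ]; do
    if curl --silent --fail --output /dev/null "$gv_url"; then
      echo "Gravitino is responding at $gv_url"
      return 0
    fi
    gv_i=$((gv_i + 1))
    sleep 1
  done
  echo "Gravitino did not respond at $gv_url" >&2
  return 1
}

# Example, run from the extracted release directory:
#   ./bin/gravitino.sh start
#   wait_for_gravitino "$(gravitino_version_url)"
```

Polling with a bounded number of attempts keeps the script from hanging if the server fails to start, for example because of a configuration error.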

🧊 Iceberg REST Catalog

Gravitino provides a native Iceberg REST catalog service.
See: Iceberg REST catalog service

🔌 Trino Integration

Gravitino includes a Trino connector for federated metadata access.
See: Using Trino with Gravitino

🛠️ Building from Source

Gravitino uses Gradle. Windows is not currently supported.

Clean build without tests:

./gradlew clean build -x test

Build a distribution:

./gradlew compileDistribution -x test

Or build a compressed package:

./gradlew assembleDistribution -x test

Artifacts are output to the distribution/ directory.
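
The three build commands above differ only in the Gradle task they run, so a small wrapper can select one by the kind of output wanted. This is a sketch under stated assumptions: the function name and the check, dir, and package keywords are hypothetical, and only the Gradle task names are taken from this README.

```shell
# Sketch: map a build kind to the Gradle arguments listed in this README.
# ASSUMPTION: the function name and the check/dir/package keywords are
# hypothetical; only the Gradle task names come from the README.
gravitino_gradle_args() {
  case "$1" in
    check)   echo "clean build -x test" ;;           # clean build without tests
    dir)     echo "compileDistribution -x test" ;;   # distribution build
    package) echo "assembleDistribution -x test" ;;  # compressed package
    *)       echo "unknown build kind: $1" >&2; return 1 ;;
  esac
}

# Example, run from the repository root (word splitting of the result is intended):
#   ./gradlew $(gravitino_gradle_args package)
```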

More build options: How to build Gravitino

👨‍💻 Developer Resources

🤝 Contributing

We welcome all kinds of contributions—code, documentation, testing, connectors, and more!

To get started, please read our CONTRIBUTING.md guide.

🪪 License

Apache Gravitino is licensed under the Apache License, Version 2.0.
See the LICENSE file for details.

Apache®, Apache Gravitino™, Apache Hadoop®, Apache Hive™, Apache Iceberg™, Apache Kafka®, Apache Spark™, Apache Submarine™, Apache Thrift™, and Apache Zeppelin™ are trademarks of the Apache Software Foundation in the United States and/or other countries.