Sunday, December 22, 2024

Databricks Open Sources Unity Catalog, Creating the Industry’s Only Universal Catalog for Data and AI

Databricks, the Data and AI company, announced that it is open sourcing Unity Catalog, the industry’s only unified solution for data and artificial intelligence (AI) governance across clouds, data formats and data platforms. This initiative builds on Databricks’ commitment to open ecosystems, ensuring customers have the flexibility and control they need without vendor lock-in. Databricks is ushering in a new era for open catalog standards for data and AI with support from Amazon Web Services (AWS), Google Cloud, Microsoft, NVIDIA, Salesforce, and more.

Unity Catalog OSS offers a universal interface that supports any data format and compute engine, including the ability to read tables with Delta Lake, Apache Iceberg™, and Apache Hudi™ clients via Delta Lake UniForm. It also supports the Iceberg REST Catalog and Hive Metastore (HMS) interface standards. Additionally, Unity Catalog OSS provides unified governance across tabular data, non-tabular data, and AI assets, such as machine learning (ML) models and generative AI tools, letting organizations simplify management at scale.

Unity Catalog: The Leading Data and AI Catalog

Databricks introduced Unity Catalog in 2021 to meet customer demand for an interoperable catalog covering both data and AI workloads. Historically, organizations relied on multiple single-purpose solutions, creating silos between platforms and between data and AI assets. These silos made it difficult to build modern data and AI applications, which combine tabular data in multiple table formats, unstructured data, ML models, vector indices, and AI tools. Customers created complex webs to manage metadata silos, copied data into different places or formats so that various engines could access it, or maintained DIY solutions to sync metadata between catalogs. The result was higher cost and complexity, along with weak governance and fragmented access control. Unity Catalog breaks down those silos for over 10,000 organizations.


“Our customers love Unity Catalog. It lets them manage all their data objects — tabular data, unstructured data, and AI and ML assets — in a single source of truth within the Databricks Data Intelligence Platform, versus gluing together multiple single-purpose solutions,” said Ali Ghodsi, Co-founder and CEO at Databricks. “Our platform is the only major data platform in the industry where all data is in an open format by default — now, metadata and governance are open as well, giving enterprises the governance solution they need in today’s data and AI landscape. We’re excited to open source Unity Catalog and release the code. We’ll continue to evolve the open standard in close collaboration with our partners.”

Unity Catalog OSS is the industry’s only universal catalog for data and AI. Key features include:

  • Interoperability: Unity Catalog OSS offers a universal interface that supports any data format and compute engine, including the ability to read tables with Delta Lake, Apache Iceberg™, and Apache Hudi™ clients via Delta Lake UniForm. It also supports the Iceberg REST Catalog and Hive Metastore (HMS) interface standards. Unity Catalog OSS is interoperable with all major cloud platforms, including Microsoft Azure, AWS, GCP, and Salesforce; compute engines like Apache Spark™, Presto, Trino, DuckDB, Daft, PuppyGraph, and StarRocks; and data and AI platforms including dbt Labs, Confluent, Eventual, Fivetran, Granica, Immuta, Informatica, LanceDB, LangChain, Tecton, and Unstructured.
  • Unified governance: Unity Catalog OSS enables unified governance across tabular data, non-tabular data, and AI assets, such as ML models and generative AI tools, letting organizations simplify management, discovery and development at scale.
  • Openness: With its open APIs and Apache 2.0 licensed open source server, Unity Catalog OSS maximizes flexibility and customer choice by enabling broad interoperability across various engines, tools, and platforms.

SOURCE: Databricks
