Unity Catalog 101

Contributors
Avril Aysha
Michelle Leon
Product Lead at Databricks
Matthew Powers
CFA, Staff Developer Advocate

Unity Catalog is a data catalog for managing all your data and AI assets easily and securely. You can use Unity Catalog to organize your data assets, collaborate, manage data access, and ensure compliance with data regulations.

This blog will introduce you to how Unity Catalog works and how you can use it to manage your data assets. We will explore Unity Catalog’s great features for data management, examine its architecture and main concepts, and work through a practical hands-on coding example.

If you’re new to data catalogs, you can review the basics of data catalogs, which includes a discussion of why you should use a data catalog in any serious data-driven business organization. 

Unity Catalog is in active development and the community is growing quickly, so it’s a really exciting time to join the project! You can find our recommendations for contributing to the project at the end of this post.

Let’s jump in! 🪂

What Is Unity Catalog?

Unity Catalog is a modern data catalog that helps you organize and manage all of your data and AI assets.

With Unity Catalog, you can:

  • Organize all your data assets in one place
  • Govern access to data assets and ensure compliance through a single source of truth
  • Use your favorite query engines and tools to process your data

Unity Catalog offers full interoperability for all your data objects, regardless of their formats. Unlike many other catalogs, its capabilities extend beyond managing structured, tabular data. Unity Catalog can also help you organize and govern unstructured data (text, images, video, and audio), AI and ML models, vector databases, and user-defined functions (UDFs).

This means Unity Catalog can serve as the single source of truth for all of your data.

Why Should I Use Unity Catalog?

Unity Catalog has several features that make it a strong choice for managing your data. It makes it easy to:

  1. Find your data
  2. Use your favorite data tools
  3. Collaborate with many users on the same data
  4. Manage who can access your data

Let’s take a closer look at each of these features.

Find Your Data Easily

Unity Catalog makes it easy to organize large amounts of data for quick and easy discovery and retrieval.

Any serious business organization creating value from data stores large amounts of it for processing and analysis. The volume of these data assets can grow exponentially over time, making it very difficult to quickly find the exact data you need to do your job. Without clear organization, data discovery becomes a bottleneck and slows you down.

Unity Catalog helps by using the metadata of your data objects to automatically tag and organize your assets. This means there’s no more need to search through your local files or S3 buckets for that specific version of that one table you were working on the other day. All your data assets are registered in one central repository; you don’t need to know where an asset is stored, you just need to know its name.

Unity Catalog uses a three-level naming convention to organize your data: <catalog>.<schema>.<asset>. You’ll read more about this in the Unity Catalog Architecture & Concepts section below.
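
Some commands in this post take the full three-level name in one flag, while others take the catalog and schema separately, so it can help to split a name into its parts when scripting. The following is a minimal sketch in POSIX shell; split_full_name is a hypothetical helper written for illustration and is not part of Unity Catalog.

```shell
#!/bin/sh
# Split a three-level name of the form <catalog>.<schema>.<asset>
# into its parts using POSIX parameter expansion.
# split_full_name is a hypothetical helper, not part of the bin/uc CLI.
split_full_name() {
  full_name=$1
  case $full_name in
    *.*.*.*) echo "too many parts in: $full_name" >&2; return 1 ;;
    ?*.?*.?*) ;;  # exactly three non-empty parts
    *) echo "expected <catalog>.<schema>.<asset>, got: $full_name" >&2; return 1 ;;
  esac
  catalog=${full_name%%.*}   # text before the first dot
  rest=${full_name#*.}       # text after the first dot
  schema=${rest%%.*}         # text before the second dot
  asset=${rest#*.}           # text after the second dot
  printf '%s %s %s\n' "$catalog" "$schema" "$asset"
}

split_full_name unity.default.numbers
```

Run as written, this prints the three parts separated by spaces: unity default numbers. Names with fewer or more than three parts are rejected with a non-zero exit status.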

Use Your Favorite Tools

Unity Catalog is fully open source and supports open data formats. 

Many data warehousing solutions store your data and metadata in proprietary formats. This is fine when you’re working in their environment but becomes a problem when you need to process the data elsewhere—for example, because another query engine is faster. 

Unity Catalog provides open access to all of your data and metadata by supporting the OpenAPI spec and open table formats. This eliminates the risk of vendor lock-in. You are free to access and process your data using your favorite tools and engines. 

Unity Catalog is built by a great and friendly open source community. See our recommendations for contributing to the project to learn how you can join the community!

Collaborate on Data Assets

Unity Catalog makes it easy for multiple users to collaborate on the same data assets.

Suppose you need to work together on a Parquet table with an external client. You create a copy of the table and share it with them. You then both make edits to this file, and when you’re done you just need to merge the changes in the copied table back to the source. But in the meantime, your colleague has made changes to the source table, and the column names no longer match. You now have to manually figure out how to resolve the data conflicts. Not fun.

Unity Catalog solves this problem by storing all your data assets in a central repository. Users who want to collaborate can access the same table or ML model from the central catalog. Changes to the assets are managed reliably through secure transactions to avoid data corruption. This way, you no longer need to maintain multiple versions across many locations when collaborating and you can avoid data clutter and corruption.

Manage Data Access

Unity Catalog makes it easy to manage access to data assets per user role.

As you just saw, Unity Catalog helps you with data discovery and collaboration by storing all of your data assets in a central data repository. But not everyone should have equal access to this centralized data. Your company’s HR director, for example, will probably need access to company-wide employee data, including salaries. You and a data engineering colleague working on a BI dashboard project probably should not be able to see that data.

Unity Catalog helps you define access policies based on user profiles.

This is also great for sharing data outside of your organization. 

Suppose you want to share a dataset with an external organization, but the dataset has some sensitive columns that should not be shared. Instead of manually copying the table without the sensitive columns, you can use Unity Catalog to allow certain users to only access specific parts of your data asset, for example by performing a shallow clone of your Delta Lake table.

Unity Catalog provides centralized and secure governance through temporary credential vending. Credential vending allows data access to be precise (users only access the data they need to see) and fully auditable. 

Unity Catalog Architecture & Concepts

Unity Catalog stores all of your data assets in a three-level namespace:

  • Catalog
  • Schema
  • Data asset

A catalog is the top-level organizational unit for your data assets. Catalogs often correspond to business units or other higher-level categorizations.

A schema is the next level of organization, used to collect your data and AI assets into more granular logical categories. Often a schema will represent a single project or use case. Unity Catalog schemas should not be confused with table schemas. Catalog schemas and table schemas are not related to each other, despite the name.

A data asset is any data object that you want to store for future processing and analysis. Unity Catalog supports many different kinds of assets, including tables, volumes, models, and functions.

Data Assets

Unity Catalog can manage all sorts of data assets for you:

  • Tables store tabular data in many formats.
  • Volumes store semi- and unstructured data such as nested JSON, audio, video, and text.
  • Functions store units of programming logic (also known as user-defined functions, or UDFs) that return a scalar value or a set of rows.
  • Models store AI and ML models packaged with MLflow.

Tables and volumes can be either managed or external.

Managed tables and volumes are fully managed by Unity Catalog, which means that Unity Catalog fully owns the creation, deletion, and storage of these data assets. When you create or delete a managed data asset, Unity Catalog automatically creates or deletes the underlying files.

External tables and volumes are stored outside of managed storage, for example in cloud object stores. If you create or delete an external table, you must create or delete the corresponding files yourself. In this case, Unity Catalog only manages the metadata for these data assets.

Getting Started with Unity Catalog

Now let’s take a look at how you can use Unity Catalog to work with your data assets.

You will need to clone the open source Unity Catalog GitHub repository:

git clone git@github.com:unitycatalog/unitycatalog.git

You will also need Java 17 or above installed on your machine. You can run the java --version command in a terminal to verify that you have the right version of Java installed.
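
If you prefer to script this check, here is a minimal sketch in POSIX shell. It assumes the java launcher prints a line containing version "X.Y.Z" (as OpenJDK and Oracle builds typically do for java -version); java_major is a hypothetical helper, not something shipped with Unity Catalog.

```shell
#!/bin/sh
# Extract the major version from a Java version string.
# java_major is a hypothetical helper. It handles the modern scheme
# ("17.0.2" -> 17) and the legacy "1.x" scheme ("1.8.0_292" -> 8).
java_major() {
  version=$1
  case $version in
    1.*) version=${version#1.} ;;   # legacy scheme: drop the leading "1."
  esac
  echo "${version%%[!0-9]*}"        # keep only the leading run of digits
}

# Check the installed Java, if any. `java -version` writes to stderr;
# the version number is assumed to be the quoted string after "version".
if command -v java >/dev/null 2>&1; then
  installed=$(java -version 2>&1 | sed -n 's/.*version "\(.*\)".*/\1/p' | head -n 1)
  if [ "$(java_major "$installed")" -ge 17 ] 2>/dev/null; then
    echo "Java $installed is new enough"
  else
    echo "Java 17 or above is required (found: ${installed:-unknown})"
  fi
else
  echo "java not found on PATH"
fi
```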

Starting the Unity Catalog Server

Open a terminal window and navigate into the unitycatalog repo directory. Run bin/start-uc-server to spin up a local server. 

Here is what you should see:

unitycatalog ❯ bin/start-uc-server
###################################################################
#  _    _       _ _            _____      _        _              #
# | |  | |     (_) |          / ____|    | |      | |             #
# | |  | |_ __  _| |_ _   _  | |     __ _| |_ __ _| | ___   __ _  #
# | |  | | '_ \| | __| | | | | |    / _` | __/ _` | |/ _ \ / _` | #
# | |__| | | | | | |_| |_| | | |___| (_| | || (_| | | (_) | (_| | #
#  \____/|_| |_|_|\__|\__, |  \_____\__,_|\__\__,_|_|\___/ \__, | #
#                      __/ |                                __/ | #
#                     |___/               v0.1.0-SNAPSHOT  |___/  #
###################################################################

Leave this terminal window as it is to keep the Unity Catalog server running, and open a new terminal window to start running Unity Catalog commands and interacting with your data.

Working with Tables

The Unity Catalog local server comes with some tables preloaded by default. These are great for quick experimentation.

List Tables

You can list all the tables in a catalog schema using the bin/uc table list command. You will need to specify the catalog and schema by name using the --catalog and --schema flags, like this:

> bin/uc table list --catalog unity --schema default
┌─────────────────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬────────────────────────────────────┐
│      NAME       │CATALOG│SCHEMA_│TABLE_T│DATA_SO│COLUMNS│STORAGE│COMMENT│PROPERT│CREATED│UPDATED│              TABLE_ID              │
│                 │ _NAME │ NAME  │  YPE  │URCE_FO│       │_LOCATI│       │  IES  │  _AT  │  _AT  │                                    │
│                 │       │       │       │ RMAT  │       │  ON   │       │       │       │       │                                    │
├─────────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────────────────────────────────┤
│marksheet        │unity  │default│MANAGED│DELTA  │[{"n...│file...│Mana...│{"ke...│1721...│1721...│c389adfa-5c8f-497b-8f70-26c2cca4976d│
├─────────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────────────────────────────────┤
│marksheet_uniform│unity  │default│EXTE...│DELTA  │[{"n...│file...│Unif...│{"ke...│1721...│1721...│9a73eb46-adf0-4457-9bd8-9ab491865e0d│
├─────────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────────────────────────────────┤
│numbers          │unity  │default│EXTE...│DELTA  │[{"n...│file...│Exte...│{"ke...│1721...│1721...│32025924-be53-4d67-ac39-501a86046c01│
├─────────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────────────────────────────────┤
│user_countries   │unity  │default│EXTE...│DELTA  │[{"n...│file...│Part...│{}     │1721...│1721...│26ed93b5-9a18-4726-8ae8-c89dfcfea069│
└─────────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴────────────────────────────────────┘

As you can see here, the Unity Catalog local server comes with four tables by default, all stored in the unity catalog under the default schema. One of these tables is managed, and the others are external. All four tables are in the Delta Lake format.

Let’s take a closer look at a specific table.

Read Tables

You can use the bin/uc table read command to read the content of a table. You will need to refer to it by its full_name, in the format <catalog>.<schema>.<table>. You can limit the number of returned rows using the --max_results flag. For example, let’s look at the first three rows of the numbers table:

> bin/uc table read --full_name unity.default.numbers --max_results 3
┌───────────────────────────────────────┬──────────────────────────────────────┐
│as_int(integer)                        │as_double(double)                     │
├───────────────────────────────────────┼──────────────────────────────────────┤
│564                                    │188.75535598441473                    │
├───────────────────────────────────────┼──────────────────────────────────────┤
│755                                    │883.6105633023361                     │
├───────────────────────────────────────┼──────────────────────────────────────┤
│644                                    │203.4395591086936                     │
└───────────────────────────────────────┴──────────────────────────────────────┘

Create Tables

Use the bin/uc table create command to create a new table in your Unity Catalog. 

You will need to specify its full_name and columns along with their data types. This will define the new table’s schema (again, not to be confused with the Unity Catalog schema). For external tables, you will also need to define the storage_location.

You can create tables in many different formats, including Delta Lake, Parquet, ORC, JSON, CSV, AVRO, and TXT. Use the --format flag to specify the format. If not specified, a Delta Lake table will be created.

Let’s see this in action. Run the command below with the correct path/to/storage to create a new external Delta Lake table with two columns, some_numbers and some_letters:

> bin/uc table create --full_name unity.default.test --columns "some_numbers INT, some_letters STRING" --storage_location $DIRECTORY$

Note that you will need to manually set the $DIRECTORY$ variable to the correct storage location. If you don’t know where Unity Catalog is storing your files, take a look at the metadata of an existing table using bin/uc table get --full_name <catalog>.<schema>.<table> to see its storage location.

This should output something like:

Table created successfully at: 

┌────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────┐
│        KEY         │                                               VALUE                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│NAME                │test                                                                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│CATALOG_NAME        │unity                                                                                               │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│SCHEMA_NAME         │default                                                                                             │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│TABLE_TYPE          │EXTERNAL                                                                                            │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│DATA_SOURCE_FORMAT  │DELTA                                                                                               │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│COLUMNS             │{"name":"some_numbers","type_text":"int","type_json":"{\"name\":\"some_numbers\",\"type\":\"integer\│
│                    │",\"nullable\":true,\"metadata\":{}}","type_name":"INT","type_precision":0,"type_scale":0,"type_inte│
│                    │rval_type":null,"position":0,"comment":null,"nullable":true,"partition_index":null}                 │
│                    │{"name":"some_letters","type_text":"string","type_json":"{\"name\":\"some_letters\",\"type\":\"strin│
│                    │g\",\"nullable\":true,\"metadata\":{}}","type_name":"STRING","type_precision":0,"type_scale":0,"type│
│                    │_interval_type":null,"position":1,"comment":null,"nullable":true,"partition_index":null}            │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│STORAGE_LOCATION    │file:///Users/avriiil/Documents/git/my-forks/unitycatalog/etc/data/external/unity/default/tables/te │
│                    │st2                                                                                                 │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│COMMENT             │null                                                                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│PROPERTIES          │{}                                                                                                  │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│CREATED_AT          │1721644623209                                                                                       │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│UPDATED_AT          │1721644623209                                                                                       │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│TABLE_ID            │2e8b23f2-4ff7-4a10-8d23-b8c7bae2bdb0                                                                │
└────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────┘


Get Table Metadata

Tables stored in Unity Catalog have rich metadata. You can use the bin/uc table get command to examine a table’s metadata.

Let's take a look at the metadata for the numbers table:

> bin/uc table get --full_name unity.default.numbers
┌────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────┐
│        KEY         │                                               VALUE                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│NAME                │numbers                                                                                             │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│CATALOG_NAME        │unity                                                                                               │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│SCHEMA_NAME         │default                                                                                             │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│TABLE_TYPE          │EXTERNAL                                                                                            │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│DATA_SOURCE_FORMAT  │DELTA                                                                                               │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│COLUMNS             │{"name":"as_int","type_text":"int","type_json":"{\"name\":\"as_int\",\"type\":\"integer\",\"nullable│
│                    │\":false,\"metadata\":{}}","type_name":"INT","type_precision":0,"type_scale":0,"type_interval_type":│
│                    │null,"position":0,"comment":"Int                    column","nullable":false,"partition_index":null}│
│                    │{"name":"as_double","type_text":"double","type_json":"{\"name\":\"as_double\",\"type\":\"double\",\"│
│                    │nullable\":false,\"metadata\":{}}","type_name":"DOUBLE","type_precision":0,"type_scale":0,"type_inte│
│                    │rval_type":null,"position":1,"comment":"Double column","nullable":false,"partition_index":null}     │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│STORAGE_LOCATION    │file:///Users/avriiil/Documents/git/my-forks/unitycatalog/etc/data/external/unity/default/tables/nu│
│                    │mbers/                                                                                              │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│COMMENT             │External table                                                                                      │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│PROPERTIES          │{"key1":"value1","key2":"value2"}                                                                   │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│CREATED_AT          │1721238005617                                                                                       │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│UPDATED_AT          │1721238005617                                                                                       │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│TABLE_ID            │32025924-be53-4d67-ac39-501a86046c01                                                                │
└────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────┘
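
The CREATED_AT and UPDATED_AT values are 13-digit numbers that look like Unix epoch timestamps in milliseconds. Assuming that is the case, the sketch below converts one to a readable UTC timestamp. format_epoch_ms is a hypothetical helper, and the two date invocations cover the GNU and BSD/macOS variants of the date command.

```shell
#!/bin/sh
# Convert an epoch-milliseconds value (the assumed format of CREATED_AT
# and UPDATED_AT) to an ISO 8601 UTC timestamp.
# format_epoch_ms is a hypothetical helper, not part of the bin/uc CLI.
format_epoch_ms() {
  seconds=$(( $1 / 1000 ))   # drop the millisecond part
  # GNU date takes "-d @SECONDS"; BSD/macOS date takes "-r SECONDS".
  date -u -d "@$seconds" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
    date -u -r "$seconds" '+%Y-%m-%dT%H:%M:%SZ'
}

# CREATED_AT of the unity.default.numbers table shown above
format_epoch_ms 1721238005617
```

For the CREATED_AT value of the numbers table, this should print 2024-07-17T17:40:05Z.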

For more table functionality, check out the tables documentation.

Working with Volumes

Unity Catalog volumes are useful for registering datasets that are non-tabular or not supported as Unity Catalog tables. They’re a great option for JSON files, text files, Lance datasets, and media files such as audio, video, and images.

Here's an example of a Unity Catalog schema that contains three volumes:

[Figure: Unity Catalog unstructured_data chart]

The Unity Catalog local server comes with some volumes preloaded by default. Let’s take a look at them.

List Volumes

You can list the volumes in a catalog schema using the bin/uc volume list command:

> bin/uc volume list --catalog unity --schema default
┌────────┬────────┬──────────┬────────┬────────┬────────┬────────────────────────────────────┬────────┬────────┬────────┐
│CATALOG_│SCHEMA_N│   NAME   │COMMENT │CREATED_│UPDATED_│             VOLUME_ID              │VOLUME_T│STORAGE_│FULL_NAM│
│  NAME  │  AME   │          │        │   AT   │   AT   │                                    │  YPE   │LOCATION│   E    │
├────────┼────────┼──────────┼────────┼────────┼────────┼────────────────────────────────────┼────────┼────────┼────────┤
│unity   │default │txt_files │null    │17212...│17212...│74695d77-d48b-4f8e-9894-54a3e110b1ae│MANAGED │file:...│unity...│
├────────┼────────┼──────────┼────────┼────────┼────────┼────────────────────────────────────┼────────┼────────┼────────┤
│unity   │default │json_files│null    │17212...│17212...│d3f18882-eb1f-4cbb-bbc4-0347091224e8│EXTERNAL│file:...│unity...│
└────────┴────────┴──────────┴────────┴────────┴────────┴────────────────────────────────────┴────────┴────────┴────────┘

You should see two volumes: txt_files (managed) and json_files (external).

Get Volume Metadata

You can get the metadata of a volume using the bin/uc volume get command:

> bin/uc volume get --full_name unity.default.json_files
┌─────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────┐
│         KEY         │                                              VALUE                                              │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┤
│CATALOG_NAME         │unity                                                                                            │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┤
│SCHEMA_NAME          │default                                                                                          │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┤
│NAME                 │json_files                                                                                       │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┤
│COMMENT              │null                                                                                             │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┤
│CREATED_AT           │1721234405627                                                                                    │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┤
│UPDATED_AT           │1721234405627                                                                                    │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┤
│VOLUME_ID            │d3f18882-eb1f-4cbb-bbc4-0347091224e8                                                             │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┤
│VOLUME_TYPE          │EXTERNAL                                                                                         │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┤
│STORAGE_LOCATION     │file:///Users/avriiil/Documents/git/my-forks/unitycatalog/etc/data/external/unity/default/volume│
│                     │s/json_files/                                                                                    │
├─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┤
│FULL_NAME            │unity.default.json_files                                                                         │
└─────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────┘

Read Volumes

To read the contents of a volume, use the bin/uc volume read command. This will list the directories and/or files in your volume:

> bin/uc volume read --full_name unity.default.json_files
d.json [file]
c.json [file]
dir1 [directory]

Here, you can see that the json_files volume contains two JSON files and one directory. Let’s dig a little deeper to read the contents of a specific file using the --path flag:

> bin/uc volume read --full_name unity.default.json_files --path c.json
{
  "marks" :[
    {"name" :  "a" , "score" :  20},
    {"name" :  "b" , "score" :  30},
    {"name" :  "c" , "score" :  40},
    {"name" :  "d" , "score" :  50}
  ]
}

Nice work! You’ve read the contents of a file stored in a volume.

Create Volumes

Now let's try creating a new external volume. 

First, physically create a directory with some files in it. For the purposes of illustration, let’s assume you’ve created a directory called /tmp/my_volume and put two files in it, doc001.txt and doc002.json.

Next, create the volume in Unity Catalog using the bin/uc volume create command:

> bin/uc volume create --full_name unity.default.my_volume --storage_location /tmp/my_volume

Now add a new file to your local volume directory:

> touch /tmp/my_volume/new_doc.txt

Unity Catalog will catalog any new files in your registered volume. You can see the new contents of this volume using the bin/uc volume read command:

> bin/uc volume read --full_name unity.default.my_volume
doc001.txt
doc002.json
new_doc.txt

You can read more about working with volumes in the Unity Catalog volumes documentation.
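
The local file-system side of the walkthrough above can be collected into one script. This is a minimal sketch: VOLUME_DIR is a hypothetical variable introduced here for convenience, and the bin/uc step is left as a comment because it needs a running server and must be run from the unitycatalog repo directory.

```shell
#!/bin/sh
# Prepare a local directory that can back an external volume.
# VOLUME_DIR is a hypothetical variable; the walkthrough uses /tmp/my_volume.
VOLUME_DIR=${VOLUME_DIR:-/tmp/my_volume}

# Create the directory and the two initial files.
mkdir -p "$VOLUME_DIR"   # -p: no error if the directory already exists
touch "$VOLUME_DIR/doc001.txt" "$VOLUME_DIR/doc002.json"

# Register the directory as an external volume (server must be running):
#   bin/uc volume create --full_name unity.default.my_volume --storage_location "$VOLUME_DIR"

# Files added afterwards go into the same directory.
touch "$VOLUME_DIR/new_doc.txt"

ls "$VOLUME_DIR"
```

Using mkdir -p keeps the script safe to re-run, since it does not fail when the directory already exists.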

Working with Functions

You can register user-defined functions in Unity Catalog schemas. Storing and managing functions is great for reusing code and applying permissions or filters.

The following diagram shows an example of a Unity Catalog instance with two functions, sum and my_function:

[Figure: Unity Catalog team and data chart]

List Functions

You can list the functions in your Unity Catalog namespace using the bin/uc function list command:

> bin/uc function list --catalog unity --schema default

You should see something that looks like this:

┌────────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┐
│    NAME    │CATALOG_│SCHEMA_N│INPUT_PA│DATA_TYP│FULL_DAT│RETURN_P│ROUTINE_│ROUTINE_│ROUTINE_│PARAMETE│IS_DETER│SQL_DATA│IS_NULL_│SECURITY│SPECIFIC│COMMENT │PROPERTI│FULL_NAM│CREATED_│UPDATED_│FUNCTION│EXTERNAL│
│            │  NAME  │  AME   │  RAMS  │   E    │ A_TYPE │ ARAMS  │  BODY  │DEFINITI│DEPENDEN│R_STYLE │MINISTIC│_ACCESS │  CALL  │ _TYPE  │ _NAME  │        │   ES   │   E    │   AT   │   AT   │  _ID   │_LANGUAG│
│            │        │        │        │        │        │        │        │   ON   │  CIES  │        │        │        │        │        │        │        │        │        │        │        │        │   E    │
├────────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│sum         │unity   │default │{"par...│INT     │INT     │null    │EXTERNAL│t = x...│null    │S       │true    │NO_SQL  │false   │DEFINER │sum     │Adds ...│null    │unity...│17183...│null    │8e83e...│python  │
├────────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│lowercase   │unity   │default │{"par...│STRING  │STRING  │null    │EXTERNAL│g = s...│null    │S       │true    │NO_SQL  │false   │DEFINER │lower...│Conve...│null    │unity...│17183...│null    │33d81...│python  │
└────────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘

Get Function Metadata

You can get the metadata of one of these functions using bin/uc function get. For example, if you enter this command:

> bin/uc function get --full_name unity.default.sum

You should see something that looks like this:

┌────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────┐
│        KEY         │                                               VALUE                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│NAME                │sum                                                                                                 │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│CATALOG_NAME        │unity                                                                                               │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│SCHEMA_NAME         │default                                                                                             │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│INPUT_PARAMS        │{"parameters":[{"name":"x","type_text":"int","type_json":"{\"name\":\"x\",\"type\":\"integer\",\"nul│
│                    │lable\":false,\"metadata\":{}}","type_name":"INT","type_precision":null,"type_scale":null,"type_inte│
│                    │rval_type":null,"position":0,"parameter_mode":"IN","parameter_type":"PARAM","parameter_default":null│
│                    │,"comment":null},{"name":"y","type_text":"int","type_json":"{\"name\":\"y\",\"type\":\"integer\",\"n│
│                    │ullable\":false,\"metadata\":{}}","type_name":"INT","type_precision":null,"type_scale":null,"type_in│
│                    │terval_type":null,"position":1,"parameter_mode":"IN","parameter_type":"PARAM","parameter_default":nu│
│                    │ll,"comment":null},{"name":"z","type_text":"int","type_json":"{\"name\":\"z\",\"type\":\"integer\",\│
│                    │"nullable\":false,\"metadata\":{}}","type_name":"INT","type_precision":null,"type_scale":null,"type_│
│                    │interval_type":null,"position":2,"parameter_mode":"IN","parameter_type":"PARAM","parameter_default":│
│                    │null,"comment":null}]}                                                                              │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│DATA_TYPE           │INT                                                                                                 │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│FULL_DATA_TYPE      │INT                                                                                                 │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│RETURN_PARAMS       │null                                                                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ROUTINE_BODY        │EXTERNAL                                                                                            │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ROUTINE_DEFINITION  │t = x + y + z\nreturn t                                                                             │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ROUTINE_DEPENDENCIES│null                                                                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│PARAMETER_STYLE     │S                                                                                                   │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│IS_DETERMINISTIC    │true                                                                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│SQL_DATA_ACCESS     │NO_SQL                                                                                              │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│IS_NULL_CALL        │false                                                                                               │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│SECURITY_TYPE       │DEFINER                                                                                             │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│SPECIFIC_NAME       │sum                                                                                                 │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│COMMENT             │Adds two numbers.                                                                                   │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│PROPERTIES          │null                                                                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│FULL_NAME           │unity.default.sum                                                                                   │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│CREATED_AT          │1718315581372                                                                                       │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│UPDATED_AT          │null                                                                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│FUNCTION_ID         │8e83e2d9-e523-46a1-b69c-8fe9212f1057                                                                │
├────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│EXTERNAL_LANGUAGE   │python                                                                                              │
└────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────┘

The ROUTINE_DEFINITION is probably most helpful here: t = x + y + z\nreturn t. This function takes in three arguments and returns their sum. The DATA_TYPE field tells us that the output should be of type INT.
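
To make the definition concrete, here is a minimal Python sketch of what the registered body computes. The wrapper name sum_three and the type hints are assumptions added for illustration; only the two body lines come from the ROUTINE_DEFINITION field.

```python
# Hypothetical local equivalent of unity.default.sum. The wrapper name and
# type hints are illustrative; the body mirrors ROUTINE_DEFINITION.
def sum_three(x: int, y: int, z: int) -> int:
    t = x + y + z
    return t


print(sum_three(1, 2, 3))  # prints 6
```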

Calling Functions from Unity Catalog

Let's try calling this function.

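We'll use the bin/uc function call command to reference the function by its full name and pass three input parameters:

> bin/uc function call --full_name unity.default.sum --input_params "1,2,3"

This should output:

6

You can also register your own functions in Unity Catalog. For example, suppose you want to register the following Python function, which reports the quarter of the year that a date falls in: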

import datetime

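def dateToQuarter(date_string):
    # Parse a date written as YYYY-MM-DD, for example '2022-08-03'
    date = datetime.datetime.strptime(date_string, '%Y-%m-%d')
    # Map months 1-12 onto quarters 1-4
    quarter = (date.month - 1) // 3 + 1
    # print() writes the message to stdout and itself returns None
    return print(f'The quarter for the provided date is: Q{quarter}.')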

You can register this function in Unity Catalog as follows:

> bin/uc function create --full_name unity.default.dateToQuarter --data_type STRING --input_params "date_string string" --language "python" --def "import datetime\ndate = datetime.datetime.strptime(date_string, '%Y-%m-%d')\nquarter = (date.month - 1) // 3 + 1\nreturn print(f'The quarter for the provided date is: Q{quarter}.')"

Here, we define a new function by its full name and specify the data type of the output, the input parameters and their data types, the language, and the function body (--def), with line breaks in the body written as \n.
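
Writing a multi-line function body as a single --def string by hand is error-prone. The sketch below shows one way to build that string from ordinary Python source. It assumes the CLI expects line breaks as the two characters backslash and n, as in the command above, and it uses only the flags that appear in this post. The variable names are illustrative.

```python
import shlex

# Function body as ordinary multi-line Python source, mirroring the
# dateToQuarter example in this post.
body = """import datetime
date = datetime.datetime.strptime(date_string, '%Y-%m-%d')
quarter = (date.month - 1) // 3 + 1
return print(f'The quarter for the provided date is: Q{quarter}.')"""

# Replace each real line break with the two characters backslash + n,
# which is how the --def value is written in the command above.
definition = "\\n".join(body.splitlines())

# Assemble the argument list using only the flags shown in this post.
args = [
    "bin/uc", "function", "create",
    "--full_name", "unity.default.dateToQuarter",
    "--data_type", "STRING",
    "--input_params", "date_string string",
    "--language", "python",
    "--def", definition,
]

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(args))
```

The quoting in the printed command may look different from the hand-written version, because shlex.join wraps arguments in single quotes, but a POSIX shell should pass the same argument values to the CLI.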

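This command should output something like the following. The INPUT_PARAMS and ROUTINE_DEFINITION values reflect the parameter name and function body you registered, so they may differ from this sample: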

┌────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────┐
│        KEY         │                                          VALUE                                           │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│NAME                │dateToQuarter                                                                             │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│CATALOG_NAME        │unity                                                                                     │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│SCHEMA_NAME         │default                                                                                   │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│INPUT_PARAMS        │{"parameters":[{"name":"date","type_text":"string","type_json":"{\"name\":\"date\",\"type\│
│                    │":\"string\",\"nullable\":true,\"metadata\":{}}","type_name":"STRING","type_precision":nul│
│                    │l,"type_scale":null,"type_interval_type":null,"position":0,"parameter_mode":null,"paramete│
│                    │r_type":null,"parameter_default":null,"comment":null}]}                                   │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│DATA_TYPE           │STRING                                                                                    │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│FULL_DATA_TYPE      │STRING                                                                                    │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│RETURN_PARAMS       │null                                                                                      │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│ROUTINE_BODY        │EXTERNAL                                                                                  │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│ROUTINE_DEFINITION  │import pandas as pd\nquarter=pd.Timestamp(date).quarter\nreturn print(f'The quarter for   │
│                    │the provided date is: Q{quarter}.')                                                       │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│ROUTINE_DEPENDENCIES│null                                                                                      │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│PARAMETER_STYLE     │S                                                                                         │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│IS_DETERMINISTIC    │true                                                                                      │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│SQL_DATA_ACCESS     │NO_SQL                                                                                    │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│IS_NULL_CALL        │true                                                                                      │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│SECURITY_TYPE       │DEFINER                                                                                   │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│SPECIFIC_NAME       │dateToQuarter                                                                             │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│COMMENT             │null                                                                                      │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│PROPERTIES          │null                                                                                      │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│FULL_NAME           │unity.default.dateToQuarter                                                               │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│CREATED_AT          │1724295607870                                                                             │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│UPDATED_AT          │null                                                                                      │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│FUNCTION_ID         │3363ccc9-5000-460e-8b85-2273f8a2323c                                                      │
├────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│EXTERNAL_LANGUAGE   │python                                                                                    │
└────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────┘
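
The INPUT_PARAMS value is hard to read in the table because it is JSON wrapped across several rows, and its type_json field is itself a JSON document stored as a string. As a rough sketch, the Python below decodes the sample value shown above and pulls out the fields you are most likely to care about. The variable names are illustrative, and the input is the sample output from this post rather than a live CLI response.

```python
import json

# INPUT_PARAMS value copied from the sample output above, with the wrapped
# table rows joined back together. A raw string keeps the backslashes intact.
input_params = r'''{"parameters":[{"name":"date","type_text":"string","type_json":"{\"name\":\"date\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}","type_name":"STRING","type_precision":null,"type_scale":null,"type_interval_type":null,"position":0,"parameter_mode":null,"parameter_type":null,"parameter_default":null,"comment":null}]}'''

summary = []
for param in json.loads(input_params)["parameters"]:
    # type_json is a JSON document stored as a string, so decode it separately.
    type_info = json.loads(param["type_json"])
    summary.append(
        (param["position"], param["name"], param["type_name"], type_info["nullable"])
    )

print(summary)  # prints [(0, 'date', 'STRING', True)]
```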

You can call your new function using bin/uc function call, as you saw above:

> bin/uc function call --full_name unity.default.dateToQuarter --input_params "2022-08-03"

This will output the corresponding quarter as expected:

The quarter for the provided date is: Q3.
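
The Q3 result follows from the integer arithmetic in the function body: August is month 8, and (8 - 1) // 3 + 1 = 3. As a rough local check, the sketch below applies the same formula to a few dates without going through the CLI. The helper name quarter_of is hypothetical and not part of Unity Catalog.

```python
import datetime


def quarter_of(date_string: str) -> int:
    """Return the calendar quarter (1-4) for a 'YYYY-MM-DD' date string."""
    date = datetime.datetime.strptime(date_string, '%Y-%m-%d')
    # Months 1-3 -> 0, 4-6 -> 1, 7-9 -> 2, 10-12 -> 3, then shift to 1-4.
    return (date.month - 1) // 3 + 1


# Dates chosen to sit on or near quarter boundaries.
for sample in ["2022-01-15", "2022-03-31", "2022-04-01", "2022-08-03", "2022-12-31"]:
    print(sample, "-> Q" + str(quarter_of(sample)))
```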

Nice work!

Join the Project!

Unity Catalog is the industry’s only universal catalog for data and AI assets. The project is new and in active development, which means it’s a really exciting time to join! New functionality is continually being designed and implemented, and we are looking to grow the contributing community.

The Unity Catalog project is built and maintained by an open and friendly community that values kind communication and building a productive environment for maximum collaboration and fun. We welcome contributions from all developers, regardless of experience or programming background. You can write Java, Scala, or Python code, create documentation, submit bugs, or give talks to the community.

If you’d like to join the Unity Catalog project, here are some ideas to get you started:

  • Check out the Good First Issues in the GitHub repo
  • Build integrations with your favorite engines and tools
  • Contribute documentation pages to help other developers use Unity Catalog
  • Write a blog about your Unity Catalog use case and share it with fellow developers and industry experts

Join our Slack community to get started. See you there!