Data discovery at Uber: The continued success of Databook

How Uber's in-house platform powers discovery, exploration, and knowledge at scale.

Published by Anton Vasetenkov

Uber operates the world's largest ride-hailing network, serving just under 1,000 metropolitan areas across 70 countries.

As companies like Uber grow and expand their operations, so does the volume of data and associated metadata they produce every day. These technology firms place great emphasis on their data and work to make it easy for their growing data analytics teams to find the data they need.

To facilitate internal data discovery, Uber built its own dataset search and management tool called Databook. A single interface into the company's metadata graph, Databook indexes hundreds of thousands of datasets, millions of columns and fields, and hundreds of thousands of other data entities such as dashboards and pipelines.

Databook's features and architecture

At a high level, Databook ingests metadata from various sources (primary data stores, services, and crawlers) and makes it accessible to end users via a unified search interface. Users can search for indexed data entities, which are updated in real time, and view additional signals such as usage statistics and quality trends.
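
To make the ingest-then-search flow concrete, here is a minimal sketch of a toy in-memory catalogue. It is not Databook's actual code or API: the `Entity` and `Catalog` classes, the source names, and the `weekly_queries` usage signal are all hypothetical and exist only to illustrate the idea of pulling metadata from several sources into one searchable index.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """One catalogued data entity (hypothetical schema, not Databook's)."""
    name: str
    kind: str                 # e.g. "dataset", "dashboard", "pipeline"
    source: str               # where the metadata came from (assumed names)
    description: str = ""
    weekly_queries: int = 0   # illustrative usage signal


class Catalog:
    """Toy in-memory catalogue: ingest metadata, then search it."""

    def __init__(self):
        self._entities = {}

    def ingest(self, entities):
        # Upsert by (source, name) so a repeated ingest refreshes the entry.
        for e in entities:
            self._entities[(e.source, e.name)] = e

    def search(self, query, kind=None):
        # Keep entities whose name or description contains every query term,
        # then rank them by the usage signal, most used first.
        terms = query.lower().split()
        hits = []
        for e in self._entities.values():
            if kind is not None and e.kind != kind:
                continue
            text = f"{e.name} {e.description}".lower()
            if all(t in text for t in terms):
                hits.append(e)
        return sorted(hits, key=lambda e: e.weekly_queries, reverse=True)


catalog = Catalog()
# Each ingest call stands in for one metadata source.
catalog.ingest([
    Entity("trips_daily", "dataset", "warehouse", "Completed trips per day", 120),
    Entity("trips_raw", "dataset", "event_stream", "Raw trips events", 15),
])
catalog.ingest([
    Entity("city_trips_overview", "dashboard", "bi_tool", "Trips by city", 40),
])

for e in catalog.search("trips"):
    print(e.kind, e.name, e.weekly_queries)
```

Ranking by a usage signal is just one plausible choice here. A real catalogue would sit on a proper search index and combine many signals, but the shape is the same: many sources in, one searchable view out.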

Since its launch in 2016, the Databook platform has changed significantly to become more flexible and extensible and to improve the user experience. The resulting "Databook 2.0" helps users cut through the noise while still allowing them to comb through every detail when necessary.

Bottom line

Centralised data catalogues are essential for companies facing ever-increasing volumes of distributed and complex data. Uber's Databook provides a unified view of the company's data ecosystem and continues to evolve and grow with it.

By powering scalable data discovery and exploration, Databook helps Uber better manage and utilise its own data assets and ensures the global success of the data-driven enterprise.
