Everything You Need To Know About Analytics Catalogs, Data Catalogs, And Metrics Stores In One Easy Cheat-Sheet
Three technologies that are often discussed together but mistakenly treated as overlapping or interchangeable.
1. Data Catalog:
What it is:
A data catalog is a centralized repository that contains metadata about data assets within an organization. It serves as a comprehensive inventory of available data sources, datasets, databases, tables, files, and other data-related resources. The catalog provides information such as data descriptions, data lineage, data quality, usage statistics, business terms, and access permissions. The primary purpose of a data catalog is to enable data discovery, facilitate data governance, and improve data collaboration across teams. Content producers (e.g., data analysts and data scientists) are the primary users of a data catalog.
What it is not:
A repository for everything built on top of the data, such as Power BI files, Tableau workbooks, notebooks, or report and dashboard definitions. Even so, all the data used in semantic layers, business definitions, and other analytical artifacts should have lineage that is traceable via a data catalog. The exception is a data product that is produced from other source data: in that case, the definition of the data product is required to trace lineage all the way back.
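To make the lineage point concrete, here is a minimal sketch of a catalog that stores only metadata and can trace a data product back to its sources. The `CatalogEntry` class, the `trace_lineage` function, and the asset names are all hypothetical, chosen for illustration; real data catalogs expose this through their own interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one data asset; the data itself lives elsewhere."""
    name: str
    description: str
    owner: str
    # Names of the assets this one is directly produced from (hypothetical field).
    derived_from: list[str] = field(default_factory=list)


def trace_lineage(catalog: dict[str, CatalogEntry], name: str) -> list[str]:
    """Return every ancestor of `name`, nearest first, without duplicates."""
    ancestors: list[str] = []
    queue = list(catalog[name].derived_from)
    while queue:
        parent = queue.pop(0)
        if parent in ancestors:
            continue
        ancestors.append(parent)
        # A parent that is itself a data product contributes its own sources.
        if parent in catalog:
            queue.extend(catalog[parent].derived_from)
    return ancestors


# Illustrative entries only; names and owners are made up.
catalog = {
    "raw.orders": CatalogEntry("raw.orders", "Orders as exported from the shop system", "data-eng"),
    "raw.customers": CatalogEntry("raw.customers", "Customer master data", "data-eng"),
    "product.customer_orders": CatalogEntry(
        "product.customer_orders",
        "Orders joined to customers",
        "analytics-eng",
        derived_from=["raw.orders", "raw.customers"],
    ),
    "product.monthly_revenue": CatalogEntry(
        "product.monthly_revenue",
        "Revenue aggregated by month",
        "analytics-eng",
        derived_from=["product.customer_orders"],
    ),
}

print(trace_lineage(catalog, "product.monthly_revenue"))
```

Note that the lineage of `product.monthly_revenue` can only be followed back to the raw tables because the intermediate data product records what it was derived from, which is the exception described above.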
2. Metrics Store:
What it is:
A metrics store is a specialized storage system designed to be an additional, intermediate layer between the data source (database, warehouse, file) and the systems that consume it, especially BI and analytics solutions. It contains definitions built on the underlying data and forms a semantic or business layer that encourages users to access, use, and manipulate data in common ways (e.g., shared calculations and normalizations). Content producers are the primary users of a metrics store.
What it is not:
A repository for data or analytics assets. Its job is to make it easier to create reports, dashboards, and visualizations by providing reusable business and calculation definitions.
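As a rough illustration of a reusable calculation definition, the sketch below defines a metric once and turns it into SQL for whichever grouping a report needs. The `Metric` class, the `compile_metric` function, and the table and column names are assumptions made for this example; they do not reflect any particular metrics store product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One business definition, written once and reused by every report."""
    name: str
    description: str
    table: str       # source table the metric reads from (hypothetical)
    expression: str  # SQL aggregate expression (hypothetical)


# The store holds definitions only; no rows of data are kept here.
METRICS = {
    "net_revenue": Metric(
        name="net_revenue",
        description="Order amount minus refunds",
        table="sales.orders",
        expression="SUM(amount - refund_amount)",
    ),
}


def compile_metric(name: str, group_by: list[str]) -> str:
    """Build a SQL query for a stored metric, grouped by the given columns."""
    metric = METRICS[name]
    select_list = ", ".join(group_by + [f"{metric.expression} AS {metric.name}"])
    sql = f"SELECT {select_list} FROM {metric.table}"
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


print(compile_metric("net_revenue", ["region"]))
```

Two dashboards that both ask for `net_revenue` get the same calculation, even if they group it differently, which is the kind of consistency a metrics store is meant to provide.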
3. Analytics Catalog:
What it is:
An analytics catalog contains the metadata associated with analytical assets and artifacts. It provides a centralized repository for storing and organizing analytical reports, dashboards, visualizations, and other analytics-related objects from various locations and vendors. The analytics catalog helps data analysts, data scientists, business users, and all other consumers discover and access analytical assets, understand their context and business logic, and promote collaboration and reuse of analytical work within the organization. It also helps analytics and BI teams better understand usage and usability so they can focus their efforts.
What it is not:
It is not another Business Intelligence tool. It does not require access to data or replication of data. It is not a technology used to define metrics outside of other analytics systems in use.
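The sketch below shows, under simplifying assumptions, what "metadata only" can look like for an analytics catalog: each entry points to a report in the tool that owns it and records usage, but holds no data. The `AnalyticsAsset` class, the `search` function, the placeholder URLs, and the view counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AnalyticsAsset:
    """Metadata about one report or dashboard; the asset stays in its own tool."""
    title: str
    tool: str                # e.g. "Power BI" or "Tableau"
    url: str                 # link into the owning tool (placeholder values below)
    description: str
    views_last_30_days: int  # usage signal (hypothetical field)


def search(assets: list[AnalyticsAsset], term: str) -> list[AnalyticsAsset]:
    """Find assets from any tool that mention `term`, most used first."""
    term = term.lower()
    hits = [
        a for a in assets
        if term in a.title.lower() or term in a.description.lower()
    ]
    return sorted(hits, key=lambda a: a.views_last_30_days, reverse=True)


# Illustrative entries only.
assets = [
    AnalyticsAsset("Churn Analysis", "Notebook",
                   "https://example.com/notebooks/churn",
                   "Customer churn drivers", 35),
    AnalyticsAsset("Revenue Forecast", "Tableau",
                   "https://example.com/tableau/revenue-forecast",
                   "Quarterly revenue projection", 90),
    AnalyticsAsset("Monthly Revenue Dashboard", "Power BI",
                   "https://example.com/powerbi/monthly-revenue",
                   "Revenue by region and month", 420),
]

for asset in search(assets, "revenue"):
    print(asset.tool, asset.title, asset.views_last_30_days)
```

The search spans tools from different vendors and never touches the underlying data, and the usage counts give analytics and BI teams a simple signal of which assets are actually being used.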
In summary:
– Data Catalog: Contains metadata about data assets (datasets, databases, files) to facilitate data discovery and data governance.
– Metrics Store: Contains business-ready definitions of data to facilitate data consumption in analytics tools.
– Analytics Catalog: Focuses on metadata related to analytical assets, reports, and dashboards, to support analytics collaboration and reuse.
While there may be some overlap in functionality (for example, they all offer search, and they all live in the world of analytics), these three components serve different purposes and cater to different aspects of data management and analytics within an organization.