What is reference data and why is it important?

For business and technical users alike, reference data affects daily operations. To optimize data use and availability, organizations need to understand what reference data is, how it differs from master data, why it is important, and how to manage it efficiently with technology.

In his book Managing Reference Data in Enterprise Databases, Malcolm Chisholm, a world-renowned data management thought leader, defines reference data as “any data used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise.”

Reference data carries meaning. It establishes permissible values, promotes consistency, and maps internal data to external data and standards. Although it makes up a small share of total data volume, reference data accounts for 25% to 50% of the tables in databases and affects both reporting accuracy and data governance.


Examples of reference data

Many reference data assets are maintained by standards bodies like ISO or by industry consortia. Some examples are:

Reference data characterizes data and relates it to information in both internal and external databases. It can be as simple as a rule that every customer phone number in a customer relationship management (CRM) tool must have ten digits. Such defined sets rarely change, and data users rely on them consistently in lookup tables, drop-down lists and pre-filled forms.
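To make the idea of a permissible value set concrete, here is a minimal Python sketch of one lookup table that both validates incoming values and feeds a drop-down list. The `ORDER_STATUS` code set, its values, and the `validate_status` and `dropdown_options` helpers are hypothetical illustrations, not taken from any particular product.

```python
# Hypothetical reference code set: permissible order-status codes and their labels.
ORDER_STATUS = {
    "NEW": "Order received",
    "SHP": "Shipped",
    "DLV": "Delivered",
    "CAN": "Cancelled",
}


def validate_status(code: str) -> str:
    """Return the display label for a status code, rejecting unknown codes."""
    normalized = code.strip().upper()
    if normalized not in ORDER_STATUS:
        raise ValueError(f"{code!r} is not a permissible order status")
    return ORDER_STATUS[normalized]


def dropdown_options() -> list[tuple[str, str]]:
    """Build (value, label) pairs for a drop-down list, sorted by label."""
    return sorted(ORDER_STATUS.items(), key=lambda item: item[1])
```

Because the validator and the drop-down are driven by the same code set, a change to that set reaches both places at once instead of being copied by hand.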

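However, not all code sets are so cut and dried. Country codes seem simple, yet even the International Organization for Standardization (ISO) defines them in more than one way under ISO 3166: ISO 3166-1 gives each country a two-letter (alpha-2) code, a three-letter (alpha-3) code and a three-digit numeric code, so the United States is US, USA or 840 depending on which form a system uses.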
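One way to cope with several code formats for the same country is a crosswalk that resolves any of them to a single canonical form. The sketch below holds a handful of real ISO 3166-1 entries typed in by hand purely for illustration; a production system would load the full, current list from the official ISO source rather than hard-code it. The `Country` type and the `to_alpha3` function are hypothetical names.

```python
from typing import NamedTuple


class Country(NamedTuple):
    alpha2: str
    alpha3: str
    numeric: str  # kept as a string so leading zeros survive, e.g. "056"
    name: str     # common English name, not necessarily the official ISO short name


# Small hand-typed sample of ISO 3166-1 entries, for illustration only.
COUNTRIES = [
    Country("US", "USA", "840", "United States"),
    Country("DE", "DEU", "276", "Germany"),
    Country("FR", "FRA", "250", "France"),
    Country("BE", "BEL", "056", "Belgium"),
    Country("JP", "JPN", "392", "Japan"),
]

# Index every code form so any of them resolves to the same record.
_INDEX: dict[str, Country] = {}
for country in COUNTRIES:
    for key in (country.alpha2, country.alpha3, country.numeric):
        _INDEX[key] = country


def to_alpha3(code: str) -> str:
    """Normalize an alpha-2, alpha-3 or numeric country code to alpha-3."""
    key = code.strip().upper()
    if key.isdigit():
        key = key.zfill(3)  # accept "56" as Belgium's "056"
    try:
        return _INDEX[key].alpha3
    except KeyError:
        raise ValueError(f"unknown country code: {code!r}") from None
```

Picking one canonical form (alpha-3 here, as an arbitrary choice) and translating at the boundary keeps joins and reports from silently splitting one country into several.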

Reference data also changes over time, so organizations need to refresh and manage it continuously to maintain quality. For instance, country codes change an average of 3 to 5 times per year, and currency codes an average of 5 to 10 times per year.
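Because code sets change, a common pattern is to store each code with the period during which it is valid instead of overwriting or deleting it, so historical records can still be checked against the code set in force at the time. This is a minimal sketch of that effective-dating pattern; the codes `AAA` and `BBB` and their dates are placeholders, not real ISO 4217 entries.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeVersion:
    code: str
    label: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the code is still in force


# Placeholder history: code AAA was retired and replaced by BBB.
CURRENCY_HISTORY = [
    CodeVersion("AAA", "Old unit", date(2000, 1, 1), date(2022, 12, 31)),
    CodeVersion("BBB", "New unit", date(2023, 1, 1)),
]


def active_codes(as_of: date) -> set[str]:
    """Return the codes that were valid on the given date (both bounds inclusive)."""
    return {
        v.code
        for v in CURRENCY_HISTORY
        if v.valid_from <= as_of and (v.valid_to is None or as_of <= v.valid_to)
    }
```

Keeping retired codes with an end date lets a report for 2022 validate against the 2022 code set while new records are validated against the current one.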

Organizations use, customize and extend numerous existing industry ontologies to meet changing needs. When they do, they must keep their versions consistent with the original standards so that local meanings do not drift from the external semantics. Inconsistencies can impair decision making and diagnoses, and can expose the organization to liability. To avoid them and minimize the consequences of poor reference data management, organizations need robust governance practices and policies.
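Some of this drift can be caught mechanically by comparing a locally extended code set with the external standard it was derived from. The sketch below assumes both are simple code-to-label dictionaries and separates local extensions (codes the standard lacks), missing codes (codes the standard has added since the last sync) and conflicts (shared codes whose labels no longer match). The function name and the sample values are hypothetical.

```python
def compare_to_standard(local: dict[str, str], standard: dict[str, str]) -> dict[str, set[str]]:
    """Classify differences between a local code set and its source standard."""
    shared = local.keys() & standard.keys()
    return {
        # Codes the organization added on top of the standard.
        "extensions": set(local.keys() - standard.keys()),
        # Codes the standard defines that the local copy lacks.
        "missing": set(standard.keys() - local.keys()),
        # Same code, different meaning: the drift that needs a steward's review.
        "conflicts": {code for code in shared if local[code] != standard[code]},
    }


# Hypothetical sample data.
STANDARD = {"A1": "Alpha", "B2": "Beta", "C3": "Gamma"}
LOCAL = {"A1": "Alpha", "B2": "Beta (legacy)", "X9": "Internal only"}

report = compare_to_standard(LOCAL, STANDARD)
```

Extensions are often intentional, whereas conflicts usually are not, which is why the sketch reports them separately rather than as one list of differences.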

What is reference data vs. master data?

A common misconception is that reference data and master data are the same thing, but they are two distinct types of data.

Reference data is the data used to define and classify other data. Master data is the data about business entities, such as customers and products. Master data provides the context needed for business transactions.

Both reference data and master data provide context for business activities, but they differ in how they are used and implemented. Domain and subject matter experts curate reference data, administer it centrally and publish it to downstream systems. Reference data often drives control logic. It also sorts data into groups before data consumers analyze it, sometimes to unify external and internal data and sometimes to classify records into buckets for analysis.
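As an illustration of reference data classifying records into buckets before analysis, the sketch below rolls transaction amounts up by sales region through a country-to-region mapping. The region names, the mapping and the transactions are made-up sample data, and `totals_by_region` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical reference mapping from ISO 3166-1 alpha-2 codes to sales regions.
COUNTRY_TO_REGION = {
    "US": "Americas",
    "CA": "Americas",
    "DE": "EMEA",
    "FR": "EMEA",
    "JP": "APAC",
}

# Made-up transactional records: (country code, amount).
TRANSACTIONS = [("US", 120.0), ("DE", 80.0), ("CA", 40.0), ("JP", 60.0), ("ZZ", 10.0)]


def totals_by_region(transactions: list[tuple[str, float]]) -> dict[str, float]:
    """Sum amounts per region, collecting codes missing from the mapping under 'Unmapped'."""
    totals: defaultdict[str, float] = defaultdict(float)
    for country, amount in transactions:
        region = COUNTRY_TO_REGION.get(country, "Unmapped")
        totals[region] += amount
    return dict(totals)
```

Unmapped codes are counted rather than silently dropped, because a gap in the mapping is itself a reference data quality issue worth surfacing.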

Put succinctly, reference data consists of sets of values or classification schemes that are referred to by systems, applications, data stores, processes and reports, as well as by transactional and master records.

Master data, on the other hand, describes the people, places and things involved in an organization's business. Organizations apply quality rules to master data and manage it alongside their transaction structure data and enterprise structure data to create a single golden record for each entity.
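The distinction is easier to see in a small data model. In the sketch below, which uses hypothetical class and field names, the country and currency code sets are reference data, a `Customer` is master data, and an `Order` is a transaction that refers to both; the integrity check is where the two kinds of data meet.

```python
from dataclasses import dataclass

# Reference data: small, slowly changing sets of permissible values.
COUNTRY_CODES = {"US", "DE", "JP"}
CURRENCY_CODES = {"USD", "EUR", "JPY"}


@dataclass
class Customer:
    """Master data: a business entity that many transactions refer to."""
    customer_id: str
    name: str
    country_code: str  # must be a member of the COUNTRY_CODES reference set


@dataclass
class Order:
    """Transaction data: refers to master data and to reference data."""
    order_id: str
    customer_id: str
    amount: float
    currency_code: str  # must be a member of the CURRENCY_CODES reference set


def check_order(order: Order, customers: dict[str, Customer]) -> list[str]:
    """Return a list of integrity problems; an empty list means the order is valid."""
    problems = []
    customer = customers.get(order.customer_id)
    if customer is None:
        problems.append(f"unknown customer {order.customer_id}")
    elif customer.country_code not in COUNTRY_CODES:
        problems.append(f"customer has invalid country {customer.country_code}")
    if order.currency_code not in CURRENCY_CODES:
        problems.append(f"invalid currency {order.currency_code}")
    return problems
```

Here the reference sets change rarely and are shared by every record, while customers are created and updated continually; that difference in volatility and ownership is what the section above describes.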

Why is reference data important?

Reference data affects every part of the organization because it provides context for data. It affects data quality and, in turn, data usability. Efficient reference data management is necessary for organizations aiming to achieve Data Intelligence.

Reference data use cases

Organizations use reference data to address a number of use cases. For example:

Consequences of poor reference data management

Misaligned data and manual management of reference data pose many challenges and have real consequences, such as:

How do organizations manage reference data?

A reference data management tool is a mechanism that defines business processes around reference data and helps data stewards populate and manage it over time. Such a tool

Required capabilities for managing reference data

To manage reference data effectively, organizations need a suite of capabilities. An efficient reference data management solution must handle complex relationships across the enterprise. Organizations must therefore invest in a data governance solution that offers native reference data management along with lineage, stewardship and workflow capabilities to resolve inconsistencies in the data:

Managing reference data with Collibra

Many organizations use Collibra to manage their reference data. By leveraging Collibra’s products and capabilities like Data Governance, Collaborative Workflows, Data Stewardship, and more, our customers manage their reference data in the context of other initiatives and achieve data intelligence, all from one platform.


Dorian Allen

Product Marketing Analyst

Dorian J. Allen is a product marketer with experience in enterprise technology and consumer packaged goods. He holds a BA from Dartmouth College.