Get a Grip on Data Quality Dimensions


Data quality dimensions are among the most frequently used terms when explaining why data quality is important, what data quality issues can occur and how data quality can be measured. Ironically, we sometimes use the same data quality dimension term for two different things, or two different terms for the same thing. Some of the troublesome terms are:

Validity / Conformity – same same but different

Validity is most often used to describe whether data entered in a data field obeys a required format or is among a list of accepted values. Databases are usually good at enforcing this, for example by ensuring that an entered date has the requested day-month-year sequence and is a real calendar date, or by cross-checking data values against another table to see if the value exists there.

Problems arise when data is moved between databases with different rules, and when data is captured in text form before being loaded into a database.
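As a minimal sketch of the validity checks described above, the Python below tests whether a text value follows a day-month-year format and is a real calendar date, and whether a code exists in a reference list, much like a lookup against another table. Checks like these can be re-run on data captured in text form before it is loaded. The date format and the `ACCEPTED_COUNTRY_CODES` reference set are hypothetical choices for illustration.

```python
from datetime import datetime

# Hypothetical reference table of accepted values (stands in for a lookup table).
ACCEPTED_COUNTRY_CODES = {"DK", "GB", "US"}

def is_valid_date(value: str, fmt: str = "%d-%m-%Y") -> bool:
    """Valid if the text follows day-month-year and is a real calendar date."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def is_valid_code(value: str, accepted: set) -> bool:
    """Valid if the value exists in the reference list."""
    return value in accepted

print(is_valid_date("29-02-2024"))   # True: 2024 is a leap year
print(is_valid_date("31-02-2024"))   # False: right format, but not a calendar date
print(is_valid_date("2024-02-28"))   # False: wrong sequence
print(is_valid_code("DK", ACCEPTED_COUNTRY_CODES))  # True
```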

Conformity is often used to describe whether data adheres to a given standard, such as an industry or international standard. Due to complexity and other circumstances, such a standard may not be implemented, or only partly implemented, as database constraints or by other means. Therefore, a given piece of data may be a valid database value and still not comply with the standard.

For example, the code value "0,255,0" for a colour may have the accepted format, with all elements in the accepted range from 0 to 255 for an RGB colour code. But the standard for product colours may only allow "Green" and other common colour names, and "0,255,0", when translated, ends up as "Lime" or "High green".
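To illustrate the gap between validity and conformity, the sketch below accepts "0,255,0" as a well-formed RGB code but rejects it against a product-colour standard. `STANDARD_COLOUR_NAMES` is a hypothetical standard; the RGB-to-name lookup is a small excerpt of the CSS named colours (capitalised here), where 0,255,0 is "lime" and "green" is 0,128,0.

```python
import re

# Format rule: three comma-separated integers (an RGB triple).
RGB_PATTERN = re.compile(r"^(\d{1,3}),(\d{1,3}),(\d{1,3})$")

# Hypothetical product-colour standard that allows only common colour names.
STANDARD_COLOUR_NAMES = {"Black", "White", "Red", "Green", "Blue", "Yellow"}

# Small excerpt of CSS named colours, keyed by RGB triple.
RGB_TO_NAME = {"0,255,0": "Lime", "0,128,0": "Green", "255,0,0": "Red"}

def is_valid_rgb(value: str) -> bool:
    """Validity: right format and every component within 0..255."""
    match = RGB_PATTERN.match(value)
    return match is not None and all(0 <= int(c) <= 255 for c in match.groups())

def conforms_to_standard(value: str) -> bool:
    """Conformity: the value, translated to a name if it is an RGB code, is in the standard."""
    name = RGB_TO_NAME.get(value, value)
    return name in STANDARD_COLOUR_NAMES

print(is_valid_rgb("0,255,0"))          # True: a valid database value
print(conforms_to_standard("0,255,0"))  # False: translates to "Lime"
print(conforms_to_standard("Green"))    # True
```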

Accuracy / Precision – true, false or not sure

The difference between accuracy and precision is a well-known statistical subject.

In the data quality realm, accuracy is most often used to describe whether a data value correctly corresponds to the real-world entity it represents. If, for example, the postal address of the person "Robert Smith" is "123 Main Street in Anytown", this value may be accurate because this person (for the moment) lives at that address.

But if "123 Main Street in Anytown" has three apartments, each with its own mailbox, the value does not have the precision required for a given purpose, such as delivering mail to the right mailbox.

If we work with geocoordinates, we face the same challenge. An accurate geocode may be precise enough to tell the direction to the nearest supermarket, but not precise enough to know in which apartment the out-of-milk smart refrigerator is.
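One way to make the geocode example concrete is to estimate how much ground the decimal places of a stored coordinate can resolve. The sketch below assumes roughly 111 km per degree of latitude (an approximation; a degree of longitude also shrinks towards the poles), and the distance thresholds in the usage lines are hypothetical requirements for each purpose.

```python
# Approximation: one degree of latitude spans about 111 km.
METRES_PER_DEGREE_LAT = 111_000

def decimals_in(coordinate: str) -> int:
    """Number of decimal places recorded in a coordinate stored as text."""
    return len(coordinate.split(".")[1]) if "." in coordinate else 0

def resolution_m(decimals: int) -> float:
    """Approximate north-south ground resolution for that many decimal places."""
    return METRES_PER_DEGREE_LAT * 10 ** -decimals

def precise_enough(coordinate: str, required_m: float) -> bool:
    """True if the stored coordinate resolves positions to `required_m` metres or better."""
    return resolution_m(decimals_in(coordinate)) <= required_m

latitude = "55.676"  # hypothetical stored value with 3 decimals (about 111 m)
print(precise_enough(latitude, 500))  # True: enough for the direction to a supermarket
print(precise_enough(latitude, 5))    # False: not enough to single out an apartment
```

Even at higher precision, a latitude/longitude pair says nothing about which floor an apartment is on, so apartment-level precision may also need an address unit or elevation.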

Timeliness / Currency – when time matters

Timeliness is most often used to state if a given data value is present when it is needed. For example, you need the postal address of “Robert Smith” when you want to send a paper invoice or when you want to establish his demographic stereotype for a campaign.

Currency is most often used to state if the data value is accurate at a given time – for example if “123 Main Street in Anytown” is the current postal address of “Robert Smith”.
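The two terms can be told apart in code. In this minimal sketch, timeliness checks whether an address was available by the date it is needed, while currency is approximated by how recently the address was verified, because a system cannot observe real-world accuracy directly. The `AddressRecord` fields and the one-year maximum age are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class AddressRecord:
    # Hypothetical fields: the address, when it entered the system, when it was last confirmed.
    value: Optional[str]
    available_from: Optional[date]
    last_verified: Optional[date]

def is_timely(rec: AddressRecord, needed_on: date) -> bool:
    """Timeliness: the value is present in the system when it is needed."""
    return (rec.value is not None and rec.available_from is not None
            and rec.available_from <= needed_on)

def is_current(rec: AddressRecord, as_of: date, max_age: timedelta) -> bool:
    """Currency (proxy): the value was verified no longer than `max_age` before `as_of`."""
    return (rec.last_verified is not None
            and timedelta(0) <= as_of - rec.last_verified <= max_age)

rec = AddressRecord("123 Main Street, Anytown", date(2024, 1, 10), date(2024, 1, 10))
print(is_timely(rec, date(2024, 2, 1)))                        # True: there when the invoice is sent
print(is_current(rec, date(2026, 1, 1), timedelta(days=365)))  # False: too long since verification
```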

Uniqueness / Duplication – positive or negative

Uniqueness is the positive term, whereas duplication is the negative term for the same issue.

We strive for uniqueness by avoiding duplicates. In data quality lingo, duplicates are two (or more) data values describing the same real-world entity. For example, we may assume that

  • “Robert Smith at 123 Main Street, Suite 2 in Anytown”

is the same person as

  • “Bob Smith at 123 Main Str in Anytown”
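A very simplified matching sketch for the Robert/Bob example: both records are normalised (nickname to formal first name, street abbreviation to full word, unit designator removed) and then compared on the resulting key. The `NICKNAMES` and `STREET_WORDS` tables are hypothetical and tiny; dedicated matching tools often use much larger reference lists and fuzzy scoring rather than exact key equality.

```python
import re

# Hypothetical, deliberately tiny lookup tables.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}
STREET_WORDS = {"str": "street", "st": "street", "ave": "avenue"}
UNIT_PATTERN = re.compile(r",?\s*\b(suite|apt|apartment|unit)\b\s*\w+")

def normalise_name(name: str) -> str:
    """Lower-case and replace known nicknames with the formal first name."""
    return " ".join(NICKNAMES.get(part, part) for part in name.lower().split())

def normalise_street(street: str) -> str:
    """Remove unit designators such as 'Suite 2' and expand abbreviations."""
    without_unit = UNIT_PATTERN.sub("", street.lower())
    words = re.findall(r"\w+", without_unit)
    return " ".join(STREET_WORDS.get(w, w) for w in words)

def match_key(name: str, street: str, city: str) -> tuple:
    return (normalise_name(name), normalise_street(street), city.strip().lower())

def likely_duplicates(a: tuple, b: tuple) -> bool:
    """Treat two (name, street, city) records as duplicates if their keys are equal."""
    return match_key(*a) == match_key(*b)

robert = ("Robert Smith", "123 Main Street, Suite 2", "Anytown")
bob = ("Bob Smith", "123 Main Str", "Anytown")
print(likely_duplicates(robert, bob))  # True
```

Dropping "Suite 2" is what lets the two records match, but it is also a precision trade-off: if Bob Smith lives in a different suite, the records may describe two different people.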

Completeness / Existence – to be, or not to be

Completeness is most often used to tell to what degree all required data elements are populated.

Existence can be used to tell whether a given dataset has defined all the data elements needed for a given purpose.

So "Bob Smith at 123 Main Str in Anytown" is complete if we need name, street address and city, but only 75 % complete if we need name, street address, city and preferred colour, provided that preferred colour exists as a data element in the dataset.
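The 75 % figure follows from counting populated elements. A minimal sketch, assuming a record is a dictionary and the dataset's defined elements are given as a set (`SCHEMA`, hypothetical); completeness is only calculated once existence is satisfied, i.e. every required element is defined.

```python
def missing_elements(schema: set, required: list) -> list:
    """Existence: required data elements that the dataset does not define at all."""
    return [element for element in required if element not in schema]

def completeness(record: dict, required: list, schema: set) -> float:
    """Completeness: share of required elements that are populated in the record."""
    missing = missing_elements(schema, required)
    if missing:
        raise ValueError(f"Not defined in dataset: {missing}")
    populated = sum(1 for element in required if record.get(element) not in (None, ""))
    return populated / len(required)

SCHEMA = {"name", "street_address", "city", "preferred_colour"}
bob = {"name": "Bob Smith", "street_address": "123 Main Str", "city": "Anytown",
       "preferred_colour": None}

print(completeness(bob, ["name", "street_address", "city"], SCHEMA))                      # 1.0
print(completeness(bob, ["name", "street_address", "city", "preferred_colour"], SCHEMA))  # 0.75
```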

Data Quality Management 

Master Data Management (MDM) solutions and specialized Data Quality Management (DQM) tools have capabilities to assess data quality dimensions and improve data quality within the different data quality dimensions.

Check out the best solutions covering this space here on the list.

4 MDM Definitions: Which One is the Best?

What is Master Data Management (MDM)? How can we define MDM?

Well, as with everything in life, there are varying and competing definitions. Below are four different definitions:

Wikipedia: In business, Master data management (MDM) is a method used to define and manage the critical data of an organization to provide, with data integration, a single point of reference. In computing, a master data management tool can be used to support master data management by removing duplicates, standardizing data (mass maintaining), and incorporating rules to eliminate incorrect data from entering the system in order to create an authoritative source of master data. Master data are the products, accounts and parties for which the business transactions are completed.


Gartner: Master data management (MDM) is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise’s official shared master data assets. Master data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise including customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts.

SearchDataManagement: Master data management (MDM) is a comprehensive method of enabling an enterprise to link all of its critical data to a common point of reference. When properly done, MDM improves data quality, while streamlining data sharing across personnel and departments. In addition, MDM can facilitate computing in multiple system architectures, platforms and applications.

Techopedia: Master data management (MDM) refers to the management of specific key data assets for a business or enterprise. MDM is part data management as a whole but is generally focused on the handling of higher level data elements, such as broader identity classifications of people, things, places and concepts.

Your definition: Which one of the four above-mentioned definitions do you prefer? Or is there a much better fifth one?