Jumpstart Your MDM / PIM / DQM Solution Selection

The solution selection service on this site started 3 months ago as told here.

Since then, the Master Data Management (MDM) and Product Information Management (PIM) solutions have been joined by Data Quality Management (DQM) solutions, and some of the most innovative DQM solutions are now included in the listing on this site.

More than 50 requesters have provided information about the context, scope and requirements of their intended solution and, based on that, received a report telling:

  • Which solution is the best fit for a direct proof of concept
  • Which 3 solutions are the best fit for a shortlist of solutions
  • Which 7 solutions are the best fit for a longlist of solutions

Depending on your organization’s rules and the circumstances of your solution selection, this report aims to jumpstart your selection process using one of the above lists.

The requesters who have given feedback on this report have provided positive responses, as told in the post about the First Experiences with the MDM / PIM Solution Selection Service.

The service is still free. Start here.

Get a Grip on Data Quality Dimensions


Data quality dimensions are some of the most used terms when explaining why data quality is important, what data quality issues can be and how you can measure data quality. Ironically, we sometimes use the same data quality dimension term for two different things or use two different data quality dimension terms for the same thing. Some of the troubling terms are:

Validity / Conformity – same same but different

Validity is most often used to describe whether data filled into a data field obeys a required format or is among a list of accepted values. Databases are usually good at enforcing this, for example ensuring that an entered date follows the required day-month-year sequence and is an actual calendar date, or cross-checking a data value against another table to see if the value exists there.

The problems arise when data is moved between databases with different rules and when data is captured in textual forms before being loaded into a database.

Conformity is often used to describe whether data adheres to a given standard, like an industry or international standard. Due to complexity and other circumstances, this standard may not, or only partly, be implemented as database constraints or by other means. Therefore, a given piece of data may seem to be a valid database value while not being in compliance with a given standard.

For example, the colour code value “0,255,0” may be in the accepted format, with all elements in the accepted range between 0 and 255 for an RGB colour code. But the standard for a given product colour may only allow the value “Green” and the other common colour names, and “0,255,0” will, when translated, end up as “Lime” or “High green”.
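The distinction can be sketched as two separate checks, where a value can pass the first and still fail the second. This is a minimal illustration with a hypothetical standard colour list and translation table, not the API of any specific tool:

```python
import re

# Hypothetical colour names allowed by the product standard
STANDARD_COLOURS = {"Green", "Red", "Blue", "Yellow", "Black", "White"}

# Illustrative mapping from RGB triples to common colour names
RGB_NAMES = {(0, 255, 0): "Lime", (0, 128, 0): "Green", (255, 0, 0): "Red"}

def is_valid_rgb(value: str) -> bool:
    """Validity: correct 'r,g,b' format with each element in 0-255."""
    if not re.fullmatch(r"\d{1,3},\d{1,3},\d{1,3}", value):
        return False
    return all(0 <= int(part) <= 255 for part in value.split(","))

def conforms_to_standard(value: str) -> bool:
    """Conformity: the translated colour name must be an accepted value."""
    rgb = tuple(int(part) for part in value.split(","))
    return RGB_NAMES.get(rgb) in STANDARD_COLOURS

print(is_valid_rgb("0,255,0"))          # True: a perfectly valid database value
print(conforms_to_standard("0,255,0"))  # False: translates to "Lime", not "Green"
```

So “0,255,0” is valid but does not conform, which is exactly the gap between the two dimensions.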

Accuracy / Precision – true, false or not sure

The difference between accuracy and precision is a well-known statistical subject.

In the data quality realm accuracy is most often used to describe if the data value corresponds correctly to a real-world entity. If we for example have a postal address of the person “Robert Smith” being “123 Main Street in Anytown” this data value may be accurate because this person (for the moment) lives at that address.

But if “123 Main Street in Anytown” has 3 different apartments each having its own mailbox, the value does not, for a given purpose, have the required precision.

If we work with geocoordinates, we have the same challenge. A given accurate geocode may have sufficient precision to tell what the direction to the nearest supermarket is, but not be precise enough to know in which apartment the out-of-milk smart refrigerator is.

Timeliness / Currency – when time matters

Timeliness is most often used to state if a given data value is present when it is needed. For example, you need the postal address of “Robert Smith” when you want to send a paper invoice or when you want to establish his demographic stereotype for a campaign.

Currency is most often used to state if the data value is accurate at a given time – for example if “123 Main Street in Anytown” is the current postal address of “Robert Smith”.
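One common way to track currency is to store address versions with validity periods, so the value that is accurate at a given time can be looked up. A minimal sketch, with hypothetical example data:

```python
from datetime import date

# Hypothetical address history for one person, with validity periods
address_history = [
    {"address": "45 Oak Avenue, Oldtown",
     "valid_from": date(2015, 1, 1), "valid_to": date(2019, 6, 30)},
    {"address": "123 Main Street, Anytown",
     "valid_from": date(2019, 7, 1), "valid_to": None},  # still current
]

def current_address(history, on_date):
    """Currency: return the address value that is accurate on the given date."""
    for record in history:
        ends = record["valid_to"]
        if record["valid_from"] <= on_date and (ends is None or on_date <= ends):
            return record["address"]
    return None  # a timeliness issue: no address available when it is needed

print(current_address(address_history, date(2018, 3, 1)))   # 45 Oak Avenue, Oldtown
print(current_address(address_history, date(2023, 1, 1)))   # 123 Main Street, Anytown
```

Returning `None` for a date with no known address is where currency shades into timeliness: the value is simply not present when needed.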

Uniqueness / Duplication – positive or negative

Uniqueness is the positive term, while duplication is the negative term, for the same issue.

We strive to have uniqueness by avoiding duplicates. In data quality lingo duplicates are two (or more) data values describing the same real-world entity. For example, we may assume that

  • “Robert Smith at 123 Main Street, Suite 2 in Anytown”

is the same person as

  • “Bob Smith at 123 Main Str in Anytown”
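Spotting that these two records describe the same person typically requires normalization (nicknames, abbreviations) followed by fuzzy comparison. This sketch uses Python's standard-library `difflib` with hypothetical nickname and abbreviation tables; real matching engines use far richer rules:

```python
import re
from difflib import SequenceMatcher

NICKNAMES = {"bob": "robert", "bill": "william"}    # assumed nickname table
ABBREVIATIONS = {"str": "street", "ave": "avenue"}  # assumed abbreviation table

def normalize(text: str) -> str:
    """Lowercase, tokenize, and expand nicknames and abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [NICKNAMES.get(t, ABBREVIATIONS.get(t, t)) for t in tokens]
    return " ".join(tokens)

def is_duplicate_candidate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two party records as duplicate candidates after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(is_duplicate_candidate(
    "Robert Smith at 123 Main Street, Suite 2 in Anytown",
    "Bob Smith at 123 Main Str in Anytown"))  # True
```

After normalization the two strings differ only in the missing suite number, so the similarity ratio comfortably clears the threshold.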

Completeness / Existence – to be, or not to be

Completeness is most often used to tell to what degree all required data elements are populated.

Existence can be used to tell if a given dataset has all the needed data elements for a given purpose defined.

So “Bob Smith at 123 Main Str in Anytown” is complete if we need name, street address and city, but only 75 % complete if we need name, street address, city and preferred colour, and preferred colour exists as a data element in the dataset.
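The 75 % figure is simply the share of required data elements that are populated. A minimal sketch with hypothetical field names:

```python
# Hypothetical required data elements for a given purpose
REQUIRED_FIELDS = ["name", "street_address", "city", "preferred_colour"]

record = {
    "name": "Bob Smith",
    "street_address": "123 Main Str",
    "city": "Anytown",
    "preferred_colour": None,  # the element exists but is not populated
}

def completeness(record, required):
    """Fraction of required data elements that hold a non-empty value."""
    filled = sum(1 for field in required if record.get(field) not in (None, ""))
    return filled / len(required)

print(f"{completeness(record, REQUIRED_FIELDS):.0%}")  # 75%
```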

Data Quality Management 

Master Data Management (MDM) solutions and specialized Data Quality Management (DQM) tools have capabilities to assess data quality dimensions and improve data quality within the different data quality dimensions.

Check out the range of the best solutions to cover this space here on the list.

Data Matching vs Deduplication

The two terms data matching and deduplication are often used synonymously.

In the data quality world deduplication is used to describe a process where two or more data records that describe the same real-world entity are merged into one golden record. This can be executed in different ways as told in the post Three Master Data Survivorship Approaches.

Data matching can be seen as an overarching discipline to deduplication. Data matching is used to identify the duplicate candidates in deduplication. Data matching can also be used to identify matching data records between internal and external data sources as examined in the post Third-Party Data Enrichment in MDM and DQM.

As an end-user organization you can implement data matching / deduplication technology from either pure play Data Quality Management (DQM) solution providers or through data management suites and Master Data Management (MDM) solutions as reported in the post DQM Tools In and Around MDM Tools.

When matching internal data records against external sources, one often-used approach is to utilize the data matching capabilities of the third-party data provider. Providers such as Dun & Bradstreet (D&B), Experian and others offer this service in addition to offering the third-party data.

To close the circle, end-user organizations can use the external data matching results to improve internal deduplication and more. One example is to apply a matched DUNS number from D&B for company records as a strong duplicate candidate selection criterion. In addition, such data matching results may often lead not to a deduplication, but to building hierarchies of master data.
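Using a matched DUNS number as a strong candidate selection criterion can be as simple as grouping records that share the same externally matched identifier. A minimal sketch with hypothetical records and DUNS values:

```python
from collections import defaultdict

# Hypothetical company records already matched against D&B,
# each carrying the DUNS number returned by the external match
records = [
    {"id": 1, "name": "Acme Corp",        "duns": "150483782"},
    {"id": 2, "name": "Acme Corporation", "duns": "150483782"},
    {"id": 3, "name": "Globex Inc",       "duns": "804735132"},
]

def duns_duplicate_candidates(records):
    """Group records sharing a matched DUNS number as strong duplicate candidates."""
    groups = defaultdict(list)
    for record in records:
        if record.get("duns"):  # skip records with no external match
            groups[record["duns"]].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

print(duns_duplicate_candidates(records))  # [[1, 2]]
```

Records 1 and 2 end up in the same group and can then go through survivorship, while the same grouping logic applied to parent DUNS numbers would instead feed hierarchy building.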


Third-Party Data Enrichment in MDM and DQM

An often-requested capability in Master Data Management (MDM) and Data Quality Management (DQM) is data enrichment from – and verification against – third-party data providers. The data providers can be government data providers, commercial data providers and open data providers.

The two most commonly used scenarios are:

  • Data enrichment from – and verification against – business directories
  • Verification against – and enrichment from – address directories

Business directory integration

Integration with business directories is done with party master data such as B2B customers and suppliers. The aim is often to enrich already gathered internal master data with external data such as:

  • Industry sector codes such as SIC or NACE codes
  • Company family trees
  • Credit worthiness supporting data

Sometimes you may also want to (conditionally) overwrite – or supplement – internally gathered data such as:

  • Company name
  • Addresses
  • Phone numbers

You may also want to verify that a business exists and catch when a business dissolves.

Integration can be done with:

  • Global business directories, where Dun & Bradstreet is the most prominent. The advantage here is a uniform integration point and data structure.
  • National directories for each country often supplied by a government body. The advantage here is localized data fit for national requirements and optimal freshness.

Address verification

Verifying a postal address – and translating it into a standard format – is done with location master data that most often is part of party master data, with emphasis on B2C customer data.

Also, in this case there are global versus national options.

Some MDM / DQM providers have their own global services. Examples are Informatica, who acquired the service called Address Doctor, and IBM. Other MDM / DQM providers utilize the service called Loqate. The advantage here is a uniform integration point and data structure.

In many countries there are also national services that provide richer and localized data with optimal freshness. The richness may be multi-language versions, granular structures feasible in that country and property data, such as what kind of building exists at that address.

A common enrichment type is also getting the geocodes related to a postal address.

Your requirements

Your prioritization of business directory integration and address verification is part of the selection criteria here on the site in the Select your solution service.


Five Essential MDM / PIM Capabilities

Many of the recent posts here on the blog have been around some of the most essential capabilities that Master Data Management (MDM) and Product Information Management (PIM) solutions are able to provide.

Data Matching

Having the ability to match and link master data records that are describing the same real-world entity is probably most useful in MDM and in the context of party master data. However, there are certainly also scenarios where product master data must be matched. While identifying the duplicates is hard enough, there must also be functionality to properly settle the match as explained in the post Three Master Data Survivorship Approaches.

Workflow Management

While the first MDM / PIM solutions emphasized storing “a single source of truth” for master data, most tools today also provide functionality for processing master data. This is offered through integrated workflows as examined in the post Master Data Workflow Management.

Hierarchy Management

Master data comes in hierarchies (and even graphs). Examples are company family trees, locations and product classifications as told in the post Hierarchy Management in MDM and PIM.

Handling Multiple Cultures

If your solution will be implemented across multiple countries – and even in countries with multiple languages – you must be able to manage versions of master data and product information in these languages and often also represented in multiple alphabets and script systems. This challenge is described in the post Multi-Cultural Capabilities in MDM, PIM and Data Quality Management.

Reference Data Management

The terms master data and reference data are sometimes used synonymously. The post What is Reference Data Management (RDM)? is about what is usually considered special about reference data. Some MDM (and PIM) solutions also encompass the handling of reference data.

The Capabilities That You Need

The above-mentioned capabilities are just some of the requirements you can mark in a service that can draft a list of MDM/PIM/DQM tools that are most relevant for you. Try it here: Select your solution.