Interenterprise MDM and Data Quality Management

When working with Data Quality Management (DQM) within Master Data Management (MDM) there are three kinds of data to consider:

1st-party data is the data that is born and managed internally within the enterprise. This data has traditionally been the focus of data quality methodologies and tools, with the aim of ensuring that data is fit for the purpose of use and correctly reflects the real-world entity that the data is describing.

3rd-party data is data sourced from external providers who offer a set of data that can be utilized by many enterprises. Examples are location directories, business directories such as the Dun & Bradstreet Worldbase and public national directories, and product data pools such as the Global Data Synchronization Network (GDSN).

Enriching 1st-party data with 3rd-party data is a means of ensuring better data completeness, better data consistency, and better data uniqueness.

2nd-party data is data sourced directly from a business partner. Examples are supplier self-registration, customer self-registration and inbound product data syndication. Exchange of this data is also called interenterprise data sharing.

The advantage of using 2nd-party data from a data quality perspective is that you are closer to the source, which, all things being equal, means that the data more accurately reflects the real-world entity that it describes.

In addition, compared to 3rd-party data, you will also have the opportunity to operate with data that exactly fits your operating model and makes you unique compared to your competitors.

Finally, 2nd-party data obtained through interenterprise data sharing will reduce the cost of capturing data compared to 1st-party data, where the ever-increasing demand for more elaborate high-quality data in the age of digital transformation would otherwise overwhelm your organization.

The Balancing Act

Getting optimal data quality with the least effort is about balancing the use of internal and external data. Here you can exploit interenterprise data sharing via interenterprise MDM by combining 2nd-party and 3rd-party data in the way that makes the most sense for your organization.

Interenterprise MDM is an emerging discipline in the data management world and one of the topics you can find on the Resource List at this site.

What is MDM? – and the Adjacent Disciplines?

This site is a list of solutions for MDM and the disciplines adjacent to MDM. As always, it is good to have a definition of what we are talking about. So, here are some definitions of MDM and an introduction to 9 adjacent disciplines:

Def MDM

MDM: Master Data Management can be defined as a comprehensive method of enabling an enterprise to link all of its critical data to a common point of reference. When properly done, MDM improves data quality, while streamlining data sharing across personnel and departments. In addition, MDM can facilitate computing in multiple system architectures, platforms and applications. You can find the source of this definition and 3 other – somewhat similar – definitions in the post 4 MDM Definitions: Which One is the Best?

The most addressed master data domains are parties, encompassing customer, supplier and employee roles; things, such as products and assets; and locations.

Def PIM

PIM: Product Information Management is a discipline that overlaps MDM. In PIM you focus on product master data and a long tail of specific product information – often called attributes – that is needed for a given classification of products.

Furthermore, PIM deals with how products are related, for example as accessories, replacements and spare parts, as well as the cross-sell and up-sell opportunities between products.

PIM also handles how products have digital assets attached.

This data is used in omni-channel scenarios to ensure that the products you sell are presented with consistent, complete and accurate data. Learn more in the post Five Product Information Management Core Aspects.

Def DAM

DAM: Digital Asset Management is about handling extended features of digital assets often related to master data and especially product information. The digital assets can be photos of people and places, product images, line drawings, certificates, brochures, videos and much more.

Within DAM you are able to apply tags to digital assets, you can convert between the various file formats and you can keep track of the different format variants – like sizes – of a digital asset.

You can learn more about how these first 3 mentioned TLAs are connected in the post How MDM, PIM and DAM Stick Together.

Def DQM

DQM: Data Quality Management deals with assessing and improving the quality of data in order to make your business more competitive. It is about making data fit for the intended (multiple) purpose(s) of use, which most often is best achieved by real-world alignment. It is about people, processes and technology. When it comes to technology there are different implementations as told in the post DQM Tools In and Around MDM Tools.

The most used technologies in data quality management are data profiling, which measures what the stored data looks like, and data matching, which links data records that do not have the same values but describe the same real-world entity.
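As a rough illustration of what data profiling measures, the sketch below computes completeness, distinct values and value patterns for a single column. The function name profile_column and the sample postal codes are assumptions made for this example and are not taken from any particular DQM tool.

```python
from collections import Counter


def profile_column(values):
    """Return simple profiling metrics for one column of raw values."""
    total = len(values)
    # Treat None and blank strings as missing values.
    non_empty = [v for v in values if v is not None and str(v).strip() != ""]

    def pattern(value):
        # Reduce a value to its shape: letters become A, digits become 9.
        return "".join(
            "A" if ch.isalpha() else "9" if ch.isdigit() else ch
            for ch in str(value)
        )

    return {
        "count": total,
        "completeness": len(non_empty) / total if total else 0.0,
        "distinct": len(set(non_empty)),
        "patterns": Counter(pattern(v) for v in non_empty),
    }


# Hypothetical sample column with a short code, a foreign format and gaps.
postal_codes = ["12345", "1234", "", "12345", "AB1 2CD", None]
report = profile_column(postal_codes)
print(report["completeness"])  # share of filled values
print(report["patterns"])      # frequency of each value shape
```

A pattern that occurs only once or twice in a large column is often a good starting point for inspecting suspicious values.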

Def RDM

RDM: Reference Data Management encompasses those typically smaller lists of data records that are referenced by master data and transaction data. These lists do not change often. They tend to be externally defined but can also be internally defined within each organization.

Examples of reference data are hierarchies of location references such as countries, states/provinces and postal codes, different industry code systems and how they map to each other, and the many product classification systems to choose from.

Learn more in the post What is Reference Data Management (RDM)?

Def CDI

CDI: Customer Data Integration is considered the predecessor to MDM, as the first MDMish solutions focused on federating customer master data handled in multiple applications across the IT landscape within an enterprise.

The most addressed sources of customer master data are CRM applications and ERP applications. However, most enterprises have several other applications where customer master data is captured.

You may ask: What Happened to CDI?

Def CDP

CDP: Customer Data Platform is an emerging kind of solution that provides a centralized registry of all data related to parties regarded as (prospective) customers at an enterprise.

In that way CDP goes far beyond customer master data by encompassing traditional transaction data related to customers and the emerging big data sources too.

Right now, we see such solutions coming both from MDM solution vendors and CRM vendors as reported in the post CDP: Is that part of CRM or MDM?

Def ADM

ADM: Application Data Management is about not just master data, but all critical data that is somehow shared between personnel and departments. In that sense MDM covers all master data within an organization, ADM covers all (critical) data in a given application, and the intersection is the master data in a given application.

ADM is an emerging term and we still do not have a well-defined market – if there ever will be one – as examined in the post Who are the ADM Solution Providers?

Def PXM

PXM: Product eXperience Management is another emerging term that describes a trend of positioning PIM solutions away from the MDM flavour and more towards digital experience / customer experience themes.

In PXM the focus is on personalization of product information, Search Engine Optimization and exploiting Artificial Intelligence (AI) in those quests.

Read more about it in the post What is PxM?

Def PDS

PDS: Product Data Syndication connects MDM, PIM (and other) solutions at each trading partner with each other within business ecosystems. Product data syndication is often the first wave of embracing interenterprise data sharing. You can get the details in the post What is Product Data Syndication (PDS)?

Duplicates vs Nodes in MDM Hierarchies

Identification of duplicate records is a core capability in both Data Quality Management (DQM) and in Master Data Management (MDM).

When you inspect records identified as duplicate candidates, you will often have to decide if they describe the same real-world entity or if they describe two real-world entities belonging to the same hierarchy.

Instead of throwing away the latter result, this link can also be stored in the MDM hub as a relation in a hierarchy (or graph) and thus support a broader range of operational and analytic purposes.

Individual Persons and Households

In business-to-consumer (B2C) scenarios a key challenge is to have a 360-degree view of private customers, either as individual persons or as a household with a shared economy.

Here you must be able to distinguish between the individual person, the household and people who just happen to live at the same postal address. The location hierarchy plays a role in solving this case. This quest includes having precise addresses when identifying units in large buildings and knowing the kind of building. The probability of two John Smith records being the same person differs if it is a single-family house address or the address of a nursing home.

Companies / Organizations in Company Family Trees

In business-to-business (B2B) scenarios a key challenge is to have a 360-degree view of these customers. Similar 360 scenarios exist with suppliers and other business partners.

Organizations can belong to a company family tree. A basic representation, used for example in the Dun & Bradstreet Worldbase, is having branches at a postal address. These branches belong to a legal entity with a headquarters at a given postal address, where there may be other individual branches too. Each legal entity in an enterprise may have a national ultimate mother. In multinational enterprises, there is a global ultimate mother. Public organizations have similar, often very complex, trees.
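A minimal sketch of how such a family tree could be held as parent links and used to tell whether two matched records are nodes in the same hierarchy rather than duplicates. The identifiers and the helper names global_ultimate and same_family are hypothetical and chosen for this example only; they do not reflect how any directory actually stores its data.

```python
# Hypothetical organization records: each one points to its parent,
# and the global ultimate mother has no parent.
parents = {
    "branch-1": "legal-entity-dk",
    "branch-2": "legal-entity-dk",
    "legal-entity-dk": "national-ultimate-dk",
    "national-ultimate-dk": "global-ultimate",
    "legal-entity-us": "global-ultimate",
    "global-ultimate": None,
    "independent-shop": None,
}


def global_ultimate(org_id, parents):
    """Walk up the parent links until the top of the family tree."""
    seen = set()
    current = org_id
    while parents[current] is not None:
        if current in seen:
            raise ValueError("cycle detected in the hierarchy")
        seen.add(current)
        current = parents[current]
    return current


def same_family(a, b, parents):
    """Two organizations are related if they share a global ultimate."""
    return global_ultimate(a, parents) == global_ultimate(b, parents)


print(global_ultimate("branch-1", parents))               # global-ultimate
print(same_family("branch-2", "legal-entity-us", parents))  # True
print(same_family("branch-1", "independent-shop", parents))  # False
```

Storing the link this way keeps both records while still letting operational and analytic processes roll figures up to the legal entity, the national ultimate or the global ultimate.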

Products by Variant and Sourcing

Products are also formed in hierarchies. The challenge is to identify if a given product record points to a certain level in the bottom part of a given product hierarchy. Products can have variants in size, colour and more. A product can be packed in different ways. The most prominent product identifier is the Global Trade Item Number (GTIN), which occurs in various representations, for example the Universal Product Code (UPC) popular in North America and the European (now International) Article Number (EAN) popular in Europe. These identifiers are applied by each producer (and in some cases distributor) at the product packing variant level.

Another uniqueness issue for products is around what is called multi-sourcing, meaning that the same product from the same original manufacturer can be sourced through more than one supplier, each with its own pricing, discount model, terms of delivery and terms of payment.
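Because UPC and EAN codes are representations of the same GTIN scheme, product records can be compared after validating the check digit and padding to 14 digits. The sketch below uses the standard GTIN mod-10 check digit calculation; the function names are assumptions made for this example.

```python
def gtin_check_digit(body):
    """Compute the check digit for a GTIN body (all digits except the last).

    Starting from the rightmost body digit, weights alternate 3, 1, 3, ...
    """
    total = 0
    for position, ch in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(ch) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code):
    """Accept GTIN-8, UPC-A (12), EAN-13 and GTIN-14 with a correct check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


def to_gtin14(code):
    """Left-pad a valid shorter code with zeros so all variants compare equal."""
    if not is_valid_gtin(code):
        raise ValueError("not a valid GTIN: " + code)
    return code.zfill(14)


print(is_valid_gtin("4006381333931"))  # True  (EAN-13)
print(is_valid_gtin("036000291452"))   # True  (UPC-A)
print(to_gtin14("036000291452"))       # 00036000291452
print(to_gtin14("0036000291452"))      # 00036000291452 (same product)
```

Note that an equal GTIN only tells you that two records point to the same packing variant. It does not resolve which level of your own product hierarchy the record belongs to.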

Solutions Available

When looking for a solution to support you in this conundrum the best fit for you may be a best-of-breed Data Quality Management (DQM) tool and/or a capable Master Data Management (MDM) platform.

This Disruptive MDM / PIM / DQM List has the most innovative candidates here.

How to Detect and Deal with Duplicates

In the data management world, duplicates are two (or more) records with different values but describing the same real-world entity. The most common occurrence is probably having two records describing the same person as for example:

  • Bob Smith at 1 Main Str in Anytown
  • Robert Smith at One Main Street in Any Town

Having duplicates is a well-known pain-point in business scenarios and efforts to remove that pain-point are going on around the clock in organizations across the planet.

Build or Buy?

Some efforts are done by building duplicate detection procedures in-house and some are done by buying a tool for that.

Home-grown solutions often rely on publicly available algorithms like edit distance and Soundex. However, they are small efforts against a huge problem.

When trying to detect duplicates you will run into false positives, which are results that indicate that two (or more) records describe the same real-world entity when in fact they do not, as exemplified in the post Famous False Positives. Moreover, there could be heaps of false negatives, which are records that do describe the same real-world entity but are not detected by the algorithm.
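To make the home-grown approach concrete, here is a minimal sketch of the two algorithms applied to the example records above. The implementations follow the textbook Levenshtein distance and American Soundex rules; the function names are chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name):
    """American Soundex: first letter plus three digits."""
    letters = [ch for ch in name.upper() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            result += code
        if ch not in "HW":  # H and W do not separate equal codes
            previous = code
    return (result + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))   # S530 S530
print(soundex("Bob"), soundex("Robert"))    # B100 R163
print(edit_distance("Bob", "Robert"))       # 4
print(edit_distance("Str", "Street"))       # 3
```

Neither algorithm links Bob to Robert, and the abbreviation Str is three edits away from Street, so on their own these techniques miss a duplicate that is obvious to a human reader.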

Enhanced Approaches

Machine Learning (ML) and Artificial Intelligence (AI) have been used for years to avoid false positives and false negatives in deduplication and the underlying data matching. With the recent rise of ML/AI this approach has become more common. Today we see data matching tools relying heavily on ML/AI approaches.

Another enhanced approach you can find in tools on the market is utilizing external data in the quest for detecting duplicates. This way of overcoming the obstacles is described in the post Using External Data in Data Matching.

A huge number of false negatives stems not only from limitations in comparing records and detecting their similarity, but also from failing to bring the right records up for comparison. If you have more than a few thousand records in play, you need an initial candidate selection procedure in place, as pondered in the post Candidate Selection in Deduplication.
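One common form of candidate selection is blocking, where only records sharing a key are compared. The sketch below is an assumption about how such a step might look and is not the method from the referenced post. The blocking key (first letter of the surname plus postal code) and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first letter of the surname plus the postal code.
    return (record["surname"][:1].upper(), record["postal_code"])


def candidate_pairs(records):
    """Return the id pairs that share a blocking key and deserve comparison."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


records = [
    {"id": 1, "surname": "Smith", "postal_code": "12345"},
    {"id": 2, "surname": "Smyth", "postal_code": "12345"},
    {"id": 3, "surname": "Jones", "postal_code": "12345"},
    {"id": 4, "surname": "Smith", "postal_code": "54321"},
]

all_pairs = len(records) * (len(records) - 1) // 2
print(all_pairs)                 # 6 comparisons without candidate selection
print(candidate_pairs(records))  # [(1, 2)]
```

The trade-off is visible in record 4: if that Smith has simply moved, the chosen key keeps the pair out of comparison and produces a false negative. Running several passes with different keys is one way to reduce that risk.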

Going All the Way

When you have found your matching records, the next question is what to do with them. This may include forming a golden record and/or placing the results in master data hierarchies. How you can do that was elaborated in the post Deduplication as Part of MDM.

All in all it is not advisable to do all this at home (in-house in an organization) as there are tools on the market built upon years of experience with solving the issues and going all the way.
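As a sketch of what forming a golden record can involve, the example below applies one simple survivorship rule: for each attribute, take the most recently updated non-empty value. The rule, the field names and the sample values are assumptions for illustration. Real MDM platforms typically offer configurable rules per attribute and per source.

```python
def golden_record(records, attributes):
    """Merge matched records, preferring the newest non-empty value."""
    # ISO dates sort correctly as strings, newest first.
    ordered = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for attribute in attributes:
        golden[attribute] = next(
            (r[attribute] for r in ordered if r.get(attribute)), None
        )
    return golden


# Hypothetical matched records for the same person.
matched = [
    {"name": "Bob Smith", "address": "1 Main Str, Anytown",
     "phone": "", "updated": "2021-06-15"},
    {"name": "Robert Smith", "address": "One Main Street, Any Town",
     "phone": "555-0100", "updated": "2019-03-01"},
]

print(golden_record(matched, ["name", "address", "phone"]))
# {'name': 'Bob Smith', 'address': '1 Main Str, Anytown', 'phone': '555-0100'}
```

The newest record wins for name and address, while the phone number survives from the older record because the newer one has none.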

Find the most innovative candidates here.

The Rise of Interenterprise MDM

The recent Gartner Magic Quadrant for Master Data Management Solutions has this strategic planning assumption:

By 2023, organizations with shared ontology, semantics, governance and stewardship processes to enable interenterprise data sharing will outperform those that don’t.

Interenterprise data sharing must be leveraged through interenterprise MDM, where master data is shared between many companies, for example in supply chains. The evolution of interenterprise MDM and the current state of the discipline was touched on in the post MDM Terms In and Out of The Gartner 2020 Hype Cycle.

In the 00’s the evolution of Master Data Management (MDM) started with single domain / departmental solutions dominated by Customer Data Integration (CDI) and Product Information Management (PIM) implementations. These solutions were in the best cases underpinned by third party data sources such as business directories, for example the Dun & Bradstreet (D&B) Worldbase, and second party product information sources, for example the GS1 Global Data Synchronization Network (GDSN).

In the previous decade multidomain MDM with enterprise-wide coverage became the norm. Here the solution typically encompasses customer-, vendor/supplier-, product- and asset master data. Increasingly GDSN is supplemented by other forms of Product Data Syndication (PDS). Third party and second party sources are delivered in the form of Data as a Service that comes with each MDM solution.

In this decade we will see the rise of interenterprise MDM where the solutions to some extent become business ecosystem wide, meaning that you will increasingly share master data and possibly the MDM solutions with your business partners, or else you will fade in the wake of the overwhelming data load you will have to handle yourself.