What is Data Matching and Deduplication?

The two terms data matching and deduplication are often used synonymously.

In the data quality world, deduplication describes a process where two or more data records that describe the same real-world entity are merged into one golden record. This can be done in different ways, as described in the post Three Master Data Survivorship Approaches.

Data matching can be seen as an overarching discipline that encompasses deduplication. In deduplication, data matching is used to identify the duplicate candidates. Data matching can also be used to identify matching data records between internal and external data sources, as examined in the post Third-Party Data Enrichment in MDM and DQM.
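As a minimal sketch of how data matching can surface duplicate candidates, the Python snippet below normalizes names and compares every pair with the standard library's `difflib.SequenceMatcher`. The normalization rules, the 0.85 threshold and the record layout are assumptions for illustration; real matching engines compare many more attributes (addresses, identifiers, phonetic keys) and use blocking to avoid comparing every pair.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio (0.0 to 1.0) of the two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def duplicate_candidates(records: dict, threshold: float = 0.85) -> list:
    """Compare every pair of records and keep pairs scoring at or above the threshold."""
    pairs = []
    for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2):
        score = similarity(name_a, name_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs


# Hypothetical sample records keyed by internal id
records = {1: "Acme Corp.", 2: "ACME Corp", 3: "Globex Ltd"}
print(duplicate_candidates(records))  # [(1, 2, 1.0)]
```

The candidate pairs are only suggestions; deciding whether they really are duplicates, and how to merge them, is the deduplication step.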

As an end-user organization, you can implement data matching / deduplication technology either from pure-play Data Quality Management (DQM) solution providers or through data management suites and Master Data Management (MDM) solutions, as reported in the post DQM Tools In and Around MDM Tools.

When matching internal data records against external sources, a common approach is to use the data matching capabilities of the third-party data provider. Providers such as Dun & Bradstreet (D&B), Experian and others offer this service in addition to the third-party data itself.

To close the circle, end-user organizations can use the results of external data matching to improve internal deduplication and more. One example is to use the DUNS Numbers matched from D&B for company records as a strong criterion for selecting deduplication candidates. In addition, such data matching results often lead not to deduplication but to building hierarchies of master data.
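Using a matched DUNS Number as a candidate selection criterion can be sketched as a simple blocking step: internal records are grouped by the DUNS Number returned from external matching, and records that share one become strong duplicate candidates. The field names `id` and `duns` and the sample numbers below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidates_by_duns(records: list) -> list:
    """Pair up internal records that were matched to the same DUNS Number.

    Records without a matched DUNS Number are skipped here and would be
    left to other matching rules.
    """
    by_duns = defaultdict(list)
    for rec in records:
        if rec.get("duns"):
            by_duns[rec["duns"]].append(rec["id"])
    # Every pair within a DUNS group is a strong duplicate candidate
    return [pair for ids in by_duns.values() for pair in combinations(sorted(ids), 2)]


# Hypothetical records after external matching (DUNS values are made up)
records = [
    {"id": "C1", "duns": "123456789"},
    {"id": "C7", "duns": "123456789"},
    {"id": "C3", "duns": None},
    {"id": "C4", "duns": "987654321"},
]
print(candidates_by_duns(records))  # [('C1', 'C7')]
```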

This site has a list of the most innovative providers of data matching and deduplication tools, ranging from best-of-breed specialists in Artificial Intelligence (AI) powered data matching and deduplication to Master Data Management (MDM) solutions that include data matching and deduplication capabilities. Check the list here.

What is a Golden Record within Data Management?

The term golden record is a core concept within Master Data Management (MDM) and Data Quality Management (DQM). A golden record is a representation of a real-world entity. It may be compiled from multiple different representations of that entity held in a single database or in several databases across the enterprise system landscape.

A golden record is optimized towards meeting data quality dimensions such as:

  • Being a unique representation of the real world entity described
  • Having a complete description of that entity covering all purposes of use in the enterprise
  • Holding the most current and accurate data values for the entity described

In Multidomain MDM we work with a range of different entity types, such as party (with customer, supplier, employee and other roles), location, product and asset. The golden record concept applies to all of these entity types, but in slightly different ways.

Party Golden Record

Having a golden record that facilitates a single view of a customer is probably the best-known example of the golden record concept. Managing customer records and dealing with their duplicates is the most frequent data quality issue there is.

The best approach is to prevent duplicate records from entering your MDM world. If you are not able to do that, you have to apply data matching capabilities. When a duplicate is identified, you must be able to intelligently merge any conflicting views into a golden record, as examined in the post Three Master Data Survivorship Approaches.
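The merge step can be sketched with one simple survivorship rule, assumed here for illustration: for each attribute, the most recently updated non-empty value survives. This is only one of several possible approaches, and the `updated` field and record layout are hypothetical.

```python
from datetime import date


def merge_golden_record(duplicates: list) -> dict:
    """Merge duplicate records into one golden record.

    Assumed survivorship rule: for each attribute, the most recently
    updated non-empty value wins. The "updated" field itself is not copied.
    """
    golden = {}
    # Process oldest first, so newer non-empty values overwrite older ones
    for rec in sorted(duplicates, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if field != "updated" and value not in (None, ""):
                golden[field] = value
    return golden


# Two hypothetical duplicates of the same person
duplicates = [
    {"name": "John Smith", "phone": "555-0100", "email": "", "updated": date(2019, 5, 1)},
    {"name": "John E. Smith", "phone": None, "email": "john@example.com", "updated": date(2020, 2, 1)},
]
print(merge_golden_record(duplicates))
# {'name': 'John E. Smith', 'phone': '555-0100', 'email': 'john@example.com'}
```

Note that the older phone number survives because the newer record has no value for it; a stricter rule might instead prefer values from the most trusted source system.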

To a lesser degree, we see the same challenges in getting a single view of suppliers, and ultimately you will want a single view of every business partner, including where the same real-world entity has customer, supplier and other roles in relation to your organization.

There are party identification systems available. Most countries have national ID systems for both citizens (although in most countries their use is largely restricted to public administration) and organizations. The Legal Entity Identifier (LEI) is slowly gaining ground in financial services. There are also commercial organization identifiers such as the DUNS Number.
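Identifiers such as the LEI carry check digits that can be verified before they are used for matching. Per ISO 17442 an LEI has 20 characters: 18 alphanumeric characters followed by two check digits computed with ISO/IEC 7064 MOD 97-10, the same scheme IBAN uses. The minimal sketch below checks format and checksum only, not whether the LEI is actually registered; the sample prefix in the usage is made up.

```python
import re

# 18 alphanumeric characters followed by 2 numeric check digits
LEI_PATTERN = re.compile(r"[0-9A-Z]{18}[0-9]{2}")


def _to_digits(code: str) -> str:
    """MOD 97-10 conversion: digits stay as-is, letters A..Z become 10..35."""
    return "".join(str(int(ch, 36)) for ch in code)


def lei_check_digits(prefix: str) -> str:
    """Compute the two check digits for an 18-character LEI prefix."""
    remainder = int(_to_digits(prefix.upper() + "00")) % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei: str) -> bool:
    """Format check plus checksum test: the full number mod 97 must equal 1."""
    lei = lei.strip().upper()
    if not LEI_PATTERN.fullmatch(lei):
        return False
    return int(_to_digits(lei)) % 97 == 1


prefix = "HYPOTHETICAL000001"  # made-up prefix, not a registered entity
lei = prefix + lei_check_digits(prefix)
print(is_valid_lei(lei))  # True
```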

Location Golden Record

Representing each location only once, as a golden record, and linking every party, product and asset record (and ultimately golden record) to it may seem quite academic. Nevertheless, striving for that concept will solve many data quality conundrums.

Location management has different meanings and importance for different industries. For example, a brewery does business with the legal entity (party) that owns a bar, café or restaurant. However, even when the owner of that place changes, which happens a lot, the brewery is still interested in being the brand served there. The brewery also wants to keep records of the logistics around that place and the historic volumes delivered to it. Utilities and insurance are other examples of industries where the location golden record matters (or should matter) a lot.

Knowing the properties of a location also supports the party deduplication process. For example, if you have two records with the name “John Smith” at the same address, the probability that they describe the same real-world person depends on whether that location is a single-family house or a nursing home.
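One way to express this is to let the location type, taken from the location golden record, set the prior probability used in party matching. The sketch below covers only the exact-name, same-location case; the location type codes and probability values are invented for illustration, not calibrated figures.

```python
# Hypothetical priors: chance that two records with the same name at the same
# location describe the same person, by location type. Illustrative values only.
SAME_PERSON_PRIOR = {
    "single_family_house": 0.95,
    "apartment_building": 0.60,
    "nursing_home": 0.20,
}


def same_person_probability(
    name_a: str,
    name_b: str,
    location_id_a: str,
    location_id_b: str,
    location_type: str,
    default: float = 0.5,
) -> float:
    """Score the exact-name, same-location case using the location type.

    Other cases (different names or locations) are out of scope for this
    sketch and simply return 0.0.
    """
    if location_id_a != location_id_b or name_a.casefold() != name_b.casefold():
        return 0.0
    return SAME_PERSON_PRIOR.get(location_type, default)


print(same_person_probability("John Smith", "JOHN SMITH", "LOC-1", "LOC-1", "nursing_home"))  # 0.2
```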

Location identification concepts revolve around postal addresses, which are fuzzy and vary in format by country, and geocoding systems such as latitude/longitude, UTM coordinates, WGS coordinates and more.
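When two records carry geocodes, whether they point to the same location can be tested with a distance tolerance. This sketch uses the haversine formula on WGS 84 latitude/longitude and treats the Earth as a sphere, an approximation that is adequate at typical matching tolerances; the 25-metre tolerance and sample coordinates are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def same_location(p: tuple, q: tuple, tolerance_m: float = 25.0) -> bool:
    """Treat two (lat, lon) geocodes as one location if within the tolerance."""
    return haversine_m(*p, *q) <= tolerance_m


# About 11 metres apart (0.0001 degree of latitude)
print(same_location((55.6761, 12.5683), (55.6762, 12.5683)))  # True
```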

Product Golden Record

Product Information Management (PIM) solutions became popular with the rise of multi-channel commerce, where having the same representation of a product in offline and online channels is essential. The self-service approach in online sales also brought the requirement to manage far more product attributes than before, which in turn points to handling the product entity centrally.

In large organizations with many business units around the world, you struggle to reconcile a local view and a global view of products. A given product may be a finished product to one unit but a raw material to another. Even a global SAP rollout will usually not clarify this; rather the contrary.

While third-party reference data helps a lot with handling golden records for party and location, this is less the case for product master data. Classification systems and data pools do exist, but they will certainly not take you all the way. With product master data you must rely more on second-party master data, meaning product master data shared within the business ecosystems where you operate.

The non-profit organization GS1 has done a lot to implement the Global Trade Item Number (GTIN), based on the Universal Product Code (UPC) and European Article Number (EAN) concepts. However, there are still some challenges in this concept around packaging levels and more.
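GTINs (GTIN-8, GTIN-12/UPC-A, GTIN-13/EAN-13 and GTIN-14) end in a GS1 mod-10 check digit, so a basic validation can run before product records are matched on GTIN. Data digits are weighted 3 and 1 alternately, starting with 3 on the rightmost data digit, and the check digit brings the weighted sum up to a multiple of 10. This minimal sketch does not check GS1 company prefixes or packaging levels.

```python
def gtin_check_digit(data_digits: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost data digit
    for i, ch in enumerate(reversed(data_digits)):
        total += int(ch) * (3 if i % 2 == 0 else 1)
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Accept GTIN-8, -12, -13 and -14 strings whose last digit is a correct check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


print(is_valid_gtin("4006381333931"))  # True  (EAN-13)
print(is_valid_gtin("036000291452"))   # True  (UPC-A)
print(is_valid_gtin("4006381333932"))  # False (wrong check digit)
```

Because leading zeros add nothing to the weighted sum, a UPC-A code padded to GTIN-14 keeps the same check digit, which is what lets GTIN formats of different lengths be compared once normalized to 14 digits.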

Asset (or Thing) Golden Record

In asset master data management there are also different purposes where having a single view of a real-world asset helps a lot. Notably, financial purposes and logistic purposes have to be aligned, and there are many other purposes depending on the industry and the type of asset.

With the rise of the Internet of Things (IoT), we will have to manage far more assets (or things) than we have usually considered. When a thing (a machine, a vehicle, an appliance) becomes intelligent and produces big data, master data management, and indeed multidomain master data management, becomes imperative.

You will want to know a lot about the product model of the thing in order to make sense of the big data it produces. For that, you need the product (model) golden record. You will want deep knowledge of where the thing has been located over time. You cannot do that without location golden records. You will want to know the different party roles related to the thing over time: the owner, the operator, the maintainer. If you want to avoid chaos, you need party golden records.
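The idea of a thing golden record that points to product, location and party golden records over time can be sketched as a small data structure. All class names, fields and identifiers below are hypothetical; the point is that time-bounded references let you answer questions such as who owned the thing, or where it was, on a given date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class PartyRole:
    party_id: str                    # reference to a party golden record
    role: str                        # e.g. "owner", "operator", "maintainer"
    valid_from: date
    valid_to: Optional[date] = None  # None means the role is still active


@dataclass
class ThingRecord:
    thing_id: str
    product_model_id: str            # reference to the product (model) golden record
    location_history: List[Tuple[date, str]] = field(default_factory=list)  # (valid_from, location_id)
    party_roles: List[PartyRole] = field(default_factory=list)

    def party_in_role(self, role: str, on: date) -> Optional[str]:
        """Return the party holding the role on the given date, if any."""
        for pr in self.party_roles:
            if pr.role == role and pr.valid_from <= on and (pr.valid_to is None or on < pr.valid_to):
                return pr.party_id
        return None

    def location_on(self, on: date) -> Optional[str]:
        """Return the latest location that started on or before the given date."""
        current = None
        for valid_from, location_id in sorted(self.location_history):
            if valid_from <= on:
                current = location_id
        return current


thing = ThingRecord(
    "T-1", "PM-42",
    location_history=[(date(2019, 1, 1), "LOC-A"), (date(2020, 6, 1), "LOC-B")],
    party_roles=[
        PartyRole("P-1", "owner", date(2019, 1, 1), date(2020, 6, 1)),
        PartyRole("P-2", "owner", date(2020, 6, 1)),
        PartyRole("P-3", "maintainer", date(2019, 1, 1)),
    ],
)
print(thing.party_in_role("owner", date(2021, 1, 1)), thing.location_on(date(2021, 1, 1)))  # P-2 LOC-B
```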

Tools That Can Help

This site has a list of innovative MDM and DQM solutions that can help you master golden records. Check out the list here.

What is Multi-Domain MDM?

Multi-domain Master Data Management is usually perceived as the union of Customer MDM, Supplier MDM and Product MDM. It is. And it is much more than that.

Customer MDM is typically about federating the accounts receivable in the ERP system(s) and the direct and prospective accounts in the CRM system(s). Golden records are formed through deduplication of multiple representations of the same real-world entity.

Supplier (or vendor) MDM is typically about federating the accounts payable in the ERP system(s) and the existing and prospective accounts in the SRM system(s). A main focus is on the golden records and the company family tree they are in.
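A company family tree can be represented as child-to-parent links, for example as supplied by a third-party data provider. The sketch below, with hypothetical identifiers, walks each supplier up to its top-level parent and groups suppliers by that parent, which is one way to get a consolidated view per corporate family.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def ultimate_parent(company_id: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Follow parent links until reaching a company with no parent."""
    seen = set()
    current = company_id
    while parent_of.get(current) is not None:
        if current in seen:
            raise ValueError(f"Cycle in family tree at {current}")
        seen.add(current)
        current = parent_of[current]
    return current


def group_by_ultimate_parent(
    supplier_ids: List[str], parent_of: Dict[str, Optional[str]]
) -> Dict[str, List[str]]:
    """Group supplier ids under the top of their company family tree."""
    groups = defaultdict(list)
    for sid in supplier_ids:
        groups[ultimate_parent(sid, parent_of)].append(sid)
    return dict(groups)


# Hypothetical family tree: None marks a top-level company
parent_of = {"SUP-DK": "HQ-DE", "SUP-US": "US-HOLD", "US-HOLD": "HQ-DE", "HQ-DE": None, "SUP-FR": None}
print(group_by_ultimate_parent(["SUP-DK", "SUP-US", "SUP-FR"], parent_of))
# {'HQ-DE': ['SUP-DK', 'SUP-US'], 'SUP-FR': ['SUP-FR']}
```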

Product MDM has a buy-side and a sell-side.

On the buy-side, MDM takes care of trading data for products to resell, in manufacturing environments also trading data for raw materials, and in some cases also for parts used in Maintenance, Repair and Operation (MRO). In resell scenarios, the additional long tail of product specifications may be onboarded in an embedded/supplementary Product Information Management (PIM) solution.

On the sell-side, trading data is handled for products to resell and, in manufacturing environments, for finished products. The additional long tail of product specifications may be handled in an embedded/supplementary Product Information Management (PIM) solution.

Multidomain MDM does this in a single solution or suite of solutions, and much more, for example:

  • Supplier contacts can be handled in a generic party master data structure.
  • Customer contacts can be handled in a generic party master data structure.
  • Besides the direct accounts in CRM, indirect accounts and contacts can be handled in the party master data structure too. Examples of such parties are:
    • Influencers in the form of health care professionals in life science.
    • Influencers in the form of architects and other construction professionals in building material manufacturing.
    • End consumers in many supply chain B2B2C scenarios.
  • Employee records can be handled in a generic party master data structure. This covers the roles of sales representatives and their relation to customers, influencers, product hierarchies and location hierarchies, as well as the roles of purchasing staff and their relation to suppliers, influencers, product hierarchies and location hierarchies.
  • The relation between suppliers and product hierarchies and location hierarchies can be handled.
  • The relation between customers and end consumers and the product hierarchies and location hierarchies can be handled.
  • Inbound product information feeds from suppliers can be organized and optimized through Product Data Syndication (PDS) solutions.
  • The relation between customer preferences and product information can be handled in Product eXperience Management (PXM) solutions.
  • Outbound product information feeds to resellers can be organized and optimized through Product Data Syndication (PDS) solutions.
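A generic party master data structure can be sketched as one party entity holding a set of roles, instead of separate customer, supplier and employee tables. The class, field and role names below are hypothetical; the query shows how a party with several roles, such as a customer that is also a supplier, is found in one place.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Party:
    party_id: str
    name: str
    party_type: str                                 # "organization" or "person"
    roles: Set[str] = field(default_factory=set)    # e.g. {"customer", "supplier"}


def parties_with_roles(parties: List[Party], *required: str) -> List[str]:
    """Return the ids of parties that hold every one of the required roles."""
    wanted = set(required)
    return sorted(p.party_id for p in parties if wanted <= p.roles)


# Hypothetical parties sharing one structure regardless of role
parties = [
    Party("P1", "Acme A/S", "organization", {"customer", "supplier"}),
    Party("P2", "Jane Doe", "person", {"influencer"}),
    Party("P3", "Globex Ltd", "organization", {"customer"}),
]
print(parties_with_roles(parties, "customer", "supplier"))  # ['P1']
```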

This site has a list of the most innovative solutions that can either be your multi-domain solution or supplement other solutions as a best-of-breed component. Check the list here.

Party MDM

The Master Data Management (MDM) market has traditionally been divided into Customer MDM and Product MDM – with Vendor/Supplier MDM as a rarer third option.

However, what used to be an academic notion is now seen in more and more implementations where the MDM solution is built as a Party MDM solution. Here the party entity encompasses customers, vendors/suppliers, other business partners, internal business units and any other party that matters to the sell, buy and make sides of the enterprise.

The party MDM concept will also encompass the employees (and contractors) in the business units – which can be seen as Human Resource MDM – as well as the contacts at B2B customers, vendors/suppliers and other business partners.

There are many drivers for building this model.

One example is that many enterprises, especially large corporations, have an overlap between their customers and their vendors/suppliers. This case was examined in the post How Bosch is Aiming for Unified Partner Master Data Management.

Then there is the good old question: “What is a customer?” In many business scenarios, more parties than direct customers matter in marketing and selling. In manufacturing, including life science, there are B2B2C chains. In these and other industries there are influencers that matter: in life science these are healthcare professionals, and in building materials they are, for example, architects and other construction professionals.

In banking the term counterparty is used to cover both direct customers and other parties that are referred to in the service delivery. In education there are teachers and students. In public administration there are citizens.

Practically all organizations have more parties than customers and vendors/suppliers involved in their operating model, and sooner or later the descriptions of these parties must be handled as master data in a unified Party MDM model. This will underpin the digital transformation that is on the agenda in every organization these days.

Popular Entries on The Resource List

This site has a list of white papers, ebooks, reports and webinars from solution and service providers.

The aim is to give inspiration to organizations seeking to implement or upgrade their Master Data Management (MDM), Product Information Management (PIM) and/or Data Quality Management (DQM) capability.

The list has now been online for a month, and it is time to look at which entries have so far been the most popular in terms of click-through. These are:

ROI of MDM, PIM and DQM

Have you ever wondered how to effectively evaluate the return on investment (ROI) of a Product Information Management (PIM) and Master Data Management (MDM) implementation? Then take a look at some real-life examples. Download the Enterworks ebook on Exploring The ROI of PIM and MDM.

MDM, PIM and DQM market overview

Get an overview of why PIM solutions are implemented in more and more organizations, which capabilities a 2020 PIM solution needs to cover, where the market is heading, who the PIM vendors in the market are and how this affects your purchase of PIM. Download the Dynamicweb PIM white paper The State of Product Information Management 2020.

MDM, PIM and DQM implementation

Conferences cancelled? Stuck working from home? Bring the conferences to you with a virtual MDM conference. Don’t miss this must-see 6-week live webcast series and hear what other companies are doing in the world of MDM, along with best practices and workshops by industry experts. Register for this Enterworks webcast series at the Everything Master Data Management (MDM) Virtual Conference.

Extended MDM

MDM solutions have been instrumental in solving core data quality issues in a traditional way, focusing primarily on simple master data entities such as customer or product. Organizations now face new challenges with broader and deeper data requirements to succeed in their digital transformation. Help your organization through a successful digital transformation while taking your MDM initiative to the next level. Download the Semarchy white paper Intelligent Data Hub – Taking MDM to the Next Level.

Data Quality

Businesses today face a rapidly growing mountain of content and data. Mastering this content can unlock a whole new level of Business Intelligence for your organization and impact a range of data analytics. It’s also crucial for operational excellence and digital transformation. Download the 1WorldSync and Enterworks ebook on 4 Keys to Unlocking Data Quality with MDM.

Next To Come

More resources from solution and service vendors are on the way. Additionally, there will also be a Case Story List with success stories from various industries. Stay tuned.

If you have comments, suggestions and/or entries to be posted (yes, there is a very modest fee), then get in touch here: