Duplicates vs Nodes in MDM Hierarchies

Identification of duplicate records is a core capability in both Data Quality Management (DQM) and in Master Data Management (MDM).

When you inspect records identified as duplicate candidates, you will often have to decide if they describe the same real-world entity or if they describe two real-world entities belonging to the same hierarchy.

Instead of throwing away the latter result, this link can be stored in the MDM hub as well as a relation in a hierarchy (or graph) and thus support a broader range of operational and analytic purposes.

Individual Persons and Households

In business-to-consumer (B2C) scenarios a key challenge is to have 360 degree view of private customers either as individual persons or a household with a shared economy.

Here you must be able to distinguish between the individual person, the household and people who just happen to live at the same postal address. The location hierarchy plays a role in solving this case. This quest includes having precise addresses when identifying units in large buildings and knowing the kind of building. The probability of two John Smith records being the same person differs if it is a single-family house address or the address of a nursing home.

Companies / Organizations in Company Family Trees

In business-to-business (B2B) scenarios a key challenge is to have 360 degree view of these customers. Similar 360 scenarios exist with suppliers and other business partners.

Organizations can belong to a company family tree. A basic representation for example used in the Dun & Bradstreet Worldbase is having branches at a postal address. These branches belong a legal entity with a headquarter at a given postal address, where there may be other individual branches too. Each legal entity in an enterprise may have a national ultimate mother. In multinational enterprises, there is a global ultimate mother. Public organizations have similar often very complex trees.

Products by Variant and Sourcing

Products are also formed in hierarchies. The challenge is to identify if a given product record points to a certain level in the bottom part of a given product hierarchy. Products can have variants in size, colour and more. A product can be packed in different ways. The most prominent product identifier is the Global Trade Identification Number (GTIN) which occur in various representations as for example the Universal Product Code (UPC) popular in North America and European (now International) Article Number (EAN) popular in Europe. These identifiers are applied by each producer (and in some cases distributor) at the product packing variant level.

Another uniqueness issue for products is around what is called multi-sourcing, being that the same product from the same original manufacturer can be sourced through more than one supplier each with their pricing, discount model, terms of delivery and terms of payment.

Solutions Available

When looking for a solution to support you in this conundrum the best fit for you may be a best-of-breed Data Quality Management (DQM) tool and/or a capable Master Data Management (MDM) platform.

This Disruptive MDM / PIM /DQM List has the most innovative candidates here.

Leave a comment