Modern Data Quality at Scale using Digna

Today’s guest blog post is from Marcin Chudeusz of DEXT.AI, a company specializing in creating Artificial Intelligence-powered Software for Data Platforms.

Have you ever experienced the frustration of missing crucial pieces in your data puzzle? Or the weight of responsibility on your shoulders when data issues suddenly arise and the entire organization looks to you to save the day? It can be overwhelming, especially when the damage has already been done. In the constantly evolving world of data management, where data warehouses, data lakes, and data lakehouses form the backbone of organizational decision-making, maintaining high-quality data is crucial. The challenges of managing data quality in these environments are many, but the solutions, while not always straightforward, are within reach.

Data warehouses, data lakes, and lakehouses each encounter their own unique data quality challenges. These challenges range from integrating data from various sources, ensuring consistency, and managing outdated or irrelevant data, to handling the massive volume and variety of unstructured data in data lakes, which makes standardizing, cleaning, and organizing data a daunting task.

Today, I would like to introduce you to Digna, your AI-powered guardian for data quality that’s about to revolutionize the game! Get ready for a journey into the world of modern data management, where every twist and turn holds the promise of seamless insights and transformative efficiency.

Digna: A New Dawn in Data Quality Management

Picture this: you’re at the helm of a data-driven organization, where every byte of data can pivot your business strategy, fuel your growth, and steer you away from potential pitfalls. Now, imagine a tool that understands your data and respects its complexity and nuances. That’s Digna for you – your AI-powered guardian for data quality.

Goodbye to Manually Defining Technical Data Quality Rules

Gone are the days when defining technical data quality rules was a laborious, manual process. Forget the hassle of manually setting thresholds for data quality metrics: Digna’s AI learns your data, defines acceptable ranges for each metric, and adapts them as your data evolves. It’s like having a data scientist in your pocket, always working, always analyzing.

Figure 1: Digna’s AI algorithm defines acceptable ranges for data quality metrics such as missing values. Here, the expected count of missing values lies between 242 and 483; how would you manually define a technical rule for that?
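
To make the idea concrete, here is a minimal sketch of the principle in Python: derive a metric’s acceptable range from its own history instead of hard-coding it. This is an illustration under simplifying assumptions, not Digna’s actual algorithm, and the history values are invented:

```python
import statistics

def acceptable_range(history, k=3.0):
    """Derive an acceptable range for a daily metric (e.g. a column's
    missing-value count) from its own history: mean +/- k std deviations."""
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)
    return max(0.0, mean - k * spread), mean + k * spread

# Hypothetical history of daily missing-value counts for one column.
history = [310, 295, 402, 350, 288, 430, 366, 299, 381, 345]
low, high = acceptable_range(history)

today = 912  # today's observed count
if not low <= today <= high:
    print(f"Anomaly: {today} is outside the expected range [{low:.0f}, {high:.0f}]")
```

As the history shifts, the range shifts with it; that is the part that manual threshold-setting cannot keep up with.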

Seamless Integration and Real-time Monitoring

Imagine logging into your data quality tool and being greeted with a comprehensive overview of your week’s data quality. Instant insights, anomalies flagged, and trends highlighted – all at your fingertips. Digna doesn’t just flag issues; it helps you understand them. Drill down into specific days, examine anomalies, and understand the impact on your datasets.

Whether you’re dealing with data warehouses, data lakes, or lakehouses, Digna slips in like a missing puzzle piece. It connects effortlessly to your preferred database, offering a suite of features that make data quality management a breeze. Digna’s integration with your current data infrastructure is seamless. Choose your data tables, set up data retrieval, and you’re good to go.

Figure 2: Connect seamlessly to your preferred database. Select specific tables from your database for detailed analysis by Digna.
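
As a rough sketch of what that retrieval could look like if you scripted it yourself (the connection string and table names below are hypothetical; Digna itself configures this interactively):

```python
from sqlalchemy import create_engine, text

# Hypothetical connection string; in Digna this is configured in the UI.
engine = create_engine("postgresql://user:password@warehouse-host/dwh")

# Tables selected for monitoring (illustrative names from trusted config).
MONITORED_TABLES = ["sales_orders", "customer_events"]

with engine.connect() as conn:
    for table in MONITORED_TABLES:
        # One simple per-load signal: row count per load date.
        result = conn.execute(
            text(f"SELECT load_date, COUNT(*) FROM {table} GROUP BY load_date")
        )
        for load_date, row_count in result:
            print(table, load_date, row_count)
```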

Navigate Through Time and Visualize Data Discrepancies

With Digna, the journey through your data’s past is as simple as a click. Understand how your data has evolved, identify patterns, and make informed decisions with ease. Digna’s charts are not just visually appealing; they’re insightful. They show you exactly where your data deviated from expectations, helping you pinpoint issues accurately.

Read also: Navigating the Landscape – Modern Data Quality with Digna

Digna’s Holistic Observability with Minimal Setup

With Digna, every column in your data table gets attention. Switch between columns, unravel anomalies, and gain a holistic view of your data’s health. It doesn’t just monitor data values; it keeps an eye on the number of records, offering comprehensive analysis and deep insights with minimal configuration. Digna’s user-friendly interface ensures that you’re not bogged down by complex setups.

Figure 3: Observe how Digna tracks not just data values but also the number of records for comprehensive analysis. Transition seamlessly to Dataset Checks and witness Digna’s learning capabilities in recognizing patterns.
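
In script form, the kinds of per-column health signals described above (missing values, distinct counts, record counts) can be collected in a few lines of pandas. A hedged sketch with made-up data, not Digna’s internal profiler:

```python
import pandas as pd

def profile(df: pd.DataFrame) -> dict:
    """Collect simple per-column health signals plus the record count,
    the kind of metrics a tool like Digna tracks for every load."""
    metrics = {"record_count": len(df)}
    for col in df.columns:
        metrics[f"{col}.missing"] = int(df[col].isna().sum())
        metrics[f"{col}.distinct"] = int(df[col].nunique())
        if pd.api.types.is_numeric_dtype(df[col]):
            metrics[f"{col}.mean"] = float(df[col].mean())
    return metrics

# Hypothetical daily load.
df = pd.DataFrame({"amount": [10.0, None, 12.5], "country": ["AT", "AT", "DE"]})
print(profile(df))
```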

Real-time Personalized Alert Preferences

Digna’s alerts are intuitive and immediate, ensuring you’re always in the loop. They are easy to understand and color-coded to indicate the quality of the data, and you can customize your alert preferences so that you never miss an important update. With this simple yet effective system, you can quickly assess the health of your data, stay ahead of potential issues, and avoid the real-life impacts of data challenges.
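
The color coding maps naturally onto severity bands. As a toy illustration of the idea (the bands below are invented, not Digna’s actual logic; real alert preferences would be user-configured):

```python
def alert_level(value: float, low: float, high: float) -> str:
    """Map an observed metric onto a traffic-light alert level. The bands
    are hypothetical; real preferences would be configured per user."""
    width = high - low
    if low <= value <= high:
        return "green"    # within the expected range
    if low - 0.5 * width <= value <= high + 0.5 * width:
        return "yellow"   # mildly out of range: worth a look
    return "red"          # far out of range: act now

# Reusing the expected range from Figure 1 (242-483 missing values).
print(alert_level(300, 242, 483))  # green
print(alert_level(520, 242, 483))  # yellow
print(alert_level(912, 242, 483))  # red
```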

Watch the product demo

Kickstart your Modern Data Quality Journey

Whether you prefer inspecting your data directly from the dashboard or integrating it into your workflow, I invite you to commence your data quality journey. It’s more than an inspection; it’s an exploration—an adventure into the heart of your data with a suite of features that considers your data privacy, security, scalability, and flexibility.

Automated Machine Learning

Digna leverages advanced machine learning algorithms to automatically identify anomalies and recognize trends and patterns in data. This level of automation means that Digna can efficiently process large volumes of data without human intervention, reducing errors and increasing the speed of data analysis.

The system’s ability to detect subtle and complex patterns goes beyond traditional data analysis methods. It can uncover insights that would typically be missed, thus providing a more comprehensive understanding of the data.

This feature is particularly useful for organizations dealing with dynamic and evolving data sets, where new trends and patterns can emerge rapidly.
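
Digna’s models are its own, but as an indication of what detecting subtle, multivariate patterns without hand-written rules can look like, here is a small sketch using scikit-learn’s IsolationForest on hypothetical daily metric vectors:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical daily metric vectors: [record_count, missing_values, mean_amount]
history = np.array([
    [10200, 310, 57.1],
    [10150, 295, 56.8],
    [10480, 402, 58.0],
    [10320, 350, 57.5],
    [10260, 288, 57.0],
])

model = IsolationForest(contamination="auto", random_state=0).fit(history)

today = np.array([[10300, 1950, 57.3]])  # missing values exploded overnight
print(model.predict(today))              # -1 = anomaly, 1 = normal
```

No threshold was written anywhere: the model judges today’s load against the joint shape of the history, which is what lets it catch combinations of values that each look fine in isolation.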

Domain Agnostic

Digna’s domain-agnostic approach means it is versatile and adaptable across various industries, such as finance, healthcare, and telcos. This versatility is essential for organizations that operate in multiple domains or those that deal with diverse data types.

The platform is designed to understand and integrate the unique characteristics and nuances of different industry data, ensuring that the analysis is relevant and accurate for each specific domain.

This adaptability is crucial for maintaining accuracy and relevance in data analysis, especially in industries with unique data structures or regulatory requirements.

Data Privacy

In today’s world, where data privacy is paramount, Digna places a strong emphasis on ensuring that data quality initiatives are compliant with the latest data protection regulations.

The platform uses state-of-the-art security measures to safeguard sensitive information, ensuring that data is handled responsibly and ethically.

Digna’s commitment to data privacy means that organizations can trust the platform to manage their data without compromising on compliance or risking data breaches.

Built to Scale

Digna is designed to be scalable, accommodating the evolving needs of businesses ranging from startups to large enterprises. This scalability ensures that as a company grows and its data infrastructure becomes more complex, Digna can continue to provide effective data quality management.

The platform’s ability to scale helps organizations maintain sustainable and reliable data practices throughout their growth, avoiding the need for frequent system changes or upgrades.

Scalability is crucial for long-term data management strategies, especially for organizations that anticipate rapid growth or significant changes in their data needs.

Real-time Radar

With Digna’s real-time monitoring capabilities, data issues are identified and addressed immediately. This prompt response prevents minor issues from escalating into major problems, thus maintaining the integrity of the decision-making process.

Real-time monitoring is particularly beneficial in fast-paced environments where data-driven decisions need to be made quickly and accurately.

This feature ensures that organizations always have the most current and accurate data at their disposal, enabling them to make informed decisions swiftly.

Choose Your Installation

Digna offers flexible deployment options, allowing organizations to choose between cloud-based or on-premises installations. This flexibility is key for organizations with specific needs or constraints related to data security and IT infrastructure.

Cloud deployment can offer benefits like reduced IT overhead, scalability, and accessibility, while on-premises installation can provide enhanced control and security for sensitive data.

This choice enables organizations to align their data quality initiatives with their broader IT and security strategies, ensuring a seamless integration into their existing systems.

Conclusion

Addressing data quality challenges in data warehouses, lakes, and lakehouses requires a multifaceted approach. It involves the integration of cutting-edge technology like AI-powered tools, robust data governance, regular audits, and a culture that values data quality.

Digna is not just a solution; it’s a revolution in data quality management. It’s an intelligent, intuitive, and indispensable tool that turns data challenges into opportunities.

I’m not just proud of what we’ve created at DEXT.AI; I’m even more excited about the potential it holds for businesses worldwide. Join us on this journey, schedule a call with us, and let Digna transform your data into a reliable asset that drives growth and efficiency.

Cheers to modern data quality at scale with Digna!

This article was written by Marcin Chudeusz, CEO and Co-Founder of DEXT.AI, a company specializing in creating Artificial Intelligence-powered Software for Data Platforms. Our first product, Digna, offers cutting-edge AI-powered solutions to modern data quality issues.

Contact me to discover how Digna can revolutionize your approach to data quality and kickstart your journey to data excellence.

Modern Data Quality: Navigating the Landscape

Today’s guest blog post is from Marcin Chudeusz of DEXT.AI, a company specializing in creating Artificial Intelligence-powered Software for Data Platforms.

Data quality isn’t just a technical issue; it’s a journey full of challenges that can affect not only the operational efficiency of an organization but also its morale. As an experienced data warehouse consultant, my journey through the data landscape has been marked by groundbreaking achievements and formidable challenges. The latter, particularly in the realm of data quality in some of the most data-intensive industries, banks and telcos, have given me profound insights into the intricacies of data management. My story isn’t unique in data analytics, but it highlights the evolution necessary for businesses to thrive in the modern data environment.

Let me share with you a part of my story that has shaped my perspective on the importance of robust data quality solutions.

The Daily Battles with Data Quality

In the intricate data environments of banks and telcos, where I spent much of my professional life, data quality issues were not just frequent; they were the norm.

The Never-Ending Cycle of Reloads

Each morning would start with the hope that our overnight data loads had gone smoothly, only to find that yet again, data discrepancies necessitated numerous reloads, consuming precious time and resources. Reloads were not just a technical nuisance; they were symptomatic of deeper data quality issues that needed immediate attention.

Delayed Reports and Dwindling Trust in Data

Nothing diminishes trust in a data team like the infamous phrase “The report will be delayed due to data quality issues.” Stakeholders don’t necessarily understand the intricacies of what goes wrong—they just see repeated failures. With every delay, the IT team’s credibility took a hit.

Team Conflicts: Whose Mistake Is It Anyway?

Data issues often sparked conflicts within teams. The blame game became a routine. Was it the fault of the data engineers, the analysts, or an external data source? This endless search for a scapegoat created a toxic atmosphere that hampered productivity and satisfaction.

Read: Why Data Issues Continue to Create Conflicts and How to Improve Data Quality.

The Drag of Morale

Data quality issues aren’t just a technical problem; they’re a people problem. The complexity of these problems meant long hours, tedious work, and a general sense of frustration pervading the team. The difficulty of resolving these issues soured the atmosphere and made the job feel thankless.

Decisions Built on Quicksand

Imagine making decisions that could influence millions in revenue based on faulty reports. We found ourselves in this precarious position more often than I care to admit. Discovering data issues late meant that critical business decisions were sometimes made on unstable foundations.

High Turnover: A Symptom of Data Discontent

The relentless cycle of addressing data quality issues began to wear down even the most dedicated team members. The job was not satisfying, leading to high turnover rates. It wasn’t just about losing employees; it was about losing institutional knowledge, which often exacerbated the very issues we were trying to solve.

The Domino Effect of Data Inaccuracies

Metrics are the lifeblood of decision-making, and in the banking and telecom sectors, year-to-month and year-to-date metrics are crucial. A single day’s worth of bad data could trigger a domino effect, necessitating recalculations that spanned back days, sometimes weeks. This was not just time-consuming; it was a drain on resources, on top of the many other consequences of poor data quality.

The Manual Approach to Data Quality Validation Rules

As an experienced data warehouse consultant, I initially tried to address these issues through the manual definition of validation rules. We believed that creating a comprehensive set of rules to validate data at every stage of the data pipeline would be the solution. However, this approach proved to be unsustainable and ineffective in the long run.

The problem with manual rule definition was its inherent inflexibility and inability to adapt to the constantly evolving data landscape. It was a static solution in a dynamic world. As new data sources, data transformations, and data requirements emerged, our manual rules were always a step behind, and keeping the rules up-to-date and relevant became an arduous and never-ending task.

Moreover, as the volume of data grew, manually defined rules could not keep pace with the sheer amount of data being processed. This often resulted in false positives and negatives, requiring extensive human intervention to sort out the issues. The cost and time involved in maintaining and refining these rules soon became untenable.
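
To make the contrast concrete, here is an illustrative sketch; not a reconstruction of the systems we ran back then, just the shape of the problem. A static rule freezes one assumption about the data, while an adaptive check re-derives its thresholds from recent history on every run:

```python
from statistics import fmean, stdev

# A hand-written rule freezes its thresholds at authoring time.
def static_check(row_count: int) -> bool:
    """Returns True if the load passes. Breaks silently once volumes drift."""
    return 9_000 <= row_count <= 11_000

# An adaptive check re-derives its thresholds from recent loads on every run.
def adaptive_check(row_count: int, recent_counts: list, k: float = 3.0) -> bool:
    """Returns True if the load passes, judged against recent history."""
    mu, sigma = fmean(recent_counts), stdev(recent_counts)
    return (mu - k * sigma) <= row_count <= (mu + k * sigma)

recent = [10200, 10150, 10480, 10320, 10260, 10410, 10380]
print(static_check(12_500), adaptive_check(12_500, recent))  # both flag today's spike
# A year later, when 12,500 rows/day has become normal, the static rule still
# fails every load, while the adaptive window has quietly moved with the data.
```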

Figure: Comparison between human, rule-based, and AI-based anomaly detection.

Embracing Automation: The Path Forward

This realization was the catalyst for the foundation of dext.ai. Danijel (Co-founder at Dext.ai) and I combined our AI and IT know-how to create AI-powered software for data warehouses. This led to our first product, Digna: we needed intelligent, automated systems that could adapt, learn, and preemptively address data quality issues before they escalated. By employing machine learning and automation, we could move from reactive to proactive, from guesswork to precision.

Automated data quality tools don’t just catch errors—they anticipate them. They adapt to the ever-changing data landscape, ensuring that the data warehouse is not just a repository of information, but a dependable asset for the organization.

Today, we’re pioneering the automation of data quality to help businesses navigate the data quality landscape with confidence. We’re not just solving technical issues; we’re transforming organizational cultures. No more blame games, no more relentless cycles of reloads—just clean, reliable data that businesses can trust.

In the end, navigating the data quality landscape isn’t just about overcoming technical challenges; it’s about setting the foundation for a more insightful, efficient, and harmonious future. This is the lesson my journey has taught me, and it is the mission that drives us forward at dext.ai.

This article was written by Marcin Chudeusz, CEO and Co-Founder of DEXT.AI, a company specializing in creating Artificial Intelligence-powered Software for Data Platforms. Our first product, Digna, offers cutting-edge AI-powered solutions to modern data quality issues.

Contact us to discover how Digna can revolutionize your approach to data quality and kickstart your journey to data excellence.

What You Should Know About Master Data Management

Today’s guest blog post is from Benjamin Cutler of Winpure. Here, Benjamin goes through a few things that, in a nutshell, you should know about master data management.

People

People have multiple phone numbers and multiple email addresses and in 2022 there must be several decades of historic contact information available for any one person. Most of us move at least once, every few years. Sometimes we go by different nicknames in different situations, some people even change their names. We hold different titles throughout the course of our careers and we change companies every few years. Only a few people in our lives know exactly how to get a hold of us, at any given time. Many of us change vehicles just as often as we change our hair color. Many of us are employees, most of us are also customers, many of us are spouses and sometimes we are grandparents, parents, aunts, uncles, and children at the same time. Sometimes we’re out enjoying ourselves and sometimes we just want to be left alone. We each have unique interests and desires, but we also have many things in common with other groups of people.

Products

Products have many different descriptions; they come in many different variations, different sizes, different colors, and different packaging materials. Similar products are often manufactured by different manufacturers, and they can be purchased from many different commercial outlets, at different price points. Any one product on the market at any one time will likely be available in several variations, but that product will also likely change over time as the manufacturer makes improvements. Products can be purchased, and therefore they can also be sold. They can also be returned or resold to other buyers, so there are different conditions and ways to determine product value. There are SKU and UPC numbers and other official product identification and categorization systems, including UNSPSC and others, but none of them speak the same language.

Companies

Companies are made up of many different people who come and go over time. The company may change names or change ownership. It may have multiple locations which means multiple addresses and phone numbers, and they probably offer many different ways to contact them. Depending on where you look, there are probably more than a dozen different ways to find their contact information, but only some of those company listings will be correct. Companies have tax IDs and Employer IDs and DUNS IDs in the US, and there are many different systems worldwide.

Addresses

Addresses are the systems we use to identify locations. Each country and territory has its own system, so each system is different. In the US we use premise numbers; street names with and without prefixes and suffixes; unit numbers; states, counties, cities, and towns; and 5- and 9-digit numerical postal codes. Addresses and address systems can change over time, and they are inherently one of the most inconsistent forms of identification. Addresses are usually riddled with errors, misspellings, and different structures and formatting, and they can be very difficult to work with. What makes this even more difficult is that the same address represented in multiple internal business systems will often be represented differently, and will rarely match the way the same address is represented externally.
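
To see why the same address so rarely compares equal across systems, consider a deliberately minimal normalization sketch in Python; real systems use certified address-standardization services and far richer rules:

```python
import re

# Minimal suffix table; real address standardization (e.g. USPS CASS-certified
# tools) handles far more variation. Purely illustrative.
SUFFIXES = {"street": "st", "avenue": "ave", "boulevard": "blvd", "drive": "dr"}

def normalize(address: str) -> str:
    """Lowercase, strip punctuation, and standardize street suffixes so that
    the same physical address has a better chance of comparing equal."""
    s = re.sub(r"[^\w\s]", "", address.lower())
    tokens = [SUFFIXES.get(t, t) for t in s.split()]
    return " ".join(tokens)

a = normalize("123 N. Main Street, Apt 4")
b = normalize("123 N Main St Apt 4")
print(a == b, "|", a)  # True | 123 n main st apt 4
```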

Data

Data is a digital description of all of these things. Data usually comes in columns and rows and all shapes and sizes. Data about these things is captured, stored in business systems and it’s used to get work done. Need to call a contact? Check your contact data. Need to know a company’s billing address? Check your company data. Need to know something about a product? Check your product information. Need to know something about where your customers live and work or where to deliver the product? Check your address information. But here’s the thing: the information rarely matches from system to system and it’s very hard to keep up to date. This is especially difficult for a few reasons. Internally your company probably has many different business systems and many different ways of storing and representing these things, so it rarely matches internally, plus, the way that your company stores and represents this information will almost never match external information. How can you know the best way to contact your customer who has multiple phone numbers and multiple email addresses? If you’re searching some external system for updated information about some product or contact and the information doesn’t match, how do you find the new information? How can you know if your own information is correct and up to date? How can you scale your efforts to communicate with hundreds or thousands of customers at a time, communicating information that is specifically relevant for each of them? If the information doesn’t match or is not correct, how can you know who is who?

Relationships

The relationships across people, other groups of people, products, other groups of products, companies, other groups of companies, addresses, and other addresses are where the rubber hits the road. Business value comes from connecting companies and products or services with other people and companies, and other products and services, at scale. Customers purchasing products might be interested in purchasing related products. Customers often buy things based on location. Companies selling to customers might be able to sell more if they target similar customers in similar locations. Products and services also sell well based on location, and companies can optimize sales territories and delivery routes based on the relative proximity to other locations.

People and Technology

The people and technology behind all of this find it difficult to keep up. People do things one by one, and we’re good with ambiguity. We program computers and business systems to do things faster. Computers do things programmatically and very quickly, but they’re not good with ambiguity. People can see the similarity between things at a glance; computers and business systems cannot. People might be good at troubleshooting and critical thinking, but computers and business systems are not. A computer program might be able to find the same customer in multiple systems and might be able to update that customer’s information all at once, but how can you know if the new information is the best information? Knowing that your customer probably has multiple phone numbers and multiple addresses and multiple nicknames, how can you know which information is correct? Doing this at scale can be very, very difficult.
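
Software can approximate the similarity people see at a glance, though only approximately, which is why thresholds and human review still matter. Here is a minimal sketch using only the Python standard library (the records and the threshold are invented for illustration):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; real matching engines combine many
    such signals (name, phone, address, email) with tuned weights."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

candidates = [
    ("Jonathan Smith, ACME Corp.", "Jon Smith, ACME Corporation"),
    ("Jonathan Smith, ACME Corp.", "J. Smythe, Acme Inc."),
]
for a, b in candidates:
    score = similarity(a, b)
    verdict = "probable match" if score > 0.8 else "needs human review"
    print(f"{score:.2f}  {verdict}  ({a!r} vs {b!r})")
```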

In Conclusion

Master Data Management is very difficult, but it’s fundamental to scaling your business. People can sell products door-to-door, but data and technology allow us to market, sell, deliver, and service our products and services to tens and hundreds of thousands of people in milliseconds, regardless of the distance. Most organizations still view data as a cost of doing business, but with the right investments in people, processes, technology, and data management, we can scale as worldwide organizations.

Popular Entries on The Resource List

This site has a list of white papers, ebooks, reports and webinars from solution and service providers.

The aim is to give inspiration to organizations on a quest to implement or upgrade their Master Data Management (MDM), Product Information Management (PIM), and/or Data Quality Management (DQM) capabilities.

The list has now been online for a month, and it is time to look at which entries have so far been the most popular in terms of click-through. These are:

ROI of MDM, PIM and DQM

Have you ever wondered how to effectively evaluate the return on investment (ROI) of a Product Information Management (PIM) and Master Data Management (MDM) implementation? Then, take a look at some real-life examples. Download the Enterworks ebook on Exploring The ROI of PIM and MDM.

MDM, PIM and DQM market overview

Get an overview of why PIM solutions are implemented in more and more organizations, which capabilities a 2020 PIM solution needs to cover, where the market is heading, who the PIM vendors in the market are, and how all this affects your purchase of PIM. Download the Dynamicweb PIM white paper The State of Product Information Management 2020.

MDM, PIM and DQM implementation

Conferences cancelled? Stuck working from home? Bring the conferences to you with a virtual MDM conference. Don’t miss this must-see six-week live webcast series and hear what other companies are doing in the world of MDM, along with best practices and workshops by industry experts. Register for this Enterworks webcast series at the Everything Master Data Management (MDM) Virtual Conference.

Extended MDM

MDM solutions have been instrumental in solving core data quality issues in a traditional way, focusing primarily on simple master data entities such as customer or product. Organizations now face new challenges with broader and deeper data requirements to succeed in their digital transformation. Help your organization through a successful digital transformation while taking your MDM initiative to the next level. Download the Semarchy white paper Intelligent Data Hub – Taking MDM to the Next Level.

Data Quality

Businesses today face a rapidly growing mountain of content and data. Mastering this content can unlock a whole new level of Business Intelligence for your organization and impact a range of data analytics. It’s also crucial for operational excellence and digital transformation. Download the 1WorldSync and Enterworks ebook on 4 Keys to Unlocking Data Quality with MDM.

Next To Come

More resources from solution and service vendors are on the way. Additionally, there will also be a Case Story List with success stories from various industries. Stay tuned.

If you have comments, suggestions, and/or entries to be posted (yes, there is a very modest fee), then get in touch here.

Interview with FX Nicolas, VP of Products at Semarchy

This site is a presentation of the best available Master Data Management (MDM), Product Information Management (PIM), and Data Quality Management (DQM) solutions. However, behind the technology there are people who are working hard to bring the best tools to life, break into the market, and cover new ground.

Semarchy was one of the first disruptive MDM solutions to join the list and FX was one of the first employees to join Semarchy.

FX, what was your path into Master Data Management?

It started with Data Integration. Back in 2000, we designed a data integration product called “Sunopsis”. It was the first solution to use an E-LT architecture. In fact, we created that acronym as a “smart data geek” joke, and it is now an established marketing buzzword. The product was acquired by Oracle and is still on the market under the Oracle Data Integrator name. I was Product Manager for Sunopsis and Oracle Data Integrator. As such I was exposed to the challenges of integration (real-time vs. batch, EAI vs. ETL, data quality, performance, etc.). Governance and MDM were not yet trendy terms, but we had the idea of a technical “Active Data Hub”, managed at the integration layer, to share high quality data between systems.
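
(For readers who have not met the term: in E-LT, data is loaded into the target platform first and transformed there using the target engine’s own SQL, rather than in a separate transformation server. A minimal sketch of the pattern, with hypothetical connection details and table names, not any specific product’s syntax:)

```python
from sqlalchemy import create_engine, text

# Hypothetical target warehouse; in E-LT it does the transformation work itself.
engine = create_engine("postgresql://user:password@target-dwh/analytics")

with engine.begin() as conn:
    # E (extract) + L (load): land the source data unchanged in a staging table.
    conn.execute(text("COPY staging_orders FROM '/incoming/orders.csv' CSV HEADER"))

    # T (transform): run inside the target database, where the data already
    # lives, instead of on a separate transformation server.
    conn.execute(text("""
        INSERT INTO fact_orders (order_id, customer_id, amount_eur)
        SELECT order_id, customer_id, amount * fx_rate
        FROM staging_orders s
        JOIN fx_rates r ON r.currency = s.currency
    """))
```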

Semarchy was founded in 2011. What did the MDM market look like then?

A vast plain with two large circles of massive monoliths, namely the Customer Data Hub (CDH) and Product Information Management (PIM) verticals. Some of them were ironically sold by the same vendors who had failed to properly manage customer and product data in operational systems (CRMs, ERPs, etc.). When we looked at the market, we realized that there was room for domain-agnostic platforms to support customer, product, and every other domain.

The market was also a graveyard of failed projects. The reasons why so many projects ended up there were suspected (gigantic project scopes with insanely large timeframes, lack of agility and business involvement, etc.) but never clearly stated.

You have been in the forefront of introducing Semarchy. What have been the most difficult challenges in breaking into the market?

Building the platform was a challenge, but education was the hardest part. When you tell people that you know why they have failed or will fail, and you have a solution for a better outcome, they are not really willing to listen. We had to educate people with sound messages: “Yes sir, data quality is part of master data management”, “Start small and grow your initiative”, “Involve the business all along”.

Another challenge was the sheer number of shiny new trends and buzzwords popping up in the data space. Big data, cloud, graph, digital transformation, and now AI? Pick your favorite! Data management, governance, quality, and workflows look very dull in comparison. The good thing is that good practitioners know these are prerequisites to getting things done correctly.

Now that Semarchy has become an established player on the MDM market, what is the next move?

Since our inception, we’ve always believed in a single platform to solve all master data management issues. This all-in-one solution is still a dream for most companies, who struggle with four or five tools to manage their data. We are now going beyond that with our next move: extending our platform to be the end-to-end Intelligent Data Hub™. This includes new capabilities such as:

  • Data Discovery: Profiling data sources and learning about existing critical data assets.
  • Integration of any applications and leveraging any data source or service to enhance the enterprise data.
  • Governing the data hub by defining and enforcing business terms, processes, rules, policies, etc.
  • Managing data using apps designed for data champions and business users, with built-in data quality, match and merge, workflows, generated from the governance definitions and decisions.
  • Measuring the efficiency of the operations and the relevance of the governance choices using dashboards, KPIs and metrics based on data from the data hub or from external data sources.

Can you tell something more about how the Intelligent Data Hub is extending the MDM concept?

MDM is mainly about managing the domain’s core data assets (reference, customer, product, etc.) with data quality, match/merge, and stewardship workflows. The data hub extends this idea in multiple directions:

  • It extends the scope of data available via the data hub beyond core master data, for example by eventually including transactional data and interactions to provide 360° views.
  • It takes an end-to-end approach for the data management initiative: from data governance, data onboarding with data discovery, profiling and cataloguing, down to the assessment of the value delivered with dashboards and KPIs.
  • It transparently opens the initiative to the whole enterprise. All business users become full members of the initiative via the data governance, data management and measurement channels. In short, the Intelligent Data Hub transforms every stakeholder in the organization into a data champion.

If you should have done something differently in Semarchy’s route to where you are now, what would that have been?

Go to the cloud from the beginning! Our platform is now available on major cloud platforms. If I had to do it again, I would have shipped the first version on premises *and* in the cloud.