In the last article we discussed the different dimensions along which data quality can be measured.
There are no right or wrong measures to choose; the measures should fit the use case.
In this article we are looking at the holistic view of Data Quality.
In a simplified world there are four important steps in the life of data:
- Data Creation – Data is created in a system. That system can be a master data system handling descriptive data such as item information, customer information, etc. It can also be a transactional system, such as a Point of Sales system, where transactions are generated. We call this system a Source system.
- Data Integration – Data is sent from the Source system to other systems that want to use the data, preferably in a one-to-many manner where the data is sent once from the Source system and received by many consuming systems.
- Data & Analytics Platform – Data is gathered in a data platform. It is modelled according to business rules and made available for analysis, models, AI, etc.
- Executing system – Data is gathered from several Source systems, and in some cases from the Data & Analytics platform, and then an action is taken on that data. The action may be to generate an offer to a customer, create an order for logistics, create an invoice, etc.; something that has an effect in the business. In many cases a system acts as a Source system as well as an Executing system, thus both sending and receiving data.
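The one-to-many idea in the Data Integration step can be illustrated with a small sketch. This is only an in-memory illustration under stated assumptions: `IntegrationHub`, the `"item"` topic and the two consumer lists are hypothetical names, not part of any real integration product.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = dict
Handler = Callable[[Record], None]


class IntegrationHub:
    """Hypothetical in-memory stand-in for an integration layer."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A consuming system registers its interest in a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: Record) -> int:
        # The Source system sends the record once; every subscriber
        # receives its own copy. Returns the number of deliveries.
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(dict(record))
        return len(handlers)


# Two hypothetical consumers: the data platform and an Executing system.
platform_rows: List[Record] = []
executing_rows: List[Record] = []

hub = IntegrationHub()
hub.subscribe("item", platform_rows.append)
hub.subscribe("item", executing_rows.append)

# The Source system publishes the item once.
delivered = hub.publish("item", {"item_id": "A1", "name": "Coffee"})
```

The Source system does not need to know who consumes the data; adding a new consumer is a new `subscribe` call rather than a new point-to-point integration.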
For any given dataflow this model works, since it has a beginning, a middle and an end. But when observed in totality, the data acts more like a spiderweb where everything is connected.
Back to the Data Quality!
When addressing data quality it is preferred to fix any data quality issues as early in the chain as possible, especially since a fault in the Source system may affect many different systems down the line.
However, data quality issues may be hard to find in the Source system and instead show up only when the data is used together with data from other systems. Therefore it is common to have data quality checks in many places in the chain.
But the fixes should always be done as early in the chain as possible!
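As a rough illustration of an issue that is invisible in each Source system but appears once data is combined, consider the sketch below. All names and records are made up for the example: `check_source_items` stands for a check at the item master, `check_sales_against_items` for a check in the data platform, and the `fix_at` field encodes the assumption that the fix is routed back upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    found_at: str  # where in the chain the check ran
    fix_at: str    # where the fix should be made (assumed: upstream)
    detail: str


def check_source_items(items):
    """Check at the Source system: every item needs a non-empty name."""
    return [
        Issue("item master", "item master", f"item {i['item_id']} has no name")
        for i in items
        if not i.get("name")
    ]


def check_sales_against_items(sales, items):
    """Check in the data platform: every sale must reference a known item."""
    known = {i["item_id"] for i in items}
    return [
        Issue(
            "data platform",
            "source system",
            f"sale {s['sale_id']} references unknown item {s['item_id']}",
        )
        for s in sales
        if s["item_id"] not in known
    ]


# Illustrative data only: each data set looks fine when viewed alone.
items = [{"item_id": "A1", "name": "Coffee"}, {"item_id": "A2", "name": "Tea"}]
sales = [{"sale_id": 1, "item_id": "A1"}, {"sale_id": 2, "item_id": "A3"}]

source_issues = check_source_items(items)
platform_issues = check_sales_against_items(sales, items)

for issue in platform_issues:
    print(f"found at {issue.found_at}, fix at {issue.fix_at}: {issue.detail}")
```

The check at the Source system finds nothing, while the check in the platform finds the sale that points to a missing item. The issue is detected late in the chain but still reported back for correction at the source, rather than patched in the platform.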
So, in summary: it is very important to take the whole picture into consideration rather than only focusing on your own point in the chain.
To get there, it is important to have good communication between the different teams to understand priorities, needs and limitations.