Why is Data Quality Key to MSME Digital Transformation?



Introduction to Data Quality

Data Quality helps an MSME unit build its Digital Transformation strategy and drives the organization's growth. It also acts as a catalyst for the MSME to become future-ready.

Further, the Data Quality journey does not end with the formulation of recommendations. It closes the loop by providing feedback to the system, thereby improving the overall Data Quality.

My name is P. K. Saxena and I am the CTO of Sofcon Systems India Pvt. Ltd. Sofcon Systems India has 27+ years of experience in the field of Automation Solutions and has now started helping its clients on their migration path to Industry 4.0.

In this article, I have taken reference from a white paper written by Dinesh Mohata (TCS) and two of his colleagues; their details are given in the Resources section at the end of this article. In his whitepaper, Dinesh describes a 12-point formula that helps assess the quality of a data set from its raw form to the final output.

Challenge posed by Data

It is a known fact that data is a key growth enabler in today's fast-paced Digital Transformation journey of an MSME unit. Information is accessible to all in the current digital era, and we pump data into systems every second, generating petabytes of data every day. Consequently, this large amount of data brings considerable challenges in maintaining Data Quality.

There goes a saying: "If you can't measure it, you can't manage it." This is a quotation by the late William Edwards Deming, an American statistician known as the guru of quality control.

Therefore, just having data is not enough; we need quality data. This poses a great challenge. Let us take a deep dive in this article and see how we can overcome it.

Assessment of Data Quality

Of course, assessing data quality is important. But how can we assess it? During this assessment, we need to consider the impact of both the technical and business attributes of data. In particular, we can assess our data on:

  1. Consistency
  2. Accuracy
  3. Completeness
  4. Timeliness
  5. Relevance
  6. Business implication for the MSME organization

Therefore, we need to take a holistic view of data to arrive at recommendations for improving Data Quality. Indeed, we can use this amalgamation of the technical and business attributes of data to arrive at a Data Quality score.
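
As a rough illustration of what attribute-level checks might look like in practice, here is a minimal Python sketch. The record layout, field names, and thresholds are hypothetical assumptions, not taken from the whitepaper.

  from datetime import datetime, timedelta

  # Hypothetical ERP item records; field names are illustrative assumptions.
  records = [
      {"sku": "A100", "description": "Blue T-Shirt", "updated_at": datetime(2024, 1, 10)},
      {"sku": "A101", "description": None, "updated_at": datetime(2023, 6, 1)},
  ]

  def completeness(rows, field):
      """Fraction of rows where the field is present (non-null)."""
      return sum(r[field] is not None for r in rows) / len(rows)

  def timeliness(rows, field, max_age_days=30, now=None):
      """Fraction of rows refreshed within the allowed time window."""
      now = now or datetime.now()
      fresh = sum((now - r[field]) <= timedelta(days=max_age_days) for r in rows)
      return fresh / len(rows)

  print("Completeness of description:", completeness(records, "description"))
  print("Timeliness of updates:", timeliness(records, "updated_at"))

Each such check yields a number between 0 and 1, which can later be combined with the business attributes into an overall score.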

Why do we need it?

Generally, data tells a story in an MSME ecosystem. Over its lifecycle, a tuple, or data record, undergoes multiple transformations and enhancements in the ERP system. Before we move forward, let us explore what a tuple is.

What is a tuple?

In computer science, a tuple is an ordered, finite sequence of elements. Tuples are generally used to group together related data. They are similar to lists or arrays, but they often have a fixed size and can hold elements of different data types. The number of elements in a tuple is called its arity. Tuples are commonly used in programming languages, and the term appears in various contexts:

  • Mathematics:

In mathematics, a tuple is an ordered list of elements. For example, a 3-tuple might look like (a, b, c), where "a," "b," and "c" are elements of the tuple.

  • Programming:

In programming, we often use tuples to represent a collection of heterogeneous (differently typed) elements. Unlike lists or arrays, tuples are typically immutable, meaning we cannot change their size or contents after creation. Here is an example of a tuple in Python:
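
  my_tuple = (42, "hello", 3.14)   # an integer, a string, and a floating-point number (values are illustrative)
  print(my_tuple[1])               # elements are accessed by position; prints "hello"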

In this example, my_tuple is a tuple containing an integer, a string, and a floating-point number.

  • Databases:

In the context of databases, a tuple is a row of a table, and it represents a single record with multiple fields or attributes.

Importance of tuples

In short, tuples are versatile and find applications in various domains due to their ability to represent ordered collections of elements. We use tuples in different ways depending on the programming language or context in which we employ them. In an ERP system, a tuple produces information that tells the user the purpose of the given dataset.

For example, a classic case is the retail ERP system, which maintains a transactional record of the stock in hand of an SKU (Stock Keeping Unit) at any given instance. This information helps in planning the replenishment of correct stock levels, allows buyers to go for on-time procurement, helps merchandisers set the right price for their inventory, and enables the supply chain manager to move goods across locations on time.

If we decode the information properly, the SKU data story helps answer the following four questions (illustrated in the sketch after this list):

  1. What is the stock at the store at this point?
  2. When was the inventory updated?
  3. Where is it being kept?
  4. How is the stock moving?
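
As a simple, hypothetical illustration of such an SKU record as a tuple, consider the Python sketch below; the field names and values are assumptions for illustration, not the retailer's actual schema.

  from collections import namedtuple
  from datetime import datetime

  # Each field of the record maps to one of the four questions above.
  StockRecord = namedtuple("StockRecord", ["sku", "on_hand", "updated_at", "location", "weekly_units_sold"])

  record = StockRecord(
      sku="TSHIRT-BLU-M",
      on_hand=120,                       # What is the stock at the store at this point?
      updated_at=datetime(2024, 1, 15),  # When was the inventory updated?
      location="Store-042",              # Where is it being kept?
      weekly_units_sold=35,              # How is the stock moving?
  )

  print(record.on_hand, record.location)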

But what happens if the data parameters in this journey get corrupted?

Will the data narrate the ‘correct’ story?

Will the planners ‘interpret’ it correctly?

And will they be able to ‘plan’ the inventory accurately?

Dinesh Mohata's research, described in his whitepaper, found that existing data quality assessment tools addressed only the technical parameters. Hence, he noted that there was a need to view data holistically, incorporating both technical and business aspects.

We should note that the Data Quality journey does not end with recommendations. It has to be a closed loop, providing feedback to the system so that the overall Data Quality can improve.

How does it work?

Accordingly, Dinesh Mohata decided in his whitepaper to interpret data quality along the following two dimensions:

  1. Technical attributes
  2. Business implications

Technical attributes are industry-led rules that focus on five vital technical parameters:

  1. Consistency
  2. Accuracy
  3. Completeness
  4. Timeliness
  5. Relevance

Business implications, on the other hand, focus on Business Rule Violations (BRV). The BRV measures the degree to which the given data deviates from a rule and helps in determining the business impact on Data Quality.

Dinesh used the novel approach of assigning the applicability of a rule (TAI, Technical Applicability Index) to the relevant data quality attributes (e.g., consistency, accuracy, etc.). He then derived the Data Quality Score (DQS) from the TAI and BRV for the given rule.
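
The exact 12-point formula is given in the whitepaper and is not reproduced here; however, as a rough, hypothetical sketch of the idea, one could weight each rule's violation rate (BRV) by its applicability (TAI) to the technical attributes and roll the result up into a single score. The rule names, weights, and the simple weighted average below are my own illustrative assumptions.

  # Hypothetical rules: each has a BRV (fraction of records violating the rule)
  # and a TAI map stating how applicable the rule is to each technical attribute.
  rules = [
      {"name": "no_special_chars_in_description", "brv": 0.12,
       "tai": {"consistency": 1.0, "accuracy": 0.5}},
      {"name": "uniform_price_across_variants", "brv": 0.05,
       "tai": {"consistency": 1.0, "relevance": 0.3}},
  ]

  def data_quality_score(rules):
      """Illustrative score: 1 minus the TAI-weighted average violation rate."""
      total_weight = sum(sum(r["tai"].values()) for r in rules)
      weighted_violations = sum(r["brv"] * sum(r["tai"].values()) for r in rules)
      return 1.0 - weighted_violations / total_weight

  print(f"Illustrative DQS: {data_quality_score(rules):.2f}")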

The Algorithm

Dinesh devised a 12-point formula for the data quality assessment method; the formula is presented as a table in his whitepaper. He called this formula the Data Quality Assessment and Recommendations Tool (DQART).

Key findings

To test the 12-point DQART formula, Dinesh applied it to the merchandising system of a leading retailer. The objective of the exercise was to investigate the data quality of the enterprise. While the use case was specific to the retail industry, the foundational precepts of the formula are also relevant to other sectors.

Retail organizations generally depend on multiple systems to deliver goods from the supplier to the customer, and items are the foundational building block for a retailer. Therefore, data quality principles must be rigorously applied while creating an item. For instance, item descriptions often carry special characters (%, ^, &, *, $, #), which violate the item creation protocol. Item descriptions with such deviations can have a ripple effect across the system: they can cause delays in data processing, impact the customer experience, and affect the output of decision support systems.

Retailers generally choose uniform prices across multiple differentiators such as color and size. However, research shows that uniform pricing across these differentiators gets compromised during item maintenance. This undermines the data integrity of an item, leaving it with multiple prices across different variants, and affects the overall customer experience.
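
To illustrate how such deviations could be detected, here is a small, hypothetical Python sketch that flags the two issues described above: special characters in item descriptions and non-uniform prices across variants of the same item. The data layout and the checks themselves are assumptions, not the retailer's actual rules.

  import re

  # Hypothetical item variants: (item_id, description, variant, price)
  items = [
      ("KW-01", "Kids Hoodie #Red", "S", 799.0),
      ("KW-01", "Kids Hoodie Red", "M", 849.0),
  ]

  SPECIAL_CHARS = re.compile(r"[%^&*$#]")

  def special_char_violations(rows):
      """Item IDs whose descriptions contain forbidden special characters."""
      return {item_id for item_id, desc, _, _ in rows if SPECIAL_CHARS.search(desc)}

  def uniform_price_violations(rows):
      """Item IDs that carry more than one price across their variants."""
      prices = {}
      for item_id, _, _, price in rows:
          prices.setdefault(item_id, set()).add(price)
      return {item_id for item_id, price_set in prices.items() if len(price_set) > 1}

  print("Special-character violations:", special_char_violations(items))
  print("Uniform-price violations:", uniform_price_violations(items))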

Recommendations

From this analysis, Dinesh could conclude in his whitepaper that the 'kids wear' department was creating items with special characters and was violating the business rule of setting consistent prices across multiple differentiators. Hence, he concluded that to mitigate future deviations, the IT team should advise the department to avoid using special characters and to keep prices consistent across variants. Alternatively, the IT team should eliminate all special symbols during:

  • The initial upload
  • The item creation process
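
As a hypothetical sketch of that remediation step (the character set and function name are my assumptions), a cleanup routine applied during the initial upload or item creation might simply strip the offending symbols:

  import re

  def sanitize_description(description: str) -> str:
      """Remove special symbols such as % ^ & * $ # and collapse extra spaces."""
      cleaned = re.sub(r"[%^&*$#]", "", description)
      return re.sub(r"\s+", " ", cleaned).strip()

  print(sanitize_description("Kids Hoodie #Red $"))  # -> "Kids Hoodie Red"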

Conclusion

From the white paper written by Dinesh Mohata and his colleagues, we can conclude that the journey of a data set is fascinating. In his whitepaper, Dinesh candidly mapped the journey of real data sets from their raw form to the quality output, and the description of this journey offered multiple insights into how data quality could be improved.

His team measured data quality not only on

  • Accuracy,
  • Consistency,
  • Completeness,
  • Timeliness, and
  • Relevance,

but also on the business implications of violations of the multiple rules applied to the data set.

His 12-point formula drilled down to the minutest details of the data across the five critical technical attributes and viewed them in light of the business imperative of data. The formula thus provided a detailed series of steps to assess data quality from a holistic point of view.

The recommendations highlighted the deviations across multiple data attributes and provided information on the leading practices to be followed.

Finally, we can assert that this DQART tool can be deployed on-premises or on the cloud. We can also build an AI/ML framework on top of the data recommendations to create a closed-loop system, so that it can point out nuances, such as subtle differences or distinctions within a particular situation, concept, or expression, and trigger further corrections in the system.

I hope you found this article useful.

Digital Prabhat

Resources:

A whitepaper by Dinesh Mohata, Monodip Chakravarty, and Agniv Chakraborty.

