The dangers of dirty data — How to do a quick and easy spot-check

“As a CPO, data is not my responsibility or priority” … Yes it is!

Data should be everyone’s responsibility, from the bottom of an organisation right to the very top. Currently, across many organisations, data is the responsibility of a person or department, and everyone trusts them to make sure the data is accurate.

But they are specialists in data, analytics and coding, not procurement. They don’t have the experience to know when DHL should be classified as a courier or as warehousing/logistics services, or what direct, indirect or tail spend is and how important or high-priority each one is.

They can apply the business rules you request, automate the process or follow your guidance but they don't know if the rules are working properly or not. That’s when the procurement team need to take up the baton and verify and QA the data. It’s also important that any errors are fed back to the data team so that rules/scripts/codes can be amended in order to avoid repeatedly correcting the same mistakes.

But what is dirty data?

It can be defined very differently depending on who you speak to. At its most basic level, dirty data is anything incorrect. In detail, it could be misspelt vendors, incorrect invoice descriptions, missing product codes, lack of standard units of measure (e.g. liter, l, litres), currency issues, duplicate invoices or incorrect/partially classified data. All very familiar to most who work in procurement.

What are the consequences of dirty data?
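Two of the issues listed above, inconsistent units of measure and duplicate invoices, can be checked with a few lines of Python. This is only a minimal sketch: the field names (`vendor`, `invoice_number`, `unit`), the alias table and the sample rows are all made up for illustration, and real data will need a longer alias list and probably fuzzier vendor matching.

```python
from collections import Counter

# Hypothetical alias table: every spelling maps to one standard unit.
UNIT_ALIASES = {
    "l": "litre", "ltr": "litre", "liter": "litre",
    "liters": "litre", "litre": "litre", "litres": "litre",
}

def standardise_unit(raw):
    """Return the standard unit, or None if the spelling is not recognised."""
    return UNIT_ALIASES.get(raw.strip().lower())

def duplicate_invoices(invoices):
    """Return (vendor, invoice_number) pairs that occur more than once."""
    counts = Counter(
        (inv["vendor"].strip().lower(), inv["invoice_number"]) for inv in invoices
    )
    return [key for key, n in counts.items() if n > 1]

# Made-up sample rows; in practice these come from your invoice export.
invoices = [
    {"vendor": "Acme Fuels", "invoice_number": "INV-001", "unit": "L"},
    {"vendor": "acme fuels ", "invoice_number": "INV-001", "unit": "litres"},
    {"vendor": "Beta Supplies", "invoice_number": "INV-002", "unit": "pallet"},
]

print(duplicate_invoices(invoices))
print([standardise_unit(inv["unit"]) for inv in invoices])
```

Anything that comes back as `None` from `standardise_unit` is a spelling nobody has mapped yet, which makes it a good candidate to feed back to the data team.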

There are a number of areas that could be affected by this, the most significant being reporting and decision making. You get regular dashboards from your team, and these are used to make decisions such as cost savings, supplier negotiations, supplier rationalisation or forecasting.

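I often refer to the real-life IBM example, where £25,000 of spend was classified as cleaning. When the value is low, it can slip through the net, but people are making decisions based on this misplaced spend.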

A more subtle example is DHL, which provides a range of services from postal and courier up to warehousing and distribution. If you’re a manufacturer and some of your warehousing spend is incorrectly classified as courier services, this could have a huge impact when looking at cost savings, supplier negotiations, monitoring contract compliance or forecasts for the year.

The larger the data set, the easier it is for these misclassifications to hide in plain sight unless someone stumbles across them by accident. This can have a knock-on effect on demand planning, budgeting, sales, marketing and financial decisions.

Technology implementation can also be affected. Data preparation or cleansing before the implementation of any new software or system is an area that’s often neglected, and by the time it’s discovered there are errors in the data, staff have lost faith in using the software, are disengaged, claim it doesn’t work, or they don’t trust it because “it’s wrong.”

At this point, it either costs a lot of money to fix and you have to hope staff will adopt the software again, or the project is abandoned. In either case, this can take months and cost tens of thousands in abandoned software or reparation work.

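You might also be considering AI, some form of automation, or a third-party supplier that offers this service. As with technology implementation, this can potentially cause lots of problems. The data must be cleansed and prepared before it is used in any kind of AI or automation.

Think about the IBM example: the data is refreshed automatically every quarter, and the spend classified as cleaning is £25,000, then £75,000 the next quarter. Only when the value becomes significant does anyone notice the problem. By that point, how many decisions have been made based on this incorrect information?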
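One way to catch this kind of drift before the value becomes significant is to compare each refresh with the previous one. The sketch below is an illustration only, not a method from the IBM case: the `flag_jumps` function, the threshold of 2.0 and the "IT Services" figures are assumptions, and only the £25,000 and £75,000 cleaning figures come from the example above.

```python
def flag_jumps(previous, current, ratio=2.0):
    """Return entries that are new (or were zero) or grew by at least `ratio`.

    Both arguments map (supplier, classification) to total spend for a quarter.
    Assumes spend values are not negative (no credit notes).
    """
    flagged = {}
    for key, now in current.items():
        before = previous.get(key)
        # Short-circuit: a missing or zero baseline is always flagged for review.
        if not before or now / before >= ratio:
            flagged[key] = (before, now)
    return flagged

# Cleaning figures are from the example; the IT Services figures are made up.
q1 = {("IBM", "Cleaning"): 25000.0, ("IBM", "IT Services"): 310000.0}
q2 = {("IBM", "Cleaning"): 75000.0, ("IBM", "IT Services"): 320000.0}

print(flag_jumps(q1, q2))
```

A flagged line is not necessarily wrong. It is simply a prompt for someone in procurement to look at it while the amount is still small.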

How can I fix this and ensure data accuracy?

There is no quick fix, magic button, or software that can resolve these issues. To improve data accuracy, the initial piece of work has to be done by a human; the automation or system/software implementation will inevitably fail without it. Get everyone at every level to engage and take responsibility for your organisation's data, and to communicate when things are wrong and need to be amended.

Not an easy task, but if your team understands the impact the data they work on has within the organisation, and that it’s not just the responsibility of “Bob in the corner” or “The IT department”, it makes all the difference.

Consistency is also extremely important. Define rules and processes, because classification is very subjective and quite often there’s more than one right answer. As long as everyone’s working to the same standards, it’s much easier to change later on if it turns out to be wrong.

Maintain your data. If it’s not maintained, it will slowly become unusable over time. A monthly or quarterly review, depending on the volume, is recommended to keep on top of any issues. Otherwise you’ll have to pay a large sum to fix the same issues all over again.

Spot check your data regularly, regardless of who you are. Using the guide below you can easily and quickly spot-check your organisation's data without any experience.

How to spot check your data

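  1. Select your data and create a pivot table. Select “Supplier Name” (or “Normalised”, if available) and the classification levels.
  2. Change the report layout to show in tabular form. This will list the data by supplier, then by classification. From this you’ll be able to pick out any lines that stand out.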
  3. If you have a supplier with a large number of rows, you can view it separately by copying the data into a new tab and creating a new pivot table from that.
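For readers who would rather use a script than a spreadsheet, the same supplier-by-classification view can be sketched in Python. This is a minimal illustration and not part of the original guide: the column names (`supplier`, `classification`, `amount`) and all sample figures other than the £25,000 cleaning line are assumptions.

```python
from collections import defaultdict

# Made-up sample rows; in practice these come from your spend export.
rows = [
    {"supplier": "DHL", "classification": "Courier Services", "amount": 1200.0},
    {"supplier": "DHL", "classification": "Warehousing", "amount": 54000.0},
    {"supplier": "DHL", "classification": "Courier Services", "amount": 800.0},
    {"supplier": "IBM", "classification": "IT Services", "amount": 310000.0},
    {"supplier": "IBM", "classification": "Cleaning", "amount": 25000.0},
    {"supplier": "Acme Paper", "classification": "Stationery", "amount": 430.0},
]

def pivot_by_supplier(rows):
    """Row count and total spend per (supplier, classification) pair."""
    pivot = defaultdict(lambda: {"rows": 0, "total": 0.0})
    for row in rows:
        key = (row["supplier"], row["classification"])
        pivot[key]["rows"] += 1
        pivot[key]["total"] += row["amount"]
    return dict(pivot)

def suppliers_to_review(pivot):
    """Suppliers that appear under more than one classification."""
    seen = defaultdict(set)
    for supplier, classification in pivot:
        seen[supplier].add(classification)
    return {s: sorted(c) for s, c in seen.items() if len(c) > 1}

pivot = pivot_by_supplier(rows)

# Tabular layout: one line per supplier and classification, sorted by supplier.
for (supplier, classification), cell in sorted(pivot.items()):
    print(f"{supplier:<12} {classification:<18} {cell['rows']:>3} {cell['total']:>12,.2f}")

print(suppliers_to_review(pivot))
```

A supplier showing up under several classifications is not automatically an error (DHL can legitimately be both courier and warehousing), so the review list is a starting point for a human check rather than a list of mistakes.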

Try it for yourself.

Data accuracy is an investment, not a cost. Address the issues at the beginning - while it might seem like a costly exercise, you will undoubtedly spend less than if you have to resolve an issue further down the line with a time-consuming and costly data clean-up operation.

This post comes courtesy of Susan Walsh, aka The Classification Guru. Find her here.

Disclaimer: The views expressed in this article are those of the author.
