Dirty data: What you need to do to clean up your data

Data has become an incredibly powerful and sought-after tool for businesses, but how do you keep it clean once you have it?

The most powerful commodity in 2017 isn’t water or energy. Information holds, and always will hold, the power, and every business in the world is after it. Data can be used to create new leads, drive insights and initiate innovation, and those who don’t respect its power will fall by the wayside.

However, businesses need not only to understand the value of information; they also need to keep it clean and look after it.

New research from DMC Software found that only 43 per cent of businesses store customer info in a dedicated CRM, while 11 per cent use email software to store data and five per cent keep paper records. Few companies actually take care over how they maintain data.

Learn five steps for cleaning dirty data

Dirty data is a factor that you must account for if you hope to reap the benefits of data-driven decision making. Simple errors like duplicate order entries can greatly skew the results of an analysis. Given the growing number of data sources and the newfound focus on using data to develop and drive business and marketing strategy, clean data is becoming increasingly important. While there are some things you can do continuously to keep your data usable, like active daily maintenance, here are five things to do directly before any data analysis.

Where does it begin?

Before getting to methods for helping to clean up the information, it’s important to know some of the most common quality issues in systems:

Missing customer details such as postcode or address
Multiple representations of the same information
Data outside reasonable ranges, e.g. dates and currency amounts
The same customer has multiple entries in the CRM
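
As a rough illustration, the checks for these issues can be sketched in a few lines of Python. This is a minimal sketch, not a production audit: the field names (postcode, signup, order_total), the sample rows and the date range are all assumptions made up for the example, and the duplicate check simply treats two records with the same lower-cased email as the same customer.

```python
from collections import Counter
from datetime import date

# Hypothetical CRM rows; the field names and values are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com", "postcode": "SW1A 1AA",
     "signup": date(2016, 5, 2), "order_total": 120.0},
    {"id": 2, "email": "bob@example.com", "postcode": "",
     "signup": date(2017, 1, 15), "order_total": 80.0},
    {"id": 3, "email": "Ann@Example.com", "postcode": "sw1a1aa",
     "signup": date(2016, 5, 2), "order_total": 120.0},
    {"id": 4, "email": "cat@example.com", "postcode": "M1 1AE",
     "signup": date(2091, 3, 9), "order_total": -15.0},
]


def audit(rows, earliest=date(2000, 1, 1), latest=date(2017, 12, 31)):
    """Return record ids grouped by the kind of problem found."""
    issues = {"missing_postcode": [], "out_of_range": [], "duplicate_customer": []}
    # Count emails after trimming and lower-casing, so that differently
    # written copies of the same address are counted together.
    email_counts = Counter(r["email"].strip().lower() for r in rows)
    for r in rows:
        if not r["postcode"].strip():
            issues["missing_postcode"].append(r["id"])
        # Assumed rule: dates must fall in the window, totals cannot be negative.
        if not (earliest <= r["signup"] <= latest) or r["order_total"] < 0:
            issues["out_of_range"].append(r["id"])
        if email_counts[r["email"].strip().lower()] > 1:
            issues["duplicate_customer"].append(r["id"])
    return issues


report = audit(records)
print(report)
```

Records 1 and 3 also show the multiple-representations problem: the same email and postcode are written in two different ways, which is why the comparison is made on a normalised copy rather than on the raw text.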

The path to clean info

If you are dealing with relatively small data sets and a small number of sources, it may be possible for a developer to use scripts or ETL tools to create a uniform view.

To complete the ETL process you will need to:

Identify and remove duplicates
Convert numbers to a consistent representation
Convert dates and times to a consistent representation
Remove case differences, or make casing consistent throughout
Normalise spelling for a given dictionary (e.g. U.S. English vs. British English)
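
For a small data set, the five steps above might look something like the following script. It is only a sketch under stated assumptions: the rows, the field names, the list of accepted date formats (assumed day-first) and the tiny spelling table are invented for the example, and commas in amounts are assumed to be thousands separators.

```python
import re
from datetime import datetime

# Hypothetical raw rows pulled from two sources; all values are made up.
raw = [
    {"email": "Ann@Example.com ", "amount": "1,200.50",
     "ordered": "02/05/2016", "note": "Favourite colour: grey"},
    {"email": "ann@example.com", "amount": "1200.5",
     "ordered": "2016-05-02", "note": "favorite color: gray"},
    {"email": "bob@example.com", "amount": "80",
     "ordered": "15.01.2017", "note": "Organisation account"},
]

# Assumed input formats, tried in order; slashed dates are read day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")
# Tiny illustrative U.S.-to-British table; a real one would be far larger.
SPELLING = {"favorite": "favourite", "color": "colour", "gray": "grey"}


def parse_date(text):
    """Convert any accepted date format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def normalise_spelling(text):
    """Lower-case the text, then swap each word found in the spelling table."""
    return re.sub(r"[a-z]+", lambda m: SPELLING.get(m.group(0), m.group(0)),
                  text.lower())


def clean(row):
    return {
        "email": row["email"].strip().lower(),            # consistent case
        "amount": float(row["amount"].replace(",", "")),  # consistent numbers
        "ordered": parse_date(row["ordered"]),            # consistent dates
        "note": normalise_spelling(row["note"]),          # consistent spelling
    }


def clean_all(rows):
    """Clean every row, then drop rows identical to one already kept."""
    seen, result = set(), []
    for row in rows:
        cleaned = clean(row)
        key = tuple(cleaned.values())
        if key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


cleaned = clean_all(raw)
for row in cleaned:
    print(row)
```

Duplicates are removed last in this sketch, because the first two rows only become identical once case, numbers, dates and spelling have been made consistent.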

Owen Gough

Owen Gough is a reporter for SmallBusiness.co.uk. He has a background in small business marketing strategies and is responsible for writing content on subjects ranging from small business finance to technology...