Who is Responsible for Quality Data?
Data is growing more crucial to business logic than ever before. According to Salesforce, dirty data costs businesses about $700 billion a year. That is why it's essential that organizations maintain the quality of the data they provide.
Clean, quality data that provides actionable results is great for keeping a business working efficiently. But we need to ask the question of who's responsible for making sure data is clean.
What is Data Quality?
Or more importantly to start, what's NOT data quality? What we are NOT talking about are the security concepts around data known as the CIA triad: confidentiality (data is protected from unauthorized access), integrity (data is trustworthy and cannot be unintentionally altered), and availability (data is, well, available, regardless of demand on the network and protected by a disaster recovery plan).
Our question surrounding data quality is based on the nitty-gritty around what data looks like as employees work with it day in and day out. More generally, as Wikipedia puts it, "data is deemed of high quality if it correctly represents the real-world construct to which it refers."
For example, do all shipping addresses have both a house number and street? If not, that's dirty data. Does an inventory list correctly match what's actually in the warehouse? If not, that's dirty data. Or do the sales leads have every required field? If not, then that data is dirty.
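A check like the "does every sales lead have every required field?" question above can be sketched in a few lines. This is only a minimal illustration: the field names (name, email, phone) and the sample leads are hypothetical, and a real system would define its own required fields.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical list of fields every sales lead must have.
REQUIRED_LEAD_FIELDS = ("name", "email", "phone")


def missing_fields(
    record: Dict[str, Optional[str]],
    required: Sequence[str] = REQUIRED_LEAD_FIELDS,
) -> List[str]:
    """Return the required fields that are absent, None, or blank."""
    return [field for field in required if not (record.get(field) or "").strip()]


# Made-up sample data for illustration only.
leads = [
    {"name": "Ada Park", "email": "ada@example.com", "phone": "555-0100"},
    {"name": "Sam Lee", "email": "", "phone": None},
]

# Map each dirty lead to the fields it is missing.
dirty = {
    lead["name"]: missing_fields(lead)
    for lead in leads
    if missing_fields(lead)
}
print(dirty)
```

Any lead that shows up in the result is, by the definition above, dirty data.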
Dirty data means that things can go wrong. A package could be returned as undeliverable on a half address. Inventory could run dry during a busy season. Or salespeople could be stuck with names but no phone numbers to cold call. Maybe dirty data isn't all that bad after all…
Bottom line, bad data equals wasted time, frustrated customers — and potentially bad decisions.
Who Gets Data Dirty?
Surely we can pin this on one team or another, right? Someone must take the fall for all this dirty data. Certainly a single team alone can drive the quality of our data firmly into the ground, right? Spoiler alert: dirty data is often a team failure, from the salespeople who enter the data to the developers who build the systems that validate it.
As data flows through a company, every touchpoint has the potential to corrupt it. Let's look at each team that touches your data:
Are the marketers and salespeople responsible? The point of data entry is a substantial sore spot for dirty data. An Experian study found that human error influences over 60 percent of data quality issues. Salespeople and marketers need to be careful about entering the data they're scavenging.
Items that are outdated, incomplete, or even self-reported can cloud what could be a clear picture. That's a lot of misspelled names, wrong zip codes, bounced emails, and misdirected ad campaigns. Dirty data can cost an organization a lot of opportunities and revenue.
Are the analysts responsible? An analyst interprets data and turns it into information that can offer ways to improve a business. Data analysts should be able to report a study's findings in a way that the entire business and relevant colleagues can understand.
But part of getting this job done is data modeling, which, if done poorly, can throw a big wrench in business projects, goals, and processes. As more companies automate the process of dealing with data, it's important for analysts to question the results being pulled. If a decision is based on bad data or a generated assumption, the results might not be pretty.
Are the devs responsible? Certainly, devs are never to blame. After all, look at all the people who touched the data. But that's not the case. Having clean data requires a clean tool to gather said data. Every contact you enter should have a valid email address — that's a no-brainer. You shouldn't be able to create an opportunity for the 19th calendar year, only the 2,019th one. If your devs are letting these dumb instances slip through an otherwise-ironclad CRM, that's where your problem lies.
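The two checks mentioned above, a valid email address and a sane opportunity year, could look roughly like this at the point of entry. Treat it as a sketch under assumptions: the function names, the minimum year of 2000, and the five-year look-ahead are all invented for illustration, and the email pattern is a deliberately loose shape check, not a full address validator.

```python
import re
from datetime import date
from typing import Optional

# Loose shape check (something@something.something); an assumption for
# illustration, not a complete email validation rule.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_contact_email(email: str) -> bool:
    """Return True if the email has a plausible shape."""
    return bool(EMAIL_PATTERN.match(email.strip()))


def validate_opportunity_year(
    year: int,
    today: Optional[date] = None,
    min_year: int = 2000,       # hypothetical lower bound
    max_years_ahead: int = 5,   # hypothetical planning horizon
) -> bool:
    """Reject years like 19 while accepting years like 2019."""
    current_year = (today or date.today()).year
    return min_year <= year <= current_year + max_years_ahead


print(validate_contact_email("pat@example.com"))                    # plausible address
print(validate_contact_email("pat@example"))                        # missing domain suffix
print(validate_opportunity_year(19, today=date(2019, 6, 1)))        # the 19th calendar year
print(validate_opportunity_year(2019, today=date(2019, 6, 1)))      # the 2,019th one
```

Rejecting bad values before they are saved is usually cheaper than scrubbing them out of the database later.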
Are the DBAs responsible? Database admins should be the most responsible for keeping the data committed to the database clean, right? At the very least, a good DBA has the expertise to implement data analytic processes to look for duplicate records and reconcile them.
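As a rough idea of what looking for duplicate records can mean in practice, the sketch below groups contacts by a normalized email address and reports any address that appears more than once. The contacts table, its columns, and the sample rows are hypothetical, and it uses an in-memory SQLite database purely so the example is self-contained.

```python
import sqlite3

# In-memory database with a hypothetical contacts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO contacts (name, email) VALUES (?, ?)",
    [
        ("Ada Park", "ada@example.com"),
        ("A. Park", "ADA@example.com "),  # same person, messy entry
        ("Sam Lee", "sam@example.com"),
    ],
)

# Normalize case and whitespace, then find emails that occur more than once.
duplicates = conn.execute(
    """
    SELECT LOWER(TRIM(email)) AS normalized_email, COUNT(*) AS copies
    FROM contacts
    GROUP BY LOWER(TRIM(email))
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicates)
conn.close()
```

Finding the duplicates is the easy half; deciding which copy to keep when reconciling them still takes judgment about the data.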
Between keeping data entry consistent across fields and ensuring the proper team members can access specific data, DBAs really are the key holders of the datum castle. Ask any DBA, though, and they'll probably be quick to quote some profound maxim like "garbage in, garbage out."
Consider Your Org as a Whole
So who's to blame? The other guy? Nope, most everyone involved with your data dirties it up. Everyone's a stakeholder in this game.
Unfortunately, the longer you ignore the data problem, the worse it will be to clean up. Even if your data going forward is squeaky clean, there's still a pile of dirty data to deal with.
Some companies will apply raw manpower to the problem: a team of temps or interns goes through the data line by line and scrubs it. They might even consider using a software service that applies machine learning to find trends and patterns, then clean up outliers. These are sometimes monumental, or impossible, undertakings. More often, companies start from scratch or put an asterisk beside the suspect bucket of data.
Ultimately, the cleanliness of your data can make or break the company. So it might be wise to hire someone who can keep your team in line when it comes to data. Data quality engineers are responsible for coordinating parties, testing and deploying procedures that promote data quality, and working with ETL tools to ensure data meets established specs.
Data quality engineers are also in charge of both implementing and understanding data governance standards. It's a super specific, super hairy job, but one that could give your organization some clarity.
Clean Up Your Act
Everyone's both the problem and the solution. Experts can help, but entropy will tend to take over given enough time. It's going to be a constant battle for those who choose to take a stand against dirty data, but it's in the company's best interests for everyone to put in the extra effort. Everyone stands to benefit from reduced frustration, stress, and roadblocks.