Importance of Data Quality and Factors That Impact It
Note: This write-up addresses the importance of data quality in multiple areas within an organization from an IT and OSS/BSS perspective. It focuses only on data quality. It doesn’t address data availability or data security.
Data has always been important. Today, with digitization and changing lifestyles, data is becoming a key factor in making automated decisions on the fly. The availability of data and connectivity across multiple platforms brings efficiency and creativity to the complete business flow.
A quick refresher on the difference between data and information helps here. Data is a collection of raw facts, figures, and statements. Once this data is organized, it becomes information. In other words, information is the result of processed or organized data.
Data quality is a key element in any organization, irrespective of company size and business domain. It doesn’t matter whether the company has thousands of employees or just a few. Similarly, it makes no difference whether you work in agriculture, education, finance or ICT. Of course, the type of data varies by business, but its importance remains the same.
Have you ever imagined what would happen if the transaction data in banks were incorrect? What if you could not track your transactions and your current balance were changed to zero? Or suppose you are a student, the university systems holding grades and results develop data quality issues, and you get wrong grades on your transcript.
These examples are enough to show the importance of data quality. Normally this doesn’t happen, because such organizations make sure that backups are taken and that corrupted data is corrected, often automatically, using the techniques and tools available on the market.
System capabilities and functionality are important, but having correct data is the key element. Bringing in the best system in the world and feeding it wrong data will not produce useful results.
Below are some major factors that may impact data quality, along with their solutions.
Data is updated in every business. It can be entered through multiple interfaces such as web forms, web APIs or middleware. Any change to these interfaces may affect the data structure. If such changes are made without formally notifying other teams and departments, large gaps in data structure can build up over time.
Frequent analysis (manual or automatic) can be very useful in addressing this concern. We suggest a weekly or monthly quality check to make sure that data in the organization is of good quality. Many tools are available that help identify issues and confirm that data is consistent with the defined business cases.
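As a rough illustration of what such a recurring check could look like, here is a minimal Python sketch. The function name quality_report, the sample customers list and the chosen checks (missing required fields, duplicate keys) are our own assumptions for illustration; a real check would follow the business cases defined in your organization.

```python
from collections import Counter


def quality_report(records, required_fields, key_field):
    """Summarize missing values and duplicate keys in a list of dict records.

    Hypothetical helper for illustration only, not a specific tool's API.
    """
    missing = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            # Treat None and blank strings as missing.
            if value is None or str(value).strip() == "":
                missing[field] += 1

    # Count how often each key value occurs to find duplicates.
    key_counts = Counter(record.get(key_field) for record in records)
    duplicates = sorted(k for k, n in key_counts.items() if k is not None and n > 1)

    return {
        "total": len(records),
        "missing": dict(missing),
        "duplicate_keys": duplicates,
    }


# Made-up sample data: one blank name, one missing email, one repeated id.
customers = [
    {"id": 1, "name": "Asha", "email": "asha@example.com"},
    {"id": 2, "name": "", "email": "b@example.com"},
    {"id": 2, "name": "Carl", "email": None},
]

report = quality_report(customers, ["name", "email"], "id")
print(report)
```

A script like this could be scheduled weekly or monthly, with the report sent to the team that owns the data.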
Data quality suffers badly when not enough validation is applied. Suppose someone puts a phone number in an email field and there is no proper validation. Missing data validation is also one of the main reasons dynamic websites are breached.
Many tools and libraries are available to make sure that data is validated before it is stored in a database.
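To make the phone-number-in-email-field case concrete, here is a small Python sketch that uses only the standard library. The patterns are deliberately simplified assumptions, not complete email or phone number rules, and validate_contact is a hypothetical helper name.

```python
import re

# Simplified, illustrative patterns. Real-world rules are stricter and
# usually come from a dedicated validation library.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
PHONE_PATTERN = re.compile(r"\+?[0-9][0-9\s\-]{6,14}")


def validate_contact(record):
    """Return a list of validation errors for a contact record (empty if valid)."""
    errors = []
    email = str(record.get("email") or "")
    phone = str(record.get("phone") or "")

    if not EMAIL_PATTERN.fullmatch(email):
        errors.append("email: not a valid email address")
    if not PHONE_PATTERN.fullmatch(phone):
        errors.append("phone: not a valid phone number")
    return errors


# A phone number typed into the email field is rejected before it is stored.
bad = validate_contact({"email": "+1 555 0100", "phone": "+1 555 0100"})
good = validate_contact({"email": "asha@example.com", "phone": "+1 555 0100"})
print(bad)
print(good)
```

The point of the design is that the check runs before the data reaches the database, so the caller can reject or correct the record when the list of errors is not empty.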
Change in data is inevitable. If these changes are not properly recorded or managed, they may badly impact data quality. This happens when there are no formal processes defined within an organization.
The best solution to control, validate and maintain data quality is to introduce business processes in the organization. Processes ensure that all steps needed for data consistency are carried out. Any changes to or omissions of data are recorded and managed through business processes, which increases data quality and supports business growth.
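On the technical side, one simple way to support such a process is to record every change in an audit log. The sketch below is a minimal illustration under our own assumptions: apply_change, the in-memory audit_log list and the field names are hypothetical, and a real system would usually persist the log in a database.

```python
from datetime import datetime, timezone


def apply_change(record, field, new_value, changed_by, audit_log):
    """Update one field of a record and append the change to an audit log."""
    old_value = record.get(field)
    if old_value == new_value:
        # Nothing changed, so nothing is recorded.
        return record

    audit_log.append({
        "field": field,
        "old": old_value,
        "new": new_value,
        "changed_by": changed_by,
        "changed_at": datetime.now(timezone.utc).isoformat(),
    })
    record[field] = new_value
    return record


audit_log = []
customer = {"id": 1, "email": "old@example.com"}
apply_change(customer, "email", "new@example.com", "support-team", audit_log)
print(customer["email"])
print(audit_log[0]["old"], "->", audit_log[0]["new"])
```

With a log like this, anyone reviewing the data can see what changed, who changed it and when.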
Responsibility is another main factor in data quality. Many organizations rely on just a few people dedicated to the data warehouse and its maintenance. Those people are concerned only with data availability, and without knowing the business details they may not be able to verify the quality and consistency of the data.
Team members should also be held responsible for owning the data from their respective technology or domain. This not only creates a sense of responsibility but also helps keep the data streamlined.
Visibility of Data
Practically speaking, data does not affect only one domain, department or team; the output data of one team is the input for another. In many cases, however, one team does not give another team enough data, which can hurt that team’s performance or lead to wrong decisions that affect the overall business.
Making data visible to other departments and teams also enhances data quality. If there are issues in the data, one team can highlight them and others can fix the discrepancies. Some data is sensitive, though, such as employee salaries, bank details and other personal details. The security of such data also needs to be handled as per industry best practices.
Changing Business Requirements
Business requirements change from time to time. These changes impact systems, processes and data. Changing the data structure to cater for new business requirements may make new data inconsistent with previous data.
Such changes can be handled with a mature system design.
System design can also be considered a huge stakeholder in data quality. You must be wondering how. If a system is hard-coded or badly designed, it affects not just the system but also the health of the data.
During system design, all requirements should be closely analyzed. The system should be flexible enough to support future business cases without major impact on the data structure and existing data.
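One possible reading of "flexible" here is a design that versions its records, so that older data can be upgraded when requirements change. The Python sketch below assumes a made-up change in which version 2 splits a single "name" field into first and last name. The field names, the schema_version key and the upgrade function are all hypothetical and only show the idea.

```python
CURRENT_VERSION = 2


def _v1_to_v2(record):
    # Hypothetical change: version 2 splits "name" into first and last name.
    first, _, last = record.get("name", "").partition(" ")
    upgraded = {k: v for k, v in record.items() if k != "name"}
    upgraded.update({"first_name": first, "last_name": last, "schema_version": 2})
    return upgraded


# Maps a schema version to the function that upgrades it by one step.
MIGRATIONS = {1: _v1_to_v2}


def upgrade(record):
    """Bring a record of any known version up to CURRENT_VERSION."""
    record = dict(record)  # work on a copy, leave the caller's data untouched
    # Records without a version marker are assumed to be version 1.
    version = record.get("schema_version", 1)
    while version < CURRENT_VERSION:
        record = MIGRATIONS[version](record)
        version = record["schema_version"]
    return record


old_record = {"id": 7, "name": "Asha Rao"}
print(upgrade(old_record))
```

Because each migration handles a single step, a later version 3 would only need one more entry in MIGRATIONS, and existing data stays consistent with new data.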
Data quality is essential to a strong, mature and smart business. It would not be wrong to say that
“having wrong data could be worse than having no data at all.”
Comments and Suggestions
Did we miss any other factors that may impact data quality? Please feel free to share your suggestions and comments.