Enterprises, large or small, struggle to maintain the quality of the ever-growing volumes of data required for smooth functioning. Data quality management means more than periodically sniffing out and weeding out bad data; it takes good business sense to make data quality an integral component of process streamlining and integration. Outdated or incorrect data can lead to major blunders in business decisions.
Companies have adopted many strategies for efficient data quality management, and a focused approach towards data governance and data management can have far-reaching benefits. The key is a proactive approach towards controlling, monitoring, and driving data quality, rather than reacting to data failures or addressing anomalies only after they are detected.
Some of the key strategies are listed below:
Data’s main purpose is to fuel business. Rather than letting IT hold the reins of data quality, the business units that are the prime users of this data are better equipped to define the data quality parameters. If business intelligence is closely linked with the underlying data, there is a better chance of adopting effective methodologies that prioritize business-critical data.
Data steward roles are carved out specifically to define the owners of data quality. Data stewards are the leaders who control data integrity in the system. It is imperative that they are selected from within the business units, since they understand how data translates into the specific business needs of their group. By holding lines of business (LOBs) accountable for data, there is a better chance of generating good-quality data at the source, within the scope of normal business conduct.
A data governance board has representation from all business functions, data stakeholders, and IT. Data stewards may be closely linked with the board or sit on it as members. The board ensures that similar data quality approaches and policies are adopted company-wide, cutting horizontally through all functions of the organization. It meets periodically to define new data quality targets, drive measurements, and analyze the status of data quality within the various business units.
Data within the company is a financial asset, so it makes sense to have checks and balances to ensure that the data entering the systems is of acceptable quality. Each time this data is retrieved or modified, it also risks losing its accuracy, and bad data can travel downstream, pollute subsequent data stores, and impact the business. Building an intelligent virtual firewall ensures that bad data is detected and blocked at the point where it enters the system. Corrupt data detected automatically by the firewall is either sent back to the original source for rectification or adjusted, where possible, before being allowed into the enterprise's environment.
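A minimal sketch of such a firewall, assuming hypothetical validation rules for an incoming customer feed (the rule names and record fields here are illustrative, not from any specific product):

```python
from dataclasses import dataclass, field

@dataclass
class FirewallResult:
    accepted: list = field(default_factory=list)   # clean records that may pass downstream
    rejected: list = field(default_factory=list)   # (record, failed rules) to return to source

def data_firewall(records, rules):
    """Apply every validation rule to each record at the point of entry.

    Records that pass all rules flow into the enterprise environment;
    failing records are captured for rectification at the source.
    """
    result = FirewallResult()
    for record in records:
        failures = [name for name, rule in rules.items() if not rule(record)]
        if failures:
            result.rejected.append((record, failures))
        else:
            result.accepted.append(record)
    return result

# Hypothetical rules: every record needs an id, and the email must contain '@'.
rules = {
    "has_id": lambda r: bool(r.get("id")),
    "valid_email": lambda r: "@" in r.get("email", ""),
}

incoming = [
    {"id": 1, "email": "a@example.com"},
    {"id": None, "email": "broken"},
]
checked = data_firewall(incoming, rules)
```

In practice the rejected queue would feed the rectification loop back to the originating source, while the accepted list continues downstream.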
Data quality management is a cyclic process that involves logical step-by-step implementation. Such quantifiable steps can help in standardizing solid data management practices that can be deployed in incremental cycles to integrate higher levels of data quality techniques into the enterprise’s architecture.
The best practices are categorized in successive phases listed below:
Data quality assessment essentially means subjecting the company's data stores to a detailed inspection to ascertain the data quality issues within the environment. An independent, focused assessment of data quality is of prime importance for identifying how poor-quality data hampers business goals. It provides a reference point for planning and investing in data quality improvements, and for measuring the outcomes of successive improvement cycles.
The data assessment must be guided by an analysis of data's impact on the business. The business-criticality of data must be an important parameter in defining the scope and priority of the data to be assessed. This top-down approach can be complemented by a bottom-up strategy of data-profiling-based assessment, which identifies anomalies in the data and then maps those anomalies to their potential impact on business goals. This correlation provides a basis for measuring data quality and linking it to business impact.
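The bottom-up profiling step can be as simple as per-column statistics that surface anomalies for later impact mapping. A sketch, assuming a hypothetical 'country' column from a customer table:

```python
def profile_column(values):
    """Basic data profiling: per-column statistics that surface anomalies
    (missing values, suspicious cardinality) for business-impact mapping."""
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": total,
        "null_rate": round(1 - len(non_null) / total, 3) if total else 0.0,
        "distinct": len(set(non_null)),
    }

# Hypothetical column values; a high null_rate or an unexpected distinct
# count flags the column for impact analysis against business goals.
stats = profile_column(["DE", "DE", None, "FR", "", "DE"])
```

Real profiling tools add pattern, range, and cross-column checks, but the principle is the same: measure first, then map each anomaly to its business consequence.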
This phase must conclude with a formal report that clearly lists the findings. The report can be circulated amongst stakeholders and decision-makers to drive data quality improvement actions.
The data assessment report helps to narrow down the scope and identify critical data elements. The attributes and dimensions for measuring the quality of such data, the units of measurement, and the acceptable thresholds for these metrics form the basis for implementing improvement processes. Attributes such as completeness, consistency, and timeliness act as input when deciding which tools and techniques should be deployed to achieve the desired levels of quality. Data validity rules are specified based on these metrics, which helps embed data controls into the functions that acquire or modify data within the data lifecycle.
In turn, data quality scorecards and dashboards derived from these metrics and their thresholds can be defined for each business unit. The scores can be captured, stored, and periodically updated to monitor improvement.
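A sketch of how one such metric and its scorecard could be computed, using completeness as the dimension and assuming illustrative per-field thresholds (the dataset and targets are hypothetical):

```python
def completeness(records, field):
    """Share of records with a non-empty value for `field` -- one common
    data quality dimension alongside consistency and timeliness."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records) if records else 0.0

def scorecard(records, thresholds):
    """Compare measured completeness per field against agreed thresholds;
    the pass/fail map can feed a per-business-unit dashboard."""
    return {
        field: {
            "score": round(completeness(records, field), 2),
            "meets_target": completeness(records, field) >= target,
        }
        for field, target in thresholds.items()
    }

# Hypothetical thresholds for a customer dataset.
records = [{"name": "Ada", "phone": "555-0100"}, {"name": "Grace", "phone": ""}]
card = scorecard(records, {"name": 1.0, "phone": 0.9})
```

Capturing these scores on a schedule is what turns a one-off measurement into the periodic monitoring the section describes.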
During any application development or system upgrade, building functionality takes precedence over data quality. The metrics defined above can be used to integrate data quality targets into the system development life cycle, built in as mandatory requirements for each phase of development. Data quality analysts need to identify the data requirements for each application. A thorough traversal of the data flow within each application gives insight into probable insertion points for data inspection and control routines. These requirements must be added to the system's functional requirements for seamless incorporation into the development cycle, thus validating data at the point of introduction into the system.
Data shared between data providers and consumers must be covered by contractual agreements that clearly define the acceptable levels of quality. The data metrics can be incorporated into these contracts in the form of performance SLAs.
Defining data standards and commonly agreed data formats helps data flow smoothly from one business to another. The metadata can be placed in a repository actively managed by a data control center, which ensures that data is represented in a fashion agreeable and beneficial to both collaborating sides. This control center also performs the gap analysis and aligns the business needs of the two parties. Data quality inspections can be done manually or through automated routines to ascertain the working levels. Workflows can be defined for periodically monitoring the data and taking remedial actions based on the SLA targets and the actions specified if those SLAs are not met.
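The periodic SLA check in such a workflow reduces to comparing measured metrics against the contracted targets. A sketch, assuming hypothetical metric names where higher values are better:

```python
def check_sla(measured, sla_targets):
    """Periodic SLA check: compare measured data quality metrics against
    contractually agreed targets and list the breaches needing remediation."""
    breaches = []
    for metric, target in sla_targets.items():
        value = measured.get(metric)
        if value is None or value < target:
            breaches.append({"metric": metric, "measured": value, "target": target})
    return breaches

# Hypothetical agreement between a data provider and consumer.
sla = {"completeness": 0.98, "consistency": 0.97}
measured = {"completeness": 0.95, "consistency": 0.99}
breaches = check_sla(measured, sla)
```

Each breach would then trigger the remedial action the contract specifies, which is where the defect tracking described next takes over.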
When data is found to be below the expected levels, the remedial actions should be managed through effective data quality tracking mechanisms, much like the defect tracking systems used in software development. Reporting data defects and tracking the actions taken on them can feed performance reports. A root-cause analysis of each reported data error gives direct feedback for understanding flaws in the business processes.
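A minimal sketch of such a defect log and the root-cause aggregation it enables; the record fields and source names are illustrative assumptions:

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class DataDefect:
    """A reported data error, tracked like a software defect."""
    source: str        # system or process where the error surfaced
    category: str      # e.g. "missing value", "format violation"
    status: str = "open"

def root_cause_summary(defects):
    """Count open defects per source: recurring sources point at flawed
    upstream processes and feed the performance reports."""
    return Counter(d.source for d in defects if d.status == "open")

# Hypothetical defect log.
log = [
    DataDefect("crm_import", "missing value"),
    DataDefect("crm_import", "format violation"),
    DataDefect("billing", "duplicate", status="resolved"),
]
summary = root_cause_summary(log)
```

A source that keeps reappearing in the summary is exactly the kind of direct feedback on process flaws the root-cause analysis is meant to produce.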
In addition to the above, proactive data cleansing and process remediation cycles must be carried out from time to time to catch data errors that may have been introduced in spite of strict quality controls.
Data quality can be kept at peak or near-peak levels by engaging effective data management tools that provide a sound framework for implementing data quality measurement, monitoring, and subsequent improvement. The quality management solution selected must closely align with the enterprise's unique business objectives. Data quality goals and management plans need to be co-owned by producers, consumers, business application designers, developers, and business leads; data quality, after all, is a joint responsibility. Having solid data entry processes in place is essential to ensuring this.