In today’s data-driven business environment, bad data leads to bad decisions. Automating data cleansing is therefore an essential step toward providing the clean, trusted data that timely business decisions depend on.
This course helps individuals recognize and identify data quality problems and automate their cleansing.
The Cost of Bad Data
The course starts with specific horror stories and continues with common statistics on the impact of bad data. Participants then share their own bad data stories. This sets the stage for the importance of data quality.
Data Quality Taxonomy
Learners are introduced to a taxonomy for understanding the types of data quality problems. Hands-on interaction with data sets helps solidify the connection between these terms and common data quality problems.
Data Quality Metrics
Methods to measure data quality will be reviewed.
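As a rough illustration of what such a measurement can look like, the sketch below computes two commonly used metrics, completeness and uniqueness, over a small in-memory data set. The function names, the `customers` sample, and the choice of these two metrics are assumptions made for illustration; they are not taken from the course materials.

```python
def is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def completeness(rows, field):
    """Share of rows in which `field` has a non-blank value."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if not is_blank(row.get(field)))
    return filled / len(rows)


def uniqueness(rows, field):
    """Share of the non-blank values of `field` that are distinct."""
    values = [str(row[field]).strip() for row in rows if not is_blank(row.get(field))]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical sample data: one blank email and one duplicate email.
customers = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "ann@example.com"},
    {"id": 4, "email": "bob@example.com"},
]

email_completeness = completeness(customers, "email")  # 3 of 4 rows filled
email_uniqueness = uniqueness(customers, "email")      # 2 distinct of 3 values
```

Expressing each metric as a ratio between 0 and 1 makes it easy to compare fields and to track the same metric over time.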
Open Source and Commercial Tools
Learners review available open source options, examine analyst reports on commercial data quality tools, and see what some of the leading commercial vendors offer. A group discussion of participants' experiences with these tools follows.
Data Profiling
A discussion of profiling methods and a review of a profiling report emphasize what data profiling can uncover. Using a profiling tool in a hands-on lab, learners profile data and identify potential problems.
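To give a sense of what a profile contains, here is a minimal sketch of a single-column profile using only the Python standard library. It is not the profiling tool used in the lab; `profile_column` and the `orders` sample are hypothetical names chosen for this example.

```python
from collections import Counter


def profile_column(rows, field):
    """Summarize one column: row count, missing values, distinct values, lengths."""
    values = [row.get(field) for row in rows]
    present = [str(v).strip() for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "min_length": min((len(v) for v in present), default=0),
        "max_length": max((len(v) for v in present), default=0),
        "most_common": counts.most_common(3),
    }


# Hypothetical sample: the same state recorded three different ways.
orders = [
    {"state": "CA"},
    {"state": "ca"},
    {"state": "California"},
    {"state": None},
    {"state": "NY"},
    {"state": "CA"},
]

report = profile_column(orders, "state")
```

In this sample, a distinct count of 4 for a column that really holds two states, together with value lengths ranging from 2 to 10, is the kind of signal that points to inconsistent formatting.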
Data Cleansing
From basic standardization to data enhancement, we first discuss methods for data cleansing and then implement some of these methods in a hands-on lab.
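Basic standardization can be sketched as follows, assuming a small lookup table of known spellings. The `STATE_ALIASES` table and the `standardize_state` function are hypothetical and exist only for this example; a real routine would likely draw its reference values from a maintained source.

```python
# Hypothetical alias table mapping known spellings to a standard code.
STATE_ALIASES = {
    "ca": "CA",
    "california": "CA",
    "ny": "NY",
    "new york": "NY",
}


def standardize_state(raw):
    """Return (value, recognized).

    Whitespace is collapsed and case is ignored for the lookup. Values that
    are not in the alias table come back unchanged (apart from whitespace)
    with recognized=False, so they can be routed to manual review.
    """
    cleaned = " ".join(str(raw or "").split())
    code = STATE_ALIASES.get(cleaned.lower())
    if code is not None:
        return code, True
    return cleaned, False


raw_values = ["  california ", "Ny", "New   York", "Calif."]
results = [standardize_state(v) for v in raw_values]
```

Returning a flag instead of silently dropping unrecognized values keeps the automated step from hiding problems it cannot fix.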
Monitoring Data Quality
Keeping data clean is generally a continual process. After discussing methods for tracking automated data cleansing routines, learners implement a report for monitoring data quality in a hands-on lab.
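One possible shape for such a report, shown here only as a sketch, compares the latest metric values against minimum thresholds and flags anything that falls short. The metric names, threshold values, and the `quality_report` function are assumptions for illustration and do not describe the lab's actual report.

```python
def quality_report(metrics, thresholds):
    """Build one report line per threshold, flagging metrics below their minimum."""
    lines = []
    for name in sorted(thresholds):
        minimum = thresholds[name]
        value = metrics.get(name)
        if value is None:
            status = "MISSING"  # the metric was not measured in this run
        elif value >= minimum:
            status = "OK"
        else:
            status = "ALERT"
        shown = "n/a" if value is None else f"{value:.1%}"
        lines.append(f"{name}: {shown} (minimum {minimum:.1%}) {status}")
    return lines


# Hypothetical metric values from the latest run and their agreed minimums.
latest_metrics = {"email_completeness": 0.75, "id_uniqueness": 1.0}
minimums = {"email_completeness": 0.95, "id_uniqueness": 1.0, "phone_completeness": 0.9}

report_lines = quality_report(latest_metrics, minimums)
for line in report_lines:
    print(line)
```

Running a check like this after every cleansing job, and keeping the results, gives a simple history of whether data quality is improving or drifting.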
At the end of this program, learners will be able to:
- Adapt these approaches to recognize and automate remediation of common data quality problems.
- Maximize the effectiveness of their data governance by helping select and utilize tools for data quality.