Kimball indicated in 2009 that, “In today’s environment, most organizations should use a vendor-supplied ETL tool as a general rule.” Most organizations now recognize that the ROI of commercial ETL tools far surpasses that of hand coding.
This course applies Kimball’s reasoning to big data and shows why commercial Big Data Fabrics outperform hand coding for big data jobs.
The Cost of Hand Coding
Our discussion starts with common statistics on the negative impacts of hand coding. These set the stage for the importance of commercial tools that support continuous integration and continuous delivery in a big data environment.
Big Data Fabric Taxonomy
Learners are introduced to the common components and architecture of a Big Data Fabric. Hands-on work with data sets helps tie these terms to the big data problems they are used to solve.
Open Source and Commercial Tools
Review of available open source options. Examine analyst reports on commercial Big Data Fabric tools and understand what some of the leading commercial vendors offer. This is followed by a group discussion of learners’ experiences with these tools.
Big Data Integration
Review of big data methods, covered in course Data 107, for dealing with batch and streaming data. Hands-on exercises in Talend show how these methods are implemented.
Big Data Quality
Review of methods, covered in course Data 77, for dealing with data quality problems. Hands-on exercises in Talend show how these methods are implemented in a big data environment.
Machine Learning with Big Data
Review of data science modeling methods, including some from courses Data 99 and Data 98. Hands-on exercises use the data models available as components in Talend.
At the end of this program, learners will be able to:
- Recognize common big data integration and big data quality problems, and adapt the approaches covered in this course to deal with them effectively.
- Implement existing machine learning components in a big data environment.
- Maximize the effectiveness of their big data platform by helping to select and use a commercial Big Data Fabric.