Course Description

The ability to work with big data can enable high-value business capabilities (e.g., recommendation engines, predictive maintenance). Hadoop is a frequent enterprise choice for a big data platform. In this class we will look at on-premises Hadoop platforms used to store structured and unstructured data and to run data science models.

Course Outline

  • What is Big Data
    • We review why big data is of interest and what features help identify big data.
  • Why Work with Big Data
    • A machine learning demo illustrates what is possible with big data.
  • Big Data Terminology
    • Learners are introduced to the terms and concepts that make up a big data platform. Hands-on interaction with data sets will help solidify the connection between these terms and big data integration and data science jobs.
  • Open Source and Commercial Platforms
    • Review the available open source options. Examine analyst reports on commercial big data platforms and review architectures to understand what the leading commercial vendors offer. A group discussion of learner experiences with Hadoop follows.
  • Big Data Integration
    • Discuss the hand-coding and browser-based tools offered by Cloudera and MapR for integrating data onto single-node virtual sandboxes. Hands-on exercises will involve batch and real-time integration of structured and unstructured data on both platforms.
  • Running in Spark
    • Further discussion of why we are interested in Spark (e.g., Spark jobs running up to 100x faster than traditional systems). Hands-on exercises involve a couple of introductory Spark jobs.

Learner Outcomes

At the end of this program, learners will be able to: 

  • Define Big Data and Hadoop in more detail, and identify situations where Hadoop is an appropriate tool.
  • Help select and use on-premises Hadoop platforms for big data work.
  • Utilize Cloudera and MapR tools for common big data integration.
  • Build basic Spark applications.


Predictive and/or Prescriptive course(s) will help in understanding the potential of Spark modeling; however, they are not required.



Classroom: Instructor Led
M, T
5:30PM to 9:00PM
Oct 07, 2019 to Oct 08, 2019
Course Fee(s)
Tuition non-credit $695.00
Section Notes

Parking and refreshments are provided.

The enrollment deadline is Monday, September 30, 2019 at 5 PM. After this date, please call 314-935-4444 to register.


A full refund will be given when a registrant cancels more than five business days prior to the start of the class. Cancellations received within five business days of the start of the class and no-shows will be billed in full. Another person may be substituted at any time at no additional charge.
