ETL Training

ETL Architecture and Design

This 2-day course is presented by industry leader and co-author of the ETL Toolkit Joe Caserta covering ETL Architecture and Design. This course features a new Big Data ETL module. Learn how to manage structured and unstructured data in a big data ecosystem.

The data warehouse is best known for taking disparate and disordered transaction data and presenting it in a cohesive, orderly way for the business to make intelligent, fact-based decisions. The data warehouse ETL designer is charged with the task of applying a set of consistent techniques for delivering conformed dimensional data. Precisely designing and building reusable processes to extract, clean, conform and deliver dimensional data is the foundation for a successful, reduced cost, data warehouse implementation.

Objectives

This course teaches the practical detailed steps of data warehouse ETL including extracting, cleaning, conforming, and delivering data and its associated metadata. You’ll learn the design and development, architecture, operations and management aspects of the ETL process. Upon completion you’ll have learned each of the steps, and sub-steps, required to successfully obtain, prepare and publish data in a dimensional data warehouse.

Audience

This course is for anyone interested in learning the latest ETL techniques for designing and managing data warehouses and ETL processes. Beginner, intermediate and experienced data warehouse practitioners, data architects, DBA’s and ETL designers & developers will all benefit from this course.

Content

This workshop offers an in-depth understanding of extract, transform and load (ETL) techniques essential for building dimensional data warehouses. It focuses on proven methods and best practices to successfully implement, manage and maintain the most challenging task of any data warehouse project – the ETL.

Functional Practices

  • Planning and designing your ETL system
  • Choosing the appropriate architecture
  • Managing the implementation
  • Managing the day to day operations
  • Building the development/test/production suite of ETL processes
  • Building a data cleaning subsystem
  • Understanding the tradeoffs of various staging data structures, including flat files, normalized schemas, XML, and dimensional schemas
  • Analyzing and extracting source data
  • Creating the logical data mapping

Technical Practices

  • Structuring the data into dimensional schemas for the most effective delivery to end users
  • Conforming heterogeneous data from multiple sources into standardized dimension tables and fact tables
  • Building ETL modules for handling the three distinct types of slowly changing dimensions (SCDs)
  • Building ETL modules for multi-valued dimensions and hierarchical dimensions
  • Running high-performance surrogate key pipelines
  • Loading the three fundamental fact table grains – transaction, periodic snapshot and accumulating snapshot
  • Handling late arriving dimensions and facts
  • Optimizing ETL processes to fit into highly constrained load windows
  • Structuring and presenting metadata
  • Converting batch and file-oriented processes into continuously streaming real-time ETL systems
  • Managing structured and unstructured data in a big data ecosystem

Materials

In addition to the course material all students receive a copy of Ralph Kimball and Joe Caserta’s book, The Data Warehouse ETL Toolkit (Wiley, 2004).

Instructor

The ETL Training course is designed and presented by Joe Caserta