Course Information
Course Overview
Build a production-ready Lakehouse on AWS (S3, Glue, Athena, Lake Formation) — plus Orchestration, Data Quality & AI.
Build portfolio-grade AWS Cloud projects that mirror real data teams.
This course is 100% hands-on. You’ll design and operate a production-style Data Lakehouse on AWS, enforce Data Governance with Lake Formation, stand up a Redshift Serverless warehouse with SCD2, run a Batch Ops simulation (break/fix/backfill), and prepare AI/ML-ready datasets—exactly how modern orgs work.
You will use S3, Glue (PySpark), Athena, Lake Formation, Glue Catalog, Apache Iceberg, Redshift Serverless (external & managed tables), IAM, Lambda, DynamoDB, CloudWatch/CloudTrail—with a focus on cost, reliability, and auditability.
What you’ll build (5 connected projects)
Project 1 — Lakehouse on AWS: S3 + Apache Iceberg
Land raw data in S3, transform it with Glue, publish Iceberg bronze/silver tables, implement partitioning and schema evolution, and gate publishes with data quality checks.
Project 2 — Data Governance with Lake Formation
Enforce tag-based policies (LF-Tags), column masking, and row-level filters (data cells filters). Prove access in Athena (Analyst vs Scientist). Add lightweight auditing.
Project 3 — Data Warehouse on Redshift Serverless (External + SCD2)
Expose Iceberg via external tables, build a star schema (facts/dims), implement SCD2 with MERGE, and tune performance and cost (sort/dist keys, WLM/workgroup choices).
Project 4 — A Day in the Life of a Data Engineer (Batch Ops Simulation)
Orchestrate ingest → DQ → publish, handle schema changes and late data, rerun safely, backfill the last N days, and write a clear incident postmortem.
Project 5 — AI/ML Readiness & Serving
Curate ML-friendly, feature-like tables, ensure reproducible training sets using Iceberg snapshots/time travel, and optionally integrate SageMaker/Athena for model workflows.
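To make the SCD2 idea from Project 3 concrete, here is a minimal sketch in plain Python. It is not the Redshift MERGE statement the course uses; it only illustrates the logic such a statement encodes: close the current version of a changed row, then open a new one. The `DimRow` fields (`customer_id`, `city`) and the `apply_scd2` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int            # business key (hypothetical)
    city: str                   # tracked attribute (hypothetical)
    valid_from: date
    valid_to: Optional[date]    # None means the version is still open
    is_current: bool


def apply_scd2(dim: List[DimRow], incoming: Dict[int, str], as_of: date) -> List[DimRow]:
    """Return a new dimension after applying SCD2 changes.

    `incoming` maps each business key to its latest attribute value.
    """
    result: List[DimRow] = []
    keys_with_current = set()
    for row in dim:
        new_city = incoming.get(row.customer_id)
        if row.is_current:
            keys_with_current.add(row.customer_id)
        if row.is_current and new_city is not None and new_city != row.city:
            # Changed attribute: close the old version, open a new one.
            result.append(replace(row, valid_to=as_of, is_current=False))
            result.append(DimRow(row.customer_id, new_city, as_of, None, True))
        else:
            # Historical rows and unchanged current rows pass through.
            result.append(row)
    # Keys with no current version are inserted as brand-new rows.
    for key, city in incoming.items():
        if key not in keys_with_current:
            result.append(DimRow(key, city, as_of, None, True))
    return result
```

Applying the same `incoming` batch a second time leaves the dimension unchanged, which is the property that makes reruns safe.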
Course Content
- 2 section(s)
- 15 lecture(s)
- Section 1 Project 1 — Data Lakehouse on AWS: S3 + Apache Iceberg
- Section 2 Project 2 — Data Governance with AWS Lake Formation
What You’ll Learn
- Design an AWS Data Lakehouse with S3 + Glue + Iceberg + Athena + Glue Catalog
- Apply Data Governance using Lake Formation: LF-Tags/TBAC, PII masking, row-level security
- Build a Redshift Serverless warehouse: external tables over Iceberg, star schema, SCD2 with MERGE
- Operate batch pipelines: orchestrate runs, handle break/fix, idempotent replays, and backfills
- Validate data with quality checks and use auditing/lineage (Lambda+DynamoDB, CloudWatch/CloudTrail)
- Produce ML-ready datasets and reproducible training views via Iceberg snapshots/time travel
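As a rough illustration of the idempotent replay and quality-gate ideas listed above, the sketch below backfills the last N daily partitions of an in-memory "table" (a plain dict). This is an assumption-laden toy, not the course's Glue/Iceberg pipeline: `passes_quality`, `backfill`, and the `order_id` column are hypothetical names. The point is that each partition is overwritten rather than appended to, and a partition that fails its check is not published.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

Row = Dict[str, object]


def last_n_days(end: date, n: int) -> List[date]:
    """Return the n days ending at `end`, oldest first."""
    return [end - timedelta(days=i) for i in range(n - 1, -1, -1)]


def passes_quality(rows: List[Row], required=("order_id",)) -> bool:
    """Toy check: the partition is non-empty and required columns are not null."""
    return bool(rows) and all(r.get(k) is not None for r in rows for k in required)


def backfill(
    table: Dict[date, List[Row]],
    load_partition: Callable[[date], List[Row]],
    end: date,
    n: int,
) -> List[date]:
    """Rebuild the last n daily partitions; return the days that failed the gate."""
    failed: List[date] = []
    for day in last_n_days(end, n):
        rows = load_partition(day)
        if passes_quality(rows):
            # Overwrite, never append, so a rerun produces the same result.
            table[day] = rows
        else:
            # Gate the publish: keep whatever was there before.
            failed.append(day)
    return failed
```

Running `backfill` twice over the same source leaves the table identical, and any day returned in the failed list still holds its previously published data.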
Reviews
-
jj .thomas
this what exactly i was looking for, excellent