Course Information
Course Overview
Apache Iceberg, Snowflake, Data Lake / Data Lakehouse, Data Engineering, Hands-on
This course is broadly divided into 8 sections:
Why Iceberg:
This will help you understand the significance of Iceberg and the challenges associated with traditional data warehouse architectures.
Iceberg environment setup:
We’ll set up a Spark environment with Iceberg in GitHub Codespaces. This will serve as a playground where you can run Iceberg commands and experiment hands-on.
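As a preview of that playground, here is a minimal PySpark sketch of an Iceberg-enabled session backed by a local Hadoop catalog. The runtime version, catalog name, and warehouse path are illustrative assumptions, not the exact course setup:

```python
from pyspark.sql import SparkSession

# Iceberg-enabled Spark session with a local "hadoop" catalog named `local`
# (runtime version and warehouse path are illustrative assumptions)
spark = (
    SparkSession.builder
    .appName("iceberg-playground")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "warehouse")
    .getOrCreate()
)

# Smoke test: create an empty Iceberg table in the local catalog
spark.sql("CREATE TABLE IF NOT EXISTS local.db.demo (id BIGINT, name STRING) USING iceberg")
```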
Parquet file format:
We’ll dive deep into the Parquet file format to build a strong foundation. Understanding Parquet is essential because Iceberg tables typically store their data as Apache Parquet files and leverage Parquet's structure for efficient storage and querying.
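For a quick taste of what we'll inspect, a small pyarrow sketch like the one below dumps a Parquet file's footer metadata, schema, and per-column statistics (the file path is hypothetical):

```python
import pyarrow.parquet as pq

# Inspect a Parquet file's footer metadata (file path is hypothetical)
pf = pq.ParquetFile("data/example.parquet")
print(pf.metadata)        # number of row groups, total rows, created-by
print(pf.schema_arrow)    # column names and types
# Per-column statistics (min/max, null count) for the first column of the first row group
print(pf.metadata.row_group(0).column(0).statistics)
```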
Iceberg features:
We’ll explore key Iceberg features such as hidden partitioning, schema evolution, and time travel to understand how it addresses common limitations in traditional data lakes.
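As a preview, the sketch below shows what these features look like in Spark SQL, reusing the illustrative `local` catalog from the setup sketch (the table name and snapshot id are assumptions):

```python
# Assumes the `spark` session and `local` catalog from the setup sketch above.

# Hidden partitioning: partition by a transform of a column; queries never need
# to reference the derived partition value directly
spark.sql("""
    CREATE TABLE IF NOT EXISTS local.db.events (id BIGINT, ts TIMESTAMP)
    USING iceberg
    PARTITIONED BY (days(ts))
""")

# Schema evolution: add a column as a metadata-only change (no data files rewritten)
spark.sql("ALTER TABLE local.db.events ADD COLUMNS (user_id STRING)")

# Time travel: query the table as of an earlier snapshot (snapshot id is illustrative)
spark.sql("SELECT * FROM local.db.events VERSION AS OF 1234567890").show()
```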
Iceberg concepts:
We’ll explore concepts like Copy-on-Write (COW), Merge-on-Read (MOR), and snapshot isolation to gain a deeper, more concrete understanding of how Iceberg manages data and ensures consistency.
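For instance, Iceberg exposes Copy-on-Write and Merge-on-Read as per-operation table properties, and every committed write becomes a queryable snapshot; the sketch below illustrates both against the example table above:

```python
# COW vs MOR are configured per write operation through Iceberg table properties
spark.sql("""
    ALTER TABLE local.db.events SET TBLPROPERTIES (
        'write.delete.mode' = 'merge-on-read',
        'write.update.mode' = 'merge-on-read',
        'write.merge.mode'  = 'copy-on-write'
    )
""")

# Every committed write produces a snapshot; the snapshots metadata table lists them
spark.sql("SELECT snapshot_id, committed_at, operation FROM local.db.events.snapshots").show()
```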
Iceberg with Snowflake:
We’ll configure Iceberg with Snowflake and explore how Iceberg integrates with it, helping us understand the foundational concepts of using Iceberg within the Snowflake ecosystem.
Data lake with Snowflake Iceberg:
We’ll build a sample data lake using Snowflake Iceberg and also demonstrate how to query Iceberg tables from Spark for cross-platform interoperability.
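As a preview of the Spark side of that interoperability, a read like the sketch below works against any Iceberg table registered in a configured catalog (the identifiers are the illustrative ones used earlier; the Snowflake-specific catalog wiring is covered in this section):

```python
# Cross-engine read: any engine that implements the Iceberg spec can query the same table files
df = spark.read.format("iceberg").load("local.db.events")
df.where("ts >= TIMESTAMP '2024-01-01 00:00:00'").show()
```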
By the end of this course, you’ll have a solid understanding of the Iceberg table format—its advantages, use cases, and how to build an efficient data lake using Iceberg.
Course Content
- 8 section(s)
- 50 lecture(s)
- Section 1 Before we start
- Section 2 Why Iceberg
- Section 3 Iceberg environment setup
- Section 4 Parquet file format
- Section 5 Iceberg features
- Section 6 Iceberg concepts
- Section 7 Iceberg with Snowflake
- Section 8 Data lake with Snowflake Iceberg
What You’ll Learn
- Iceberg fundamentals
- Problems with current data warehouses
- Create a data lake using Snowflake and Iceberg
- Understand the Parquet file format