Udemy

Data Engineering Vol2 AWS : Data Processing - Spark & Kafka

  • 70 Students
  • Updated 2/2026
4.9
(12 Ratings)

Course Information

Registration period: Year-round Recruitment
Course Level:
Study Mode:
Language: English
Taught by: Soumyadeep Dey
Rating: 4.9 (12 Ratings)

Course Overview


Batch & Stream Processing using Spark (PySpark) and Kafka on AWS (EMR & Databricks)

This is Volume 2 of the Data Engineering course. In it I cover two open-source data processing technologies, Spark and Kafka, which are among the most widely used frameworks for batch and stream processing. You will learn Spark from Level 100 to Level 400 through real-life hands-on exercises and projects. I will also introduce you to the data lake on AWS (that is, S3) and the data lakehouse using Apache Iceberg.


I will use AWS as the hosting platform and cover the AWS services EMR, S3 and MSK. I will also cover Databricks as a Spark hosting platform and show you how Spark integrates with other services such as AWS RDS (MySQL or PostgreSQL) and Redshift.


You will get hands-on practice with large datasets (100 GB - 300 GB or more of data). The hands-on exercises match real-world scenarios such as Spark batch processing, stream processing, performance tuning, streaming ingestion, window functions and ACID transactions on Iceberg.

Some other highlights:

  • 10 Projects with different datasets. Total dataset size of 250 GB or more.

  • Other technologies covered - EC2, EBS, VPC and IAM.

  • Optional Python videos

  • Optional AWS and SQL Essentials videos


I will conclude the Data Engineering course with Volume 3, in which I will cover the following topics:

  • Flink

  • Apache Airflow

  • Apache Pinot

  • AWS Kinesis

Please provide feedback and suggestions if you want me to add any other topics.

Course Content

  • 20 sections
  • 193 lectures
  • Section 1 Introduction to Data Engineering Volume 2
  • Section 2 Big Data Processing
  • Section 3 Introduction to Spark
  • Section 4 Knowing Spark - Up Close Part 1
  • Section 5 Spark Transformation & Action - Part 1
  • Section 6 Spark Partitions - Input, Shuffle & Output
  • Section 7 Knowing Spark - Up Close Part 2
  • Section 8 Transformation & Action Part 2 + Spark Functions
  • Section 9 Knowing Spark - Up Close Part 3
  • Section 10 Hosting Platforms - AWS EMR (Elastic MapReduce)
  • Section 11 PROJECT ASSIGNMENT 5 & 6 (20GB + 35GB) - Power Grid Analysis, Customer 360 Ana
  • Section 12 Spark SQL
  • Section 13 Data Lakehouse using Open Table Format (OTF) - Iceberg
  • Section 14 PROJECT ASSIGNMENT 8 - End-to-end Lakehouse (Iceberg) Architecture Implementation
  • Section 15 Apache Kafka - The Streaming Ingestion
  • Section 16 Spark Streaming - Stream Processing using Spark
  • Section 17 PROJECT 9 - Real Time Vehicle Route Analysis
  • Section 18 AWS Lambda for Data Processing
  • Section 19 (Optional) AWS Essentials
  • Section 20 (Optional) SQL Essentials for Data Engineering

What You’ll Learn

  • Deep dive on Spark and Kafka using AWS EMR, Databricks and MSK
  • Understand Data Engineering (Volume 2) on AWS using Spark and Kafka
  • Batch and stream processing using Spark and Kafka
  • Production-level projects and hands-on exercises that give candidates on-the-job-like training
  • Get access to datasets of 100 GB - 200 GB and practice with them
  • Learn Python for Data Engineering with hands-on exercises (functions, arguments, OOP (class, object, self), modules, packages, multithreading, file handling etc.)
  • Learn SQL for Data Engineering with hands-on exercises (database objects, CASE, window functions, CTE, CTAS, MERGE, materialized views etc.)
  • AWS Data Analytics services - S3, EMR, Databricks, MSK


Reviews

  • linda haybard
    5.0

    Excellent. Excellent. Excellent.

  • Amar Sharma
    5.0

    Data engineering is an invaluable skill to acquire in today’s evolving tech landscape. I have found the perfect Udemy course to help me upskill. Thank you for the great content.

  • Ria
    5.0

    Excellent course.
