Udemy

Data Engineering using Kafka and Spark Structured Streaming

  • 5,395 Students
  • Updated 12/2024
  • Rating: 4.2 (312 Ratings)

Course Information

Registration period: Year-round Recruitment
Duration: 9 hours 35 minutes
Language: English

Course Overview

A comprehensive Data Engineering course on building streaming pipelines using Kafka and Spark Structured Streaming

In this course, you will learn to build streaming pipelines by integrating Kafka and Spark Structured Streaming. Here is what the course covers.

  • First, you need a proper environment for building streaming pipelines with Kafka and Spark Structured Streaming on top of Hadoop or another distributed file system. You will start by setting up a self-support lab with all the key components (Hadoop, Hive, Spark, and Kafka) on a single-node Linux system.

  • Once the environment is set up, you will get started with Kafka: you will create a Kafka topic, produce messages into the topic, and consume messages from it.

  • You will also learn how to use Kafka Connect to ingest data from web server logs into a Kafka topic, and to move data from a Kafka topic into HDFS as a sink.

  • Once you understand Kafka from the perspective of data ingestion, you will get an overview of the key concepts of Spark Structured Streaming.

  • After learning Kafka and Spark Structured Streaming separately, you will build a streaming pipeline that consumes data from a Kafka topic using Spark Structured Streaming, processes it, and writes it to different targets.

  • You will also learn how to take care of incremental data processing using Spark Structured Streaming.
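
To make the Kafka and Spark Structured Streaming integration step more concrete, here is a minimal PySpark sketch. It is not taken from the course material: the function names, broker address, topic name, and paths are hypothetical, and it assumes a Spark session that has the Kafka connector package (spark-sql-kafka-0-10) available.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="earliest"):
    """Build the options for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def start_pipeline(spark, options, target_path, checkpoint_path):
    """Read from Kafka, cast the payload to text, and write Parquet files.

    `spark` is an existing SparkSession; all paths are hypothetical.
    """
    messages = (
        spark.readStream
        .format("kafka")
        .options(**options)
        .load()
        # Kafka delivers key and value as binary, so cast them to strings.
        .selectExpr(
            "CAST(key AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "timestamp",
        )
    )
    return (
        messages.writeStream
        .format("parquet")
        .option("path", target_path)
        # The checkpoint lets a restarted query resume where it stopped.
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

With a running session, a call such as `start_pipeline(spark, kafka_source_options("localhost:9092", "web_logs"), "/tmp/web_logs_parquet", "/tmp/web_logs_checkpoint")` would return a streaming query that keeps running until it is stopped.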

Course Outline

Here is a brief outline of the course. You can provision the server for the environment on either AWS Cloud9 or GCP.

  • Setting up Environment using AWS Cloud9 or GCP

  • Setup Single Node Hadoop Cluster

  • Setup Hive and Spark on top of Single Node Hadoop Cluster

  • Setup Single Node Kafka Cluster on top of Single Node Hadoop Cluster

  • Getting Started with Kafka

  • Data Ingestion using Kafka Connect - Web server log files as a source to Kafka Topic

  • Data Ingestion using Kafka Connect - Kafka Topic to HDFS as a sink

  • Overview of Spark Structured Streaming

  • Kafka and Spark Structured Streaming Integration

  • Incremental Loads using Spark Structured Streaming
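
As an illustration of the Kafka Connect step with web server log files as a source, the sketch below renders a standalone-mode connector configuration. It assumes the FileStreamSource connector that ships with Apache Kafka, which may differ from what the course uses; the connector name, log path, and topic name are hypothetical.

```python
def file_source_properties(name, log_file, topic, tasks_max=1):
    """Render a Kafka Connect file source configuration as properties text."""
    settings = {
        "name": name,                           # hypothetical connector name
        "connector.class": "FileStreamSource",  # assumed connector class
        "tasks.max": str(tasks_max),
        "file": log_file,                       # file whose lines are read
        "topic": topic,                         # topic that receives the lines
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(file_source_properties("web-logs-source", "/tmp/access.log", "web_logs"), end="")
```

The rendered text can be saved to a properties file and passed to Kafka's `connect-standalone.sh` script together with a worker configuration.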

Udemy-based support

In case you run into technical challenges while taking the course, feel free to raise them through Udemy Messenger. We will make sure the issue is resolved within 48 hours.

Course Content

  • 11 section(s)
  • 113 lecture(s)
  • Section 1 Introduction
  • Section 2 Getting Started with Kafka
  • Section 3 Data Ingestion using Kafka Connect
  • Section 4 Overview of Spark Structured Streaming
  • Section 5 Kafka and Spark Structured Streaming Integration
  • Section 6 Incremental Loads using Spark Structured Streaming
  • Section 7 Setting up Environment using AWS Cloud9
  • Section 8 Setting up Environment - Overview of GCP and Provision Ubuntu VM
  • Section 9 Setup Single Node Hadoop Cluster
  • Section 10 Setup Hive and Spark
  • Section 11 Setup Single Node Kafka Cluster

What You’ll Learn

  • Setting up self support lab with Hadoop (HDFS and YARN), Hive, Spark, and Kafka
  • Overview of Kafka to build streaming pipelines
  • Data Ingestion to Kafka topics using Kafka Connect using File Source
  • Data Ingestion to HDFS using Kafka Connect using HDFS 3 Connector Plugin
  • Overview of Spark Structured Streaming to process data as part of Streaming Pipelines
  • Incremental Data Processing using Spark Structured Streaming using File Source and File Target
  • Integration of Kafka and Spark Structured Streaming - Reading Data from Kafka Topics
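
The idea behind incremental data processing with a file source and a file target can be illustrated without Spark. The plain-Python sketch below is only an analogy, not Spark's implementation and not code from the course: it records which source files earlier runs have handled in a hypothetical JSON checkpoint file, so each run appends only new files to the target. The target and checkpoint files are assumed to live outside the source directory.

```python
import json
from pathlib import Path


def load_incrementally(source_dir, target_file, checkpoint_file):
    """Append not-yet-seen source files to the target and return their names."""
    source_dir = Path(source_dir)
    target_file = Path(target_file)
    checkpoint_file = Path(checkpoint_file)

    # The checkpoint is a JSON list of file names handled by earlier runs.
    done = set()
    if checkpoint_file.exists():
        done = set(json.loads(checkpoint_file.read_text()))

    new_files = sorted(
        path for path in source_dir.iterdir()
        if path.is_file() and path.name not in done
    )

    with target_file.open("a") as out:
        for path in new_files:
            out.write(path.read_text())

    # Record progress only after the data has been written.
    done.update(path.name for path in new_files)
    checkpoint_file.write_text(json.dumps(sorted(done)))
    return [path.name for path in new_files]
```

Running the function repeatedly against the same directory picks up only the files that arrived since the previous run, which is the behaviour a checkpoint location provides for a Structured Streaming query.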

Reviews

  • Ashish Lilha
    5.0

    Very Good and Structured approach

  • Willy Miguel Herrera
    3.0

    go

  • Sandilyan Gurunath
    4.0

    repeatedly telling the same thing again and again.

  • Deepak Makwana
    5.0

    It's really a great course for learning Kafka
