Course Information
- Certificate: Available*
- *The delivery and distribution of the certificate are subject to the policies and arrangements of the course provider.
Course Overview
Your Hands-On Guide to Databricks Data Engineering with PySpark and Spark SQL, including a 4-Part Course Project
[This course has been completely refreshed with 17 hours of brand-new content (Sept 2025)]
I’m Malvik Vaghadia, a Data Engineer and Architect with nearly 15 years of professional experience. I’ve worked on multiple large-scale lakehouse implementations and consulted for enterprise clients. As an instructor, I’ve taught 200,000+ students worldwide and hold a 4.6+ instructor rating. Since launching this course, it has become one of Udemy’s best-sellers in the Databricks category, and this new version (Sept 2025) has been completely rebuilt with 17 hours of brand-new content.
Why Learn Databricks
Databricks is recognised as a Leader in the Gartner Magic Quadrant for Data & AI platforms. It has become the go-to lakehouse platform for modern data engineering, enabling organisations to build, orchestrate, and optimise pipelines at scale. By mastering Databricks, you’ll be learning one of the most in-demand skills in today’s data landscape.
Course Delivery Style
This course is designed with the right balance of theory, hands-on coding, and practical projects. Every concept is explained clearly, then demonstrated live in Databricks, and reinforced with a multi-phase, end-to-end project that you’ll build step by step. You’ll also get all course notebooks as downloadable materials, containing the full code, step-by-step documentation, and extra resources so you can follow along easily.
Curriculum Highlights:
Four-Part Course Project: An end-to-end NYC Taxi project, with further pipeline builds across multiple parts as your knowledge develops.
Foundations: What data engineering is, why Databricks, the Spark architecture, PySpark, and the Lakehouse.
Azure setup: Account creation, resources, role-based access control, naming conventions, and cost management.
Databricks setup: Creating and configuring a workspace, navigating the UI, and handling personal email restrictions.
Databricks notebooks and workspace: Markdown, comments, organising objects, mixing languages, and notebook tips.
Databricks compute: Clusters, DBU pricing, runtimes, serverless vs all-purpose compute, instance pools, and SQL warehouses.
Spark SQL (Python): Writing Spark SQL code using both SQL syntax and DataFrame APIs, reading/writing different file formats, defining schemas, and managing tables and views.
PySpark Transformations: Column operations, functions, filtering, sorting, joining, aggregations, pivots, and conditional logic; see the first sketch after this list.
Medallion architecture: Bronze, Silver, and Gold layers explained and implemented.
Delta Lake: Transaction log, schema enforcement and evolution, time travel, and DML operations (MERGE, UPDATE, DELETE); see the upsert sketch after this list.
Workflows and jobs: Passing parameters, handling failures, concurrency, conditional tasks, and monitoring.
Git & local development: VS Code setup, linking with GitHub, repos, and workflow best practices.
Functions and modularization: Creating and importing Python modules, UDFs, and project structuring.
Unity Catalog & governance: Metastores, securable objects, workspace roles, external locations, and permissions.
Streaming & Lakeflow pipelines: Structured Streaming concepts, Auto Loader, watermarking, triggers, and the new Lakeflow (DLT) pipeline model; see the streaming sketch after this list.
Performance: Lazy evaluation, explain plans, caching, shuffles, broadcast joins, partitioning, Z-ORDER, and Liquid Clustering.
Automation & CI/CD: Programmatic interaction with Databricks, CLI demo, and high-level CI/CD overview.
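To give a flavour of the coding style practised throughout the course, here is a minimal PySpark sketch combining a filter, a join, and an aggregation with the DataFrame API. The file paths and column names are illustrative placeholders, not files shipped with the course.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative inputs; the real course datasets and paths will differ
trips = spark.read.parquet("/mnt/raw/trips")
zones = spark.read.option("header", True).csv("/mnt/raw/zones.csv")

# Filter, join, and aggregate using the DataFrame API
result = (
    trips
    .filter(F.col("fare_amount") > 0)
    .join(zones, trips.pickup_zone_id == zones.zone_id, "left")
    .groupBy("zone_name")
    .agg(
        F.count("*").alias("trip_count"),
        F.round(F.avg("fare_amount"), 2).alias("avg_fare"),
    )
    .orderBy(F.desc("trip_count"))
)

result.show(10)
```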
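The Delta Lake sections cover DML and time travel in depth. Below is a minimal upsert sketch, assuming a Delta table named silver.customers already exists; the table and column names are hypothetical.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical incoming batch of changed customer records
updates_df = spark.createDataFrame(
    [(1, "Alice", "alice@example.com")],
    ["customer_id", "name", "email"],
)

# Upsert the batch into an existing Delta table with MERGE
target = DeltaTable.forName(spark, "silver.customers")
(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Time travel: query an earlier version of the table
previous = spark.sql("SELECT * FROM silver.customers VERSION AS OF 0")
```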
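For streaming, the course works with Structured Streaming and Auto Loader. Here is a minimal sketch of a cloudFiles stream landing JSON files in a bronze Delta table; all paths and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incrementally ingest newly arrived JSON files with Auto Loader (cloudFiles)
bronze_stream = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/trips_schema")
    .load("/mnt/landing/trips/")
)

# Write to a bronze Delta table, processing all available files and then stopping
query = (
    bronze_stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/trips_bronze")
    .trigger(availableNow=True)
    .toTable("bronze.trips")
)
```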
By the end of the course, you’ll have both the knowledge and confidence to design, build, and optimise production-grade data pipelines on Databricks.
Course Content
- 52 sections
- 357 lectures
- Section 1 Course Introduction
- Section 2 Azure Set Up
- Section 3 Databricks Set Up
- Section 4 Databricks Notebooks and Workspace Objects
- Section 5 Databricks Compute
- Section 6 ⚠️ Course Materials - Important!
- Section 7 Getting Started with Spark SQL (Python)
- Section 8 Reading and Writing Data with Spark SQL (Python)
- Section 9 Managed Tables & Views, and SQL
- Section 10 Creating DataFrames from Python Objects
- Section 11 DataFrame Transformations - Column Operations
- Section 12 DataFrame Transformations - PySpark Functions
- Section 13 DataFrame Transformations - Combining Datasets
- Section 14 DataFrame Transformations - Filtering and Sorting
- Section 15 DataFrame Transformations - Grouping
- Section 16 DataFrame Transformations - Conditional Expressions
- Section 17 Medallion Architecture
- Section 18 NYC Taxi Course Project - Part 1
- Section 19 Deep Dive into Delta Lake
- Section 20 Slowly Changing Dimensions Type 2
- Section 21 Lakeflow Jobs (formerly Databricks Jobs)
- Section 22 Performance Considerations
- Section 23 NYC Taxi Course Project - Part 2
- Section 24 Local Development and Version Control with GitHub
- Section 25 Functions
- Section 26 NYC Taxi Course Project - Part 3
- Section 27 Unity Catalog, Workspace Administration and External Locations
- Section 28 NYC Taxi Course Project - Part 4
- Section 29 Introduction to Spark Structured Streaming
- Section 30 Lakeflow Declarative Pipelines (formerly Delta Live Tables)
- Section 31 Databricks Automation Tools and Introduction to CI/CD (Theory)
- Section 32 Congratulations on Completing the Course
- Section 33 [LEGACY] Course Overview / Introduction to Spark and Databricks
- Section 34 [LEGACY] Azure and Databricks Set Up
- Section 35 [LEGACY] Reading and Writing Data
- Section 36 [LEGACY] Data Analysis and Transformation with SparkSQL
- Section 37 [LEGACY] Utilising the Medallion Architecture in Databricks
- Section 38 [LEGACY] Challenge Section: Customer Orders
- Section 39 [LEGACY] Visualizations and Dashboards
- Section 40 [LEGACY] Accessing Data from Azure Data Lake Storage (ADLS) with Databricks
- Section 41 [LEGACY] Hive Metastore, Databases, Tables and Views
- Section 42 [LEGACY] Challenge Section: Employees
- Section 43 [LEGACY] Databricks Data Lakehouse / Delta Lake
- Section 44 [LEGACY] Modularize Code and Link Notebooks
- Section 45 [LEGACY] Challenge Section: Health Updates
- Section 46 [LEGACY] Spark Structured Streaming and Auto Loader
- Section 47 [LEGACY] Delta Live Tables
- Section 48 [LEGACY] Databricks Jobs
- Section 49 [LEGACY] Access Control Lists (ACLs)
- Section 50 [LEGACY] Databricks CLI (Command Line Interface)
- Section 51 [LEGACY] Source Control with Databricks Repos and Azure DevOps
- Section 52 [LEGACY] CI/CD with Databricks
What You’ll Learn
- How to use Databricks to build and run data engineering workflows
- The principles of the Lakehouse architecture with Delta Lake
- How to process data with Spark SQL and PySpark
- Best practices for Databricks compute, jobs, and orchestration
- How to apply governance with Unity Catalog and manage secure access
- Working with streaming pipelines using Structured Streaming and Lakeflow
- Applying concepts to real-world projects with modular code and version control
- Real-world scenarios
Reviews
-
Chirag V
I really learnt a lot from the course; it could be a really good course for someone looking to go from a basic understanding to an intermediate level of learning. More advanced skills can be gained through professional experience. This is a great course to start with for all the aspiring data engineers out there. Thank you, Malvik, for the fresh course created with the latest UI on the Azure Databricks platform. Once again, a great course to learn from!
-
Stefan Moser
Very nice course, I love the new content from Sep 25.
-
Justin Proctor
I did not like that I had to watch the legacy content as well to be able to complete the course.
-
Aneta
Everything is explained clearly and legibly. The instructor speaks clearly and at an appropriate speed, which also helps. I really recommend it so far.