Udemy

AI Vision Systems for Self-Driving Cars in Production on AWS

  • 70 Students
  • Updated 3/2026
5.0 (3 Ratings)

Course Information

Registration period: Year-round enrollment
Course Level:
Study Mode:
Duration: 6 hours 18 minutes
Language: English
Taught by: Patrik Szepesi
Rating: 5.0 (3 Ratings)

Course Overview

Computer Vision on AWS: SageMaker, Rekognition, ViTs, and Meta's Segment Anything Model for Detection + Segmentation + Math

Building a successful computer vision product—especially for self-driving car perception—starts with two things: strong foundations and real, scalable systems.

In this course, you’ll learn how to build your own autonomous driving–style vision pipeline using Meta’s Segment Anything Model (SAM), Vision Transformers (ViTs), and AWS Rekognition—while actually understanding the math and intuition behind how these models work.

We begin by exploring Vision Transformers from the ground up, focusing on clear, intuitive explanations of patch embeddings, attention mechanisms, and model representations. You’ll see the underlying mathematics of attention, embeddings, and similarity—and how these ideas translate into the perception capabilities modern self-driving stacks rely on. From there, we dive into Meta’s SAM architecture, explaining how prompts, embeddings, and mask decoding work together to produce high-quality segmentation results—again connecting the math to the behavior you observe, without treating the model as a black box.
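As a concrete taste of the attention math described above, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a Vision Transformer. The tiny dimensions and random inputs are illustrative assumptions, not material from the course itself:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise patch similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted mix of value vectors

# Toy example: 4 patch embeddings of dimension 8 attending to one another
rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one contextualized vector per patch
```

Each row of the softmax output sums to 1, so every patch's new representation is a convex combination of all patches' value vectors; this is the "similarity" intuition the course builds the ViT material on.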

You’ll then see how these open-source models fit into real-world self-driving perception workflows. We integrate AWS Rekognition for high-level detection and metadata extraction, and combine it with SAM to create automated, pixel-level labeling pipelines—the kind used to scale dataset creation for autonomous driving. Throughout, you’ll learn how model outputs (scores, embeddings, masks) relate to the underlying objectives and representations that make the pipeline reliable.
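To make that hand-off concrete, here is a hedged sketch of one way to wire the two together: Rekognition's detect_labels call returns bounding boxes normalized to [0, 1], which are scaled to pixel coordinates and passed as box prompts to SAM's SamPredictor. The file name, checkpoint path, and confidence threshold are placeholder assumptions; the course's actual pipeline may differ:

```python
import boto3
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Placeholder input frame -- swap in your own image
image = cv2.cvtColor(cv2.imread("street_scene.jpg"), cv2.COLOR_BGR2RGB)
h, w = image.shape[:2]

# 1) High-level detection with Rekognition (boxes come back normalized)
rekognition = boto3.client("rekognition")
with open("street_scene.jpg", "rb") as f:
    response = rekognition.detect_labels(Image={"Bytes": f.read()}, MinConfidence=70)

# 2) Pixel-level segmentation with SAM, prompted by each detected box
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)
predictor.set_image(image)  # expects an RGB uint8 array

for label in response["Labels"]:
    for instance in label.get("Instances", []):
        bb = instance["BoundingBox"]  # normalized Left/Top/Width/Height
        box = np.array([bb["Left"] * w, bb["Top"] * h,
                        (bb["Left"] + bb["Width"]) * w,
                        (bb["Top"] + bb["Height"]) * h])
        masks, scores, _ = predictor.predict(box=box, multimask_output=False)
        print(label["Name"], instance["Confidence"], masks[0].shape, scores[0])
```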

A strong emphasis is placed on visualization and practical understanding. You’ll inspect masks, bounding boxes, confidence signals, embeddings, and failure cases, and learn how mathematical concepts translate directly into model behavior you can observe, debug, and improve—critical when building perception systems for safety-sensitive applications like self-driving cars.
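For illustration, a minimal matplotlib sketch of that kind of inspection, assuming you already have an image, a boolean SAM mask, its predicted IoU score, and the prompting box from the previous step (the variable names are hypothetical):

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def show_detection(image, mask, box, score):
    """Overlay one SAM mask and its prompt box on the source image."""
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.imshow(image)
    ax.imshow(mask, alpha=0.4)  # semi-transparent mask overlay
    x0, y0, x1, y1 = box
    ax.add_patch(patches.Rectangle((x0, y0), x1 - x0, y1 - y0,
                                   fill=False, edgecolor="lime", linewidth=2))
    ax.set_title(f"Predicted IoU score: {score:.3f}")
    ax.axis("off")
    plt.show()

# Hypothetical usage with outputs from the previous sketch:
# show_detection(image, masks[0], box, scores[0])
```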

By the end of the course, you won’t just know how to run SAM or call an AWS API. You’ll understand why the models work, how to combine managed cloud services with open-source research, and how to think like someone building a real computer vision startup focused on scalable autonomous vehicle perception—not just a demo.

This course is ideal if you want to go beyond surface-level tutorials and gain a clear, intuitive understanding of modern computer vision systems—from the math behind Transformers and segmentation to production-grade perception pipelines used in autonomous driving.

Course Content

  • 8 sections
  • 50 lectures
  • Section 1: What We Are Building
  • Section 2: Mathematics Behind Vision Transformers
  • Section 3: Mathematics Behind Meta's SAM (Segment Anything Model)
  • Section 4: Setting Up Our AWS Environment
  • Section 5: Setting Up Open Source Models Like Meta's SAM
  • Section 6: Visualizing Our Outputs
  • Section 7: Saving Results to S3
  • Section 8: Testing + Setup

What You’ll Learn

  • Build an end-to-end auto-labeling pipeline using Segment Anything (SAM) for large-scale image datasets
  • Understand how Vision Transformers (ViTs) work internally, including patch embeddings and self-attention
  • Explain the core mathematics behind SAM, including mask decoding and prompt conditioning
  • Run GPU-accelerated segmentation workloads efficiently using modern deep-learning stacks
  • Compare SAM ViT-B, ViT-L, and ViT-H models and choose the right one for cost, speed, and accuracy
  • Integrate AWS Rekognition for high-level object detection and metadata extraction
  • Combine AWS Rekognition outputs with SAM masks to create precise, pixel-level labels
  • Visualize segmentation masks, bounding boxes, and confidence scores for model debugging
  • Analyze trade-offs between open-source CV models and managed cloud services
  • Image segmentation
  • How to use open-source models in AWS SageMaker
  • Optimize performance and memory usage when running SAM on large images
  • Use AWS-based pipelines to scale computer-vision workloads reliably (one possible S3 upload step is sketched after this list)
  • Bridge the gap between theory (math + models) and practical production pipelines
  • AWS Rekognition
  • Object detection
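Since the curriculum also covers persisting results to S3 (Section 7) and scaling pipelines on AWS, here is a small sketch of one way to upload a segmentation mask as a PNG with boto3. The bucket and key names are placeholders, not the course's actual naming scheme:

```python
import io

import boto3
import numpy as np
from PIL import Image

def save_mask_to_s3(mask: np.ndarray, bucket: str, key: str) -> None:
    """Encode a boolean mask as a PNG and upload it to S3."""
    buffer = io.BytesIO()
    Image.fromarray(mask.astype(np.uint8) * 255).save(buffer, format="PNG")
    buffer.seek(0)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())

# Placeholder usage:
# save_mask_to_s3(masks[0], "my-labeling-bucket", "labels/street_scene_0.png")
```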


Reviews

  • Stephen Zhang (5.0)
    "this is so amazing so far. So much depth, yet clarity"

