Udemy

Math Behind LLMs, Transformers and Modern Computer Vision

Enroll Now
  • 4,590 Students
  • Updated 2/2026
4.5
(1,084 Ratings)

Course Information

Registration period
Year-round Enrollment
Course Level
Study Mode
Duration
6 Hour(s) 56 Minute(s)
Language
English
Taught by
Patrik Szepesi
Rating
4.5
(1,084 Ratings)

Course Overview

Math Behind LLMs, Transformers and Modern Computer Vision

From Multi Head Attention and Embeddings to Transformers, Vision Transformers, Modern Image Segmentation + LLMs and SAM

Welcome to Math Behind LLMs, Transformers and Modern Computer Vision, a rigorous deep dive into the mathematical foundations powering today’s most advanced AI systems.

This course is designed for learners who want more than intuition. We derive and analyze the core equations behind Large Language Models, Vision Transformers, and modern image segmentation systems.

You will begin with tokenization and embedding mathematics, understanding how raw text becomes high-dimensional vector representations through algorithms like WordPiece. From there, we mathematically unpack the heart of transformer architectures: query, key, and value matrices, attention score computation, scaling behavior, and multi-head attention.
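To make the attention equations concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention — the computation the course derives. The shapes (3 tokens, model dimension 4) and the random projection matrices are illustrative assumptions, not values from the course.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq_len, seq_len) attention scores
    weights = softmax(scores)          # each row is a distribution over tokens
    return weights @ V, weights

# Toy input: 3 token embeddings of dimension 4, projected to Q, K, V.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
out, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)             # (3, 4)
print(weights.sum(axis=-1))  # every row of attention weights sums to 1
```

Multi-head attention repeats this computation with separate, smaller projection matrices per head and concatenates the results; the 1/√d_k scaling keeps the dot products from growing with dimension and saturating the softmax.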

We examine attention masks, contextual encoding, and positional encodings — including the sine and cosine formulations that preserve sequence structure. You’ll build strong geometric intuition around vectors, dot products, cosine similarity, and dense embeddings.
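The sine/cosine positional encoding and cosine similarity mentioned above can be sketched in a few lines of NumPy. The sequence length and model dimension below are arbitrary choices for illustration:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| |b|): geometric alignment of two vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

pe = positional_encoding(50, 16)
# sin^2 + cos^2 = 1 per frequency pair, so every encoding has the same norm,
# and nearby positions are more aligned than distant ones:
print(cosine_similarity(pe[0], pe[1]) > cosine_similarity(pe[0], pe[40]))
```

This is exactly the geometric intuition the course builds: position vectors of equal length whose pairwise alignment falls off with distance, preserving sequence structure without learned parameters.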

The course then expands beyond language.

You’ll compare Convolutional Neural Networks with Vision Transformers, analyze the quadratic cost of attention, and walk through the complete Vision Transformer pipeline from patch embeddings to final predictions.
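The first step of that pipeline — turning an image into a sequence of patch tokens — can be sketched as below. The 224×224 image, 16×16 patches, and 64-dimensional projection are the usual ViT-style defaults, assumed here for illustration:

```python
import numpy as np

def image_to_patch_embeddings(image, patch_size, W_proj):
    # Split an (H, W, C) image into non-overlapping patches, flatten each
    # patch, and linearly project it to the model dimension.
    H, W, C = image.shape
    p = patch_size
    patches = (image.reshape(H // p, p, W // p, p, C)
                    .transpose(0, 2, 1, 3, 4)        # group by patch grid
                    .reshape(-1, p * p * C))         # (num_patches, patch_dim)
    return patches @ W_proj                          # (num_patches, d_model)

rng = np.random.default_rng(0)
img = rng.normal(size=(224, 224, 3))
W = rng.normal(size=(16 * 16 * 3, 64))
tokens = image_to_patch_embeddings(img, 16, W)
print(tokens.shape)  # (196, 64): a 14 x 14 grid of patch tokens
```

Those 196 tokens then flow through the same attention blocks as language tokens — which is where the quadratic cost bites: the attention score matrix has num_patches² = 196² = 38,416 entries, and it quadruples every time the patch count doubles.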

In an advanced section, we dissect the mathematics behind Meta’s Segment Anything Model (SAM). You will explore prompt encoders, self-attention, cross-attention between prompts and images, attention score computation in segmentation models, and how these systems are trained at scale.
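The prompt-to-image cross-attention pattern described above can be sketched as follows. This is a toy NumPy illustration of the mechanism, not SAM's actual implementation — the token counts, dimension, and random weights are all assumptions:

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(prompts, image_tokens, W_q, W_k, W_v):
    # Cross-attention: queries come from the prompt tokens, while keys and
    # values come from the image embedding, letting each prompt gather
    # information from the whole image.
    Q = prompts @ W_q
    K = image_tokens @ W_k
    V = image_tokens @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (num_prompts, num_image_tokens)
    return softmax(scores) @ V               # (num_prompts, d_model)

rng = np.random.default_rng(0)
d = 32
prompts = rng.normal(size=(2, d))        # e.g. two encoded point prompts
image_tokens = rng.normal(size=(64, d))  # e.g. an 8 x 8 grid of image embeddings
out = cross_attention(prompts, image_tokens,
                      *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (2, 32): one updated vector per prompt
```

The only difference from self-attention is where Q, K, and V come from: in self-attention all three are projections of the same sequence, whereas here the queries and the keys/values come from different modalities.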

By the end of this course, you won’t just understand how transformers work — you will understand why they work at the equation level across language and vision.

If you aim to build deep technical mastery and develop the mathematical intuition required for cutting-edge AI research and engineering, this course will elevate your expertise.

Course Content

  • 6 section(s)
  • 47 lecture(s)
  • Section 1 Course Overview
  • Section 2 Tokenization and Multidimensional Word Embeddings
  • Section 3 Positional Encodings
  • Section 4 Attention Mechanism and Transformer Architecture
  • Section 5 Mathematics Behind Vision Transformers
  • Section 6 Mathematics Behind Meta's SAM (Segment Anything Model)

What You’ll Learn

  • Mathematics behind Large Language Models
  • Modern image segmentation
  • Positional encodings
  • Compare CNNs and Vision Transformers mathematically
  • Compute prompt self-attention and image–prompt cross-attention
  • Multi-head attention
  • Query, key, and value matrices
  • Attention masks
  • Masked language modeling
  • Dot products and vector alignment
  • The role of sine and cosine functions in positional encodings
  • How models like ChatGPT work under the hood
  • Bidirectional models
  • Context-aware word representations
  • Vision Transformers
  • Word embeddings
  • How dot products work
  • Modern computer vision
  • Understand quadratic complexity in Vision Transformers
  • Matrix multiplication
  • Programmatically create tokens
  • Derive self-attention, multi-head attention, and cross-attention from scratch
  • Analyze the full Vision Transformer pipeline
  • Break down the mathematics of Meta's Segment Anything Model (SAM)
  • Understand prompt encoders in modern segmentation models


Reviews

  • A
    Abrar Noor
    5.0

    The teaching method and explanations were excellent! Hope to learn more things from Patrik Szepesi!!

  • T
    Thomas Schmidt
    5.0

    The course explained the technical and mathematical foundations of how transformers work very well. I now understand the connections and the logic of the system. Many thanks for this excellent course.

  • R
    Robert Oved
    5.0

    Well explained concepts. The subject in most cases would be hard to grasp but this course breaks it down layer by layer making it a great introduction to the topic of large language models.

  • A
    Albin Morisseau
    5.0

    Very good level of detail in the explanations. I really liked the visualizations, which genuinely add something extra.

