
Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.

59.9K views • May 25, 2023 • by Umar Jamil

TL;DR

Learn how to build a Transformer model from scratch in PyTorch for language translation tasks, including creating the model, training it, and visualizing attention scores.

Transcript

hello guys, welcome to another episode about the Transformer. In this episode we will be building the Transformer from scratch using PyTorch, so coding it from zero. We will be building the model and we will also build the code for training it, for inferencing and for visualizing the attention scores. Stick with me because it's going to be a long video, but...

Key Insights

  • 📝 The Transformer model is being built from scratch using PyTorch, starting with the input embeddings and then moving on to positional encoding.
  • 🔍 The input embeddings map each word (token) of the sentence to a 512-dimensional vector, using a learned lookup table from token IDs to vectors.
  • 🔤 The positional encoding conveys information about the position of each word in the sentence.
  • 🔢 Multi-head attention is a crucial part of the model: it computes how strongly each word in the sentence relates to every other word.
  • 🔄 Residual (skip) connections add each sub-layer's input back to its output, helping information and gradients flow through the stack of layers.
  • 🌐 The model is designed to handle translation tasks, such as translating English to Italian.
  • 💻 The training code is written in PyTorch and covers building the dataset and the tokenizer with the Hugging Face libraries.
  • 🎯 The completed Transformer model is capable of translating sentences from one language to another with the help of tokenization and attention mechanisms.
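The residual-connection bullet above can be sketched in PyTorch as follows. This is a minimal illustration, not the video's exact code. The class name ResidualConnection, the dropout rate of 0.1 and the pre-norm ordering (normalize, apply the sub-layer, then add the input back) are assumptions. The original Transformer paper instead applies layer normalization after the addition.

```python
import torch
import torch.nn as nn


class ResidualConnection(nn.Module):
    """Wrap a sub-layer as x + Dropout(sublayer(LayerNorm(x))) (pre-norm variant, assumed)."""

    def __init__(self, d_model: int, dropout: float = 0.1) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, sublayer) -> torch.Tensor:
        # The input "skips" around the sub-layer and is added back to its output
        return x + self.dropout(sublayer(self.norm(x)))


# Example sub-layer: a position-wise feed-forward block (sizes are illustrative)
feed_forward = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
residual = ResidualConnection(d_model=512, dropout=0.1)
x = torch.randn(2, 10, 512)  # (batch, seq_len, d_model)
y = residual(x, feed_forward)
```

Because the input is added back unchanged, a sub-layer that outputs zeros leaves x untouched. This is what makes deep stacks of such blocks easier to train.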


Questions & Answers

Q: What is the purpose of the input embeddings and how are they created?

The input embeddings map each word of the sentence to a 512-dimensional vector. Each word in the vocabulary is first assigned a unique number (its token ID), and PyTorch's embedding layer (nn.Embedding) then converts each ID into a learned embedding vector.
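A minimal sketch of such an embedding module, assuming a toy vocabulary of 1,000 tokens and hypothetical token IDs. The multiplication by sqrt(d_model) follows the original Transformer paper. The class name is illustrative.

```python
import math

import torch
import torch.nn as nn


class InputEmbeddings(nn.Module):
    """Map integer token IDs to d_model-dimensional vectors."""

    def __init__(self, d_model: int, vocab_size: int) -> None:
        super().__init__()
        self.d_model = d_model
        # Learned lookup table: one row of size d_model per vocabulary entry
        self.embedding = nn.Embedding(vocab_size, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len) token IDs -> (batch, seq_len, d_model)
        # Scaling by sqrt(d_model) as in the original paper
        return self.embedding(x) * math.sqrt(self.d_model)


emb = InputEmbeddings(d_model=512, vocab_size=1000)
token_ids = torch.tensor([[5, 42, 7]])  # hypothetical IDs produced by a tokenizer
out = emb(token_ids)
print(out.shape)  # torch.Size([1, 3, 512])
```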

Q: How are the positional encodings generated in the Transformer model?

The positional encodings convey the position of each word in the sentence. They are computed with fixed sine and cosine functions of the word's position, using a different frequency for each embedding dimension, and the resulting vectors are added to the input embeddings.
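The sinusoidal scheme from the original paper sets PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). Below is a minimal sketch of that formula, assuming an even d_model and using random tensors as stand-ins for real token embeddings.

```python
import math

import torch


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Return a (seq_len, d_model) matrix of fixed sine/cosine position codes."""
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)  # (seq_len, 1)
    # 1 / 10000^(2i / d_model) for each even dimension index 2i, computed in log space
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float) * (-math.log(10000.0) / d_model)
    )
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe


pe = sinusoidal_positional_encoding(seq_len=10, d_model=512)
embeddings = torch.randn(1, 10, 512)  # stand-in for the scaled token embeddings
x = embeddings + pe.unsqueeze(0)      # broadcast the codes over the batch dimension
```

These codes are not learned, so the same matrix can be precomputed once for the maximum sequence length and sliced as needed.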

Q: What is the role of the multi-head attention in the Transformer model?

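Multi-head attention allows the model to capture relationships between different words within the same sentence. The input is projected into three matrices (query, key, and value). The queries are multiplied with the keys, scaled by the square root of the key dimension, and passed through a softmax to obtain the attention scores. These scores then weight the values to produce the output. Several heads do this in parallel on different slices of the embedding, and their results are concatenated.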
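The computation described above can be sketched as a small PyTorch module. This is an illustrative implementation, not necessarily line-for-line what the video writes. The class and attribute names (w_q, w_k, w_v, w_o), the -1e9 masking value and the toy shapes are assumptions.

```python
import math

import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    """Scaled dot-product attention split across h heads."""

    def __init__(self, d_model: int, h: int) -> None:
        super().__init__()
        assert d_model % h == 0, "d_model must be divisible by h"
        self.h, self.d_k = h, d_model // h
        self.w_q = nn.Linear(d_model, d_model)  # query projection
        self.w_k = nn.Linear(d_model, d_model)  # key projection
        self.w_v = nn.Linear(d_model, d_model)  # value projection
        self.w_o = nn.Linear(d_model, d_model)  # output projection after concatenation

    def _split_heads(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, d_model) -> (batch, h, seq_len, d_k)
        batch, seq_len, _ = x.shape
        return x.view(batch, seq_len, self.h, self.d_k).transpose(1, 2)

    def forward(self, q, k, v, mask=None):
        query = self._split_heads(self.w_q(q))
        key = self._split_heads(self.w_k(k))
        value = self._split_heads(self.w_v(v))
        # Only queries and keys are multiplied; scale by sqrt(d_k)
        scores = query @ key.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            # Positions where mask == 0 get a very low score, so ~0 weight after softmax
            scores = scores.masked_fill(mask == 0, -1e9)
        attn = scores.softmax(dim=-1)  # attention scores: each row sums to 1
        # The scores weight the values; heads are then concatenated back to d_model
        out = (attn @ value).transpose(1, 2).contiguous()
        batch, seq_len = out.shape[0], out.shape[1]
        return self.w_o(out.view(batch, seq_len, self.h * self.d_k)), attn


mha = MultiHeadAttention(d_model=512, h=8)
x = torch.randn(2, 10, 512)
out, attn = mha(x, x, x)  # self-attention: queries, keys and values all come from x
```

Returning attn alongside the output is what makes it possible to visualize the attention scores later.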

Q: How is the training process for the Transformer model explained in the content?

The content covers data preprocessing, including tokenization, building the input tensors, and splitting the data into training and validation sets. It then trains the model with a standard supervised loop, using a cross-entropy loss on the target tokens. It checks translations on the validation set and visualizes the attention scores to inspect which words the model attends to.
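A compact sketch of the main training ingredients follows. It uses toy data and a single nn.Linear standing in for the full Transformer. The 90/10 split, the padding ID, the vocabulary size, label smoothing of 0.1 and the Adam learning rate are illustrative assumptions, not values confirmed by this summary.

```python
import torch
import torch.nn as nn
from torch.utils.data import random_split

PAD_ID, VOCAB_SIZE, D_MODEL = 0, 1000, 512  # hypothetical tokenizer/model values

# 1) Split sentence pairs into ~90% training / 10% validation (toy data here)
pairs = [(f"source {i}", f"target {i}") for i in range(100)]
train_size = int(0.9 * len(pairs))
train_ds, val_ds = random_split(
    pairs, [train_size, len(pairs) - train_size], generator=torch.Generator().manual_seed(0)
)


# 2) Causal mask for the decoder: position i may only attend to positions <= i
def causal_mask(size: int) -> torch.Tensor:
    return torch.triu(torch.ones(1, size, size), diagonal=1) == 0


# 3) Cross-entropy over the target vocabulary, ignoring padding positions
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_ID, label_smoothing=0.1)
model = nn.Linear(D_MODEL, VOCAB_SIZE)  # stand-in for Transformer + projection (raw logits)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

decoder_out = torch.randn(2, 6, D_MODEL)  # stand-in decoder output (batch, seq_len, d_model)
labels = torch.tensor([[5, 9, 3, 2, PAD_ID, PAD_ID], [7, 4, 8, 6, 3, 2]])

# One optimization step
logits = model(decoder_out)  # (batch, seq_len, vocab_size)
loss = loss_fn(logits.view(-1, VOCAB_SIZE), labels.view(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Setting ignore_index to the padding ID means padded positions contribute nothing to the loss, so batches of sentences with different lengths can share one tensor.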

Q: What is the purpose of the projection layer in the Transformer model?

The projection layer converts the output of the decoder, a (sequence length × d_model) matrix, into a (sequence length × vocabulary size) matrix. A softmax over the last dimension turns each row into a probability distribution over the vocabulary, from which the most likely word for each position of the output sentence can be picked.
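A minimal sketch of the projection layer, assuming a toy vocabulary of 1,000 tokens and a random tensor in place of a real decoder output. Here it returns log-probabilities via log_softmax. If training with nn.CrossEntropyLoss, which applies log-softmax internally, one would usually feed it the raw output of self.proj instead.

```python
import torch
import torch.nn as nn


class ProjectionLayer(nn.Module):
    """Project decoder outputs from d_model to vocabulary-sized scores."""

    def __init__(self, d_model: int, vocab_size: int) -> None:
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, d_model) -> (batch, seq_len, vocab_size) log-probabilities
        return torch.log_softmax(self.proj(x), dim=-1)


proj = ProjectionLayer(d_model=512, vocab_size=1000)
decoder_output = torch.randn(1, 4, 512)  # stand-in for the decoder's output
log_probs = proj(decoder_output)
next_token = log_probs[:, -1].argmax(dim=-1)  # greedy pick at the last position
```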

Q: How are the encoder and decoder blocks combined together to create the Transformer model?

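The encoder and decoder are connected through cross-attention. The encoder's output is passed to every decoder block, where it supplies the keys and values, while the queries come from the decoder's own input (the target sentence produced so far). This allows the decoder to use information learned by the encoder to generate the output sentence.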
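The wiring can be illustrated with PyTorch's built-in nn.MultiheadAttention, used here for brevity instead of a from-scratch attention class. The sentence lengths (7 source tokens, 5 target tokens so far) and the random tensors are illustrative assumptions.

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

encoder_output = torch.randn(1, 7, d_model)  # source sentence: 7 tokens, encoded once
decoder_hidden = torch.randn(1, 5, d_model)  # target prefix: 5 tokens so far

# Queries come from the decoder; keys and values come from the encoder output
out, weights = cross_attn(query=decoder_hidden, key=encoder_output, value=encoder_output)
# out: one updated vector per target token; weights: target-to-source attention map
```

The weights matrix has one row per target token and one column per source token. This is the kind of cross-attention map that can be plotted to see which source words each translated word attends to.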

Q: What is the role of the positional encoding in the Transformer model?

The positional encoding tells the model where each word sits in the sentence. Attention on its own treats its input as an unordered set, so without this signal the model could not distinguish different word orders. With it, the model can use the sequential ordering of words and learn position-dependent relationships between them.
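The claim that attention alone ignores word order can be checked numerically. Shuffling the input vectors of a self-attention layer only shuffles its outputs in the same way. The small sizes, the random seed and the simple sine signal used as a stand-in for a real positional encoding are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
x = torch.randn(1, 4, 16)          # 4 "word" vectors with no positional information
perm = torch.tensor([2, 0, 3, 1])  # shuffle the word order

out, _ = attn(x, x, x)
out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])
# Shuffling the input only shuffles the output: attention alone is blind to order
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))  # True

# Adding a position-dependent signal (simple stand-in for positional encoding)
# means the same words in a different order now produce different representations
pos = torch.sin(torch.arange(4.0).unsqueeze(1) * torch.arange(1.0, 17.0))  # (4, 16)
y, y_perm = x + pos, x[:, perm] + pos
out_pe, _ = attn(y, y, y)
out_pe_perm, _ = attn(y_perm, y_perm, y_perm)
print(torch.allclose(out_pe[:, perm], out_pe_perm, atol=1e-5))
```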

Summary & Key Takeaways

  • The content explains how to build a Transformer model from scratch using PyTorch for language translation.

  • It covers the process of creating the input embeddings, positional encoding, encoder and decoder blocks, and the projection layer.

  • The content also includes details on training the model, including data preprocessing and visualization of attention scores.

© 2026 Glasp Inc. All rights reserved.