
DeepMind's AI Learns Object Sounds | Two Minute Papers #224

20.3K views • January 30, 2018 • by Two Minute Papers

TL;DR

This video discusses the development of an AI that can match video and audio and identify the source of sounds in a video, using unsupervised training and cross-modal retrieval.

Transcript

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This work is about creating an AI that can perform audio-visual correspondence. This means two really cool tasks: One, when given a piece of video and audio, it can guess whether they match each other. And two, it can localize the source of the sounds heard in the video. Hm-...

Key Insights

  • The AI can determine whether video and audio match each other and locate the source of sounds in a video.
  • The entire network is trained from scratch, without manual labels or instructions.
  • Cross-modal retrieval lets the AI find related images or sounds for a given input.
  • The AI's architecture and results are compared to a previous work called Look, Listen & Learn.
  • Deep learning algorithms can process and understand video and audio signals using the same network architecture.
  • The AI produces a distance metric between video and audio, which is useful for tasks such as matching and retrieval.
  • The results obtained by the AI offer both verification and potential debate.


Questions & Answers

Q: How does the AI determine if video and audio match each other?

The AI outputs a distance between the video and audio signals. A small distance signifies a match, while a large distance indicates a mismatch.
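The decision rule in this answer can be illustrated with a minimal sketch. It assumes the video and audio have already been turned into fixed-length embedding vectors; the Euclidean distance, the function names, the threshold of 0.5, and the toy numbers are all hypothetical choices for illustration and are not taken from the video.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_match(video_emb, audio_emb, threshold=0.5):
    """Small distance means match, large distance means mismatch.

    The threshold is an assumed value; a real system would tune it.
    """
    return euclidean_distance(video_emb, audio_emb) < threshold


# Toy embeddings with made-up numbers (assumption, for illustration only).
guitar_frame = [0.9, 0.1, 0.0]
guitar_sound = [0.8, 0.2, 0.1]
dog_bark = [0.0, 0.1, 0.9]

print(is_match(guitar_frame, guitar_sound))  # close vectors
print(is_match(guitar_frame, dog_bark))      # distant vectors
```

In this toy setup the guitar frame and guitar sound lie close together, while the dog bark lies far from the guitar frame.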

Q: What is cross-modal retrieval?

Cross-modal retrieval allows the AI to find images or sounds that are similar to a given input sound, or vice versa. It helps in connecting different modalities and finding related content.
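One way to picture cross-modal retrieval is as a nearest-neighbour search in a shared embedding space. The sketch below assumes sounds and images are already embedded as vectors of the same length; the gallery, file names, and numbers are hypothetical and only illustrate the idea.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def retrieve(query_emb, gallery, k=1):
    """Return the names of the k gallery items closest to the query.

    `gallery` maps an item name to its embedding vector.
    """
    ranked = sorted(gallery, key=lambda name: distance(query_emb, gallery[name]))
    return ranked[:k]


# Hypothetical image embeddings (made-up numbers).
image_gallery = {
    "guitar.jpg": [0.9, 0.1, 0.0],
    "drums.jpg": [0.1, 0.9, 0.0],
    "dog.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a strumming sound used as the query.
strumming_sound = [0.8, 0.2, 0.1]

print(retrieve(strumming_sound, image_gallery, k=2))
```

The same function works in the other direction (image query, sound gallery), since only the distances between embeddings matter.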

Q: How is the AI trained?

The AI is trained in an unsupervised manner, where it learns from a dataset without any additional labels or instructions. It uses the information available in the video and audio signals to find associations.
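A common way to get a training signal without human labels is to let the video itself supply it: a frame paired with its own audio counts as a match, and a frame paired with audio from a different clip counts as a mismatch. The sketch below shows only this pair-building step, under the assumption that the training data is built this way; the function name and the clip representation are hypothetical.

```python
import random


def make_training_pairs(clips, rng):
    """Build (frame, audio, label) triples with no manual labels.

    `clips` is a list of (frame, audio) tuples taken from the same moment
    of a video. Label 1 means the audio belongs to the frame, label 0
    means the audio was taken from a different clip.
    """
    if len(clips) < 2:
        raise ValueError("need at least two clips to form mismatched pairs")
    pairs = []
    for i, (frame, audio) in enumerate(clips):
        # Positive pair: frame with its own audio.
        pairs.append((frame, audio, 1))
        # Negative pair: same frame with audio from a randomly chosen other clip.
        j = rng.choice([k for k in range(len(clips)) if k != i])
        pairs.append((frame, clips[j][1], 0))
    return pairs


# Placeholder clips; real data would hold image and audio arrays.
clips = [("frame_0", "audio_0"), ("frame_1", "audio_1"), ("frame_2", "audio_2")]

for frame, audio, label in make_training_pairs(clips, random.Random(0)):
    print(frame, audio, label)
```

A network trained to predict these labels has to learn which sights and sounds go together, which is the association described in the answer above.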

Q: What are the applications of this AI technology?

This AI technology can have applications in various fields, such as video and audio editing, content creation, automatic video captioning, and multimedia search engines.

Summary & Key Takeaways

  • The video presents a new AI that can determine whether video and audio match each other and localize the source of sounds in the video.

  • The AI is trained from scratch and can perform cross-modal retrieval, finding pictures or sounds similar to a given input sound.

  • The training is unsupervised, meaning the AI learns without additional labels or instructions.

