This AI Sings | Two Minute Papers #230

43.8K views
•
February 22, 2018
by
Two Minute Papers

TL;DR

The video covers a paper that introduces an AI vocoder able to generate realistic singing from MIDI and lyrics inputs, with fast generation times and modest training data requirements.

Transcript

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This work is about building an AI vocoder that is able to synthesize believable singing from MIDI and lyrics as inputs. But first, what is a vocoder? It works kinda like this. Fellow Scholars who are fans of Jean-Michel Jarre's music are likely very familiar with this effect...

Key Insights

  • ⌛ The AI vocoder synthesizes singing from MIDI and lyrics inputs by separating the pitch and timbre components of the voice, with fast generation times and modest training data requirements.
  • 🛩️ The algorithm uses a modified WaveNet architecture with 2-by-1 dilated convolutions, which grow the receptive field exponentially while keeping the parameter count low. This helps it train on small datasets.
  • 💯 Mean opinion scores demonstrate that the AI vocoder outperforms previous methods in creating realistic singing.
  • 🎹 MIDI inputs can be created easily with a MIDI master keyboard or a digital audio workstation program, which makes the synthesis process more accessible.

Questions & Answers

Q: How does the AI vocoder generate singing from MIDI and lyrics inputs?

The AI vocoder separates the pitch and timbre components of the voice, using MIDI data to determine the pitch and lyrics text to generate the words. It then synthesizes the singing by combining these elements.
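
To make the pitch side of this concrete, here is a minimal sketch of how MIDI note numbers map to fundamental frequencies under standard equal temperament (A4 = MIDI note 69 = 440 Hz). The melody, the syllables, and the function name `midi_to_hz` are hypothetical and only for illustration; this is not the paper's pipeline, which is not described at this level of detail in the summary.

```python
def midi_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Convert a MIDI note number to a frequency in Hz.

    Uses the standard equal-temperament mapping in which note 69 is A4.
    """
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


# Hypothetical input: (MIDI note, duration in seconds, syllable to sing).
melody = [(60, 0.5, "la"), (62, 0.5, "la"), (64, 1.0, "la")]

# Pitch comes from the MIDI note, the words come from the lyrics.
pitch_track = [(midi_to_hz(note), duration, syllable)
               for note, duration, syllable in melody]

for hz, duration, syllable in pitch_track:
    print(f"{syllable}: {hz:.2f} Hz for {duration} s")
```

A synthesizer of this kind would still need a model to produce the timbre for each syllable; the sketch only covers the pitch lookup.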

Q: What are the advantages of using the AI vocoder over other methods?

The AI vocoder offers faster generation times, approximately 10-15 times real-time. Additionally, it requires a modest amount of training data, making it feasible to train on smaller datasets.

Q: How does the modified WaveNet architecture contribute to the AI vocoder?

The AI vocoder uses a modified WaveNet architecture with 2-by-1 dilated convolutions. This allows for an exponential growth in the receptive field of the model, while keeping the parameter count low.
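
The receptive field argument can be checked with a few lines of arithmetic. The sketch below assumes kernels that span two time steps and dilations that double from layer to layer (1, 2, 4, ...), as in the original WaveNet; the summary does not state the exact dilation schedule or channel width used in the paper, so both are assumptions, and the helper names are hypothetical.

```python
def receptive_field(dilations, kernel_size=2):
    """Number of input time steps seen by one output of a stack of dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def doubling_dilations(num_layers):
    """Assumed schedule: the dilation doubles at every layer (1, 2, 4, ...)."""
    return [2 ** i for i in range(num_layers)]


def weight_count(num_layers, channels, kernel_size=2):
    """Convolution weights only (no biases), with equal input and output channels."""
    return num_layers * kernel_size * channels * channels


for layers in (1, 2, 4, 8, 10):
    dilated = receptive_field(doubling_dilations(layers))
    plain = receptive_field([1] * layers)  # same depth without dilation
    weights = weight_count(layers, channels=16)
    print(f"{layers:2d} layers: dilated={dilated:5d}  plain={plain:3d}  weights={weights}")
```

With doubling dilations the receptive field is 2^L for L layers, while the weight count grows only linearly in L. Without dilation, the same stack sees just L + 1 time steps.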

Q: How does the AI vocoder compare to other methods in terms of creating authentic singing?

Mean opinion scores indicate that the AI vocoder generates singing that sounds genuine. Its scores fall between those of previous works and those of the reference singing recordings, so it beats earlier methods but does not yet match real singing.
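
For readers unfamiliar with the metric, a mean opinion score is the average of listener ratings, commonly on a 1 to 5 scale. The sketch below uses made-up ratings and hypothetical system names purely to show the calculation; none of these numbers come from the paper or the video.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings on an assumed 1 to 5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Made-up ratings for illustration only (not results from the paper).
ratings = {
    "older_system": [2, 3, 3, 2],
    "new_system": [4, 3, 4, 4],
    "reference_recording": [5, 4, 5, 5],
}

scores = {name: mean_opinion_score(r) for name, r in ratings.items()}
ranking = sorted(scores, key=scores.get)  # lowest score first

for name in ranking:
    print(f"{name}: {scores[name]:.2f}")
```

A method that lands between earlier systems and the reference recording in such a ranking is the situation the answer above describes.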

Summary & Key Takeaways

  • The AI vocoder can synthesize singing from MIDI and lyrics inputs, separating pitch and timbre components to generate waveforms.

  • The algorithm uses a modified WaveNet architecture with 2-by-1 dilated convolutions, which keep the parameter count low and help it train on small datasets.

  • Mean opinion scores indicate that the new method outperforms previous works in creating authentic human-like singing.

