Handle YouTube transcript errors and improve metadata extraction by Sasivarnasarma · Pull Request #2178 · microsoft/markitdown

Sasivarnasarma · 2026-06-30T23:27:26Z

Description

This pull request fixes an issue in YouTubeConverter where a transcript listing failure crashes the entire converter. The conversion then falls back to HtmlConverter and outputs raw, unreadable scraped HTML (such as cookie walls) instead of clean video details.

Problem

In markitdown/converters/_youtube_converter.py, the initial call that retrieves the transcript list (ytt_api.list(video_id)) was executed outside the protective try/except block.

Transcript listing and fetching commonly fail due to:

IP Blocking / Rate Limiting: Datacenter/CI IP addresses are regularly blocked or rate-limited by YouTube.
Subtitles Disabled: The target video does not have any manual or auto-generated captions.
Age Gate / Region Restrictions: The video requires authentication cookies.

When list() raises an exception under these conditions, the entire YouTubeConverter.convert method fails. The orchestrator catches the exception and falls back to HtmlConverter, so the user receives the raw HTML of the YouTube sign-in page or cookie consent banner instead of the successfully extracted video metadata (title, views, runtime, description).
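The fallback behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `convert_with_fallback`, `youtube_converter_unguarded`, and `html_converter` are hypothetical stand-ins and do not reproduce markitdown's actual orchestration code.

```python
from typing import Callable, List, Optional

Converter = Callable[[str], str]


def convert_with_fallback(html: str, converters: List[Converter]) -> str:
    """Try each converter in order; an exception moves on to the next one."""
    last_error: Optional[Exception] = None
    for convert in converters:
        try:
            return convert(html)
        except Exception as exc:  # a crash in one converter triggers the fallback
            last_error = exc
    raise RuntimeError("no converter succeeded") from last_error


def youtube_converter_unguarded(html: str) -> str:
    # Hypothetical: metadata was parsed fine, but it is lost when the
    # unguarded transcript listing call raises.
    title = "Example video"
    raise ConnectionError(f"transcript listing blocked for {title!r}")


def html_converter(html: str) -> str:
    # Hypothetical generic converter: just passes the scraped page through.
    return html


result = convert_with_fallback(
    "<html>cookie wall</html>",
    [youtube_converter_unguarded, html_converter],
)
print(result)
```

In this model the parsed title never reaches the output, because the exception discards everything the first converter had already extracted.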

Solution

Wrapped the transcript listing operations (list() and language extraction) in the outer try/except block.
If listing or fetching the transcript fails, the converter now gracefully handles the exception, prints the failure, sets a fallback string *(Transcript unavailable)*, and continues execution.
This ensures the output Markdown still contains the successfully parsed video title, views, keywords, runtime, and description.
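A minimal sketch of the guarded pattern follows. It assumes an API object exposing `list(video_id)` and `fetch(video_id, languages=...)`, with transcript entries carrying `language_code` and snippets carrying `text`. The helper `get_transcript_text` and the `BlockedApi` and `WorkingApi` classes are hypothetical test doubles, not the code in `_youtube_converter.py`.

```python
from types import SimpleNamespace
from typing import Any

FALLBACK = "*(Transcript unavailable)*"


def get_transcript_text(ytt_api: Any, video_id: str) -> str:
    """Return the transcript text, or a fallback string if anything fails."""
    try:
        # Listing sits inside the try block, so a failure here is caught too.
        transcript_list = ytt_api.list(video_id)
        languages = [t.language_code for t in transcript_list]
        transcript = ytt_api.fetch(video_id, languages=languages)
        return " ".join(part.text for part in transcript)
    except Exception as exc:
        print(f"Error fetching transcript: {exc}")
        return FALLBACK


class BlockedApi:
    """Hypothetical double: listing fails, e.g. because of IP blocking."""

    def list(self, video_id: str):
        raise RuntimeError("request blocked")


class WorkingApi:
    """Hypothetical double: listing and fetching both succeed."""

    def list(self, video_id: str):
        return [SimpleNamespace(language_code="en")]

    def fetch(self, video_id: str, languages):
        return [SimpleNamespace(text="hello"), SimpleNamespace(text="world")]


blocked_text = get_transcript_text(BlockedApi(), "abc123")
working_text = get_transcript_text(WorkingApi(), "abc123")
```

Because the helper always returns a string, the caller can keep building the Markdown output from the metadata it already extracted.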

Wraps YouTube transcript listing and retrieval in a try/except block. This prevents the converter from crashing and falling back to HtmlConverter when transcripts are disabled, rate-limited, or blocked. Instead, the converter now gracefully continues and returns the successfully extracted video metadata and description.

Sasivarnasarma · 2026-06-30T23:31:34Z

@microsoft-github-policy-service agree

Sasivarnasarma wants to merge 1 commit into microsoft:main from Sasivarnasarma:fix/youtube-transcript-fallback
