
Feature Request: Add Original Language Detection to Gladia STT Plugin #4402

@mowtschan


Summary

The Gladia STT plugin currently exposes target_language from translation data but does not expose the original_language (source language) that was detected/used during transcription. This information would be valuable for multilingual applications and language-aware workflows.

Current Behavior

The plugin currently handles translation data and sets the target language:

```python
target_language = translation_data.get("target_language", "")
language = translated_utterance.get("language", target_language)
...
if translated_text and language:
    speech_data = stt.SpeechData(
        language=language,  # Use the target language
        ...
    )
```

The SpeechData.language field is already being used to store the target_language for translated text. However, the original_language (the language that was actually spoken/detected) is not being captured or exposed anywhere.
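To make the loss concrete, here is a minimal, self-contained sketch that mirrors the excerpt above. It assumes a hypothetical translation payload shape (`original_language`, `target_language`, and a `translated_utterance` dict whose text sits under a `"text"` key), and it uses a small stand-in dataclass instead of the real `stt.SpeechData`. These are assumptions for illustration, not the plugin's actual types.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SpeechData:
    """Hypothetical stand-in for stt.SpeechData (only the fields used here)."""
    language: str
    text: str
    speaker_id: Optional[str] = None


def current_mapping(translation_data: Dict[str, Any]) -> Optional[SpeechData]:
    """Mirrors the plugin excerpt: only the target language is kept."""
    target_language = translation_data.get("target_language", "")
    translated_utterance = translation_data.get("translated_utterance", {})
    language = translated_utterance.get("language", target_language)
    # Assumption: the translated text lives under a "text" key.
    translated_text = translated_utterance.get("text", "")
    if translated_text and language:
        return SpeechData(language=language, text=translated_text)
    return None


# Hypothetical payload: English speech translated to French.
message = {
    "original_language": "en",
    "target_language": "fr",
    "translated_utterance": {"text": "Bonjour tout le monde", "language": "fr"},
}

event = current_mapping(message)
print(event)
# SpeechData(language='fr', text='Bonjour tout le monde', speaker_id=None)
```

The `"en"` from the payload appears nowhere in the resulting event.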

Problem

When translation is enabled, we lose information about what language the user was actually speaking. The language field in SpeechData contains the target language (what the text was translated to), but there's no way to access the original/source language (what was actually spoken).
This becomes critical when multiple translation_target_languages are configured.
According to the documentation, the plugin then emits a separate transcription event for each target language. Without the original language, there is no way to tell which source language a given translated event came from.
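A rough sketch of the multi-target case, under the same assumptions as before (hypothetical payload shape with a `"text"` key, and a stand-in `SpeechData` dataclass rather than the real one): two utterances spoken in different languages, each translated to French and German, collapse into events that carry only the target language.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SpeechData:
    """Hypothetical stand-in for stt.SpeechData."""
    language: str
    text: str


def current_mapping(translation_data: Dict[str, Any]) -> Optional[SpeechData]:
    # Same logic as the plugin excerpt: language falls back to target_language.
    target_language = translation_data.get("target_language", "")
    translated_utterance = translation_data.get("translated_utterance", {})
    language = translated_utterance.get("language", target_language)
    translated_text = translated_utterance.get("text", "")  # assumed key
    if translated_text and language:
        return SpeechData(language=language, text=translated_text)
    return None


def translation_messages(original: str, texts: Dict[str, str]) -> List[Dict[str, Any]]:
    """Builds one hypothetical message per target language."""
    return [
        {
            "original_language": original,
            "target_language": target,
            "translated_utterance": {"text": text, "language": target},
        }
        for target, text in texts.items()
    ]


# Two utterances in different source languages, each translated to fr and de.
messages = translation_messages("en", {"fr": "Bonjour", "de": "Hallo"})
messages += translation_messages("es", {"fr": "Merci", "de": "Danke"})

events = [e for e in (current_mapping(m) for m in messages) if e is not None]
print([e.language for e in events])
# ['fr', 'de', 'fr', 'de']
```

Both `"en"` and `"es"` are gone, so a consumer cannot group or route these four events by source language.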

Proposed Solution

Add support for capturing and exposing the original_language from Gladia's transcription/translation response.

Store in speaker_id field
Since SpeechData.language is already taken by the target language, the speaker_id field could potentially be used to store the original language information:

```python
...
original_language = translation_data.get("original_language", "")
...
if translated_text and language:
    speech_data = stt.SpeechData(
        ...
        speaker_id=original_language,
        ...
    )
```

Concerns: This feels like a misuse of the speaker_id field ;-(
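To illustrate the trade-off from the consumer side, here is a hedged sketch of the speaker_id workaround. It reuses the hypothetical payload shape and a stand-in `SpeechData` dataclass (not the real class), so treat it as an illustration of the idea rather than the plugin's code. The catch is visible in the last lines: the consumer has to know, out of band, that `speaker_id` holds a language code and not a speaker identity.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SpeechData:
    """Hypothetical stand-in for stt.SpeechData."""
    language: str
    text: str
    speaker_id: Optional[str] = None


def proposed_mapping(translation_data: Dict[str, Any]) -> Optional[SpeechData]:
    target_language = translation_data.get("target_language", "")
    # Proposed: also read the source language from the translation payload.
    original_language = translation_data.get("original_language", "")
    translated_utterance = translation_data.get("translated_utterance", {})
    language = translated_utterance.get("language", target_language)
    translated_text = translated_utterance.get("text", "")  # assumed key
    if translated_text and language:
        return SpeechData(
            language=language,
            text=translated_text,
            # Workaround from the proposal: carry the source language in speaker_id.
            speaker_id=original_language,
        )
    return None


event = proposed_mapping({
    "original_language": "en",
    "target_language": "fr",
    "translated_utterance": {"text": "Bonjour", "language": "fr"},
})
if event is not None:
    # The consumer must treat speaker_id as a language code by convention.
    print(f"{event.speaker_id} -> {event.language}: {event.text}")
    # en -> fr: Bonjour
```

Any code that reads `speaker_id` as a speaker label (for example, for diarization) would see `"en"` instead, which is the source of the concern above.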

Metadata


Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
