orange-agent

Real-time voice agent with voice cloning. It listens on a virtual PulseAudio device, transcribes speech with Whisper, replies through an LLM, and clones any voice with Qwen TTS. You can interrupt it mid-sentence and it will stop and listen.
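The interrupt behavior can be sketched as a tiny state machine (an illustrative assumption about the control flow, not code from this repo): while the agent is speaking, any detected user speech cancels playback and flips it back to listening.

```python
# Illustrative barge-in state machine -- an assumption about how the
# interrupt logic behaves, not this repo's actual implementation.

def tick(state: str, user_speaking: bool) -> str:
    """Advance one VAD frame. States: 'listening' or 'speaking'."""
    if state == "speaking" and user_speaking:
        # Barge-in: stop TTS playback and go back to transcribing.
        return "listening"
    return state

# The agent keeps talking only while the user stays silent.
state = "speaking"
state = tick(state, user_speaking=True)  # user starts talking -> "listening"
```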

Setup

You'll need Nix with flakes (it handles CUDA, PulseAudio, and all native dependencies), an NVIDIA GPU, and either Ollama running locally or an OpenRouter API key.

nix develop

ollama serve &
ollama pull qwen3:8b

Voice profiles

Each profile lives in profiles/<name>/ and needs two files: a reference.wav with about 10 seconds of the target voice, and a profile.toml with the transcript and personality.
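A minimal profile directory looks like this ("myvoice" is a placeholder name):

```
profiles/
└── myvoice/
    ├── reference.wav   # ~10 seconds of the target voice, mono 24 kHz
    └── profile.toml    # voice_transcript + personality_prompt
```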

If you have an audio or video clip of someone talking, the prepare script will convert it to the right format and transcribe it for you:

python scripts/prepare_reference.py clip.mp4 --profile myvoice

This converts the clip to mono 24 kHz WAV, runs Whisper on it, and prints the transcript to paste into your profile.toml:

voice_transcript = "<transcript from prepare script>"

personality_prompt = """<how the agent should behave>"""
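The conversion step above can be sketched with ffmpeg (a sketch of the approach, assuming prepare_reference.py shells out to ffmpeg; the actual script may differ):

```python
# Sketch of the audio-conversion step: build an ffmpeg command that
# downmixes to mono and resamples to 24 kHz. Assumes ffmpeg is installed;
# the real prepare_reference.py may do this differently.
import subprocess

def reference_cmd(src: str, dst: str = "reference.wav", rate: int = 24000) -> list[str]:
    return [
        "ffmpeg", "-y",
        "-i", src,          # input clip (audio or video)
        "-ac", "1",         # downmix to mono
        "-ar", str(rate),   # resample to 24 kHz
        dst,
    ]

# subprocess.run(reference_cmd("clip.mp4"), check=True)
# ...then run Whisper on the resulting reference.wav to get the transcript.
```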

Audio devices

The pipeline reads from a BotSpeaker monitor and writes to a BotMic sink. Set these up however you want: PipeWire virtual devices, pactl load-module, whatever works. The names are configurable in config.toml.
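For example, with plain PulseAudio the two devices can be created as null sinks (a sketch assuming the default device names; match whatever you put in config.toml):

```shell
# Sink that captures the call audio; the agent reads BotSpeaker.monitor.
pactl load-module module-null-sink sink_name=BotSpeaker
# Sink the agent speaks into; use BotMic.monitor as your app's microphone.
pactl load-module module-null-sink sink_name=BotMic
```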

Usage

orange-agent --profile <name>

You can pass a scenario prompt to set the context for the conversation:

orange-agent --profile <name> --scenario "casual voice call with friends"

Testing TTS

You can test a profile's voice directly without running the full pipeline. Pass text to speak, or let the LLM generate something in character:

python scripts/test_tts.py --profile <name> "text to speak"
python scripts/test_tts.py --profile <name> --generate --scenario "casual voice call"
python scripts/test_tts.py --profile <name> -o output.wav "save to file"

Config

Everything else lives in config.toml: LLM provider and model, audio device names, VAD sensitivity, timing thresholds. To use OpenRouter instead of Ollama:

[llm]
provider = "openrouter"
model = "google/gemini-2.0-flash-001"

and set the API key in your environment:

export OPENROUTER_API_KEY="sk-..."
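The other settings might look something like this (the key names below are hypothetical, inferred from the list of settings above; check the repo's shipped config.toml for the real schema):

```toml
[llm]
provider = "ollama"
model = "qwen3:8b"

[audio]
speaker_monitor = "BotSpeaker.monitor"  # device the pipeline reads from
mic_sink = "BotMic"                     # device the pipeline writes to

[vad]
sensitivity = 0.5           # hypothetical VAD sensitivity knob

[timing]
silence_timeout_ms = 800    # hypothetical end-of-utterance threshold
```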
