This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
# Full build → produces build/OST.app
./build.sh
# Type-check only (no binary)
./build.sh --typecheck
# Run project checks (includes type-check)
./test.sh
# Clean build
./build.sh --clean
# Run the app
open build/OST.app
# If macOS blocks execution
xattr -dr com.apple.quarantine build/OST.app
No Xcode project is used for building. The build.sh script compiles all 22 Swift source files together via xcrun swiftc (CommandLineTools SDK, arm64, macOS 15.0 target) with Swift warnings and strict concurrency diagnostics treated as failures. The ./test.sh script uses system command-line tools only and runs shell syntax, source list, plist, documentation, workflow, regression, behavioral, and type-check gates. Use docs/manual-qa.md for release checks that require real macOS permissions, audio devices, Apple Translation language packs, or online fallback network behavior. There is no package manager or linter configured.
Adding new source files: When creating a new .swift file, you must also add it to the SOURCES array in build.sh — the compiler only sees files listed there.
OST is a macOS menu bar app (LSUIElement=true) that captures system audio, performs real-time speech recognition, and displays translated subtitles in a floating overlay.
ScreenCaptureKit (16kHz mono) → SpeechRecognizer → AppState → TranslationService → Overlay Views
SystemAudioCapture              SFSpeech           entries    Translation.framework  NSPanel
- SystemAudioCapture: Uses SCStream (audio-only, minimal 2x2 video) to produce AsyncStream<AudioSampleBuffer>, a Sendable wrapper around CMSampleBuffer. A fresh stream is created per capture session.
- SpeechRecognizer: @MainActor wrapper around SFSpeechRecognizer. Publishes currentText (partial) and finalizedText (confirmed). Auto-restarts recognition after each final result or transient error for continuous listening. Note: isFinal only fires when the recognition task ends (~60s timeout), not per sentence, so AppState uses a Combine-based debounce timer on currentText to detect speech pauses and create subtitle entries.
- AppState: Central @MainActor ObservableObject. Owns the pipeline lifecycle. Detects speech pauses via a debounce timer on currentText, then creates SubtitleEntry items for translation. Also extracts complete sentences immediately when punctuation boundaries are detected (before the pause timer fires). Manages time-based expiry and max-line trimming. Uses Combine .sink (not AsyncPublisher) to bind speech output. Supports automatic language detection via the NaturalLanguage framework (switches recognizer locale after detecting spoken language from initial text).
- TranslationService: Wraps TranslationSession. The session is injected by SwiftUI's .translationTask modifier on SubtitleView (combined mode) or TranslationOverlayView (split mode), so the overlay must be shown before capture starts. Online Google Translate fallback is opt-in and only used when no TranslationSession is available.
- SubtitleView / OverlayWindow: NSPanel (borderless, floating, click-through when locked) hosting SwiftUI content via NSHostingView wrapped in a plain NSView container (prevents the hosting view from driving window resizes). Entries animate in/out with rolling display. Smart auto-scroll: locked mode always scrolls to bottom; unlocked mode pauses auto-scroll when the user scrolls up and resumes when they return to the bottom.
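The time-based expiry and max-line trimming that AppState manages can be pictured as a small pure function. This is a minimal sketch, not the project's code: SubtitleEntrySketch, pruneEntries, and the lifetime and maxLines parameters are hypothetical names, and the real SubtitleEntry likely carries more fields.

```swift
import Foundation

// Hypothetical stand-in for the app's SubtitleEntry; only the fields needed here.
struct SubtitleEntrySketch: Equatable {
    let text: String
    let createdAt: Date
}

// Drops entries older than `lifetime`, then keeps only the newest `maxLines`.
// Assumes `entries` is ordered oldest-first.
func pruneEntries(
    _ entries: [SubtitleEntrySketch],
    now: Date,
    lifetime: TimeInterval,
    maxLines: Int
) -> [SubtitleEntrySketch] {
    let alive = entries.filter { now.timeIntervalSince($0.createdAt) < lifetime }
    return Array(alive.suffix(max(0, maxLines)))
}
```

Keeping the rule as a pure function of the entries and the current time makes it checkable without a running overlay or a real timer.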
- Audio format: SCStream must output 16kHz mono PCM; SFSpeechRecognizer silently fails on 48kHz float32.
- On-device recognition: Check supportsOnDeviceRecognition before setting requiresOnDeviceRecognition = true; missing models produce zero results with no error.
- Translation session lifecycle: .translationTask only fires when its view is rendered. The combined overlay (SubtitleView) or split translation overlay (TranslationOverlayView) must be visible before startCapture() is called.
- Threading: Recognition callbacks arrive on arbitrary threads; all UI updates go through Task { @MainActor in }. AppLogger requires nonisolated static func post() for off-main-thread logging.
- Recognition task restart timing: When restarting the recognition task (every ~60s), the new SFSpeechAudioBufferRecognitionRequest must be created and swapped into recognitionRequest before cancelling the old task. Otherwise audio buffers arriving from the continuous stream are silently dropped during the gap.
- Combine sink synchronization: extractCompleteSentences receives sinkCurrentText (the value delivered by the Combine sink) rather than reading speechRecognizer.currentText directly, which may have changed asynchronously between delivery and execution.
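The swap-before-cancel ordering from the restart-timing note can be modeled without the Speech framework. The sketch below is an illustration under stated assumptions: RequestStandIn is a hypothetical stand-in for SFSpeechAudioBufferRecognitionRequest, buffers are plain integers, and RecognizerSketch, feed, restart, and droppedBuffers are invented names rather than the project's API.

```swift
// Hypothetical stand-in for SFSpeechAudioBufferRecognitionRequest: it only
// records which buffers it was given and whether its audio has been ended.
final class RequestStandIn {
    private(set) var received: [Int] = []
    private(set) var isEnded = false

    func append(_ buffer: Int) {
        guard !isEnded else { return }   // a retired request ignores late audio
        received.append(buffer)
    }

    func endAudio() { isEnded = true }
}

final class RecognizerSketch {
    // Mirrors the recognitionRequest property named in the note above.
    private(set) var recognitionRequest: RequestStandIn?
    private(set) var droppedBuffers = 0

    // Called for every buffer coming off the continuous audio stream.
    func feed(_ buffer: Int) {
        guard let request = recognitionRequest, !request.isEnded else {
            droppedBuffers += 1          // no live request: the audio is lost
            return
        }
        request.append(buffer)
    }

    // Swap-before-cancel: install the fresh request first, then retire the old one.
    @discardableResult
    func restart() -> RequestStandIn {
        let old = recognitionRequest
        let fresh = RequestStandIn()
        recognitionRequest = fresh       // 1. the new request is live immediately
        old?.endAudio()                  // 2. only now stop the previous one
        return fresh
    }
}
```

Reversing steps 1 and 2 would leave a window in which feed sees an ended request and increments droppedBuffers, which is the silent loss the note warns about.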
OST/Sources/
├── App/ AppState (pipeline coordinator), OSTApp (entry point), WindowManager, Logger, SessionRecorder
├── Audio/ SystemAudioCapture (ScreenCaptureKit)
├── Speech/ SpeechRecognizer (SFSpeech), SupportedLanguages
├── Translation/ TranslationService, TranslationConfig (availability check)
├── Settings/ UserSettings (@AppStorage persistence, color serialization)
├── UI/ SubtitleView, RecognitionOverlayView, TranslationOverlayView, OverlayWindow, MenuBarView, SettingsView, FontSettingsView, LanguagePickerView, LogViewerView, SessionHistoryView
└── Accessibility/ AccessibilityManager
WindowManager is a centralized coordinator for all windows (overlay, settings, logs, session history). It reuses existing visible windows instead of creating duplicates. The overlay is borderless/floating; other windows are standard with specific sizing.
The overlay window supports lock/unlock toggling: locked = click-through (ignoresMouseEvents = true), unlocked = movable/resizable. resetOverlay() restores default position/size and re-locks.
UserSettings uses @AppStorage for all preferences. Colors are serialized via NSKeyedArchiver/NSKeyedUnarchiver since @AppStorage doesn't natively support Color. Overlay frame position/size is persisted and restored on launch.
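A color round trip through NSKeyedArchiver might look like the sketch below. It assumes the color is handled as an NSColor and stored as Data; archiveColor and unarchiveColor are hypothetical helper names, and the project's actual serialization code may differ.

```swift
import AppKit

// Encodes a color as Data suitable for storing under an @AppStorage key.
// Returns nil if archiving fails.
func archiveColor(_ color: NSColor) -> Data? {
    try? NSKeyedArchiver.archivedData(withRootObject: color, requiringSecureCoding: true)
}

// Decodes a color previously produced by archiveColor; returns `fallback`
// when the data is missing or unreadable.
func unarchiveColor(_ data: Data?, fallback: NSColor) -> NSColor {
    guard let data,
          let color = try? NSKeyedUnarchiver.unarchivedObject(ofClass: NSColor.self, from: data)
    else { return fallback }
    return color
}
```

A SwiftUI Color can be bridged with NSColor(color) before archiving and Color(nsColor:) after decoding; whether the project bridges this way is an assumption.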
AppState processes speech text through two mechanisms:
- Sentence extraction: When punctuation creates 2+ sentence boundaries in liveText, all complete sentences are immediately consumed as subtitle entries. The last (in-progress) sentence remains as liveText.
- Pause-based consumption: A configurable debounce timer consumes the remaining liveText after a speech pause (default 3s).
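The sentence-extraction rule can be sketched as a pure function over the live text. This is one possible reading, not the project's extractCompleteSentences: it treats "2+ sentence boundaries" as at least two terminating punctuation marks, assumes a fixed terminator set, and ignores abbreviations, decimals, and ellipses. SentenceSplit and splitCompleteSentences are hypothetical names.

```swift
import Foundation

// Finished sentences plus the in-progress tail that stays as live text.
struct SentenceSplit: Equatable {
    var complete: [String]
    var remainder: String
}

// Splits `text` at sentence terminators. Nothing is consumed unless at least
// `minimumBoundaries` complete sentences were found (assumed threshold: 2).
func splitCompleteSentences(_ text: String, minimumBoundaries: Int = 2) -> SentenceSplit {
    let terminators: Set<Character> = [".", "!", "?", "。", "！", "？"]  // assumed set
    var sentences: [String] = []
    var current = ""

    for character in text {
        current.append(character)
        if terminators.contains(character) {
            let trimmed = current.trimmingCharacters(in: .whitespaces)
            if !trimmed.isEmpty { sentences.append(trimmed) }
            current = ""
        }
    }

    guard sentences.count >= minimumBoundaries else {
        // Too few boundaries: leave everything for the pause timer.
        return SentenceSplit(complete: [], remainder: text)
    }
    return SentenceSplit(
        complete: sentences,
        remainder: current.trimmingCharacters(in: .whitespaces)
    )
}
```

For example, "Hello there. How are you? I am" yields two complete sentences and leaves "I am" as the remainder, while text with a single terminator is returned untouched for the pause-based path.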
Between recognition task restarts (~60s cycle), lastConsumedTail preserves the tail of the previous session's text to detect and strip overlapping content from the new session.
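One way to picture the overlap stripping is an exact suffix/prefix match between the preserved tail and the new session's text. The source does not describe the matching algorithm, so this is an assumption; a real recognizer may revise words between sessions, in which case exact matching would miss the overlap. stripOverlap is a hypothetical name.

```swift
// Removes from `newText` the longest prefix that equals a suffix of `tail`,
// then drops any leading spaces left behind. Returns `newText` unchanged
// when there is no overlap.
func stripOverlap(tail: String, from newText: String) -> String {
    let tailChars = Array(tail)
    let newChars = Array(newText)
    var length = min(tailChars.count, newChars.count)

    // Try the longest candidate overlap first and shrink until one matches.
    while length > 0 {
        if tailChars.suffix(length).elementsEqual(newChars.prefix(length)) {
            let rest = newChars.dropFirst(length).drop(while: { $0 == " " })
            return String(rest)
        }
        length -= 1
    }
    return newText
}
```

With a tail of "the quick brown fox" and new text "brown fox jumps over", the shared "brown fox" is removed and "jumps over" remains.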
startCapture() in OSTApp.swift must follow this exact sequence:
1. Show overlay window (windowManager.showOverlay), which attaches the .translationTask modifier
2. Wait ~200ms for SwiftUI to render (try? await Task.sleep)
3. Configure translation service (translationService.configure)
4. Start audio capture (appState.startCapture)
Skipping step 2 or reordering causes .translationTask to never fire, silently breaking translation.
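The required ordering can be sketched with closures standing in for the real collaborators. StartupSteps, runStartup, and renderDelayNanoseconds are hypothetical names used only for illustration; the actual startCapture() in OSTApp.swift calls windowManager, translationService, and appState directly.

```swift
// Hypothetical stand-ins: each closure represents one step of the sequence.
struct StartupSteps {
    var showOverlay: @MainActor () -> Void
    var configureTranslation: @MainActor () -> Void
    var startAudioCapture: @MainActor () async -> Void
}

// Runs the steps in the order the sequence above requires.
@MainActor
func runStartup(
    _ steps: StartupSteps,
    renderDelayNanoseconds: UInt64 = 200_000_000   // ~200ms
) async {
    steps.showOverlay()                                         // 1. overlay on screen
    try? await Task.sleep(nanoseconds: renderDelayNanoseconds)  // 2. give SwiftUI time to render
    steps.configureTranslation()                                // 3. configure translation
    await steps.startAudioCapture()                             // 4. start audio capture
}
```

Keeping the delay as a parameter lets a check run the sequence with a much shorter wait while still asserting the order of the steps.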
Frameworks used: AppKit, SwiftUI, Speech, ScreenCaptureKit, CoreMedia, Translation, Combine, NaturalLanguage
Required macOS permissions:
- Screen Recording (for ScreenCaptureKit access)
- System Audio Recording (for system audio capture on macOS 15+)
- Speech Recognition (for SFSpeechRecognizer)
README is maintained in 4 languages. When modifying README.md, apply the same changes to all translations:
- README.md: English (primary)
- README.ko.md: Korean (한국어)
- README.zh.md: Chinese (中文)
- README.ja.md: Japanese (日本語)
Each file has a language selector at the top. Code blocks, URLs, image paths, and CLI commands should remain identical across all versions.