
Add conversation_kit + typed_input skill: a product-agnostic language layer for voice agents #43

Open
jakubkarolczyk wants to merge 10 commits into main from add-conversation-kit

@jakubkarolczyk

Summary

Adds signalwire.conversation_kit — a small, zero-dependency, product-agnostic
language layer for voice agents — and a typed_input skill built on it. Purely
additive: no existing files are modified (+2205 / −0 across 17 new files).

It covers the two halves of a voice turn's language handling, neither tied to any
product: understand what the caller said (spoken → value) and speak values
back correctly (value → spoken).

What's added

conversation_kit/ (stdlib only, zero third-party deps)

  • dates: compute_date resolves a spoken date the caller named (weekday +
    this/next, today/tomorrow, "in N days", explicit day/month/year) to a calendar
    date, so the model never does calendar math. Also ships RESOLVE_DATE_PARAMS
    (a ready-made JSON-schema fragment) and WEEKDAYS.
  • inputs: validate_input (email/phone/number), plus input_request_payload
    and INPUT_REQUEST_TYPE for a typed-input (on-screen keypad) channel.
  • verbalizer: TTS-ready per-language output (numbers, units, dates, times,
    emails, acronym spelling) behind a plugin registry, plus a guidance() prompt
    helper. A concrete Verbalizer base (a safe, language-neutral fallback) ships
    with en/pl plugins; add a language by subclassing it and registering it.
    get(lang) falls back to the neutral base for an unregistered language.
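A minimal sketch of the register()/get() fallback described above, assuming a module-level dict keyed by language code. The names echo the description, but the LANG attribute, the number() method, and the register(cls)/get(lang) signatures are assumptions for illustration, not the package's actual API.

```python
from typing import ClassVar


class Verbalizer:
    """Neutral fallback: generic readings that are safe in any language."""

    LANG: ClassVar[str] = ""  # hypothetical: the language code a plugin serves

    def number(self, value: float) -> str:
        return f"{value:g}"  # plain digits, no language-specific wording


class EnglishVerbalizer(Verbalizer):
    LANG = "en"


_BASE = Verbalizer()
_REGISTRY: dict[str, Verbalizer] = {}


def register(cls: type[Verbalizer]) -> type[Verbalizer]:
    """Index one instance of a plugin class under its LANG code."""
    _REGISTRY[cls.LANG] = cls()
    return cls  # returning cls lets register() double as a class decorator


def get(lang: str) -> Verbalizer:
    # An unregistered language gets the neutral base, never another language.
    return _REGISTRY.get(lang, _BASE)


register(EnglishVerbalizer)
```

In this sketch, adding a language is a subclass with its own LANG plus one register() call; anything it does not override inherits the neutral behaviour.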

skills/typed_input/

A SkillBase skill that collects a value the caller types on an on-screen
keypad (email/phone/number) for cases speech-to-text can't capture: it emits an
input_request user-event, parks via wait_for_user, then validates and reads
back the RAW typed value (never a model argument, so a typo is never silently
"corrected"). One instance per field, per-language prompts. Reuses
conversation_kit.inputs.

Design

  • Zero third-party dependencies in conversation_kit (stdlib only); a
    self-contained leaf that never imports the rest of the SDK.
  • Deterministic, not generative — same input, same output.
  • Product-agnostic — no product names, business logic, or I/O; the app
    supplies domain wording (e.g. domain acronyms via a Verbalizer subclass).
  • Plugin languages — output is per-language behind one interface; additive.

Testing

  • 46 unit tests (tests/unit/conversation_kit/, tests/unit/skills/).
  • ruff format + ruff check clean; mypy clean on the added packages.

Notes for reviewers

  • Purely additive — no existing file touched; branch is current with main.
  • Packaging: picked up by the existing find-packages config; the skill README
    ships as package data.
  • conversation_kit is imported as signalwire.conversation_kit; not yet
    re-exported from the top-level namespace — happy to add that if preferred.
  • Per-package README.md files document usage and how to add a language.

…oice agents

New leaf subpackage signalwire.conversation_kit — the deterministic pieces a
voice agent needs to understand input, compute values, and speak output
correctly, none tied to any product:

- dates:      compute_date (spoken day -> ISO calendar math), WEEKDAYS, and
              RESOLVE_DATE_PARAMS (a resolve_date tool's JSON-schema fragment).
- inputs:     validate_input + is_valid_email/phone/number, and
              input_request_payload for the typed-input (on-screen keypad) channel.
- verbalizer: TTS-ready per-language output (number/unit/date/email/spell/
              measure_text) plus prompt guidance(), behind a small plugin registry.
              English and Polish ship; get(lang) falls back to English.

Zero dependencies. 20 unit tests under tests/unit/conversation_kit.
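A hedged, stdlib-only sketch of what validate_input could look like. The regexes, the 7 to 15 digit bound for phone numbers, the decimal-comma handling, and the (kind, raw) signature are illustrative assumptions; the package's own rules may be stricter or looser. The nan/inf rejection mirrors a later commit in this PR.

```python
import math
import re

# Deliberately loose: one "@", no whitespace, at least one dot after the "@".
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(raw: str) -> bool:
    return _EMAIL_RE.fullmatch(raw.strip()) is not None


def is_valid_phone(raw: str) -> bool:
    # Strip common separators and one optional leading "+", then count ASCII digits.
    cleaned = re.sub(r"[\s().-]", "", raw.strip()).removeprefix("+")
    return re.fullmatch(r"[0-9]{7,15}", cleaned) is not None


def is_valid_number(raw: str) -> bool:
    try:
        value = float(raw.strip().replace(",", "."))  # accept a decimal comma
    except ValueError:
        return False
    # float() happily parses "nan" and "inf"; neither is a usable typed number.
    return math.isfinite(value)


_VALIDATORS = {"email": is_valid_email, "phone": is_valid_phone, "number": is_valid_number}


def validate_input(kind: str, raw: str) -> bool:
    validator = _VALIDATORS.get(kind)
    if validator is None:
        raise ValueError(f"unknown input kind: {kind!r}")
    return validator(raw)
```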

…een keypad

A multi-instance skill (one instance per field) for collecting a value the caller
TYPES on an on-screen keypad — email, phone, account number — when speech-to-text
can't capture it reliably.

- request_<field>: speak a "type it on screen" line, emit an input_request user
  event so a connected client reveals/focuses the field, then wait_for_user.
- confirm_<field>: read the raw typed value from global_data['typed_<field>'],
  validate it, reopen on missing/invalid, else read it back to confirm. The value
  is never a model argument, so a typo can't be silently altered.

Per-language prompts resolve against global_data['language'] at call time, so one
instance serves a multilingual agent. Validation, the user-event payload, and the
spoken read-back come from signalwire.conversation_kit. 12 unit tests.
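The confirm_<field> step can be sketched as a pure function over global_data. This is not the skill's code: the return shape (an action plus the text to speak), the English read-back wording, and the injected validate callable are assumptions made to keep the example self-contained.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping


def confirm_field(
    field: str,
    global_data: Mapping[str, object],
    validate: Callable[[str], bool],
    reopen_prompt: str,
) -> tuple[str, str]:
    """Return ("reopen", prompt) or ("confirm", read_back) for one typed field."""
    # Read only what the client stored under typed_<field>; the value never
    # passes through a model argument, so a typo reaches us unchanged.
    raw = global_data.get(f"typed_{field}")
    if not isinstance(raw, str) or not raw.strip() or not validate(raw):
        # Missing, blank, or invalid: ask the caller to type it again.
        return "reopen", reopen_prompt
    # Placeholder read-back; the skill builds it with conversation_kit instead.
    return "confirm", f"You typed {raw}. Is that correct?"
```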

…tter

Add Verbalizer.spell_acronyms(text): reads generic technical acronyms (DIN, ISO,
PPV, RMS, UTC) letter-by-letter via the per-language alphabet, so a TTS engine says
"er em es" instead of mangling "RMS" into a word.

Whole-token, case-sensitive matching (longest first) so it never touches a lowercase
word ("din"), a substring inside a longer word ("isolation"), or an unknown all-caps
name/code. The acronym set is a ClassVar, extensible per subclass. Two unit tests.
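The matching rules above (whole token, case-sensitive, longest first) can be sketched with a single regex alternation. The class name, the default ACRONYMS set, and the Polish-style LETTERS table are assumptions for illustration; in the package the alphabet comes from each language plugin.

```python
import re
from typing import ClassVar


class AcronymSpeller:
    """Sketch of a spell_acronyms pass; not the package's Verbalizer."""

    # Hypothetical defaults; a subclass adds domain acronyms by overriding this.
    ACRONYMS: ClassVar[frozenset[str]] = frozenset({"DIN", "ISO", "RMS", "UTC"})
    # Polish-style letter names, assumed for illustration.
    LETTERS: ClassVar[dict[str, str]] = {
        "C": "ce", "D": "de", "I": "i", "M": "em", "N": "en",
        "O": "o", "R": "er", "S": "es", "T": "te", "U": "u",
    }

    def spell_acronyms(self, text: str) -> str:
        # Longest first (ties broken alphabetically so the pattern is stable).
        names = sorted(self.ACRONYMS, key=lambda a: (-len(a), a))
        # \b on both sides matches whole tokens only; no IGNORECASE keeps it case-sensitive.
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
        return pattern.sub(lambda m: " ".join(self.LETTERS[c] for c in m.group(1)), text)
```

Lowercase words, acronyms embedded in longer words, and unknown all-caps names pass through unchanged; a subclass that adds acronyms with new letters would also need to extend LETTERS.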

The verbalizer already read ISO measured values and acronyms; add temporal
verbalization so a language that needs it (Polish) speaks dates and clock
times naturally instead of letting the model guess at the digits. On a
combined timestamp it reads both the date and the time, down to the minute.

- base.Verbalizer: extend date() with with_weekday/with_year, add time()
  (24h passthrough) and a VERBALIZES_DATETIME opt-in flag; datetime_text()
  rewrites ISO dates and date-times in free text (date-times first; a
  trailing UTC/Z is left in place for spell_acronyms to read). Base and
  English stay a no-op — they read ISO acceptably.
- pl: _PL_HOURS feminine hour names + time() (on-the-hour reads hour only;
  a single-digit minute keeps its leading zero), VERBALIZES_DATETIME=True.
- tests: time() and datetime_text() PL cases + English no-op (24 total).
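A sketch of what an opted-in datetime_text pass might do: one pattern with an optional time part consumes a date-time whole before a bare date could match (the date-times-first ordering), a trailing Z or UTC becomes a spellable " UTC", and a date-shaped but invalid token is returned untouched because the callback validates first. The English wording ("14 May 2026 at 09:30") and dropping seconds are placeholder assumptions; the real pass delegates to each plugin's date() and time().

```python
from __future__ import annotations

import re
from datetime import date

_MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)

# Date, then an optional time (seconds matched and dropped) with optional Z/UTC.
_ISO_RE = re.compile(
    r"\b(\d{4})-(\d{2})-(\d{2})"
    r"(?:[T ](\d{2}):(\d{2})(?::\d{2})?(?:\s?(Z|UTC)\b)?)?"
    r"(?![\d-])"
)


def _say_date(y: str, m: str, d: str) -> str:
    parsed = date(int(y), int(m), int(d))  # ValueError for e.g. 2026-13-45
    return f"{parsed.day} {_MONTHS[parsed.month - 1]} {parsed.year}"


def datetime_text(text: str) -> str:
    """Rewrite ISO dates and date-times; leave invalid tokens exactly as-is."""

    def rewrite(m: re.Match[str]) -> str:
        y, mo, d, hh, mm, tz = m.groups()
        try:
            spoken = _say_date(y, mo, d)
        except ValueError:
            return m.group(0)  # not a real calendar date
        if hh is not None:
            if not (0 <= int(hh) <= 23 and 0 <= int(mm) <= 59):
                return m.group(0)  # e.g. 25:99: keep the whole token
            spoken += f" at {hh}:{mm}"
        return spoken + (" UTC" if tz else "")

    return _ISO_RE.sub(rewrite, text)
```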

The README predated the acronym-spelling and date/time verbalization work, so
spell_acronyms/datetime_text/time() and the ACRONYMS/VERBALIZES_DATETIME attrs
were undocumented. Update it and add the pieces an AI coding agent needs to
extend the package safely:

- Module map: every file, its responsibility, and its public names.
- Full verbalizer surface: methods table + the three free-text passes
  (measure_text -> datetime_text -> spell_acronyms) with their gating class
  attrs and required run-order.
- Adding a language: the class attributes that drive the shared methods, and
  the VERBALIZES_DATETIME opt-in.
- Testing: the pytest command + the reference plugin to mirror.
- Invariants: zero deps, no SDK import, product-agnostic, base-is-a-fallback,
  passes-are-no-op-by-default — the contract not to break.

Docs only; no runtime change.

compute_date could express today/tomorrow and weekdays but not a spoken day
COUNT ('in two days' / 'za dwa dni') — the model had to approximate it as
'tomorrow' or, worse, as day-of-month 2. Add an in_days integer param + handling
(today + N), kept distinct from the calendar day-of-month so an offset never
lands on the wrong date. Tests cover the offset and that day-of-month still wins.
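A reduced sketch of the in_days handling, also folding in the later roll-forward and bool fixes: an explicit day-of-month takes precedence, an integer offset means today + N, and bool is rejected even though it subclasses int. The keyword-only signature and the "next occurrence on or after today" rule for a bare day-of-month are assumptions; the real compute_date also handles weekdays, this/next, and explicit month/year.

```python
from __future__ import annotations

from datetime import date, timedelta


def compute_date(
    today: date, *, day: int | None = None, in_days: int | None = None
) -> date | None:
    """Resolve an explicit day-of-month or an 'in N days' offset against today."""
    # bool is a subclass of int, so exclude it explicitly: True is not "1".
    if isinstance(day, int) and not isinstance(day, bool):
        # A day-of-month always wins over an offset: "the 2nd" is a calendar day.
        for months_ahead in range(13):
            index = today.month - 1 + months_ahead
            year, month = today.year + index // 12, index % 12 + 1
            try:
                candidate = date(year, month, day)
            except (ValueError, OverflowError):
                continue  # e.g. the 31st in a 30-day month: roll forward
            if candidate >= today:
                return candidate
        return None  # no month has this day (0, 32, ...)
    if isinstance(in_days, int) and not isinstance(in_days, bool) and in_days >= 0:
        return today + timedelta(days=in_days)
    return None
```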

…gnostic)

Addresses an independent critical review. CI green: ruff format+check, mypy
(check_untyped_defs), pytest (33 passed, was 25).

Crashes:
- pl cardinal() rewritten as a general 1000-grouping algorithm with million and
  milliard tiers (it previously raised KeyError for values >= 1_000_000, including
  long fractions); it now raises ValueError above the milliard scale.
- base measure_text ranges used a hardcoded Polish 'do'; add RANGE_WORD ClassVar
  ('to' base, 'do' pl) so a non-Polish plugin's ranges aren't Polish.
- datetime_text guards its callbacks + pl.date() validates up front, so a
  date-shaped-but-invalid token ('2026-13-45', '25:99') is left untouched.
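The 1000-grouping approach for Polish cardinals can be sketched for non-negative integers: split into three-digit groups, spell each group, and attach the scale word in the form the group's value selects (one, few, many). The word tables are written out here for illustration and are not the plugin's own; fractions are out of scope. As the commit describes, anything at or above 10^12 raises ValueError.

```python
_UNITS = ("", "jeden", "dwa", "trzy", "cztery", "pięć", "sześć", "siedem", "osiem", "dziewięć")
_TEENS = ("dziesięć", "jedenaście", "dwanaście", "trzynaście", "czternaście",
          "piętnaście", "szesnaście", "siedemnaście", "osiemnaście", "dziewiętnaście")
_TENS = ("", "", "dwadzieścia", "trzydzieści", "czterdzieści", "pięćdziesiąt",
         "sześćdziesiąt", "siedemdziesiąt", "osiemdziesiąt", "dziewięćdziesiąt")
_HUNDREDS = ("", "sto", "dwieście", "trzysta", "czterysta", "pięćset",
             "sześćset", "siedemset", "osiemset", "dziewięćset")
# Scale words as (one, few, many), indexed by the power of 1000.
_SCALES = (
    None,
    ("tysiąc", "tysiące", "tysięcy"),
    ("milion", "miliony", "milionów"),
    ("miliard", "miliardy", "miliardów"),
)


def _group(n: int) -> list[str]:
    """Spell 1..999 as a list of words."""
    words = [_HUNDREDS[n // 100]]
    rest = n % 100
    if 10 <= rest <= 19:
        words.append(_TEENS[rest - 10])
    else:
        words += [_TENS[rest // 10], _UNITS[rest % 10]]
    return [w for w in words if w]


def _scale_form(n: int, forms: tuple[str, str, str]) -> str:
    # Polish plural: 2-4 (but not 12-14) takes the "few" form, the rest "many".
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return forms[1]
    return forms[2]


def cardinal(n: int) -> str:
    if n < 0:
        raise ValueError("negative numbers are out of scope for this sketch")
    if n == 0:
        return "zero"
    if n >= 1000 ** len(_SCALES):
        raise ValueError(f"{n} is above the milliard scale")
    words: list[str] = []
    for power in range(len(_SCALES) - 1, -1, -1):
        group = n // 1000 ** power % 1000
        if group == 0:
            continue
        forms = _SCALES[power]
        if forms is None:
            words += _group(group)
        elif group == 1:
            words.append(forms[0])  # "tysiąc", not "jeden tysiąc"
        else:
            words += _group(group) + [_scale_form(group, forms)]
    return " ".join(words)
```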

Robustness / correctness:
- compute_date: 'the 31st' in a short month rolls forward; an out-of-range
  explicit month/year is None (not silently today's); bool excluded from int
  day/month/year/in_days; drop undocumented which-synonyms.
- datetime_text normalizes a trailing Z/UTC to a spellable ' UTC'.
- time() validates 0-23 / 0-59 (base + pl).
- inputs.is_valid_number rejects nan/inf.
- registry.get() falls back to the neutral BASE (not English) for an unknown
  language, so it keeps the generic guidance(); docstrings + README updated.
- fix package docstrings (from signalwire.conversation_kit ...); export Numeric;
  input_request_payload -> dict[str,str]; WEEKDAYS -> tuple.

Product-agnostic:
- drop domain-specific 'PPV' from the base ACRONYMS default (apps add domain
  acronyms by subclass); scrub product-y test data.

Deferred: Polish 'od <gen> do <gen>' range grammar (needs a genitive declension
table) — kept the nominative form as a documented simplification.
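The RANGE_WORD fix can be sketched as a class attribute that one shared range rewrite reads, so each plugin supplies only its connective. The class names, the range_text method, and the regex (two numbers joined by a hyphen, optional decimals, lookarounds that skip hyphenated dates) are assumptions for illustration, not the package's measure_text.

```python
import re
from typing import ClassVar

# number - number with optional decimals; the lookarounds reject a match that
# touches another digit or hyphen, so ISO dates such as 2026-05-14 are skipped.
_RANGE_RE = re.compile(r"(?<![\d-])(\d+(?:[.,]\d+)?)\s?-\s?(\d+(?:[.,]\d+)?)(?![\d-])")


class BaseRanges:
    RANGE_WORD: ClassVar[str] = "to"

    def range_text(self, text: str) -> str:
        # Shared logic; only the connective differs per language.
        return _RANGE_RE.sub(lambda m: f"{m.group(1)} {self.RANGE_WORD} {m.group(2)}", text)


class PolishRanges(BaseRanges):
    RANGE_WORD = "do"  # nominative simplification; "od <gen> do <gen>" is deferred
```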

Keep the package free of any originating-product fingerprint before PR:
- replace a real personal-name email (karolczyk.jakub@…) with a fictional
  jan.kowalski@example.com
- rename the typed-input example field installer_email -> contact_email
- neutralize domain-flavored sample text: 'RMS velocity'/'PPV:' -> 'reading'/
  'value'; the vibration standards ISO 10816 / DIN 4150-3 -> domain-neutral
  ISO 9001 / DIN 5008-1

Tests + README only; no runtime change (33 tests still pass).

…prompts

Pre-PR review of the typed_input skill (CI already green; it correctly reuses
conversation_kit.inputs rather than duplicating). Two fixes:
- scrub the domain-flavored 'installer_email' / 'Installer's email' example from
  the docstring, README, and tests -> neutral 'contact_email' / 'Contact email'.
- setup() now validates the three required per-language prompt maps
  (open_prompt / field_label / invalid_prompt): the schema marks them required
  but the loader doesn't enforce it, so a missing one would silently speak '' at
  runtime. Fail loud instead; test added.

CI green: ruff format+check, mypy, pytest (13 typed_input tests).
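A hedged sketch of the fail-loud check: each required per-language prompt map must be present, non-empty, and free of blank entries, otherwise setup raises instead of letting the skill speak '' later. The params mapping shape, the check_prompt_maps name, and the choice of ValueError are assumptions; the skill's actual setup() signature and error reporting may differ.

```python
from collections.abc import Mapping
from typing import Any

# The three maps the schema marks as required.
REQUIRED_PROMPT_MAPS = ("open_prompt", "field_label", "invalid_prompt")


def check_prompt_maps(params: Mapping[str, Any]) -> None:
    """Raise ValueError if a required language -> text map is missing or blank."""
    problems: list[str] = []
    for key in REQUIRED_PROMPT_MAPS:
        value = params.get(key)
        if not isinstance(value, Mapping) or not value:
            problems.append(f"{key} is missing or empty")
        elif any(not isinstance(t, str) or not t.strip() for t in value.values()):
            problems.append(f"{key} has a blank entry")
    if problems:
        # Fail at setup time instead of speaking '' to a caller at runtime.
        raise ValueError("typed_input: " + "; ".join(problems))
```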

…t audit

The _check() helper asserted inside itself, so the no-cheat audit's static scan
saw the test bodies as assertion-free and flagged six as cheat tests. Inline each
case as a visible 'assert fn(value) == expected' loop and drop the helper — the
coverage is identical, now visible to the audit. No behaviour change; 46 tests pass.
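The refactor replaces a helper that asserted internally with a visible table-driven loop in each test body. A sketch of the pattern, using a hypothetical stand-in for the function under test so it runs on its own:

```python
import math


def is_valid_number(raw: str) -> bool:
    """Hypothetical stand-in for the function under test."""
    try:
        return math.isfinite(float(raw))
    except ValueError:
        return False


def test_is_valid_number() -> None:
    cases = [("42", True), ("3.5", True), ("nan", False), ("inf", False), ("abc", False)]
    # The assert lives in the test body, where a static scan can see it;
    # a _check() helper hid it one call away.
    for value, expected in cases:
        assert is_valid_number(value) == expected, value
```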