fix(instructions): add CJK context detection for Chinese/Japanese/Korean conversations #335

Open
lg320531124 wants to merge 1 commit into DietrichGebert:main from lg320531124:fix/cjk-handling

@lg320531124

Problem

Ponytail's style rules are designed for English-only conversations. In CJK (Chinese/Japanese/Korean) contexts:

  1. "Drop articles" is misleading: Chinese has no articles, so the instruction can cause the model to strip English articles embedded in Chinese text (e.g. "a LLM 模型" → "LLM 模型"), losing semantic specificity.
  2. "Short synonyms" only works in English; CJK characters are already compact.
  3. There is no CJK brevity guidance: Chinese users get English examples that don't map to their language patterns.

Fix

Add CJK detection to getPonytailInstructions(). When CJK characters are detected in user input, append a CJK Context section:

## CJK Context
- Chinese has no articles — do not strip English articles in mixed text
- "Short synonyms" rule applies to English only; CJK characters are already compact
- Output brevity: 简短回答,不用敬语,不用客套话
- Code rules (YAGNI, stdlib first, shortest diff) are language-agnostic — apply fully

This mirrors the approach in caveman PR #576, which adds CJK detection to the compression engine.

The inputText parameter to getPonytailInstructions() is optional — when not provided, behavior is unchanged (backward compatible).
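
For reviewers who want the shape of the change without opening the diff, here is a minimal sketch of how the detection and the optional parameter could fit together. It is not the actual patch: `CJK_PATTERN`, `containsCjk`, `BASE_INSTRUCTIONS`, and `CJK_CONTEXT` are hypothetical names, the base rules are a placeholder, and the character ranges cover only the common BMP blocks (kana, Han ideographs, Hangul syllables), which is an assumption about what "CJK characters" means here.

```typescript
// Hypothetical sketch; only getPonytailInstructions and inputText come from the PR text.

// Assumed ranges: Hiragana/Katakana, CJK Unified Ideographs (plus Extension A),
// and Hangul syllables. No "g" flag, so test() keeps no state between calls.
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

// Placeholder standing in for the existing English-oriented rules.
const BASE_INSTRUCTIONS = "## Style\n- Drop articles\n- Prefer short synonyms";

// Abbreviated copy of the CJK Context section described above.
const CJK_CONTEXT = [
  "## CJK Context",
  '- "Short synonyms" rule applies to English only; CJK characters are already compact',
  "- Output brevity: 简短回答,不用敬语,不用客套话",
].join("\n");

// Returns true if the text contains at least one character in the assumed ranges.
function containsCjk(text: string): boolean {
  return CJK_PATTERN.test(text);
}

// inputText is optional: with no argument, or with text that has no CJK
// characters, the result is the unchanged base instructions.
function getPonytailInstructions(inputText?: string): string {
  if (inputText !== undefined && containsCjk(inputText)) {
    return BASE_INSTRUCTIONS + "\n\n" + CJK_CONTEXT;
  }
  return BASE_INSTRUCTIONS;
}
```

Existing callers that pass no argument keep getting the base instructions, while a mixed-language input such as `getPonytailInstructions("a LLM 模型")` would get the CJK Context section appended.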

Fixes #333

@lg320531124

Author

Gentle ping — open ~2 days. Small, focused fix (+22/-6): adds CJK context detection so Chinese/Japanese/Korean conversations aren't mangled by compression heuristics tuned for Latin text. No CI gating; ready when you have a moment.

…ean conversations

Ponytail's style rules are English-centric. In CJK contexts:
- 'Drop articles' is misleading (Chinese has no articles)
- 'Short synonyms' only works in English
- Output brevity needs CJK-specific guidance

Now detects CJK characters in user input and appends a CJK Context
section that:
- Warns not to strip English articles in mixed CJK text
- Notes short synonyms are English-only
- Adds Chinese brevity guidance (简短回答,不用敬语: "answer briefly, no honorifics")
- Confirms code rules (YAGNI, stdlib first) are language-agnostic

Mirrors the approach in caveman PR #576 for CJK handling.

Fixes DietrichGebert#333

Development

Successfully merging this pull request may close these issues.

[Bug]: No CJK handling — English style rules conflict with Chinese/mixed-language conversations