Lightweight semantic conversation routing engine. Embedding-based drift detection with sub-200ms latency, zero LLM API costs for routing decisions.
AI applications dump entire conversation history into every LLM call:
- Unfocused context → worse responses
- Token waste → higher costs
- No structure → can't query "what did we decide about X?"
driftos-embed uses local embeddings to detect topic shifts and route messages:
- STAY - Same topic, continue in current branch
- BRANCH - Topic drift detected, create new branch
- ROUTE - Return to a previous topic
Result: Focused context windows. 20 relevant messages instead of 1000.
| Approach | Latency | Cost | Accuracy |
|---|---|---|---|
| LLM-based routing | 500-2000ms | $0.001-0.01/call | High |
| Embedding-based | <200ms | $0 | Good |
driftos-embed uses paraphrase-MiniLM-L6-v2 for semantic similarity. Fast enough for real-time, accurate enough for production.
# Clone and install
git clone https://github.com/DriftOS/driftos-embed
cd driftos-embed
npm install
# Setup database
cp .env.example .env
npm run db:push
# Start embedding server (Python sidecar)
cd embedding-server && pip install -r requirements.txt
python server.py &
cd ..
# Run
npm run dev
POST /api/v1/drift/route
{
"conversationId": "conv-123",
"content": "I want to plan a trip to Japan",
"role": "user"
}
Response:
{
"action": "BRANCH",
"driftAction": "BRANCH_NEW_CLUSTER",
"branchId": "branch-456",
"branchTopic": "I want to plan a trip to Japan",
"confidence": 1.0,
"similarity": 0,
"isNewBranch": true,
"isNewCluster": true
}
POST /api/v1/drift/route
{
"conversationId": "conv-123",
"content": "What's the best time for cherry blossoms?",
"role": "user"
}
Response:
{
"action": "STAY",
"driftAction": "STAY",
"branchId": "branch-456",
"similarity": 0.41,
"isNewBranch": false
}
POST /api/v1/drift/route
{
"conversationId": "conv-123",
"content": "I need to sort out my tax return",
"role": "user"
}
Response:
{
"action": "BRANCH",
"driftAction": "BRANCH_NEW_CLUSTER",
"branchId": "branch-789",
"similarity": 0.05,
"isNewBranch": true,
"isNewCluster": true
}
POST /api/v1/drift/route
{
"conversationId": "conv-123",
"content": "Back to Japan - should I get a JR rail pass?",
"role": "user"
}
Response:
{
"action": "ROUTE",
"driftAction": "STAY",
"branchId": "branch-456",
"similarity": 0.49,
"isNewBranch": false
}
GET /api/v1/context/{branchId}
Response:
{
"branchId": "branch-456",
"branchTopic": "I want to plan a trip to Japan",
"messages": [
{ "role": "user", "content": "I want to plan a trip to Japan" },
{ "role": "user", "content": "What's the best time for cherry blossoms?" },
{ "role": "user", "content": "Back to Japan - should I get a JR rail pass?" }
],
"allFacts": [
{
"branchTopic": "I want to plan a trip to Japan",
"isCurrent": true,
"facts": [
{ "key": "destination", "value": "Japan", "confidence": 1.0 }
]
}
]
}
GET /api/v1/drift/branches/{conversationId}
POST /api/v1/facts/{branchId}/extract
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/drift/route | Route a message to a branch |
| GET | /api/v1/drift/branches/:conversationId | List all branches |
| GET | /api/v1/context/:branchId | Get optimized LLM context |
| POST | /api/v1/facts/:branchId/extract | Extract facts from branch |
| GET | /api/v1/facts/:branchId | Get existing facts |
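For callers that do not use the SDK, the routing endpoint can be wrapped in a few typed helpers. The sketch below is illustrative only: the field names come from the example requests and responses above, while `buildRouteCall`, `parseRouteResponse`, and the `HttpCall` shape are hypothetical names introduced here, not part of driftos-embed.

```typescript
// Request body for POST /api/v1/drift/route, using the fields from the examples above.
interface RouteRequest {
  conversationId: string;
  content: string;
  role: string; // the examples only show "user"
}

// Fields seen in the example responses; optional ones do not appear in every example.
interface RouteResponse {
  action: "STAY" | "BRANCH" | "ROUTE";
  driftAction: string;
  branchId: string;
  similarity: number;
  isNewBranch: boolean;
  isNewCluster?: boolean;
  branchTopic?: string;
  confidence?: number;
}

interface HttpCall {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the URL and request options for the routing endpoint.
function buildRouteCall(baseUrl: string, request: RouteRequest): HttpCall {
  const root = baseUrl.replace(/\/+$/, ""); // drop any trailing slashes
  return {
    url: `${root}/api/v1/drift/route`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(request),
    },
  };
}

// Hypothetical helper: parses a response body and checks the fields callers rely on.
function parseRouteResponse(json: string): RouteResponse {
  const parsed = JSON.parse(json) as RouteResponse;
  if (typeof parsed.branchId !== "string" || typeof parsed.similarity !== "number") {
    throw new Error("Unexpected routing response shape");
  }
  return parsed;
}
```

The `url` and `init` values can be passed directly to `fetch(url, init)`. The base URL is whatever host the server runs on; the SDK example in this README uses http://localhost:3000.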
# Required
DATABASE_URL=postgresql://...
# Drift Thresholds (tuned defaults)
DRIFT_STAY_THRESHOLD=0.38 # Above = same topic
DRIFT_NEW_CLUSTER_THRESHOLD=0.15 # Below = new domain
DRIFT_ROUTE_THRESHOLD=0.42 # Above = route to existing
# Embedding Server
EMBEDDING_SERVER_URL=http://localhost:8100
EMBEDDING_MODEL=paraphrase-MiniLM-L6-v2
# Optional: LLM for fact extraction
GROQ_API_KEY=your-key
LLM_MODEL=llama-3.1-8b-instant
- Embed - Message is embedded using paraphrase-MiniLM-L6-v2
- Compare - Cosine similarity against current branch centroid
- Decide - Based on thresholds: STAY, BRANCH, or ROUTE
- Update - Branch centroid updated with running average
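The Compare step relies on cosine similarity between the message embedding and a branch centroid. As a minimal sketch of that measure (the function name and the zero-vector handling are choices made for this example, not taken from the driftos-embed source):

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns 0 when either vector has zero magnitude (a choice made for this sketch).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Vectors pointing in the same direction score 1 regardless of length, and orthogonal vectors score 0, which is why the thresholds are expressed on a 0 to 1 scale.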
similarity > 0.38 → STAY (same topic)
similarity > 0.42 → ROUTE (if matches another branch)
similarity < 0.15 → BRANCH_NEW_CLUSTER (different domain)
else → BRANCH_SAME_CLUSTER (related subtopic)
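One way to read the rules above is as an ordered decision function. The sketch below assumes that STAY is checked first against the current branch, that ROUTE is checked next against the best-matching other branch, and that the two BRANCH cases are decided last. That ordering, and the names `classifyDrift`, `currentSim`, and `bestOtherSim`, are assumptions made for illustration; the actual `classifyRouteEmbed` operation may differ.

```typescript
type DriftAction = "STAY" | "ROUTE" | "BRANCH_NEW_CLUSTER" | "BRANCH_SAME_CLUSTER";

interface DriftThresholds {
  stay: number;       // DRIFT_STAY_THRESHOLD
  route: number;      // DRIFT_ROUTE_THRESHOLD
  newCluster: number; // DRIFT_NEW_CLUSTER_THRESHOLD
}

// Defaults taken from the configuration section of this README.
const DEFAULT_THRESHOLDS: DriftThresholds = { stay: 0.38, route: 0.42, newCluster: 0.15 };

// currentSim: similarity to the current branch centroid.
// bestOtherSim: highest similarity to any other branch centroid, or null if there is none.
// The evaluation order is an assumption of this sketch.
function classifyDrift(
  currentSim: number,
  bestOtherSim: number | null,
  t: DriftThresholds = DEFAULT_THRESHOLDS
): DriftAction {
  if (currentSim > t.stay) {
    return "STAY";
  }
  if (bestOtherSim !== null && bestOtherSim > t.route) {
    return "ROUTE";
  }
  if (currentSim < t.newCluster) {
    return "BRANCH_NEW_CLUSTER";
  }
  return "BRANCH_SAME_CLUSTER";
}
```

Under these assumptions the example conversation behaves as shown earlier: 0.41 against the current branch stays, 0.05 with no other match opens a new cluster, and a 0.49 match against an earlier branch routes back to it.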
Embedding-based routing requires threshold tuning for your use case. The defaults work well for general conversation, but you may need to adjust.
| Model | Size | Speed | Best For |
|---|---|---|---|
| paraphrase-MiniLM-L6-v2 | 22M | ~30ms | Recommended. Trained for semantic similarity |
| all-MiniLM-L6-v2 | 22M | ~30ms | General purpose, slightly less accurate for paraphrase detection |
| all-mpnet-base-v2 | 110M | ~100ms | Higher accuracy, slower |
Key insight: Paraphrase-trained models outperform general-purpose embeddings for drift detection because they're optimized to recognize when two sentences mean the same thing.
Thresholds control sensitivity. Lower = fewer branches (more messages clear the STAY bar), higher = more branches.
# Conservative (fewer branches, may miss subtle shifts)
DRIFT_STAY_THRESHOLD=0.30
DRIFT_ROUTE_THRESHOLD=0.35
DRIFT_NEW_CLUSTER_THRESHOLD=0.10
# Default (balanced)
DRIFT_STAY_THRESHOLD=0.38
DRIFT_ROUTE_THRESHOLD=0.42
DRIFT_NEW_CLUSTER_THRESHOLD=0.15
# Aggressive (more branches, catches subtle shifts)
DRIFT_STAY_THRESHOLD=0.45
DRIFT_ROUTE_THRESHOLD=0.50
DRIFT_NEW_CLUSTER_THRESHOLD=0.20
Questions and answers naturally have lower similarity (different sentence structures). The system applies a 1.3x boost when:
- Previous message contains "?"
- Current message does not contain "?"
This keeps Q&A pairs together in the same branch.
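The boost rule can be sketched as a small function. This assumes the 1.3x factor multiplies the raw similarity before the thresholds are applied and that the result is capped at 1; both points, and the name `applyQaBoost`, are assumptions for illustration rather than details confirmed by the source.

```typescript
const QA_BOOST = 1.3;

// Boosts similarity when the previous message looks like a question ("?" present)
// and the current message does not. Capping at 1 is an assumption of this sketch.
function applyQaBoost(similarity: number, previousMessage: string, currentMessage: string): number {
  const previousIsQuestion = previousMessage.indexOf("?") !== -1;
  const currentIsQuestion = currentMessage.indexOf("?") !== -1;
  if (previousIsQuestion && !currentIsQuestion) {
    return Math.min(1, similarity * QA_BOOST);
  }
  return similarity;
}
```

For example, an answer that scores 0.30 against its question would be boosted to roughly 0.39, which moves it from below to above the default STAY threshold of 0.38.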
The response includes similarity scores. Use these to tune:
{
"action": "STAY",
"similarity": 0.41, // Just above 0.38 threshold
"reason": "similar_to_current (0.410 > 0.38)"
}
If you're seeing unexpected BRANCHes, check the similarity score and adjust thresholds accordingly.
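When tuning, it can help to collect routing responses and look at the ones that landed close to a threshold, since those are the decisions a small adjustment would flip. A minimal sketch, assuming you log the `action` and `similarity` fields from each response yourself (`LoggedDecision` and `nearThreshold` are hypothetical names, not part of the API):

```typescript
interface LoggedDecision {
  action: string;
  similarity: number;
}

// Returns the decisions whose similarity lies within `margin` of `threshold`.
function nearThreshold(decisions: LoggedDecision[], threshold: number, margin: number): LoggedDecision[] {
  return decisions.filter((d) => Math.abs(d.similarity - threshold) <= margin);
}
```

If many unexpected BRANCH decisions cluster just under DRIFT_STAY_THRESHOLD, lowering that threshold slightly is a reasonable first adjustment.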
Branch centroids update with a running average as messages are added. This means:
- Each new message moves the centroid less as the branch grows, so early messages largely set its position
- Long branches become more "settled" in their topic
- Very long branches may resist ROUTE back from other topics
For high-volume branches, consider periodic centroid recalculation.
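A running average of this kind can be written as an incremental mean, where the n-th message shifts the centroid by 1/n of its distance from it. The sketch below assumes a plain cumulative mean and shows one possible form of recalculation, rebuilding the centroid from the most recent stored embeddings. The names `BranchCentroid`, `updateCentroid`, `recomputeCentroid`, and the `lastN` window are hypothetical, and the project's actual update rule may differ.

```typescript
interface BranchCentroid {
  vector: number[];
  count: number; // number of messages averaged so far
}

// Incremental mean: new = old + (x - old) / (count + 1).
function updateCentroid(centroid: BranchCentroid, embedding: number[]): BranchCentroid {
  if (centroid.vector.length !== embedding.length) {
    throw new Error("Embedding dimension does not match centroid");
  }
  const count = centroid.count + 1;
  const vector = centroid.vector.map((c, i) => c + ((embedding[i] ?? 0) - c) / count);
  return { vector, count };
}

// Rebuilds a centroid from stored embeddings, optionally using only the last `lastN`.
// Restricting to recent messages is an assumption about what recalculation could mean.
function recomputeCentroid(embeddings: number[][], lastN: number = embeddings.length): BranchCentroid {
  const recent = embeddings.slice(Math.max(0, embeddings.length - lastN));
  if (recent.length === 0) {
    throw new Error("Need at least one embedding");
  }
  const first = recent[0] ?? [];
  const sum: number[] = first.map(() => 0);
  for (const e of recent) {
    e.forEach((x, i) => {
      sum[i] = (sum[i] ?? 0) + x;
    });
  }
  return { vector: sum.map((s) => s / recent.length), count: recent.length };
}
```

Because the step size shrinks with `count`, a branch with hundreds of messages barely moves when a new one arrives. Recomputing over a recent window resets `count` and lets the centroid follow where the branch has drifted.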
Use with the official SDK:
npm install @driftos/client
import { createDriftClient } from '@driftos/client';
const client = createDriftClient('http://localhost:3000');
const result = await client.route('conv-123', 'Plan my Japan trip');
const context = await client.getContext(result.branchId);
const prompt = await client.buildPrompt(result.branchId, 'You are a travel assistant');
Or use via MCP with Claude Desktop: driftos-mcp-server
┌─────────────────────────────────────────────────────────┐
│ driftos-embed │
├─────────────────────────────────────────────────────────┤
│ Routes Layer │
│ └── /drift, /context, /facts, /branches │
├─────────────────────────────────────────────────────────┤
│ Services Layer │
│ ├── DriftService (routing orchestration) │
│ ├── ContextService (LLM context assembly) │
│ └── FactsService (LLM-based extraction) │
├─────────────────────────────────────────────────────────┤
│ Operations Layer │
│ ├── embedMessage (local embeddings) │
│ ├── classifyRouteEmbed (similarity + thresholds) │
│ └── executeRoute (branch/message creation) │
├─────────────────────────────────────────────────────────┤
│ Infrastructure │
│ ├── PostgreSQL + Prisma │
│ ├── Embedding Server (Python/FastAPI) │
│ └── Fastify + TypeScript │
└─────────────────────────────────────────────────────────┘
- Routing latency: <200ms
- Embedding generation: ~30ms
- Zero LLM costs for routing decisions
- LLM used only for fact extraction (optional)
- driftos-core - LLM-based routing (higher accuracy, higher latency)
- drift-sdk - TypeScript/JavaScript SDK
- driftos-mcp-server - MCP server for Claude Desktop
MIT
Patent Pending | driftos.dev