driftos-embed

Lightweight semantic conversation routing engine. Embedding-based drift detection with sub-200ms latency, zero LLM API costs for routing decisions.

The Problem

AI applications dump entire conversation history into every LLM call:

Unfocused context → worse responses
Token waste → higher costs
No structure → can't query "what did we decide about X?"

The Solution

driftos-embed uses local embeddings to detect topic shifts and route messages:

STAY - Same topic, continue in current branch
BRANCH - Topic drift detected, create new branch
ROUTE - Return to a previous topic

Result: Focused context windows. 20 relevant messages instead of 1000.

Why Embeddings?

Approach	Latency	Cost	Accuracy
LLM-based routing	500-2000ms	$0.001-0.01/call	High
Embedding-based	<200ms	$0	Good

driftos-embed uses paraphrase-MiniLM-L6-v2 for semantic similarity. Fast enough for real-time, accurate enough for production.

Quick Start

# Clone and install
git clone https://github.com/DriftOS/driftos-embed
cd driftos-embed
npm install

# Setup database
cp .env.example .env
npm run db:push

# Start embedding server (Python sidecar), then return to the project root
cd embedding-server && pip install -r requirements.txt
python server.py &
cd ..

# Run
npm run dev

API

Route a Message

POST /api/v1/drift/route
{
  "conversationId": "conv-123",
  "content": "I want to plan a trip to Japan",
  "role": "user"
}

Response:

{
  "action": "BRANCH",
  "driftAction": "BRANCH_NEW_CLUSTER",
  "branchId": "branch-456",
  "branchTopic": "I want to plan a trip to Japan",
  "confidence": 1.0,
  "similarity": 0,
  "isNewBranch": true,
  "isNewCluster": true
}

Subsequent Messages

POST /api/v1/drift/route
{
  "conversationId": "conv-123",
  "content": "What's the best time for cherry blossoms?",
  "role": "user"
}

Response:

{
  "action": "STAY",
  "driftAction": "STAY",
  "branchId": "branch-456",
  "similarity": 0.41,
  "isNewBranch": false
}

Topic Shift Detection

POST /api/v1/drift/route
{
  "conversationId": "conv-123",
  "content": "I need to sort out my tax return",
  "role": "user"
}

Response:

{
  "action": "BRANCH",
  "driftAction": "BRANCH_NEW_CLUSTER",
  "branchId": "branch-789",
  "similarity": 0.05,
  "isNewBranch": true,
  "isNewCluster": true
}

Route Back to Previous Topic

POST /api/v1/drift/route
{
  "conversationId": "conv-123",
  "content": "Back to Japan - should I get a JR rail pass?",
  "role": "user"
}

Response:

{
  "action": "ROUTE",
  "driftAction": "STAY",
  "branchId": "branch-456",
  "similarity": 0.49,
  "isNewBranch": false
}
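
The four calls above share one request shape and differ only in the `action` that comes back. A minimal sketch of typed helpers for this endpoint follows, assuming the JSON fields are exactly those shown above. `buildRouteRequest`, `parseRouteResponse`, and the type names are hypothetical and not part of driftos-embed, and the `"assistant"` role is an assumption, since the examples only show `"user"`.

```typescript
// Hypothetical types mirroring the JSON shown above; not exported by driftos-embed.
type RouteAction = "STAY" | "BRANCH" | "ROUTE";

interface RouteRequest {
  conversationId: string;
  content: string;
  role: "user" | "assistant"; // "assistant" is an assumption
}

interface RouteResponse {
  action: RouteAction;
  driftAction: string;
  branchId: string;
  similarity: number;
  isNewBranch: boolean;
  // Only present in some of the responses shown above.
  branchTopic?: string;
  confidence?: number;
  isNewCluster?: boolean;
}

// Build the URL and fetch options for POST /api/v1/drift/route.
function buildRouteRequest(baseUrl: string, body: RouteRequest) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/drift/route`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Parse a response body and reject actions this sketch does not know about.
function parseRouteResponse(json: string): RouteResponse {
  const data = JSON.parse(json) as RouteResponse;
  if (["STAY", "BRANCH", "ROUTE"].indexOf(data.action) === -1) {
    throw new Error(`unexpected action: ${String(data.action)}`);
  }
  return data;
}
```

Pass `url` and `init` to `fetch`, then hand the response text to `parseRouteResponse`. Keeping request construction separate from the network call makes the helper easy to unit test.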

Get Context for LLM

GET /api/v1/context/{branchId}

Response:

{
  "branchId": "branch-456",
  "branchTopic": "I want to plan a trip to Japan",
  "messages": [
    { "role": "user", "content": "I want to plan a trip to Japan" },
    { "role": "user", "content": "What's the best time for cherry blossoms?" },
    { "role": "user", "content": "Back to Japan - should I get a JR rail pass?" }
  ],
  "allFacts": [
    {
      "branchTopic": "I want to plan a trip to Japan",
      "isCurrent": true,
      "facts": [
        { "key": "destination", "value": "Japan", "confidence": 1.0 }
      ]
    }
  ]
}
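
To show how this payload might feed an LLM call, here is a minimal sketch that flattens it into a chat-style message array. The types mirror the JSON above. `toChatMessages` is a hypothetical helper and is not the SDK's `buildPrompt`, whose output format is not documented here.

```typescript
// Hypothetical types mirroring the context response shown above.
interface ContextMessage { role: string; content: string; }
interface Fact { key: string; value: string; confidence: number; }
interface BranchFacts { branchTopic: string; isCurrent: boolean; facts: Fact[]; }
interface BranchContext {
  branchId: string;
  branchTopic: string;
  messages: ContextMessage[];
  allFacts: BranchFacts[];
}

// Turn a context response into a chat-style message array:
// one system message (instructions plus known facts), then the branch messages.
function toChatMessages(ctx: BranchContext, systemPrompt: string): ContextMessage[] {
  const factLines: string[] = [];
  for (const group of ctx.allFacts) {
    for (const fact of group.facts) {
      factLines.push(`- ${fact.key}: ${fact.value} (topic: ${group.branchTopic})`);
    }
  }
  const system = factLines.length > 0
    ? `${systemPrompt}\n\nKnown facts:\n${factLines.join("\n")}`
    : systemPrompt;
  const systemMessage: ContextMessage = { role: "system", content: system };
  return [systemMessage].concat(ctx.messages);
}
```

Only the messages of the requested branch are included, which is what keeps the context window small.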

List Branches

GET /api/v1/drift/branches/{conversationId}

Extract Facts

POST /api/v1/facts/{branchId}/extract

Endpoints

Method	Endpoint	Description
POST	`/api/v1/drift/route`	Route a message to a branch
GET	`/api/v1/drift/branches/:conversationId`	List all branches
GET	`/api/v1/context/:branchId`	Get optimized LLM context
POST	`/api/v1/facts/:branchId/extract`	Extract facts from branch
GET	`/api/v1/facts/:branchId`	Get existing facts

Configuration

# Required
DATABASE_URL=postgresql://...

# Drift Thresholds (tuned defaults)
DRIFT_STAY_THRESHOLD=0.38        # Above = same topic
DRIFT_NEW_CLUSTER_THRESHOLD=0.15 # Below = new domain
DRIFT_ROUTE_THRESHOLD=0.42       # Above = route to existing

# Embedding Server
EMBEDDING_SERVER_URL=http://localhost:8100
EMBEDDING_MODEL=paraphrase-MiniLM-L6-v2

# Optional: LLM for fact extraction
GROQ_API_KEY=your-key
LLM_MODEL=llama-3.1-8b-instant
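
As an illustration of how the threshold variables might be read and validated at startup, here is a minimal sketch. It is not the project's actual config loader: `loadThresholds` and `readThreshold` are hypothetical names, and the range and ordering checks are assumptions based on cosine similarity lying in [-1, 1].

```typescript
interface DriftThresholds { stay: number; newCluster: number; route: number; }

// Defaults copied from the configuration block above.
const DEFAULT_THRESHOLDS: DriftThresholds = { stay: 0.38, newCluster: 0.15, route: 0.42 };

// Read one threshold, falling back to the default when the variable is unset or empty.
function readThreshold(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!isFinite(value) || value < -1 || value > 1) {
    throw new Error(`${name} must be a number between -1 and 1, got "${raw}"`);
  }
  return value;
}

function loadThresholds(env: Record<string, string | undefined>): DriftThresholds {
  const t: DriftThresholds = {
    stay: readThreshold(env, "DRIFT_STAY_THRESHOLD", DEFAULT_THRESHOLDS.stay),
    newCluster: readThreshold(env, "DRIFT_NEW_CLUSTER_THRESHOLD", DEFAULT_THRESHOLDS.newCluster),
    route: readThreshold(env, "DRIFT_ROUTE_THRESHOLD", DEFAULT_THRESHOLDS.route),
  };
  // Assumed sanity check: the new-cluster cutoff must sit below the stay cutoff,
  // otherwise no score could ever land in the "related subtopic" band.
  if (t.newCluster >= t.stay) {
    throw new Error("DRIFT_NEW_CLUSTER_THRESHOLD must be lower than DRIFT_STAY_THRESHOLD");
  }
  return t;
}
```

In Node this would be called as `loadThresholds(process.env)`. Failing fast on a malformed value is cheaper than debugging unexpected branching later.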

How It Works

Embed - Message is embedded using paraphrase-MiniLM-L6-v2
Compare - Cosine similarity against current branch centroid
Decide - Based on thresholds: STAY, BRANCH, or ROUTE
Update - Branch centroid updated with running average
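
The Compare step relies on cosine similarity between the message embedding and a branch centroid. Here is a minimal, dependency-free sketch of that calculation. `cosineSimilarity` is an illustrative name, and returning 0 for a zero-length vector is an assumption rather than documented behaviour.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns 0 if either vector has zero magnitude (assumed convention).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The result lies between -1 and 1 and depends only on direction, not magnitude, so a centroid averaged over many messages can be compared directly with a single message embedding.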

Threshold Logic

similarity to current branch > 0.38  → STAY (same topic)
similarity to another branch > 0.42  → ROUTE (return to that branch)
similarity < 0.15                    → BRANCH_NEW_CLUSTER (different domain)
else                                 → BRANCH_SAME_CLUSTER (related subtopic)
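
The rules above can be read as an ordered decision function. The sketch below is one possible reading, not the project's `classifyRouteEmbed` implementation. It assumes the STAY check runs first against the current branch, the ROUTE check runs next against the best-matching other branch, and the new-cluster check uses the similarity to the current branch.

```typescript
type DriftAction = "STAY" | "ROUTE" | "BRANCH_NEW_CLUSTER" | "BRANCH_SAME_CLUSTER";

interface Thresholds { stay: number; route: number; newCluster: number; }

// Default values from the configuration section.
const DEFAULTS: Thresholds = { stay: 0.38, route: 0.42, newCluster: 0.15 };

// currentSim:   similarity to the current branch centroid.
// bestOtherSim: highest similarity to any other branch centroid, or null if none exist.
function classify(
  currentSim: number,
  bestOtherSim: number | null,
  t: Thresholds = DEFAULTS
): DriftAction {
  if (currentSim > t.stay) return "STAY";
  if (bestOtherSim !== null && bestOtherSim > t.route) return "ROUTE";
  // Assumption: the cluster decision is based on the current-branch similarity.
  if (currentSim < t.newCluster) return "BRANCH_NEW_CLUSTER";
  return "BRANCH_SAME_CLUSTER";
}
```

With the defaults, `classify(0.41, null)` gives `"STAY"` and `classify(0.05, null)` gives `"BRANCH_NEW_CLUSTER"`, matching the cherry blossom and tax return examples. The route threshold is higher than the stay threshold, so jumping back to an old branch needs stronger evidence than staying put.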

Tuning Guide

Embedding-based routing requires threshold tuning for your use case. The defaults work well for general conversation, but you may need to adjust.

Choosing an Embedding Model

Model	Size	Speed	Best For
`paraphrase-MiniLM-L6-v2`	22M	~30ms	Recommended. Trained for semantic similarity
`all-MiniLM-L6-v2`	22M	~30ms	General purpose, slightly less accurate for paraphrase detection
`all-mpnet-base-v2`	110M	~100ms	Higher accuracy, slower

Key insight: Paraphrase-trained models outperform general-purpose embeddings for drift detection because they're optimized to recognize when two sentences mean the same thing.

Threshold Tuning

Thresholds control sensitivity. Lower thresholds = fewer branches (more messages stay on the current topic); higher thresholds = more branches.

# Conservative (fewer branches, may miss subtle shifts)
DRIFT_STAY_THRESHOLD=0.30
DRIFT_ROUTE_THRESHOLD=0.35
DRIFT_NEW_CLUSTER_THRESHOLD=0.10

# Default (balanced)
DRIFT_STAY_THRESHOLD=0.38
DRIFT_ROUTE_THRESHOLD=0.42
DRIFT_NEW_CLUSTER_THRESHOLD=0.15

# Aggressive (more branches, catches subtle shifts)
DRIFT_STAY_THRESHOLD=0.45
DRIFT_ROUTE_THRESHOLD=0.50
DRIFT_NEW_CLUSTER_THRESHOLD=0.20

Q&A Pair Handling

Questions and answers naturally have lower similarity because their sentence structures differ. The system applies a 1.3x boost to the similarity score when both conditions hold:

Previous message contains `?`
Current message does not contain `?`

This keeps Q&A pairs together in the same branch.
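
A minimal sketch of the boost rule described above follows. `applyQaBoost` is a hypothetical name, and capping the result at 1 is an assumption, since the source does not say how boosted scores above 1 are handled.

```typescript
const QA_BOOST = 1.3; // factor stated above

// Boost the similarity when the previous message looks like a question
// and the current message does not (a likely question/answer pair).
function applyQaBoost(similarity: number, previous: string, current: string): number {
  const prevIsQuestion = previous.indexOf("?") !== -1;
  const currIsQuestion = current.indexOf("?") !== -1;
  if (prevIsQuestion && !currIsQuestion) {
    // Assumption: cap at 1 so the result stays a valid cosine similarity.
    return Math.min(1, similarity * QA_BOOST);
  }
  return similarity;
}
```

For example, a raw score of 0.30 would fall below the default stay threshold of 0.38, but boosted it becomes roughly 0.39 and the answer stays in the same branch as its question.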

Debugging Similarity Scores

The response includes similarity scores. Use these to tune:

{
  "action": "STAY",
  "similarity": 0.41,  // Just above 0.38 threshold
  "reason": "similar_to_current (0.410 > 0.38)"
}

If you're seeing unexpected BRANCHes, check the similarity score and adjust thresholds accordingly.
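
One way to pick a threshold is to log similarity scores from a test conversation and replay them against candidate values. This sketch assumes you have collected the scores yourself. `stayRate` is a hypothetical helper and the sample scores are made up for illustration.

```typescript
// Fraction of logged similarity scores that would STAY under a given stay threshold.
function stayRate(similarities: number[], stayThreshold: number): number {
  if (similarities.length === 0) return 0;
  let stays = 0;
  for (const s of similarities) {
    if (s > stayThreshold) stays++;
  }
  return stays / similarities.length;
}

// Hypothetical similarities to the current branch, collected from route responses.
const loggedScores = [0.41, 0.36, 0.33, 0.47, 0.12];

// Candidate stay thresholds to compare (any values can be used here).
for (const threshold of [0.30, 0.38, 0.45]) {
  const percent = (stayRate(loggedScores, threshold) * 100).toFixed(0);
  console.log(`stay threshold ${threshold}: ${percent}% of messages STAY`);
}
```

If the share of messages that stay is much lower than the number of genuine topic changes in the conversation would suggest, the stay threshold is probably too high for your data.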

Centroid Drift

Branch centroids update with a running average as messages are added. This means:

Each new message moves the centroid less as the branch grows, so early messages largely set its position
Long branches become more "settled" in their topic
Very long branches may resist ROUTE back from other topics

For high-volume branches, consider periodic centroid recalculation.
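
To make the effect concrete, here is a minimal sketch of a running-average centroid plus one possible form of recalculation. It assumes the running average is a plain cumulative mean and that recalculation means averaging the most recent stored embeddings. Neither is confirmed by the source, and `updateCentroid` and `recomputeCentroid` are hypothetical names.

```typescript
interface Centroid { vector: number[]; count: number; }

// Incremental mean: fold one new embedding into the centroid.
// The step size is 1/n, so the 100th message moves the centroid by only 1% of the gap.
function updateCentroid(c: Centroid, embedding: number[]): Centroid {
  if (c.vector.length !== embedding.length) {
    throw new Error("embedding dimension does not match centroid");
  }
  const n = c.count + 1;
  const vector = c.vector.map((v, i) => v + (embedding[i] - v) / n);
  return { vector, count: n };
}

// Recalculate from stored embeddings, optionally using only the last `window` of them
// so the centroid reflects what the branch is about now (assumed strategy).
function recomputeCentroid(embeddings: number[][], window?: number): Centroid {
  const recent = window === undefined ? embeddings : embeddings.slice(-window);
  if (recent.length === 0) throw new Error("need at least one embedding");
  const sum = recent[0].map(() => 0);
  for (const e of recent) {
    for (let i = 0; i < sum.length; i++) sum[i] += e[i];
  }
  return { vector: sum.map((v) => v / recent.length), count: recent.length };
}
```

Start from `{ vector: [0, ...], count: 0 }` so the first update sets the centroid to the first embedding. A windowed recalculation trades stability for responsiveness, so the window size would need tuning alongside the thresholds.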

SDK & MCP

Use with the official SDK:

npm install @driftos/client

import { createDriftClient } from '@driftos/client';

const client = createDriftClient('http://localhost:3000');

const result = await client.route('conv-123', 'Plan my Japan trip');
const context = await client.getContext(result.branchId);
const prompt = await client.buildPrompt(result.branchId, 'You are a travel assistant');

Or use via MCP with Claude Desktop: driftos-mcp-server

Architecture

┌─────────────────────────────────────────────────────────┐
│                     driftos-embed                        │
├─────────────────────────────────────────────────────────┤
│  Routes Layer                                            │
│  └── /drift, /context, /facts, /branches                │
├─────────────────────────────────────────────────────────┤
│  Services Layer                                          │
│  ├── DriftService (routing orchestration)               │
│  ├── ContextService (LLM context assembly)              │
│  └── FactsService (LLM-based extraction)                │
├─────────────────────────────────────────────────────────┤
│  Operations Layer                                        │
│  ├── embedMessage (local embeddings)                    │
│  ├── classifyRouteEmbed (similarity + thresholds)       │
│  └── executeRoute (branch/message creation)             │
├─────────────────────────────────────────────────────────┤
│  Infrastructure                                          │
│  ├── PostgreSQL + Prisma                                │
│  ├── Embedding Server (Python/FastAPI)                  │
│  └── Fastify + TypeScript                               │
└─────────────────────────────────────────────────────────┘

Performance

Routing latency: <200ms
Embedding generation: ~30ms
Zero LLM costs for routing decisions
LLM used only for fact extraction (optional)

Related Projects

driftos-core - LLM-based routing (higher accuracy, higher latency)
drift-sdk - TypeScript/JavaScript SDK
driftos-mcp-server - MCP server for Claude Desktop

License

MIT

Patent Pending | driftos.dev
