# Search Pipeline Architecture
The search pipeline is the core user-facing flow. A query enters, gets classified, routed to retrieval agents, synthesized by an LLM, and streamed back as structured results.
## End-to-End Search Flow
### Pipeline Stages (Terminal Visualization)
The user sees a 3-lane visualization during search, styled as a retro Nintendo 8-bit pipeline:
| Stage | What Happens | Duration |
|---|---|---|
| EMBED | OpenAI embeds query → 1536-dim vector (cached LRU, 500 max) | ~200ms |
| SCAN | Supabase search_transcripts RPC with HNSW index | ~500ms |
| SYNTH | GPT-4o-mini synthesizes results with citations | ~2-4s |
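The three stages above can be sketched as a simple async pipeline. This is an illustrative TypeScript sketch, not the actual implementation: the stage functions are injected stand-ins for the real OpenAI call, the `search_transcripts` Supabase RPC, and the streaming synthesis client.

```typescript
// Sketch of the three-stage pipeline: EMBED -> SCAN -> SYNTH.
// All stage implementations are hypothetical stand-ins; the real pipeline
// calls OpenAI embeddings, a Supabase RPC, and a streaming LLM client.

type Stage = "EMBED" | "SCAN" | "SYNTH";

interface SearchResult {
  insights: string[];
  citations: string[];
}

async function runSearch(
  query: string,
  embed: (q: string) => Promise<number[]>,          // ~200ms, LRU-cached
  scan: (v: number[]) => Promise<string[]>,         // search_transcripts RPC
  synth: (hits: string[]) => Promise<SearchResult>, // GPT-4o-mini synthesis
  onStage?: (s: Stage) => void,                     // drives the 3-lane UI
): Promise<SearchResult> {
  onStage?.("EMBED");
  const vector = await embed(query); // 1536-dim query embedding
  onStage?.("SCAN");
  const hits = await scan(vector);   // HNSW cosine search
  onStage?.("SYNTH");
  return synth(hits);                // cited synthesis
}
```

The `onStage` callback is how a UI could advance the lane visualization as each stage begins.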
## Search Modes
Users choose between two search modes before querying:
### Direct Search (Default)
- Model: GPT-4o-mini (fast, cheap), aliased as `haiku` in code
- Cost: 5 sats (planned)
- Behavior: Returns 3-5 themed insights with source citations
- Max output: ~300 words
- Tools available: `searchBeliefs`, `semanticSearch`, `getEpisode`
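A minimal sketch of how the three Direct Search tools could be registered and dispatched by name. The handler signatures and return shapes here are hypothetical; the document only names the tools.

```typescript
// Hypothetical tool registry for Direct Search. Only the tool names
// (searchBeliefs, semanticSearch, getEpisode) come from the doc; the
// handler bodies and argument shapes are illustrative stubs.

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

const tools: Record<string, ToolHandler> = {
  searchBeliefs: async (args) => ({ beliefs: [], query: args["query"] }),
  semanticSearch: async (args) => ({ matches: [], query: args["query"] }),
  getEpisode: async (args) => ({ id: args["id"], title: null }),
};

async function dispatchTool(
  name: string,
  args: Record<string, unknown>,
): Promise<unknown> {
  const handler = tools[name];
  if (!handler) throw new Error(`unknown tool: ${name}`);
  return handler(args);
}
```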
### Playbook (Deep Analysis)
- Model: GPT-4o (deeper reasoning), aliased as `sonnet` in code
- Cost: 25 sats (planned)
- Behavior: Runs 4 analysis lenses in parallel, then synthesizes
- Streaming: SSE events per lens, then synthesis stream
### Playbook Analysis Flow
Each lens can fail independently — partial results are still returned.
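The fail-independently behavior maps naturally onto `Promise.allSettled`: run all four lenses concurrently, keep the fulfilled ones, and report the rejected ones rather than failing the whole request. A sketch, with illustrative lens names:

```typescript
// Sketch of running the four Playbook lenses in parallel so one failing
// lens does not sink the others. Lens names passed in are illustrative;
// the doc does not enumerate them.

type LensResult = { lens: string; analysis: string };

async function runLenses(
  query: string,
  lenses: Record<string, (q: string) => Promise<string>>,
): Promise<{ results: LensResult[]; failed: string[] }> {
  const names = Object.keys(lenses);
  // allSettled never rejects: every lens resolves to fulfilled or rejected.
  const settled = await Promise.allSettled(names.map((n) => lenses[n](query)));
  const results: LensResult[] = [];
  const failed: string[] = [];
  settled.forEach((s, i) => {
    if (s.status === "fulfilled") {
      results.push({ lens: names[i], analysis: s.value });
    } else {
      failed.push(names[i]); // partial results are still returned
    }
  });
  return { results, failed };
}
```

In the real flow each fulfilled lens would also emit an SSE event before the final synthesis stream begins.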
## Embedding Strategy
| Property | Value |
|---|---|
| Model | text-embedding-3-large (OpenAI) |
| Dimensions | 1,536 (3-large's native 3,072 reduced via the API's `dimensions` parameter) |
| Storage type | halfvec(1536) in Postgres |
| Index | HNSW with halfvec_cosine_ops (m=16, ef_construction=64) |
| Caching | LRU cache, 500 entries max, in-memory |
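The caching row (in-memory LRU, 500 entries max) can be implemented with a plain `Map`, which preserves insertion order. A minimal sketch of that policy, not the production class:

```typescript
// Minimal in-memory LRU sketch matching the documented policy: 500 entries
// max, keyed by query text, evicting the least recently used embedding.
// A Map iterates keys in insertion order, so the first key is the LRU one
// as long as every access re-inserts its entry.

class EmbeddingCache {
  private cache = new Map<string, number[]>();
  constructor(private maxEntries = 500) {}

  get(query: string): number[] | undefined {
    const hit = this.cache.get(query);
    if (hit !== undefined) {
      // Re-insert to mark as most recently used.
      this.cache.delete(query);
      this.cache.set(query, hit);
    }
    return hit;
  }

  set(query: string, vector: number[]): void {
    if (this.cache.has(query)) this.cache.delete(query);
    this.cache.set(query, vector);
    if (this.cache.size > this.maxEntries) {
      // Evict the least recently used entry (first key in iteration order).
      const oldest = this.cache.keys().next().value as string;
      this.cache.delete(oldest);
    }
  }

  get size(): number {
    return this.cache.size;
  }
}
```

A cache hit is what turns the ~200ms EMBED stage into the ~1ms case noted in the performance budget.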
## Thread Persistence
Every search automatically creates or appends to a thread:
- Max 5 active threads per user
- Thread counts (beliefs, speakers, episodes, themes) are scoped to that thread only
- The 4 "squares" in the UI show thread-scoped counts, not global totals
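The create-or-append rule with a 5-thread cap can be sketched as a pure function. The `Thread` shape, the id scheme, and the eviction choice (dropping the oldest thread once the cap is exceeded, rather than rejecting the new one) are all assumptions; the document states only the cap.

```typescript
// Sketch of thread persistence: a search appends to an existing active
// thread or creates a new one, capped at 5 active threads per user.
// Thread shape, id scheme, and oldest-first eviction are assumptions.

interface Thread {
  id: string;
  queries: string[];
}

const MAX_ACTIVE_THREADS = 5;

function recordSearch(
  threads: Thread[],
  threadId: string | null,
  query: string,
): Thread[] {
  if (threadId) {
    // Append to the existing thread.
    return threads.map((t) =>
      t.id === threadId ? { ...t, queries: [...t.queries, query] } : t,
    );
  }
  // Naive id for illustration only.
  const next = [...threads, { id: `t${threads.length + 1}`, queries: [query] }];
  // Enforce the cap by dropping the oldest thread (assumed policy).
  return next.length > MAX_ACTIVE_THREADS ? next.slice(1) : next;
}
```

Thread-scoped counts for the four UI squares would then be derived from one thread's contents, never from global totals.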
## Error Handling
| Failure | Behavior |
|---|---|
| Embedding timeout | Return cached embedding if available, else error |
| Search returns 0 results | Show "insufficient evidence" state with suggestions |
| LLM timeout (>60s) | Abort, show timeout error with retry |
| Rate limit (429) | Queue with backoff, show "busy" state |
| Supabase RPC error | Log error, return partial results if possible |
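The 429 row (queue with backoff, show "busy") can be sketched as a retry wrapper. The retry count and delay schedule below are illustrative, not the production values:

```typescript
// Sketch of rate-limit handling: retry a 429'd request with exponential
// backoff while surfacing a "busy" state to the UI. Retry count and
// delays are illustrative assumptions, not the production schedule.

async function withBackoff<T>(
  request: () => Promise<T>,
  opts = { retries: 3, baseDelayMs: 100 },
  onBusy?: () => void,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      const isRateLimit = (err as { status?: number }).status === 429;
      // Only 429s are retried; other errors propagate immediately.
      if (!isRateLimit || attempt >= opts.retries) throw err;
      onBusy?.(); // surface the "busy" state
      const delay = opts.baseDelayMs * 2 ** attempt; // 100, 200, 400ms...
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```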
## Performance Budget
| Metric | Target |
|---|---|
| Total search latency | < 8 seconds |
| Embedding generation | < 500ms (cached: ~1ms) |
| Vector search | < 1 second |
| Synthesis streaming | First token < 2s |
| Token budget | 12,000 tokens max per request |