# Search Pipeline Architecture
The search pipeline is the core user-facing flow. A query enters, gets classified, routed to retrieval agents, synthesized by an LLM, and streamed back as structured results.
## End-to-End Search Flow
### Pipeline Stages (Terminal Visualization)
The user sees a 3-lane visualization during search, styled as a retro Nintendo 8-bit pipeline:
| Stage | What Happens | Duration |
|---|---|---|
| EMBED | OpenAI embeds query → 1536-dim vector (cached LRU, 500 max) | ~200ms |
| SCAN | Supabase search_transcripts RPC with HNSW index | ~500ms |
| SYNTH | GPT-4o-mini synthesizes results with citations | ~2-4s |
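The three stages above can be sketched as a simple async pipeline. This is an illustrative TypeScript sketch, not the actual implementation: the stage functions are injected stand-ins for the real OpenAI call, the `search_transcripts` Supabase RPC, and the streaming synthesis client.

```typescript
// Sketch of the three-stage pipeline: EMBED -> SCAN -> SYNTH.
// All stage implementations are hypothetical stand-ins; the real pipeline
// calls OpenAI embeddings, a Supabase RPC, and a streaming LLM client.

type Stage = "EMBED" | "SCAN" | "SYNTH";

interface SearchResult {
  insights: string[];
  citations: string[];
}

async function runSearch(
  query: string,
  embed: (q: string) => Promise<number[]>,          // ~200ms, LRU-cached
  scan: (v: number[]) => Promise<string[]>,         // search_transcripts RPC
  synth: (hits: string[]) => Promise<SearchResult>, // GPT-4o-mini synthesis
  onStage?: (s: Stage) => void,                     // drives the 3-lane UI
): Promise<SearchResult> {
  onStage?.("EMBED");
  const vector = await embed(query); // 1536-dim query embedding
  onStage?.("SCAN");
  const hits = await scan(vector);   // HNSW cosine search
  onStage?.("SYNTH");
  return synth(hits);                // cited synthesis
}
```

The `onStage` callback is how a UI could advance the lane visualization as each stage begins.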
## Search Modes
Users choose between two search modes before querying:
### Direct Search (Default)
- Model: GPT-4o-mini (fast, cheap), aliased as `haiku` in code
- Cost: 5 sats (planned)
- Behavior: Returns 3-5 themed insights with source citations
- Max output: ~300 words
- Tools available: `searchBeliefs`, `semanticSearch`, `getEpisode`
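A minimal sketch of how the three Direct Search tools could be registered and dispatched by name. The handler signatures and return shapes here are hypothetical; the document only names the tools.

```typescript
// Hypothetical tool registry for Direct Search. Only the tool names
// (searchBeliefs, semanticSearch, getEpisode) come from the doc; the
// handler bodies and argument shapes are illustrative stubs.

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

const tools: Record<string, ToolHandler> = {
  searchBeliefs: async (args) => ({ beliefs: [], query: args["query"] }),
  semanticSearch: async (args) => ({ matches: [], query: args["query"] }),
  getEpisode: async (args) => ({ id: args["id"], title: null }),
};

async function dispatchTool(
  name: string,
  args: Record<string, unknown>,
): Promise<unknown> {
  const handler = tools[name];
  if (!handler) throw new Error(`unknown tool: ${name}`);
  return handler(args);
}
```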
### Playbook (Deep Analysis)
- Model: GPT-4o (deeper reasoning), aliased as `sonnet` in code
- Cost: 25 sats (planned)
- Behavior: Runs 4 analysis lenses in parallel, then synthesizes
- Streaming: SSE events per lens, then synthesis stream
### Playbook Analysis Flow
Each lens can fail independently — partial results are still returned.
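The fail-independently behavior maps naturally onto `Promise.allSettled`: run all four lenses concurrently, keep the fulfilled ones, and report the rejected ones rather than failing the whole request. A sketch, with illustrative lens names:

```typescript
// Sketch of running the four Playbook lenses in parallel so one failing
// lens does not sink the others. Lens names passed in are illustrative;
// the doc does not enumerate them.

type LensResult = { lens: string; analysis: string };

async function runLenses(
  query: string,
  lenses: Record<string, (q: string) => Promise<string>>,
): Promise<{ results: LensResult[]; failed: string[] }> {
  const names = Object.keys(lenses);
  // allSettled never rejects: every lens resolves to fulfilled or rejected.
  const settled = await Promise.allSettled(names.map((n) => lenses[n](query)));
  const results: LensResult[] = [];
  const failed: string[] = [];
  settled.forEach((s, i) => {
    if (s.status === "fulfilled") {
      results.push({ lens: names[i], analysis: s.value });
    } else {
      failed.push(names[i]); // partial results are still returned
    }
  });
  return { results, failed };
}
```

In the real flow each fulfilled lens would also emit an SSE event before the final synthesis stream begins.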
## Embedding Strategy
| Property | Value |
|---|---|
| Model | text-embedding-3-large (OpenAI) |
| Dimensions | 1,536 (3-large's native 3,072 reduced via the API's `dimensions` parameter) |
| Storage type | halfvec(1536) in Postgres |
| Index | HNSW with halfvec_cosine_ops (m=16, ef_construction=64) |
| Caching | LRU cache, 500 entries max, in-memory |
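The caching row (in-memory LRU, 500 entries max) can be implemented with a plain `Map`, which preserves insertion order. A minimal sketch of that policy, not the production class:

```typescript
// Minimal in-memory LRU sketch matching the documented policy: 500 entries
// max, keyed by query text, evicting the least recently used embedding.
// A Map iterates keys in insertion order, so the first key is the LRU one
// as long as every access re-inserts its entry.

class EmbeddingCache {
  private cache = new Map<string, number[]>();
  constructor(private maxEntries = 500) {}

  get(query: string): number[] | undefined {
    const hit = this.cache.get(query);
    if (hit !== undefined) {
      // Re-insert to mark as most recently used.
      this.cache.delete(query);
      this.cache.set(query, hit);
    }
    return hit;
  }

  set(query: string, vector: number[]): void {
    if (this.cache.has(query)) this.cache.delete(query);
    this.cache.set(query, vector);
    if (this.cache.size > this.maxEntries) {
      // Evict the least recently used entry (first key in iteration order).
      const oldest = this.cache.keys().next().value as string;
      this.cache.delete(oldest);
    }
  }

  get size(): number {
    return this.cache.size;
  }
}
```

A cache hit is what turns the ~200ms EMBED stage into the ~1ms case noted in the performance budget.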
## Thread Persistence
Every search automatically creates or appends to a thread:
- Max 5 active threads per user
- Thread counts (beliefs, speakers, episodes, themes) are scoped to that thread only
- The 4 "squares" in the UI show thread-scoped counts, not global totals
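The create-or-append rule with a 5-thread cap can be sketched as a pure function. The `Thread` shape, the id scheme, and the eviction choice (dropping the oldest thread once the cap is exceeded, rather than rejecting the new one) are all assumptions; the document states only the cap.

```typescript
// Sketch of thread persistence: a search appends to an existing active
// thread or creates a new one, capped at 5 active threads per user.
// Thread shape, id scheme, and oldest-first eviction are assumptions.

interface Thread {
  id: string;
  queries: string[];
}

const MAX_ACTIVE_THREADS = 5;

function recordSearch(
  threads: Thread[],
  threadId: string | null,
  query: string,
): Thread[] {
  if (threadId) {
    // Append to the existing thread.
    return threads.map((t) =>
      t.id === threadId ? { ...t, queries: [...t.queries, query] } : t,
    );
  }
  // Naive id for illustration only.
  const next = [...threads, { id: `t${threads.length + 1}`, queries: [query] }];
  // Enforce the cap by dropping the oldest thread (assumed policy).
  return next.length > MAX_ACTIVE_THREADS ? next.slice(1) : next;
}
```

Thread-scoped counts for the four UI squares would then be derived from one thread's contents, never from global totals.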
## Error Handling
| Failure | Behavior |
|---|---|
| Embedding timeout | Return cached embedding if available, else error |
| Search returns 0 results | Show "insufficient evidence" state with suggestions |
| LLM timeout (>60s) | Abort, show timeout error with retry |
| Rate limit (429) | Queue with backoff, show "busy" state |
| Supabase RPC error | Log error, return partial results if possible |
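The 429 row (queue with backoff, show "busy") can be sketched as a retry wrapper. The retry count and delay schedule below are illustrative, not the production values:

```typescript
// Sketch of rate-limit handling: retry a 429'd request with exponential
// backoff while surfacing a "busy" state to the UI. Retry count and
// delays are illustrative assumptions, not the production schedule.

async function withBackoff<T>(
  request: () => Promise<T>,
  opts = { retries: 3, baseDelayMs: 100 },
  onBusy?: () => void,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      const isRateLimit = (err as { status?: number }).status === 429;
      // Only 429s are retried; other errors propagate immediately.
      if (!isRateLimit || attempt >= opts.retries) throw err;
      onBusy?.(); // surface the "busy" state
      const delay = opts.baseDelayMs * 2 ** attempt; // 100, 200, 400ms...
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```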
## Performance Budget
| Metric | Target |
|---|---|
| Total search latency | < 8 seconds |
| Embedding generation | < 500ms (cached: ~1ms) |
| Vector search | < 1 second |
| Synthesis streaming | First token < 2s |
| Token budget | 12,000 tokens max per request |