
Search Pipeline Architecture

The search pipeline is the core user-facing flow. A query enters, gets classified, routed to retrieval agents, synthesized by an LLM, and streamed back as structured results.

End-to-End Search Flow

Pipeline Stages (Terminal Visualization)

The user sees a 3-lane visualization during search, styled as a retro Nintendo 8-bit pipeline:

| Stage | What Happens | Duration |
|-------|--------------|----------|
| EMBED | OpenAI embeds query → 1536-dim vector (LRU cache, 500 max) | ~200ms |
| SCAN | Supabase search_transcripts RPC with HNSW index | ~500ms |
| SYNTH | GPT-4o-mini synthesizes results with citations | ~2-4s |
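The three stages compose into a simple sequential pipeline. A minimal sketch, with the stage implementations injected as dependencies (the real OpenAI embed call, Supabase RPC, and synthesis stream are out of scope here; `runSearch` and the `PipelineDeps` names are hypothetical):

```typescript
type Stage = "EMBED" | "SCAN" | "SYNTH";

interface PipelineDeps {
  embed: (query: string) => Promise<number[]>;         // → 1536-dim vector
  scan: (vector: number[]) => Promise<string[]>;       // → matching transcript chunks
  synth: (query: string, hits: string[]) => Promise<string>; // → cited answer
}

async function runSearch(
  query: string,
  deps: PipelineDeps,
  onStage?: (s: Stage) => void, // hook for driving the 3-lane visualization
): Promise<string> {
  onStage?.("EMBED");
  const vector = await deps.embed(query);
  onStage?.("SCAN");
  const hits = await deps.scan(vector);
  onStage?.("SYNTH");
  return deps.synth(query, hits);
}
```

Injecting the stages keeps the orchestration testable and lets the UI subscribe to stage transitions without knowing what each stage does.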

Search Modes

Users choose between two search modes before querying:

Direct Search (Default)

  • Model: GPT-4o-mini (fast, cheap) — aliased as haiku in code
  • Cost: 5 sats (planned)
  • Behavior: Returns 3-5 themed insights with source citations
  • Length cap: ~300 words
  • Tools available: searchBeliefs, semanticSearch, getEpisode

Playbook (Deep Analysis)

  • Model: GPT-4o (deeper reasoning) — aliased as sonnet in code
  • Cost: 25 sats (planned)
  • Behavior: Runs 4 analysis lenses in parallel, then synthesizes
  • Streaming: SSE events per lens, then synthesis stream

Playbook Analysis Flow

Each lens can fail independently — partial results are still returned.

Embedding Strategy

| Property | Value |
|----------|-------|
| Model | text-embedding-3-large (OpenAI) |
| Dimensions | 1,536 |
| Storage type | halfvec(1536) in Postgres |
| Index | HNSW with halfvec_cosine_ops (m=16, ef_construction=64) |
| Caching | LRU cache, 500 entries max, in-memory |
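The in-memory LRU can be sketched with a plain Map, relying on the fact that JavaScript Maps iterate in insertion order (the class name here is hypothetical; only the 500-entry cap comes from the table above):

```typescript
class EmbeddingCache {
  private cache = new Map<string, number[]>();
  constructor(private maxEntries = 500) {}

  get(query: string): number[] | undefined {
    const hit = this.cache.get(query);
    if (hit !== undefined) {
      // Refresh recency: move the entry to the back of the Map.
      this.cache.delete(query);
      this.cache.set(query, hit);
    }
    return hit;
  }

  set(query: string, vector: number[]): void {
    this.cache.delete(query);
    this.cache.set(query, vector);
    if (this.cache.size > this.maxEntries) {
      // Evict the least-recently-used entry (first key in iteration order).
      const oldest = this.cache.keys().next().value as string;
      this.cache.delete(oldest);
    }
  }
}
```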

Thread Persistence

Every search automatically creates or appends to a thread:

  • Max 5 active threads per user
  • Thread counts (beliefs, speakers, episodes, themes) are scoped to that thread only
  • The 4 "squares" in the UI show thread-scoped counts, not global totals
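The create-or-append rule with the 5-thread cap can be sketched as a pure function. Everything here (the `Thread` shape, the policy of archiving the oldest active thread when the cap is hit) is an assumption for illustration, not the real schema:

```typescript
interface Thread {
  id: string;
  active: boolean;
  queries: string[];
}

const MAX_ACTIVE_THREADS = 5;

function recordSearch(threads: Thread[], activeId: string | null, query: string): Thread[] {
  const current = threads.find((t) => t.id === activeId);
  if (current) {
    current.queries.push(query); // append to the open thread
    return threads;
  }
  const active = threads.filter((t) => t.active);
  if (active.length >= MAX_ACTIVE_THREADS) {
    // Assumed policy: deactivate the oldest active thread to stay under the cap.
    active[0].active = false;
  }
  threads.push({ id: `t${threads.length + 1}`, active: true, queries: [query] });
  return threads;
}
```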

Error Handling

| Failure | Behavior |
|---------|----------|
| Embedding timeout | Return cached embedding if available, else error |
| Search returns 0 results | Show "insufficient evidence" state with suggestions |
| Model timeout (>60s) | Abort, show timeout error with retry |
| Rate limit (429) | Queue with backoff, show "busy" state |
| Supabase RPC error | Log error, return partial results if possible |
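The rate-limit row can be sketched as a retry wrapper with exponential backoff. The `status === 429` check, retry count, and delays are assumptions; the "busy" UI state and queueing are out of scope:

```typescript
async function withBackoff<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 250,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const isRateLimit = (err as { status?: number }).status === 429;
      // Only 429s are retried; other errors propagate immediately.
      if (!isRateLimit || attempt >= retries) throw err;
      // Exponential backoff: base, 2x base, 4x base, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```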

Performance Budget

| Metric | Target |
|--------|--------|
| Total search latency | < 8 seconds |
| Embedding generation | < 500ms (cached: ~1ms) |
| Vector search | < 1 second |
| Synthesis streaming | First token < 2s |
| Token budget | 12,000 tokens max per request |
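Latency budgets like these are typically enforced with a timeout wrapper. A sketch using Promise.race (in practice the losing request would also be cancelled, e.g. via AbortController, which this omits):

```typescript
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    // Whichever settles first wins; the timer is cleared either way.
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer!);
  }
}
```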