Data Model Overview
All persistent data lives in Supabase (Postgres 17) with pgvector for embeddings. Row-Level Security is enabled on every table.
Entity Relationship Diagram
Table Summary
| Table | Rows | Purpose |
|---|---|---|
transcript_chunks | 11,546 | Chunked transcripts with halfvec(1536) embeddings + HNSW index |
episode_metadata | 345 | Podcast episode info (title, speakers, audio_url, published_at) |
persons | 24 | Speaker profiles (name, bio, wiki, trust badge, domain scores) |
users | — | App users (linked to Supabase Auth) |
threads | — | Search sessions (max 5 active per user) |
messages | — | Chat messages within threads |
thread_beliefs | — | Saved beliefs per thread (unique per chunk_id + thread_id) |
community_cards | — | Published community content |
community_card_votes | — | Vote tracking |
watchlist_items | — | User watchlist |
alerts / notifications | — | Alert system |
beta_invites | — | Beta access codes |
speaker_claims | — | User→speaker verification claims |
RPC Functions
| Function | Purpose |
|---|---|
search_transcripts(embedding, count, speaker?, episode?) | Vector similarity search with HNSW |
get_chunks_by_speaker(speaker, limit) | Transcript chunks for a speaker |
get_episode_by_id(id) | Episode metadata lookup |
get_speaker_aggregations() | Speaker nodes for graph visualization |
get_speaker_similarity(min_sim) | Speaker-to-speaker similarity edges |
get_speaker_coappearances(min_shared) | Co-occurrence links |
get_timeline_data(speaker) | Speaker mentions over time |
get_speaker_topics(speaker) | Topics a speaker discusses |
get_related_speakers(speaker) | Similar speakers |
Vector Index
CREATE INDEX ON transcript_chunks
USING hnsw (embedding halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64);
- Type: HNSW (Hierarchical Navigable Small World)
- Operator:
halfvec_cosine_ops(cosine similarity on half-precision vectors) - Parameters: m=16 (max connections per layer), ef_construction=64 (build quality)
- Dimensions: 1,536 (
halfvec(1536))
Migration History
23 migrations track the full schema evolution from initial functions through community features:
| Range | Feature |
|---|---|
| 000-006 | Core: functions, users, threads, messages, beta invites, speaker claims |
| 007-011 | Transcripts: tables, search functions, foreign keys, RPC hardening |
| 012-013 | Community: cards and votes |
| 014-018 | Threads: thread model, beliefs, watchlist, alerts/notifications |
| 019-023 | Polish: shared threads, speaker aggregations, deduplication, public access |