
Data Model Overview

All persistent data lives in Supabase (Postgres 17) with pgvector for embeddings. Row-Level Security is enabled on every table.
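As a minimal sketch of what RLS means for a user-owned table, the statements below enable RLS on `threads` and limit each user to their own rows. The `user_id` column and the policy name are assumptions, not taken from the project's migrations. `auth.uid()` is Supabase's helper that returns the ID of the authenticated user.

```sql
-- Hypothetical: assumes threads.user_id holds the owner's Supabase Auth user ID.
alter table public.threads enable row level security;

-- Authenticated users can read and write only the threads they own.
-- Wrapping auth.uid() in a subselect lets Postgres evaluate it once per
-- statement instead of once per row.
create policy "threads_owner_all"
  on public.threads
  for all
  to authenticated
  using (user_id = (select auth.uid()))
  with check (user_id = (select auth.uid()));
```

Once RLS is enabled, roles subject to it see no rows until a policy grants access. A policy like this one grants access only to the row's owner.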

[Figure: Entity Relationship Diagram]

Table Summary

| Table | Rows | Purpose |
|---|---|---|
| transcript_chunks | 11,546 | Chunked transcripts with halfvec(1536) embeddings and an HNSW index |
| episode_metadata | 345 | Podcast episode info (title, speakers, audio_url, published_at) |
| persons | 24 | Speaker profiles (name, bio, wiki, trust badge, domain scores) |
| users | | App users (linked to Supabase Auth) |
| threads | | Search sessions (max 5 active per user) |
| messages | | Chat messages within threads |
| thread_beliefs | | Saved beliefs per thread (unique per chunk_id + thread_id) |
| community_cards | | Published community content |
| community_card_votes | | Votes on community cards |
| watchlist_items | | User watchlist |
| alerts / notifications | | Alert system |
| beta_invites | | Beta access codes |
| speaker_claims | | User→speaker verification claims |
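Two of the row-level rules above, one saved belief per (chunk_id, thread_id) and at most 5 active threads per user, could be enforced in the database itself. The sketch below assumes hypothetical `threads.user_id`, `threads.id`, and `threads.is_active` columns and invents the constraint, function, and trigger names. It is not the project's actual migration.

```sql
-- thread_beliefs: at most one saved belief per (chunk_id, thread_id) pair.
alter table public.thread_beliefs
  add constraint thread_beliefs_chunk_thread_key unique (chunk_id, thread_id);

-- threads: reject a new or reactivated thread once the user already has 5 active.
-- Hypothetical columns: threads.id, threads.user_id, threads.is_active.
create or replace function public.enforce_active_thread_limit()
returns trigger
language plpgsql
as $$
begin
  if new.is_active then
    -- Serialize concurrent writes for the same user so that two sessions
    -- cannot both pass the count check below.
    perform pg_advisory_xact_lock(hashtext(new.user_id::text));

    -- Count the user's other active threads (excluding this row on UPDATE).
    if (
      select count(*)
      from public.threads
      where user_id = new.user_id
        and is_active
        and id is distinct from new.id
    ) >= 5 then
      raise exception 'user % already has 5 active threads', new.user_id;
    end if;
  end if;
  return new;
end;
$$;

create trigger threads_active_limit
  before insert or update of is_active on public.threads
  for each row execute function public.enforce_active_thread_limit();
```

A unique constraint is the natural fit for the belief rule. A count-based limit needs a trigger (or application logic), and the advisory lock closes the race in which two concurrent inserts each see only 4 active threads.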

RPC Functions

| Function | Purpose |
|---|---|
| search_transcripts(embedding, count, speaker?, episode?) | Vector similarity search using the HNSW index |
| get_chunks_by_speaker(speaker, limit) | Transcript chunks for a speaker |
| get_episode_by_id(id) | Episode metadata lookup |
| get_speaker_aggregations() | Speaker nodes for graph visualization |
| get_speaker_similarity(min_sim) | Speaker-to-speaker similarity edges |
| get_speaker_coappearances(min_shared) | Co-occurrence links |
| get_timeline_data(speaker) | Speaker mentions over time |
| get_speaker_topics(speaker) | Topics a speaker discusses |
| get_related_speakers(speaker) | Similar speakers |
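As a rough illustration of how an RPC like `search_transcripts` can be built on pgvector, here is a hypothetical SQL function. Only the `halfvec(1536)` embedding type comes from this page. The parameter names and the `id`, `content`, `speaker`, and `episode_id` columns (and their types) are assumptions, and the real function may differ.

```sql
-- Hypothetical sketch: parameter names and the columns content, speaker,
-- episode_id (and their types) are assumptions, not the real schema.
create or replace function public.search_transcripts(
  query_embedding halfvec(1536),
  match_count int default 10,
  filter_speaker text default null,
  filter_episode text default null
)
returns table (
  id bigint,
  content text,
  speaker text,
  episode_id text,
  similarity float8
)
language sql
stable
as $$
  select
    c.id,
    c.content,
    c.speaker,
    c.episode_id,
    -- <=> is cosine distance; 1 - distance is cosine similarity.
    1 - (c.embedding <=> query_embedding) as similarity
  from public.transcript_chunks as c
  -- NULL filters are treated as "match everything".
  where (filter_speaker is null or c.speaker = filter_speaker)
    and (filter_episode is null or c.episode_id = filter_episode)
  -- ORDER BY distance + LIMIT is the shape the HNSW index can serve.
  order by c.embedding <=> query_embedding
  limit match_count;
$$;
```

With HNSW, the `where` filters are applied to candidates returned by the index scan. A selective speaker or episode filter can therefore return fewer than `match_count` rows. pgvector 0.8.0 added iterative index scans (the `hnsw.iterative_scan` setting) to mitigate this.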

Vector Index

```sql
CREATE INDEX ON transcript_chunks
USING hnsw (embedding halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
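  • Type: HNSW (Hierarchical Navigable Small World)
  • Operator class: halfvec_cosine_ops (cosine distance on half-precision vectors; queries must order by the <=> operator to use the index)
  • Parameters: m = 16 (maximum connections per layer) and ef_construction = 64 (size of the candidate list used while building the graph; higher values improve recall at the cost of build time). Both are the pgvector defaults.
  • Dimensions: 1,536 (halfvec(1536))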
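At query time, pgvector reads the `hnsw.ef_search` setting (default 40), which controls how many candidates the index scan keeps. The psql sketch below raises it for one transaction and runs a nearest-neighbour query that the index can serve. The psql variable `query_embedding` and the `id` column are assumptions for illustration.

```sql
-- Run in psql after: \set query_embedding '[...1536 comma-separated floats...]'
-- (query_embedding is a hypothetical variable name; id is an assumed column.)
begin;

-- Larger ef_search improves recall at the cost of latency (pgvector default: 40).
-- SET LOCAL limits the change to this transaction.
set local hnsw.ef_search = 100;

-- ORDER BY <=> ... LIMIT matches halfvec_cosine_ops, so the planner can use
-- the HNSW index. <=> returns cosine distance; 1 - distance is similarity.
select id,
       1 - (embedding <=> :'query_embedding'::halfvec(1536)) as similarity
from transcript_chunks
order by embedding <=> :'query_embedding'::halfvec(1536)
limit 10;

commit;
```

Build-time parameters (`m`, `ef_construction`) are fixed when the index is created. `ef_search` can be tuned per query without rebuilding.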

Migration History

23 migrations track the full schema evolution from initial functions through community features:

| Range | Feature |
|---|---|
| 000-006 | Core: functions, users, threads, messages, beta invites, speaker claims |
| 007-011 | Transcripts: tables, search functions, foreign keys, RPC hardening |
| 012-013 | Community: cards and votes |
| 014-018 | Threads: thread model, beliefs, watchlist, alerts/notifications |
| 019-023 | Polish: shared threads, speaker aggregations, deduplication, public access |