Person Schema
Person profiles aggregate beliefs across episodes into a single speaker identity with trust scoring, domain expertise, and semantic embeddings.
Person Profile
| Field | Type | Description |
|---|---|---|
| person_id | string | Slug format (e.g., michael-saylor) |
| name | string | Display name |
| bio | string | Biography from Wikipedia enrichment |
| belief_count | int | Total beliefs attributed to this person |
| episode_count | int | Number of podcast appearances |
| trust_badge | enum | One of bronze, silver, gold, platinum |
| domain_scores | DomainScore[] | Top domain expertise scores |
| top_quotes | TopQuote[] | Top quotes, ranked by confidence |
Domain Score
Each person has scores across the domains their beliefs touch.
| Field | Type | Description |
|---|---|---|
| domain | string | Domain name (e.g., bitcoin, economics) |
| score | float | Normalized score (0–100) |
| belief_count | int | Number of beliefs in this domain |
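The Person Profile and Domain Score tables can be read as typed records. The following is a minimal sketch using Python dataclasses, not the project's actual model code. The class names mirror the type names in the tables. The `TopQuote` fields are inferred from the example record in this document rather than from a documented schema, and `top_domain` is a hypothetical helper added for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional

# Badge values as listed for the trust_badge field.
TrustBadge = Literal["bronze", "silver", "gold", "platinum"]


@dataclass
class DomainScore:
    domain: str        # e.g. "bitcoin"
    score: float       # normalized, 0 to 100
    belief_count: int  # beliefs in this domain


@dataclass
class TopQuote:
    # Assumption: fields inferred from the example record, not a documented schema.
    text: str
    episode_id: str
    confidence: float
    tier: int


@dataclass
class PersonProfile:
    person_id: str     # slug, e.g. "michael-saylor"
    name: str
    bio: str
    belief_count: int
    episode_count: int
    trust_badge: TrustBadge
    domain_scores: List[DomainScore] = field(default_factory=list)
    top_quotes: List[TopQuote] = field(default_factory=list)

    def top_domain(self) -> Optional[DomainScore]:
        """Hypothetical helper: highest-scoring domain, or None if there are none."""
        return max(self.domain_scores, key=lambda d: d.score, default=None)
```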
Trust Badges
A trust score combines five weighted factors into a 0–100 value, which is then mapped to a badge tier.
Badge Tiers
| Badge | Threshold | Description |
|---|---|---|
| Platinum | ≥ 85 | Top-tier speakers with deep, verified track records |
| Gold | ≥ 70 | Established voices with significant content |
| Silver | ≥ 50 | Regular contributors with moderate presence |
| Bronze | ≥ 0 | Default tier for all profiled speakers |
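The tier table amounts to a highest-threshold-first lookup. A minimal sketch follows, assuming the score has already been computed on the 0–100 scale. The function name `badge_for_score` is hypothetical, and the lowercase badge strings follow the `trust_badge` enum values.

```python
# (threshold, badge) pairs from the tier table, highest threshold first.
BADGE_TIERS = [
    (85, "platinum"),
    (70, "gold"),
    (50, "silver"),
    (0, "bronze"),
]


def badge_for_score(score: float) -> str:
    """Map a 0-100 trust score to the first tier whose threshold it meets."""
    if not 0 <= score <= 100:
        raise ValueError(f"trust score must be within 0-100, got {score!r}")
    for threshold, badge in BADGE_TIERS:
        if score >= threshold:
            return badge
    return "bronze"  # not reached for valid input; kept as a safe default
```

Because the thresholds are inclusive lower bounds, a score of exactly 70 maps to gold and a score just below 70 maps to silver.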
Scoring Factors
| Factor | Weight | Max Normalization | Description |
|---|---|---|---|
| Years Active | 15% | 10 years | Longevity in the space |
| Words Spoken | 20% | 1M words (log scale) | Volume of content |
| Appearances | 20% | 100 episodes | Frequency of appearances |
| Audience Reach | 25% | 100M (log scale) | Size of potential audience |
| Verification Rate | 20% | — | Accuracy of attributed quotes |
Words spoken and audience reach are normalized on a log scale rather than linearly, so speakers on smaller shows are not penalized as heavily for lower raw totals.
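The exact formula is not spelled out here, so the following is only a sketch of how the weights and caps in the table could combine. The assumptions are these: linear factors are clipped at their cap; log-scale factors use `log10(1 + value) / log10(1 + cap)`; the verification rate is a fraction between 0 and 1; and all function and key names are hypothetical.

```python
import math

# Weights from the Scoring Factors table (they sum to 1.0).
WEIGHTS = {
    "years_active": 0.15,
    "words_spoken": 0.20,
    "appearances": 0.20,
    "audience_reach": 0.25,
    "verification_rate": 0.20,
}


def linear_norm(value: float, cap: float) -> float:
    """Scale value to 0..1 linearly, clipping at the cap."""
    return min(max(value, 0.0) / cap, 1.0)


def log_norm(value: float, cap: float) -> float:
    """Assumed log-scale form: log10(1 + value) / log10(1 + cap), clipped to 1."""
    return min(math.log10(1.0 + max(value, 0.0)) / math.log10(1.0 + cap), 1.0)


def trust_score(years_active, words_spoken, appearances,
                audience_reach, verification_rate) -> float:
    """Weighted sum of normalized factors, scaled to 0-100."""
    factors = {
        "years_active": linear_norm(years_active, 10),
        "words_spoken": log_norm(words_spoken, 1_000_000),
        "appearances": linear_norm(appearances, 100),
        "audience_reach": log_norm(audience_reach, 100_000_000),
        # Assumption: verification rate arrives as a fraction in 0..1.
        "verification_rate": min(max(verification_rate, 0.0), 1.0),
    }
    return 100.0 * sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
```

Under this assumed log form, a speaker with 1,000 words spoken already earns roughly half of the available words-spoken credit, whereas linear scaling against the 1M cap would give about 0.1%.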
Person Pipeline (6 stages)
| Stage | Input | Output |
|---|---|---|
| wiki_enrich | Person slug | Wikipedia bio, metadata |
| sprite | Person profile | 8-bit NES sprite (PNG) |
| person_matrix | All beliefs for speaker | Aggregated domain scores, top quotes |
| person_embed | Person matrix | 1,536-dim person embedding |
| build_index | All persons | Searchable person index |
| build_viz | All persons + embeddings | 3D scatter plot data |
Storage Layout
data/persons/{person-slug}/
├── profile.json # Wikipedia metadata, bio, image URL
├── sprite.png # 8-bit avatar (generated)
├── matrix.json # Aggregated belief scores
├── beliefs.jsonl # All beliefs (JSONL format)
├── embedding.json # 1,536-dim person embedding
└── similarities.json # Most similar persons by embedding
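Code that reads or writes these files needs to resolve paths from a person slug. A small sketch of that lookup follows. The helper names `person_dir` and `person_paths` are hypothetical, and the default root assumes the `data/persons` prefix shown in the layout.

```python
from pathlib import PurePosixPath

# File names taken from the storage layout.
PERSON_FILES = (
    "profile.json",
    "sprite.png",
    "matrix.json",
    "beliefs.jsonl",
    "embedding.json",
    "similarities.json",
)


def person_dir(slug: str, root: str = "data/persons") -> PurePosixPath:
    """Directory holding all artifacts for one person slug."""
    return PurePosixPath(root) / slug


def person_paths(slug: str, root: str = "data/persons") -> dict:
    """Map each artifact file name to its full path for the given slug."""
    base = person_dir(slug, root)
    return {name: base / name for name in PERSON_FILES}


paths = person_paths("michael-saylor")
print(paths["matrix.json"])  # data/persons/michael-saylor/matrix.json
```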
Qdrant Collection
Person profiles are exported to the persons_v1 Qdrant collection for vector search:
| Field | Purpose |
|---|---|
| Vector | 1,536-dim person embedding (cosine similarity) |
| Payload | Full PersonProfileData as JSON |
This enables "find similar speakers" queries by comparing person-level embeddings.
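To make the comparison concrete, the sketch below ranks speakers by cosine similarity in plain Python. It illustrates the metric only and is not the Qdrant client API. The function names, the speaker IDs, and the 3-dimensional toy vectors (standing in for 1,536-dim embeddings) are all hypothetical.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_id: str, embeddings: dict, top_k: int = 3):
    """Return up to top_k (person_id, similarity) pairs, most similar first."""
    query = embeddings[query_id]
    scored = [
        (person_id, cosine_similarity(query, vector))
        for person_id, vector in embeddings.items()
        if person_id != query_id  # skip the query speaker itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dim vectors for illustration only.
toy_embeddings = {
    "speaker-a": [1.0, 0.0, 0.0],
    "speaker-b": [0.9, 0.1, 0.0],
    "speaker-c": [0.0, 1.0, 0.0],
}
print(most_similar("speaker-a", toy_embeddings, top_k=2))
```

In the toy data, speaker-b ranks above speaker-c for a speaker-a query because its vector points in nearly the same direction.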
Example
{
  "person_id": "michael-saylor",
  "name": "Michael Saylor",
  "bio": "American entrepreneur and business executive. Co-founder of MicroStrategy...",
  "belief_count": 47,
  "episode_count": 8,
  "trust_badge": "gold",
  "domain_scores": [
    {"domain": "bitcoin", "score": 92.3, "belief_count": 31},
    {"domain": "economics", "score": 67.1, "belief_count": 12},
    {"domain": "technology", "score": 45.8, "belief_count": 4}
  ],
  "top_quotes": [
    {
      "text": "Bitcoin is the apex property of the human race",
      "episode_id": "lex-fridman/michael-saylor-2024",
      "confidence": 0.95,
      "tier": 5
    }
  ]
}
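A record like this can be checked against the constraints stated in the schema tables. The sketch below is one possible set of checks, not the project's validation logic. `validate_profile` is a hypothetical name, and the inline record is a shortened copy of the example.

```python
import json

ALLOWED_BADGES = {"bronze", "silver", "gold", "platinum"}


def validate_profile(profile: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if profile.get("trust_badge") not in ALLOWED_BADGES:
        problems.append(f"unknown trust_badge: {profile.get('trust_badge')!r}")
    for key in ("belief_count", "episode_count"):
        value = profile.get(key)
        if not isinstance(value, int) or value < 0:
            problems.append(f"{key} must be a non-negative int")
    for entry in profile.get("domain_scores", []):
        # Domain scores are documented as normalized to 0-100.
        if not 0 <= entry["score"] <= 100:
            problems.append(f"score out of range for domain {entry['domain']!r}")
    # top_quotes is documented as ranked by confidence (highest first).
    confidences = [q["confidence"] for q in profile.get("top_quotes", [])]
    if confidences != sorted(confidences, reverse=True):
        problems.append("top_quotes are not ranked by confidence")
    return problems


record = json.loads("""
{
  "person_id": "michael-saylor",
  "name": "Michael Saylor",
  "belief_count": 47,
  "episode_count": 8,
  "trust_badge": "gold",
  "domain_scores": [
    {"domain": "bitcoin", "score": 92.3, "belief_count": 31},
    {"domain": "economics", "score": 67.1, "belief_count": 12}
  ],
  "top_quotes": [
    {"text": "Bitcoin is the apex property of the human race", "confidence": 0.95}
  ]
}
""")
print(validate_profile(record))  # prints []
```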