Person Schema

Person profiles aggregate beliefs across episodes into a single speaker identity with trust scoring, domain expertise, and semantic embeddings.

Person Profile

Field	Type	Description
`person_id`	string	Slug format (e.g., `michael-saylor`)
`name`	string	Display name
`bio`	string	Biography from Wikipedia enrichment
`belief_count`	int	Total beliefs attributed to this person
`episode_count`	int	Number of podcast appearances
`trust_badge`	enum	`bronze` | `silver` | `gold` | `platinum`
`domain_scores`	DomainScore[]	Top domain expertise scores
`top_quotes`	TopQuote[]	Top quotes, ranked by confidence

Domain Score

Each person has scores across the domains their beliefs touch.

Field	Type	Description
`domain`	string	Domain name (e.g., `bitcoin`, `economics`)
`score`	float	Normalized score (0–100)
`belief_count`	int	Number of beliefs in this domain

Trust Badges

The trust score combines five weighted factors into a single 0–100 value, which is then mapped to a badge tier.

Badge Tiers

Badge	Threshold	Description
Platinum	≥ 85	Top-tier speakers with deep, verified track records
Gold	≥ 70	Established voices with significant content
Silver	≥ 50	Regular contributors with moderate presence
Bronze	≥ 0	Default tier for all profiled speakers
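
The tier lookup can be sketched as a walk down the thresholds from highest to lowest. This is a minimal illustration, not the project's actual code: the names `BADGE_THRESHOLDS` and `badge_for_score` are hypothetical, and only the threshold values come from the table above.

```python
# Thresholds copied from the Badge Tiers table, highest tier first.
BADGE_THRESHOLDS = [
    ("platinum", 85.0),
    ("gold", 70.0),
    ("silver", 50.0),
    ("bronze", 0.0),
]


def badge_for_score(score: float) -> str:
    """Return the badge tier for a 0-100 trust score (hypothetical helper)."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"trust score out of range: {score}")
    # The first threshold the score meets is the highest tier it qualifies for.
    for badge, threshold in BADGE_THRESHOLDS:
        if score >= threshold:
            return badge
    return "bronze"  # not reachable for valid scores; kept as a safe default
```

Because bronze has a threshold of 0, every valid score receives a badge.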

Scoring Factors

Factor	Weight	Max Normalization	Description
Years Active	15%	10 years	Longevity in the space
Words Spoken	20%	1M words (log scale)	Volume of content
Appearances	20%	100 episodes	Frequency of appearances
Audience Reach	25%	100M (log scale)	Size of potential audience
Verification Rate	20%	—	Accuracy of attributed quotes

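Words spoken and audience reach are normalized on a log scale, so each tenfold increase contributes the same amount to the factor. This avoids penalizing speakers on smaller shows, who would score near zero under a linear comparison with the largest ones.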
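
One way the weighted combination might look is sketched below. The weights and caps are taken from the Scoring Factors table; everything else is an assumption made for illustration: the function names, the exact normalization formulas (`min(x / cap, 1)` for linear factors, `log10(1 + x) / log10(1 + cap)` for log-scale factors), and the treatment of verification rate as a fraction between 0 and 1. The real formula may differ.

```python
import math

# Weights from the Scoring Factors table (they sum to 1.0).
WEIGHTS = {
    "years_active": 0.15,
    "words_spoken": 0.20,
    "appearances": 0.20,
    "audience_reach": 0.25,
    "verification_rate": 0.20,
}


def linear_norm(value: float, cap: float) -> float:
    """Assumed linear normalization: scale to [0, 1] and saturate at the cap."""
    return min(max(value, 0.0) / cap, 1.0)


def log_norm(value: float, cap: float) -> float:
    """Assumed log-scale normalization: 0 maps to 0 and the cap maps to 1."""
    return min(math.log10(1.0 + max(value, 0.0)) / math.log10(1.0 + cap), 1.0)


def trust_score(years_active, words_spoken, appearances,
                audience_reach, verification_rate) -> float:
    """Combine the five factors into a 0-100 score (illustrative sketch)."""
    factors = {
        "years_active": linear_norm(years_active, 10),
        "words_spoken": log_norm(words_spoken, 1_000_000),
        "appearances": linear_norm(appearances, 100),
        "audience_reach": log_norm(audience_reach, 100_000_000),
        # Assumption: verification rate is already a fraction in [0, 1].
        "verification_rate": min(max(verification_rate, 0.0), 1.0),
    }
    return 100.0 * sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
```

Under these assumptions, a speaker with 1,000 words spoken already earns about half of the words-spoken factor, which is the effect the log scale is meant to produce.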

Person Pipeline (6 stages)

Stage	Input	Output
`wiki_enrich`	Person slug	Wikipedia bio, metadata
`sprite`	Person profile	8-bit NES sprite (PNG)
`person_matrix`	All beliefs for speaker	Aggregated domain scores, top quotes
`person_embed`	Person matrix	1,536-dim person embedding
`build_index`	All persons	Searchable person index
`build_viz`	All persons + embeddings	3D scatter plot data

Storage Layout

data/persons/{person-slug}/
├── profile.json          # Wikipedia metadata, bio, image URL
├── sprite.png            # 8-bit avatar (generated)
├── matrix.json           # Aggregated belief scores
├── beliefs.jsonl         # All beliefs (JSONL format)
├── embedding.json        # 1,536-dim person embedding
└── similarities.json     # Most similar persons by embedding
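
Reading from this layout might look like the sketch below. The helper names `person_dir` and `read_beliefs` are hypothetical, and the default root of `data` is inferred from the path shown above. Only the directory and file names come from the layout.

```python
import json
from pathlib import Path


def person_dir(person_slug: str, data_root: str = "data") -> Path:
    """Return data/persons/{person-slug}/ for a speaker (hypothetical helper)."""
    return Path(data_root) / "persons" / person_slug


def read_beliefs(person_slug: str, data_root: str = "data") -> list:
    """Load beliefs.jsonl, which holds one JSON object per line."""
    path = person_dir(person_slug, data_root) / "beliefs.jsonl"
    beliefs = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                beliefs.append(json.loads(line))
    return beliefs
```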

Qdrant Collection

Person profiles are exported to the `persons_v1` Qdrant collection for vector search:

Field	Purpose
Vector	1,536-dim person embedding (cosine similarity)
Payload	Full `PersonProfileData` as JSON

This enables "find similar speakers" queries by comparing person-level embeddings.
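
To show what a cosine-similarity comparison between person embeddings computes, here is a small pure-Python sketch. It is not the Qdrant client API, since Qdrant performs this ranking server-side. The function names, the speaker slugs, and the 3-dimensional toy vectors are hypothetical stand-ins for the 1,536-dim embeddings.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_id: str, embeddings: dict, top_k: int = 3) -> list:
    """Rank every other person by similarity to the query person."""
    query = embeddings[query_id]
    scored = [
        (person_id, cosine_similarity(query, vector))
        for person_id, vector in embeddings.items()
        if person_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: hypothetical slugs with 3-dim vectors for readability.
toy_embeddings = {
    "speaker-a": [1.0, 0.0, 0.0],
    "speaker-b": [0.9, 0.1, 0.0],
    "speaker-c": [0.0, 1.0, 0.0],
}
ranked = most_similar("speaker-a", toy_embeddings, top_k=2)
```

In this toy data, `speaker-b` ranks first because its vector points in nearly the same direction as the vector for `speaker-a`.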

Example

{
  "person_id": "michael-saylor",
  "name": "Michael Saylor",
  "bio": "American entrepreneur and business executive. Co-founder of MicroStrategy...",
  "belief_count": 47,
  "episode_count": 8,
  "trust_badge": "gold",
  "domain_scores": [
    {"domain": "bitcoin", "score": 92.3, "belief_count": 31},
    {"domain": "economics", "score": 67.1, "belief_count": 12},
    {"domain": "technology", "score": 45.8, "belief_count": 4}
  ],
  "top_quotes": [
    {
      "text": "Bitcoin is the apex property of the human race",
      "episode_id": "lex-fridman/michael-saylor-2024",
      "confidence": 0.95,
      "tier": 5
    }
  ]
}
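
A record shaped like the example above could be loaded into typed structures as sketched here. The class and function names are hypothetical. The `TopQuote` fields are inferred from the example record only, since this page does not document them in a table.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class DomainScore:
    domain: str
    score: float        # normalized, 0-100
    belief_count: int


@dataclass
class TopQuote:
    # Fields inferred from the example record, not from a documented table.
    text: str
    episode_id: str
    confidence: float
    tier: int


@dataclass
class PersonProfile:
    person_id: str
    name: str
    bio: str
    belief_count: int
    episode_count: int
    trust_badge: str    # "bronze" | "silver" | "gold" | "platinum"
    domain_scores: List[DomainScore]
    top_quotes: List[TopQuote]


def parse_person_profile(raw: str) -> PersonProfile:
    """Parse a JSON string into a PersonProfile (illustrative helper)."""
    data = json.loads(raw)
    return PersonProfile(
        person_id=data["person_id"],
        name=data["name"],
        bio=data["bio"],
        belief_count=data["belief_count"],
        episode_count=data["episode_count"],
        trust_badge=data["trust_badge"],
        domain_scores=[DomainScore(**item) for item in data["domain_scores"]],
        top_quotes=[TopQuote(**item) for item in data["top_quotes"]],
    )
```

Passing unknown keys to `DomainScore(**item)` raises a `TypeError`, which surfaces schema drift early instead of silently dropping fields.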
