Person Schema

Person profiles aggregate beliefs across episodes into a single speaker identity with trust scoring, domain expertise, and semantic embeddings.

Person Profile

| Field | Type | Description |
| --- | --- | --- |
| person_id | string | Slug format (e.g., michael-saylor) |
| name | string | Display name |
| bio | string | Biography from Wikipedia enrichment |
| belief_count | int | Total beliefs attributed to this person |
| episode_count | int | Number of podcast appearances |
| trust_badge | enum | One of: bronze, silver, gold, platinum |
| domain_scores | DomainScore[] | Top domain expertise scores |
| top_quotes | TopQuote[] | Ranked quotes by confidence |

Domain Score

Each person has scores across the domains their beliefs touch.

| Field | Type | Description |
| --- | --- | --- |
| domain | string | Domain name (e.g., bitcoin, economics) |
| score | float | Normalized score (0–100) |
| belief_count | int | Number of beliefs in this domain |

Trust Badges

Trust scores combine five weighted factors into a 0–100 score, mapped to badge tiers.

Badge Tiers

| Badge | Threshold | Description |
| --- | --- | --- |
| Platinum | ≥ 85 | Top-tier speakers with deep, verified track records |
| Gold | ≥ 70 | Established voices with significant content |
| Silver | ≥ 50 | Regular contributors with moderate presence |
| Bronze | ≥ 0 | Default tier for all profiled speakers |
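The tier mapping can be sketched as a lookup that checks thresholds from highest to lowest. The thresholds come from the table above; the function name `badge_for_score` and the range check are assumptions made for illustration, not part of the documented API.

```python
# Thresholds from the Badge Tiers table, ordered from highest to lowest.
BADGE_THRESHOLDS = [
    ("platinum", 85.0),
    ("gold", 70.0),
    ("silver", 50.0),
    ("bronze", 0.0),
]


def badge_for_score(score: float) -> str:
    """Return the badge tier for a 0-100 trust score (hypothetical helper)."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"trust score out of range: {score}")
    for badge, threshold in BADGE_THRESHOLDS:
        # The first threshold the score meets is the highest tier it qualifies for.
        if score >= threshold:
            return badge
    return "bronze"  # not reached for in-range scores; kept as a safe default
```

Because every threshold is inclusive, a score of exactly 70 maps to gold and a score of exactly 85 maps to platinum.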

Scoring Factors

| Factor | Weight | Max Normalization | Description |
| --- | --- | --- | --- |
| Years Active | 15% | 10 years | Longevity in the space |
| Words Spoken | 20% | 1M words (log scale) | Volume of content |
| Appearances | 20% | 100 episodes | Frequency of appearances |
| Audience Reach | 25% | 100M (log scale) | Size of potential audience |
| Verification Rate | 20% | | Accuracy of attributed quotes |

The formula normalizes words spoken and audience reach on a log scale, so speakers who appear on smaller shows are not penalized disproportionately.
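The exact formula is not spelled out here, so the following is only a sketch of how the weighted sum might look. The weights and caps come from the Scoring Factors table. Everything else is an assumption: the function names, the `log(1 + x) / log(1 + cap)` form of the log normalization, the saturation at the cap, and treating verification rate as a fraction between 0 and 1.

```python
import math

# Weights from the Scoring Factors table (they sum to 100%).
WEIGHTS = {
    "years_active": 0.15,
    "words_spoken": 0.20,
    "appearances": 0.20,
    "audience_reach": 0.25,
    "verification_rate": 0.20,
}


def linear_norm(value: float, cap: float) -> float:
    """Scale value linearly into [0, 1], saturating at cap."""
    return max(0.0, min(value / cap, 1.0))


def log_norm(value: float, cap: float) -> float:
    """Scale value into [0, 1] on a log scale (assumed form), saturating at cap."""
    if value <= 0:
        return 0.0
    return min(math.log1p(value) / math.log1p(cap), 1.0)


def trust_score(years_active: float, words_spoken: float, appearances: float,
                audience_reach: float, verification_rate: float) -> float:
    """Combine the five normalized factors into a 0-100 score (hypothetical)."""
    factors = {
        "years_active": linear_norm(years_active, 10),
        "words_spoken": log_norm(words_spoken, 1_000_000),
        "appearances": linear_norm(appearances, 100),
        "audience_reach": log_norm(audience_reach, 100_000_000),
        # Assumed to already be a fraction in [0, 1]; clamped defensively.
        "verification_rate": max(0.0, min(verification_rate, 1.0)),
    }
    return 100.0 * sum(WEIGHTS[name] * value for name, value in factors.items())
```

Under this assumed normalization, an audience of 10,000 scores roughly 0.5 on the reach factor, whereas a linear scale against the 100M cap would give 0.0001. That is the sense in which the log scale is gentler on smaller shows.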

Person Pipeline (6 stages)

| Stage | Input | Output |
| --- | --- | --- |
| wiki_enrich | Person slug | Wikipedia bio, metadata |
| sprite | Person profile | 8-bit NES sprite (PNG) |
| person_matrix | All beliefs for speaker | Aggregated domain scores, top quotes |
| person_embed | Person matrix | 1,536-dim person embedding |
| build_index | All persons | Searchable person index |
| build_viz | All persons + embeddings | 3D scatter plot data |

Storage Layout

data/persons/{person-slug}/
├── profile.json # Wikipedia metadata, bio, image URL
├── sprite.png # 8-bit avatar (generated)
├── matrix.json # Aggregated belief scores
├── beliefs.jsonl # All beliefs (JSONL format)
├── embedding.json # 1,536-dim person embedding
└── similarities.json # Most similar persons by embedding
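As a rough illustration of working with this layout, the sketch below resolves a person's directory and reads two of its files. The helper names (`person_dir`, `load_profile`, `load_beliefs`) and the `root` parameter are hypothetical; only the directory structure and file names come from the layout above.

```python
import json
from pathlib import Path


def person_dir(root: Path, person_slug: str) -> Path:
    """Return data/persons/{person-slug}/ under the given project root."""
    return root / "data" / "persons" / person_slug


def load_profile(root: Path, person_slug: str) -> dict:
    """Read profile.json as a single JSON document."""
    path = person_dir(root, person_slug) / "profile.json"
    return json.loads(path.read_text(encoding="utf-8"))


def load_beliefs(root: Path, person_slug: str) -> list[dict]:
    """Read beliefs.jsonl: one JSON object per line, blank lines skipped."""
    path = person_dir(root, person_slug) / "beliefs.jsonl"
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Reading `beliefs.jsonl` line by line, rather than as one JSON document, matches the JSONL format noted in the layout.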

Qdrant Collection

Person profiles are exported to the persons_v1 Qdrant collection for vector search:

| Field | Purpose |
| --- | --- |
| Vector | 1,536-dim person embedding (cosine similarity) |
| Payload | Full PersonProfileData as JSON |

This enables "find similar speakers" queries by comparing person-level embeddings.
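To make the comparison concrete, here is a brute-force sketch of cosine-similarity ranking in plain Python. It is not Qdrant client code and does not reflect how the collection is queried in practice. The function names, the placeholder speaker slugs, and the 3-dimensional toy vectors (standing in for 1,536-dim embeddings) are all assumptions for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def most_similar(query_id: str, embeddings: dict[str, list[float]],
                 top_k: int = 3) -> list[tuple[str, float]]:
    """Rank every other person by cosine similarity to query_id."""
    query = embeddings[query_id]
    scored = [
        (person_id, cosine_similarity(query, vector))
        for person_id, vector in embeddings.items()
        if person_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: placeholder slugs and tiny vectors, not real embeddings.
toy_embeddings = {
    "speaker-a": [1.0, 0.0, 0.0],
    "speaker-b": [0.9, 0.1, 0.0],
    "speaker-c": [0.0, 1.0, 0.0],
}
ranked = most_similar("speaker-a", toy_embeddings, top_k=2)
```

Here `ranked` lists `speaker-b` first, since its vector points in nearly the same direction as `speaker-a`, while `speaker-c` is orthogonal and scores 0.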

Example

{
  "person_id": "michael-saylor",
  "name": "Michael Saylor",
  "bio": "American entrepreneur and business executive. Co-founder of MicroStrategy...",
  "belief_count": 47,
  "episode_count": 8,
  "trust_badge": "gold",
  "domain_scores": [
    {"domain": "bitcoin", "score": 92.3, "belief_count": 31},
    {"domain": "economics", "score": 67.1, "belief_count": 12},
    {"domain": "technology", "score": 45.8, "belief_count": 4}
  ],
  "top_quotes": [
    {
      "text": "Bitcoin is the apex property of the human race",
      "episode_id": "lex-fridman/michael-saylor-2024",
      "confidence": 0.95,
      "tier": 5
    }
  ]
}
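
A record like the one above could be loaded into typed structures along these lines. This is a sketch, not the project's actual model code: the class names mirror the schema tables, the `TopQuote` fields are inferred from the example alone (the schema tables do not define them), and `parse_profile` is a hypothetical helper. The embedded record is abbreviated to keep the block short.

```python
import json
from dataclasses import dataclass


@dataclass
class DomainScore:
    domain: str
    score: float
    belief_count: int


@dataclass
class TopQuote:
    # Fields inferred from the example record, not from a documented schema.
    text: str
    episode_id: str
    confidence: float
    tier: int


@dataclass
class PersonProfile:
    person_id: str
    name: str
    bio: str
    belief_count: int
    episode_count: int
    trust_badge: str
    domain_scores: list[DomainScore]
    top_quotes: list[TopQuote]


VALID_BADGES = {"bronze", "silver", "gold", "platinum"}


def parse_profile(raw: dict) -> PersonProfile:
    """Build a PersonProfile from a decoded JSON object (hypothetical helper)."""
    if raw["trust_badge"] not in VALID_BADGES:
        raise ValueError(f"unknown trust badge: {raw['trust_badge']!r}")
    # Copy the scalar fields, then convert the nested lists into dataclasses.
    fields = {k: v for k, v in raw.items() if k not in ("domain_scores", "top_quotes")}
    return PersonProfile(
        **fields,
        domain_scores=[DomainScore(**item) for item in raw["domain_scores"]],
        top_quotes=[TopQuote(**item) for item in raw["top_quotes"]],
    )


# Abbreviated copy of the example record above.
EXAMPLE_JSON = """
{
  "person_id": "michael-saylor",
  "name": "Michael Saylor",
  "bio": "American entrepreneur and business executive. Co-founder of MicroStrategy...",
  "belief_count": 47,
  "episode_count": 8,
  "trust_badge": "gold",
  "domain_scores": [{"domain": "bitcoin", "score": 92.3, "belief_count": 31}],
  "top_quotes": [{"text": "Bitcoin is the apex property of the human race",
                  "episode_id": "lex-fridman/michael-saylor-2024",
                  "confidence": 0.95, "tier": 5}]
}
"""
profile = parse_profile(json.loads(EXAMPLE_JSON))
```

Passing the nested objects through `**item` means an unexpected or missing key raises a `TypeError`, which gives a basic shape check without any extra validation code.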