Data Pipelines

Two offline pipelines feed data into Bitcoinology. Both run independently of the frontend.

Pipeline Architecture

be-flow-dtd (Transcription Pipeline)

Runs on bare metal with GPU acceleration, 24/7.

| Stage | Tool | Purpose |
| --- | --- | --- |
| Download | RSS parser | Fetch podcast audio files |
| Transcribe | Whisper large-v3 | Speech-to-text |
| Diarize | Pyannote 3.1 | Who spoke when |
| Speaker ID | ECAPA-TDNN | Match voices to known speakers |
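Before speaker ID can run, the Whisper segments and Pyannote turns have to be merged. A minimal sketch of one common approach, assigning each transcript segment the speaker whose diarization turn overlaps it most (the function and field names here are illustrative, not the pipeline's actual API):

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(transcript_segments, diarization_turns):
    """Label each transcript segment with the best-overlapping diarization turn.

    transcript_segments: [{"start": float, "end": float, "text": str}]
    diarization_turns:   [{"start": float, "end": float, "speaker": str}]
    """
    labeled = []
    for seg in transcript_segments:
        best = max(
            diarization_turns,
            key=lambda t: overlap(seg["start"], seg["end"], t["start"], t["end"]),
            default=None,
        )
        labeled.append({**seg, "speaker": best["speaker"] if best else "UNKNOWN"})
    return labeled
```

Maximum-overlap assignment is a simple heuristic; segments that straddle a speaker change inherit whichever speaker held the floor longer.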

Output: Structured JSON → transcript_chunks and episode_metadata tables in Supabase.
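The exact row shape written to Supabase isn't documented here; a hypothetical example of what one transcript_chunks record might look like (every field name below is an assumption, not the real schema):

```python
import json

# Hypothetical transcript_chunks row; field names are illustrative only.
chunk = {
    "episode_id": "ep-001",   # assumed foreign key into episode_metadata
    "speaker": "SPEAKER_00",  # raw diarization label, resolved downstream
    "start": 12.4,            # seconds into the episode
    "end": 18.9,
    "text": "Bitcoin is a monetary network.",
}
print(json.dumps(chunk))
```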

be-podcast-etl (Belief Extraction)

A 10-stage pipeline that transforms raw transcripts into structured beliefs:

| Stage | What It Does |
| --- | --- |
| 1. Speaker Resolution | Map diarization labels to known speakers |
| 2. Ad Removal | Strip sponsor reads and ads |
| 3. Belief Extraction | Extract atomic beliefs (≤25 words) from quotes |
| 4. Worldview Abstraction | Derive worldview and core axiom |
| 5. Embedding Generation | OpenAI text-embedding-3-large (1536-dim) |
| 6. Ideology Weighting | 10-dimensional positioning vector |
| 7. Headlines | Generate tabloid-style headlines |
| 8. Matrix Scoring | Confidence and tier scoring |
| 9. Clip Extraction | Identify audio clip timestamps |
| 10. Trust Scores | Speaker trust badge calculation |
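The stages run sequentially, each consuming the previous stage's output. A toy sketch of that chaining, with stand-ins for stages 2 and 3 (the stage bodies are placeholders written for this example, not the real implementations):

```python
def remove_ads(chunks):
    """Stage 2 stand-in: drop chunks flagged as sponsor reads."""
    return [c for c in chunks if not c.get("is_ad", False)]

def extract_beliefs(chunks):
    """Stage 3 stand-in: keep quotes of at most 25 words as atomic beliefs."""
    return [
        {"quote": c["text"], "belief": c["text"]}
        for c in chunks
        if len(c["text"].split()) <= 25
    ]

def run_pipeline(chunks, stages):
    """Thread the chunk list through an ordered list of stage functions."""
    for stage in stages:
        chunks = stage(chunks)
    return chunks
```

Each real stage would transform a richer payload (embeddings, scores, timestamps), but the control flow is the same linear fold over stage functions.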

HuggingFace Dataset

Monthly export to ryan-beliefengines/podcast-transcripts:

  • beliefs.parquet — 596 beliefs with embeddings
  • persons.parquet — 24 speaker profiles
  • metadata.json — Dataset stats
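How metadata.json is produced isn't shown; a minimal sketch of computing dataset stats like the counts above from the exported records (the stat and field names are assumptions for illustration):

```python
import json

def dataset_stats(beliefs, persons):
    """Summarize an export for metadata.json; field names are illustrative."""
    return {
        "num_beliefs": len(beliefs),
        "num_persons": len(persons),
        "speakers_with_beliefs": len({b["person_id"] for b in beliefs}),
    }

# Toy records standing in for beliefs.parquet / persons.parquet rows.
beliefs = [{"person_id": "p1"}, {"person_id": "p1"}, {"person_id": "p2"}]
persons = [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}]
print(json.dumps(dataset_stats(beliefs, persons)))
```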

License: CC BY-NC 4.0