Have you ever tried to find a specific moment on a podcast you listened to a while ago? Unless they have transcripts and you remember exact quotes, it usually doesn't go well.
Using AI, we can automatically transcribe all episodes of a podcast (The Indie Hackers Podcast in this demo) and index them semantically.
This lets us, for example, search for "building in public" and get results about building in the open, even if the episodes never use those exact words.
How does it work?
First, we fetch a podcast's RSS feed and download all episodes. Then we use Whisper to transcribe them, split each transcript into small chunks, and generate an embedding for each chunk with OpenAI's API, which we then insert into a vector database.
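The chunking step might look something like this sketch. The window and overlap sizes are illustrative assumptions, not the actual tool's parameters; overlapping windows help keep a sentence that straddles a boundary searchable from both chunks.

```python
def chunk_transcript(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split a transcript into overlapping word-based chunks.

    max_words and overlap are assumed example values; real chunking
    might instead use tokens or timestamps from Whisper's output.
    """
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + max_words])
        if chunk:
            chunks.append(chunk)
        # Stop once a window reaches the end of the transcript.
        if start + max_words >= len(words):
            break
    return chunks
```

Each resulting chunk would then be sent to the embeddings API and stored alongside its episode metadata.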
When you search for something, we embed your query the same way, find the closest vectors in the database by cosine similarity, and return the matching chunks sorted by relevance.
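The search step reduces to a nearest-neighbor lookup. Here is a minimal pure-Python sketch of that idea, with a toy in-memory index standing in for the vector database (a real one would handle this at scale):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec: list[float], index: list[tuple[str, list[float]]]) -> list[str]:
    """Return chunk texts from the index, sorted by similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    return [text for _, text in sorted(scored, reverse=True)]
```

In the real pipeline, `query_vec` would come from the same embedding model used at indexing time, so semantically similar phrases land close together even with no words in common.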
Can you do this for my podcast?
I'm planning to publish the CLI tool I built to handle the above process soon, so you'll be able to do it yourself. In the meantime, reach out and I'll see what I can do.