Semantic Search for Podcasts

Search through podcasts by moments and topics, not exact quotes.

Searching The Indie Hackers Podcast

How does it work?

First, we fetch the podcast's RSS feed and download every episode. Then we transcribe the audio with Whisper, split each transcript into small chunks, compute an embedding for each chunk using OpenAI's API, and insert those embeddings into a vector database.
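To make the chunking step concrete, here is a minimal sketch of one way to split a transcript into small overlapping chunks before embedding them. The function name `chunk_words`, the word-based window, and the `size` and `overlap` defaults are assumptions for illustration, not the settings used by this project.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of at most `size` words.

    Consecutive chunks share `overlap` words, so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    (Hypothetical helper; the real tool may chunk differently.)
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the transcript
    return chunks
```

Each returned chunk would then be sent to the embedding API and stored alongside its episode and position. The overlap trades a little extra storage for fewer moments lost at chunk boundaries.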

When you search for something, we compute the embedding of your query, find the closest vectors in the database by cosine similarity, and return the results sorted by relevance.
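As a rough illustration of the lookup, the sketch below ranks stored chunks by cosine similarity to a query vector. It uses a plain in-memory list in place of a vector database, and the names `cosine_similarity`, `search`, and `top_k` are assumptions for this example. A real vector database would do the same ranking with an index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=3):
    """Return the top_k (score, text) pairs, most similar first.

    `index` is a list of (text, embedding) pairs standing in for
    the vector database (hypothetical structure).
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

In practice `query_vec` would be the embedding of the search text, produced by the same model that embedded the transcript chunks, since vectors from different models are not comparable.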

Can you do this for my podcast?

I plan to publish the CLI tool I built for this process soon, so you'll be able to do it yourself. In the meantime, reach out and I'll see what I can do.