Podcast Transcripts with WavoAI, Cursor, Hugo

During its 84-episode run, the Metamuse podcast was much loved for its discussions of local-first software, deep work, creativity, and authenticity. Hosted by Muse founders Adam Wiggins and Mark McGranaghan, the podcast featured guests including Obsidian CEO Stephan Ango, Roam’s founder Conor White-Sullivan, MindNode’s founder Markus Müller-Simhofer, and too many more to count. These were wonderful conversations with the leaders of much-loved deep-work products and researchers at the forefront of human-computer interaction.

Since going solo with Muse in late 2023, I’ve wanted to get the transcripts of all of these episodes online, but I knew that writing transcripts manually was an insurmountable task. Whisper had recently been released, and I was hopeful that AI transcription could help solve this. While Whisper’s transcription accuracy is fantastic, it does not do speaker diarisation – it doesn’t separate the transcript per speaker.

Recently I found WavoAI, an online transcription service with a generous free tier. Their transcripts are good quality, and most importantly, they do provide speaker diarisation.

Below is the process I followed to add transcripts to every podcast episode page on the Muse website. Take a look at the Metamuse Episode 1 page to see the result.

Step 1: Transcribe with WavoAI

The WavoAI website is wonderfully simple. There’s an upload button, and a list of all your transcripts – that’s it. They do have a monthly cap on total transcript duration, so I uploaded episodes over a couple of months until I had all 84 transcribed.

Interestingly, the export for the transcript is a *.docx file of all things. So I clicked through and downloaded each docx transcript file, and made sure to name it the same as the podcast episode mp3 file so it was easy to reference.
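
Since everything downstream relies on the transcript and episode filenames matching, a quick check can catch gaps before converting anything. This is a minimal sketch, not part of my actual process: the function name and the two directory arguments are hypothetical, and it assumes the mp3 and docx files each sit in their own folder.

```python
from pathlib import Path


def missing_transcripts(mp3_dir: str, docx_dir: str) -> list[str]:
    """Return episode names (file stems) that have an mp3 but no docx yet."""
    # Compare names without extensions, e.g. "episode-1.mp3" vs "episode-1.docx".
    mp3_stems = {p.stem for p in Path(mp3_dir).glob("*.mp3")}
    docx_stems = {p.stem for p in Path(docx_dir).glob("*.docx")}
    return sorted(mp3_stems - docx_stems)
```

Calling `missing_transcripts("mp3", "docx")` would list any episode that still needs a transcript downloaded or renamed.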

Step 2: Convert to Markdown

Next, I used a short script to automatically convert those docx files into Markdown (.md) files:

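# Make sure the output directory exists before pandoc writes to it
mkdir -p md

for file in docx/*.docx; do
    filename=$(basename "$file" .docx)
    pandoc "$file" -t markdown -o "md/${filename}.md"
    # Strip trailing backslashes and unescape \' in place.
    # The empty '' after -i is the macOS (BSD) sed form; GNU sed takes -i alone.
    sed -i '' -e 's/\\$//' -e "s/\\\\'/'/g" "md/${filename}.md"
done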

[Screenshot: the original list of docx transcripts from WavoAI.]

[Screenshot: the converted podcast transcripts as Markdown files.]
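
To make the two sed expressions in that loop easier to read, here is roughly what they do to each file, sketched in Python. The function name is hypothetical. My understanding is that the trailing backslashes are how pandoc marks hard line breaks and the `\'` sequences are its escaped apostrophes, but treat that as an assumption rather than a statement about pandoc's behavior in every configuration.

```python
import re


def clean_pandoc_markdown(text: str) -> str:
    """Mirror the two sed expressions used in the conversion loop."""
    # s/\\$//   : drop a single backslash sitting at the end of any line
    text = re.sub(r"\\$", "", text, flags=re.MULTILINE)
    # s/\\'/'/g : turn every escaped apostrophe (\') back into a plain '
    return text.replace("\\'", "'")
```

For example, a line like `I don\'t know\` comes out as `I don't know`.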

Step 3: Use Cursor to update Hugo site

The Muse website is built with the Hugo framework. Each podcast episode lives as a Markdown file in a podcast directory. I’ve been using Cursor for many menial coding tasks, and I wanted to see how it would do integrating the podcast transcript files into the podcast episode files. I admittedly don’t know much about Hugo, so I very much threw this problem over the wall to the AI to see how it would do. I have Cursor set up to use Claude-3.5-Sonnet.

Here’s my first prompt to Cursor:

separate from this repo, I also have a folder full of transcripts for all of the podcast episodes. This site uses Hugo to be built. Is there a way to auto-process the transcript files using Hugo to include them as a sub-section of the respective podcast episode page? or would i need to edit each podcast episode page individually to manually copy/paste the entire transcript in?

That’s all it took and I was off to the races! It suggested adding a transcripts subfolder to the podcast folder. Since all the filenames already matched, it built a new partial for the transcript section:

{{ $transcriptPath := printf "podcast/transcripts/%s.md" .File.BaseFileName }}
{{ with site.GetPage $transcriptPath }}
<div class="transcript">
  <h3>Transcript</h3>
  {{ $content := .Content }}
  {{ $pattern := `(\d{2}):(\d{2}):(\d{2}) - ` }}
  {{ $replacement := `<a href="#t=$1$2$3" class="timestamp" data-time="$1:$2:$3">$1:$2:$3</a> - ` }}
  {{ $processed := replaceRE $pattern $replacement $content }}
  {{ $processed | safeHTML }}
</div>
{{ end }} 

And added a single line to the single.html podcast episode page: {{ partial "podcast-transcript" . }}
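
The interesting part of that partial is the `replaceRE` call, which wraps each `HH:MM:SS - ` timestamp in a link. As a rough illustration of the same substitution outside Hugo, here is a Python sketch. The function name and the sample input are made up, and Python writes capture groups as `\1` where Hugo's Go regex syntax uses `$1`.

```python
import re

# Same pattern as the partial: HH:MM:SS followed by " - ".
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}) - ")

# Same replacement as the partial, with \1 \2 \3 standing in for $1 $2 $3.
LINK = r'<a href="#t=\1\2\3" class="timestamp" data-time="\1:\2:\3">\1:\2:\3</a> - '


def link_timestamps(html: str) -> str:
    """Wrap every 'HH:MM:SS - ' timestamp in an anchor tag."""
    return TIMESTAMP.sub(LINK, html)


print(link_timestamps("<p>00:01:30 - Hello</p>"))
# <p><a href="#t=000130" class="timestamp" data-time="00:01:30">00:01:30</a> - Hello</p>
```

The link's `href` packs the digits together (`#t=000130`), while `data-time` keeps the readable `00:01:30` form.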

Barely 30 seconds after opening Cursor, I had all of the episode transcripts imported into the website and showing up correctly! Not bad!

Step 4: Clean up and deploy

At this point, it was good enough to ship, but I spent a bit more time to keep it extra tidy. WavoAI did a great job with the transcripts, but obviously wouldn’t know my last name is spelled with a u instead of an o. I also fixed other egregious misspellings – the word “Muse” was often transcribed as “MU” or “Mu’s”, etc.
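
A cleanup pass like this could also be scripted. The sketch below is not what I ran; it is a hypothetical example that only covers the two mis-transcriptions named above and assumes straight apostrophes in the Markdown files. Blind replacement can be wrong in places, so the output would still want a read-through.

```python
import re

# Hypothetical correction table; only "MU" and "Mu's" come from the post.
CORRECTIONS = [
    (re.compile(r"\bMu's\b"), "Muse"),
    (re.compile(r"\bMU\b"), "Muse"),
]


def fix_misspellings(text: str) -> str:
    """Apply each whole-word correction in order."""
    for pattern, replacement in CORRECTIONS:
        text = pattern.sub(replacement, text)
    return text
```

The `\b` word boundaries keep words such as "MUSIC" from being touched.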

The transcription file also includes 00:00:00-style timestamps each time the speaker changes. I asked Cursor if it could write a script that uses those timestamps to jump the inline podcast player to the matching point in the episode. After a bit of back and forth, the podcast episode page now has a link for each timestamp, which updates the hash of the page URL. This means anyone can jump straight to a quote by clicking in the transcript, and can also share the URL with anyone so that they can jump straight to the quote too! Here’s a great moment in Molly Mielke’s episode where she’s talking about the importance of interoperability in tools for thought.
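
The script Cursor wrote runs in the browser and isn't reproduced here, but the core step is turning a URL hash into a playback position. Here is a minimal sketch of that conversion in Python, assuming the `#t=HHMMSS` hash format that the transcript partial generates. The function name is hypothetical.

```python
from typing import Optional


def hash_to_seconds(url_hash: str) -> Optional[int]:
    """Convert a '#t=HHMMSS' URL hash to seconds, or None if it doesn't match."""
    prefix = "#t="
    if not url_hash.startswith(prefix):
        return None
    digits = url_hash[len(prefix):]
    # Expect exactly six ASCII digits: two each for hours, minutes, seconds.
    if len(digits) != 6 or not (digits.isascii() and digits.isdigit()):
        return None
    hours, minutes, seconds = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    return hours * 3600 + minutes * 60 + seconds
```

With this, `#t=000130` maps to 90 seconds, which is the value a player would seek to. Any other hash returns `None` so ordinary page anchors are left alone.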

Wrap up

This has been something I’ve wanted to do for the Metamuse podcast library since day 1 of going solo. It’s been a long wait to find the right tooling to be able to do this without spending weeks of my own time manually annotating speakers. It just wouldn’t be feasible for me to do without the help of both WavoAI and Cursor. Even the Hugo work, I could’ve eventually gotten there on my own, but it would’ve taken me hours instead of minutes to learn the necessary Hugo-isms to pull it all together.

I’m very thankful to have this project ticked off my todo list! It makes the podcast episode pages accessible to the hearing impaired. I’m hopeful it’ll increase SEO traffic to Muse too, and make it just a bit easier to share loved moments of your favorite episodes.