Podcast Harvesting
The story is simple: every morning at 6AM, I run around Lake Merritt in Oakland, CA listening to a podcast.
The podcast is Conversations with Tyler, in which economist Tyler Cowen – a polymath, to be sure – interviews an expert in any given field: cuisine, music, economics, technology, etc. You name the domain, and Tyler can play ball.
The interviews proceed quickly and are filled to the brim with information. Names are dropped. References are made. Books are recommended. Tyler and his guest – like the internet – are a constant stream of information, and it is difficult to capture that information and do something with it – to save it for later for further exploration, deeper thinking, association, or sharing. I could stop running and transcribe what I am hearing, but I don’t have the time, patience, or wherewithal to be so disciplined.
While I run, I have one intuitive way to capture information from this infinite stream: a screenshot. With my phone in my right hand, I can simply squeeze the sides of the device and save the timestamp of whatever it is that piqued my interest in that moment. With over 8,000 screenshots in my phone, it is clear that this gesture for capturing information has become as close to muscle memory as putting one foot in front of the other.
Capturing is only one side of the equation, however. How do I make use of that information I, in one brief moment, found so compelling? When do I return to it? What can I do with it?
Up until a few months ago, the answers to these questions were: I don’t, I don’t, and not much.
This is because when I return to my screenshot folder, I just see timestamps: there is no context to remind me of what moved me to screenshot. I am even less likely to go back to the podcast, scrub along the player, and transcribe what I had heard when the heat of the moment has passed.
When ChatGPT was released, I figured my basic programming skills could be augmented enough to address this problem, or at least help me make a small amount of progress towards capturing information in more intuitive and rapid ways.
This is what I came up with:
Here's a brief description of how the program works:
Image Text Extraction:
The program uses the Tesseract OCR library to extract text from the provided image. It reads the image and converts it into a text string.
Podcast Information Inference:
The extracted text is then passed to OpenAI's GPT-3.5 Turbo model using the OpenAI API. The program formulates a conversation with the model, providing the extracted text and requesting it to identify the podcast name, episode name, and timestamp from the screenshot.
Audio Processing:
The program uses the podcast information obtained to fetch the corresponding audio using Google Podcasts' web interface. It makes an HTTP request to the Google Podcasts website, searching for the podcast name and episode name. The response is then parsed to extract the URL of the audio file.
Transcription Generation:
The program cuts the downloaded audio file at the specified timestamp using the provided duration (I wanted to capture the audio at the timestamp +- 30 seconds). The extracted audio segment is then passed to OpenAI's Whisper ASR system for transcription. The transcription response is cleaned up and converted to a dialogue format.
User Interface:
The program provides a user interface allowing users to upload an image containing a screenshot. Once the image is uploaded, the program executes the main functionality, processing the image and generating the cleaned transcription. The cleaned transcription is displayed in the GUI window.
And that’s it! I've added an interesting twist to my morning runs around the lake. Thanks to these AI coding assistants, GPT’s ‘reasoning/inference’ engine, and some useful APIs, I've now got an automated knowledge harvester right there in my pocket.
Now, when I hear something fascinating, I simply squeeze my phone, and the information gets stored for later. Gone are the days when those screenshots were just a mystery, void of any context. Instead, they've become gateways to further exploration, deeper thinking, and sharing.
I’m currently building harvest.online - get in touch!
isaac@harvest.online