Using OpenAI Whisper to Transcribe Podcasts on Koyeb
Introduction
Real-time automated transcription is incredibly useful for anyone who needs to capture spoken content quickly and accurately. Whether you're creating subtitles for videos, transcribing podcast episodes, or documenting meeting notes, having an automated system can save you a lot of time and effort.
In practical terms, automated transcription can be used in various real-world scenarios. For instance, journalists can transcribe interviews on the fly, educators can provide real-time captions for their lectures, and businesses can document conference calls and meetings more efficiently.
OpenAI Whisper is an open-source solution built for this purpose. It uses state-of-the-art machine learning algorithms to transcribe speech with high accuracy, even handling different accents and speaking speeds.
In this tutorial, you will learn how to set up a Streamlit application, integrate OpenAI Whisper for real-time podcast transcription, and deploy the application using Docker and Koyeb, creating a scalable transcription service.
You can consult the project repository as you work through this guide. You can deploy the podcast transcription application built in this tutorial using the Deploy to Koyeb button below:
Requirements
To successfully follow this tutorial, you will need the following:
- Git installed
- FFmpeg installed
- Python 3.8 or later
- A Koyeb account
Demo
Before we jump into the technical details, let me give you a sneak peek of what you will be building in this tutorial:
Understanding the components
OpenAI Whisper
OpenAI Whisper is a sophisticated speech-to-text (STT) model designed to transcribe spoken words into written text with high accuracy. Utilizing advanced machine learning algorithms, Whisper is capable of recognizing various accents, dialects, and speaking speeds. It can be integrated into voice assistants, dictation software, and real-time translation services to convert spoken language into text efficiently.
OpenAI Whisper can be used in sectors such as healthcare for medical dictation, in customer service for automated call transcriptions, and in media for generating subtitles for videos and podcasts. Its ability to handle complex speech patterns and languages makes it the go-to service in any application requiring high-quality speech-to-text conversion.
Streamlit
Streamlit is an open-source Python library designed to create interactive data applications, often referred to as dashboards. It empowers developers to build and share data apps simply and intuitively, eliminating the need for extensive web development expertise.
Streamlit apps are created as Python scripts, which are then executed within the Streamlit environment. The library offers a set of functions for adding interactive elements to the app, such as a file upload button.
Steps
To build the transcription service, we'll complete the following steps:
- Set up the environment: Start by setting up your project directory, installing necessary dependencies, and configuring environment variables.
- Set up Streamlit: Next, install Streamlit and create the initial user interface for your application.
- Transcribe podcasts with OpenAI Whisper: Use OpenAI Whisper to transcribe podcasts into text with timestamps.
- Dockerize the Streamlit application: Create a Dockerfile to containerize your application for consistent deployment.
- Deploy to Koyeb: Finally, deploy your application on the Koyeb platform.
Set up the environment
Let's start by creating a new Streamlit project. To keep your Python dependencies organized, you should create a virtual environment.
First, create and navigate into a local directory:
# Create and move to the new directory
mkdir example-whisper-koyeb-gpu
cd example-whisper-koyeb-gpu
Afterwards, create and activate a new virtual environment:
# Create a virtual environment
python -m venv venv
# Activate the virtual environment (Windows)
.\venv\Scripts\activate.bat
# Activate the virtual environment (macOS/Linux)
source ./venv/bin/activate
Now, you can install the required dependencies. Create a requirements.txt file in your project directory with the following contents:
streamlit
openai-whisper
watchdog
Pass the file to pip to install the dependencies:
pip install -r requirements.txt
For the dependencies, we have included Streamlit for building the web app, OpenAI Whisper for transcription, and watchdog, which Streamlit uses to monitor file changes and automatically reload the app during development.
You can pin the exact versions of your installed dependencies by freezing them back into the requirements.txt file:
pip freeze > requirements.txt
Now, let's move on to creating a new Streamlit project.
Set up Streamlit
In this step, you will set up the Streamlit UI that will allow users to upload an audio file, click a button to start the transcription process, and finally view the segmented transcriptions in a user-friendly format. All of the logic for the project will reside in a single file, so start by creating an app.py file with the following code:
# File: app.py
import streamlit

stream_button_styles = """
<style>
header { display: none !important; }
</style>
"""

page_styles = """
<style>
h1 { font-size: 2rem; font-weight: 700; }
h2 { font-size: 1.7rem; font-weight: 600; }
.timestamp { color: gray; font-size: 0.9rem; }
</style>
"""

page_title = "Using OpenAI Whisper to Transcribe Podcasts"
page_description = "A demo showcasing the use of OpenAI Whisper to accurately and efficiently convert spoken content from podcasts into written text."
koyeb_box = "To deploy Whisper within minutes, <a href=\"https://koyeb.com/ai\">Koyeb GPUs</a> provide the easiest and most efficient way. Koyeb offers a seamless platform for deploying AI models, leveraging high-performance GPUs to ensure fast and reliable transcriptions."

step_1 = "1. Upload Podcast"
step_2 = "2. Invoke OpenAI Whisper to transcribe podcast 👇🏻"
step_3 = "3. Transcription 🎉"

def unsafe_html(tag, text):
    return streamlit.markdown(f"<{tag}>{text}</{tag}>", unsafe_allow_html=True)

def main():
    # Set the title for the page
    streamlit.set_page_config(page_title, layout="centered")
    # Inject CSS to hide Streamlit's default header
    streamlit.markdown(stream_button_styles, unsafe_allow_html=True)
    # Inject page CSS
    streamlit.markdown(page_styles, unsafe_allow_html=True)
    # Create an H1 heading on the page
    streamlit.title(page_title)
    unsafe_html("h2", page_description)
    unsafe_html("p", koyeb_box)
    audio_file = streamlit.file_uploader(step_1, type=["mp3", "mp4", "wav", "m4a"])
    if audio_file:
        # If a file is received
        # Write the file on the server
        # Show the next step
        unsafe_html("small", step_2)
        if streamlit.button("Transcribe"):
            # Get the transcription
            unsafe_html("small", step_3)
            # Showcase the transcription

if __name__ == "__main__":
    main()
The code above does the following:
- Begins by importing the Streamlit module
- Defines CSS for hiding the navigation bar and styling the headings
- Defines text values for the page's title, description, and each step
- Defines the unsafe_html function to dynamically create the HTML tags with content
- Accepts an audio file using Streamlit's built-in file_uploader function
With this, you have set up a UI that can accept podcast audio files from the user. Now, let's move on to transcribing the uploaded audio.
Transcribe podcasts with OpenAI Whisper
In this step, you will invoke OpenAI Whisper's base model to transcribe an audio file. By default, the model returns timestamps along with the transcription, which lets you use the generated transcriptions as subtitles as well. Make the following additions to the app.py file:
# File: app.py
# Existing imports
# ...
import whisper # [!code ++]

model = whisper.load_model("base") # [!code ++]

# ...

def unsafe_html(tag, text):
    # ...

# Generate the HTML for each transcription segment
def timestamp_html(segment): # [!code ++]
    return f'<span class="timestamp">[{segment["start"]:.2f} - {segment["end"]:.2f}]</span> {segment["text"]}' # [!code ++]

# Transcribe an audio file
def transcribe_audio(audio_file): # [!code ++]
    return model.transcribe(audio_file.name) # [!code ++]

# Write the audio file on the server
def write_audio(audio_file): # [!code ++]
    with open(audio_file.name, "wb") as f: # [!code ++]
        f.write(audio_file.read()) # [!code ++]

def main():
    # ...
    if audio_file:
        # If a file is received
        # Write the file on the server
        write_audio(audio_file) # [!code ++]
        # Show the next step
        unsafe_html("small", step_2)
        if streamlit.button("Transcribe"):
            # Get the transcription
            transcript_text = transcribe_audio(audio_file) # [!code ++]
            unsafe_html("small", step_3)
            # Showcase the transcription
            for segment in transcript_text["segments"]: # [!code ++]
                unsafe_html("div", timestamp_html(segment)) # [!code ++]

if __name__ == "__main__":
    main()
The changes above do the following:
- Import and instantiate the OpenAI Whisper base model
- Define a timestamp_html function to display each transcription segment with start and end timestamps
- Define a transcribe_audio function, which invokes the model to generate transcriptions of the audio file
- Define a write_audio function to write the uploaded audio file to the server
- If an audio file is uploaded, write it to the server
- If the Transcribe button is clicked in the UI, invoke the transcribe_audio and timestamp_html functions to generate and display the transcriptions of the podcast
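To make the rendering concrete, here is a small, standalone illustration of the segment format that Whisper's transcribe function returns and how timestamp_html formats it. The sample segment values below are made up for demonstration; real segments come from model.transcribe(...)["segments"]:

```python
# Standalone illustration of formatting one Whisper-style segment.
# The sample values are hypothetical, not real model output.

def timestamp_html(segment):
    # Same formatting as in app.py: timestamps to two decimal places
    return f'<span class="timestamp">[{segment["start"]:.2f} - {segment["end"]:.2f}]</span> {segment["text"]}'

sample_segment = {"start": 0.0, "end": 4.52, "text": "Welcome to the show."}

print(timestamp_html(sample_segment))
# <span class="timestamp">[0.00 - 4.52]</span> Welcome to the show.
```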
Now, you can run the Streamlit application with:
streamlit run ./app.py --server.port 8000
The application will now be available at http://localhost:8000. Test the application in action by uploading one of your favorite podcast files and reviewing the generated transcription.
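Because each segment carries start and end times, you can also export the transcription as SubRip (SRT) subtitles. The sketch below is a minimal, hypothetical helper (not part of the tutorial's app.py); the sample segments are invented for illustration:

```python
# Minimal sketch: convert Whisper-style segments into SubRip (SRT) subtitles.
# Helper names and sample data are illustrative, not part of app.py.

def format_srt_time(seconds):
    # SRT timestamps use the form HH:MM:SS,mmm
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def segments_to_srt(segments):
    # Each SRT entry: sequence number, "start --> end", text, blank line
    entries = []
    for i, segment in enumerate(segments, start=1):
        start = format_srt_time(segment["start"])
        end = format_srt_time(segment["end"])
        entries.append(f"{i}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(entries)

sample = [
    {"start": 0.0, "end": 4.5, "text": " Welcome to the show."},
    {"start": 4.5, "end": 9.25, "text": " Today we talk about transcription."},
]
print(segments_to_srt(sample))
```

Writing the returned string to a .srt file would give you subtitles that most video players can load directly.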
Next, let's dockerize the application to ensure consistency between multiple deployments.
Dockerize the Streamlit application
Dockerizing deployments creates a consistent and reproducible environment, ensuring that the application runs the same way on any system. It simplifies dependency management and enhances scalability, making deployments more efficient and reliable. To dockerize, create a Dockerfile
at the root of your project with the following content:
FROM python:3.9 AS builder
WORKDIR /app
RUN python3 -m venv venv
ENV VIRTUAL_ENV=/app/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
COPY requirements.txt .
RUN pip install -r requirements.txt
FROM python:3.9 AS runner
WORKDIR /app
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/venv venv
COPY app.py app.py
ENV VIRTUAL_ENV=/app/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
EXPOSE 8000
CMD ["streamlit", "run", "./app.py", "--server.port", "8000"]
Apart from the usual Dockerfile setup for deploying Python applications, the following tweaks and additions have been made:
- RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/* installs ffmpeg, which Whisper needs to decode audio files, and then removes the package lists to keep the image small
- EXPOSE 8000 specifies the port on which the Streamlit application will run
- CMD ["streamlit", "run", "./app.py", "--server.port", "8000"] defines the command that starts the Streamlit app on port 8000
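Before deploying, you can optionally build and run the image locally to verify that it works. A quick sketch, assuming Docker is installed (the image tag whisper-streamlit is an arbitrary, illustrative name):

```shell
# Build the image from the Dockerfile in the current directory
docker build -t whisper-streamlit .

# Run the container and map port 8000 to the host
docker run --rm -p 8000:8000 whisper-streamlit
```

With the container running, the app should be reachable at http://localhost:8000, just like the non-containerized version.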
With everything configured, let's move on to deploy the application to Koyeb.
Deploy to Koyeb
Now that you have the application running locally, you can deploy it to Koyeb and make it available on the internet.
Create a new repository on your GitHub account so that you can push your code.
You can download a standard .gitignore file for Python from GitHub to exclude certain directories and files (such as the venv directory) from being pushed to the repository:
curl -L https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore -o .gitignore
Run the following commands in your terminal to commit and push your code to the repository:
git init
git add app.py Dockerfile requirements.txt .gitignore
git commit -m "first commit"
git branch -M main
git remote add origin [Your GitHub repository URL]
git push -u origin main
You should now have all of your local code in your remote repository. Now it is time to deploy the application.
Within the Koyeb control panel, while on the Overview tab, initiate the app creation and deployment process by clicking Create Service and then choosing Create web service.
- Select GitHub as the deployment source.
- Select your repository from the menu. Alternatively, deploy from the example repository associated with this tutorial by entering https://github.com/koyeb/example-whisper-transcription in the public repository field.
- In the Instance section, select a GPU Instance.
- In the Builder section, choose Dockerfile.
- Finally, click the Deploy button.
Once the application is deployed, you can visit the Koyeb service URL (ending in .koyeb.app) to access the Streamlit application.
Conclusion
In this tutorial, you built a podcast transcription application with the Streamlit framework and OpenAI Whisper. Along the way, you learned how to invoke the OpenAI Whisper model in Python to generate transcriptions with timestamps, and how to use the Streamlit framework to quickly prototype a user interface with a functioning upload button in just a few lines of code.
Since the application was deployed using the Git deployment option, subsequent code pushes to the deployed branch will automatically trigger a new build for your application. Changes go live once the deployment succeeds. If a deployment fails, Koyeb retains the most recent operational production deployment, ensuring the uninterrupted operation of your application.