Speech To Text Web App Using Python And OpenAI’s Whisper

Hello Learners…

Welcome To My Blog…

Table Of Contents:

  • Introduction
  • What is speech-to-text?
  • What is OpenAI’s whisper?
  • Speech-to-text using python and OpenAI’s whisper
  • Summary
  • References

Introduction

In this post, we create a speech-to-text web app using Python and OpenAI’s Whisper.

What is Speech-To-Text?

Speech-to-text is the conversion of spoken audio into written text. It is now widely used, for example in voice assistants, dictation, and automatic captions.

What is OpenAI’s Whisper?

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
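
As a rough sketch of how these tasks map onto the openai-whisper Python package: model.transcribe accepts a task option ("transcribe" or "translate", where translate produces English text) and an optional language code, and the result includes the language that was used or detected. The helper names build_options and run_whisper, and the default model name "base", are placeholders chosen for illustration and are not part of this post's app.

```python
VALID_TASKS = ("transcribe", "translate")


def build_options(task="transcribe", language=None):
    """Build keyword arguments for Whisper's model.transcribe()."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    options = {"task": task}
    # Leaving the language out lets Whisper detect it from the audio.
    if language is not None:
        options["language"] = language
    return options


def run_whisper(audio_path, model_name="base", **kwargs):
    """Transcribe or translate an audio file (needs openai-whisper and ffmpeg)."""
    import whisper  # imported lazily so build_options works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_options(**kwargs))
    # result["language"] holds the language code Whisper used or detected.
    return result["text"], result["language"]
```

For example, run_whisper("speech.mp3", task="translate") would return an English translation plus the detected language code, assuming a multilingual model such as base rather than an English-only one such as base.en.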

Speech-to-text using python and OpenAI’s whisper

Here we create a simple speech-to-text web app using the Python Streamlit library and OpenAI’s Whisper model.

First, we install the Streamlit library with pip. Open your terminal and enter the command below:

pip install streamlit

To record our voice, we install the audio_recorder_streamlit library using pip:

pip install audio_recorder_streamlit

Next, we install the openai-whisper package using pip (Whisper also needs the ffmpeg command-line tool installed on your system to read audio files):

pip install -U openai-whisper

The installation time depends on your internet speed.

After this, we download a pre-trained Whisper model. The models below are available for download, and we can pick one based on our requirements.

Model       URL
tiny.en     https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt
tiny        https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt
base.en     https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt
base        https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt
small.en    https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
small       https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt
medium.en   https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt
medium      https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt
large       https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large.pt

Models By OpenAI
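
In these URLs, the long hexadecimal path segment just before the file name is the SHA-256 checksum of the checkpoint, which the whisper package itself checks when it downloads a model. A minimal sketch of downloading a model and verifying it the same way, using only the standard library; the function names here are hypothetical and not part of any library:

```python
import hashlib
import urllib.request

BASE_EN_URL = (
    "https://openaipublic.azureedge.net/main/whisper/models/"
    "25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt"
)


def expected_sha256(url):
    """Return the checksum embedded in the URL (second-to-last path segment)."""
    return url.split("/")[-2]


def file_name(url):
    """Return the checkpoint file name (last path segment)."""
    return url.split("/")[-1]


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large checkpoints do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_model(url=BASE_EN_URL, dest=None):
    """Download a checkpoint (into the working directory by default) and verify it."""
    dest = dest or file_name(url)
    urllib.request.urlretrieve(url, dest)
    if sha256_of_file(dest) != expected_sha256(url):
        raise RuntimeError(f"Checksum mismatch for {dest}; try downloading again")
    return dest
```

Calling download_model() with no arguments would save base.en.pt in the current working directory, which is where the app in this post looks for it.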

Every model has a different size and different hardware requirements.

Here we use the base.en model, which is about 145 MB in size. The file is base.en.pt, a PyTorch checkpoint.

Download it and put it in the current working directory so that the app can load it.

Now we create the web app using Streamlit.

Create a file named app.py and paste the code below into it.

Code:

import os

import streamlit as st
import whisper
from audio_recorder_streamlit import audio_recorder

AUDIO_FILE = "myfile.wav"

st.title("Speech To Text")


def main():
    # Show the record button; this gives the recording as bytes once one exists.
    audio_bytes = audio_recorder()
    if not audio_bytes:
        st.error("Please record your voice")
        return

    # Play the recording back in the page.
    st.audio(audio_bytes, format="audio/wav")

    # Save the recording to disk, overwriting any file left from a previous run.
    with open(AUDIO_FILE, mode="wb") as f:
        f.write(audio_bytes)

    try:
        # Load the downloaded checkpoint from the working directory and transcribe on CPU.
        model = whisper.load_model("./base.en.pt", device="cpu")
        result = model.transcribe(AUDIO_FILE)
        st.write(result["text"])
    finally:
        # Remove the saved recording even if transcription fails.
        os.remove(AUDIO_FILE)


main()
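
The app above writes every recording to the same myfile.wav, so two browser sessions recording at the same time could overwrite each other's file. One possible variation, sketched here and not part of the original app, is to write each recording to a uniquely named temporary file. The helper name transcribe_bytes is hypothetical, and it assumes the transcription step is passed in as a function that takes a file path, for example lambda p: model.transcribe(p)["text"].

```python
import os
import tempfile


def transcribe_bytes(audio_bytes, transcribe_fn, suffix=".wav"):
    """Write audio bytes to a unique temp file, run transcribe_fn on its path,
    and delete the file afterwards, even if transcription fails."""
    # delete=False so the file can be reopened by name on every platform.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(audio_bytes)
        path = tmp.name
    try:
        return transcribe_fn(path)
    finally:
        os.remove(path)
```

Passing the transcription step in as a function also makes the file handling easy to try out without loading a model.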

Now run the app. Open the terminal and run the command below:

streamlit run app.py

We get a URL: http://localhost:8501/

It opens automatically in our default browser, or we can open it manually. We then see the interface below.

screenshot by the author

Now click the record icon. It turns red, and the app starts recording your voice as you speak. When you want to stop, click the icon again, and the app converts your speech into text.

screenshot by the author

Here we can see our output text.
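
The app displays only result['text'], but the dictionary returned by transcribe also carries a 'segments' list, where each segment has 'start' and 'end' times in seconds and its own 'text'. A small sketch of turning those into timestamped lines follows. The helper names are hypothetical, and the sample dictionary is made-up data in that shape, not real model output.

```python
def format_timestamp(seconds):
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result):
    """Return one '[start -> end] text' line per segment."""
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return lines


# Made-up example in the shape Whisper returns.
sample = {
    "text": " Hello learners. Welcome to my blog.",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello learners."},
        {"start": 1.5, "end": 3.25, "text": " Welcome to my blog."},
    ],
}
for line in format_segments(sample):
    print(line)
```

In app.py, each of these lines could be shown with st.write in place of the single block of text.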

Summary

This is a simple voice-to-text web app. We can use the same Whisper models for many other speech-to-text tasks.

Happy Learning And Keep Learning…

Thank You…

Also, you can read my other articles for learning…

Grammar correction web app

References:
