Speech To Text Web App Using Python And OpenAI's Whisper

Hello Learners…

Welcome To My Blog…

Model	URL
tiny.en	https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt
tiny	https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt
base.en	https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt
base	https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt
small.en	https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
small	https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt
medium.en	https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt
medium	https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt
large	https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large.pt

Models By OpenAI

Every model has a different size and different hardware requirements.
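
To make the size and hardware trade-off concrete, here is a small sketch that picks the largest model fitting a memory budget. The figures are the approximate parameter counts and memory needs listed in the Whisper README, so treat them as rough guides; the `.en` variants are the same size as their multilingual counterparts. The names `MODEL_SPECS` and `largest_model_for` are made up for this example and are not part of Whisper.

```python
from typing import Optional

# Approximate figures as listed in the Whisper README (rough guides only).
# name: (parameters in millions, approximate memory needed in GB)
MODEL_SPECS = {
    "tiny": (39, 1),
    "base": (74, 1),
    "small": (244, 2),
    "medium": (769, 5),
    "large": (1550, 10),
}


def largest_model_for(memory_gb: float) -> Optional[str]:
    """Return the largest model whose approximate memory need fits the budget."""
    fitting = [
        (params, name)
        for name, (params, need_gb) in MODEL_SPECS.items()
        if need_gb <= memory_gb
    ]
    # max() compares the parameter count first, so the biggest model wins.
    return max(fitting)[1] if fitting else None


print(largest_model_for(4))  # small
```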

Here we use the base.en model. Its checkpoint file, base.en.pt, is a PyTorch model of about 145 MB.

Download it and put it in the current working directory so the app can load it.
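
If you prefer to download the file from Python instead of the browser, the sketch below is one way to do it. It relies on the fact that Whisper's own loader treats the long hexadecimal segment of each URL in the table as the expected SHA-256 of the file, so the download can be checked against it. The function names here are made up for this example.

```python
import hashlib
import os
import urllib.request

# URL of base.en.pt, copied from the table above.
MODEL_URL = (
    "https://openaipublic.azureedge.net/main/whisper/models/"
    "25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt"
)


def expected_sha256(url: str) -> str:
    """Return the second-to-last path segment, which holds the file's SHA-256."""
    return url.split("/")[-2]


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large checkpoint never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_model(url: str = MODEL_URL, target: str = "base.en.pt") -> str:
    """Download the checkpoint if it is missing, then verify its checksum."""
    if not os.path.isfile(target):
        urllib.request.urlretrieve(url, target)
    if sha256_of(target) != expected_sha256(url):
        raise RuntimeError(f"Checksum mismatch for {target}; delete it and retry.")
    return target
```

Calling `download_model()` once from the project folder leaves base.en.pt next to app.py, which is where the app below expects to find it.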

Now we create the web app using Streamlit.

Create a file named app.py and paste the code below into it.

Code:

import os

import streamlit as st
import whisper
from audio_recorder_streamlit import audio_recorder

AUDIO_FILE = "myfile.wav"

st.title("Speech To Text")


def main():
    # Show the record button; it returns the recorded audio as bytes.
    audio_bytes = audio_recorder()
    if not audio_bytes:
        st.error("Please record your voice")
        return

    # Let the user play back the recording.
    st.audio(audio_bytes, format="audio/wav")

    # Save the recording to disk; "wb" overwrites any leftover file.
    with open(AUDIO_FILE, mode="wb") as f:
        f.write(audio_bytes)

    # Load the downloaded checkpoint from the working directory and run it on the CPU.
    model = whisper.load_model("./base.en.pt", device="cpu")
    result = model.transcribe(AUDIO_FILE)
    st.write(result["text"])

    # Remove the temporary recording once it has been transcribed.
    os.remove(AUDIO_FILE)


main()
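
One thing to keep in mind: the app always writes to the same file name, so two browser sessions recording at the same time could overwrite each other's audio. A possible variation, sketched here with a made-up helper name `transcribe_bytes`, writes each recording to its own temporary file and always cleans it up afterwards. It takes the transcription function as a parameter, so in the app you would pass `model.transcribe`.

```python
import os
import tempfile
from typing import Callable


def transcribe_bytes(audio_bytes: bytes, transcribe: Callable[[str], dict]) -> str:
    """Write audio to a unique temp file, run `transcribe` on its path, clean up."""
    # mkstemp gives every call its own file, so sessions cannot collide.
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        # Whisper's transcribe() returns a dict whose "text" key holds the transcript.
        return transcribe(path)["text"]
    finally:
        # Remove the file even if transcription raised an error.
        os.remove(path)
```

Inside `main()`, the save, transcribe, and remove steps would then become `st.write(transcribe_bytes(audio_bytes, model.transcribe))`.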

Now open a terminal and run the app with the command below:

streamlit run app.py
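
If the command stops with an import error, or transcription fails, something the app depends on is probably missing. The code imports three Python modules, and Whisper also calls the ffmpeg command-line tool to decode audio. A quick check along these lines can tell you what is not installed; the function name `missing_requirements` is made up for this example.

```python
import importlib.util
import shutil


def missing_requirements(modules, tools):
    """Return the names of modules and command-line tools that cannot be found."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [t for t in tools if shutil.which(t) is None]
    return missing


# Modules imported by app.py, plus the ffmpeg tool that Whisper relies on.
print(missing_requirements(
    ["streamlit", "audio_recorder_streamlit", "whisper"],
    ["ffmpeg"],
))
```

An empty list means everything was found. On PyPI the three modules are published as streamlit, audio-recorder-streamlit, and openai-whisper.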

We get a URL: http://localhost:8501/

Streamlit opens this URL in the default browser automatically; we can also open it manually. The page shows the app title and a record icon.

Now click the record icon. It turns red and starts recording. Speak, then click the icon again when you want to stop. The app then converts your voice into text.

The transcribed text appears below the audio player.

Summary

This is a simple voice-to-text web app. We can use the same Whisper models for other speech-to-text tasks as well.

Happy Learning And Keep Learning…

Thank You…

Also, you can read my other articles for learning…

Create Grammar Correction WebApp Using Python And HuggingFace

https://galaxyofai.com/create-grammar-correction-webapp-using-python-and-huggingface/

References:

https://github.com/openai/whisper
