In this post, I demonstrate how to transcribe a live audio stream in near real time using OpenAI Whisper in Python. We do this to monitor the stream for specific keywords. The transcribed text is also logged with timestamps for further use. Using fuzzy matching on the transcribed text, we find mentions of our keywords. Then, we trigger a message via Signal messenger to a group or person that contains the relevant part of the spoken passage.

This was a quick POC built on a weekend: I wanted to monitor a local radio station for the mention of some keywords in order to win a competition. This needed to be done quickly, which resulted in a simple solution. It also had to be as resource efficient as possible to minimize infrastructure costs. While it was not built with stability as the main focus, it actually performed flawlessly for several weeks without any downtime. Hence, the goal was achieved!

All the code is available in this repo. In the following, I will go over the overarching structure of the solution and explain some of the relevant parts of the code.

- A recording script saves the live audio stream as .mp3 files in chunks of 30 seconds.
- transcribe.py continuously transcribes each audio chunk using OpenAI Whisper. Then, it uses fuzzy matching to monitor the spoken word for our keywords. On a match, it calls msg_group_via_signal.sh.
- msg_group_via_signal.sh relays the alarm message to the signal-cli tool, which messages a group on the Signal messenger.

We use OpenAI's Whisper as it is currently one of the best performing models for audio transcription. Moreover, it's easily available and comes in different model sizes. Using the small model, we achieve decent results even on non-English audio. In addition, it's resource efficient enough to run on a CPU without falling behind the stream. I had good results deploying this on a c5a.large EC2 machine on AWS, costing ~$65 per month. To account for the imperfect transcription quality, we use fuzzy search when looking for our keywords in the transcription. This reduces false negatives, at the cost of more false-positive alarms. With better specs or a GPU, you can increase the model size for better quality transcriptions.

The recording step mainly consists of the function record_stream_to_file(stream: requests.Response), which records the stream audio to files.
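To make the recording step more concrete, here is a minimal sketch of how a byte stream could be split into files of roughly 30 seconds each. This is not the code from the repo: the function name record_chunks_to_files, the file naming scheme, and the injectable clock parameter are assumptions made for illustration. The function accepts any iterable of bytes, so with requests you could pass something like requests.get(url, stream=True).iter_content(chunk_size=4096).

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List


def record_chunks_to_files(
    byte_stream: Iterable[bytes],
    out_dir: Path,
    chunk_seconds: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[Path]:
    """Write an incoming byte stream into consecutive files of ~chunk_seconds each.

    Hypothetical helper for illustration. `clock` is injectable so the
    rotation logic can be exercised without waiting in real time.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    index = 0
    started = clock()
    path = out_dir / f"chunk_{index:05d}.mp3"
    handle = path.open("wb")
    try:
        for block in byte_stream:
            if not block:
                continue  # skip empty keep-alive blocks
            # Rotate to a new file once the current one covers chunk_seconds.
            if clock() - started >= chunk_seconds:
                handle.close()
                written.append(path)
                index += 1
                started = clock()
                path = out_dir / f"chunk_{index:05d}.mp3"
                handle = path.open("wb")
            handle.write(block)
    finally:
        handle.close()
    # The last (possibly short or empty) file is reported as well.
    written.append(path)
    return written
```

Note that this sketch cuts the stream at arbitrary byte boundaries, so a file may start in the middle of an MP3 frame. Most decoders resynchronize on the next frame header, but that is an assumption worth checking against your own stream.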
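The fuzzy keyword search can be sketched in a few lines as well. The example below uses difflib from the Python standard library as a stand-in; the fuzzy matching library used in the repo may differ, and the function name find_fuzzy_keywords and the threshold of 0.8 are assumptions for illustration. It slides a window of words over the transcription and reports the best window per keyword whose similarity is at or above the threshold.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple


def find_fuzzy_keywords(
    text: str, keywords: List[str], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return (keyword, matched_text, score) for the best fuzzy hit per keyword.

    Hypothetical helper: compares each keyword against every window of
    words of the same length in the transcription.
    """
    words = re.findall(r"\w+", text.lower())
    hits: List[Tuple[str, str, float]] = []
    for keyword in keywords:
        kw_words = re.findall(r"\w+", keyword.lower())
        if not kw_words:
            continue
        n = len(kw_words)
        target = " ".join(kw_words)
        best = None
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n])
            score = SequenceMatcher(None, target, candidate).ratio()
            # Keep only the highest-scoring window above the threshold.
            if score >= threshold and (best is None or score > best[2]):
                best = (keyword, candidate, score)
        if best is not None:
            hits.append(best)
    return hits


if __name__ == "__main__":
    transcript = "and the secret word today is jackpott so call us now"
    for keyword, matched, score in find_fuzzy_keywords(transcript, ["jackpot", "holiday"]):
        print(f"{keyword!r} matched {matched!r} (score {score:.2f})")
```

Lowering the threshold catches more misspelled transcriptions but also raises more false alarms, which is the trade-off described above.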