by Roman Egger
Have you heard about Whisper, OpenAI's speech recognition model? If not, you're in for a treat! I recently tried out Whisper for myself and was blown away by its capabilities. Whisper is an automatic speech recognition model developed by OpenAI, trained on 680,000 hours of multilingual and multitask supervised data. It comes in several sizes, from tiny and base through small and medium up to large.
I was particularly impressed with the speech recognition of the large model, although it takes a looooong time to produce its output if you don't run it on a GPU. As input, I used an expert interview I recently gave for a PhD student. It recognized everything correctly, even a mixture of German and English at the beginning of the interview. Amazing! So I am thinking of setting up Whisper for my students: whenever they do qualitative studies, they can use it for transcription.
Overall, I highly recommend giving Whisper a try if you have the opportunity. It's super easy to use (though I struggled with ffmpeg, which you also need to install).
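Since ffmpeg was the part that tripped me up, here is a minimal sketch of a check you could run before transcribing. It only looks for an executable called ffmpeg on your PATH, using Python's standard library; the helper name ffmpeg_available is my own and not part of Whisper.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the PATH."""
    # shutil.which returns the full path to the executable, or None if missing.
    return shutil.which(executable) is not None


if not ffmpeg_available():
    print("ffmpeg was not found on your PATH; install it before running Whisper.")
```

If the message shows up, install ffmpeg with your system's package manager and open a new terminal so the updated PATH is picked up.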
import whisper

model = whisper.load_model("base")
# fp16=False uses FP32 decoding, which avoids the FP16 warning on a CPU.
result = model.transcribe("test.mp3", fp16=False)
print(result["text"])
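For qualitative work, a transcript with timestamps is often handier than one long string. The sketch below assumes the result dictionary contains a "segments" list whose entries have "start" and "text" keys, which is what I would expect from the transcribe call above. The helper names and the example data are hypothetical and only there for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a number of seconds into an HH:MM:SS string (fractions dropped)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(result: dict) -> list:
    """Turn each segment into one '[HH:MM:SS] text' line."""
    lines = []
    for segment in result.get("segments", []):
        stamp = format_timestamp(segment["start"])
        lines.append(f"[{stamp}] {segment['text'].strip()}")
    return lines


# Hypothetical stand-in for a real transcription result (assumed structure).
example_result = {
    "text": " Guten Tag. Thanks for joining.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Guten Tag."},
        {"start": 62.4, "end": 65.0, "text": " Thanks for joining."},
    ],
}

for line in segments_to_lines(example_result):
    print(line)
```

With a real transcription you would pass result instead of example_result, and the lines can be pasted straight into whatever tool your students use for coding interviews.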