Version: 1.0.0

Speech to Text

Speech to Text, also known as Automatic Speech Recognition (ASR), is a machine learning task that predicts the transcription of a given audio input. ASR is useful in several use cases, such as online course transcription, movie subtitling, and call center transcription.

Base Model - Thai Speech to Text

Provider: Gowajee

We use a model from our partner, Gowajee. The model was trained on over 1,000 hours of annotated data that our partner collected online from various sources. It handles general topics but specializes in call center audio. Its accuracy may degrade when the audio contains code-switching, low-quality speech, or overlapping speech. Transcription performance was evaluated on around 60 hours of speech data collected from the same source as the training data. Diarization was evaluated on artificial conversations created from audio pools containing around 50 speakers. Each speaker appears only once in a conversation, ensuring there is no overlap between speakers across conversations.

Authentication

Speech to Text requires an API key for API requests. Go to VISAI Console - API Key to create and retrieve your API key.

  • X-API-Key
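
As a minimal sketch of how the key might be attached to a request, the Python snippet below builds (but does not send) a request carrying the key. It assumes that X-API-Key is the name of an HTTP request header. The endpoint URL, the `VISAI_API_KEY` environment variable, and the function names are hypothetical placeholders, not part of the documented API.

```python
import os
import urllib.request

# Hypothetical placeholder: the real endpoint is not given in this section.
STT_ENDPOINT = "https://example.invalid/speech-to-text"


def auth_headers(api_key: str) -> dict:
    """Return request headers carrying the API key (assumed header: X-API-Key)."""
    if not api_key:
        raise ValueError("API key is empty; create one in VISAI Console - API Key")
    return {"X-API-Key": api_key}


def build_request(api_key: str, audio_bytes: bytes,
                  url: str = STT_ENDPOINT) -> urllib.request.Request:
    """Build, without sending, a POST request that includes the API key header."""
    # urllib normalizes the capitalization of header names internally; this is
    # harmless because HTTP header names are case-insensitive.
    return urllib.request.Request(
        url,
        data=audio_bytes,
        headers=auth_headers(api_key),
        method="POST",
    )


def key_from_env(var_name: str = "VISAI_API_KEY") -> str:
    """Read the key from an environment variable (the variable name is an assumption)."""
    return os.environ.get(var_name, "")
```

Reading the key from an environment variable rather than hard-coding it keeps the secret out of source control. The request body format (raw bytes here) is also an assumption, so check the API reference for the expected payload before sending anything.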