Speech to Text API

POST https://stt.infer.visai.ai/predict
Header
    X-API-Key string required

    Your API key

form-data body

Request Body
  • files File required

    Raw audio files, sent as multipart form data under the key name files.

optional

Sent as additional fields in the same multipart form data.

    num_speakers number

    Default value: 1

    Number of speakers to distinguish during diarization, from 1 to 4.

    speaking_rate boolean

    Default value: false

    Return the speaking rate of each transcription, for speech fluency analysis.

    word_timestamps boolean

    Default value: false

    Get time offsets of each word that is recognized in the audio.

    word_list string[]

    Default value: []

    List of terminology, for example ['word_1', 'word_2', ...]

    decoder_type string

    Default value: LMBeamSearch

    Decoding method. Options include Greedy, BeamSearch, and LMBeamSearch.
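
As a rough illustration of how the header and form-data fields above might fit together, here is a minimal Python sketch. The helper names build_form_fields and transcribe are hypothetical and not part of the API. Two details are assumptions the reference does not state: that booleans are sent as the strings "true"/"false", and that word_list is sent as a JSON-encoded list. The upload itself relies on the third-party requests package.

```python
import json

API_URL = "https://stt.infer.visai.ai/predict"
DECODER_TYPES = {"Greedy", "BeamSearch", "LMBeamSearch"}


def build_form_fields(num_speakers=1, speaking_rate=False, word_timestamps=False,
                      word_list=None, decoder_type="LMBeamSearch"):
    """Validate the optional parameters and return them as form-data string fields."""
    if not 1 <= num_speakers <= 4:
        raise ValueError("num_speakers must be between 1 and 4")
    if decoder_type not in DECODER_TYPES:
        raise ValueError(f"decoder_type must be one of {sorted(DECODER_TYPES)}")
    return {
        "num_speakers": str(num_speakers),
        # Assumption: booleans are sent as lowercase "true"/"false".
        "speaking_rate": str(bool(speaking_rate)).lower(),
        "word_timestamps": str(bool(word_timestamps)).lower(),
        # Assumption: the string[] value is sent as a JSON-encoded list.
        "word_list": json.dumps(list(word_list or [])),
        "decoder_type": decoder_type,
    }


def transcribe(audio_path, api_key, **options):
    """POST one audio file under the key name "files" and return the parsed JSON."""
    import requests  # third-party; imported lazily so the helper above works without it

    with open(audio_path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"X-API-Key": api_key},
            files={"files": audio},
            data=build_form_fields(**options),
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

A call such as transcribe("meeting.wav", "YOUR_API_KEY", num_speakers=2, word_timestamps=True) would then send the file together with the validated options. If the service expects a different encoding for booleans or lists, only build_form_fields needs to change.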

Responses


object
status string

success | failed

Status of request

data object
results Array [
List of per-file results
object
filename string

File name

predictions Array [
List of outputs corresponding to the speech detected in the audio file.
object
start_time string

Start time in HH:MM:SS.sss format

end_time string

End time in HH:MM:SS.sss format

speaker string

Speaker in SPEAKER_{number} format

transcript string

The transcribed text

speaker_rate (optional) float

Speaking rate of each transcription

word_timestamps (optional) Array [
List of the start and end time of each recognized word, measured from the beginning of the audio.
object
start_time string

Start time in HH:MM:SS.sss format

end_time string

End time in HH:MM:SS.sss format

word string

Word in audio file

]
]
]
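
To show how the response structure above might be consumed, the following Python sketch walks the nested results and predictions and converts the HH:MM:SS.sss strings to seconds. The function names are hypothetical, and SAMPLE is a hand-written illustration shaped after the schema, not real API output.

```python
def timestamp_to_seconds(value):
    """Convert an HH:MM:SS.sss string to seconds as a float."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def iter_segments(response):
    """Yield (filename, speaker, start_s, end_s, transcript) for every prediction."""
    if response.get("status") != "success":
        raise RuntimeError("request did not succeed")
    for result in response["data"]["results"]:
        for prediction in result["predictions"]:
            yield (
                result["filename"],
                prediction["speaker"],
                timestamp_to_seconds(prediction["start_time"]),
                timestamp_to_seconds(prediction["end_time"]),
                prediction["transcript"],
            )


# Hypothetical response, written by hand to match the documented fields.
SAMPLE = {
    "status": "success",
    "data": {
        "results": [
            {
                "filename": "meeting.wav",
                "predictions": [
                    {
                        "start_time": "00:00:01.500",
                        "end_time": "00:00:04.250",
                        "speaker": "SPEAKER_0",
                        "transcript": "hello everyone",
                    }
                ],
            }
        ]
    },
}

for filename, speaker, start_s, end_s, text in iter_segments(SAMPLE):
    print(f"{filename} {speaker} [{start_s:.2f}s to {end_s:.2f}s]: {text}")
```

The optional fields (speaker_rate and word_timestamps) appear only when requested, so code reading them should use prediction.get(...) rather than direct indexing.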