Speech to Text AI Marketplace

Transcribes speech in an audio file into text. Supported file types include mp3, wav, and flac.

Header Parameters
`X-API-Key` string — REQUIRED Your API key

Request Body
`files` binary — REQUIRED Audio speech file
`num_speakers` number (optional) Number of speakers for diarization, from 1 to 4
`word_timestamps` string (optional) true or false // default 'false'
`speaking_rate` string (optional) true or false // default 'false'
`decoder_type` string (optional) Greedy, BeamSearch, or LMBeamSearch // default 'LMBeamSearch'
`word_list` string (optional) List of terminology, e.g. ['word', 'word', ...]
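
The parameters above could be assembled into a multipart request along these lines. This is a minimal sketch, not the service's official client: the endpoint URL is a placeholder (the page does not give one), the helper names `build_form_fields` and `transcribe` are hypothetical, sending `word_list` as the string form of a list is an assumption based on the example shown, and the third-party `requests` library is assumed to be installed.

```python
# Placeholder endpoint: the documentation above does not state the URL.
API_URL = "https://example.com/speech-to-text"

# Decoder names as listed for `decoder_type`.
DECODER_TYPES = {"Greedy", "BeamSearch", "LMBeamSearch"}


def build_form_fields(num_speakers=None, word_timestamps=False,
                      speaking_rate=False, decoder_type="LMBeamSearch",
                      word_list=None):
    """Return the optional form fields as strings, checking documented limits."""
    if decoder_type not in DECODER_TYPES:
        raise ValueError(f"decoder_type must be one of {sorted(DECODER_TYPES)}")
    fields = {
        "word_timestamps": "true" if word_timestamps else "false",
        "speaking_rate": "true" if speaking_rate else "false",
        "decoder_type": decoder_type,
    }
    if num_speakers is not None:
        if not 1 <= num_speakers <= 4:
            raise ValueError("num_speakers must be between 1 and 4")
        fields["num_speakers"] = str(num_speakers)
    if word_list:
        # Assumption: the list is sent in its string form, e.g. "['word', 'word']".
        fields["word_list"] = str(list(word_list))
    return fields


def transcribe(path, api_key, **options):
    """Upload one audio file and return the raw HTTP response."""
    import requests  # third-party: pip install requests

    with open(path, "rb") as audio:
        return requests.post(
            API_URL,
            headers={"X-API-Key": api_key},
            files={"files": audio},  # `files` is the required binary field
            data=build_form_fields(**options),
            timeout=60,
        )
```

A call such as `transcribe("speech.wav", "YOUR_API_KEY", num_speakers=2, word_timestamps=True)` would then upload the file with speaker diarization and word timestamps requested.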

Responses

200

Returns a list of transcribed text segments corresponding to the speech detected in the audio.

Schema

`status` string Status of the transcription request
`data` object Result data of the transcription
  `results` object[]
    `filename` string Name of the audio file
    `duration` string Total duration of the audio file, in seconds
    `predictions` object[] List of outputs corresponding to speech detected in the audio file
      `transcript` string Transcribed text
      `start` string Start time of the transcribed text, in seconds
      `end` string End time of the transcribed text, in seconds
      `speaker_id` number Speaker ID // optional
      `speaking_rate` string Speaking rate of each transcription // optional
      `word_timestamps` object[] List of time offsets of each word // optional
        `word` string Transcribed word // optional
        `start_time` string Start time offset of the word, in seconds // optional
        `end_time` string End time offset of the word, in seconds // optional
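
A 200 response can be flattened into one row per transcribed segment. The sketch below assumes the fields nest as `data` → `results` → `predictions`, and that `start` and `end` are numeric strings in seconds. The helper name `iter_segments` and every value in `sample` (including the `status` value) are invented for illustration and are not real API output.

```python
def iter_segments(body):
    """Yield (filename, start, end, speaker_id, transcript) per prediction."""
    for result in body.get("data", {}).get("results", []):
        for pred in result.get("predictions", []):
            yield (
                result.get("filename"),
                float(pred["start"]),    # assumed to be a numeric string
                float(pred["end"]),
                pred.get("speaker_id"),  # optional field, None when absent
                pred["transcript"],
            )


# Illustrative payload shaped like the schema; the values are made up.
sample = {
    "status": "success",
    "data": {
        "results": [
            {
                "filename": "meeting.wav",
                "duration": "4.2",
                "predictions": [
                    {"transcript": "hello", "start": "0.0", "end": "1.5",
                     "speaker_id": 0},
                    {"transcript": "good morning", "start": "1.5", "end": "4.2"},
                ],
            }
        ]
    },
}

segments = list(iter_segments(sample))
for filename, start, end, speaker, text in segments:
    print(f"{filename} [{start:.1f}-{end:.1f}] speaker={speaker}: {text}")
```

Optional fields are read with `.get` so that a response without diarization or word timestamps is still handled.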

204

No content | No result of transcription

400

No audio file | Audio file not found, or Bad request | Server cannot or will not process the request

401

Unauthorized | Incorrect X-API-Key, or the X-API-Key does not have access to this model

415

Can't decode [filename] | Unsupported file format
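
One way a client might branch on the status codes listed above is sketched here. The names `TranscriptionError` and `check_status` are hypothetical, and the messages paraphrase this page rather than quoting the server's actual response bodies.

```python
# Short descriptions of the documented status codes (paraphrased).
STATUS_MEANINGS = {
    200: "OK",
    204: "No content: no result of transcription",
    400: "No audio file, or bad request",
    401: "Unauthorized: incorrect X-API-Key or no access to this model",
    415: "Unsupported file format",
}


class TranscriptionError(Exception):
    """Raised for any status other than 200 or 204."""


def check_status(status_code):
    """Return True when a JSON body is expected, False for 204, else raise."""
    if status_code == 200:
        return True
    if status_code == 204:
        return False  # request succeeded but nothing was transcribed
    message = STATUS_MEANINGS.get(status_code, "Unexpected status")
    raise TranscriptionError(status_code, message)
```

Treating 204 as an empty result rather than an error lets a batch job skip silent files without aborting.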