Speech to Text AI Marketplace

Transcribes speech in an audio file into text. Supported file types include mp3, wav, and flac.

Header Parameters
X-API-Key string REQUIRED

Your API key

Request Body
files binary REQUIRED

Audio speech file

num_speakers number

(optional) Number of speakers for diarization, from 1 to 4

word_timestamps string

(optional) true or false // default 'false'

speaking_rate string

(optional) true or false // default 'false'

decoder_type string

(optional) Greedy, BeamSearch, or LMBeamSearch // default 'LMBeamSearch'

word_list string

(optional) List of terminology, e.g. ['word', 'word', ...]
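
A minimal sketch of how the header and optional body parameters above might be assembled before sending. The helper names `build_headers` and `build_form_fields` are hypothetical, and two things are assumptions not confirmed by this page: that the body is sent as multipart form data, and that `word_list` is serialized as a string shaped like the example above. The endpoint URL is not given here, so the sketch stops short of making the request.

```python
ALLOWED_DECODERS = {"Greedy", "BeamSearch", "LMBeamSearch"}


def build_headers(api_key):
    """Return the single required header, X-API-Key."""
    return {"X-API-Key": api_key}


def build_form_fields(num_speakers=None, word_timestamps=False,
                      speaking_rate=False, decoder_type="LMBeamSearch",
                      word_list=None):
    """Validate the optional parameters and return them as string form fields."""
    if num_speakers is not None and not 1 <= num_speakers <= 4:
        raise ValueError("num_speakers must be between 1 and 4")
    if decoder_type not in ALLOWED_DECODERS:
        raise ValueError(f"unknown decoder_type: {decoder_type!r}")

    fields = {
        # The page documents these flags as the strings 'true' / 'false'
        "word_timestamps": "true" if word_timestamps else "false",
        "speaking_rate": "true" if speaking_rate else "false",
        "decoder_type": decoder_type,
    }
    if num_speakers is not None:
        fields["num_speakers"] = str(num_speakers)
    if word_list:
        # Assumption: the list is sent as a string shaped like ['word', 'word']
        fields["word_list"] = str(list(word_list))
    return fields
```

With an HTTP client such as the third-party `requests` library, these could then be passed as `headers=build_headers(key)`, `data=fields`, and `files={"files": open(path, "rb")}` to a POST against the endpoint URL for this model.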

Responses
200

Returns a list of transcribed text segments corresponding to speech detected in the audio.

Schema
status string

Status of transcription request

data object

Result data of transcription

results object[]
filename string

Name of audio file

duration string

Total duration (sec) of the audio file

predictions object[]

List of outputs corresponding to speech detected in the audio file

transcript string

Transcribed text

start string

Start time(sec) of transcribed text

end string

End time(sec) of transcribed text

speaker_id number

Speaker ID // optional

speaking_rate string

Speaking rate of each transcription // optional

word_timestamps object[]

List of time offsets of each word // optional

word string

Transcribed word // optional

start_time string

Start time(sec) offset of word // optional

end_time string

End time(sec) offset of word // optional
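
As an illustration of the schema above, the sketch below walks a response and yields one record per prediction. The sample payload is invented for the example (including the `"success"` status value), and the nesting of `results`, `predictions`, and `word_timestamps` is inferred from the field order on this page rather than confirmed. The helper name `iter_segments` is hypothetical. Times are documented as strings, so they are converted to floats here.

```python
import json

# Hypothetical payload built from the schema; all values are made up.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "results": [
      {
        "filename": "meeting.wav",
        "duration": "4.2",
        "predictions": [
          {"transcript": "hello there", "start": "0.0", "end": "1.5", "speaker_id": 0},
          {"transcript": "good morning", "start": "1.8", "end": "4.0", "speaker_id": 1}
        ]
      }
    ]
  }
}
"""


def iter_segments(payload):
    """Yield one flat dict per prediction across all results."""
    for result in payload.get("data", {}).get("results", []):
        for pred in result.get("predictions", []):
            yield {
                "filename": result.get("filename"),
                "transcript": pred.get("transcript", ""),
                "start": float(pred["start"]),
                "end": float(pred["end"]),
                # Optional field: present only when diarization was requested
                "speaker_id": pred.get("speaker_id"),
            }


segments = list(iter_segments(json.loads(SAMPLE_RESPONSE)))
```

Optional fields such as `speaker_id`, `speaking_rate`, and `word_timestamps` are read with `.get()` so that a response without them does not raise.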

204

No content | No result of transcription

400

No audio file | Audio file not found, or Bad request | Server cannot or will not process the request

401

Unauthorized | Incorrect X-API-Key, or X-API-Key does not have access to this model

415

Can't decode [filename] | Unsupported file format
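
One way a client might branch on the response codes listed above is sketched here. The function name `interpret_status` and the message wording are illustrative, not part of the API. Only 200 is treated as carrying a transcription body, since 204 means no content.

```python
STATUS_MEANINGS = {
    200: "OK: transcription returned",
    204: "No content: no result of transcription",
    400: "Bad request: audio file missing or request cannot be processed",
    401: "Unauthorized: X-API-Key incorrect or lacks access to this model",
    415: "Unsupported file format: audio could not be decoded",
}


def interpret_status(code):
    """Return (has_body, message) for a response status code."""
    message = STATUS_MEANINGS.get(code, f"Unexpected status {code}")
    # Only a 200 response is expected to contain parsable results
    return code == 200, message
```

Checking `has_body` before parsing avoids trying to decode JSON from an empty 204 response.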