Speech to Text
Convert audio to text using speech recognition
Convert audio or video files to text using speech recognition. Supports various audio and video formats.
Form Fields
Audio or video file to transcribe (mutually exclusive with url).
URL of the audio or video file to transcribe (mutually exclusive with audio). Example: "https://example.com/media.mp4"
Speech recognition model to use. Example: "onnx-community/whisper-large-v3-turbo_timestamped"
Target language for the transcription output. Example: "pt"
Whether to return timestamps for each transcribed segment. Example: true
Whether to generate WebVTT caption output (requires returnTimestamps=true). Example: true
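The constraints between these fields can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own client. The names audio, url, returnTimestamps and webvtt come from the descriptions above. The field names "model" and "language", the "true"/"false" string encoding of booleans, and the rule that one of audio or url must be present are assumptions.

```python
from typing import Optional


def build_form_fields(
    audio_path: Optional[str] = None,
    url: Optional[str] = None,
    model: Optional[str] = None,
    language: Optional[str] = None,
    return_timestamps: bool = False,
    webvtt: bool = False,
) -> dict:
    """Validate the documented constraints and return the text form fields."""
    # audio and url are mutually exclusive; requiring one of them is an assumption.
    if (audio_path is None) == (url is None):
        raise ValueError("provide exactly one of audio_path or url")
    # WebVTT output is documented as requiring returnTimestamps=true.
    if webvtt and not return_timestamps:
        raise ValueError("webvtt requires return_timestamps=True")

    fields = {
        # Boolean encoding as "true"/"false" strings is an assumption.
        "returnTimestamps": "true" if return_timestamps else "false",
        "webvtt": "true" if webvtt else "false",
    }
    if url is not None:
        fields["url"] = url
    if model is not None:
        fields["model"] = model  # hypothetical field name
    if language is not None:
        fields["language"] = language  # hypothetical field name
    return fields
```

When audio_path is given, the file itself would be attached as a separate file part (presumably named audio) by whichever HTTP client is used; the dictionary returned here covers only the text fields.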
Response
The transcription result object containing:
- text: The transcribed text content
- chunks: Array of segments with timestamps (when returnTimestamps=true)
- webvtt: WebVTT formatted captions (when webvtt=true and returnTimestamps=true)
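Because chunks and webvtt are only present when the matching options are set, a client should treat them as optional. A minimal sketch of reading the result follows. It assumes the response body is JSON, and it leaves the shape of each chunk opaque since it is not specified here; summarize_result is a hypothetical helper name.

```python
import json


def summarize_result(body: str) -> dict:
    """Read the documented keys from a response body, tolerating absent ones."""
    result = json.loads(body)  # assumes a JSON response body
    # chunks is only present when returnTimestamps=true.
    chunks = result.get("chunks") or []
    return {
        "text": result.get("text", ""),
        "chunk_count": len(chunks),
        # webvtt is only present when webvtt=true and returnTimestamps=true.
        "has_webvtt": bool(result.get("webvtt")),
    }
```

Keeping the optional keys behind `.get` means the same code path can handle both a plain transcription and one requested with timestamps and captions.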
Status Codes
200 OK - Successful request
400 Bad Request - Validation error occurred
401 Unauthorized - Authentication failed
402 Payment Required
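A client might map these outcomes to errors as sketched below. The numeric values are the standard HTTP codes for the listed reason phrases (200, 400, 401, 402). TranscriptionError and raise_for_status are hypothetical names, and the message wording is illustrative.

```python
class TranscriptionError(Exception):
    """Raised for any non-success status (hypothetical error type)."""


# Standard HTTP codes for the documented reason phrases.
_MESSAGES = {
    400: "Bad Request: validation error, check the form fields",
    401: "Unauthorized: authentication failed",
    402: "Payment Required",
}


def raise_for_status(code: int) -> None:
    """Return quietly on 200 OK, otherwise raise with a descriptive message."""
    if code == 200:
        return
    detail = _MESSAGES.get(code, "unexpected status")
    raise TranscriptionError(f"{code} {detail}")
```

Codes outside the documented set fall through to a generic message rather than being silently accepted.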