POST /v1/speech-to-text
# Example with audio file
curl -X POST https://api.woolball.xyz/v1/speech-to-text \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "audio=@input.wav;type=audio/wav" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "outputLanguage=pt" \
  -F "returnTimestamps=true" \
  -F "webvtt=true"

# Example with video file (using audio field)
curl -X POST https://api.woolball.xyz/v1/speech-to-text \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "audio=@input.mp4;type=video/mp4" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "outputLanguage=pt" \
  -F "returnTimestamps=true"

# Example with URL
curl -X POST https://api.woolball.xyz/v1/speech-to-text \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "url=https://example.com/media.mp4" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "outputLanguage=pt"
{
  "data": {
    "text": "Transcribed text content",
    "chunks": [
      {
        "timestamp": [0, 2.5],
        "text": "First segment"
      },
      {
        "timestamp": [2.5, 5.0],
        "text": "Second segment"
      }
    ],
    "webvtt": "WEBVTT\n\n00:00:00.000 --> 00:00:02.500\nFirst segment\n\n00:00:02.500 --> 00:00:05.000\nSecond segment"
  }
}
Convert audio or video files to text using speech recognition. Supports various audio and video formats.

Form Fields

audio
string
Audio or video file to transcribe (mutually exclusive with url)
url
string
URL to audio or video file to transcribe (mutually exclusive with audio) Example: "https://example.com/media.mp4"
model
string
Speech recognition model to use Example: "onnx-community/whisper-large-v3-turbo_timestamped"
outputLanguage
string
Target language for the transcription output Example: "pt"
returnTimestamps
boolean
Whether to return timestamps for each transcribed segment Example: true
webvtt
boolean
Generate WebVTT caption output (requires returnTimestamps=true) Example: true
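The same multipart request shown in the curl examples can be assembled from another language. The sketch below uses Python's `requests` library to build (without sending) the POST with the `audio`, `model`, `outputLanguage`, and `returnTimestamps` fields; there is no official SDK implied here, and the field names and endpoint are taken directly from the examples above.

```python
import io
import requests

API_URL = "https://api.woolball.xyz/v1/speech-to-text"

def build_transcription_request(api_key: str, audio: bytes, filename: str,
                                content_type: str = "audio/wav") -> requests.PreparedRequest:
    """Assemble (but do not send) the multipart POST from the curl examples."""
    req = requests.Request(
        "POST",
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        # The file goes in the `audio` form field, as in `-F "audio=@input.wav"`.
        files={"audio": (filename, io.BytesIO(audio), content_type)},
        data={
            "model": "onnx-community/whisper-large-v3-turbo_timestamped",
            "outputLanguage": "pt",
            "returnTimestamps": "true",
        },
    )
    return req.prepare()

# Dummy bytes stand in for real audio; send with requests.Session().send(prepared).
prepared = build_transcription_request("<YOUR_API_KEY>", b"\x00" * 16, "input.wav")
```

To actually call the API, pass the prepared request to `requests.Session().send(...)` and read the JSON body from the response.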

Response

data
object
The transcription result object containing:
  • text: The transcribed text content
  • chunks: Array of segments with timestamps (when returnTimestamps=true)
  • webvtt: WebVTT formatted captions (when webvtt=true and returnTimestamps=true)
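When `webvtt=true` the API returns captions for you, but the `chunks` array is enough to build WebVTT cues client-side as well. The sketch below converts the sample response shown above into a WebVTT document; the helper names are illustrative, not part of the API.

```python
import json

def seconds_to_vtt(t: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    h, rem = divmod(t, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def chunks_to_webvtt(chunks: list) -> str:
    """Build a WebVTT document from the `chunks` array of the response."""
    cues = []
    for c in chunks:
        start, end = c["timestamp"]
        cues.append(f"{seconds_to_vtt(start)} --> {seconds_to_vtt(end)}\n{c['text']}")
    return "WEBVTT\n\n" + "\n\n".join(cues)

# The sample response body from this page:
sample = json.loads('''{"data": {"text": "Transcribed text content",
  "chunks": [{"timestamp": [0, 2.5], "text": "First segment"},
             {"timestamp": [2.5, 5.0], "text": "Second segment"}]}}''')
vtt = chunks_to_webvtt(sample["data"]["chunks"])
```

Running this on the sample response reproduces the `webvtt` string shown in the example output.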

Status Codes

200
object
OK - Successful request
400
object
Bad Request - Validation error occurred
401
object
Unauthorized - Authentication failed
402
object
Payment Required