POST /v1/speech-to-text
# Example with audio file
curl -X POST https://api.woolball.xyz/v1/speech-to-text \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "audio=@input.wav;type=audio/wav" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "outputLanguage=pt" \
  -F "returnTimestamps=true" \
  -F "webvtt=true"

# Example with video file (using audio field)
curl -X POST https://api.woolball.xyz/v1/speech-to-text \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "audio=@input.mp4;type=video/mp4" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "outputLanguage=pt" \
  -F "returnTimestamps=true"

# Example with URL
curl -X POST https://api.woolball.xyz/v1/speech-to-text \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "url=https://example.com/media.mp4" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "outputLanguage=pt"
{
  "data": {
    "text": "Transcribed text content",
    "chunks": [
      {
        "timestamp": [0, 2.5],
        "text": "First segment"
      },
      {
        "timestamp": [2.5, 5.0],
        "text": "Second segment"
      }
    ],
    "webvtt": "WEBVTT\n\n00:00:00.000 --> 00:00:02.500\nFirst segment\n\n00:00:02.500 --> 00:00:05.000\nSecond segment"
  }
}

Convert audio or video files to text using speech recognition. Supports various audio and video formats.

Form Fields

audio
string

Audio or video file to transcribe (mutually exclusive with url).

url
string

URL of an audio or video file to transcribe (mutually exclusive with audio). Example: "https://example.com/media.mp4"

model
string

Speech recognition model to use. Example: "onnx-community/whisper-large-v3-turbo_timestamped"

outputLanguage
string

Target language for the transcription output. Example: "pt"

returnTimestamps
boolean

Whether to return timestamps for each transcribed segment. Example: true

webvtt
boolean

Generate WebVTT caption output (requires returnTimestamps=true). Example: true
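If you prefer Python to curl, the form fields above map onto a multipart request like the sketch below. `build_request` is a hypothetical helper (not part of the API); the returned keyword arguments are shaped for `requests.post(**kwargs)` if the third-party `requests` library is available.

```python
# Sketch of the multipart form for POST /v1/speech-to-text, mirroring the
# curl examples above. build_request is a hypothetical helper; pass its
# result to an HTTP client, e.g. requests.post(**kwargs).
import io

API_URL = "https://api.woolball.xyz/v1/speech-to-text"

def build_request(api_key, audio, filename="input.wav", content_type="audio/wav"):
    """Assemble keyword arguments for an HTTP client's POST call."""
    return {
        "url": API_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        # "audio" and "url" are mutually exclusive; this sketch uploads a file.
        "files": {"audio": (filename, audio, content_type)},
        "data": {
            "model": "onnx-community/whisper-large-v3-turbo_timestamped",
            "outputLanguage": "pt",
            "returnTimestamps": "true",  # webvtt requires this
            "webvtt": "true",
        },
    }

kwargs = build_request("<YOUR_API_KEY>", io.BytesIO(b"<wav bytes>"))
# import requests; response = requests.post(**kwargs)  # uncomment to send
```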

Response

data
object

The transcription result object containing:

  • text: The transcribed text content
  • chunks: Array of segments with timestamps (when returnTimestamps=true)
  • webvtt: WebVTT formatted captions (when webvtt=true and returnTimestamps=true)
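The timestamps in each chunk are `[start, end]` pairs in seconds. When webvtt=true the API renders these into WebVTT for you, but the same conversion can be sketched locally (assuming the chunk format shown in the example response; `chunks_to_webvtt` is an illustrative helper, not part of the API):

```python
# Sketch: build WebVTT captions from the "chunks" array of a response.
# Assumes each chunk is {"timestamp": [start, end], "text": ...} with
# times in seconds, as in the example response above.

def seconds_to_vtt(t):
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    hours, rem = divmod(t, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{seconds:06.3f}"

def chunks_to_webvtt(chunks):
    """Join chunks into a WebVTT document: header plus one cue per chunk."""
    cues = [
        f"{seconds_to_vtt(c['timestamp'][0])} --> "
        f"{seconds_to_vtt(c['timestamp'][1])}\n{c['text']}"
        for c in chunks
    ]
    return "WEBVTT\n\n" + "\n\n".join(cues)

chunks = [
    {"timestamp": [0, 2.5], "text": "First segment"},
    {"timestamp": [2.5, 5.0], "text": "Second segment"},
]
print(chunks_to_webvtt(chunks))
# Prints the same WebVTT string as the "webvtt" field in the example response.
```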

Status Codes

200
object

OK - Successful request

400
object

Bad Request - Validation error occurred

401
object

Unauthorized - Authentication failed

402
object

Payment Required
