POST /v1/speech-to-text
# Example with audio file
curl -X POST https://api.woolball.xyz/v1/speech-to-text \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "audio=@input.wav;type=audio/wav" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "outputLanguage=pt" \
  -F "returnTimestamps=true" \
  -F "webvtt=true"

# Example with video file (using audio field)
curl -X POST https://api.woolball.xyz/v1/speech-to-text \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "audio=@input.mp4;type=video/mp4" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "outputLanguage=pt" \
  -F "returnTimestamps=true"

# Example with URL
curl -X POST https://api.woolball.xyz/v1/speech-to-text \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "url=https://example.com/media.mp4" \
  -F "model=onnx-community/whisper-large-v3-turbo_timestamped" \
  -F "outputLanguage=pt"
{
  "data": {
    "text": "Transcribed text content",
    "chunks": [
      {
        "timestamp": [0, 2.5],
        "text": "First segment"
      },
      {
        "timestamp": [2.5, 5.0],
        "text": "Second segment"
      }
    ],
    "webvtt": "WEBVTT\n\n00:00:00.000 --> 00:00:02.500\nFirst segment\n\n00:00:02.500 --> 00:00:05.000\nSecond segment"
  }
}

Convert audio or video files to text using speech recognition. Supports various audio and video formats.

Form Fields

audio
string

Audio or video file to transcribe (mutually exclusive with url).

url
string

URL of an audio or video file to transcribe (mutually exclusive with audio). Example: "https://example.com/media.mp4"

model
string

Speech recognition model to use. Example: "onnx-community/whisper-large-v3-turbo_timestamped"

outputLanguage
string

Target language for the transcription output. Example: "pt"

returnTimestamps
boolean

Whether to return timestamps for each transcribed segment. Example: true

webvtt
boolean

Generate WebVTT caption output (requires returnTimestamps=true). Example: true
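If you prefer Python to curl, the form fields above map onto a multipart request like the sketch below. `build_request` is a hypothetical helper (not part of the API); the returned keyword arguments are shaped for `requests.post(**kwargs)` if the third-party `requests` library is available.

```python
# Sketch of the multipart form for POST /v1/speech-to-text, mirroring the
# curl examples above. build_request is a hypothetical helper; pass its
# result to an HTTP client, e.g. requests.post(**kwargs).
import io

API_URL = "https://api.woolball.xyz/v1/speech-to-text"

def build_request(api_key, audio, filename="input.wav", content_type="audio/wav"):
    """Assemble keyword arguments for an HTTP client's POST call."""
    return {
        "url": API_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        # "audio" and "url" are mutually exclusive; this sketch uploads a file.
        "files": {"audio": (filename, audio, content_type)},
        "data": {
            "model": "onnx-community/whisper-large-v3-turbo_timestamped",
            "outputLanguage": "pt",
            "returnTimestamps": "true",  # webvtt requires this
            "webvtt": "true",
        },
    }

kwargs = build_request("<YOUR_API_KEY>", io.BytesIO(b"<wav bytes>"))
# import requests; response = requests.post(**kwargs)  # uncomment to send
```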

Response

data
object

The transcription result object containing:

  • text: The transcribed text content
  • chunks: Array of segments with timestamps (when returnTimestamps=true)
  • webvtt: WebVTT formatted captions (when webvtt=true and returnTimestamps=true)
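The timestamps in each chunk are `[start, end]` pairs in seconds. When webvtt=true the API renders these into WebVTT for you, but the same conversion can be sketched locally (assuming the chunk format shown in the example response; `chunks_to_webvtt` is an illustrative helper, not part of the API):

```python
# Sketch: build WebVTT captions from the "chunks" array of a response.
# Assumes each chunk is {"timestamp": [start, end], "text": ...} with
# times in seconds, as in the example response above.

def seconds_to_vtt(t):
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    hours, rem = divmod(t, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{seconds:06.3f}"

def chunks_to_webvtt(chunks):
    """Join chunks into a WebVTT document: header plus one cue per chunk."""
    cues = [
        f"{seconds_to_vtt(c['timestamp'][0])} --> "
        f"{seconds_to_vtt(c['timestamp'][1])}\n{c['text']}"
        for c in chunks
    ]
    return "WEBVTT\n\n" + "\n\n".join(cues)

chunks = [
    {"timestamp": [0, 2.5], "text": "First segment"},
    {"timestamp": [2.5, 5.0], "text": "Second segment"},
]
print(chunks_to_webvtt(chunks))
# Prints the same WebVTT string as the "webvtt" field in the example response.
```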

Status Codes

200
object

OK - Successful request

400
object

Bad Request - Validation error occurred

401
object

Unauthorized - Authentication failed

402
object

Payment Required
