Soniqo Cloud · Managed API

Managed transcription.
With speakers attributed.

Upload audio or stream a microphone — get back a timestamped transcript with each speaker labelled. Built on the same models as our open-source SDK, hosted for when you don’t want to ship the weights yourself.

$20 starter + $20/month free credit · 1¢ per audio-second · or run the OSS library free

What's included

Everything you need to ship voice features.

Speech-to-text, speaker separation, speaker identification, and OpenAI-compatible APIs — bundled into one price.

Batch and realtime, one API

Upload an audio file for batch transcription, or stream a WebSocket for live captioning. Same authentication, same billing, same speaker handling.

Diarization, not just transcription

Every utterance comes back attributed to a speaker. Register speaker profiles once and get stable named identities across meetings and call recordings.

Multilingual coverage

Parakeet TDT for 25 European languages plus Meta Omnilingual for the long tail (1,672 languages including Hindi, Arabic, Indonesian, Vietnamese). Auto-detect or explicit language hint.

Pay per audio-second

1¢ per audio-second on the published tier ($0.60/min for everything: speech-to-text, diarization, and speaker ID bundled). Volume discounts for paid tiers; enterprise SLAs available. No minimum commitment; each call is billed at least 1¢.

OpenAI-Whisper drop-in compatible

Existing code written against the OpenAI Whisper API works against Soniqo by changing one configuration line (the base URL). No client rewrite, no SDK migration.

API keys + OAuth

Sign in with Google or GitHub for the console, or generate long-lived API keys for server-to-server traffic. Keys are stored Argon2-hashed and can be revoked at any time.

At-least-once jobs

Durable queue with explicit acknowledgement and idempotency keys. A worker preemption never loses or double-bills a job — the ledger is the source of truth.
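A minimal client-side sketch of the idempotency idea, in Python. This page does not say how a key is passed to the API, so the "Idempotency-Key" header name and both helper functions below are assumptions for illustration, not the documented interface. The key is derived from the audio bytes and the request parameters, so a retried upload of the same file produces the same key.

```python
import hashlib
import json


def idempotency_key(audio_bytes: bytes, params: dict) -> str:
    """Derive a stable key from the audio content and the request parameters."""
    audio_digest = hashlib.sha256(audio_bytes).hexdigest()
    # sort_keys makes the serialization independent of dict insertion order.
    canonical_params = json.dumps(params, sort_keys=True, separators=(",", ":"))
    combined = f"{audio_digest}\n{canonical_params}".encode("utf-8")
    return hashlib.sha256(combined).hexdigest()


def request_headers(api_key: str, key: str) -> dict:
    """Build request headers; the "Idempotency-Key" name is hypothetical."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": key,
    }
```

Deriving the key from content rather than generating a random one means a client that crashes and restarts still sends the same key on retry. Whether the service expects client-supplied keys at all should be checked against the API reference.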

Your data stays yours

Transcripts and speaker profiles are tied to your account and exportable via the API. Self-serve account deletion. See the Privacy Policy for details.

Quickstart

One curl call to transcribe an audio file.

After signing in you'll have a starter balance and one API key. The endpoint accepts an HTTP multipart upload; here's the smallest possible call.

# Batch transcription with diarization
curl -X POST https://api.soniqo.audio/v1/transcribe \
  -H "Authorization: Bearer $SONIQO_API_KEY" \
  -F "audio=@meeting.wav"

# Returns a job id; poll /v1/transcribe/<id> for the transcript.
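The polling step can be sketched in Python with the standard library only. The base URL and the /v1/transcribe/<id> path come from the call above. Everything about the response is an assumption: this page does not show the job payload, so the sketch assumes a JSON body and leaves the "is it finished?" check to a caller-supplied function instead of guessing field names.

```python
import json
import time
import urllib.request

API_BASE = "https://api.soniqo.audio/v1"  # base URL from the quickstart


def job_url(job_id: str) -> str:
    """Build the polling URL /v1/transcribe/<id>."""
    return f"{API_BASE}/transcribe/{job_id}"


def fetch_job(job_id: str, api_key: str) -> dict:
    """GET the job resource. Assumes the body is JSON (not confirmed here)."""
    req = urllib.request.Request(
        job_url(job_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_transcript(job_id, fetch, is_done,
                        interval=2.0, max_polls=150, sleep=time.sleep):
    """Call fetch(job_id) until is_done(job) is true; raise after max_polls."""
    for attempt in range(max_polls):
        job = fetch(job_id)
        if is_done(job):
            return job
        if attempt < max_polls - 1:
            sleep(interval)  # fixed delay between polls
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


# Hypothetical usage: the "status" field and "completed" value are guesses.
# job = wait_for_transcript(
#     job_id,
#     fetch=lambda j: fetch_job(j, api_key),
#     is_done=lambda job: job.get("status") == "completed",
# )
```

Passing fetch and is_done as arguments keeps the loop independent of the real response schema, which should be taken from the API reference.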

Already using the OpenAI SDK? Change the base URL and your existing code works:

# Drop-in OpenAI-Whisper-compatible endpoint
from openai import OpenAI

client = OpenAI(
    base_url="https://api.soniqo.audio/v1",
    api_key="<your-soniqo-api-key>",
)

with open("meeting.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
    )
print(transcript.text)

Full quickstart · Browse the API reference

Pricing

One number covers everything.

Free tier

$20

starter credit + $20/month recurring. Enough for most developers to evaluate without paying.

Pay as you go

1¢ / audio-second

$0.60/min. Includes speech-to-text, speaker separation, speaker identification. Rounded up, minimum 1¢ per call.
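The arithmetic can be sketched as a small estimator. It assumes "rounded up" means rounding the audio duration up to the next whole second, which this page does not state explicitly; the function names are illustrative and the figures are estimates, not invoices.

```python
import math

CENTS_PER_AUDIO_SECOND = 1  # published pay-as-you-go rate
MIN_CENTS_PER_CALL = 1      # per-call minimum from the pricing text


def call_cost_cents(audio_seconds: float) -> int:
    """Estimate one call's cost in cents.

    Assumption: duration is rounded up to the next whole second.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    billed_seconds = math.ceil(audio_seconds)
    return max(MIN_CENTS_PER_CALL, billed_seconds * CENTS_PER_AUDIO_SECOND)


def seconds_covered(credit_dollars: float) -> int:
    """Whole audio-seconds a credit balance pays for at the published rate."""
    credit_cents = int(round(credit_dollars * 100))
    return credit_cents // CENTS_PER_AUDIO_SECOND


print(call_cost_cents(60))    # one minute of audio: 60 cents
print(call_cost_cents(90.2))  # rounds up to 91 billed seconds
print(seconds_covered(20))    # the $20 credit covers 2000 seconds
```

Under these assumptions the $20 starter credit covers 2,000 audio-seconds, or 33 minutes 20 seconds of single-call audio.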

Enterprise

Custom

Volume tiers, on-premise install, custom SLAs, per-customer model fine-tuning. Contact us.

Or run it yourself

Same models, same API — for free.

The cloud is for convenience. If you want full control or zero per-minute cost, run the open-source library on your own infrastructure. Apache 2.0, no per-minute pricing, nothing leaves your machines.

speech-swift

The same speech recognition, speaker separation, and voice synthesis models, packaged for Apple platforms. Swift API, Homebrew install, Apache 2.0.

GitHub · soniqo.audio

speech-core

A cross-platform C++ engine for Linux, Windows, macOS, and Android — speech-to-text, diarization, speaker ID, and realtime streaming (Nemotron Speech Streaming) on CPU. Apache 2.0.

GitHub
