Quickstart
Soniqo Cloud is wire-compatible with OpenAI’s Whisper API, so the official OpenAI SDKs work once you override the base URL. Speaker diarization arrives as an additive x_soniqo field on the response; vanilla clients ignore it.
1. Get an API key
Sign in with Google or GitHub, then go to Account → API keys and click Create new key. The plaintext is shown once; copy it immediately. Save it as a shell env var:
export SONIQO_API_KEY=sk_...
All examples below use this key as the bearer credential. New accounts get $20 in free credits, refreshed monthly.
2. Transcribe a file
Python (OpenAI SDK)
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SONIQO_API_KEY"],
    base_url="https://cloud.soniqo.audio/v1",
)

with open("meeting.wav", "rb") as f:
    result = client.audio.transcriptions.create(
        model="parakeet-tdt",
        file=f,
        response_format="verbose_json",
    )

print(result.text)

# Speaker turns are in the response's `x_soniqo` field.
for seg in result.segments or []:
    speaker = getattr(seg, "x_speaker_id", None)
    print(f"[{speaker or '?'}] {seg.text}")
Node.js (OpenAI SDK)
import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({
  apiKey: process.env.SONIQO_API_KEY,
  baseURL: "https://cloud.soniqo.audio/v1",
});

const result = await client.audio.transcriptions.create({
  model: "parakeet-tdt",
  file: fs.createReadStream("meeting.wav"),
  response_format: "verbose_json",
});

console.log(result.text);
for (const seg of result.segments) {
  console.log(`[${seg.x_speaker_id ?? "?"}] ${seg.text}`);
}
curl
curl -X POST https://cloud.soniqo.audio/v1/audio/transcriptions \
  -H "Authorization: Bearer $SONIQO_API_KEY" \
  -F "file=@meeting.wav" \
  -F "model=parakeet-tdt" \
  -F "response_format=verbose_json"
ffmpeg ships in the next release; until then, convert with ffmpeg -i in.mp3 -f f32le -ar 16000 -ac 1 out.pcm.
3. Response shapes
Pass response_format to control the output. All five formats from OpenAI’s spec are supported.
| Format | Body | Use case |
|---|---|---|
| json (default) | {"text": "..."} | just the transcript |
| text | plain text | shell pipelines |
| verbose_json | segments + words + x_soniqo speaker turns | app integration; the rich one |
| srt | SubRip subtitle file | video subtitles |
| vtt | WebVTT subtitle file | browser-native subtitles |
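As an illustration of working with verbose_json, the sketch below merges consecutive segments from the same speaker into turns. It assumes each segment carries an x_speaker_id attribute, as the loops in the transcription examples read it. The sample payload and the speaker labels "S1" and "S2" are made up for the example and are not real API output.

```python
from itertools import groupby


def speaker_turns(segments):
    """Merge consecutive segments that share a speaker into one turn.

    `segments` is a list of dicts shaped like verbose_json segments.
    The `x_speaker_id` key is assumed from the examples in this guide.
    """
    turns = []
    for speaker, group in groupby(segments, key=lambda s: s.get("x_speaker_id")):
        text = " ".join(s["text"].strip() for s in group)
        turns.append((speaker or "?", text))
    return turns


# Hypothetical payload, for illustration only.
sample = {
    "text": "Hi there. How are you? Fine, thanks.",
    "segments": [
        {"start": 0.0, "end": 1.1, "text": " Hi there.", "x_speaker_id": "S1"},
        {"start": 1.1, "end": 2.0, "text": " How are you?", "x_speaker_id": "S1"},
        {"start": 2.3, "end": 3.4, "text": " Fine, thanks.", "x_speaker_id": "S2"},
    ],
}

for speaker, text in speaker_turns(sample["segments"]):
    print(f"[{speaker}] {text}")
```

With the OpenAI SDK the segments are objects, not dicts, so a real integration would convert them first (for example with the model's dict export) or switch the lookups to getattr.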
4. Async for long audio
The Whisper-compatible synchronous endpoint blocks for up to 5 minutes and returns 504 if the job runs past that. For longer audio, use Soniqo’s async API: it returns a job_id immediately, and you poll for the result.
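A minimal Python sketch of the polling half of that flow. It uses the same /v1/transcribe/{job_id} URL, X-Api-Key header, and completed/failed statuses as the shell version in this section. The helper names wait_for_job and fetch_status, and the overall timeout, are illustrative additions and not part of the Soniqo API.

```python
import json
import os
import time
import urllib.request


def fetch_status(job_id):
    """GET the job record and return the decoded JSON body as a dict."""
    req = urllib.request.Request(
        f"https://cloud.soniqo.audio/v1/transcribe/{job_id}",
        headers={"X-Api-Key": os.environ["SONIQO_API_KEY"]},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(fetch, job_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job is 'completed' (return body) or 'failed' (raise).

    `fetch` is any callable taking a job_id and returning a dict, which
    keeps the loop testable without a network. The timeout is an added
    safeguard; any other status is treated as "still running".
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch(job_id)
        status = body.get("status")
        if status == "completed":
            return body
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still '{status}' after {timeout}s")
        sleep(interval)


# Usage, once a job has been submitted and its job_id is known:
#   result = wait_for_job(fetch_status, job_id)
```

Passing the fetcher in as an argument is a design choice for the sketch: the loop can be exercised with a stub, and the HTTP client can be swapped without touching the polling logic.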
# 1. submit
JOB=$(curl -sX POST https://cloud.soniqo.audio/v1/transcribe \
  -H "X-Api-Key: $SONIQO_API_KEY" \
  -F audio=@meeting.pcm | jq -r .job_id)
# 2. poll
while true; do
  RES=$(curl -s "https://cloud.soniqo.audio/v1/transcribe/$JOB" \
    -H "X-Api-Key: $SONIQO_API_KEY")
  STATUS=$(echo "$RES" | jq -r .status)
  case "$STATUS" in
    completed) echo "$RES" | jq .; break ;;
    failed) echo "$RES" >&2; exit 1 ;;
    *) sleep 2 ;;
  esac
done
5. Realtime streaming
A WebSocket endpoint at wss://cloud.soniqo.audio/v1/realtime?intent=transcription streams partial transcripts as you send audio. It speaks the same event vocabulary as OpenAI Realtime (session.update, input_audio_buffer.append, conversation.item.input_audio_transcription.delta), so existing OpenAI Realtime clients work by changing the URL. Streaming recognition runs on Nemotron Speech Streaming.
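The sketch below shows what the two data-carrying events look like on the wire, without opening a socket. It assumes the payload shapes follow OpenAI Realtime: a base64 "audio" field on input_audio_buffer.append and a "delta" text field on the transcription delta event. The helper names and the 3200-byte chunk size are arbitrary choices for illustration, and the audio encoding Soniqo expects is not stated here, so treat that as something to confirm.

```python
import base64
import json

DELTA = "conversation.item.input_audio_transcription.delta"


def append_events(pcm: bytes, chunk_size: int = 3200):
    """Yield one input_audio_buffer.append message per chunk of raw audio.

    Field names follow the OpenAI Realtime event shape (assumed to carry
    over unchanged); chunk_size is an arbitrary illustrative value.
    """
    for start in range(0, len(pcm), chunk_size):
        chunk = pcm[start:start + chunk_size]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


def collect_transcript(messages):
    """Concatenate the text of transcription delta events, ignoring the rest."""
    parts = []
    for raw in messages:
        event = json.loads(raw)
        if event.get("type") == DELTA:
            parts.append(event.get("delta", ""))
    return "".join(parts)
```

In a real client, each string from append_events would be sent over the WebSocket, and incoming messages would be fed to logic like collect_transcript as they arrive.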
6. Pricing & limits
- Free tier: $20 in starter credits + $20 monthly. 1¢ per audio-second.
- Synchronous endpoint: 5 minute wall-clock cap; longer audio uses the async path.
- Soft rate limit: 1000 req/min per IP. Higher quotas via support.
- Concurrent jobs scale with your billing class; free tier is best-effort.
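To make the pricing concrete, here is a small worked calculation using the figures listed above (1¢ per audio-second, $20 in monthly credits). Rounding partial seconds up is an assumption of this sketch; the list does not say how fractional seconds are billed.

```python
import math

PRICE_CENTS_PER_SECOND = 1        # "1¢ per audio-second"
MONTHLY_CREDIT_CENTS = 20 * 100   # "$20 monthly"


def cost_cents(audio_seconds: float) -> int:
    """Cost of one file in cents, rounding partial seconds up (assumption)."""
    return math.ceil(audio_seconds) * PRICE_CENTS_PER_SECOND


def seconds_covered(credit_cents: int = MONTHLY_CREDIT_CENTS) -> int:
    """How many audio-seconds a given credit balance pays for."""
    return credit_cents // PRICE_CENTS_PER_SECOND


print(cost_cents(90))      # 90 cents for a 90-second clip
print(seconds_covered())   # 2000 seconds, a little over 33 minutes
```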
7. What’s different from OpenAI
- Speaker diarization is built in. Set response_format=verbose_json and read x_soniqo for speaker turns.
- Speaker identification is opt-in. Register profiles via POST /v1/speakers, then pass identify_speakers=true to label them in the transcript.
- The pipeline is multilingual. 25 languages via Parakeet TDT plus the Omnilingual fallback for the long tail. Hint with language=es or auto-detect by leaving it empty.
- Self-hosted SDKs available. The same models run on-device under Apache 2.0 if you would rather not go through a cloud at all: speech-swift for Apple platforms, and speech-core, a cross-platform C++ engine for Linux, Windows, macOS, and Android with the same speech-to-text, diarization, speaker ID, and realtime streaming.
