Transcription

Red5 Pro can extract audio from live streams in real time and forward it to a configured WebSocket endpoint. The audio is automatically converted to 16-bit PCM at 16 kHz, mono, and delivered in 40 ms chunks suitable for Voice Activity Detection (VAD) processing.
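The chunk size follows directly from the format: 16,000 samples per second × 0.040 s = 640 samples per chunk, and at 2 bytes per sample that is 1,280 bytes. The Python sketch below decodes one chunk and applies a simple RMS energy threshold as a stand-in for a real VAD. It assumes signed little-endian samples and one chunk per WebSocket binary message; neither detail is stated on this page, and `is_speech` and its threshold are hypothetical.

```python
import math
import struct

SAMPLE_RATE_HZ = 16_000   # 16 kHz mono, per the documented format
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHUNK_MS = 40             # documented chunk duration

# 16000 samples/s * 0.040 s = 640 samples; 640 * 2 bytes = 1280 bytes
SAMPLES_PER_CHUNK = SAMPLE_RATE_HZ * CHUNK_MS // 1000
BYTES_PER_CHUNK = SAMPLES_PER_CHUNK * BYTES_PER_SAMPLE


def decode_chunk(chunk: bytes) -> tuple[int, ...]:
    """Decode raw PCM bytes into signed 16-bit samples.

    Assumption: samples are little-endian ("<h"); verify against your setup.
    """
    if len(chunk) % BYTES_PER_SAMPLE:
        raise ValueError("chunk length must be a multiple of 2 bytes")
    count = len(chunk) // BYTES_PER_SAMPLE
    return struct.unpack(f"<{count}h", chunk)


def rms(samples) -> float:
    """Root-mean-square amplitude of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(chunk: bytes, threshold: float = 500.0) -> bool:
    """Hypothetical energy-based VAD: flags a chunk as speech when its
    RMS amplitude exceeds a fixed threshold. Production systems usually
    use a trained VAD model instead."""
    return rms(decode_chunk(chunk)) > threshold


if __name__ == "__main__":
    silence = bytes(BYTES_PER_CHUNK)
    # Square wave at +/-2000, packed as little-endian int16
    tone = struct.pack(f"<{SAMPLES_PER_CHUNK}h",
                       *([2000, -2000] * (SAMPLES_PER_CHUNK // 2)))
    print(SAMPLES_PER_CHUNK, BYTES_PER_CHUNK)    # 640 1280
    print(is_speech(silence), is_speech(tone))  # False True
```

A fixed energy threshold is only a simple baseline; it keeps the example dependency-free and shows how the 40 ms framing maps onto per-chunk decisions.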

This makes it possible to build live text transcription with AI speech-to-text models and services such as NVIDIA Parakeet, OpenAI Whisper, and Deepgram.

Enabling Transcription

Edit the red5.properties file located in the conf directory of your Red5 Pro server installation. Add or modify the following properties:

transcription.active=true
transcription.endpoint=YOUR_WEBSOCKET_ENDPOINT

When configured through red5.properties, transcription will be enabled automatically for every stream on the server.
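Before restarting the server, it can help to sanity-check these two properties. The Python sketch below reads red5.properties as simple key=value lines (it does not implement the full Java .properties syntax, such as line continuations or escapes) and reports whether transcription is active and whether the endpoint uses a ws:// or wss:// scheme. `parse_properties` and `check_transcription_config` are hypothetical helpers, not part of Red5 Pro.

```python
from urllib.parse import urlparse


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments.

    Simplification: ignores Java .properties features such as
    line continuations, ':' separators, and escape sequences.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_transcription_config(text: str) -> list[str]:
    """Hypothetical checker: returns a list of problems (empty if OK)."""
    props = parse_properties(text)
    problems = []
    # transcription.active defaults to false when absent
    if props.get("transcription.active", "false").lower() != "true":
        problems.append("transcription.active is not set to true")
    endpoint = props.get("transcription.endpoint", "")
    if urlparse(endpoint).scheme not in ("ws", "wss"):
        problems.append("transcription.endpoint is not a ws:// or wss:// URL")
    return problems


if __name__ == "__main__":
    sample = """
    # Transcription settings
    transcription.active=true
    transcription.endpoint=ws://120.1.1.10/ws
    """
    print(check_transcription_config(sample))  # []
```

Leaving the placeholder YOUR_WEBSOCKET_ENDPOINT in place would be reported as a problem, since it has no URL scheme.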

Per-Stream Settings

Whether transcription is active, and which WebSocket endpoint receives the audio, can also be set per stream using connection params or query params.

Example:

ffmpeg -re -stream_loop -1 -i example.mp4 -c:v copy -c:a aac -f flv "rtmp://10.10.10.10/live/stream1?transcription.active=true&transcription.endpoint=ws://120.1.1.10/ws"
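When publish URLs are generated by a script, building the query string programmatically avoids typos in the long property names. The Python sketch below reproduces the URL from the example above; `build_publish_url` is a hypothetical helper. It leaves `:` and `/` unescaped so the endpoint appears exactly as in the example, while characters such as spaces, `&`, and `=` are percent-encoded. Whether the server also accepts a fully percent-encoded endpoint is not stated here, so test before relying on it.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_publish_url(base: str, endpoint: str, active: bool = True,
                      metadata: Optional[str] = None) -> str:
    """Hypothetical helper: append per-stream transcription settings
    to an RTMP publish URL as query parameters."""
    params = {
        "transcription.active": "true" if active else "false",
        "transcription.endpoint": endpoint,
    }
    if metadata is not None:
        params["transcription.metadata"] = metadata
    # Keep ':' and '/' readable, matching the unencoded example URL;
    # other reserved characters are percent-encoded.
    query = urlencode(params, safe=":/", quote_via=quote)
    return f"{base}?{query}"


if __name__ == "__main__":
    print(build_publish_url("rtmp://10.10.10.10/live/stream1",
                            "ws://120.1.1.10/ws"))
    # rtmp://10.10.10.10/live/stream1?transcription.active=true&transcription.endpoint=ws://120.1.1.10/ws
```

The resulting URL can be passed as the output argument to ffmpeg, quoted so the shell does not interpret the `&`.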

Property	Description	Default
transcription.active	Enable or disable transcription for this stream	false
transcription.endpoint	WebSocket URL of the transcription service	(none)
transcription.metadata	Optional metadata string to include with transcription results	(none)
transcription.allow.query.string.overrides	Allow per-stream settings via query params	true

Per-stream settings override the server-wide configuration from red5.properties.

To disable query string overrides for all streams on the server, add the following to red5.properties:

transcription.allow.query.string.overrides=false
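The precedence rules in this section can be summarized as a small merge function. The Python sketch below illustrates the documented behavior and is not Red5 Pro's implementation: `resolve_settings` is hypothetical, it assumes that when `transcription.allow.query.string.overrides` is `false` all query params are ignored, and it assumes the overrides switch itself cannot be changed from a stream URL. Connection params are not modeled.

```python
DEFAULTS = {
    "transcription.active": "false",
    "transcription.allow.query.string.overrides": "true",
}

# Keys a stream may override (assumption: the overrides switch is server-only)
OVERRIDABLE = (
    "transcription.active",
    "transcription.endpoint",
    "transcription.metadata",
)


def resolve_settings(server_props: dict, query_params: dict) -> dict:
    """Hypothetical merge: defaults < red5.properties < stream query params."""
    settings = {**DEFAULTS, **server_props}
    overrides_allowed = (
        settings["transcription.allow.query.string.overrides"].lower() == "true"
    )
    if overrides_allowed:
        for key in OVERRIDABLE:
            if key in query_params:
                settings[key] = query_params[key]
    return settings


if __name__ == "__main__":
    server = {"transcription.active": "true",
              "transcription.endpoint": "ws://asr.internal/ws"}
    query = {"transcription.endpoint": "ws://120.1.1.10/ws"}
    print(resolve_settings(server, query)["transcription.endpoint"])
    # ws://120.1.1.10/ws
    locked = {**server, "transcription.allow.query.string.overrides": "false"}
    print(resolve_settings(locked, query)["transcription.endpoint"])
    # ws://asr.internal/ws
```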