Transcription
Red5 Pro can extract audio from live streams in real time and forward it to a configured WebSocket endpoint. The audio is automatically converted to 16-bit, 16 kHz mono PCM and delivered in 40 ms chunks, a size suited to Voice Activity Detection (VAD) processing.
This feature makes it possible to build live transcription on top of AI speech-to-text models such as NVIDIA Parakeet, OpenAI Whisper, Deepgram, and other transcription services.
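The audio format described above determines the size of each chunk: 16,000 samples per second over 40 ms is 640 samples, or 1,280 bytes at 2 bytes per sample. The following Python sketch works through that arithmetic and decodes one chunk into samples, with an RMS level as a crude stand-in for a real VAD. It assumes each WebSocket message carries raw PCM with no header and that samples are little-endian signed integers; neither detail is stated here, so check it against what your server actually sends. All function names are illustrative.

```python
import math
import struct
from typing import List, Sequence

SAMPLE_RATE_HZ = 16_000  # 16 kHz, as described above
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 1             # mono
CHUNK_MS = 40            # chunk duration


def chunk_size_bytes() -> int:
    """Expected payload size of one chunk, assuming raw PCM with no header."""
    samples = SAMPLE_RATE_HZ * CHUNK_MS // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def decode_chunk(payload: bytes) -> List[int]:
    """Decode a payload into signed 16-bit samples.

    Assumption: little-endian byte order. A trailing odd byte is ignored.
    """
    count = len(payload) // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{count}h", payload[: count * BYTES_PER_SAMPLE]))


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of a chunk; 0.0 for an empty chunk."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

A receiver could call `decode_chunk` on each binary WebSocket message and pass the samples to a VAD or speech-to-text model; comparing `len(payload)` with `chunk_size_bytes()` is a quick way to test the no-header assumption.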
Enabling Transcription
Edit the red5.properties file located in the conf directory of your Red5 Pro server installation. Add or modify the following properties:
transcription.active=true
transcription.endpoint=YOUR_WEBSOCKET_ENDPOINT
When configured through red5.properties, transcription will be enabled automatically for every stream on the server.
Settings Per Stream
Both the transcription on/off setting and the WebSocket endpoint can be configured per stream, using either connection parameters or query parameters.
Example:
ffmpeg -re -stream_loop -1 -i example.mp4 -c:v copy -c:a aac -f flv "rtmp://10.10.10.10/live/stream1?transcription.active=true&transcription.endpoint=ws://120.1.1.10/ws"
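When publish URLs are generated by a script, the query string can be assembled with the standard library. The sketch below is one way to do it in Python; `build_publish_url` is a hypothetical helper, not part of Red5 Pro. It leaves `:` and `/` unescaped so the output matches the example above. Whether the server decodes percent-encoded values is not stated here, so treat the handling of any other special characters (for example in metadata) as an assumption to verify.

```python
from typing import Optional
from urllib.parse import urlencode


def build_publish_url(
    base_url: str,
    endpoint: str,
    active: bool = True,
    metadata: Optional[str] = None,
) -> str:
    """Append per-stream transcription settings to a publish URL."""
    params = {
        "transcription.active": "true" if active else "false",
        "transcription.endpoint": endpoint,
    }
    if metadata is not None:
        params["transcription.metadata"] = metadata
    # safe=":/" keeps the WebSocket URL unescaped, as in the ffmpeg example.
    return f"{base_url}?{urlencode(params, safe=':/')}"


if __name__ == "__main__":
    print(build_publish_url("rtmp://10.10.10.10/live/stream1", "ws://120.1.1.10/ws"))
```

The resulting string can be passed to ffmpeg as the output URL in place of the hand-written one.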
| Property | Description | Default |
|---|---|---|
| transcription.active | Enable or disable transcription for this stream | false |
| transcription.endpoint | WebSocket URL for the transcription service | |
| transcription.metadata | Optional metadata string to include with transcription results | |
| transcription.allow.query.string.overrides | Allow per-stream settings via query params | true |
Per-stream settings override the server-wide configuration from red5.properties.
To disable query string overrides for all streams on the server, add the following to red5.properties:
transcription.allow.query.string.overrides=false
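The precedence rules in this section can be modeled as a small function. This is an illustration of the documented behavior, not the server's implementation: `effective_settings` and its arguments are hypothetical, and it assumes that disabling overrides causes all per-stream values to be ignored.

```python
from typing import Dict, Mapping

OVERRIDE_FLAG = "transcription.allow.query.string.overrides"
PER_STREAM_KEYS = (
    "transcription.active",
    "transcription.endpoint",
    "transcription.metadata",
)


def effective_settings(
    server: Mapping[str, str], stream: Mapping[str, str]
) -> Dict[str, str]:
    """Combine server-wide properties with per-stream parameters."""
    # Start from the server-wide values in red5.properties.
    settings = {k: server[k] for k in PER_STREAM_KEYS if k in server}
    settings.setdefault("transcription.active", "false")  # documented default

    # Overrides are allowed unless the flag is explicitly set to false.
    allow = server.get(OVERRIDE_FLAG, "true").strip().lower() == "true"
    if allow:
        for key in PER_STREAM_KEYS:
            if key in stream:
                settings[key] = stream[key]  # per-stream value wins
    return settings
```

For example, with `transcription.active=false` on the server and `transcription.active=true` in the stream's query string, the stream is transcribed unless the override flag is set to `false`.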