
Red5 Documentation

Transcription

Red5 Pro can extract audio from live streams in real time and forward it to a configured WebSocket endpoint. The audio is automatically converted to 16-bit PCM at 16 kHz, mono, and delivered in 40 ms chunks, a size suited to Voice Activity Detection (VAD) processing.
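To make the audio format concrete, the sketch below works out how large one chunk is and shows one way a receiver might measure its loudness before handing it to a VAD. The sample rate, bit depth, channel count, and chunk duration come from the description above. The byte order (signed little-endian) and the helper names `chunk_size_bytes` and `chunk_rms` are assumptions made for illustration and are not stated in this page.

```python
import math
import struct

SAMPLE_RATE_HZ = 16_000  # 16 kHz, from the format description
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 1             # mono
CHUNK_MS = 40            # chunk duration in milliseconds


def chunk_size_bytes(duration_ms: int = CHUNK_MS) -> int:
    """Size in bytes of one PCM chunk of the given duration."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def chunk_rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of a chunk.

    ASSUMPTION: samples are signed 16-bit little-endian; verify this
    against your server before relying on it.
    """
    count = len(chunk) // BYTES_PER_SAMPLE
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * BYTES_PER_SAMPLE])
    return math.sqrt(sum(s * s for s in samples) / count)


if __name__ == "__main__":
    print(chunk_size_bytes())        # bytes per 40 ms chunk
    print(1000 // CHUNK_MS)          # chunks per second
    print(chunk_rms(bytes(chunk_size_bytes())))  # RMS of a silent chunk
```

Under these figures a 40 ms chunk holds 640 samples (1280 bytes), and a stream produces 25 chunks, or 32,000 bytes, per second.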

This feature enables building live text transcription capabilities using AI speech-to-text models such as NVIDIA Parakeet, OpenAI Whisper, Deepgram, and other transcription services.

Enabling Transcription

Edit the red5.properties file located in the conf directory of your Red5 Pro server installation. Add or modify the following properties:

transcription.active=true
transcription.endpoint=YOUR_WEBSOCKET_ENDPOINT

When configured through red5.properties, transcription will be enabled automatically for every stream on the server.

Settings Per Stream

Both the transcription on/off setting and the WebSocket endpoint can be configured per stream, using either connection parameters or query parameters on the publish URL.

Example:

ffmpeg -re -stream_loop -1 -i example.mp4 -c:v copy -c:a aac -f flv "rtmp://10.10.10.10/live/stream1?transcription.active=true&transcription.endpoint=ws://120.1.1.10/ws"

| Property | Description | Default |
|---|---|---|
| transcription.active | Enable or disable transcription for this stream | false |
| transcription.endpoint | WebSocket URL for the transcription service | |
| transcription.metadata | Optional metadata string to include with transcription results | |
| transcription.allow.query.string.overrides | Allow per-stream settings via query params | true |

Per-stream settings override the server-wide configuration from red5.properties.
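When publish URLs are generated programmatically, the per-stream query string can be assembled with a small helper. The following is a minimal sketch using only the Python standard library. The function name `build_publish_url` is hypothetical; the property names and the example addresses are taken from this page. It keeps `:` and `/` unescaped so the output matches the ffmpeg example above. How the server decodes other escaped characters (for example spaces in a metadata string) is not documented here, so test any such values against your installation.

```python
from typing import Optional
from urllib.parse import urlencode


def build_publish_url(
    base_url: str,
    endpoint: str,
    active: bool = True,
    metadata: Optional[str] = None,
) -> str:
    """Append per-stream transcription settings to a publish URL."""
    params = {
        "transcription.active": "true" if active else "false",
        "transcription.endpoint": endpoint,
    }
    if metadata is not None:
        # Optional property; omitted entirely when not provided.
        params["transcription.metadata"] = metadata
    # safe=":/" leaves the WebSocket URL readable, as in the ffmpeg example.
    return f"{base_url}?{urlencode(params, safe=':/')}"


if __name__ == "__main__":
    print(build_publish_url("rtmp://10.10.10.10/live/stream1", "ws://120.1.1.10/ws"))
```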

To disable query string overrides for all streams on the server, add the following to red5.properties:

transcription.allow.query.string.overrides=false
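The precedence rules described above can be summarized in a few lines of code. This is a conceptual model only, not Red5 Pro's implementation: the function `effective_settings` and the `STREAM_KEYS` tuple are hypothetical, and the assumption that query parameters are simply ignored when overrides are disabled is an inference from the text. The defaults (`false` for `transcription.active`, `true` for `transcription.allow.query.string.overrides`) come from the table above.

```python
# Properties that a stream may override (illustrative selection).
STREAM_KEYS = (
    "transcription.active",
    "transcription.endpoint",
    "transcription.metadata",
)
OVERRIDE_KEY = "transcription.allow.query.string.overrides"


def effective_settings(server: dict, query: dict) -> dict:
    """Merge server-wide properties with per-stream query parameters."""
    # Start from server-wide values; transcription is off unless enabled.
    settings = {"transcription.active": "false"}
    settings.update({k: server[k] for k in STREAM_KEYS if k in server})

    # Overrides are allowed by default, per the table above.
    allow = server.get(OVERRIDE_KEY, "true").lower() == "true"
    if allow:
        # Per-stream values win over the server-wide configuration.
        settings.update({k: query[k] for k in STREAM_KEYS if k in query})
    return settings


if __name__ == "__main__":
    server = {
        "transcription.active": "true",
        "transcription.endpoint": "ws://120.1.1.10/ws",
        OVERRIDE_KEY: "false",
    }
    # With overrides disabled, the query parameter has no effect in this model.
    print(effective_settings(server, {"transcription.active": "false"}))
```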