IP Camera Live Streaming: Connecting RTSP to WebRTC

So you have your IP camera all hooked up and ready to begin streaming video. Now you just need to figure out how people can watch it. Ideally, it should be simple and universally available; like an internet browser. You also want to see the live stream coming from the IP camera with minimal latency, and as close to real-time as possible.

Unfortunately, most IP cameras are RTSP (Real-time Streaming Protocol) based which is not natively supported in internet browsers. So how do you use an IP camera for live streaming?

One solution is to connect to the RTSP stream and view it in VLC Media Player. However, that requires additional configuration and is not convenient. Typical users of your app won’t know how to set it up, and it would be much easier if they could simply view the video in a standard web browser. Plus, what if you need to support thousands or millions of viewers?

Even more likely, what if you have hundreds of IP cameras, such as security cameras, that you need to access programmatically so that the video stream from each one can be viewed in a webpage? That way, anyone in the organization can quickly access the live stream of each security camera remotely and in near real-time.

Live Streaming Surveillance

There are other IP camera streaming solutions on the market for accessing live video from IP cameras in browser applications, but most of these use high-latency delivery protocols such as HLS (HTTP Live Streaming). Solutions like this add many seconds of delay to the IP camera’s video stream, making it difficult to respond to critical situations in real time. Applications for first responders, who access security cameras to understand what is happening as an emergency unfolds, cannot tolerate seconds of delay in the IP camera live stream.

In this situation in particular, you want to view multiple cameras with different angles, keep them in perfect sync, and stay as close as possible to the moment the action was captured. Luckily, we also have live server-side mixing technology that can be leveraged to provide live grids of multiple streams coming from security cameras.

Fortunately, WebRTC is a real-time, low-latency protocol widely supported by internet browsers and native applications. All you need to do is convert the RTSP video stream coming from your IP cameras to WebRTC. This post will explain how that can be done with Red5 Pro. Note that soon we will be adding support for RTSP IP camera streaming to our fully managed service, Red5 Cloud. We will be sure to update this post as soon as it’s ready.

Without further ado, let’s dive into the details of IP camera video streaming and how to accomplish getting those streams into browser applications.

How Do I Convert an RTSP Video Stream to WebRTC?

The simple answer is to use Red5 Pro. We’ve built that logic into our Restreamer Plugin so all that work is done automatically once it’s configured. We also have egress support for many other streaming protocols like RTMP, Zixi, and HLS for the few clients that don’t support WebRTC.

However, I’m sure that some of you reading this are wondering about the specific process we use to convert the IP camera protocol, RTSP, to WebRTC. Read on, my inquisitive friends, and we will cover that in more detail.

In order to integrate an IP camera (or really any RTSP live stream) with WebRTC, you first need to achieve media interoperability (a fancy term for making them work together). The media stream sent out by the IP camera needs to be made compatible with formats supported by browsers and the WebRTC codecs.

What is RTSP used for?

RTSP is a streaming control protocol that is used to control the streaming server, kind of like how a remote control works with a TV (enabling play, pause, etc.). It does not actually transport the stream; rather, it defines how the data in the live stream should be packaged for delivery. It also defines how both ends of the connection should behave to prepare a pathway for transportation.

This process is also known as signaling. WebRTC handles signaling differently, often involving WebSockets, with more modern solutions using the HTTP-based WHIP and WHEP standards.
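
To make that concrete, here is a rough sketch of what WHEP-style playback signaling can look like from a browser: the client POSTs an SDP offer over HTTP and applies the SDP answer it gets back. The endpoint URL is a placeholder, and a production client would also handle authentication, errors, and session teardown.

```typescript
// A minimal WHEP-style playback sketch (assumed endpoint URL): the browser
// sends an SDP offer over HTTP POST and applies the SDP answer it receives.
async function whepPlay(endpoint: string, video: HTMLVideoElement): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  pc.addTransceiver("video", { direction: "recvonly" });
  pc.addTransceiver("audio", { direction: "recvonly" });

  // Collect incoming tracks into a single MediaStream for the <video> element.
  const remoteStream = new MediaStream();
  pc.ontrack = (event) => {
    remoteStream.addTrack(event.track);
    video.srcObject = remoteStream;
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // WHEP: POST the offer SDP, receive the answer SDP in the response body.
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: pc.localDescription!.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await response.text() });

  return pc;
}
```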

RTSP, technically, is a signaling protocol; the transport it uses for carrying the live stream is not actually specified in the official RFC. That said, people tend to refer to the signaling plus the transport (usually RTP) as one thing, and this is the protocol stack that most IP cameras use today. We will get to the transport layer in a bit, but first let’s dive into signaling.

What is Signaling in Detail?

Again, signaling is just the ability to connect endpoints so that they can effectively stream to each other. RTSP and WebRTC are quite different signaling technologies, so let’s look at each in detail.

WebRTC Signaling

Signaling, particularly with WebRTC, is a complex subject. Because WebRTC was originally designed as a peer-to-peer protocol allowing live video chat between two parties, you typically had to negotiate NAT and find a way to map one IP address to the other peer’s IP address. WebRTC signaling does this using the ICE protocol and its related protocols STUN and TURN. At its simplest, this part of WebRTC signaling is just the process of routing one endpoint’s IP address to another so that they can connect to each other. With a media server in the middle, as we have with Red5 Pro, we are essentially mimicking a client-server model while using P2P technology to do the work.
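
As a small illustration of the ICE portion of WebRTC signaling, the sketch below creates a peer connection with placeholder STUN and TURN servers and logs the candidates that would need to be relayed to the other side over the signaling channel. The server URLs and credentials are purely illustrative.

```typescript
// Sketch of the ICE side of WebRTC signaling. The STUN/TURN URLs and
// credentials below are placeholders, not real servers.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.example.com:3478" },
    { urls: "turn:turn.example.com:3478", username: "user", credential: "secret" },
  ],
});

// Each gathered candidate (host, server-reflexive via STUN, or relayed via
// TURN) has to reach the other side over the signaling channel, e.g. a
// WebSocket or a WHIP/WHEP HTTP exchange.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    console.log("send to remote endpoint:", event.candidate.candidate);
  }
};

// Creating and applying a local offer kicks off candidate gathering.
pc.createOffer().then((offer) => pc.setLocalDescription(offer));
```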

RTSP Signaling

Since RTSP was designed around a client-server model, it is much simpler. The IP address of the server is a known entity, typically publicly accessible on the internet, so connecting to it just involves using the proper IP address (typically resolved through DNS). RTSP signaling comes into play through a series of commands sent between the client and server. These commands (SETUP, PLAY, PAUSE, TEARDOWN, and others) initiate and control the session. The SETUP command initiates the session and defines the transport mechanism to be used, while PLAY starts the stream, PAUSE temporarily halts it without tearing down the session, and TEARDOWN ends the session entirely. These commands are exchanged over a persistent connection, typically TCP (in Red5 Pro we support UDP as well), and dictate how the media server should handle the requested stream. Again, RTSP does not transport the actual media but coordinates the media streaming, often working alongside protocols like RTP (Real-time Transport Protocol) to ensure synchronized delivery of audio and video. This separation of control (via RTSP) and media data (via RTP) allows for efficient management of streams, making RTSP a versatile choice for both live and on-demand video services.
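
To make the command flow concrete, a simplified RTSP exchange between a client and a camera looks roughly like this (URLs, ports, and session identifiers are illustrative):

```
C -> S: DESCRIBE rtsp://camera.example.com/stream1 RTSP/1.0
        CSeq: 1
S -> C: RTSP/1.0 200 OK
        CSeq: 1
        Content-Type: application/sdp
        (SDP body listing the camera's H.264/AAC tracks)

C -> S: SETUP rtsp://camera.example.com/stream1/track1 RTSP/1.0
        CSeq: 2
        Transport: RTP/AVP;unicast;client_port=5000-5001
S -> C: RTSP/1.0 200 OK
        CSeq: 2
        Transport: RTP/AVP;unicast;client_port=5000-5001;server_port=6000-6001
        Session: 12345678

C -> S: PLAY rtsp://camera.example.com/stream1 RTSP/1.0
        CSeq: 3
        Session: 12345678
S -> C: RTSP/1.0 200 OK   (RTP packets begin flowing on the negotiated ports)
```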

What is Transport, and What Do IP Cameras Use?

With almost all RTSP-based IP cameras, RTP (Real-time Transport Protocol) is used for the actual transportation of the data/video stream. RTP, it could be said, is like a runway and flight path between two airports, while RTSP (signaling) is the air traffic controller that makes sure the runway is open and the flight path is clear of obstacles. The video codecs (covered in more detail below) used to encode the video data would be the airplane itself.

The Real-Time Transport Protocol (RTP) is crucial for streaming audio and video, particularly in applications involving IP cameras. RTP is designed to efficiently transport real-time data over IP networks, making it ideal for live video surveillance, remote monitoring, and streaming from IP cameras.

In the context of IP cameras, RTP encapsulates audio and video data into packets for transmission over the network. Each packet includes a header containing timing information, sequence numbers, and synchronization details, essential for the correct reconstruction of the stream at the receiving end. This allows IP cameras to send high-quality, synchronized audio and video feeds to the Red5 Pro media server in real-time.
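
For readers who like to see the bytes, here is a minimal sketch of reading the fixed 12-byte RTP header defined in RFC 3550. It simply illustrates where the sequence number, timestamp, and synchronization source mentioned above live; it is illustrative code, not part of Red5 Pro.

```typescript
// Sketch of reading the fixed 12-byte RTP header (RFC 3550).
interface RtpHeader {
  version: number;        // always 2 for modern RTP
  payloadType: number;    // identifies the codec, e.g. a dynamic type for H.264
  sequenceNumber: number; // increments per packet; used to detect loss/reordering
  timestamp: number;      // media clock (commonly 90 kHz for video)
  ssrc: number;           // synchronization source identifier
}

function parseRtpHeader(packet: Uint8Array): RtpHeader {
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return {
    version: packet[0] >> 6,
    payloadType: packet[1] & 0x7f,
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
  };
}
```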

It turns out that WebRTC also uses RTP (technically an encrypted version of RTP known as SRTP) for transport, which brings us to our next subject.

How Do RTSP and WebRTC Work Together?

Since WebRTC also uses RTP as its transport protocol, the two are highly compatible. The complication comes from how IP cameras behave.

Most IP cameras directly produce an RTSP stream, acting as an RTSP server. Normally, in the case of webcam or cell phone video, the Red5 Pro (origin) server receives the live stream from those clients. This creates a challenge when connecting a Red5 Pro server to an IP camera that is also acting as a server. You need a client implementation in the mix, in this case an RTSP client, to consume the live stream from the IP camera. Just like with electrical sockets, you can’t join two of the same types of connections: a female wall outlet will only take a male plug. Two female connectors can’t link to each other, and likewise you can’t connect two servers. You need a converter.

To fill this converter role, Red5 Pro created the Restreamer Plugin, which pulls the RTSP stream as a client and re-streams the IP camera’s video out over WebRTC. The live stream can then be delivered over WebRTC to the browser clients.

In Red5 Pro, our primary ingest codecs are H.264 for video and AAC for audio. Normally, IP cameras use either RTSP or MPEG-TS (the latter not using RTP) to deliver media, while WebRTC defaults to VP8 (video) and Opus (audio) in most applications. Since all modern browsers now accept H.264, it is faster for Red5 Pro to simply pass the H.264 video straight through to WebRTC while transcoding the AAC audio to Opus. Finally, it routes to WebRTC clients using the WebRTC protocol stack. However, in some use cases older systems might not be able to handle H.264. In those rare cases, Red5 Pro converts H.264 to VP8.
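
As a side note, a browser client can also express a preference for H.264 during SDP negotiation using the standard setCodecPreferences API. The sketch below is generic WebRTC code, not part of Red5 Pro, and the server still decides what it actually sends.

```typescript
// Generic browser-side sketch: reorder the receiver's codec list so H.264 is
// preferred in the SDP offer.
const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver("video", { direction: "recvonly" });

const capabilities = RTCRtpReceiver.getCapabilities("video");
if (capabilities) {
  const h264 = capabilities.codecs.filter((c) => c.mimeType.toLowerCase() === "video/h264");
  const others = capabilities.codecs.filter((c) => c.mimeType.toLowerCase() !== "video/h264");
  transceiver.setCodecPreferences([...h264, ...others]);
}
```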

Though this post focuses on RTSP IP camera video streaming, it bears mentioning that the Red5 Pro Restreamer also supports various protocols including SRT, Zixi, MPEG-TS, and RTSP (MPEG-TS as either multicast or unicast ingest, whereas RTSP would be unicast). All of the protocols mentioned support the H.264 video and AAC audio codecs.

What Are Red5 Mixers, and How Do They Help?

So far we’ve explained how to get one IP camera streaming to Red5 Pro and out as a WebRTC stream, but we haven’t covered how to get video from multiple IP cameras onto a single web page. This use case is particularly important in monitoring-style apps. For example, you might want your users to be able to watch many security cameras in a building at once.

One approach is to give each live stream its own WebRTC connection and render each one separately on a single page. This means you are letting the browser client do the work of mixing the videos and handling the layout. The downside is that it requires the client to do a lot of work: it increases the load on the processor, forces it to handle many incoming connections, and raises the bandwidth requirements per client. This approach works well when you only need a few feeds up at any one time, but the methodology breaks down quickly as you add more and more IP camera feeds to the page.
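
Here is a rough sketch of that client-side approach: one subscription and one video element per camera, laid out with CSS grid. The subscribeToCamera function is a placeholder for whatever per-stream subscribe logic (for example, a WHEP request) your application uses.

```typescript
// Client-side grid sketch: one WebRTC subscription and one <video> element
// per camera. subscribeToCamera() is a placeholder, not a real API.
declare function subscribeToCamera(streamName: string): Promise<MediaStream>;

async function renderCameraGrid(container: HTMLElement, streamNames: string[]): Promise<void> {
  container.style.display = "grid";
  container.style.gridTemplateColumns = "repeat(auto-fill, minmax(320px, 1fr))";

  for (const name of streamNames) {
    const video = document.createElement("video");
    video.autoplay = true;
    video.muted = true;       // most browsers require muted video for autoplay
    video.playsInline = true;
    video.srcObject = await subscribeToCamera(name);
    container.appendChild(video);
  }
}
```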

A better approach when creating large grids of IP camera streams is to do the work on the server side. Luckily, Red5 Pro has a Mixer node available in its architecture that you can take advantage of.

Here’s a step-by-step process for incorporating a grid of live IP camera streams into a single output over WebRTC.

  1. RTSP Stream Ingestion: Each IP camera stream is ingested into the Red5 Pro server using RTSP. As already discussed, Red5 Pro’s ingest servers can handle these RTSP streams and prepare them for further processing.
  2. Red5 Mixers Configuration: Red5 Mixers are a component of Red5 Pro that allow for the dynamic composition of video streams. They can take multiple video inputs and combine them into a single output stream in a customized layout. In this scenario, each RTSP stream from the cameras acts as an input to the Mixer.
  3. Combining Streams into a Grid Layout: The Mixer combines the RTSP streams into a grid layout. For instance, if you have four IP camera streams, you can configure the Mixer to create a 2×2 grid where each IP camera’s feed occupies one quadrant of the output video. This grid can be dynamically adjusted based on the number of input streams and desired layout, providing flexibility in how the final output is presented (see the layout sketch after this list).
  4. Processing and Mixing: Once configured, the Mixer processes these inputs in real-time. It overlays, resizes, and synchronizes the streams to form a single coherent video feed. This processing includes handling any timing and synchronization issues to ensure that all camera feeds are displayed smoothly together without lag or jitter.
  5. Encoding the Mixed Stream: The combined grid video is then encoded into a format suitable for WebRTC streaming. This encoding ensures that the output stream meets the requirements for WebRTC, including codec specifications and real-time delivery constraints.
  6. Streaming via WebRTC: The final step is to deliver the combined stream as a WebRTC feed. WebRTC is chosen for its ability to provide low-latency, high-quality streaming directly in browsers without needing additional plugins. Red5 Pro’s server handles the WebRTC signaling and media transport, ensuring that viewers receive the grid view of the combined RTSP streams with minimal delay.
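
As a purely illustrative aside on step 3, the layout math behind a square-ish grid can be sketched as follows. The actual Red5 Pro Mixer is configured through its own APIs; this just shows the idea behind a 2×2 (or larger) grid.

```typescript
// Illustrative layout math only: place N inputs into a square-ish grid on a
// fixed-size output canvas.
interface Cell { x: number; y: number; width: number; height: number; }

function gridLayout(streamCount: number, outputWidth: number, outputHeight: number): Cell[] {
  const columns = Math.ceil(Math.sqrt(streamCount)); // 4 streams -> 2 columns
  const rows = Math.ceil(streamCount / columns);     // 4 streams -> 2 rows
  const cellWidth = Math.floor(outputWidth / columns);
  const cellHeight = Math.floor(outputHeight / rows);

  return Array.from({ length: streamCount }, (_, i) => ({
    x: (i % columns) * cellWidth,
    y: Math.floor(i / columns) * cellHeight,
    width: cellWidth,
    height: cellHeight,
  }));
}

// Example: gridLayout(4, 1920, 1080) yields four 960x540 cells, i.e. a 2x2 grid.
```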

So That’s How It All Works

Fortunately, as we covered earlier, Red5 Pro has already done all this work for you. Find out more by visiting our website and documentation pages. Send any questions you have to info@red5.net or schedule a call with us.