6 Approaches to Low Latency Video Streaming: Which is Most Effective?



Live video substantially increases user engagement. This fact has dramatically increased the growth of the live video streaming industry. The rise of popular social video streaming apps like Periscope, TikTok or Facebook Live, along with live e-Sports and video game broadcasts like Twitch, proves the value of interactivity, which can only be attained through real-time latency. With this in mind, the current low latency streaming solutions can be divided into two categories of latency: those within a two to three second range (not practical for true interactive use cases), and those in the sub-second or real-time category. Of course, in order to be fully effective, the solution must also be scalable to millions of concurrent users.

This paper will present the latest updates to HTTP based live streaming protocols, including CMAF, the community-built low latency HLS (LHLS), and Apple’s low latency HLS (ALHLS), as well as the WebSockets plus MSE approach used by Wowza for their Ultra Low Latency solution, all of which result in an inadequate latency of two to three seconds. The second part of the paper will focus on the true real-time latency solution implemented by Red5 Pro, which is based on WebRTC.

Traditionally, the pursuit of real-time latency video streaming focused on experiences that absolutely require sub-second latency: trivia games, gambling, auctions, etc. However, the immediacy of social media, and the widespread use of mobile devices and drones, have driven the demand for real-time interactivity in other industries such as sports broadcasting and surveillance. We believe that the future of live streaming is fully interactive, and the kinds of “low latencies” that users put up with today will be intolerable within the next five years. Any protocol that still measures latency in seconds is, at best, a stopgap solution in a race to zero latency.



The increasing popularity of live streaming platforms and use cases with live interactions brought attention to the status quo of live streaming protocols and the latency that they can achieve. The latency figure is one of the most relevant parameters because high latency may make it impossible to interact with the live stream or may create a poor user experience. Figure 1 shows a classification of latency and which protocols can achieve each class.

Figure 1: Latency classification and protocols that can achieve it. Source [1].

The maximum latency allowable in a stream depends on the use case. For instance, for real-time communication, it is generally considered that the experience starts to degrade above 200ms of latency (0.2 seconds!); beyond this limit conversations start to become more challenging [2]. Live sports are also very sensitive because the last thing viewers want is a push notification informing them of the result of a match that hasn’t yet finished on their live stream. On a similar note, different providers may broadcast an event with different latencies, so viewers may hear their neighbors react to a decisive moment before it appears on their own stream.

The gambling and auction industries have embraced live streaming as well. The creation of online casinos with real-life dealers emphasizes the importance of reducing the critical time between each action by a dealer or player to minimize interruption and maintain flow. Additionally, auction houses have expanded their online presence, allowing users to bid virtually while watching a live stream. Logically, the only way to guarantee a correctly synchronized bid is with sub-second latency.

The reason the video streaming industry hasn’t moved towards lower latency protocols is simply that Content Delivery Networks (CDNs) have built up infrastructure over many years that relies on HTTP. It’s only logical to try to make incremental improvements in the hope that they will be good enough, rather than rethinking the entire technology stack. For this reason, it’s been much easier for startups like Red5 Pro to take on implementing a scalable real-time latency streaming architecture.

The following sections will describe HTTP based live streaming protocols like MPEG-DASH, HLS and its low latency versions including CMAF, community low latency HLS (LHLS) and Apple’s low latency HLS (ALHLS). These solutions will be compared to WebRTC and its use in Red5 Pro to achieve sub-500ms latency.

HTTP Based Live Streaming

HTTP based live streaming protocols like HLS and MPEG-DASH divide a video stream into small chunks which reside on an HTTP server. In this way the individual video chunks can be downloaded by a video player via TCP. This allows the video to traverse firewalls easily, where it can be delivered as it is watched. This minimizes caching and enables viewers to seek to a different position just by retrieving the corresponding video chunks. Moreover, it is scalable because it allows Content Delivery Networks (CDN) to leverage the capacity of their HTTP networks to deliver streaming content instead of relying upon smaller networks of dedicated streaming servers.

Apple developed its own HTTP Live Streaming solution called HLS. Initially, Apple recommended a chunk size of 10 seconds and made their player buffer three segments before starting the stream. As such, those settings became the standard for HTTP based live streaming. The main problem with these settings is the resulting 30 to 40 seconds of latency. The first approach taken to lower the latency figure is to reduce the size of the chunks to only a few seconds. This is also called Short Segment HLS, and the lowest latency it can provide is around 6 seconds. A drawback of this method is that it shortens the group-of-pictures (GOP) length, which lowers the visual quality. Moreover, having a chunk duration that is only a few multiples of the round-trip time leads to unstable delivery in CDN infrastructure. Therefore, simply changing the settings in the protocol was not enough to achieve the low latency required by real-time live video experiences; it was necessary to create new protocols and formats or enhance the existing ones.
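The relationship between segment duration, buffer depth and latency can be made concrete with a small calculation. The sketch below is only an illustration: the function name is hypothetical, the 2-second segment duration for Short Segment HLS is an assumed value, and the result covers the player buffer alone (encoding, upload and CDN delays come on top of it).

```python
def player_buffer_latency_s(segment_duration_s, buffered_segments):
    """Delay added by the player buffer alone: playback cannot start
    until `buffered_segments` complete segments have been downloaded."""
    return segment_duration_s * buffered_segments


# Original HLS recommendation from the text: 10 s segments, 3 buffered.
legacy = player_buffer_latency_s(10, 3)

# Short Segment HLS with an assumed 2 s segment and the same buffer depth.
short = player_buffer_latency_s(2, 3)

print(legacy, short)  # prints "30 6"
```

The 30 seconds computed for the original settings is the floor of the 30 to 40 second range quoted above; the remainder comes from the rest of the delivery chain.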

The latest solutions to bring HTTP Live Streaming latency below 4 seconds are Common Media Application Format (CMAF), community low latency HLS (LHLS) and Apple’s low latency HLS (ALHLS).

Common Media Application Format – CMAF

CMAF was created in 2017 through a joint effort between Microsoft and Apple. CMAF is a standardized container that can hold video, audio, or text data which is deployed using HLS or DASH. The advantage of CMAF is that media segments can be referenced simultaneously by HLS playlists and DASH manifests. This allows content owners to store only one set of files which in turn doubles the cache hit rate and makes CDNs more efficient.

CMAF containers are not able to reduce latency by themselves, rather they need to be paired with encoders, CDNs and players so that the overall system enables low latency. A high level example of the media distribution system is shown in Figure 2.

Figure 2: High level diagram of the distribution of an HTTP based live stream. Source [2]

The first requirement of the system is chunk encoding, where a chunk is defined as the smallest referenceable unit that contains at least a moof and an mdat atom. One or more chunks are combined to form a fragment, and one or more fragments can in turn be combined to form a segment. A standard CMAF media segment is encoded with a single moof and mdat atom, where the mdat holds a single IDR (Instantaneous Decode Refresh) frame, which is required to begin every segment. However, when using chunk encoding each segment will hold a series of chunks, each defined as a moof/mdat tuple, where only the first tuple holds an IDR frame. This allows the encoder to output each chunk for delivery right after encoding it, while with standard CMAF it would be necessary to wait until the entire segment is encoded. The two different encodings are shown in Figure 3.

Figure 3: Chunked encoding of a CMAF segment.  Source [2]
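The difference between the two encodings can be seen by walking the top-level boxes of a segment. The following is a minimal sketch that relies only on the generic ISO BMFF box header (a 4-byte big-endian size followed by a 4-byte type). The helper names are hypothetical, the segments are synthetic placeholders rather than real media, and boxes such as styp that real segments may start with are ignored.

```python
import struct


def box(box_type, payload):
    """Build a minimal box: 4-byte big-endian size, 4-byte type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def top_level_boxes(data):
    """Yield (type, size) for each top-level box in `data`.
    Extended 64-bit sizes and size 0 are not handled in this sketch."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:
            raise ValueError("unsupported or corrupt box size")
        yield box_type.decode("ascii"), size
        offset += size


def split_chunks(segment):
    """Group the boxes of a segment into moof/mdat pairs (CMAF chunks)."""
    types = [t for t, _ in top_level_boxes(segment)]
    return [tuple(types[i:i + 2]) for i in range(0, len(types), 2)]


# Standard encoding: one moof/mdat pair for the whole segment.
standard = box(b"moof", b"\x00" * 4) + box(b"mdat", b"\x00" * 64)

# Chunked encoding: the same segment emitted as four moof/mdat pairs.
chunked = b"".join(
    box(b"moof", b"\x00" * 4) + box(b"mdat", b"\x00" * 16) for _ in range(4)
)

print(len(split_chunks(standard)), len(split_chunks(chunked)))  # prints "1 4"
```

Each pair in the chunked segment can be handed to the network as soon as it is produced, which is what removes the wait for the full segment.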

When the encoder begins processing a new segment it makes a POST request to the ingest origin server and uses HTTP 1.1 chunked transfer encoding to send the encoded CMAF chunks immediately after processing them. As an example, if the encoder is producing 4-second segments at 30 frames per second, it would make a POST request to the origin every 4 seconds, and each of the 120 frames in a segment would be sent using chunked transfer encoding as soon as it is encoded.
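To illustrate what chunked transfer encoding puts on the wire, the sketch below frames each payload the way HTTP/1.1 does: the length in hexadecimal, CRLF, the data, CRLF, with a zero-length chunk ending the body. It is a simplified illustration, not an encoder implementation; the function names and the placeholder frame payloads are assumptions.

```python
def encode_chunk(payload):
    """Frame one payload as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    return f"{len(payload):X}".encode("ascii") + b"\r\n" + payload + b"\r\n"


LAST_CHUNK = b"0\r\n\r\n"  # a zero-length chunk terminates the body


def chunked_body(cmaf_chunks):
    """Yield wire bytes for one segment as its CMAF chunks become available."""
    for chunk in cmaf_chunks:
        yield encode_chunk(chunk)
    yield LAST_CHUNK


# The example from the text: 4-second segments at 30 frames per second,
# with one placeholder payload standing in for each encoded frame.
segment_seconds, fps = 4, 30
frames = [b"frame-%03d" % i for i in range(segment_seconds * fps)]
wire = list(chunked_body(frames))

print(len(frames), len(wire))  # prints "120 121"
```

Because the body is a generator, each frame can be written to the open POST request as soon as it exists instead of waiting for the full 4 seconds of media.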

Next, the chunks ingested into the origin are delivered using HTTP chunked transfer encoding to a CDN where Edge servers make them available to players using the same delivery protocol. The player uses the manifest or playlist associated with a stream to determine the Edge to connect to and makes a GET request to retrieve the segment. Additionally, the player’s starting algorithm impacts the latency based on how the segments and their chunks are fetched and decoded.

An advantage of chunked encoding is that, as long as last-mile throughput is greater than the encoded segment bitrate, segments are delivered to the player with consistent timing which is independent of the throughput. That’s because the Edge server can only send the chunks to the player as quickly as it receives them from the Origin. At the same time though, this consistency prevents the players from using the segment download time to estimate the last-mile throughput which is required for adaptive bitrate switching. In fact, when using chunked encoding a standard throughput estimation algorithm will determine that the throughput is exactly equal to the encoded bitrate, which will prevent the player from switching up. Therefore, players need to implement more advanced throughput estimation algorithms to switch between different stream bitrates successfully.
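The throughput estimation problem can be reproduced with a few lines of arithmetic. The sketch below is a simplified model under stated assumptions: the rendition bitrate, the last-mile throughput and both function names are illustrative, and the model ignores request overhead and network jitter.

```python
def download_seconds(segment_bits, segment_duration_s, link_bps, chunked_live):
    """Time a player spends receiving one segment."""
    transfer_time = segment_bits / link_bps
    if chunked_live:
        # With chunked delivery of a live segment, bytes arrive only as fast
        # as the encoder produces them, so the response stays open for about
        # one segment duration no matter how fast the link is.
        return max(transfer_time, segment_duration_s)
    return transfer_time


def naive_throughput_bps(segment_bits, seconds):
    """Standard estimate: bits received divided by wall-clock download time."""
    return segment_bits / seconds


encoded_bps = 3_000_000   # assumed rendition bitrate
link_bps = 20_000_000     # assumed last-mile throughput
duration_s = 4            # segment duration
segment_bits = encoded_bps * duration_s

whole = naive_throughput_bps(
    segment_bits, download_seconds(segment_bits, duration_s, link_bps, False)
)
chunked = naive_throughput_bps(
    segment_bits, download_seconds(segment_bits, duration_s, link_bps, True)
)

print(round(whole), round(chunked))
```

With a fully available segment the estimate recovers the link throughput, while with chunked live delivery it collapses to the encoded bitrate, so the player never sees evidence that it could switch up.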

Laboratory tests have shown that CMAF, in a controlled environment, can achieve end-to-end latencies as low as 600ms. However, the latency figure is less impressive when CMAF is used in real world cases where the geographic dispersion between the encoder, origin, edge and clients increases the round-trip time between them. The current proof-of-concepts when deployed over the open internet show a sustainable Quality of Experience (QoE) only when the end-to-end latency is around 3 seconds, of which 1.5 to 2 seconds reside in the player buffer [1]. While this is an improvement over the latency of vanilla HLS and DASH it is still not low enough to enable real-time live video experiences.

Community Low Latency HLS – LHLS

Community low latency HLS (LHLS) is a joint effort of the HLS.js community with others including Mux, JWPlayer, Wowza, Elemental and Akamai to collaborate on a community driven approach to implement low latency streaming using HLS. The first approach to low latency HLS was carried out by Twitter’s Periscope in mid-2017 to use on their own platform. The goal of LHLS is to provide low latency in the 2 to 5 seconds range while still being scalable and backward compatible with standard HLS so players can fall back to it if needed.

LHLS is able to reduce the latency by using two approaches:

  1. Leveraging HTTP/1.1 chunked transport for segments
  2. Announcing segments before they are available

Chunked transport allows LHLS to download segments while they are being created. In fact, while standard HLS normally buffers video frames and aggregates them until multiple seconds of video are available, chunked transport allows the server to make the frames available as they are being delivered by the encoder. Therefore, by announcing segments before they are available the protocol allows a player to request the segments as they are produced as shown in Figure 4. Moreover, if a player requests a segment that does not exist or is incomplete, it will receive it automatically as soon as the segment becomes available. This approach is very similar to CMAF’s with the main difference being that LHLS uses the MPEG Transport Stream container.

Announcing segments before they are available allows LHLS to reduce the latency introduced by the buffer offset. The purpose of this buffer is to allow a player to have time to load both the manifest as well as the segments before it can fill up its buffer. By anticipating the segment creation and location a player can anticipate which files need to be loaded and thus the overall latency is reduced.

These features allow LHLS to keep its scalability over standard CDNs which can be exploited right away thanks to their support for HTTP/1.1 chunked transfer. Similarly to CMAF, LHLS players will need to use advanced bandwidth estimation algorithms to be able to determine the bandwidth available on the client for adaptive bitrate streaming.

While LHLS is a step forward into low latency video streaming and a considerable improvement over the latencies attainable with standard HLS, it is still far from being suitable for real-time live video experiences as its 3 to 5 seconds end-to-end latency is still too high for those use cases.

Figure 4: Buffer of segments in LHLS playlist.

Apple’s Low Latency HLS – ALHLS

Apple extended the HLS protocol to enable low latency video streaming while still maintaining the same degree of scalability. The new low latency mode lowers video latencies over public networks into the range of standard television broadcasts. Initial tests show a latency on par with CMAF and LHLS, in the range of 3 to 10 seconds [4]. The extended protocol includes new features that will need to be implemented by the backend tools and content delivery systems to achieve low latency live streams.

Apple’s low latency HLS consists of generating partial segments of media, also called parts, as TS or CMAF chunks of around 250 to 300ms in duration, each containing several complete video or audio frames. To speed up delivery, these parts are advertised at the head of the HLS playlist so that players can download smaller groups of frames right after they are encoded. To further increase the speed with which the parts are retrieved by a player, the new protocol uses HTTP/2 push. In this way, once the player makes a playlist request it will automatically receive the parts via HTTP/2 push.

A client-server mechanism has been added as well that allows HTTP requests for a playlist to be held until a particular segment or part is available. Additionally, this makes it possible to create “delta” playlists which contain only some of the segments of the overall playlist. In this way, a player could download the full playlist only once and then use the much smaller delta playlist to get the latest few segments. Another new feature allows playlist responses for a particular rendition to contain information about the most recent chunks and segments available in another rendition. This would allow the player to jump to another rendition without needing to make a full playlist request before it starts the switch. It should be noted that the latter feature, while supported, is not completely specified, and even Apple’s demos do not support it yet.
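A held playlist request and a delta playlist request can be sketched as a URL builder. The query parameter names used here (_HLS_msn, _HLS_part and _HLS_skip) come from Apple’s published Low-Latency HLS specification rather than from this paper, so they should be treated as assumptions and checked against the current specification. The function name and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def blocking_playlist_url(playlist_url, next_msn, next_part=None, delta=False):
    """Build a playlist URL asking the server to hold the response until the
    given media sequence number (and optionally part) is available."""
    parts = urlsplit(playlist_url)
    query = parse_qsl(parts.query)
    query.append(("_HLS_msn", str(next_msn)))        # assumed parameter name
    if next_part is not None:
        query.append(("_HLS_part", str(next_part)))  # assumed parameter name
    if delta:
        query.append(("_HLS_skip", "YES"))           # request a delta playlist
    return urlunsplit(parts._replace(query=urlencode(query)))


url = blocking_playlist_url(
    "https://example.com/live/video.m3u8", next_msn=42, next_part=3, delta=True
)
print(url)
```

A player would issue such a request immediately after processing the previous playlist, so the response arrives as soon as the next part has been published instead of after a polling interval.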

A big difference between standard HLS and ALHLS is the significant increase in the state that needs to be communicated between the playlist generation process and the encoder process. Moreover, implementing the new protocol will require HTTP/2. While HTTP/2 is supported by the main CDN companies, the HTTP/2 push functionality, which is key in ALHLS, is not yet widely implemented. Even where it is implemented, it usually relies on the preload keyword in the Link headers found in the origin response. This causes the CDN to link together in its cache the HLS playlist and the media it references. Because ALHLS requires pushing the media along with the playlist response, the same Edge must be used for the playlist and media requests [3]. This will impact all CDN vendors because their systems have been built to separate the responsibility for playlist and media delivery, since the two functionalities have very different scale requirements.

At the time of writing, Apple’s beta ALHLS is only compatible with iOS devices even though they represent only a small part of the HLS ecosystem. In fact, players like HLS.js and Video.js are used to deliver HLS streams to a large number of non-Apple devices. Implementing ALHLS on those devices will not be an easy feat because of the choice of technologies like HTTP/2. That is because HTTP/2 is still a young technology and the tools to work with it are still limited, including the web APIs available in browsers. These factors coupled with the still high latency figure make ALHLS a less attractive technology for live interactive video experiences.

WebSockets and Media Source Extensions (MSE)

WebSockets and Media Source Extensions (MSE) can be used together to deliver low latency live streams with end-to-end latencies around 3 seconds. This is the approach used by Wowza in their proprietary WOWZ technology, which offers under 3 seconds of latency when used with Wowza’s own player. It’s also worth noting that other groups like NanoCosmos have taken a similar approach using MSE and WebSockets.

WOWZ is a TCP based messaging protocol that establishes a WebSocket between server and client to allow for bidirectional data flow to deliver audio/video data and support interactivity. On the client side the packets, which represent segments of audio and video, are stored in a 250ms buffer before being delivered to the MSE API for playback. A diagram of the system is shown in Figure 5.

Figure 5: Client – Server system diagram when using WebSockets and MSE.

The main drawbacks of this approach are the lack of support for MSE on iOS devices and the inability of the protocol to leverage a traditional HTTP infrastructure. Additionally, WOWZ currently does not support Adaptive Bitrate, which can degrade the user experience for clients that do not have a fast or reliable connection. These factors make this approach less attractive than CMAF and LHLS. Furthermore, the relatively high end-to-end latency of 3 seconds makes it unsuitable for live interactive video experiences.


WebRTC (Web Real-Time Communication) is a standards-based, open-source project supported by Apple, Google, Microsoft, Mozilla, and Opera that provides web browsers and mobile applications with real-time communication via simple APIs. It allows audio and video communication to work inside web pages by allowing direct peer-to-peer communication, eliminating the need to install plugins or download native apps.

The network protocol stack of WebRTC is shown in Figure 6. Unlike HLS and MPEG-DASH, which are TCP based, WebRTC is UDP based. It’s worth noting that it is possible to deliver WebRTC over TCP, but for simplicity, and because the majority of WebRTC use cases leverage UDP, we focus on it in this paper. UDP is not concerned with the order of the data; rather, it delivers each packet to the application the moment it arrives. Instead of queuing packets and waiting for them to load like TCP based protocols, WebRTC focuses on handling the dropped packets. This is done either by using NACK to retransmit the most critical packets or by Packet Loss Concealment, which estimates to some extent what should have been in the missing packet. In order to meet all the requirements of WebRTC, the browser needs a large supporting cast of protocols and services above it to implement congestion and flow control, traverse the many layers of NATs and firewalls, negotiate the parameters for each stream, and provide encryption of user data.

WebRTC is the technology used by Red5 Pro both on the publishing and subscribing side to achieve sub-500ms end-to-end latency. Additionally, Red5 Pro uses a cluster architecture to scale and support millions of concurrent users. The following sections will delve into the details of Red5 Pro’s approach.

Figure 6: Network protocol stack of WebRTC. Source [5].

Red5 Pro Server

The Red5 Pro Server supports fully scalable real-time, live streaming applications. Built on the open source Red5 project, the Red5 Pro Server is a standalone server distribution that provides all the possibilities from the open source Red5 project with the addition of custom streaming plugins. It enables connections between various end-points including Red5 Pro’s mobile SDKs, iOS and Android, and browser-based clients via WebRTC, Flash/RTMP (for legacy purposes), or HLS.

For cross-device compatibility, Red5 Pro provides the HTML5 SDK, which can be used to create web apps supported by the major browsers: Chrome, Safari, Firefox and (with automatic RTMP fallback) Internet Explorer. Mobile SDKs can be used to build mobile applications that are fully compatible as well.

By leveraging existing cloud infrastructure, Red5 Pro’s Auto Scaling Solution automatically scales the system up or down in response to real-time conditions. Under the operating logic of a Stream Manager (a Red5 Pro Server Application that manages traffic and monitors server usage), clusters or NodeGroups (groups of one or more active server nodes) are established in the geographic regions where the streaming will be happening.

Each node in the architecture is a Red5 Pro server instance: either an Edge, Origin or Relay. Simply put, the Origin accepts broadcasters, the Edge accepts subscribers and the Relay connects them. Refer to Figure 7 to see a Red5 Pro cluster with the connections between the different entities. The Stream Manager works in real-time, processing live stream information to add or remove server nodes depending on the current traffic demands, as well as supplying an Origin or Edge endpoint to Broadcasters and Subscribers respectively. Relays are created between multiple Origins or Edges so that more viewers or broadcasters can be added. This allows more users to join and synchronize with the stream without disrupting current viewers. Most notably, the Autoscaling solution maintains sub-500 millisecond end-to-end latency while scaling into the millions.

Figure 7: Sample Red5 Pro cluster showing the connections between a Broadcaster or Encoder, Origins, Relays, Edges and Subscribers.

Red5 Pro clusters can be deployed on AWS, Google Cloud Platform, Azure and Digital Ocean. A notable advantage of Digital Ocean is that it substantially lowers bandwidth costs to be equal to, or better than, what CDNs charge for HLS or MPEG-DASH delivery. With better performance at a competitive price, there’s no excuse for poor latency. Additionally, Red5 Pro can be deployed on other hosting providers or CDNs (e.g. Limelight Networks) by simply modifying the targeted cloud API. Offline deployments with custom server-side apps are also supported for increased security, compliance, better performance or other requirements as needed. This high degree of flexibility and customization lets Red5 Pro adapt to a wide range of needs.

Cluster Architecture

Red5 Pro enables developers to deploy servers in clusters to allow for unlimited scaling for their live streaming application, as shown above in Figure 7. Red5 Pro features an Auto Scaling solution that can be deployed on a cloud platform such as Google Compute Engine, Amazon’s AWS, Microsoft’s Azure and Digital Ocean.

Autoscaling refers to the ability to scale server instances on the fly when the volume of network traffic is unknown or fluctuates greatly. However, autoscaling can also be used when the network size is stable and traffic conditions are known in advance. Autoscaling also reduces operating costs: automatic cluster management monitors Nodes (Red5 Pro Server instances) and adds or removes them as needed, ensuring the efficient use of servers.

The auto-scaling process is carried out by the Red5 Pro Autoscaler, a software module built into the Red5 Pro Stream Manager that is capable of smart Node management in real-time. Simply put, the Autoscaler adds or removes server nodes (Edges, Relays and Origins) from a NodeGroup based on load conditions in real-time without manual intervention. The scale-out and scale-in procedures are shown in Figures 8 and 9. Figure 8 shows how the Stream Manager spins up and provisions a new instance to add an Edge node to an existing cluster, while Figure 9 shows the reverse procedure to remove an Edge.

Figure 8: Scale-out procedure to add a new Edge node to the Red5 Pro cluster.

Figure 9: Scale-in procedure to remove an Edge node from the Red5 Pro cluster.
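The kind of load-based decision that drives the scale-out and scale-in procedures can be sketched as follows. This is not Red5 Pro’s Autoscaler logic: the EdgeNode type, the scaling_decision function and the 80% and 30% thresholds are all hypothetical and only show the general shape of a threshold policy.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    connections: int   # current subscriber connections
    capacity: int      # connections the node can serve


def scaling_decision(edges, scale_out_at=0.8, scale_in_at=0.3, min_edges=1):
    """Return "scale_out", "scale_in" or "hold" for a group of Edge nodes,
    based on the fraction of total capacity currently in use."""
    if not edges:
        return "scale_out"  # nothing is serving subscribers yet
    load = sum(e.connections for e in edges) / sum(e.capacity for e in edges)
    if load >= scale_out_at:
        return "scale_out"
    if load <= scale_in_at and len(edges) > min_edges:
        return "scale_in"
    return "hold"


group = [EdgeNode("edge-1", 900, 1000), EdgeNode("edge-2", 850, 1000)]
print(scaling_decision(group))  # prints "scale_out"
```

Keeping a gap between the two thresholds avoids adding and removing nodes repeatedly when the load hovers around a single value.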

Besides Autoscaling, Red5 Pro includes a Stream Manager event scheduling API [6] that can be used to increase the capacity of a cluster for specific events held at a predetermined time. The process consists of creating a scheduled event with a certain scale policy and launch configuration that will be used to create a new node group automatically before the start of the event. The main difference between the scheduling API and Autoscaling is that the latter can only scale the cluster based on the current load, and new servers may take around 40-90 seconds to start up (spin-up times depend on the cloud network’s implementation). Therefore, whenever practical it is preferable to use the event scheduling API to make sure the cluster will not be undersized at the beginning of a live streaming event.

Cloud Platform Agnostic Approach

Red5 Pro is an agnostic hosting platform that supports custom plugins and features a flexible API that can lock into any back-end. Custom Cloud Controller plugins allow Red5 Pro to execute the abstract commands of the Stream Manager against the API of a chosen Cloud Platform or back-end. This allows existing applications to integrate with Red5 Pro to reduce their latency and increase scalability without resorting to rebuilding the entire back-end. A great example of this flexibility is Red5 Pro’s implementation on Limelight’s Network for the Limelight RTS offering which delivers live broadcast-quality video from anywhere in the world to anywhere in the world, with less than one second of latency. It’s worth noting that Limelight Networks’ solution does achieve sub-500 ms latency, but they chose not to guarantee this in their marketing.

Offline deployments with custom server-side apps are also supported for increased security, compliance, better performance or other requirements as needed. This is typical for government use cases and Novetta’s Ageon ISR is a great example of a Red5 Pro server offline deployment.


WebRTC is UDP based, and thus there is no guarantee that every packet will be delivered to the receiver or that packets will be delivered in order. Therefore, WebRTC has to deal with dropped packets. This can be achieved with NACK to retransmit the most critical packets, with loss-recovery measures like Forward Error Correction (FEC), or with a combination of the two.

Accordingly, Red5 Pro implements a NACK approach to overcome network packet loss. When acting as a sender, Red5 Pro caches the packets for 1 second after transmission. The receiver inspects the received packets and, if it detects a missing sequence number, sends an RTCP NACK packet to the sender. When the sender receives a NACK packet it checks if there is a keyframe in transmission. If so, the NACK packets for past frames are ignored; otherwise the sender retrieves the NACKed packet from the cache and resends it.
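The sender-side behavior described above can be sketched as a small retransmission cache. This is a simplified illustration and not Red5 Pro’s implementation: the NackSender class and its method names are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
from collections import OrderedDict


class NackSender:
    CACHE_SECONDS = 1.0  # packets stay retransmittable for 1 second

    def __init__(self):
        self._cache = OrderedDict()  # sequence number -> (sent_at, payload)
        self.keyframe_in_transmission = False

    def send(self, seq, payload, now):
        """Record a packet at transmission time and drop expired entries."""
        self._cache[seq] = (now, payload)
        self._evict(now)
        return payload

    def _evict(self, now):
        # Entries are ordered by send time, so stop at the first fresh one.
        while self._cache:
            seq, (sent_at, _) = next(iter(self._cache.items()))
            if now - sent_at <= self.CACHE_SECONDS:
                break
            self._cache.popitem(last=False)

    def on_nack(self, seq, now):
        """Return the payload to resend, or None if the NACK is ignored."""
        if self.keyframe_in_transmission:
            return None  # a new keyframe supersedes repairs of past frames
        self._evict(now)
        entry = self._cache.get(seq)
        return entry[1] if entry else None


sender = NackSender()
sender.send(7, b"packet-7", now=0.0)
print(sender.on_nack(7, now=0.4))  # prints "b'packet-7'"
print(sender.on_nack(7, now=1.5))  # prints "None" (cache entry expired)
```

Ignoring NACKs while a keyframe is in transmission saves bandwidth, because the receiver can resynchronize from the keyframe without the lost packets.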

As a receiver, Red5 Pro constantly checks for sequence number continuity in the packets so that missing ones trigger the transmission of a NACK packet. When such a packet is sent, the receiver will wait for the sender’s response for a time equal to the round-trip time. If the sender does not send the missing packet, the receiver assumes that there is a keyframe in transmission and begins parsing the packets it already has for that next keyframe.
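The receiver-side checks can be sketched in the same spirit. RTP sequence numbers are 16-bit values that wrap around, so the gap has to be computed modulo 65536. Both function names are hypothetical, and the sketch leaves out the reordering tolerance that a production receiver would need.

```python
def missing_sequence_numbers(last_seq, new_seq):
    """Return the sequence numbers skipped between two received packets."""
    gap = (new_seq - last_seq) & 0xFFFF
    if gap == 0 or gap >= 0x8000:
        return []  # duplicate or late packet, nothing to NACK
    return [(last_seq + i) & 0xFFFF for i in range(1, gap)]


def should_skip_to_keyframe(nack_sent_at, rtt_s, now):
    """After one round-trip time without a retransmission, stop waiting and
    look for the next keyframe in the packets already received."""
    return now - nack_sent_at >= rtt_s


print(missing_sequence_numbers(10, 13))     # prints "[11, 12]"
print(missing_sequence_numbers(65534, 1))   # prints "[65535, 0]"
print(should_skip_to_keyframe(0.0, 0.08, 0.1))  # prints "True"
```

Each returned sequence number would be reported in an RTCP NACK, and the timeout bounds how long playback can be held up by a packet the sender has decided not to resend.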

Adaptive Bitrate

Adaptive Bitrate is a process that works by detecting a client’s bandwidth and CPU capacity and adjusting the quality of the media stream in real time as needed. It requires an encoder that can encode a single source media (video or audio) at multiple bit rates. The player on the client then switches between the different encodings depending upon available resources.

HTTP based live streaming protocols like HLS and MPEG-DASH support Adaptive Bitrate by encoding the video segments at a variety of different bit rates, which are published in the HLS playlist and MPEG-DASH manifest. In this way, while the content is being played back, the client uses an adaptive bitrate (ABR) algorithm to automatically select the segment with the highest bitrate that can be downloaded in time for playback without causing stalls or re-buffering events.
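A very simple version of such a selection rule is sketched below. It is an illustration only: real players use more elaborate algorithms, and the function name, the bitrate ladder and the 0.8 safety factor are assumptions.

```python
def select_variant(bitrates_bps, estimated_throughput_bps, safety=0.8):
    """Pick the highest bitrate that fits within a fraction of the estimated
    throughput, falling back to the lowest variant if none fits."""
    budget = estimated_throughput_bps * safety
    affordable = [b for b in sorted(bitrates_bps) if b <= budget]
    return affordable[-1] if affordable else min(bitrates_bps)


ladder = [500_000, 1_500_000, 3_000_000, 6_000_000]  # assumed renditions

print(select_variant(ladder, 4_000_000))  # prints "3000000"
print(select_variant(ladder, 400_000))    # prints "500000"
```

The safety factor leaves headroom so that a small drop in throughput does not immediately cause a stall.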

Red5 Pro server supports Adaptive Bitrate as well by allowing broadcasters to publish multiple variants of a single video stream and delivering to each subscriber the best variant given its available bandwidth. In particular, Red5 Pro allows for:

  • Publishing multiple provisioned streams.
  • Publishing to a transcoder node within a Red5 Pro deployment that generates provisioned streams.
  • Subscribing to an adaptive bitrate stream that will allow dynamic upgrading and downgrading of the stream quality based on network conditions.

Figure 10 shows an overview of the system when different variants of the same stream are published using a media encoder. The process works as follows:

  1. A provision is provided to the Stream Manager to specify the different variants of the stream that will be published.
  2. The Stream Manager selects a suitable Origin and provisions it to expect multiple versions of the same stream.
  3. The Stream Manager returns a JSON object with the details of the Origin endpoint that should be used to broadcast.
  4. The Broadcaster can then use its media encoder to publish to the Origin.

Figure 10: Overview of the system when a media encoder is used to publish multiple versions of a stream.

Figure 11 shows an overview of the system when using a Transcoder. The process works as follows:

  1. A provision is provided to the Stream Manager. It specifies the different variants of the stream that will be published. Then, the Stream Manager API is used to request a Transcoder endpoint.
  2. The Stream Manager selects a suitable Origin and Transcoder. The Origin is provisioned to expect multiple versions of the same stream while the Transcoder is provisioned with the details of the different stream variants that it needs to create.
  3. The Stream Manager returns a JSON object with the details of the Transcoder endpoint that should be used to broadcast.
  4. The Broadcaster can then start to publish a single stream to the Transcoder.

Figure 11: Overview of the system when a Transcoder server is used to generate multiple versions of the same stream.

With a provision available from the Stream Manager and multiple variant broadcasts being streamed (either through a media encoder or Transcoder), it is possible to subscribe to an adaptive bitrate stream. When a client subscribes, it receives the stream variant with the highest quality given its network conditions. This is achieved by having the Edge interact with the subscriber client. In particular, the Edge and Subscriber use an RTCP message called Receiver Estimated Maximum Bitrate (REMB) to provide a bandwidth estimation, and the Edge then delivers the video with the highest quality that fits within the estimated bandwidth.



This paper presented the main protocols in use or proposed in the video streaming industry to achieve low latency. CMAF, LHLS and ALHLS represent a considerable improvement over the latency of standard HLS while continuing to be scalable, but their 3 to 10 seconds of latency is still not low enough, especially considering use cases featuring interactive video experiences and real-time communication. Wowza’s solution based on WebSockets and MSE achieves the same latency as the aforementioned protocols but has an even higher barrier to adoption because of its lack of features like Adaptive Bitrate and its reliance on the MSE API, which is not yet supported on iOS devices. WebRTC is the only technology that achieves the delivery of sub-second latency video streams. Red5 Pro is the best example of how WebRTC can be used for both publishers and subscribers to achieve sub-500ms end-to-end latency live streams while still being able to serve millions of users thanks to its clustering architecture.

The race towards zero latency is well underway. Despite modifications, HTTP based protocols continue to fall behind as Red5 Pro’s WebRTC based solution consistently leads the pack. The always evolving marketplace will encourage new use cases to emerge as well. Built for flexibility, designed for scalability, and engineered for real-time latency, Red5 Pro will power the live video streams of the future.


[1] https://mux.com/blog/the-low-latency-live-streaming-landscape-in-2019/

[2] https://www.akamai.com/us/en/multimedia/documents/white-paper/low-latency-streaming-cmaf-whitepaper.pdf

[3] https://mux.com/blog/the-community-gave-us-low-latency-live-streaming-then-apple-took-it-away/

[4] https://www.ibc.org/publish/the-impact-of-apples-protocol-extension-for-low-latency-hls/4395.article

[5] https://princiya777.wordpress.com/2017/08/19/webrtc-architecture-protocols/

[6] https://www.red5pro.com/docs/development/rest-api-v-400/smapi-events/