Keys to Optimizing End-to-End Latency with WebRTC

While it’s well understood that WebRTC supports the lowest streaming latencies, it’s important to recognize that results can vary widely depending on how encoding, transcoding, decoding and other steps in the processing chain are executed.

We at Red5 are witnessing these disparities with increasing regularity as providers of streaming services who were counting on other suppliers’ WebRTC streaming platforms come to us looking for a solution that lives up to their expectations. In one recent case in point, we learned that a streaming service counting on a WebRTC platform to support real-time throughput to a user base registering 30+ million visits monthly was not getting anything close to the promised performance.

Not only was end-to-end latency exceeding 1 second, in contrast to the typical 250ms latencies our customers are registering; technical issues were becoming more frequent, the absence of customizable API options was hampering service development, and, amid these problems, the platform supplier had the audacity to raise prices when the contract came up for renewal.

Based on extensive research and the recommendations of industry experts, the service provider transitioned to Red5’s Experience Delivery Network (XDN) platform, which cut end-to-end latency by 1 second and lifted user engagement by as much as 20%. The customer retained hosting control throughout what proved to be a smooth transition to the XDN architecture while gaining custom API development, integration of new functionalities, and significantly more flexible pricing options.

The takeaway from such experiences is clear. From a pure transport standpoint, WebRTC applies the same standardized protocol stack in all use cases to cut end-to-end streaming latencies to a fraction of the multi-second lag times common to conventional HTTP streaming. But, otherwise, it’s up to the provider of a WebRTC streaming platform to determine what the ultimate performance parameters will be. More accurately, the way in which each vendor implements the open WebRTC protocol will greatly determine the resulting latency in real-world applications.

Latency Reduction Implemented by Red5

We’ve often highlighted the superior outcomes produced by the Red5 XDN architecture in areas such as:

  • scaling to mass audiences with support for large-scale participation in interactive video communications,
  • enabling seamless operations with fail-safe redundancy in multi-cloud environments, and
  • adding ancillary features like DRM security, dynamic advertising, multi-camera viewing and personalized overlays.

Here we want to spotlight the advances in end-to-end A/V processing that contribute to superior XDN performance at key points in the distribution chain, as shown in Figure 1, including:

  • capturing and encoding live camera feeds,
  • mixing and transcoding inputs at the points of XDN ingestion,
  • orchestration of distribution across Relay and Edge Nodes, and
  • the decoding process on end users’ devices.

Figure 1. Points of Latency Reduction in XDN Implementations

These points of latency reduction in the processing chain are instrumental to Red5’s ability to support the industry’s lowest end-to-end streaming latencies, whether it be over the turnkey Red5 Cloud platform or through customers’ use of Red5 SDKs and TrueTime application tools to instantiate their own XDNs. At continental and intercontinental distances, Red5 is consistently registering end-to-end latencies at 250ms or less, and in more localized streaming scenarios, including in-venue deployments, customers are reporting latencies as low as 50ms.

Of course, when it comes to encoding, transcoding and other options that are ancillary to the basic functions of the XDN infrastructure, it’s up to our customers whether they employ the solutions we’re highlighting here. But extensive testing, as well as experience in the field, has led us to identify these as the lowest-latency solutions we know of, all of which have been pre-integrated with the XDN architecture.

With implementation of all these latency-reducing processing measures as described at length below, the distribution of latencies we achieve as registered across our customer ecosystem is illustrated in Figure 2.

Figure 2. Latency Reductions Across the End-to-End XDN Footprint

Capture – 16.7ms
Encode – <40ms
Ingest Transport – ~10ms
Transcode – ~7ms
Mix – ~50ms
Scale – ~40ms
Egress Delivery – ~10ms
Decode – 9ms

Total – <250ms
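As a sanity check on that total, summing the stage budgets above (taking the upper bound wherever a range is given) comes to:

$$16.7 + 40 + 10 + 7 + 50 + 40 + 10 + 9 \approx 183\,\text{ms}$$

That leaves roughly 67ms of headroom within the 250ms envelope to absorb propagation across the distribution path, which we discuss below.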


While everything we’re discussing here contributes to the persistent 250ms end-to-end latency our customers are registering, it’s important to note that there can be rare disruptions to the optimum user experience that are beyond our control. For example, at a customer’s premises, a poorly performing home Wi-Fi network can significantly add to latency, as can a device with very slow screen rendering.

Where outside plant is concerned, backups to malfunctioning routers on internet hops can take a split second to kick in; amplifiers on a cable network can be hit by power outages or brownouts; and congestion, interference, and other issues affecting a mobile cell coverage area can slow delivery of XDN-streamed content to on-the-go viewers.

Fortunately, we’ve found that these kinds of events are so rare as to be inconsequential to our customers’ overall end-to-end latency performance. Consequently, beyond the encoding, transcoding and other processing latencies, the primary contributor to end-to-end latency in a fiber-dominated internet and cloud transport environment is the time it takes signals traveling at lightspeed to traverse the various legs of the distribution path.

If a user in New York is viewing video streamed from San Francisco, that distance is going to add about 30ms to overall end-to-end latency no matter what. The lightspeed contribution to latency if that same content is streamed to a viewer in Hong Kong would be roughly three times that much. These continental and intercontinental distance extremes are balanced by the fact that the vast majority of stream flows are at sub-continental distances.
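Those figures are easy to approximate. The TypeScript sketch below estimates one-way propagation latency from great-circle distance, assuming light travels at roughly c/1.47 in glass fiber and that real routes detour about 1.3x the great-circle path; both factors are common rules of thumb, not Red5 measurements:

```typescript
// Back-of-the-envelope one-way propagation latency over fiber.
const C_KM_PER_MS = 299_792.458 / 1000; // speed of light in vacuum, km per ms
const FIBER_INDEX = 1.47;               // light in silica fiber travels at ~c/1.47
const ROUTE_FACTOR = 1.3;               // real fiber routes detour vs. great circle

function propagationMs(greatCircleKm: number): number {
  return (greatCircleKm * ROUTE_FACTOR * FIBER_INDEX) / C_KM_PER_MS;
}

console.log(propagationMs(4130).toFixed(1));  // New York–San Francisco: ~26ms
console.log(propagationMs(11100).toFixed(1)); // San Francisco–Hong Kong: ~71ms
```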

Beyond the distance factor, there are very minor overhead contributions to transport latency from routers and optoelectronic equipment. Our calculations of latency norms on the transport legs from encoder output through to local access distribution, as shown in Figure 2, take these minuscule overhead contributions into account.

Red5 Support for Reducing Transport Latencies

Of course, it’s always a good idea to avoid unnecessarily long distances, as can occur when points of ingest and egress on a cloud-hosted WebRTC transport platform are unusually far from streaming sources and end users.

With Red5 a big factor in keeping such distances to a minimum is the multi-cloud compatibility of XDN architecture. The availability of multiple cloud options that either have been pre-integrated with XDN architecture or can be tied into the XDN via the widely used Terraform open-source multi-cloud toolset maximizes customers’ flexibility to optimize the proximities of XDN Origin and Edge Nodes for their use cases.

And there’s another option Red5 has made available for customers who want to minimize XDN access and egress distances even further in the interest of reducing latency to the greatest extent possible. This has to do with utilizing 5G transport for streaming content to cell sites that have been equipped by mobile carriers affiliated with the Amazon Web Services (AWS) Wavelength program to support direct on-ramps to the AWS cloud.

Red5 is the only real-time streaming supplier authorized for pre-integration with AWS Wavelength Zones. As a result, cell sites connected to Wavelength Zones make it possible for users of XDN infrastructure to bypass multiple internet hops in their connections to XDN Nodes hosted by AWS. Wavelength Zones linking 5G mobile cell sites and AWS infrastructure are now operative in 77 AWS Availability Zones across 24 AWS Regions worldwide.

Capture and Encoding

Turning to the other steps taken by Red5 to reduce latency, we start with the front end of the distribution chain where there can be great variations in the time it takes for encoders to capture and encode the live A/V feeds prior to their transmission over contribution transport links to the XDN platform. From the latency-reduction perspective, the best results from encoders currently on the market are achieved with purpose-built appliances that utilize microprocessors supporting hardware acceleration.

While software-based encoding systems running on commodity infrastructure have grown in popularity with conventional HTTP-based live content streaming, the latencies incurred by overheads related to memory transfers and OS tasks run counter to optimum results. That’s not to say use of software encoders is out of the question if a customer is willing to tolerate latencies exceeding our 250ms end-to-end benchmark, but the latency contributions from software encoding can be significant.

Here it’s important to note that it doesn’t have to be this way, as demonstrated by Red5’s success at creating a software-based transcoding platform running on standard-issue commodity CPUs for real-time multi-profile delivery of content over XDN infrastructure, as described below. But, apparently, software encoding suppliers haven’t been motivated to go to the lengths necessary to build encoders suited for live real-time video applications.

For anyone looking for software encoders that might be up to the task, the problem is that not only is there wide variation in software encoder latencies; it’s also hard to determine whether a given solution offers settings that can be activated at lower latency without compromising the output quality required for a given use case. For example, one leading software encoding vendor says its default latency is 300ms but that, with reductions in resolution levels, it can reach what it calls, without specifying numbers, “ultralow latency.” Obviously, a 300ms encoding latency contribution would take streaming out of the real-time domain, and who knows what compromises on quality would be necessary to meet real-time latency goals.

Still, if a service provider’s choice is to use software encoding, the possibility of getting to a sweet spot where adjusting settings can achieve lower latencies without making quality unacceptable should not be dismissed out of hand. Too often, people unwittingly use defaults that are meant for higher resolution profiles than they need, leaving them with the impression that they can’t meet their end-to-end latency goals when, in fact, they could.

But it’s clear that, as things stand in the current marketplace, streaming at high resolutions generally puts software encoding out of reach for real-time streaming applications, as was demonstrated by a Comcast-led team’s use of software-based encoding in a recent project aimed at cutting HTTP-based streaming latency. As reported by ATEME, the project’s encoding vendor, it took a lot of work to get the software encoding latency for a 4K stream down to 800ms. “That’s a big challenge to have this low latency with a purely software solution,” said ATEME CTO Mikael Raulet during a session devoted to describing the Comcast project at the International Broadcasting Convention in September.

In any event, taking into account all the points of latency contribution, our calculations show we need encoding latency to be no greater than 40ms if we’re to stay within the 250ms end-to-end latency boundary. With commercially available encoders handling postproduction output, the best way to do that is to use units equipped with hardware acceleration.

Filling the bill are two partners offering hardware encoders fully integrated with XDN technology: Osprey Video and Videon Labs. Osprey’s Talon encoder gets the job done with a mere 10ms contribution to latency. Videon’s LiveEdge platform, which along with encoding supports processing for Docker containers executing feature enhancements to the content stream, contributes about 40ms. The hardware acceleration in Osprey’s case is supplied by AMD, while Videon relies on Qualcomm’s Snapdragon technology.

Along with the encoding process itself, there’s also a latency contribution incurred with capturing camera output before the encoding process can start. The encoders we’re profiling here are able to start processing video upon reception of the first frame, which means that, in the case of 60 frames-per-second input, the capture time for a single frame is 16.67ms. When added to the 10ms-per-frame encoding latency incurred with the Osprey Talon, this brings the total latency contribution of video capture and encoding to 26.67ms.
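For reference, the arithmetic works out as:

$$\frac{1}{60\ \text{fps}} \approx 16.67\ \text{ms per frame}, \qquad 16.67\ \text{ms} + 10\ \text{ms} \approx 26.67\ \text{ms}$$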

Transport Formatting Compatibility between Encoding Output and XDN Ingest

It’s also important that the transport format options available with encoder output match the options available at the ingest end of the contribution stream. XDN customers using the Osprey and Videon encoders have several commonly used transport options, all of which are supported by XDN architecture, including the open standards protocols Real-Time Messaging Protocol (RTMP), Real-Time Streaming Protocol (RTSP) and Secure Reliable Transport (SRT) as well as the proprietary Zixi Software-Defined Video Platform (SDVP).

In addition, both of these vendors’ latest encoders support the WebRTC-HTTP Ingestion Protocol (WHIP), which means their output can be transmitted to XDN infrastructure over WebRTC. And in instances where video has been encoded using the types of encoders used for MPEG-TS delivery over legacy TV networks, that content can be directly sent to XDN infrastructure for ingestion into the real-time streaming domain.
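For readers who want to see what WHIP ingest looks like in practice, here is a minimal browser-side publish flow in TypeScript. The endpoint URL is a placeholder, and details such as authentication, trickle ICE and teardown via the returned Location header are omitted; consult your platform’s documentation for the specifics:

```typescript
// Minimal WHIP publish sketch: one HTTP POST carries the SDP offer out
// and brings the SDP answer back.
async function publishViaWhip(endpoint: string): Promise<RTCPeerConnection> {
  const media = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  const pc = new RTCPeerConnection();
  media.getTracks().forEach((track) => pc.addTrack(track, media));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: offer.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
  return pc;
}

// Hypothetical usage; the actual path depends on the ingest service.
// publishViaWhip("https://ingest.example.com/whip/stream1");
```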

The bottom line is that any entity engaged in video distribution, whether for consumer or other applications, that is looking for a real-time, multi-directional streaming solution can be assured that access to XDN infrastructure won’t stand in the way of meeting its goals. The next place to look at what Red5 has done to maximize latency reduction is what happens when the live content reaches the XDN platform.

Minimizing Transcoding Latency

Here the available options have been designed to achieve the best end-to-end latency results no matter how a customer wants to configure streaming to end users. The first choice has to do with whether the customer wants to deliver the encoded stream as is to end users or to perform transcoding. In the case of as-is transmission, the content is directly ingested into an Origin Node with no delay.

Transcoding adds latency but helps to optimize continuous throughput to end users by generating multiple bitrate profiles that accommodate the adaptive bitrate (ABR) approach to streaming. This saves on consumption of access bandwidth by assigning bitrates matched to end device screen resolutions, and it enables shifts to lower bitrates when local access congestion threatens continuous throughput at higher bitrates.

If a customer chooses to transcode the stream, XDN architecture is designed to execute transcoding prior to ingestion into the Origin Nodes that stream the content, sometimes with the aid of Relay Nodes in mass distribution scenarios, to Edge Nodes serving local segments of the audience base. Transcoding prior to ingestion allows all profiles assigned to a content flow to be delivered to all Edge Nodes, where intelligence monitoring the types of receiving devices in use and how local access network conditions are affecting bandwidth availability on those devices determines which bitrate profile to send, moment to moment, in transmissions to each end user.
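To make that per-viewer selection concrete, here is an illustrative TypeScript sketch with a hypothetical three-rung ladder; Red5’s actual Edge Node logic is part of the XDN software, so treat this only as a model of the decision being described:

```typescript
// Illustrative edge-side ABR selection: pick the highest profile whose
// bitrate fits measured throughput and whose resolution fits the screen.
interface Profile { name: string; kbps: number; height: number }

// Hypothetical ladder, ordered from highest to lowest quality.
const ladder: Profile[] = [
  { name: "1080p", kbps: 4500, height: 1080 },
  { name: "720p",  kbps: 2500, height: 720 },
  { name: "360p",  kbps: 800,  height: 360 },
];

function pickProfile(measuredKbps: number, screenHeight: number): Profile {
  const fit = ladder.find((p) => p.kbps <= measuredKbps && p.height <= screenHeight);
  return fit ?? ladder[ladder.length - 1]; // fall back to the lowest rung
}

console.log(pickProfile(3000, 1080).name); // "720p" — 1080p needs more bandwidth
```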

There’s no latency added by this bitrate selection process at XDN Edge Nodes. The Red5 approach of delivering all profiles to each edge node is far superior, in terms of cost and reliability, to the approach taken by other WebRTC streaming suppliers, where the stream from the original encoding source isn’t transcoded until it reaches each edge location.

In these cases, the processing costs associated with transcoding are multiplied many times over, depending on the sizes of audiences served. And as the number of transcoding instances increases, so does the chance that malfunctions in processing will add to those costs. The cloud bandwidth costs of transporting multiple bitrate profiles to all locations pales next to those multi-transcoding costs.

Even more important to the Red5 transcoding advantage is the significant latency reduction we’ve achieved by building our transcoder from scratch rather than relying on FFmpeg or other third-party solutions. Our CPU-based Transcoder Nodes utilize our real-time Cauldron stream processor, which employs our native code modules, or Brews, to initiate and configure stream scaling and compression resources into a Multiple Independent Re-encoded Video (MIRV) processor.

Operating in real time, these MIRV-configured resources decode and split a received asset for recompression into multiple bitrate profiles. The resulting transcoding latency, at roughly 7ms as shown in Figure 2, typically beats alternative approaches by hundreds of milliseconds.
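As a rough analogy to the decode-once, encode-many pattern just described, the sketch below uses the browser WebCodecs API; Red5’s Cauldron/Brew transcoder is a native server-side implementation, so this is an illustration of the MIRV concept, not its code:

```typescript
// "Decode once, encode many": a single decoder fans frames out to one
// encoder per ABR rung, so the source is decompressed only one time.
const bitrates = [4_500_000, 2_500_000, 800_000]; // hypothetical ladder, bps

const encoders = bitrates.map((bitrate) => {
  const enc = new VideoEncoder({
    output: (chunk) => { /* hand the encoded chunk to the packager */ },
    error: (e) => console.error(e),
  });
  // Per-rung downscaling is omitted for brevity; a real ladder would also
  // resize frames before the lower-bitrate encodes.
  enc.configure({ codec: "avc1.42E01F", width: 1280, height: 720, bitrate });
  return enc;
});

const decoder = new VideoDecoder({
  output: (frame) => {
    encoders.forEach((enc) => enc.encode(frame));
    frame.close(); // encode() clones internally, so the frame can be released
  },
  error: (e) => console.error(e),
});
// decoder.configure(...) and feeding of EncodedVideoChunks omitted here.
```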

Minimizing Mixing Latency

Another major advance that applies the Cauldron and Brew mechanisms to a commonly required function at reduced latency is our Mixer technology, which combines two or more streams from separate sources into a single composite stream. The Mixer software, running on cloud servers dedicated to Red5, supports two approaches to generating a composite stream for transcoding or direct ingestion into the Origin Node.

One approach employs the Cauldron stream processor in conjunction with Brews configured to acquire the raw video frames Cauldron decodes from the incoming live streams entering the Mixer. The frames are combined in time-synched sequence into a single frame, as dictated by the mixing layout, and then encoded for single-stream output to end users.

The alternative approach utilizes the Chromium Embedded Framework (CEF), which employs an HTML5 page linked to the streams entering the XDN domain via WebRTC. The HTML5 page follows the composite stream layout as described in HTML, CSS and JavaScript to publish a single stream for distribution over the XDN.
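As a browser-side analogy to this second approach, the sketch below shows one way an HTML5 page can compose two live streams into a single output by laying the video elements side by side on a canvas; the dimensions and layout are hypothetical, and Red5’s Mixer performs its composition server-side within CEF:

```typescript
// Compose two live streams into one: draw both <video> elements onto a
// canvas on every display frame and capture the canvas as a MediaStream.
function composite(left: HTMLVideoElement, right: HTMLVideoElement): MediaStream {
  const canvas = document.createElement("canvas");
  canvas.width = 1280;  // hypothetical side-by-side layout
  canvas.height = 360;
  const ctx = canvas.getContext("2d")!;

  function draw(): void {
    ctx.drawImage(left, 0, 0, 640, 360);    // left pane
    ctx.drawImage(right, 640, 0, 640, 360); // right pane
    requestAnimationFrame(draw);            // redraw on every display frame
  }
  draw();

  return canvas.captureStream(30); // 30fps composite, ready to publish
}
```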

These variations in how the Mixer is employed allow developers to use whichever approach to combining individual streams into a composite works best for them.

End User Device Processing Latencies

Taking into account what we’ve already said about how we avoid adding latency at Edge Nodes through our pre-ingestion approach to transcoding, the next and last point in the distribution chain where processing plays a role in latency is the decoding of content for display on users’ devices. And, as mentioned earlier, the time it takes a device to render the decoded signal on screen can be a significant drag on end results.

We, of course, have no control over device decoding and rendering speeds, but we can report that, based on a comprehensive analysis of decoding speeds across multiple generations of handsets, we find the average decoding latency is just 8.5ms, with a range from 5ms to 20ms. As for rendering latency, this isn’t an issue with small screens or with recent-vintage computer displays.

In the connected TV and game console domains where streaming comes into play, newer models, like newer handsets and computers, are equipped with hardware acceleration provided by standalone GPUs or GPU/CPU hybrids, which cuts decoding time to a few milliseconds. As for rendering, newer smart TV displays are extremely fast, and older smart TVs and sets relying on external IP streaming devices will gradually disappear. On balance, the device-related barriers to achieving our end-to-end latency goals are too few to stand in the way of the proliferation of real-time streaming services.

Future Steps toward Latency Reduction

Indeed, we at Red5 are eager to move forward with further measures that will lower our end-to-end latency norm even more while giving those who need the lowest latencies allowed by the speed of light new options for reaching their goals. Over the long haul, we believe internet connectivity fast enough to support applications that demand lightspeed performance will be a given of life online. Until then, we’ll continue to push as hard as we can in that direction.

The lowest-hanging fruit when it comes to next steps entails putting hardware acceleration to work more widely across the XDN infrastructure. This doesn’t require any new inventions, but it does require a more rational marketplace where the true costs and availability of microprocessors produced by Nvidia, AMD and many other suppliers aren’t distorted by the AI demand bubble.

Under improved market conditions, latency-reducing use cases for hardware acceleration would emerge with cost-effective support for software-based encoding and transcoding in the cloud. In Red5’s case, we’ve already learned through extensive testing that, when the time is right, hardware acceleration will let us shave even more milliseconds from our Cauldron transcoding and Mixer solutions than we already have. Moreover, hardware accelerators will facilitate our customers’ transitions to next-generation codecs in real-time streaming.

We also envision putting hardware acceleration to use in Edge Nodes, which would allow us to scale to higher numbers of users with each node instance. Along with lowering power and other operational costs, this would reduce latency by eliminating the back-and-forth between processors and network cards that’s intrinsic to the use of CPUs.

These prospects give XDN users confidence that there’s a roadmap to even better latency performance when market conditions allow, which is sure to occur at some point. Meanwhile, there’s no need to wait for future developments to take advantage of XDN architecture and payload processing to achieve far lower end-to-end latencies with WebRTC streaming than can be found elsewhere. To learn more about how what everyone aspires to in latency reduction can be accomplished immediately with XDN implementations of WebRTC, contact info@red5.net or schedule a call.