3 Key Approaches for Scaling WebRTC: SFU, MCU, and XDN



Some have characterized WebRTC as the gateway to a communication promised land that can connect people virtually anywhere in real time. Indeed, if you want to deliver real-time communications and low-latency video streaming in the browser, WebRTC is currently your only widely supported option. The challenge has been, as developers in this space know only too well, that WebRTC, by itself, doesn’t scale. To truly scale WebRTC applications, you have to leverage topologies designed to extend its capabilities. Each of these topologies has its pros and cons, of course, and which is best for any given application depends on the anticipated use cases.

In this post, we’ll look at the advantages and disadvantages of four topologies designed to support low-latency video streaming in the browser: P2P, SFU, MCU, and XDN. Three of these attempt to resolve WebRTC’s scalability issues with varying results: SFU, MCU, and XDN. Only XDN, however, provides a new approach to delivering video-enriched experiences at scale that even the most experienced WebRTC developers may be unfamiliar with. We’ll summarize the advantages and drawbacks of each topology to help you decide which is right for your WebRTC-based streaming application.

Peer to Peer (P2P)

P2P, or mesh, is the easiest to set up and most cost-effective architecture you can use in a WebRTC application; it’s also the least scalable. In a mesh topology, two or more peers (clients) talk to each other directly or, when separated by a firewall, via a TURN server that relays audio, video, and data streams between them.

P2P applications can be resource intensive because the burden of encoding and decoding streams falls entirely on each peer, which is why they perform best with only a small number of concurrent users. Although you can achieve some level of scalability by configuring a P2P mesh network, you still end up with a resource-intensive, inefficient application. On the plus side, mesh provides the best end-to-end encryption because it doesn’t depend on a centralized server to encode and decode streams.

Peer-to-peer streaming: n-1 upstreams and n-1 downstreams


Pros:

  • Easy to set up using a basic WebRTC implementation
  • Better privacy, since media never passes through a central media server
  • Cost-effective because it doesn’t require a media server


Cons:

  • Can connect only a small number of participants without a noticeable decline in streaming quality
  • CPU intensive because all stream processing falls on each peer
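To make the scaling limit concrete, here is a quick sketch (illustrative numbers only) of the per-peer load in a full mesh:

```javascript
// Sketch: per-peer stream load in a P2P mesh (illustrative only).
// In a full mesh of n participants, every peer maintains a direct
// connection to each of the other n - 1 peers.
function meshLoad(n) {
  return {
    upstreamsPerPeer: n - 1,             // one encoded send per remote peer
    downstreamsPerPeer: n - 1,           // one decode/render per remote peer
    totalConnections: (n * (n - 1)) / 2, // unique peer-to-peer links
  };
}

// A 4-way call already means 3 encodes and 3 decodes per client:
console.log(meshLoad(4));
// { upstreamsPerPeer: 3, downstreamsPerPeer: 3, totalConnections: 6 }
```

The quadratic growth of connections is why mesh quality degrades quickly past a handful of participants.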

Selective Forwarding Unit (SFU)

SFU is perhaps the most popular architecture in modern WebRTC applications. Put simply, an SFU is a pass-through routing system designed to offload some of the stream processing from the client to the server. Each participant sends their encrypted media streams once to a centralized server, which then forwards those streams—without further processing—to the other participants. An SFU is more upload efficient than a mesh topology: on a call with n participants, each client has only one upstream rather than n-1. Clients still have to decode and render n-1 downstreams, however, which, as the number of participants grows, drains client resources, reduces video quality, and thus limits scalability.

SFU streaming: 1 upstream and n-1 downstreams


Pros:

  • Requires less upload bandwidth than a P2P mesh
  • Streams are separate, so each can be rendered individually – allowing full control of the layout of streams on the client side


Cons:

  • Limited scalability, since clients still decode and render n-1 downstreams
  • Higher operational costs as some CPU load is shifted to the server
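The same back-of-the-envelope math, sketched below with illustrative counts, shows where an SFU saves work and where the client-side cost remains:

```javascript
// Sketch: per-client and server load in an SFU topology
// (illustrative counts only).
function sfuLoad(n) {
  return {
    upstreamsPerClient: 1,               // each client encodes and sends once
    downstreamsPerClient: n - 1,         // but still decodes everyone else
    serverForwardedStreams: n * (n - 1), // pure forwarding, no transcoding
  };
}

// Upload load is fixed, but the decode burden still grows with the call:
console.log(sfuLoad(10));
// { upstreamsPerClient: 1, downstreamsPerClient: 9, serverForwardedStreams: 90 }
```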

Multipoint Conferencing Unit (MCU)

MCU has been the backbone of large-group conferencing systems for many years. This is not surprising given its ability to deliver stable, low-bandwidth audio/video streaming by offloading much of the CPU-intensive stream processing from the client to a centralized server.

In an MCU topology, each client connects to a centralized MCU server, which decodes, rescales, and mixes all incoming streams into a single new stream, then encodes it and sends it to all clients. Although this is bandwidth friendly and less CPU intensive on the client side—instead of processing multiple streams, clients decode and render only one—an MCU solution is expensive on the server side. Transcoding multiple audio and video streams into a single stream and then encoding it at multiple resolutions in real time is very CPU intensive, and the more clients connect to the server, the higher its CPU requirements.

One of the greatest benefits of an MCU, however, is its ease of integration with external (legacy) business systems because it combines all incoming streams into a single, easy-to-consume outgoing stream.

MCU streaming


Pros:

  • Bandwidth friendly: each client sends and receives a single stream
  • Composite output simplifies integration with external services
  • Your only option when you need to combine many streams (unless you use an XDN approach, discussed next)


Cons:

  • CPU intensive: the more streams, the bigger the server you need
  • Single point-of-failure risk because of centralized processing
  • High operational costs due to the computational load on the server
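Sketching the MCU numbers the same way (again with illustrative counts) makes the trade-off plain: client load stays flat while the server absorbs all the transcoding work:

```javascript
// Sketch: per-client and server-side load in an MCU topology
// (illustrative counts only).
function mcuLoad(n) {
  return {
    upstreamsPerClient: 1,         // each client sends one stream to the MCU
    downstreamsPerClient: 1,       // and receives one composited stream back
    serverDecodes: n,              // the server decodes every incoming stream...
    serverEncodesPerResolution: 1, // ...then mixes and re-encodes the result
  };
}

// Client load is constant no matter how many participants join;
// the transcoding bill lands entirely on the server:
console.log(mcuLoad(50));
// { upstreamsPerClient: 1, downstreamsPerClient: 1,
//   serverDecodes: 50, serverEncodesPerResolution: 1 }
```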

Experience Delivery Network (XDN)

XDN presents a new approach to extending WebRTC that combines elements of both SFU and MCU. Unlike SFU and MCU, however, XDN tackles WebRTC’s scalability issues with a cloud-based clustering architecture rather than a centralized server. Each cluster consists of a system of distributed server instances, or nodes: origin, relay, and edge nodes. Within this topology, an origin node ingests incoming streams and distributes them to multiple edge nodes, supporting thousands of participants. For larger deployments, origin nodes can stream to relay nodes, which in turn stream to multiple edge nodes, scaling the cluster to virtually unlimited size. An XDN also supports so-called mixers, which can be deployed between publishers and origin nodes to combine many streams into a single stream that is then passed on to an origin node. A mixer is essentially an MCU that can be clustered to combine many more streams than a single-server MCU could handle.
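As a rough sketch of how that origin–relay–edge fan-out multiplies, here is an illustrative capacity calculation; the per-node limits are invented for the example and in practice depend on instance size, stream bitrate, and hosting platform:

```javascript
// Sketch: rough subscriber capacity of an XDN-style cluster with one
// origin node. All fan-out limits below are hypothetical.
function clusterCapacity({ relaysPerOrigin, edgesPerRelay, subscribersPerEdge }) {
  // Each tier multiplies the reach of the tier above it.
  return relaysPerOrigin * edgesPerRelay * subscribersPerEdge;
}

// One origin feeding 10 relays, each feeding 20 edges,
// each serving 1,000 subscribers:
console.log(
  clusterCapacity({ relaysPerOrigin: 10, edgesPerRelay: 20, subscribersPerEdge: 1000 })
); // 200000
```

Adding another tier of relays, or more edges per relay, multiplies capacity again, which is why the topology scales so far beyond a single server.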

The brains behind this operational cluster is the stream manager, which controls the nodes, performs load balancing, and replaces nodes should they fail. The stream manager is also responsible for connecting participants in a live-streaming event: it connects publishers to an origin node and subscribers to the edge node geographically closest to them.
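To illustrate the edge-selection idea, here is a hypothetical sketch of picking the nearest edge node by geographic distance; the node names, coordinates, and selection logic are invented for the example and do not reflect the actual stream manager implementation:

```javascript
// Hypothetical sketch: choose the geographically closest edge node
// for a subscriber. A real stream manager would also weigh node
// load and health, not just distance.
function nearestEdge(subscriber, edges) {
  const dist = (a, b) => {
    // Equirectangular approximation -- good enough for ranking nodes.
    const toRad = (d) => (d * Math.PI) / 180;
    const x = toRad(b.lon - a.lon) * Math.cos(toRad((a.lat + b.lat) / 2));
    const y = toRad(b.lat - a.lat);
    return Math.sqrt(x * x + y * y) * 6371; // kilometers
  };
  return edges.reduce((best, e) => (dist(subscriber, e) < dist(subscriber, best) ? e : best));
}

const edges = [
  { name: "us-east", lat: 39.0, lon: -77.5 },
  { name: "eu-west", lat: 53.3, lon: -6.3 },
  { name: "ap-south", lat: 19.1, lon: 72.9 },
];

// A subscriber near Paris is routed to the European edge:
console.log(nearestEdge({ lat: 48.9, lon: 2.4 }, edges).name); // "eu-west"
```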

XDN streaming, incorporating a Mixer node (M)

XDN is designed to meet the demands of the next era of online engagement: video-enriched, interactive experiences that are bound to reshape our personal and commercial lives. It combines elements of SFU and MCU to deliver these video-rich streaming experiences at scale. For an example of how this topology can be deployed today, have a look at our Red5 Pro platform.

By default, Red5 Pro’s implementation of XDN uses an SFU mechanism for stream delivery. Similar to an MCU, it also includes a low-level stream-processing engine: the server has full control over the media packets and can manipulate data as necessary before sending it out to the requesting client.

XDN streaming can achieve unlimited scale through clustering

Using this hybrid architecture, WebRTC applications can scale beyond any SFU- or MCU-driven multiparty conferencing application. For example, Red5 Pro can support real-time, low-latency sports events with watch parties as well as other events that require real-time synchronization among any number of participants, such as auctions, live shopping, or sports betting.

Built upon a cross-cloud platform, Red5 Pro also supports a wide array of hosting options, including AWS, Azure, GCP, and DigitalOcean, as well as more than a dozen other IaaS providers via Terraform. Bare-metal server installations can also be integrated by running a cloud-like API such as vSphere in a private data center. This variety of virtual server hosting platforms maximizes flexibility for full scalability and geographic dispersion.


Pros:

  • Fully scalable through cloud-based clustering
  • Flexible hosting with cross-cloud distribution
  • Provides the best of both MCU and SFU in one scalable package


Cons:

  • More complex than other real-time streaming topologies

Which Architecture Is Right for You?

P2P, SFU, and MCU each have their advantages and drawbacks with regard to building low-latency streaming solutions. P2P works best for simple video chat applications with two to four concurrent participants. SFUs and MCUs resolve WebRTC’s scalability issues to a certain extent. An SFU is best used in multiparty conferencing applications that don’t have too many concurrent participants. Using an MCU is essential for large conference applications, fan walls, and other use cases where lots of participant streams need to be rendered on a single WebRTC client. None of these architectures, however, offer the flexibility and scalability of a hybrid cluster topology such as XDN, which supports real-time connectivity across any configuration of any number of concurrent users at any distance.

To learn more about the unlimited scalability of WebRTC on the Red5 Pro XDN platform, contact info@red5pro.com or schedule a call. For our take on the inevitable shift to real-time video experiences, see our introduction to XDN.