
GPU vs CPU. How to Cut Live Streaming & AI Processing Costs?


The question of how to cut live streaming and AI processing costs using GPU vs CPU is becoming increasingly common as businesses look to optimize operating costs and improve ROI. In this blog, you will learn practical ways to reduce costs across real-time streaming workflows as well as AI processing in live streaming environments. If you want to skip ahead, you can jump directly to Part 2: AI Cost Containment in Live Streaming.

Introduction

Minimizing operating costs in real-time video streaming has never been more important, or more challenging, than it is now, as AI models begin to permeate the marketplace.

Not only do users have to determine whether a given real-time streaming platform and their choice of AI models meet expectations; they must also be sure they’re spending as little as possible on all the end-to-end processing that goes into achieving their goals.

Keeping live stream processing costs as low as possible requires looking under the operational hood to see how a given platform provider’s processing requirements are divided between reliance on low-cost Central Processing Units (CPUs) and much costlier hardware acceleration via Graphics Processing Units (GPUs) or other custom accelerators used by cloud providers, such as Google’s Tensor Processing Units (TPUs) and Amazon’s Trainium and Inferentia chips. This touches on how money is spent on cloud resources at every juncture, from basic ingestion, transcoding, mixing and routing functionalities to executing the intelligence determining how all the elements are packaged for each end user.

Of course, in the hyperdynamic AI processing marketplace there’s only so much general cost guidance to offer beyond one key fact: AI processing doesn’t always have to involve GPUs or other high-cost processors. However, as shall be seen, streaming platforms offer ways beyond the GPU vs. CPU assessment to keep AI usage costs in check.

The good news is, costs don’t need to be the barrier that some observers claim when it comes to executing highly scalable real-time streaming solutions or any type of live streaming with AI. Our goal here is to shed some light on why that’s the case.

Part 1: Cost Containment In Real-Time Streaming 

CPU and GPU Per-Instance Costs

| Category | CPU | GPU |
| --- | --- | --- |
| Typical Cloud Instance Cost | Under $1 per instance | $3–$10+ per instance |
| Processing Style | Serial execution of complex tasks | Massive parallel execution for large data sets |
| Price Stability | Predictable year-to-year | Volatile, varies by vendor and demand |
| Impact on Real-Time Streaming | Works for logic + workflow tasks if tuned | Great for heavy AI/encoding but expensive at scale |

CPU vs GPU In Real-Time Streaming: Cost and Architecture Overview

Judging the cost implications of CPU vs. GPU usage is complicated, but it begins with the basic contrast in cost per cloud instance. Looking at prices charged by various cloud providers, we find CPU-based cloud instances running in the sub-dollar range and per-instance single-GPU usage costs ranging from around $3 to over $10.

These differences can have monumental bottom-line consequences for choices made at the semiconductor level. However, looking at pricing in that light, beyond simply acknowledging there’s a gap in basic unit usage costs, is not all that helpful, given how the respective chip designs are shaped by what they’re meant to accomplish.

CPU and GPU cores alike handle millions of calculations per second and rely on internal memory to facilitate performance. But CPUs, as the brains of a computer, are optimized for serially processing a complex set of related tasks, whereas GPUs are designed to run calculations in parallel to accommodate processing of immense amounts of data related to a given task. A CPU package can have anywhere from one to dozens of cores, each far more powerful than a GPU core, while the latest generation GPUs come with core counts exceeding 15,000.
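To make the contrast concrete, here is a minimal Python sketch, illustrative only, that times a serial loop against a vectorized pass over the same data. NumPy’s vectorized execution on a CPU is only a stand-in for the data-parallel style that GPUs take to an extreme with thousands of cores; the timings printed will vary by machine.

```python
import time
import numpy as np

# Illustrative only: NumPy's vectorized (SIMD/multi-core) execution stands in
# for the data-parallel style that GPUs take to an extreme with thousands of cores.
samples = np.random.rand(10_000_000).astype(np.float32)

start = time.perf_counter()
serial_sum = 0.0
for value in samples[:100_000]:   # just a slice; the full serial loop would take far longer
    serial_sum += float(value) * 2.0
serial_time = time.perf_counter() - start

start = time.perf_counter()
parallel_result = samples * 2.0   # one vectorized pass over all 10M samples
parallel_time = time.perf_counter() - start

print(f"Serial loop over 100k samples: {serial_time * 1000:.1f} ms")
print(f"Vectorized pass over 10M:      {parallel_time * 1000:.1f} ms")
```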

Complicating matters, the costs of GPU instances and clusters vary greatly among hyperscalers and smaller providers of cloud computing services. To cite just one example of the pricing disparities among hyperscalers, the per-instance hourly cost in 2025, as measured in Verda’s study, for an eight-GPU cluster of Nvidia’s H100 with 80GB of memory per GPU was $88.49 on the Google Cloud Platform, $55.04 on AWS and $98.32 on Microsoft Azure. As a further data point, Oracle Cloud Infrastructure offers an 8× H100 bare-metal node under the instance shape BM.GPU.H100.8, priced at US $80.00/hour (i.e., US $10.00 per GPU-hour).

| Cloud Provider | Hourly Price (8× H100) | Price per GPU-Hour | Source |
| --- | --- | --- | --- |
| Google Cloud Platform (GCP) | $88.49/hr | $11.06 | Verda GPU Pricing Comparison (2025) |
| Amazon Web Services (AWS) | $55.04/hr | $6.88 | Verda GPU Pricing Comparison (2025) |
| Microsoft Azure | $98.32/hr | $12.29 | Verda GPU Pricing Comparison (2025) |
| Oracle Cloud Infrastructure (OCI) | $80.00/hr | $10.00 | Oracle official GPU pricing (updated 2025) |

H100 (8-GPU) Hourly Pricing Comparison — 2025
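For quick comparisons, the cluster prices above can be normalized to a per-GPU-hour figure. The short Python sketch below does just that, using only the numbers cited in the table:

```python
# Normalizing 8x H100 cluster prices to per-GPU-hour cost, using the figures
# cited above (Verda's 2025 comparison plus Oracle's published rate).
cluster_prices = {
    "GCP": 88.49,
    "AWS": 55.04,
    "Azure": 98.32,
    "OCI": 80.00,
}
GPUS_PER_CLUSTER = 8

for provider, hourly in sorted(cluster_prices.items(), key=lambda kv: kv[1]):
    per_gpu = hourly / GPUS_PER_CLUSTER
    print(f"{provider}: ${hourly:.2f}/hr cluster -> ${per_gpu:.2f} per GPU-hour")
```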

Moreover, GPU or related commodity hardware accelerator costs are in a highly volatile state as options multiply with AMD, Intel and ever more entrants, including newcomers like OpenAI’s chip-making venture, Apple’s anticipated foray into the field and many others looking to dent Nvidia’s dominance. What that means for any narrowing of the price gap between traditional CPUs and GPUs over time is hard to say.

But, for now, the analysis of how semiconductor processing costs impact real-time streaming platforms is fairly straightforward. Here the complexities relevant to cost containment revolve around the fact that anything done with CPUs to minimize processing costs must also meet quality goals without compromising latency.

Cost Containment at the Speed of Thought

In other words, the use of CPUs should not result in pushing end-to-end live streaming latencies above the real-time benchmark set by Red5 at 250ms, which ensures there’s no interval lengthy enough to be perceived as lag time between source and reception. Moreover, the more processing that’s required to support the kind of enhanced use-case versatility intrinsic to Red5’s Experience Delivery Network Architecture, the more challenging cost containment through reliance on CPUs becomes.

It’s a tall order, but Red5 has met the challenge with architectural and software engineering that enables reliance on CPUs to support the vast majority of real-time interactive streaming use cases. This is cost containment with end-user experiences delivered, as our tagline says, “at the speed of thought.” (See the Addendum for an overview of XDN Architecture, distinctions between the Red5 Cloud software-as-a-service (SaaS) and the Red5 Pro self-hosted approach to XDN implementation, and the use-case versatility enabled by the Red5 TrueTime toolsets, SDKs and other innovations.)

We begin by exploring the ways in which Red5’s support for all real-time streaming use cases serves as a cost-saving foundation for any instance involving the use of AI. The breadth of applications, with support as needed for social or collaborative interactivity, encompasses sports, esports and other live content distribution, multiplayer gaming, sports betting, virtual casino gambling, online auctions, e-commerce, distributed live production, large-scale public safety surveillance, and much else, including the use of extended reality (XR) technology wherever applicable in these scenarios.


Examples of real-time use cases powered by Red5.

Of course, Red5 welcomes any customer’s decision to use hardware acceleration in conjunction with streaming over XDN infrastructures, but the net cost/performance benchmarks we’re setting have to do with the fact that we rely on CPUs to execute all the commands flowing over our customers’ real-time streaming networks.

Encoding & Ingress

Real-time streaming cost optimization starts with approaches to encoding output from the cameras used to capture live content. Here it should be noted that the latency incurred in the capture process is set by the frame rate: per-frame capture time shrinks as the frame rate goes up, at 16.7ms per frame for 60fps versus 33.3ms per frame for 30fps.

The latency contribution from the encoding used to digitize and compress raw camera output depends on the choice of encoders by whoever is overseeing the output, whether it’s for consumer services or any of the myriad use cases underway in other market sectors. Generally speaking, if achieving top-level quality at 4K UHD or 1080p HD resolutions at the lowest possible latencies is the goal, users will need to choose encoders equipped with GPU, ASIC or FPGA hardware acceleration as opposed to software systems that run on commodity CPU-based appliances.

Making the right choices is especially critical to keeping latency under control for real-time streaming when processing-intensive advanced encoding like High Efficiency Video Coding (HEVC or H.265) is in play. Whether the encoding needed to attain these quality levels is based on HEVC or the more universally employed Advanced Video Coding (AVC or H.264), the latency contribution from hardware-accelerated encoders must be kept at or below 40ms to ensure end-to-end latency doesn’t exceed our 250ms target.
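One practical way to apply these numbers is a simple latency budget. The sketch below assumes the figures cited in this article (a 40ms encoder ceiling, the 7ms transcode and 10ms egress latencies discussed later, and the 250ms real-time target) and shows how much headroom remains for network transport and rendering:

```python
# A minimal latency-budget check using figures cited in this article.
REAL_TIME_TARGET_MS = 250

budget_ms = {
    "capture (60fps, one frame)": 1000 / 60,  # ~16.7 ms
    "hardware-accelerated encode": 40,        # upper bound from the text
    "transcode (Red5 MIRV)": 7,               # figure cited later in this article
    "edge egress processing": 10,             # figure cited later in this article
}

used = sum(budget_ms.values())
print(f"Processing subtotal: {used:.1f} ms")
print(f"Left for transport and rendering: {REAL_TIME_TARGET_MS - used:.1f} ms")
```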

But it’s also important to note that there are plenty of scenarios where lower quality requirements allow latency goals to be met with the right choice of low-cost software encoders. In looking for such options, it’s important to remember not to simply rely on default settings meant for the highest resolution profiles when lower settings could cut latencies to the required levels.

Another consideration in encoding cost containment is that the costs of high-quality encoding output can often be lowered with use of cameras that come with built-in encoding capabilities. Examples are prevalent among PTZ (Pan, Tilt, Zoom) and RTSP IP cameras used in surveillance and remote news gathering.

Providers of real-time streaming platforms contribute to encoding cost containment by ensuring that their customers have the widest possible latitude to make choices that meet their output quality and cost goals without exceeding the latency requirements. There’s quite a bit to consider here.

First of all, real-time streaming platforms must be able to ingest source output transported over all the commonly used frameworks, including not only MPEG-TS, RTSP, RTMP, SRT, Zixi and WebRTC but also MOQ, the forthcoming IETF real-time streaming standard. With MOQ already going into deployment with support from multiple platform providers, including Red5, it’s inevitable that there will be widespread use of direct MOQ transport feeds from sources onto MOQ CDNs. Read our whitepaper “Real-Time Streaming at the Dawning of MOQ” to learn more. 

All of these transport modes are supported for ingestion onto XDN infrastructures. In addition, Red5 makes it easy for customers to capitalize on best-of-breed encoders that support low-latency encoding of live source content destined for XDN ingestion. This includes pre-integrations with encoders supplied by partners like Osprey Video and Videon Labs and open API support for integrations with other encoding platforms selected by customers.  

The XDN platform has also been engineered to take advantage of the Enhanced RTMP (eRTMP) specification to facilitate seamless XDN ingestion of HEVC-encoded content transported over RTMP from content sources. And we’ve brought HEVC-encoded stream ingestion into play in our interfaces with the Zixi and SRT contribution transport platforms.

More generally, Red5 has added to customers’ flexibility in any use case involving contribution feeds from live production sources. The XDN platform can ingest the content whether HEVC is the only codec in use or the source is also encoding the content for AVC output, in which case both streams will be ingested with frame-accurate synchronization for distribution over the XDN platform. 

Of course, the evolution to advanced coding doesn’t end with HEVC. The cost-saving steps and impacts we’re ascribing here to HEVC will be applied with AOMedia Video 1 (AV1) and, further down the line, Versatile Video Coding (VVC), when it becomes possible to use those codecs at processing latencies conducive to streaming in real time. 

Transcoding 

Beyond encoding and transport ingress support, streaming platform providers have an even greater role to play in cost containment at every point of processing all the way to end users. 

This begins with how transcoding is executed at ingestion of live content, as when users want to generate the adaptive bitrate (ABR) profiles used in HTTP streaming, which overcome congestion by switching from high to low bitrates and save bandwidth by matching bitrates to end-device screen resolutions.
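As a hypothetical illustration of what such an ABR ladder looks like, and how a delivery point might pick a rung against measured bandwidth, consider the Python sketch below; the specific resolutions and bitrates are assumptions for illustration, not Red5 defaults:

```python
# A hypothetical ABR ladder of the kind a transcoder produces at ingest.
abr_ladder = [
    {"name": "1080p", "resolution": (1920, 1080), "bitrate_kbps": 4500},
    {"name": "720p",  "resolution": (1280, 720),  "bitrate_kbps": 2500},
    {"name": "480p",  "resolution": (854, 480),   "bitrate_kbps": 1200},
    {"name": "360p",  "resolution": (640, 360),   "bitrate_kbps": 600},
]

def pick_profile(available_kbps: float) -> dict:
    """Choose the highest rung that fits the measured bandwidth, with 20% headroom."""
    for rung in abr_ladder:  # ordered high to low
        if rung["bitrate_kbps"] <= available_kbps * 0.8:
            return rung
    return abr_ladder[-1]

print(pick_profile(3500)["name"])  # -> 720p
```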

Performing transcoding with software running on commodity CPU cores avoids hardware acceleration costs, but doing so without adding unacceptable contributions to end-to-end latency presents a big challenge. That’s especially the case when transcoding HEVC is involved, which along with producing ABR profiles can involve transcoding HEVC-encoded content to AVC ABR streams so that all users can receive the content whether or not their devices support HEVC.

The transcoding cost efficiencies enabled by Red5 begin with the fact that transcoding is executed prior to live content ingestion into the Origin Nodes that are used to stream the content, sometimes with the aid of Relay Nodes, to Edge Nodes serving local segments of the audience base. Transcoding prior to ingestion allows all profiles assigned to a content flow to be delivered to all Edge Nodes. There, intelligence monitoring the types of receiving devices in use, and how local access networks are impacting bandwidth availability on those devices, determines which bitrate profile to send moment to moment in transmissions to each end user.

There’s no latency added with this bitrate selection process at XDN Edge Nodes. Streaming all profiles to each Edge Node is far superior in terms of costs and reliability compared to the approach taken by other WebRTC streaming suppliers, where the stream from the original encoding source isn’t transcoded until it reaches each edge location. 

In these cases, the processing costs associated with transcoding are multiplied many times over, depending on the size of audiences served. And as the number of transcoding instances increases, so does the chance that malfunctions in processing will add to those costs. The cloud bandwidth costs of transporting multiple bitrate profiles to all locations pale next to those multi-transcoding costs.
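A rough cost model makes the argument tangible. In the sketch below, every rate is a hypothetical placeholder (not a Red5 or cloud list price); the point is the shape of the curve: per-edge transcoding scales with edge count much faster than fanning out a pre-transcoded ladder does.

```python
# All rates below are hypothetical placeholders for illustration only.
TRANSCODE_COST_PER_HR = 0.50   # one CPU transcoder instance, per hour
BANDWIDTH_COST_PER_GB = 0.08   # inter-node transfer
LADDER_GB_PER_HR = 4.0         # all ABR profiles combined, per stream-hour
SOURCE_GB_PER_HR = 1.0         # the single contribution stream, per hour

def transcode_at_origin(edge_count: int) -> float:
    """One transcode, then the full ladder fanned out to every edge."""
    return TRANSCODE_COST_PER_HR + edge_count * LADDER_GB_PER_HR * BANDWIDTH_COST_PER_GB

def transcode_at_each_edge(edge_count: int) -> float:
    """Source stream to every edge, then an independent transcode at each one."""
    return edge_count * (SOURCE_GB_PER_HR * BANDWIDTH_COST_PER_GB + TRANSCODE_COST_PER_HR)

for edges in (5, 20, 100):
    print(f"{edges:>3} edges: origin-side ${transcode_at_origin(edges):6.2f}/hr"
          f" vs per-edge ${transcode_at_each_edge(edges):6.2f}/hr")
```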

Even more important to the Red5 transcoding cost advantage is the significant latency reduction we’ve achieved by building our transcoder from scratch rather than relying on FFmpeg or third-party solutions. Our CPU-based Transcoder Nodes utilize our real-time Cauldron stream processor, which employs our native code modules, or Brews, to initiate and configure stream scaling and compression resources into a Multiple Independent Re-encoded Video (MIRV) processor.

Operating in real time, these MIRV-configured resources decode and split a received asset for recompression into multiple bitrate profiles. The resulting transcoding latency, at just 7ms, typically outpaces alternative approaches by hundreds of milliseconds with no need for hardware acceleration.  

Mixing

Another commonly employed function, performed by Cauldron and Brew mechanisms running on CPUs at the required latencies, engages our Mixer Node technology to mix two or more streams from separate sources into a single composite stream, which never adds more than 50ms to total latency. These capabilities anchor several major applications that can be uniquely created with Red5’s TrueTime Multiview tools in use cases operating over XDN infrastructure.

One use case, as explained in depth here, involves synchronized real-time monitoring and analysis of live surveillance camera feeds covering wide swaths of territory. At points of XDN ingestion the input is mixed for consolidated delivery to operations centers where responders can extract any feed or combinations of feeds for manual or automated AI scrutiny. 

Mixer Node technology is also vital to the unique benefits enabled by the TrueTime Multiview toolset in production workflows connecting dispersed personnel and for selecting multiviewing options streamed to end users. The ability to extract and edit live A/V renderings from a consolidated stream of multiple cameras and other inputs in real-time collaboration across great distances is a game changer in live sports and other event productions. And the Mixer Node technology used with Red5’s Multiview for Fans application allows producers to choose and deliver any number of viewing options presented in thumbnail arrays that audiences can choose from for full-screen display with no disruption to the flow of action as they go from one view to another. 

Processing for Egress at the Edge

The XDN Architecture also makes use of cloud resources for processing at XDN Edge Nodes where live content is off-loaded for access network delivery to end users. Intelligence orchestrated by the XDN Stream Manager enables output, no matter how many streams may be coming in from the Origin/Relay Nodes, to be tuned to the device capabilities, bandwidth availability and demographic profile of the end user. 

This intelligence is essential to enabling XDN infrastructure support for Red5’s approaches to ABR profile streaming, dynamic ad and other content insertions, watermarking, and multiviewing. In all cases we have engineered software solutions that enable processing by CPUs, resulting in a total egress latency that doesn’t exceed 10ms.

There’s also a cost benefit associated with egress stemming from the fact that, by bringing support for HEVC over WebRTC into play, Red5 facilitates significant savings in egress charges assessed by many cloud computing systems. The 50% bitrate reduction attained by using HEVC instead of AVC translates to a 50% egress cost saving on every affected stream running on any cloud platform that charges for egress.
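The arithmetic is easy to sketch. Assuming an illustrative $0.08/GB egress rate and a roughly 3 Mbps AVC stream (both placeholders; substitute your cloud’s actual price and your ladder’s actual bitrates), the 50% bitrate reduction carries straight through to the bill:

```python
# Back-of-the-envelope egress savings from HEVC's ~50% bitrate reduction.
EGRESS_PER_GB = 0.08           # hypothetical $/GB; use your cloud's actual rate
viewers = 10_000
hours = 2
avc_gb_per_viewer_hour = 1.35  # ~3 Mbps stream: 0.375 MB/s * 3600 s; an assumption

avc_cost = viewers * hours * avc_gb_per_viewer_hour * EGRESS_PER_GB
hevc_cost = avc_cost * 0.5     # the 50% bitrate reduction cited above

print(f"AVC egress:  ${avc_cost:,.2f}")
print(f"HEVC egress: ${hevc_cost:,.2f}  (saves ${avc_cost - hevc_cost:,.2f})")
```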

Other Savings Intrinsic to Uses of Red5 Cloud and Red5 Pro

Another aspect to cost saving has to do with the ways Red5 facilitates minimizing spending on public cloud resources. In cases involving deployments of self-hosted Red5 Pro server software tools and SDKs, the cross-cloud capabilities of XDN Architecture allow customers to use whatever combination of cloud computing platforms works for them, not only in terms of reach but also costs.   

In the case of Red5 Cloud deployments on the global Oracle Cloud Infrastructure (OCI), customers benefit from the fact that OCI’s egress and other fees are among the industry’s lowest. Moreover, they only need to pay underlying costs for resources when they’re actively using them. That’s because Red5 takes advantage of OCI’s cloud virtualization technology across the resources dedicated to XDN use, so that when a customer deactivates its use of those resources at any given moment, Red5 can cover their costs by making them available to other users. 

Watch the Red5 Cloud product overview video.

Going even further, Red5 has secured additional cloud resources to enable the lowest pricing yet seen for real-time streaming in U.S. markets. As described at length in this blog, the recently introduced Red5 Cloud Pay-as-Grow plan allows entry-level customers, without even registering credit cards, to stream up to 50 gigabytes and consume up to 6,000 instance hours per month at no cost on a continuing basis.

In any month exceeding 50 GB of usage, these customers, upon registering payment methods, pay just $0.08 per GB and $0.69 per instance hour, reverting to free usage in any succeeding month at or below 50 GB. Any time customers reach the point where they want to scale to much higher monthly volumes, with real-time streaming to audiences numbering into the millions at continental and transcontinental distances, they can seamlessly transition to Red5 Cloud resources running on the OCI platform.
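As a sketch of how that pricing composes into a monthly bill, the snippet below assumes the free tier acts as a monthly allowance with the published rates applying only to usage beyond it; confirm the exact billing mechanics against Red5’s own documentation:

```python
# Assumption: the free tier (50 GB, 6,000 instance hours) is an allowance and
# the published rates apply only to usage beyond it.
FREE_GB, FREE_HOURS = 50, 6_000
RATE_PER_GB, RATE_PER_HOUR = 0.08, 0.69

def monthly_bill(gb_streamed: float, instance_hours: float) -> float:
    if gb_streamed <= FREE_GB and instance_hours <= FREE_HOURS:
        return 0.0
    return (max(0.0, gb_streamed - FREE_GB) * RATE_PER_GB
            + max(0.0, instance_hours - FREE_HOURS) * RATE_PER_HOUR)

print(monthly_bill(40, 5_000))   # 0.0 (inside the free tier)
print(monthly_bill(500, 7_000))  # 36.00 for GB + 690.00 for hours = 726.00
```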

Part 2: AI Cost Containment In Live Streaming

The abundance of generative AI solutions available for applications directly impacting payloads delivered over A/V streaming infrastructures of all kinds has added a complex dimension to cost assessment on the part of decision makers everywhere. Here, as mentioned earlier, the bedrock cost-containment principle is that decision makers should avoid acting on the assumption that GPUs are always needed to handle tasks related to the various permutations of AI, including Large Language Model (LLM) and Vision Language Model (VLM) versions of generative AI (genAI) and the decision-making capabilities of Agentic AI. 

It’s also important to note that, as shown by innovations introduced by Red5, a real-time streaming platform can have a major role to play in facilitating efficient, cost-saving uses of generative, agentic and the new AI formulations that will be exploding into economic reality in the years ahead. We’ll explore these developments in the second part of this section.

Choosing Semiconductors for AI Deep Learning and Inference Processing

As in the case of assessing CPU vs. GPU possibilities in streaming-related processing, decision makers should take into account the full scope of outcomes from using either type of processor for AI, beyond just the basic per-instance cost differential. If analysis shows that a GPU instance incurs four or five times the cost of a CPU instance but delivers ten times the execution results, the GPU path turns out to be the less costly one.

| Scenario | CPU | GPU |
| --- | --- | --- |
| Training generative AI models | Used only for serially sequenced machine-learning algorithms that do not require parallel computing | Usually the right choice due to immense processing power |
| Inference execution in commercial AI-assisted solutions | Often used to handle real-time execution tasks | Required only when inference must process massive workloads in real time, such as speech-to-text, translation, visual object analysis, or any task requiring very high QPS |
| Cost efficiency considerations | Lower per-instance cost but slower; may be more costly overall if throughput is insufficient | Higher per-instance cost but can deliver 10x the execution results, making it the lower-cost option when throughput matters |
| Decision criteria | Balanced perspective required; may be preferred for inference unless workload is massive | Preferred when parallel processing or extremely high throughput is required |

Comparing scenarios where a CPU or GPU should be chosen.
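The point about per-instance price versus delivered throughput reduces to one division. This sketch uses hypothetical figures mirroring the article’s ratios (roughly 5x the instance cost for 10x the throughput), not benchmarks:

```python
# Compare cost per unit of work, not cost per instance.
def cost_per_unit(hourly_cost: float, units_per_hour: float) -> float:
    return hourly_cost / units_per_hour

cpu = cost_per_unit(hourly_cost=0.90, units_per_hour=1_000)    # hypothetical CPU instance
gpu = cost_per_unit(hourly_cost=4.50, units_per_hour=10_000)   # ~5x cost, 10x throughput

print(f"CPU: ${cpu:.5f} per inference")  # 0.00090
print(f"GPU: ${gpu:.5f} per inference")  # 0.00045 -> GPU wins despite the higher price
```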

That said, when it comes to the deep learning that goes into training generative AI models for a particular application, GPUs with their immense processing power are usually the right choice, but there are some training scenarios where CPUs are a better fit, such as serially sequenced machine-learning algorithms that don’t require parallel computing. 

On the other hand, CPUs can often be used to handle processing required for execution at the inference stage of commercially marketed AI-assisted solutions, even when the task must be completed in real time. With AI inference execution, nuances determining which type of processor might be the best choice dictate a more balanced perspective than in the case of the GPU-dominated deep learning processes. 

For example, it takes GPUs to train a recommendation system to analyze user behavior and tie it to available viewing options, but CPUs can take care of executing a recommendation based on a particular user’s interactions with content. But if the inference stage involves real-time execution of massive processing workloads, as happens in speech-to-text and language translation, visual object analysis and anything else that requires very high numbers of queries per second (QPS), GPU-enabled parallel processing is the way to go.

Taking all these factors into consideration, it’s clear that keeping the processing costs associated with AI in check requires an open-minded, thorough and highly informed approach to choosing processors and their cloud hosts. As IT solutions supplier HorizonIQ puts it in a recent blog, “By strategically combining the computational capabilities of CPUs and GPUs, businesses can maximize performance while minimizing costs.”

The Streaming Platform’s Role in AI Cost Containment

Apart from users’ choices of AI processing environments, there are steps, as illustrated by Red5, that providers of streaming platforms can take to play a direct role in AI-related cost containment. This starts with facilitating ready access to a broad selection of AI solutions.

In addition, the platform’s architecture should support the flexibility and functionality essential to getting the most out of AI applications, including those that are meant to support specific use cases running on the platform as well as those that can be applied to enhance performance of the platform itself. On all counts, decision makers will find that nothing rivals Red5’s XDN Architecture as the foundation for using AI applications with real-time multidirectional streaming. 

As described in this blog, Red5 has pre-integrated AI solutions offered by a wide range of partners for use in live production, streamed in-venue sports viewing experiences, multilingual closed captioning, public safety surveillance, interactive dispersed user engagement in live sports and esports, online betting, video games, ecommerce and telehealth, and much else. Leveraging its portfolio of open APIs, Red5 is continually adding new AI-solution integrations.

Watch a short video on this topic on YouTube by our co-founder and CEO, Chris Allen.

At the same time, Red5 has vastly expanded the usefulness of AI in live streaming by introducing, for the first time anywhere, a means by which A/V frames live streamed at any commonly used frames-per-second (fps) rate can be instantly extracted on the fly, not only from Red5 Cloud XDN streams but from conventional HTTP-based streams as well. This is a game changer for applications aided by LLMs or VLMs, which until now have not been executed to their full potential owing to the time consumed by pulling frames for AI processing.

Diagram illustrating real-time frame extraction and encoding for AI in live streaming using SRT.

Whether HTTP- or XDN-based streaming is involved, the Red5 Cloud frame-extraction service employs Red5’s unique cloud-hosted real-time transcoding process to deliver extracted frames in whatever resolutions and bitrate profiles users choose on the service portal. Within the Red5 Cloud domain, Red5 has taken an AI-agnostic approach to making the broadest range of applications available for real-time frame extraction. As a result, just about any relevant AI solution can be programmatically applied on the Red5 Cloud platform to execute tasks at warp speed.
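Red5 Cloud’s frame-extraction service is configured through its portal, so the sketch below is purely conceptual: it uses OpenCV as a stand-in to show the downstream pattern of sampling frames from a live source and handing each one to an AI model. The stream URL and analyze_frame() are hypothetical placeholders.

```python
import cv2  # pip install opencv-python

# Conceptual sketch only; Red5's actual extraction happens server-side.
STREAM_URL = "rtsp://example.com/live/stream"  # placeholder source
SAMPLE_EVERY_N_FRAMES = 30                     # ~1 frame/sec at 30fps

def analyze_frame(frame) -> None:
    """Placeholder for a VLM or object-detection call on the extracted frame."""
    print("frame shape:", frame.shape)

cap = cv2.VideoCapture(STREAM_URL)
frame_index = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if frame_index % SAMPLE_EVERY_N_FRAMES == 0:
        analyze_frame(frame)
    frame_index += 1
cap.release()
```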

The possibilities are limitless. Some AI-related applications now enabled by real-time A/V frame extraction relate to surveillance or other scenarios where there’s a need to detect specific objects or faces, variations in street and highway traffic patterns, fires and other emergency-related developments, criminal activity, and other types of developments in the live stream flow, including unwanted audio or video elements such as swear words or indecent exposure.

Other applications have to do with pulling and formatting high-quality still images to capture key moments in sports competition, generate promotional thumbnails, or pinpoint defective parts in factory production, to name just some types of use cases that can benefit in ways far beyond what once could only be done by extracting screenshots from stored files. 

Conclusion

There will be much more to come as we expand our partnerships and devise new ways to put AI to use with XDN Architecture. Now and into the foreseeable future Red5 will be at the forefront providing the means by which AI can be applied to maximum effect at the lowest possible costs.

And with the savings resulting from low-latency use of software running on CPUs, Red5 will remain the go-to source for real-time streaming that minimizes the need for hardware-accelerated processing at all points end to end.

Addendum: XDN Overview

Whether real-time streaming is implemented through the Red5 Cloud service or via Red5 Pro, XDN infrastructure runs on commodity servers in locations that can be dynamically configured by the XDN Stream Manager to seamlessly operate in cloud clusters as Origin, Relay and Edge Nodes. One or more Origin Nodes in a cluster serve to ingest and stream encoded content out to Relay Nodes, each of which serves an array of Edge Nodes that deliver live unicast streams to end points in their assigned service areas.

Origin Node placements can be optimized to accommodate ingestion of massive volumes of streams at minimum latency in interactive scenarios serving all end users through co-locations with XDN Edge Nodes. By leveraging both containerized and virtual machine-based iterations of datacenter virtualization, the XDN platform enables the flexibility and speed of resource utilization that is essential to unlimited scalability and fail-safe redundancy.

The multidirectional flexibility of this architecture can be applied so that whenever anyone at any moment of a streamed session chooses to generate a video, that user’s stream will be conveyed along with any marketing enhancements to everyone else. Whatever the use case might be, it doesn’t matter whether just a few, thousands or even millions of users are engaged or where they are.

Regardless of usage scale or transmission distance, persistent performance across all our customers’ XDN applications confirms the latency incurred from live production output through ingest at Origin Nodes and transport through Relay and Edge Nodes to rendering on end-user devices is no greater than 250ms – hence our use of the term “real-time streaming,” which applies to any instance where, as in digital voice communications, the lag time is imperceptible to users.

Latencies can drop as low as 50ms when end-to-end distances are limited to in-region footprints. Lower latencies in the case of 5G network streaming are also attained when XDN access and egress distances are reduced with connectivity to cell sites that have been equipped by mobile carriers affiliated with the Amazon Web Services (AWS) Wavelength program to support direct on-ramps to the AWS cloud. Red5 is the only real-time streaming supplier authorized for pre-integration with AWS Wavelength Zones, which allows customers’ 5G stream flows to bypass multiple internet hops in their connections to XDN Nodes hosted by AWS.

Wherever XDN infrastructure is deployed, cluster-wide redundancy essential to fail-safe operations is enabled by the Stream Manager’s autoscaling mechanism through platform controllers designed to work with cloud providers’ APIs. With comprehensive performance monitoring, the Stream Manager executes the load balancing essential to persistent high performance across the entire infrastructure without manual intervention. And in the event of a malfunctioning node component, it can instantly shift processing to another appliance in that node.

Currently, WebRTC is the most used XDN streaming mode owing to the client-side support provided by all the major browsers, including Chrome, Edge, Firefox, Safari and Opera, which eliminates the need for plug-ins or purpose-built hardware. Alternatively, if a mobile device with built-in client support for RTSP is receiving the stream, the platform transmits via RTSP. The client-optimized flexibility of XDN architecture also extends to packaging ingested RTMP, MPEG-TS and SRT encapsulations for transport over RTP when devices compatible with these protocols can’t be reached via WebRTC or RTSP. 
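The fallback order described above can be summarized in a few lines. This is an illustrative sketch of the selection logic, not an actual Red5 SDK interface:

```python
# Illustrative client-transport fallback, following the order described above.
def choose_transport(client: dict) -> str:
    if client.get("webrtc"):
        return "WebRTC"
    if client.get("rtsp"):
        return "RTSP"
    return "RTP (repackaged RTMP/MPEG-TS/SRT ingest)"

print(choose_transport({"webrtc": True}))  # browsers
print(choose_transport({"rtsp": True}))    # mobile devices with built-in RTSP clients
print(choose_transport({}))                # everything else
```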

It’s also important to note that, over time, there’s a good chance the emerging IETF MOQ standard will become the dominant real-time streaming protocol. As discussed in this blog, Red5 is taking steps toward incorporating MOQ into the XDN Architecture for customers’ use once the standard is finalized, which is expected to occur sometime in 2026. 

The basic difference between working in the Red5 Cloud and Red5 Pro environments has to do with how the XDN infrastructure is implemented and managed over time. The full end-to-end real-time multidirectional streaming supported by Red5 Cloud is automatically implemented in accord with user requirements on the global Oracle Cloud Infrastructure (OCI), which spans 50 geographic regions on six continents. 

In response to customer input setting geographical reach, targeted user counts and other basic parameters on their service portals, the Red5 Cloud service instantly activates OCI-hosted resources for implementations precisely tuned to their needs. The service includes sustained managed support for maintenance, changes to original parameters and other needs through the entire engagement life cycle.

Each Red5 Cloud instantiation of a customer’s XDN infrastructure and its subsequent modifications remain dedicated exclusively to that customer’s use in perfect alignment with the use case requirements. This is a major departure from the shared usage platforms operated by other suppliers of WebRTC cloud services, where pre-formatted use-case applications are offered on a take-it-or-leave-it basis.

Customers choosing to pursue the Red5 Pro DevOps approach can mount XDN infrastructure in public or private clouds utilizing a comprehensive portfolio of Red5 Pro SDKs and open APIs with recourse to assistance from Red5 personnel. Public cloud XDN infrastructures built with Red5 Pro can operate seamlessly with no loss of latency in cross-cloud scenarios involving the leading cloud providers, including AWS, Google Cloud, Microsoft Azure, OCI and others that have been pre-integrated with the platform, as well as many more that can be integrated for XDN use with the aid of the Terraform open-source multi-cloud toolset.  

Whatever approach customers take to deploying their XDN infrastructures, XDN Architecture accords them unparalleled freedom to respond with speed and precision to new opportunities. The architecture’s reliance on open-source technology and APIs together with the availability of application-specific TrueTime tools bundled with native iOS, macOS, Windows, Android, Linux, and HTML5 SDKs provides the flexibility they need to tailor applications as they see fit.

Red5 uses its open-source APIs to create an ever-expanding ecosystem of partners whose solutions are pre-integrated into XDN Architecture to deliver best-of-breed solutions individually or in whatever combinations work to reduce time to market as well as the time it takes to introduce ongoing service enhancements. The applications range across support for cloud computing, backend transport, asset management, storage, transcoding, packaging, content protection, conventional CDN tie-ins, and a host of value-added features developed by Red5 and its partners to capitalize on these capabilities.

Try Red5 For Free

🔥 Looking for a fully managed, globally distributed streaming PaaS solution? Start using Red5 Cloud today! No credit card required. Free 50 GB of streaming each month.

Looking for server software designed for ultra-low latency streaming at scale? Start a Red5 Pro 30-day trial today!

Not sure which solution would best solve your streaming challenges? Watch a short YouTube video explaining the difference between the two solutions, or reach out to our team to discuss your case.


By Red5 Team

The Red5 Team brings together software, DevOps, and quality assurance engineers, project managers, support experts, sales managers, and marketers with deep experience in live video, audio, and data streaming. Since 2005, the team has built solutions used by startups, global enterprises, and developers worldwide to power interactive real-time experiences. Beyond core streaming technology, the Red5 Team shares insights on industry trends, best practices, and product updates to help organizations innovate and scale with confidence.