October 28, 2020 by Red5 Team

Last updated May 21, 2026

How Google Meet Implements Audio using Mix-Minus with WebRTC

Google Meet is a widely used video conferencing solution. Since we here at Red5 are quite interested in all things live streaming, our System Architect, Davide, decided to take a look under the covers to see how it works. Specifically, he wanted to explore how Meet handles the audio channels. To be clear, Davide’s approach was a pure black-box reverse-engineering exercise. He didn’t have access to Google’s backend or source code, nor did he decompile anything. He used tools like Chrome’s WebRTC internals (chrome://webrtc-internals) to observe how the system behaves, and reasoned from those observations about how it might be working.

TLDR – It appears that Google Meet uses an approach throughout the call where it sends only the three loudest participants using three separate WebRTC audio tracks (A, B, and C).

What is mix-minus?

Mix-minus is an audio engineering term for a feed that contains the full mix minus one input: the listener’s own audio. Sending a feed over the internet introduces a slight delay between when something is said and when it is heard. If the speaker’s own audio is mixed in with all the other feeds and sent back, it arrives delayed and can be picked up again by that same speaker’s microphone. At worst, this causes an ear-splitting feedback loop; at best, it causes a very distracting echo.

The process of mix-minus subtracts the current speaker’s audio input from the complete audio feed they receive. This is also known as a “clean feed”. Essentially, it is a method for ensuring that no one on the call gets painful or annoying echoes. While there are more straightforward applications of mix-minus, such as a telephone caller dialing in to a radio station, implementing mix-minus for a conference call is a little more difficult because of all the extra inputs from the other callers.
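To make the idea concrete, here is a minimal Python sketch of a classic mix-minus. It assumes each participant contributes one frame of samples as a plain list of floats; the function name `mix_minus` and the participant labels are ours, purely for illustration, and a real mixer would also deal with clipping, resampling, and timing.

```python
from typing import Dict, List


def mix_minus(inputs: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return, for each participant, the sum of everyone else's samples."""
    if not inputs:
        return {}
    frame_len = len(next(iter(inputs.values())))

    # Full mix: sample-wise sum of every input.
    full_mix = [0.0] * frame_len
    for samples in inputs.values():
        for i, sample in enumerate(samples):
            full_mix[i] += sample

    # Clean feed: the full mix minus the participant's own contribution.
    return {
        name: [m - s for m, s in zip(full_mix, samples)]
        for name, samples in inputs.items()
    }


# Illustrative frames (two samples each) for three hypothetical participants.
frames = {
    "host": [1.0, 2.0],
    "caller": [10.0, 20.0],
    "guest": [100.0, 200.0],
}
feeds = mix_minus(frames)
print(feeds["caller"])  # [101.0, 202.0]: host + guest, without the caller
```

Computing the full mix once and subtracting each input keeps the work linear in the number of participants, which is one common way to build clean feeds.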

How does Google Meet implement mix-minus with WebRTC?

The test involved creating a conference with seven participants in Google Chrome, each in a separate tab. Davide then monitored the sessions with Chrome WebRTC internals. All the participants were initially muted, and then one participant was unmuted and instructed to talk. The same test was repeated with the other participants speaking one at a time. WebRTC internals showed that each participant was mapped to only one of the three audio tracks: whenever a participant was the only one talking, only their corresponding audio track was receiving data.

Next, two participants that had previously been mapped individually to the same audio track were unmuted. For example, Participant 1 spoke alone and was mapped to track A, then Participant 2 spoke alone and was also mapped to track A. When both were unmuted and spoke at the same time, two tracks (A and B) received data simultaneously. Afterwards, each of the two spoke individually again: when Participant 1 talked alone, data appeared only on track A, while for Participant 2 it appeared only on track B. In other words, the Meet platform had remapped one of the participants to a different track.

Repeating the same test with three participants that were initially mapped to the same track produced the same result, with the three participants being remapped to different audio tracks. Additional testing revealed that when two participants are mapped to different tracks, they remain on different tracks the whole time. This suggests that although multiple participants can be mapped to the same audio track, the track itself can only transport the audio of a single participant at a time. When two participants try to use the same audio track, one is remapped to a different track. That further suggests that Google Meet does not send a mixed stream to the participants.
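The behaviour observed above can be imitated with a very small allocator. To be clear, this is only a guess at the kind of logic that would produce the same observations, not Google's implementation: the class `TrackAllocator`, the "sticky" last-track memory, and the rule that new speakers default to track A are all assumptions of ours.

```python
from typing import Dict, List, Optional

TRACKS = ("A", "B", "C")  # labels used in this article, not real track IDs


class TrackAllocator:
    """Hypothetical allocator: one speaker per track, sticky assignments."""

    def __init__(self) -> None:
        self.last_track: Dict[str, str] = {}  # speaker -> track used last time
        self.in_use: Dict[str, str] = {}      # track -> speaker currently on it

    def start_talking(self, speaker: str) -> Optional[str]:
        # A speaker who is already talking keeps their current track.
        for track, owner in self.in_use.items():
            if owner == speaker:
                return track
        # Assumption: speakers with no history default to the first track.
        preferred = self.last_track.get(speaker, TRACKS[0])
        for track in [preferred] + [t for t in TRACKS if t != preferred]:
            if track not in self.in_use:
                self.in_use[track] = speaker
                self.last_track[speaker] = track  # remember any remapping
                return track
        return None  # all three tracks are busy

    def stop_talking(self, speaker: str) -> None:
        self.in_use = {t: s for t, s in self.in_use.items() if s != speaker}


def talk(alloc: TrackAllocator, speakers: List[str]) -> List[Optional[str]]:
    """Have the given speakers talk at the same time, then all stop."""
    tracks = [alloc.start_talking(s) for s in speakers]
    for s in speakers:
        alloc.stop_talking(s)
    return tracks


alloc = TrackAllocator()
solo_before = talk(alloc, ["P1"]) + talk(alloc, ["P2"])  # ['A', 'A']
together = talk(alloc, ["P1", "P2"])                     # ['A', 'B']
solo_after = talk(alloc, ["P1"]) + talk(alloc, ["P2"])   # ['A', 'B']
print(solo_before, together, solo_after)
```

In this sketch, Participants 1 and 2 share track A until they collide, after which Participant 2 stays on track B, which matches what WebRTC internals showed in the test.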

The three audio tracks (A, B, and C) seem to be shared between participants even though each may see them under a different name. In fact, after running a few tests that forced remappings, it was apparent that once one participant’s tracks were renamed to match, the remappings looked identical to those seen by another participant.

Figure 1 shows a high-level diagram of the architecture that Google Meet may be using. The set of presenters’ audio streams is fed to a Detector that determines the four loudest tracks, along with their levels, and feeds their packets to an Audio Engine. A presenter who subscribes to the conference receives the three loudest tracks from the Audio Engine. If one of these tracks belongs to that presenter, it is swapped with the remaining (fourth-loudest) track. In this way, each presenter gets a mix-minus audio stream.

As an example, if there are six presenters, presenters 1, 2, 3, and 4 are talking, and the “loudest” three are 1, 2, and 3, the presenters would get these streams:

Presenter 1: 4, 2, 3
Presenter 2: 1, 4, 3
Presenter 3: 1, 2, 4
Presenter 4: 1, 2, 3
Presenter 5: 1, 2, 3
Presenter 6: 1, 2, 3
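The selection rule behind the list above can be sketched in a few lines of Python. The function name `select_tracks`, the numeric levels, and the "level above zero means talking" threshold are illustrative assumptions; the article only infers that a Detector ranks tracks by loudness and does not know how Meet measures it.

```python
from typing import Dict, List


def select_tracks(levels: Dict[int, float]) -> Dict[int, List[int]]:
    """For each presenter, pick which presenters' audio they would receive."""
    # Rank talking presenters by level, loudest first, and keep the top four.
    talking = [p for p, level in levels.items() if level > 0.0]
    ranked = sorted(talking, key=lambda p: levels[p], reverse=True)[:4]
    top3, spare = ranked[:3], ranked[3:]

    result: Dict[int, List[int]] = {}
    for presenter in levels:
        if presenter in top3 and spare:
            # Swap the presenter's own track for the fourth-loudest one.
            result[presenter] = [spare[0] if s == presenter else s for s in top3]
        else:
            # Not in the top three, or no spare track: just drop own audio.
            result[presenter] = [s for s in top3 if s != presenter]
    return result


# Made-up levels: presenters 1-4 are talking, 1-3 are the loudest.
levels = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.4, 5: 0.0, 6: 0.0}
selection = select_tracks(levels)
for presenter, sources in selection.items():
    print(f"Presenter {presenter}: {sources}")
```

Run against these levels, the sketch reproduces the list above, including the swap of each top-three presenter's own track for presenter 4.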

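Alternatively, instead of sending three separate audio tracks to each presenter, the server could send a single mixed track. Only four distinct mixes would need to be generated on the server side (one for each of the three loudest presenters with their own audio left out, plus one for everyone else), and each presenter would receive the one that applies to them.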
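As a rough sketch of that single-track alternative, the Python below builds the four candidate mixes from the four loudest presenters and picks the right one per listener. Nothing here is confirmed Meet behaviour: `build_mixes`, `pick_mix`, and the one-sample "frames" are hypothetical and exist only to show that four mixes are enough regardless of how many presenters are listening.

```python
from typing import Dict, List, Optional


def mix(frames: List[List[float]]) -> List[float]:
    """Sample-wise sum of equally sized frames."""
    return [sum(samples) for samples in zip(*frames)]


def build_mixes(
    top4: List[int], audio: Dict[int, List[float]]
) -> Dict[Optional[int], List[float]]:
    """Build one mix per top-three presenter plus a default mix (key None)."""
    top3 = top4[:3]
    mixes: Dict[Optional[int], List[float]] = {
        # Default mix for everyone outside the top three: the three loudest.
        None: mix([audio[p] for p in top3]),
    }
    for presenter in top3:
        # Mix-minus for a top-three presenter: the other loudest tracks.
        mixes[presenter] = mix([audio[p] for p in top4 if p != presenter])
    return mixes


def pick_mix(
    presenter: int, mixes: Dict[Optional[int], List[float]]
) -> List[float]:
    """Return the presenter's own mix-minus, or the default mix."""
    return mixes.get(presenter, mixes[None])


# One-sample frames chosen so each sum reveals which presenters are in it.
audio = {1: [1.0], 2: [10.0], 3: [100.0], 4: [1000.0]}
mixes = build_mixes([1, 2, 3, 4], audio)
print(pick_mix(1, mixes))  # [1110.0]: presenters 2, 3 and 4
print(pick_mix(5, mixes))  # [111.0]: presenters 1, 2 and 3
```

The trade-off is that the server has to decode and mix audio, whereas forwarding three separate tracks leaves the mixing to each client.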

Red5 Team

The Red5 Team brings together software, DevOps, and quality assurance engineers, project managers, support experts, sales managers, and marketers with deep experience in live video, audio, and data streaming. Since 2005, the team has built solutions used by startups, global enterprises, and developers worldwide to power interactive real-time experiences. Beyond core streaming technology, the Red5 Team shares insights on industry trends, best practices, and product updates to help organizations innovate and scale with confidence.
