Virtual Backgrounds using the Red5 WebRTC SDK


While online meetings have become an unavoidable part of our daily lives, what can be avoided is having others in the meeting notice how often – or infrequently – you keep your workspace tidy. That is where a good background replacement strategy comes into play.

While the Red5 WebRTC SDK does not have any APIs specifically for virtual backgrounds, it is certainly possible to utilize the SDK along with Insertable Streams and the help of 3rd party libraries to deliver streams with virtual backgrounds to the Red5 Cloud Server. 

It’s important to note that we refer to Red5 Cloud throughout this blog as the server infrastructure, and that’s because it’s the easiest solution to get up and running. However, the approach we outline below works just as well with Red5 Pro. So if you have an on-prem deployment requirement, or want to run it on your own AWS, OCI, Azure, GCP, etc. account, this solution will work with Red5 Pro as well. 

Simple Approach

If you know your customers’ systems support media features for applying filters or effects, the quickest solution is to show them how to use those features – such as background blur – if that meets your application requirements.

These effects are applied to the media directly – and across multiple programs – before being delivered to the Red5 Cloud Server, and are much more computationally efficient than the browser-based approaches described in the following sections.
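If you would rather detect whether the browser itself exposes such an effect, one hedged option – relying on the experimental backgroundBlur constraint from the Media Capture extensions, which not every browser or platform implements – is to inspect the supported constraints first:

// A minimal sketch, assuming the experimental `backgroundBlur` constraint
// is exposed; if the key is missing, fall back to instructing users to use
// their system camera effects or to the canvas-based approach below.
const supportedConstraints = navigator.mediaDevices.getSupportedConstraints()
if (supportedConstraints.backgroundBlur) {
	console.log('Platform-level background blur can be requested via getUserMedia constraints.')
} else {
	console.log('No platform-level blur exposed; use system camera effects or canvas compositing.')
}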

In the screenshot above, a background blur effect is applied to the camera feed on a MacBook Pro with an M1 chip.

If your application needs to deliver the same experience to all customers across platforms and devices, however, a background replacement approach will have to go beyond simply instructing them how to apply media effects when accessing the camera on their system.

That is where Machine Learning (ML) and the MediaPipe Selfie Segmentation solution come into play. The solution segments out human figures within 2 meters of the camera and provides a mask which can be used – along with canvas drawing routines – to show those figures over a custom background color, image, or even a blurred background.

Additionally, Insertable Streams can be utilized to process these frames for live video streaming using our Red5 WebRTC SDK.

Initial Approach for Background Replacement

Before we get into integrating with the Red5 WebRTC SDK to send video with background replacement, let’s first look at the solutions available using an OffscreenCanvas, whose drawing routines are applied to incoming VideoFrame objects as they are transformed through a processor.

The first step is to include the MediaPipe Selfie Segmentation library in our web application:

<script src="https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/selfie_segmentation.js" crossorigin="anonymous"></script>
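The drawing routines below target a 2D context obtained from an OffscreenCanvas sized to the capture resolution, as in the full listing further down:

// Offscreen drawing surface used by the segmentation result handler;
// 640x360 matches the getUserMedia capture resolution used later on.
const canvas = new OffscreenCanvas(640, 360)
const ctx = canvas.getContext('2d')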

With the library loaded, a SelfieSegmentation class is accessible from the window global, which can then be assigned a result handler to process the resulting mask:

const onResults = (results) => {
	ctx.save()
	ctx.clearRect(0, 0, canvas.width, canvas.height)
	ctx.drawImage(results.segmentationMask, 0, 0, canvas.width, canvas.height)
	ctx.restore()
}

const selfieSegmentation = new SelfieSegmentation({
	locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
})
selfieSegmentation.setOptions({
	modelSelection: 1,
})
selfieSegmentation.onResults(onResults)

With this simple handler, you can see the segmentation mask rendered (in red) with all other pixels transparent (showing the default white of the page background):

If we simply want to draw the pixels of the segmented person and let whatever color or image is in the background of the page show through, we could add the following to the canvas draw routine, after drawing the mask:

ctx.globalCompositeOperation = 'source-in'
ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height)

The source-in composite operation will draw the pixels from the original image wherever they overlap the mask.
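If instead of transparency you just want a solid background color behind the person at this point, one simple option (a sketch that is not in the original handler) is a destination-over fill, which only paints where the canvas is still transparent:

// Paint a solid color behind the person drawn above with 'source-in';
// destination-over only affects the pixels that are still transparent.
ctx.globalCompositeOperation = 'destination-over'
ctx.fillStyle = 'black'
ctx.fillRect(0, 0, canvas.width, canvas.height)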

Background Image

We could be on our way with this solution, but what if we wanted to provide a background image for the live stream?

By changing the composite operations we apply, we can define an image as a background pattern and overlay our segmentation:

ctx.globalCompositeOperation = 'source-out'
const pat = ctx.createPattern(backgroundImage, 'no-repeat')
ctx.fillStyle = pat
ctx.fillRect(0, 0, canvas.width, canvas.height)

ctx.globalCompositeOperation = 'destination-atop'
ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height)

With the source-out operation, the image is drawn in the area the mask does not cover; the destination-atop operation then draws the pixels from the original image where they overlap the mask:
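For reference, backgroundImage in the snippet above is an ordinary Image loaded up front (as in the full listing below); setting crossOrigin keeps the canvas from being tainted when the image is served from another origin:

// Background image used by the pattern fill; crossOrigin avoids tainting
// the canvas when the image comes from a different origin.
const backgroundImage = new Image(640, 360)
backgroundImage.crossOrigin = 'anonymous'
backgroundImage.src = 'assets/images/bbb.png'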

Blurring the Background

Taking a similar approach to the custom background image, if all we want is to blur the original background, we can change the order of composite operations on the original image and apply a blur filter:

ctx.globalCompositeOperation = 'source-in'
ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height)

ctx.filter = 'blur(15px)'
ctx.globalCompositeOperation = 'destination-atop'
ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height)

Insertable Streams

Up to this point in the article, we have been exploring virtual background techniques using drawing routines of the canvas and the segmentation results from MediaPipe Selfie Segmentation. But how are these results generated, and how is the end result fed to a live stream being delivered to the Red5 Cloud server?

The solution is to use Insertable Streams to process the video track and pipe the modified transformations to the outgoing stream.

We’ll first grab a MediaStream and set up our MediaStreamTrackProcessor and MediaStreamTrackGenerator:


const media = await navigator.mediaDevices.getUserMedia({
	audio: true,
	video: {
		width: 640,
		height: 360,
	},
})
const audioTrack = media.getAudioTracks()[0]
const videoTrack = media.getVideoTracks()[0]

const trackProcessor = new MediaStreamTrackProcessor({ track: videoTrack })
const trackGenerator = new MediaStreamTrackGenerator({ kind: 'video' })

These will be used to pipe the video through a transform function which sends each frame to the SelfieSegmentation instance and sets the modified canvas as the next VideoFrame on the outgoing generated stream:

const transformer = new TransformStream({
	async transform(videoFrame, controller) {
		const { displayWidth, displayHeight, timestamp } = videoFrame
		videoFrame.width = displayWidth
		videoFrame.height = displayHeight
		await selfieSegmentation.send({ image: videoFrame })

		const frame = new VideoFrame(canvas, { timestamp })
		videoFrame.close()

		controller.enqueue(frame)
	},
})
trackProcessor.readable.pipeThrough(transformer).pipeTo(trackGenerator.writable)

While the video track is running, each frame is sent to the SelfieSegmentation instance, which provides the segmentation mask to the onResults handler where we apply our drawing routines for the virtual background. The canvas is then captured as the next VideoFrame written to the generator.

The result can then be assembled into a new MediaStream to use as the stream source for a Red5 Pro publisher (in this case, a WHIPClient):

const stream = new MediaStream([trackGenerator, audioTrack])

publisher = new WHIPClient()
publisher.on('*', onPublisherEvent)
await publisher.initWithStream(config, stream)
await publisher.publish()

That ensures that all our desired virtual background transformations are applied to the stream before it is sent to the Red5 Cloud server.
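When the broadcast ends, it is also worth tearing things down so the capture hardware is released – a minimal sketch, assuming WHIPClient exposes the same unpublish() method as the SDK’s other publishers:

// Stop the broadcast and release the camera and microphone.
// Assumes WHIPClient provides unpublish(), like the SDK's RTCPublisher.
const stop = async () => {
	if (publisher) {
		await publisher.unpublish()
	}
	// Stopping the source tracks ends the processor pipeline as well.
	videoTrack.stop()
	audioTrack.stop()
}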

Full Code

Below is the full code used in this example.

HTML

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Insertable Streams</title>
    <!-- Polyfills for non-standard Insertable Streams APIs -->
    <script src="https://jan-ivar.github.io/polyfills/mediastreamtrackprocessor.js"></script>
    <script src="https://jan-ivar.github.io/polyfills/mediastreamtrackgenerator.js"></script>
    <!-- Red5 -->
    <script src="https://unpkg.com/red5pro-webrtc-sdk@latest/red5pro-sdk.min.js"></script>
    <!-- MediaPipe -->
    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/selfie_segmentation.js" crossorigin="anonymous"></script>
    <link rel="stylesheet" href="style/main.css" />
  </head>
  <body>
    <div id="app">
      <!-- User facing video -->
      <video id="red5pro-publisher" autoplay playsinline muted></video>
    </div>
    <script type="module" src="src/main.js"></script>
  </body>
</html>

JavaScript

/* global red5prosdk, SelfieSegmentation */

red5prosdk.setLogLevel('debug')

const { WHIPClient } = red5prosdk
const searchParams = new URLSearchParams(window.location.search)

let publisher
const publisherVideo = document.querySelector('#red5pro-publisher')

const canvas = new OffscreenCanvas(640, 360)
const canvasWidth = canvas.width
const canvasHeight = canvas.height
const ctx = canvas.getContext('2d')

const backgroundImage = new Image(640, 360)
backgroundImage.crossOrigin = 'anonymous'
backgroundImage.src = 'assets/images/bbb.png'

const config = {
	host: searchParams.get('host') || window.location.hostname,
	streamName: searchParams.get('stream_name') || 'stream1',
}

const onPublisherEvent = (event) => {
	console.log('Publisher Event:', event.type)
}

const onResults = (results) => {
	ctx.save()
	ctx.clearRect(0, 0, canvasWidth, canvasHeight)
	ctx.drawImage(results.segmentationMask, 0, 0, canvasWidth, canvasHeight)

	// // Option 1: Background Color
	// ctx.globalCompositeOperation = 'source-out'
	// ctx.fillStyle = 'black'
	// ctx.fillRect(0, 0, canvasWidth, canvasHeight)
	// // ~ background color

	// // Option 2: Background Image
	// ctx.globalCompositeOperation = 'source-out'
	// const pat = ctx.createPattern(backgroundImage, 'no-repeat')
	// ctx.fillStyle = pat
	// ctx.fillRect(0, 0, canvasWidth, canvasHeight)

	// ctx.globalCompositeOperation = 'destination-atop'
	// ctx.drawImage(results.image, 0, 0, canvasWidth, canvasHeight)
	// // ~ background image

	// Option 3: Blurred Background
	ctx.globalCompositeOperation = 'source-in'
	ctx.drawImage(results.image, 0, 0, canvasWidth, canvasHeight)

	ctx.filter = 'blur(15px)'
	ctx.globalCompositeOperation = 'destination-atop'
	ctx.drawImage(results.image, 0, 0, canvasWidth, canvasHeight)
	// ~ blurred background

	ctx.restore()
}

const start = async () => {
	try {
		const selfieSegmentation = new SelfieSegmentation({
			locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
		})
		selfieSegmentation.setOptions({
			modelSelection: 1,
		})
		selfieSegmentation.onResults(onResults)

		const media = await navigator.mediaDevices.getUserMedia({
			audio: true,
			video: {
				width: 640,
				height: 360,
			},
		})
		const audioTrack = media.getAudioTracks()[0]
		const videoTrack = media.getVideoTracks()[0]

		const trackProcessor = new MediaStreamTrackProcessor({ track: videoTrack })
		const trackGenerator = new MediaStreamTrackGenerator({ kind: 'video' })
		const transformer = new TransformStream({
			async transform(videoFrame, controller) {
				const { displayWidth, displayHeight, timestamp } = videoFrame
				videoFrame.width = displayWidth
				videoFrame.height = displayHeight
				await selfieSegmentation.send({ image: videoFrame })

				const frame = new VideoFrame(canvas, { timestamp })

				videoFrame.close()
				controller.enqueue(frame)
			},
		})
		trackProcessor.readable.pipeThrough(transformer).pipeTo(trackGenerator.writable)

		// Assemble stream
		const stream = new MediaStream([trackGenerator, audioTrack])
		publisherVideo.srcObject = stream

		// Red5 broadcast
		publisher = new WHIPClient()
		publisher.on('*', onPublisherEvent)
		await publisher.initWithStream(config, stream)
		await publisher.publish()
	} catch (e) {
		console.error(e)
		alert('An error occurred. See the developer console for more information.')
	}
}

start()

It has also been added as a testbed example in our publicly available repo on GitHub at:

https://github.com/red5pro/streaming-html5/tree/feature/virtual_background/src/page/test/publishVirtualBackground

Conclusion

Using the Red5 WebRTC SDK along with MediaPipe Selfie Segmentation allows you to easily provide virtual background support for your users.

With this basic introduction to integrating Insertable Streams for outgoing streams to a Red5 Cloud server, you can start developing other transformations – through your own custom Worker – to be applied not only to video but also to the audio being delivered, as sketched below.
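For example, the same processor and generator endpoints can be handed off to a dedicated Worker using transferable streams, keeping the per-frame work off the main thread – a minimal sketch, with transform-worker.js being a hypothetical file name:

// Main thread: transfer the Insertable Streams endpoints to a Worker
// (ReadableStream and WritableStream are transferable where these APIs
// are natively supported).
const worker = new Worker('transform-worker.js')
worker.postMessage(
	{ readable: trackProcessor.readable, writable: trackGenerator.writable },
	[trackProcessor.readable, trackGenerator.writable]
)

// transform-worker.js (hypothetical): run a custom transform per frame.
self.onmessage = ({ data: { readable, writable } }) => {
	const transformer = new TransformStream({
		async transform(videoFrame, controller) {
			// ...apply custom video (or, for an audio track, audio) processing here...
			controller.enqueue(videoFrame)
		},
	})
	readable.pipeThrough(transformer).pipeTo(writable)
}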

Support

Some parts of Insertable Streams are considered experimental and as such may not be available in all browsers – in particular, MediaStreamTrackProcessor and MediaStreamTrackGenerator. Thankfully, there are some polyfills generously provided by Jan-Ivar Bruaroey:

https://github.com/jan-ivar/polyfills

So, while such APIs are available in Chrome and Edge, you will need to include the polyfills for use in Firefox and Safari.
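A quick runtime check makes it easy to confirm that either the native APIs or the polyfills are in place before starting the broadcast:

// Warn early if neither the native APIs nor the polyfills are available.
if (!('MediaStreamTrackProcessor' in window) || !('MediaStreamTrackGenerator' in window)) {
	console.warn('Insertable Streams are not supported here; include the polyfills above.')
}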