Tuesday, September 29, 2020

Building GStreamer on Windows the Correct Way

For the past 4 years, Tim and I have spent thousands of hours on better Windows support for GStreamer. It started in May 2016, when I first wrote about this, and continued with the first draft of the work before it was revised, updated, and upstreamed.

Since then, we've worked tirelessly to improve Windows support in GStreamer with patches to many projects such as the Meson build system, GStreamer's Cerbero meta-build system, and writing build files for several non-GStreamer projects such as x264, openh264, ffmpeg, zlib, bzip2, libffi, glib, fontconfig, freetype, fribidi, harfbuzz, cairo, pango, gtk, libsrtp, opus, and many more that I've forgotten.

More recently, Seungha has also been working on new GStreamer elements for Windows such as d3d11, mediafoundation, wasapi2, etc. Sometimes we're able to find someone to sponsor all this work, but most of the time it's on our own dime.

Most of this has been happening in the background, noticed only by people who follow GStreamer development. I think more people should know about the work that's been happening upstream, and the official and supported ways to build GStreamer on Windows. Searching for this on Google can be a very confusing experience, with the top results being outdated links or just plain clickbait.

So here's an overview of your options when you want to use GStreamer on Windows:

Installing GStreamer on Windows

GStreamer has released MinGW binary installers for Windows since the early 1.0 days using the Cerbero meta-build system which was created by Andoni for the non-upstream "GStreamer SDK" project, which was based on GStreamer 0.10.
Today it supports building GStreamer with both MinGW and Visual Studio, and even supports outputting UWP packages. So you can actually go and download all of those from the download page:

This is the easiest way to get started with GStreamer on Windows.

Building GStreamer yourself for Deployment

If you need to build GStreamer with a custom configuration for deployment, the easiest option is to use Cerbero, which is a meta-build system. It will download all the dependencies for you (including most of the build-tools), build them with Autotools, CMake, or Meson (as appropriate), and output a neat little MSI installer.
The README contains all the information you need, including screenshots for how to set things up:

As of a few days ago, after months of work, the native Cerbero Windows builds have also been integrated into our Continuous Integration pipeline that runs on every merge request, which further improves the quality of our Windows support. We already had native Windows CI using gst-build, but this increases our coverage.

Contributing to GStreamer on Windows

If you want to contribute to GStreamer from Windows, the best option is to use gst-build (created by Thibault), which is basically a meson 'wrapper' project that has all the gstreamer repositories aggregated as subprojects. Once again, the README file is pretty easy to follow and has screenshots for how to set things up:

This is also the method used by all GStreamer developers to hack on gstreamer on all platforms, so it should work pretty well out of the box, and it's tested on the CI. If it doesn't work, come poke us on #gstreamer on FreeNode IRC or on the gstreamer mailing list.

It's All Upstream.

You don't need any special steps, and you don't need to read complicated blog posts to build GStreamer on Windows. Everything is upstream.

This post previously contained examples of such articles and posts that are spreading misinformation, but I have removed those paragraphs after discussion with the people who were responsible for them, and to keep this post simple. All I can hope is that it doesn't happen again.

Monday, August 31, 2020

GStreamer 1.18 supports the Universal Windows Platform

tl;dr: The GStreamer 1.18 release ships with UWP support out of the box, with official GStreamer binary releases for it. Try out the 1.18.0 release and let us know how it goes! There's also an example gstreamer app for UWP that showcases OpenGL support (via ANGLE), audio/video capture, hardware codecs, and WebRTC.

Short History Lesson

Last year at the GStreamer Conference in Lyon, I gave a talk (slides) about how “Firefox Reality” for the Microsoft HoloLens 2 mixed-reality headset is actually Servo, and it uses GStreamer for all media handling: WebAudio, HTML5 Video, and WebRTC.

I also spoke about the work we at Centricular did to port GStreamer to the HoloLens 2. The HoloLens 2 uses the new development target for Windows Store apps: the Universal Windows Platform. The majority of win32 APIs have been deprecated, and apps have to use the new Windows Runtime, which is a language-agnostic API written from the ground up.

So the majority of work went into making sure that Win32 code didn't use deprecated APIs (we used a bunch of them!), and making sure that we could build using the UWP toolchain. Most of that involved two components:
  • GLib, a cross-platform low-level library / abstraction layer used by GNOME (almost all our win32 code is in here)
  • Cerbero, the build aggregator used by GStreamer to build binaries for all platforms supported: Android, iOS, Linux, macOS, Windows (MSVC, MinGW, UWP)
The target was to port the core of GStreamer, and those plugins with external dependencies that were needed to do playback in <audio> and <video> tags. This meant that the only external plugin dependency we needed was FFmpeg, for the gst-libav plugin. All this went well, and Firefox Reality successfully shipped with that work.

Upstreaming and WebRTC

Building upon that work, for the past few months we've been working on adding support for the WebRTC plugin, and also upstreaming as much of the work as possible. This involved a bunch of pieces:
  1. Use only OpenSSL and not GnuTLS in Cerbero because OpenSSL supports targeting UWP. This also had the advantage of moving us from two SSL stacks to one.
  2. Port a bunch of external optional dependencies to Meson so that they could be built with Meson, which is the easiest way for a cross-platform project to support UWP. If your Meson project builds on Windows, it will build on UWP with minimal or no build changes.
  3. Rebase the GLib patches that I didn't find the time to upstream last year on top of 2.62, split into smaller pieces that will be easier to upstream, update for new Windows SDK changes, remove some of the hacks, and so on.
  4. Rework and rewrite the Cerbero patches I wrote last year that were in no shape to be upstreamed.
  5. Ensure that our OpenGL support continues to work using Servo's ANGLE UWP port.
  6. Write a new plugin for audio capture called wasapi2, great work by Seungha Yang.
  7. Write a new plugin for video capture called mfvideosrc as part of the media foundation plugin which is new in GStreamer 1.18, also by Seungha.
  8. Write a new example UWP app to test all this work, also done by Seungha! 😄
  9. Run the app through the Windows App Certification Kit.
And several miscellaneous tasks and bugfixes that we've lost count of.

Our highest priority this time around was making sure that everything can be upstreamed to GStreamer, and it was quite a success! Everything needed for WebRTC support on UWP has been merged, and you can use GStreamer in your UWP app by downloading the official GStreamer binaries starting with the 1.18 release.

On top of everything in the above list, thanks to Seungha, GStreamer on UWP now also supports:

Try it out!

The example gstreamer app I mentioned above showcases all this. Go check it out, and don't forget to read the README file!

Next Steps

The most important next step is to upstream as many of the GLib patches we worked on as possible, and then spend time porting a bunch of GLib APIs that we currently stub out when building for UWP.

Other than that, enabling gst-libav is also an interesting task since it will allow apps to use FFmpeg software codecs in their gstreamer UWP app. People should use the hardware-accelerated d3d11 decoders and mediafoundation encoders for optimal power consumption and performance, but sometimes that's not possible because codec support is very device-dependent.

Parting Thoughts

I'd like to thank Mozilla for sponsoring the bulk of this work. We at Centricular greatly value partners that understand the importance of working with upstream projects, and it has been excellent working with the Servo team members, particularly Josh Matthews, Alan Jeffrey, and Manish Goregaokar.

In the second week of August, Mozilla restructured and the Servo team was one of the teams that was dissolved. I wish them all the best in their future endeavors, and I can't wait to see what they work on next. They're all brilliant people.

Thanks to the forward-looking and community-focused approach of the Servo team, I am confident that the project will figure things out to forge its own way forward, and for the same reason, I expect that GStreamer's UWP support will continue to grow.

Sunday, April 21, 2019

GStreamer's Meson and Visual Studio Journey

Almost 3 years ago, I wrote about how we at Centricular had been working on an experimental port of GStreamer from Autotools to the Meson build system for faster builds on all platforms, and to allow building with Visual Studio on Windows.

At the time, the response was mixed, and for good reason—Meson was a very new build system, and it needed to work well on all the targets that GStreamer supports, which was all major operating systems. Meson did aim to support all of those, but a lot of work was required to bring platform support up to speed with the requirements of a non-trivial project like GStreamer.

The Status: Today!

After years of work across several components (Meson, Ninja, Cerbero, etc), GStreamer is being built with Meson on all platforms! Autotools is scheduled to be removed in the next release cycle (1.18). Edit: as of October 2019, Autotools has been removed.

The first stable release with this work was 1.16, which was released yesterday. It has already led to a number of new capabilities:
  • GStreamer can be built with Visual Studio on Windows inside Cerbero, which means we now ship official binaries for GStreamer built with the MSVC toolchain.
  • From-scratch Cerbero builds are much faster on all platforms, which has aided the implementation of CI-gated merge requests on GitLab.
  • The developer workflow has been streamlined and is the same on all platforms (Linux, Windows, macOS) using the gst-build meta-project. The meta-project can also be used for cross-compilation (Android, iOS, Windows, Linux).
  • The Windows developer workflow no longer requires installing several packages by hand or setting up an MSYS environment. All you need is Git, Python 3, Visual Studio, and 15 min for the initial build.
  • Profiling on Windows is now possible, and I've personally used it to profile and fix numerous Windows-specific performance issues.
  • Visual Studio projects that use GStreamer now have debug symbols since we're no longer mixing MinGW and MSVC binaries. This also enables usable crash reports and symbol servers.
  • We can ship plugins that can only be built with MSVC on Windows, such as the Intel MSDK hardware codec plugin, Directshow plugins, and also easily enable new Windows 10 features in existing plugins such as WASAPI.
  • iOS bitcode builds are more correct, since Meson is smart enough to know how to disable incompatible compiler options on specific build targets.
  • The iOS framework now also ships shared libraries in addition to the static libraries.
Overall, it's been a huge success and we're really happy with how things have turned out!

You can download the prebuilt MSVC binaries, reproduce them yourself, or quickly bootstrap a GStreamer development environment. The choice is yours!

Further Musings

While working on this over the years, what's really stood out to me was how this sort of gargantuan task was made possible through the power of community-driven FOSS and community-focused consultancy.

Our build system migration quest has been long with valleys full of yaks with thick coats of fur, and it would have been prohibitively expensive for a single entity to sponsor it all. Thanks to the inherently collaborative nature of community FOSS projects, people from various backgrounds and across companies could come together and make this possible.

There are many other examples of this, but seeing the improbable happen from the inside is something special.

Special shout-outs to ZEISS, Barco, Pexip, and Cablecast.tv for sponsoring various parts of this work!

Their contributions also made it easier for us to spend thousands more hours of non-sponsored time to fill in the gaps so that all the sponsored work done could be upstreamed in a form that's useful for everyone who uses GStreamer. This sort of thing is, in my opinion, an essential characteristic of being a community-focused consultancy, and we make sure that it always has high priority.

Tuesday, April 10, 2018

A simple method of measuring audio latency

In my previous blog post, I talked about how I improved the latency of GStreamer's default audio capture and render elements on Windows.

An important part of any such work is a way to accurately measure the latencies in your audio path.

Ideally, one would use a mechanism that can track your buffers and give you a detailed breakdown of how much latency each component of your system adds. For instance, with an audio pipeline like this:

audio-capture → filter1 → filter2 → filter3 → audio-output

If you use GStreamer, you can use the latency tracer to measure how much latency filter1 adds, filter2 adds, and so on.
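As a rough sketch of how one might switch that tracer on from a script, the helper below builds the environment variables and the command line for a `gst-launch-1.0` run. The `GST_TRACERS` and `GST_DEBUG` variables are the standard way to enable tracers; the `flags=pipeline+element` parameter is an assumption that only holds on newer GStreamer releases, and the element names in the usage note are placeholders for your own pipeline.

```python
import os


def latency_tracer_env(per_element=True, base_env=None):
    """Return a copy of the environment with the latency tracer enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: per-element reporting via flags= needs a newer GStreamer;
    # plain "latency" only reports the source-to-sink total.
    env["GST_TRACERS"] = "latency(flags=pipeline+element)" if per_element else "latency"
    # Tracer output is printed through the debug log at the TRACE level (7).
    env["GST_DEBUG"] = "GST_TRACER:7"
    return env


def launch_command(elements):
    """Build an argv list for gst-launch-1.0 from a list of element descriptions."""
    return ["gst-launch-1.0"] + " ! ".join(elements).split()
```

With GStreamer installed, something like `subprocess.run(launch_command(["audiotestsrc", "audioconvert", "fakesink"]), env=latency_tracer_env())` would then print latency records to stderr.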

However, sometimes you need to measure latencies added by components outside of your control, for instance the audio APIs provided by the operating system, the audio drivers, or even the hardware itself. In that case it's really difficult, bordering on impossible, to do an automated breakdown.

But we do need some way of measuring those latencies, and I needed that for the aforementioned work. Maybe we can get an aggregated (total) number?

There's a simple way to do that if we can create a loopback connection in the audio setup. What's a loopback you ask?

[Image: Ouroboros snake biting its tail]

Essentially, if we can redirect the audio output back to the audio input, that's called a loopback. The simplest way to do this is to connect the speaker-out/line-out to the microphone-in/line-in with a two-sided 3.5mm jack.

[Image: photo of male-to-male 3.5mm jack connecting speaker-out to mic-in]

Now, when we send an audio wave down to the audio output, it'll show up on the audio input.

Hmm, what if we store the current time when we send the wave out, and compare it with the current time when we get it back? Well, that's the total end-to-end latency!

If we send out a wave periodically, we can measure the latency continuously, even as things are switched around or the pipeline is dynamically reconfigured.

Some of you may notice that this is somewhat similar to how the `ping` command measures latencies across the Internet.

[Image: screenshot of ping]

Just like a network connection, the loopback connection can be lossy or noisy, for example if you use loudspeakers and a microphone instead of a wire, or if you have (ugh) noise in your line. But unlike network packets, we lose all context once the waves leave our pipeline and we have no way of uniquely identifying each wave.

So the simplest reliable implementation is to have only one wave traveling down the pipeline at a time. If we send a wave out, say, once a second, we can wait about one second for it to show up, and otherwise presume that it was lost.
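To make the idea concrete, here is a small simulation of the "one wave at a time" scheme. It is only an illustration under stated assumptions, not the code of any real plugin: the capture buffer is faked as a list of samples, a "wave" is a single sample above a threshold, and all function names are made up for this example.

```python
def measure_loopback_latency(captured, sample_rate, sent_at_sample,
                             threshold=0.5, timeout_s=1.0):
    """Return the latency in ms of the wave sent at sent_at_sample.

    Scans the captured samples from the moment the wave was sent and stops at
    the first sample whose amplitude crosses the threshold. Returns None if
    nothing shows up before the timeout, i.e. the wave is presumed lost.
    """
    deadline = sent_at_sample + int(timeout_s * sample_rate)
    for i in range(sent_at_sample, min(deadline, len(captured))):
        if abs(captured[i]) >= threshold:
            return (i - sent_at_sample) * 1000.0 / sample_rate
    return None


def simulate_loopback(sample_rate, delay_ms, tick_every_s=1.0, seconds=3):
    """Fake a capture buffer where each wave comes back delay_ms after it was sent."""
    total = int(seconds * sample_rate)
    captured = [0.0] * total
    delay = int(round(delay_ms * sample_rate / 1000.0))
    sent = list(range(0, total, int(tick_every_s * sample_rate)))
    for s in sent:
        if s + delay < total:
            captured[s + delay] = 1.0  # the wave arriving on the input
    return sent, captured


# One wave per second, coming back through a simulated 20 ms loopback.
sent, captured = simulate_loopback(48000, delay_ms=20.0)
latencies = [measure_loopback_latency(captured, 48000, s) for s in sent]
```

Because only one wave is in flight per interval, each detected wave can be matched to the send time that preceded it, which is the property the scheme relies on.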

That's exactly how the audiolatency GStreamer plugin that I wrote works! Here you can see its output while measuring the combined latency of the WASAPI source and sink elements:

The first measurement will always be wrong because of various implementation details in the audio stack, but the next measurements should all be correct.

This mechanism does place an upper bound on the latency that we can measure, and on how often we can measure it, but it should be possible to take more frequent measurements by sending a new wave as soon as the previous one was received (with a 1 second timeout). So this is an enhancement that can be done if people need this feature.

Hope you find the element useful; go forth and measure!

Thursday, March 22, 2018

Low-latency audio on Windows with GStreamer

Digital audio is so ubiquitous that we rarely stop to think or wonder how the gears turn underneath our all-pervasive apps for entertainment. Today we'll look at one specific piece of the machinery: latency.

Let's say you're making a video of someone's birthday party with an app on your phone. Once the recording starts, you don't care when the app starts writing it to disk as long as everything is there in the end.

However, if you're having a Skype call with your friend, it matters a whole lot how long it takes for the video to reach the other end and vice versa. It's impossible to have a conversation if the lag (latency) is too high.

The difference is, do you need real-time feedback or not?

Other examples, in order of increasingly strict latency requirements, are: live video streaming, security cameras, augmented reality games such as Pokémon Go, multiplayer video games in general, audio effects apps for live music recording, and many more.

“But Nirbheek”, you might ask, “why doesn't everyone always ‘immediately’ send/store/show whatever is recorded? Why do people have to worry about latency?” and that's a great question!

To understand that, check out my previous blog post, Latency in Digital Audio. It's also a good primer on analog vs digital audio!

Low latency on consumer operating systems

Each operating system has its own set of application APIs for audio, and each has a lower bound on the achievable latency:

GStreamer already has plugins for almost all of these¹ (plus others that aren't listed here), and on Windows, GStreamer has been using the DirectSound API by default for audio capture and output since the very beginning.

However, the DirectSound API was deprecated in Windows XP, and with Vista, it was removed and replaced with an emulation layer on top of the newly-released WASAPI. As a result, the plugin can't be configured to have less than 200ms of latency, which makes it unsuitable for all the low-latency use-cases mentioned above. The DirectSound API is quite crufty and unnecessarily complex anyway.

GStreamer is rarely used in video games, but it is widely used for live streaming, audio/video calls, and other real-time applications. Worse, the WASAPI GStreamer plugins were effectively untouched and unused since the initial implementation in 2008 and were completely broken².

This left no way to achieve low-latency audio capture or playback on Windows using GStreamer.

The situation became particularly dire when GStreamer added a new implementation of the WebRTC spec in this release cycle. People who tried it out on Windows were going to see much higher latencies than they should.

Luckily, I rewrote most of the WASAPI plugin code in January and February, and it should now work well on all versions of Windows from Vista to 10! You can get binary installers for GStreamer or build it from source.

Shared and Exclusive WASAPI

WASAPI allows applications to open sound devices in two modes: shared and exclusive. As the name suggests, shared mode allows multiple applications to output to (or capture from) an audio device at the same time, whereas exclusive mode does not.

Almost all applications should open audio devices in shared mode. It would be quite disastrous if your YouTube videos played without sound because Spotify decided to open your speakers in exclusive mode.

In shared mode, the audio engine has to resample and mix audio streams from all the applications that want to output to that device. This increases latency because it must maintain its own audio ringbuffer for doing all this, from which audio buffers will be periodically written out to the audio device.

In theory, hardware mixing could be used if the sound card supports it, but very few sound cards implement that now since it's so cheap to do in software. On Windows, only high-end audio interfaces used for professional audio implement this.

Another option is to allocate your audio engine buffers directly in the sound card's memory with DMA, but that complicates the implementation and relies on good drivers from hardware manufacturers. Microsoft has tried similar approaches in the past with DirectSound and been burned by it, so it's not a route they took with WASAPI³.

On the other hand, some applications know they will be the only ones using a device, and for them all this machinery is a hindrance. This is why exclusive mode exists. In this mode, if the audio driver is implemented correctly, the application's buffers will be directly written out to the sound card, which will yield the lowest possible latency.

Audio latency with WASAPI

So what kind of latencies can we get with WASAPI?

That depends on the device period that is being used. The term device period is a fancy way of saying buffer size, specifically the buffer size that is used in each call to your application that fetches audio data.

This is the same period with which audio data will be written out to the actual device, so it is the major contributor of latency in the entire machinery.

If you're using the AudioClient interface in WASAPI to initialize your streams, the default period is 10ms. This means the theoretical minimum latency you can get in shared mode would be 10ms (audio engine) + 10ms (driver) = 20ms. In practice, it'll be somewhat higher due to various inefficiencies in the subsystem.

When using exclusive mode, there's no engine latency, so the same number goes down to ~10ms.
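The arithmetic above can be written down as a tiny model. This is a simplification for illustration only: it assumes the audio engine and the driver each add exactly one device period, and ignores the extra inefficiencies mentioned above. The function name is made up.

```python
def theoretical_min_latency_ms(device_period_ms=10.0, exclusive=False):
    """Lower bound on WASAPI latency under the simple one-period-per-stage model."""
    # Shared mode goes through the audio engine, which adds one period;
    # exclusive mode bypasses the engine entirely.
    engine_ms = 0.0 if exclusive else device_period_ms
    driver_ms = device_period_ms
    return engine_ms + driver_ms


shared = theoretical_min_latency_ms(10.0)                     # engine + driver
exclusive = theoretical_min_latency_ms(10.0, exclusive=True)  # driver only
```

Real measurements will come out somewhat higher than these lower bounds.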

These numbers are decent for most use-cases, but like I explained in my previous blog post, this is totally insufficient for pro-audio use-cases such as applying live effects to music recordings. You really need latencies that are lower than 10ms there.

Ultra-low latency with WASAPI

Starting with Windows 10, WASAPI removed most of its aforementioned inefficiencies, and introduced a new interface: AudioClient3. If you initialize your streams with this interface, and if your audio driver is implemented correctly, you can configure a device period of just 2.67ms at 48 kHz.
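Since a device period is just a buffer size, it can be converted between milliseconds and frames. The helpers below are a sketch with made-up names; the frame counts are derived from the numbers quoted in this post and are not stated in it.

```python
def frames_to_ms(frames, sample_rate_hz):
    """Duration in milliseconds of a buffer holding the given number of frames."""
    return frames * 1000.0 / sample_rate_hz


def ms_to_frames(period_ms, sample_rate_hz):
    """Nearest whole number of frames for a device period given in milliseconds."""
    return round(period_ms * sample_rate_hz / 1000.0)


default_frames = ms_to_frames(10.0, 48000)      # the default 10ms period
low_latency_frames = ms_to_frames(2.67, 48000)  # the 2.67ms period
```

At 48 kHz the default period works out to 480 frames and the low-latency period to about 128 frames, so each buffer handed to the application is almost four times smaller.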

The best part is that this is the period not just in exclusive mode but also in shared mode, which brings WASAPI almost on par with JACK and CoreAudio.

So that was the good news. Did I mention there's bad news too? Well, now you know.

The first bit is that these numbers are only achievable if you use Microsoft's implementation of the Intel HD Audio standard for consumer drivers. This is fine; you follow some badly-documented steps and it turns out fine.

Then you realize that if you want to use something more high-end than an Intel HD Audio sound card, unless you use one of the rare pro-audio interfaces that have drivers that use the new WaveRT driver model instead of the old WaveCyclic model, you still see 10ms device periods.

It seems the pro-audio industry made the decision to stick with ASIO since it already provides <5ms latency. They don't care that the API is proprietary, and that most applications can't actually use it because of that. All the apps that are used in the pro-audio world already work with it.

The strange part is that all this information is nowhere on the Internet and seems to lie solely in the minds of the Windows audio driver cabals across the US and Europe. It's surprising and frustrating for someone used to working in the open to see such counterproductive information asymmetry, and I'm not the only one.

This is where I plug open-source and talk about how Linux has had ultra-low latencies for years since all the audio drivers are open-source, follow the same ALSA driver model⁴, and are constantly improved. JACK is probably the most well-known low-latency audio engine in existence, and was born on Linux. People are even using PulseAudio these days to work with <5ms latencies.

But this blog post is about Windows and WASAPI, so let's get back on track.

To be fair, Microsoft is not to blame here. Decades ago they made the decision of not working more closely with the companies that write drivers for their standard hardware components, and they're still paying the price for it. Blue screens of death were the most user-visible consequences, but the current audio situation is an indication that losing control of your platform has more dire consequences.

There is one more bit of bad news. In my testing, I wasn't able to get glitch-free capture of audio in the source element using the AudioClient3 interface at the minimum configurable latency in shared mode, even with critical thread priorities, unless there was nothing else running on the machine.

As a result, this feature is disabled by default on the source element. This is unfortunate, but not a great loss since the same device period is achievable in exclusive mode without glitches.

Measuring WASAPI latencies

Now that we're back from our detour, the executive summary is that the GStreamer WASAPI source and sink elements now use the latest recommended WASAPI interfaces. You should test them out and see how well they work for you!

By default, a device is opened in shared mode with a conservative latency setting. To force the stream into the lowest latency possible, set low-latency=true. If you're on Windows 10 and want to force-enable/disable the use of the AudioClient3 interface, toggle the use-audioclient3 property.

To open a device in exclusive mode, set exclusive=true. This will ignore the low-latency and use-audioclient3 properties since they only apply to shared mode streams. When a device is opened in exclusive mode, the stream will always be configured for the lowest possible latency by WASAPI.
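As a sketch of how these properties combine, the helper below builds a `gst-launch-1.0`-style element description for the `wasapisrc` and `wasapisink` elements. The property names are the ones described above; the helper itself is hypothetical, and it simply leaves out the shared-mode properties when exclusive mode is requested, since they are ignored there anyway.

```python
def wasapi_element(name, exclusive=False, low_latency=False, use_audioclient3=None):
    """Return an element description such as 'wasapisrc low-latency=true'."""
    if name not in ("wasapisrc", "wasapisink"):
        raise ValueError("expected wasapisrc or wasapisink, got %r" % name)
    props = []
    if exclusive:
        # Exclusive streams are always configured for the lowest latency,
        # so the shared-mode properties would have no effect.
        props.append("exclusive=true")
    else:
        if low_latency:
            props.append("low-latency=true")
        if use_audioclient3 is not None:
            # None means "leave the element's default behaviour alone".
            props.append("use-audioclient3=" + ("true" if use_audioclient3 else "false"))
    return " ".join([name] + props)


shared_src = wasapi_element("wasapisrc", low_latency=True, use_audioclient3=False)
exclusive_sink = wasapi_element("wasapisink", exclusive=True, low_latency=True)
```

Joining such descriptions with ` ! ` gives a pipeline string that can be passed to `gst-launch-1.0`.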

To measure the actual latency in each configuration, you can use the new audiolatency plugin that I wrote to get hard numbers for the total end-to-end latency including the latency added by the GStreamer audio ringbuffers in the source and sink elements, the WASAPI audio engine (capture and render), the audio driver, and so on.

I look forward to hearing what your numbers are on Windows 7, 8.1, and 10 in all these configurations! ;)

1. The only ones missing are AAudio because it's very new and ASIO which is a proprietary API with licensing requirements.

2. It's no secret that although lots of people use GStreamer on Windows, the majority of GStreamer developers work on Linux and macOS. As a result the Windows plugins haven't always gotten a lot of love. It doesn't help that building GStreamer on Windows can be a daunting task. This is actually one of the major reasons why we're moving to Meson, but I've already written about that elsewhere!

3. My knowledge about the history of the decisions behind the Windows Audio API is spotty, so corrections and expansions on this are most welcome!

4. The ALSA drivers in the Linux kernel should not be confused with the ALSA userspace library.