Nirbheek’s Rantings

Monday, February 26, 2018

Decoupling GStreamer Pipelines

This post is best read with some prior familiarity with GStreamer pipelines. If you want to learn more about that, a good place to start is the tutorial Jan presented at LCA 2018.

Elevator Pitch

GStreamer was designed with modularity, pluggability, and ease of use in mind, and the structure was somewhat inspired by UNIX pipes. With GStreamer, you start with an idea of what your dataflow will look like, and the pipeline will map that quite closely.

This is true whether you're working with a simple and static pipeline:

source ! transform ! sink

Or if you need complex and dynamic pipelines with varying rates of data flow:

The inherent pluggability of the system allows for quick prototyping and makes a lot of changes simpler than they would be in other systems.

At the same time, to achieve efficient multimedia processing, one must avoid onerous copying of data, excessive threading, or additional latency. Other features necessary are varying rates of playback, seeking, branching, mixing, non-linear data flow, timing, and much more, but let's keep it simple for now.

Modular Multimedia Processing

A naive way to implement this would be to have one thread (or process) for each node, and use shared memory or message-passing. This can achieve high throughput if you use the right APIs for zerocopy message-passing, but because of a lack of realtime guarantees on all consumer operating systems, the latency will be jittery and much harder to achieve.

So how does GStreamer solve these problems?

Let's take a look at a simple pipeline to try and understand. We generate a sine wave, encode it with Opus, mux it into an Ogg container, and write it to disk.



$ gst-launch-1.0 -e audiotestsrc ! opusenc ! oggmux ! filesink location=out.ogg

How does data make it from one end of this pipeline to the other in GStreamer? The answer lies in source pads, sink pads and the chain function.

In this pipeline, the audiotestsrc element has one source pad. opusenc and oggmux have one source pad and one sink pad each, and filesink only has a sink pad. Buffers always move from source pads to sink pads. All elements that receive buffers (with sink pads) must implement a chain function to handle each buffer.

Zooming in a bit more, to output buffers, an element will call gst_pad_push() on its source pad. This function will figure out what the corresponding sink pad is, and call the chain function of that element with a pointer to the buffer that was pushed earlier. This chain function can then apply a transformation to the buffer and push it (or a new buffer) onward with gst_pad_push() again.

The net effect of this is that all buffer handling from one end of this pipeline to the other happens in one series of chained function calls. This is a really important detail that allows GStreamer to be efficient by default.

Pipeline Multithreading

Of course, sometimes you want to decouple parts of the pipeline, and that brings us to the simplest mechanism for doing so: the queue element. The most basic use-case for this element is to ensure that the downstream of your pipeline runs in a new thread.

In some applications, you want even greater decoupling of parts of your pipeline. For instance, if you're reading data from the network, you don't want a network error to bring down our entire pipeline, or if you're working with a hotpluggable device, device removal should be recoverable without needing to restart the pipeline.

There are various mechanisms to achieve such decoupling: appsrc/appsink, fdsrc/fdsink, shmsrc/shmsink, ipcpipeline, etc. However, each of those have their own limitations and complexities. In particular, events, negotiation, and synchronization usually need to be handled or serialized manually at the boundary.

Seamless Pipeline Decoupling

We recently merged a new plugin that makes this job much simpler: gstproxy. Essentially, you insert a proxysink element when you want to send data outside your pipeline, and use a proxysrc element to push that data into a different pipeline in the same process.

The interesting thing about this plugin is that everything is proxied, not just buffers. Events, queries, and hence caps negotiation all happen seamlessly. This is particularly useful when you want to do dynamic reconfiguration of your pipeline, and want the decoupled parts to reconfigure automatically.

Say you have a pipeline like this:



pulsesrc ! opusenc ! oggmux ! souphttpclientsink

Where the souphttpclientsink element is doing a PUT to a remote HTTP server. If the server suddenly closes the connection, you want to be able to immediately reconnect to the same server or a different one without interrupting the recording. One way to do this, would be to use appsrc and appsink to split it into two pipelines:



pulsesrc ! opusenc ! oggmux ! appsink



appsrc ! souphttpclientsink

Now you need to write code to handle buffers that are received on the appsink and then manually push those into appsrc. With the proxy plugin, you split your pipeline like before:



pulsesrc ! opusenc ! oggmux ! proxysink



proxysrc ! souphttpclientsink

Next, we connect the proxysrc and proxysink elements, and gstreamer will automatically push buffers from the first pipeline to the second one.

g_object_set (psrc, "proxysink", psink, NULL);

proxysink also contains a queue, so the second pipeline will always run in a separate thread.

Another option is the inter plugin. If you use a pair of interaudiosink/interaudiosrc elements, buffers will be automatically moved between pipelines, but those only support raw audio or video, and drop events and queries at the boundary. The proxy elements push pointers to buffers without copying, and they do not care what the contents of the buffers are.

This example was a trivial one, but with more complex pipelines, you usually have bins that automatically reconfigure themselves according to the events and caps sent by upstream elements; f.ex decodebin and webrtcbin. This metadata about the buffers is lost when using appsrc/appsink, and similar elements, but is transparently proxied by the proxy elements.

The ipcpipeline elements also forward buffers, events, queries, etc (not zerocopy, but could be), but they are much more complicated since they were built for splitting pipelines across multiple processes, and are most often used in a security-sensitive context.

The proxy elements only work when all the split pipelines are within the same process, are much simpler and as a result, more efficient. They should be used when you want graceful recovery from element errors, and your elements are not a vector for security attacks.

For more details on how to use them, checkout the documentation and example! The online docs will be generated from that when we're closer to the release of GStreamer 1.14. There are a few caveats, but a number of projects are already using it with great success.

Saturday, February 3, 2018

GStreamer has grown a WebRTC implementation

In other news, GStreamer is now almost buzzword-compliant! The next blog post on our list: blockchains and smart contracts in GStreamer.

Late last year, we at Centricular announced a new implementation of WebRTC in GStreamer. Today we're happy to announce that after community review, that work has been merged into GStreamer itself! The plugin is called webrtcbin, and the library is, naturally, called gstwebrtc.

The implementation has all the basic features, is transparently compatible with other WebRTC stacks (particularly in browsers), and has been well-tested with both Firefox and Chrome.

Some of the more advanced features such as FEC are already a work in progress, and others will be too—if you want them to be! Hop onto IRC on #gstreamer @ Freenode.net or join the mailing list.

How do I use it?

Currently, the easiest way to use webrtcbin is to build GStreamer using either gst-uninstalled (Linux and macOS) or Cerbero (Windows, iOS, Android). If you're a patient person, you can follow @gstreamer and wait for GStreamer 1.14 to be released which will include Windows, macOS, iOS, and Android binaries.

The API currently lacks documentation, so the best way to learn it is to dive into the source-tree examples. Help on this will be most appreciated! To see how to use GStreamer to do WebRTC with a browser, checkout the bidirectional audio-video demos.

Show me the code! [skip]

Here's a quick highlight of the important bits that should get you started if you already know how GStreamer works. This example is in C, but GStreamer also has bindings for Rust, Python, Java, C#, Vala, and so on.

Let's say you want to capture video from V4L2, stream it to a webrtc peer, and receive video back from it. The first step is the streaming pipeline, which will look something like this:

v4l2src ! queue ! vp8enc ! rtpvp8pay !
    application/x-rtp,media=video,encoding-name=VP8,payload=96 ! 
    webrtcbin name=sendrecv

As a short-cut, let's parse the string description to create the pipeline.

GstElement *pipe;

pipe = gst_parse_launch ("v4l2src ! queue ! vp8enc ! rtpvp8pay ! "
    "application/x-rtp,media=video,encoding-name=VP8,payload=96 !"
    " webrtcbin name=sendrecv", NULL);

Next, we get a reference to the webrtcbin element and attach some callbacks to it.

GstElement *webrtc;

webrtc = gst_bin_get_by_name (GST_BIN (pipe), "sendrecv");
g_assert (webrtc != NULL);

/* This is the gstwebrtc entry point where we create the offer.
 * It will be called when the pipeline goes to PLAYING. */
g_signal_connect (webrtc, "on-negotiation-needed",
    G_CALLBACK (on_negotiation_needed), NULL);
/* We will transmit this ICE candidate to the remote using some
 * signalling. Incoming ICE candidates from the remote need to be
 * added by us too. */
g_signal_connect (webrtc, "on-ice-candidate",
    G_CALLBACK (send_ice_candidate_message), NULL);
/* Incoming streams will be exposed via this signal */
g_signal_connect (webrtc, "pad-added",
    G_CALLBACK (on_incoming_stream), pipe);
/* Lifetime is the same as the pipeline itself */
gst_object_unref (webrtc);

When the pipeline goes to PLAYING, the on_negotiation_needed() callback will be called, and we will ask webrtcbin to create an offer which will match the pipeline above.

static void
on_negotiation_needed (GstElement * webrtc, gpointer user_data)
{
  GstPromise *promise;

  promise = gst_promise_new_with_change_func (on_offer_created,
      user_data, NULL);
  g_signal_emit_by_name (webrtc, "create-offer", NULL,
      promise);
}

When webrtcbin has created the offer, it will call on_offer_created()

static void
on_offer_created (GstPromise * promise, GstElement * webrtc)
{
  GstWebRTCSessionDescription *offer = NULL;
  const GstStructure *reply;
  gchar *desc;

  reply = gst_promise_get_reply (promise);
  gst_structure_get (reply, "offer",
      GST_TYPE_WEBRTC_SESSION_DESCRIPTION, 
      &offer, NULL);
  gst_promise_unref (promise);

  /* We can edit this offer before setting and sending */
  g_signal_emit_by_name (webrtc,
      "set-local-description", offer, NULL);

  /* Implement this and send offer to peer using signalling */
  send_sdp_offer (offer);
  gst_webrtc_session_description_free (offer);
}

Similarly, when we have the SDP answer from the remote, we must call "set-remote-description" on webrtcbin.

answer = gst_webrtc_session_description_new (
    GST_WEBRTC_SDP_TYPE_ANSWER, sdp);
g_assert (answer);

/* Set remote description on our pipeline */
g_signal_emit_by_name (webrtc, "set-remote-description",
    answer, NULL);

ICE handling is very similar; when the "on-ice-candidate" signal is emitted, we get a local ICE candidate which we must send to the remote. When we have an ICE candidate from the remote, we must call "add-ice-candidate" on webrtcbin.

There's just one piece left now; handling incoming streams that are sent by the remote. For that, we have on_incoming_stream() attached to the "pad-added" signal on webrtcbin.

static void
on_incoming_stream (GstElement * webrtc, GstPad * pad,
    GstElement * pipe)
{
  GstElement *play;

  play = gst_parse_bin_from_description (
      "queue ! vp8dec ! videoconvert ! autovideosink",
      TRUE, NULL);
  gst_bin_add (GST_BIN (pipe), play);

  /* Start displaying video */
  gst_element_sync_state_with_parent (play);
  gst_element_link (webrtc, play);
}

That's it! This is what a basic webrtc workflow looks like. Those of you that have used the PeerConnection API before will be happy to see that this maps to that quite closely.

The aforementioned demos also include a Websocket signalling server and JS browser components, and I will be doing an in-depth application newbie developer's guide at a later time, so you can follow me @nirbheek to hear when it comes out!

Tell me more!

The code is already being used in production in a number of places, such as EasyMile's autonomous vehicles, and we're excited to see where else the community can take it.

If you're wondering why we decided a new implementation was needed, read on! For a more detailed discussion into that, you should watch Matthew Waters' talk from the GStreamer conference last year. It's a great companion for this article!

But before we can dig into details, we need to lay some foundations first.

What is GStreamer, and what is WebRTC? [skip]

GStreamer is a cross-platform open-source multimedia framework that is, in my opinion, the easiest and most flexible way to implement any application that needs to play, record, or transform media-like data across an extremely versatile scale of devices and products. Embedded (IoT, IVI, phones, TVs, …), desktop (video/music players, video recording, non-linear editing, videoconferencing and VoIP clients, browsers …), to servers (encode/transcode farms, video/voice conferencing servers, …) and more.

But what I like the most about GStreamer is the pipeline-based model which solves one of the hardest problems in API design: catering to applications of varying complexity; from the simplest one-liners and quick solutions to those that need several hundreds of thousands of lines of code to implement their full featureset.

If you want to learn more about GStreamer, Jan Schmidt's tutorial from Linux.conf.au is a good start.

WebRTC is a set of draft specifications that build upon existing RTP, RTCP, SDP, DTLS, ICE (and many other) real-time communication specifications and defines an API for making RTC accessible using browser JS APIs.

People have been doing real-time communication over IP for decades with the previously-listed protocols that WebRTC builds upon. The real innovation of WebRTC was creating a bridge between native applications and webapps by defining a standard, yet flexible, API that browsers can expose to untrusted JavaScript code.

These specifications are constantly being improved upon, which combined with the ubiquitous nature of browsers means WebRTC is fast becoming the standard choice for videoconferencing on all platforms and for most applications.

Everything is great, let's build amazing apps! [skip]

Not so fast, there's more to the story! For WebApps, the PeerConnection API is everywhere. There are some browser-specific quirks as usual, and the API itself keeps changing, but the WebRTC JS adapter handles most of that. Overall the WebApp experience is mostly 👍.

Sadly, for native code or applications that need more flexibility than a sandboxed JS app can achieve, there haven't been a lot of great options.

libwebrtc (Chrome's implementation), Janus, Kurento, and OpenWebRTC have traditionally been the main contenders, but after having worked with all of these, we found that each implementation has its own inflexibilities, shortcomings, and constraints.

libwebrtc is still the most mature implementation, but it is also the most difficult to work with. Since it's embedded inside Chrome, it's a moving target, the API can be hard to work with, and the project is quite difficult to build and integrate, all of which are obstacles in the way of native or server app developers trying to quickly prototype and try out things.

It was also not built for multimedia use-cases, so while the webrtc bits are great, the lower layers get in the way of non-browser use-cases and applications. It is quite painful to do anything other than the default "set raw media, transmit" and "receive from remote, get raw media". This means that if you want to use your own filters, or hardware-specific codecs or sinks/sources, you end up having to fork libwebrtc.

In contrast, as shown above, our implementation gives you full control over this as with any other GStreamer pipeline.

OpenWebRTC by Ericsson was the first attempt to rectify this situation, and it was built on top of GStreamer. The target audience was app developers, and it fit the bill quite well as a proof-of-concept—even though it used a custom API and some of the architectural decisions made it quite inflexible for most other use-cases.

However, after an initial flurry of activity around the project, momentum petered out, the project failed to gather a community around itself, and is now effectively dead.

Full disclosure: we worked with Ericsson to polish some of the rough edges around the project immediately prior to its public release.

WebRTC in GStreamer — webrtcbin and gstwebrtc

Remember how I said the WebRTC standards build upon existing standards and protocols? As it so happens, GStreamer has supported almost all of them for a while now because they were being used for real-time communication, live streaming, and in many other IP-based applications. Indeed, that's partly why Ericsson chose it as the base for OWRTC.

This combined with the SRTP and DTLS plugins that were written during OWRTC's development meant that our implementation is built upon a solid and well-tested base, and that implementing WebRTC features is not as difficult as one might presume. However, WebRTC is a large collection of standards, and reaching feature-parity with libwebrtc is an ongoing task.

Lucky for us, Matthew made some excellent decisions while architecting the internals of webrtcbin, and we follow the PeerConnection specification quite closely, so almost all the missing features involve writing code that would plug into clearly-defined sockets.

We believe what we've been building here is the most flexible, versatile, and easy to use WebRTC implementation out there, and it can only get better as time goes by. Bringing the power of pipeline-based multimedia manipulation to WebRTC opens new doors for interesting, unique, and highly efficient applications.

To demonstrate this, in the near future we will be publishing articles that dive into how to use the PeerConnection-inspired API exposed by webrtcbin to build various kinds of applications—starting with a CPU-efficient multi-party bidirectional conferencing solution with a mesh topology that can work with any webrtc stack.

Until next time!

Wednesday, August 17, 2016

The Meson build system at GUADEC 2016

For the third year in a row, Centricular was at GUADEC, and this year we sponsored the evening party on the final day at Hoepfner’s Burghof! Hopefully everyone enjoyed it as much as we hoped. :)

The focus for me this year was to try and tell people about the work we've been doing on porting GStreamer to Meson and to that end, I gave a talk on the second day about how to build your GNOME app ~2x faster than before.

The talk title itself was a bit of a lie, since most of the talk was about how Autotools is a mess and how Meson has excellent features (better syntax!) and in-built support for most of the GNOME infrastructure to make it easy for people to use it. But for some people the attraction is also that Meson provides better support on platforms such as Windows, and improves build times on all platforms massively; ranging from 2x on Linux to 10-15x on Windows.

Thanks to the excellent people at c3voc.de, the talks were all live-streamed, and you can see my talk at their relive website for GUADEC 2016.

It was heartening to see that over the past year people have warmed up to the idea of using Meson as a replacement for Autotools. Several people said kind and encouraging words to me and Jussi over the course of the conference (it helps that GNOME is filled with a friendly community!). We will continue to improve Meson and with luck we can get rid of Autotools over time.

The best approach, as always, is to start with the simple projects, get familiar with the syntax, and report any bugs you find! We look forward to your bugs and pull requests. ;)

Wednesday, July 27, 2016

Building and Developing GStreamer using Visual Studio

Two months ago, I talked about how we at Centricular have been working on a Meson port of GStreamer and its basic dependencies (glib, libffi, and orc) for various reasons — faster builds, better cross-platform support (particularly Windows), better toolchain support, ease of use, and for a better build system future in general.

Meson also has built-in support for things like gtk-doc, gobject-introspection, translations, etc. It can even generate Visual Studio project files at build time so projects don't have to expend resources maintaining those separately.

Today I'm here to share instructions on how to use Cerbero (our “aggregating” build system) to build all of GStreamer on Windows using MSVC 2015 (wherever possible). Note that this means you won't see any Meson invocations at all because Cerbero does all that work for you.

Note that this is still all unofficial and has not been proposed for inclusion upstream. We still have a few issues that need to be ironed out before we can do that¹.

Update: As of March 2019, all these instructions are obsolete since MSVC support has been merged into upstream Cerbero. I'm leaving this outdated text as-is for archival purposes.

First, you need to setup the environment on Windows by installing a bunch of external tools: Python 2, Python3, Git, etc. You can find the instructions for that here:

https://github.com/centricular/cerbero#windows

This is very similar to the old Cerbero instructions, but some new tools are needed. Once you've done everything there (Visual Studio especially takes a while to fetch and install itself), the next step is fetching Cerbero:

$ git clone https://github.com/centricular/cerbero.git

This will clone and checkout the meson-1.8 branch that will build GStreamer 1.8.x. Next, we bootstrap it:

https://github.com/centricular/cerbero#bootstrap

Now we're (finally) ready to build GStreamer. Just invoke the package command:

python2 cerbero-uninstalled -c config/win32-mixed-msvc.cbc package gstreamer-1.0

This will build all the `recipes` that constitute GStreamer, including the core libraries and all the plugins including their external dependencies. This comes to about 76 recipes. Out of all these recipes, only the following are ported to Meson and are built with MSVC:

bzip2.recipe
orc.recipe
libffi.recipe (only 32-bit)
glib.recipe
gstreamer-1.0.recipe
gst-plugins-base-1.0.recipe
gst-plugins-good-1.0.recipe
gst-plugins-bad-1.0.recipe
gst-plugins-ugly-1.0.recipe

The rest still mostly use Autotools, plain GNU make or cmake. Almost all of these are still built with MinGW. The only exception is libvpx, which uses its custom make-based build system but is built with MSVC.

Eventually we want to build everything including all external dependencies with MSVC by porting everything to Meson, but as you can imagine it's not an easy task. :-)

However, even with just these recipes, there is a large improvement in how quickly you can build all of GStreamer inside Cerbero on Windows. For instance, the time required for building gstreamer-1.0.recipe which builds gstreamer.git went from 10 minutes to 45 seconds. It is now easier to do GStreamer development on Windows since rebuilding doesn't take an inordinate amount of time!

As a further improvement for doing GStreamer development on Windows, for all these recipes (except libffi because of complicated reasons), you can also generate Visual Studio 2015 project files and use them from within Visual Studio for editing, building, and so on.

Go ahead, try it out and tell me if it works for you!

As an aside, I've also been working on some proper in-depth documentation of Cerbero that explains how the tool works, the recipe format, supported configurations, and so on. You can see the work-in-progress if you wish to.

1. Most importantly, the tests cannot be built yet because GStreamer bundles a very old version of libcheck. I'm currently working on fixing that.

Monday, May 23, 2016

GStreamer and Meson: A New Hope

Anyone who has written a non-trivial project using Autotools has realized that (and wondered why) it requires you to be aware of 5 different languages. Once you spend enough time with the innards of the system, you begin to realize that it is nothing short of an astonishing feat of engineering. Engineering that belongs in a museum. Not as part of critical infrastructure.

Autotools was created in the 1980s and caters to the needs of an entirely different world of software from what we have at present. Worse yet, it carries over accumulated cruft from the past 40 years — ostensibly for better “cross-platform support” but that “support” is mostly for extinct platforms that five people in the whole world remember.

We've learned how to make it work for most cases that concern FOSS developers on Linux, and it can be made to limp along on other platforms that the majority of people use, but it does not inspire confidence or really anything except frustration. People will not like your project or contribute to it if the build system takes 10x longer to compile on their platform of choice, does not integrate with the preferred IDE, and requires knowledge arcane enough to be indistinguishable from cargo-cult programming.

As a result there have been several (terrible) efforts at replacing it and each has been either incomplete, short-sighted, slow, or just plain ugly. During my time as a Gentoo developer in another life, I came in close contact with and developed a keen hatred for each of these alternative build systems. And so I mutely went back to Autotools and learned that I hated it the least of them all.

Sometime last year, Tim heard about this new build system called ‘Meson’ whose author had created an experimental port of GStreamer that built it in record time.

Intrigued, he tried it out and found that it finished suspiciously quickly. His first instinct was that it was broken and hadn’t actually built everything! Turns out this build system written in Python 3 with Ninja as the backend actually was that fast. About 2.5x faster on Linux and 10x faster on Windows for building the core GStreamer repository.

Upon further investigation, Tim and I found that Meson also has really clean generic cross-compilation support (including iOS and Android), runs natively (and just as quickly) on OS X and Windows, supports GNU, Clang, and MSVC toolchains, and can even (configure and) generate XCode and Visual Studio project files!

But the critical thing that convinced me was that the creator Jussi Pakkanen was genuinely interested in the use-cases of widely-used software such as Qt, GNOME, and GStreamer and had already added support for several tools and idioms that we use — pkg-config, gtk-doc, gobject-introspection, gdbus-codegen, and so on. The project places strong emphasis on both speed and ease of use and is quite friendly to contributions.

Over the past few months, Tim and I at Centricular have been working on creating Meson ports for most of the GStreamer repositories and the fundamental dependencies (libffi, glib, orc) and improving the MSVC toolchain support in Meson.

We are proud to report that you can now build GStreamer on Linux using the GNU toolchain and on Windows with either MinGW or MSVC 2015 using Meson build files that ship with the source (building upon Jussi's initial ports).

Other toolchain/platform combinations haven't been tested yet, but they should work in theory (minus bugs!), and we intend to test and bugfix all the configurations supported by GStreamer (Linux, OS X, Windows, iOS, Android) before proposing it for inclusion as an alternative build system for the GStreamer project.

You can either grab the source yourself and build everything, or use our (with luck, temporary) fork of GStreamer's cross-platform build aggregator Cerbero.

Update: I wrote a new post with detailed steps on how to build using Cerbero and generate Visual Studio project files.

Second update: All this is now upstream, see the upstream Cerbero repository's README

Personally, I really hope that Meson gains widespread adoption. Calling Autotools the Xorg of build systems is flattery. It really is just a terrible system. We really need to invest in something that works for us rather than against us.

PS: If you just want a quick look at what the build system syntax looks like, take a look at this or the basic tutorial.