Saturday, February 3, 2018

GStreamer has grown a WebRTC implementation

In other news, GStreamer is now almost buzzword-compliant! The next blog post on our list: blockchains and smart contracts in GStreamer.

Late last year, we at Centricular announced a new implementation of WebRTC in GStreamer.  Today we're happy to announce that after community review, that work has been merged into GStreamer itself! The plugin is called webrtcbin, and the library is, naturally, called gstwebrtc.

The implementation has all the basic features, is transparently compatible with other WebRTC stacks (particularly in browsers), and has been well-tested with both Firefox and Chrome.

Some of the more advanced features such as FEC are already a work in progress, and others will be too—if you want them to be! Hop onto IRC on #gstreamer @ Freenode.net or join the mailing list.

How do I use it?


Currently, the easiest way to use webrtcbin is to build GStreamer using either gst-uninstalled (Linux and macOS) or Cerbero (Windows, iOS, Android). If you're a patient person, you can follow @gstreamer and wait for GStreamer 1.14 to be released which will include Windows, macOS, iOS, and Android binaries.

The API currently lacks documentation, so the best way to learn it is to dive into the source-tree examples. Help on this will be most appreciated! To see how to use GStreamer to do WebRTC with a browser, checkout the bidirectional audio-video demos that I wrote.

Show me the code! [skip]


Here's a quick highlight of the important bits that should get you started if you already know how GStreamer works. This example is in C, but GStreamer also has bindings for Rust, Python, Java, C#, Vala, and so on.

Let's say you want to capture video from V4L2, stream it to a webrtc peer, and receive video back from it. The first step is the streaming pipeline, which will look something like this:

v4l2src ! queue ! vp8enc ! rtpvp8pay !
    application/x-rtp,media=video,encoding-name=VP8,payload=96 ! 
    webrtcbin name=sendrecv

As a short-cut, let's parse the string description to create the pipeline.

1
2
3
4
5
GstElement *pipe;

pipe = gst_parse_launch ("v4l2src ! queue ! vp8enc ! rtpvp8pay ! "
    "application/x-rtp,media=video,encoding-name=VP8,payload=96 !"
    " webrtcbin name=sendrecv", NULL);

Next, we get a reference to the webrtcbin element and attach some callbacks to it.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
GstElement *webrtc;

webrtc = gst_bin_get_by_name (GST_BIN (pipe), "sendrecv");
g_assert (webrtc != NULL);

/* This is the gstwebrtc entry point where we create the offer.
 * It will be called when the pipeline goes to PLAYING. */
g_signal_connect (webrtc, "on-negotiation-needed",
    G_CALLBACK (on_negotiation_needed), NULL);
/* We will transmit this ICE candidate to the remote using some
 * signalling. Incoming ICE candidates from the remote need to be
 * added by us too. */
g_signal_connect (webrtc, "on-ice-candidate",
    G_CALLBACK (send_ice_candidate_message), NULL);
/* Incoming streams will be exposed via this signal */
g_signal_connect (webrtc, "pad-added",
    G_CALLBACK (on_incoming_stream), pipe);
/* Lifetime is the same as the pipeline itself */
gst_object_unref (webrtc);

When the pipeline goes to PLAYING, the on_negotiation_needed() callback will be called, and we will ask webrtcbin to create an offer which will match the pipeline above.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
static void
on_negotiation_needed (GstElement * webrtc, gpointer user_data)
{
  GstPromise *promise;

  promise = gst_promise_new_with_change_func (on_offer_created,
      user_data, NULL);
  g_signal_emit_by_name (webrtc, "create-offer", NULL,
      promise);
}

When webrtcbin has created the offer, it will call on_offer_created()

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
static void
on_offer_created (GstPromise * promise, GstElement * webrtc)
{
  GstWebRTCSessionDescription *offer = NULL;
  const GstStructure *reply;
  gchar *desc;

  reply = gst_promise_get_reply (promise);
  gst_structure_get (reply, "offer",
      GST_TYPE_WEBRTC_SESSION_DESCRIPTION, 
      &offer, NULL);
  gst_promise_unref (promise);

  /* We can edit this offer before setting and sending */
  g_signal_emit_by_name (webrtc,
      "set-local-description", offer, NULL);

  /* Implement this and send offer to peer using signalling */
  send_sdp_offer (offer);
  gst_webrtc_session_description_free (offer);
}

Similarly, when we have the SDP answer from the remote, we must call "set-remote-description" on webrtcbin.

1
2
3
4
5
6
7
answer = gst_webrtc_session_description_new (
    GST_WEBRTC_SDP_TYPE_ANSWER, sdp);
g_assert (answer);

/* Set remote description on our pipeline */
g_signal_emit_by_name (webrtc, "set-remote-description",
    answer, NULL);

ICE handling is very similar; when the "on-ice-candidate" signal is emitted, we get a local ICE candidate which we must send to the remote. When we have an ICE candidate from the remote, we must call "add-ice-candidate" on webrtcbin.

There's just one piece left now; handling incoming streams that are sent by the remote. For that, we have on_incoming_stream() attached to the "pad-added" signal on webrtcbin.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
static void
on_incoming_stream (GstElement * webrtc, GstPad * pad,
    GstElement * pipe)
{
  GstElement *play;

  play = gst_parse_bin_from_description (
      "queue ! vp8dec ! videoconvert ! autovideosink",
      TRUE, NULL);
  gst_bin_add (GST_BIN (pipe), play);

  /* Start displaying video */
  gst_element_sync_state_with_parent (play);
  gst_element_link (webrtc, play);
}

That's it! This is what a basic webrtc workflow looks like. Those of you that have used the PeerConnection API before will be happy to see that this maps to that quite closely.

The aforementioned demos also include a Websocket signalling server and JS browser components, and I will be doing an in-depth application newbie developer's guide at a later time, so you can follow me @nirbheek to hear when it comes out!

Tell me more!


The code is already being used in production in a number of places, such as EasyMile's autonomous vehicles, and we're excited to see where else the community can take it.

If you're wondering why we decided a new implementation was needed, read on! For a more detailed discussion into that, you should watch Matthew Waters' talk from the GStreamer conference last year. It's a great companion for this article!

But before we can dig into details, we need to lay some foundations first.

What is GStreamer, and what is WebRTC? [skip]


GStreamer is a cross-platform open-source multimedia framework that is, in my opinion, the easiest and most flexible way to implement any application that needs to play, record, or transform media-like data across an extremely versatile scale of devices and products. Embedded (IoT, IVI, phones, TVs, …), desktop (video/music players, video recording, non-linear editing, videoconferencing and VoIP clients, browsers …), to servers (encode/transcode farms, video/voice conferencing servers, …) and more.

But what I like the most about GStreamer is the pipeline-based model which solves one of the hardest problems in API design: catering to applications of varying complexity; from the simplest one-liners and quick solutions to those that need several hundreds of thousands of lines of code to implement their full featureset. 

If you want to learn more about GStreamer, Jan Schmidt's tutorial from Linux.conf.au is a good start.

WebRTC is a set of draft specifications that build upon existing RTP, RTCP, SDP, DTLS, ICE (and many other) real-time communication specifications and defines an API for making RTC accessible using browser JS APIs.

People have been doing real-time communication over IP for decades with the previously-listed protocols that WebRTC builds upon. The real innovation of WebRTC was creating a bridge between native applications and webapps by defining a standard, yet flexible, API that browsers can expose to untrusted JavaScript code.

These specifications are constantly being improved upon, which combined with the ubiquitous nature of browsers means WebRTC is fast becoming the standard choice for videoconferencing on all platforms and for most applications.

Everything is great, let's build amazing apps! [skip]


Not so fast, there's more to the story! For WebApps, the PeerConnection API is everywhere. There are some browser-specific quirks as usual, and the API itself keeps changing, but the WebRTC JS adapter handles most of that. Overall the WebApp experience is mostly 👍.

Sadly, for native code or applications that need more flexibility than a sandboxed JS app can achieve, there haven't been a lot of great options.

libwebrtc (Chrome's implementation), Janus, Kurento, and OpenWebRTC have traditionally been the main contenders, but after having worked with all of these, we found that each implementation has its own inflexibilities, shortcomings, and constraints.

libwebrtc is still the most mature implementation, but it is also the most difficult to work with. Since it's embedded inside Chrome, it's a moving target, the API can be hard to work with, and the project is quite difficult to build and integrate, all of which are obstacles in the way of native or server app developers trying to quickly prototype and try out things.

It was also not built for multimedia use-cases, so while the webrtc bits are great, the lower layers get in the way of non-browser use-cases and applications. It is quite painful to do anything other than the default "set raw media, transmit" and "receive from remote, get raw media". This means that if you want to use your own filters, or hardware-specific codecs or sinks/sources, you end up having to fork libwebrtc.

In contrast, as shown above, our implementation gives you full control over this as with any other GStreamer pipeline.

OpenWebRTC by Ericsson was the first attempt to rectify this situation, and it was built on top of GStreamer. The target audience was app developers, and it fit the bill quite well as a proof-of-concept—even though it used a custom API and some of the architectural decisions made it quite inflexible for most other use-cases.

However, after an initial flurry of activity around the project, momentum petered out, the project failed to gather a community around itself, and is now effectively dead.

Full disclosure: we worked with Ericsson to polish some of the rough edges around the project immediately prior to its public release.

WebRTC in GStreamer — webrtcbin and gstwebrtc


Remember how I said the WebRTC standards build upon existing standards and protocols? As it so happens, GStreamer has supported almost all of them for a while now because they were being used for real-time communication, live streaming, and in many other IP-based applications. Indeed, that's partly why Ericsson chose it as the base for OWRTC.

This combined with the SRTP and DTLS plugins that were written during OWRTC's development meant that our implementation is built upon a solid and well-tested base, and that implementing WebRTC features is not as difficult as one might presume. However, WebRTC is a large collection of standards, and reaching feature-parity with libwebrtc is an ongoing task.

Lucky for us, Matthew made some excellent decisions while architecting the internals of webrtcbin, and we follow the PeerConnection specification quite closely, so almost all the missing features involve writing code that would plug into clearly-defined sockets.

We believe what we've been building here is the most flexible, versatile, and easy to use WebRTC implementation out there, and it can only get better as time goes by. Bringing the power of pipeline-based multimedia manipulation to WebRTC opens new doors for interesting, unique, and highly efficient applications.

To demonstrate this, in the near future we will be publishing articles that dive into how to use the PeerConnection-inspired API exposed by webrtcbin to build various kinds of applications—starting with a CPU-efficient multi-party bidirectional conferencing solution with a mesh topology that can work with any webrtc stack.

Until next time!