Sunday, May 3, 2015

A Transcoding Proxy for HTTP Video Streams

Sometime last year, we worked on a client project to create a prototype for a server that is, in essence, a "transcoding proxy". It accepts N HTTP client streams and makes them available to an arbitrary number of clients via HTTP GET (and/or over RTP/UDP) in the form of WebM streams. The terms of our work with the client allowed us to make this work available as Free and Open Source Software, and this blog post is the announcement of its public availability.

Go and try it out!

git clone

The purpose of this release is to demonstrate some of the streaming/live transcoding capabilities of GStreamer and the capabilities of LibSoup as an HTTP server library. Some details about the server follow, but there's more documentation and examples on how to use the server in the git repository.

In addition to using GStreamer, the server uses the GNOME HTTP library LibSoup to implement the HTTP server which accepts and makes available live HTTP streams. The server has been stress-tested with up to 100 simultaneous clients, with a measured end-to-end stream latency of 150ms to 250ms depending on the number of clients. This can likely be improved by using codecs with lower latency and so on; after all, the project is just a prototype. :)

The N client streams sent to the proxy via HTTP PUT/PUSH are transcoded to VP8/Vorbis WebM if needed, but are simply remuxed and passed through if they are in the same format. Optionally, the proxy can also broadcast each client stream to a list of pre-specified hosts via RTP/UDP.
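As a rough sketch of what one such client stream could look like, a live VP8/WebM test stream can be pushed to an HTTP endpoint using gst-launch-1.0 with the souphttpclientsink element (from gst-plugins-bad). Note that the host, port, and path below are placeholder assumptions, not the proxy's actual endpoint layout; see the repository documentation for the real API.

```shell
# Push a live VP8/WebM test stream to an HTTP endpoint via PUT.
# The URL is a placeholder; consult the repository docs for real paths.
gst-launch-1.0 videotestsrc is-live=true \
    ! vp8enc deadline=1 \
    ! webmmux streamable=true \
    ! souphttpclientsink location=http://localhost:8080/my-stream-id
```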

Clients that want to stream video from the server can connect or disconnect at any time, and will get the current stream whenever they (re)connect. The server also accepts HTTP streams via both chunked transfer encoding and fixed-length HTTP PUT requests.
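On the receiving side, any HTTP client can pull the live stream; for example, with curl (the URL here is a placeholder, matching no particular endpoint of the server):

```shell
# Fetch the live WebM stream via HTTP GET and save it to a file.
# Reconnecting later simply picks the stream up from the current point.
curl -s http://localhost:8080/my-stream-id -o capture.webm
```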

There is also a JSON-based REST API for interacting with the server, with built-in validation via a Token Server. A specified host (or address mask) can be whitelisted, allowing it to add or remove session-id tokens, along with details about the types of streams that each session id is allowed to send to the server and the types of streams that the proxy will make available. For more information, see the REST API documentation.

We hope you find this example instructive in how to use LibSoup to implement an HTTP server and in using GStreamer for streaming and encoding purposes. Looking forward to hearing from you about it!

Saturday, May 24, 2014

Making things better

When I read Matthew’s post a week ago about creating desktop features that cater to developers, I found myself agreeing quite strongly with the sentiment put forth, and I started to wonder how we could better integrate development features into GNOME. We’ve so far focused strongly on general ease of use and the use-cases of non-technical users, but as we’ve seen time and again, FOSS projects tend to first become popular on the shoulders of technical users. One must be in a position to attract both kinds of users if one wants broad acceptance and use.

On the other hand, I found myself disagreeing very strongly with the sentiment in Philip Van Hoof’s post yesterday. I found it strange that Philip chose to state that greater focus on development integration goes hand-in-hand with a lesser focus on the Outreach Program for Women. Surely one’s immediate reaction would be to utilize the manpower (sic) OPW provides to make development integration happen, right? I’m left wondering what sets of biases, prejudices, or misconceptions one must have to conclude otherwise.

In fact, being in a position to call multiple former OPW participants friends, and hence being intimately familiar with their work, I've begun to realize that dropping the “programmers only” requirement that GSoC has actually leads to a much more holistic approach to patching the deficiencies that GNOME has.

Without OPW, would we have had a Usability Researcher for GNOME 3? Or a professional Typeface Designer improving the shapes of our UI font, Cantarell, and expanding its character set? And surely as programmers and users we understand the importance of documentation? For those who want to see some code, there's plenty of that to see as well.

Over the years, GNOME as an organisation has accreted talent and expertise in a wide spread of technical domains. We have the ability to create the most “usably-featured” OS out there — but only with all our arms working together. Cutting one off in the hope that another will become stronger will only result in a gaping, bleeding wound.

Saturday, November 9, 2013

A New Chapter

Yesterday, my 20-month-long stint at Collabora ended. The company culture, work environment, and perks were brilliant, and working with friendly and extremely competent colleagues was a pleasure.

Starting today, and for the next couple of months, I'll be spending most of my time on the various projects that I've been working on, and on tackling the enormous backlog of itches to scratch that I have accumulated over the past two years. In addition, I'll be looking for (and am available for) part-time consultancy gigs to fill the gaps in-between.

I'm excited about the possibilities that have opened up for me due to this, and I'm really looking forward to spending more time on GNOME!

Friday, May 3, 2013

A FOSS Devanagari to Bharati Braille Converter

Almost a year ago, I worked with Pooja on transliterating a Hindi poem to Bharati Braille for a type installation at Amar Jyoti School, an institute for the visually impaired in Delhi. You can read more about that on her blog post about it. While working on that, we were surprised to discover that there were no free (or open source) tools to do the conversion! All we could find were expensive proprietary software or horribly wrong websites. We had to sit down and manually transliterate each character while keeping in mind the idiosyncrasies of the conversion.

Now, like all programmers who love what they do, I have an urge to reduce the amount of drudgery and repetitive work in my life with automation ;). In addition, we both felt that a free tool to do such a transliteration would be useful for those who work in this field. And so, we decided to work on a website to convert from Devanagari (Hindi & Marathi) to Bharati Braille.

After tons of research and design/coding work, we are proud to announce the first release of our Devanagari to Bharati Braille converter! You can read more about the converter here, and download the source code on Github.

If you know anyone who might find this useful, please tell them about it!

Thursday, December 6, 2012

Recording VoIP calls using pulseaudio and avconv

For ages, I've wanted an option in Skype or Empathy to record my video and voice calls¹. Text is logged constantly because it doesn't cost much in terms of resources, but voice and video are harder.

In the absence of integrated support inside Empathy, and also because I mostly use Skype (for various reasons), the workaround I have is to do an X11 screen grab and encode it to a file. This is not hard at all. A cursory glance at the man page of avconv will tell you how to do it:

avconv -s:v [screen-size] -f x11grab -i "$DISPLAY" output_file.mkv

[screen-size] is in the form of 1366x768 (Width x Height), etc., and you can extend this to record audio by passing the -f pulse -i default flags to avconv². But that's not quite right, is it? Those flags will only record your own voice! You want to record both your own voice and the voices of the people you're talking to. As far as I know, avconv cannot record from multiple audio sources, and hence we must use Pulseaudio to combine all the voices into a single audio source!

As a side note, I really love Pulseaudio for the very flexible way in which you can manipulate audio streams. I'm baffled by the prevailing sense of dislike that people have towards it! The level of script-level control you get with Pulseaudio is unparalleled compared to any other general-purpose audio server³. One would expect geeks to like such a tool, especially since all the old bugs with it are now fixed.

So, the aim is to take my voice coming in through the microphone, and the voices of everyone else coming out of my speakers, and mix them into one audio stream which can be passed to avconv, and encoded into the video file. In technical terms, the voice coming in from the microphone is exposed as an audio source, and the audio for the speakers is going to an audio sink. Pulseaudio allows applications to listen to the audio going into a sink through a monitor source. So in effect, every sink also has a source attached to it. This will be very useful in just a minute.

The work now boils down to combining two sources together into one single source for avconv. Now, apparently, there's a Pulseaudio module to combine sinks but there isn't any in-built module to combine sources. So we route both the sources to a module-null-sink, and then monitor it! That's it.

pactl load-module module-null-sink sink_name=combined
pactl load-module module-loopback sink=combined source=[voip-source-id]
pactl load-module module-loopback sink=combined source=[mic-source-id]
avconv -s:v [screen-size] -f x11grab -i "$DISPLAY" -f pulse -i combined.monitor output_file.mkv
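The [voip-source-id] and [mic-source-id] placeholders above are Pulseaudio source names (or numeric indices), which you can look up before loading the loopback modules. A sketch follows; the example device names are assumptions and will differ on your hardware:

```shell
# List all Pulseaudio sources: microphones, plus one monitor source per
# sink (the monitor carries the audio being played to that sink).
pactl list short sources

# Typical names (examples only; yours will vary):
#   alsa_input.pci-0000_00_1b.0.analog-stereo           <- microphone
#   alsa_output.pci-0000_00_1b.0.analog-stereo.monitor  <- speaker monitor
```

The monitor source of the sink your VoIP application plays to is what goes into [voip-source-id].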

Here's a script that does this and more (it also does auto setup and cleanup). Run it, and it should Just Work™.
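If you'd rather clean up by hand, note that pactl load-module prints the index of the newly loaded module, which can later be passed to pactl unload-module. A minimal sketch:

```shell
# Load the null sink, remembering its module index for later cleanup.
NULL_IDX=$(pactl load-module module-null-sink sink_name=combined)

# ... route the sources and record with avconv as shown above ...

# Unload the module when done; this removes the 'combined' sink.
pactl unload-module "$NULL_IDX"
```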


1. It goes without saying that doing so is a breach of the general expectation of privacy, and must be done with the consent of all parties involved. In some countries, not getting consent may even be illegal.
2. If you don't use Pulseaudio, see the man page of avconv for other options, and stop reading now. The cool stuff requires Pulseaudio. :)
3. I don't count JACK as a general-purpose audio system. It's specialized for a unique pro-audio use case.