Thursday 6 December 2012

Recording VoIP calls using pulseaudio and avconv

For ages, I've wanted an option in Skype or Empathy to record my video and voice calls [1]. Text is logged constantly because it costs little in terms of resources, but voice and video are harder.

In lieu of integrated support inside Empathy, and also because I mostly use Skype (for various reasons), the workaround I have is to do an X11 screen grab and encode it to a file. This is not hard at all. A cursory glance at the man page of avconv will tell you how to do it:

avconv -s:v [screen-size] -f x11grab -i "$DISPLAY" output_file.mkv

[screen-size] takes the form 1366x768 (width x height), and you can extend this to record audio by passing the -f pulse -i default flags to avconv [2]. But that's not quite right, is it? Those flags will only record your own voice! You want to record both your own voice and the voices of the people you're talking to. As far as I know, avconv cannot record from multiple audio sources, and hence we must use Pulseaudio to combine all the voices into a single audio source!
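The screen size does not have to be typed by hand; it can be read from the X server. A minimal sketch, assuming the xdpyinfo utility is installed (the function name screen_size_from_xdpyinfo is made up for this example):

```shell
# Hypothetical helper: reads `xdpyinfo` output on stdin and prints the
# screen size as WIDTHxHEIGHT, taken from the first "dimensions:" line.
screen_size_from_xdpyinfo() {
    awk '/dimensions:/ { print $2; exit }'
}
```

It would be used as `xdpyinfo | screen_size_from_xdpyinfo`, and the result passed to avconv in place of [screen-size]. On a multi-screen setup only the first screen is reported.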

As a side note, I really love Pulseaudio for the very flexible way in which you can manipulate audio streams. I'm baffled by the prevailing sense of dislike that people have towards it! The level of script-level control you get with Pulseaudio is unparalleled compared to any other general-purpose audio server [3]. One would expect geeks to like such a tool, especially since all the old bugs with it are now fixed.

So, the aim is to take my voice coming in through the microphone, and the voices of everyone else coming out of my speakers, and mix them into one audio stream which can be passed to avconv, and encoded into the video file. In technical terms, the voice coming in from the microphone is exposed as an audio source, and the audio for the speakers is going to an audio sink. Pulseaudio allows applications to listen to the audio going into a sink through a monitor source. So in effect, every sink also has a source attached to it. This will be very useful in just a minute.
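To see the sources on a running system, `pactl list short sources` prints one line per source, monitor sources included. The sketch below picks out the two names needed here from the output of `pactl info`. It assumes the VoIP application plays to the default sink, that the default source is the microphone, and that pactl prints English labels (hence LC_ALL=C); the function names are made up for this example.

```shell
# Hypothetical helpers: each reads the output of `LC_ALL=C pactl info` on stdin.

# Print the monitor source of the default sink (the sink name plus ".monitor").
default_monitor_source() {
    sed -n 's/^Default Sink: \(.*\)$/\1.monitor/p'
}

# Print the default source, which is normally the microphone.
default_mic_source() {
    sed -n 's/^Default Source: //p'
}
```

For example, `LC_ALL=C pactl info | default_monitor_source` prints the source that carries whatever is coming out of the speakers.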

The work now boils down to combining two sources into one single source for avconv. Apparently, Pulseaudio has a module to combine sinks, but no built-in module to combine sources. So we route both sources to a module-null-sink, and then monitor it! That's it.

pactl load-module module-null-sink sink_name=combined
pactl load-module module-loopback sink=combined source=[voip-source-id]
pactl load-module module-loopback sink=combined source=[mic-source-id]
avconv -s:v [screen-size] -f x11grab -i "$DISPLAY" -f pulse -i combined.monitor output_file.mkv
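The modules loaded above stay loaded after the recording ends, so they should be unloaded again. `pactl load-module` prints the index of the module it loaded, and `pactl unload-module` accepts that index. The following is only a sketch of how setup and cleanup could be wrapped around the recording; record_call and its argument order are hypothetical, and this is not the script mentioned in this post.

```shell
# Hypothetical wrapper: record_call VOIP_SOURCE MIC_SOURCE SCREEN_SIZE OUTPUT_FILE
record_call() {
    voip_source=$1
    mic_source=$2
    screen_size=$3
    output_file=$4

    # pactl load-module prints the index of the newly loaded module.
    sink_id=$(pactl load-module module-null-sink sink_name=combined) || return 1
    voip_id=$(pactl load-module module-loopback sink=combined source="$voip_source")
    mic_id=$(pactl load-module module-loopback sink=combined source="$mic_source")

    # Catch Ctrl-C in the shell (avconv still receives it) so that the
    # cleanup below runs after the recording is interrupted.
    trap : INT
    avconv -s:v "$screen_size" -f x11grab -i "$DISPLAY" \
        -f pulse -i combined.monitor "$output_file"
    status=$?
    trap - INT

    # Unload in reverse order; indexes of modules that failed to load are empty
    # and are skipped by the unquoted expansion.
    for id in $mic_id $voip_id $sink_id; do
        pactl unload-module "$id"
    done
    return "$status"
}
```

A call might look like `record_call [voip-source-id] [mic-source-id] 1366x768 output_file.mkv`, with the two source names filled in for your system.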

Here's a script that does this and more (it also does auto setup and cleanup). Run it, and it should Just Work™.


1. It goes without saying that doing so is a breach of the general expectation of privacy, and must be done with the consent of all parties involved. In some countries, not getting consent may even be illegal.
2. If you don't use Pulseaudio, see the man page of avconv for other options, and stop reading now. The cool stuff requires Pulseaudio. :)
3. I don't count JACK as a general-purpose audio system. It's specialized for a unique pro-audio use case.


slashdotaccount said...

Wouldn't it be better to record a multistream audio/video file? or two video/audio files? That way you don't lose any data.

Anonymous said...

To combine sources graphically you can use pavucontrol, and recordmydesktop for audio/video screen grabbing. Though I do not know if it works with the mic...


Will Thompson said...

This is cool! I could see it being really useful for recording podcasts.

Nirbheek said...

@slashdotaccount: True, that could be done, but that's harder to play back, isn't it? A single video file is probably the most hassle-free thing.

@Marco: I'm sure there are other ways, but this seemed like the most easily scriptable way. Now I can run a single command to start recording. :)

@Will: Aha, I didn't think about podcasts. Good point!

Jonas Lihnell said...

If you're enjoying fiddling around with PA in your free time, I'd be happy if you could supply instructions on how to downmix two stereo streams to mono and then combine them into a stereo stream such that you and those you talk with end up in different channels.

We tried this at work with various versions of PA but never reached a satisfactory solution due to PA incurring a remarkably high delay when piping streams around.

Arun Raghavan said...

Jonas, I'm a bit curious - what are you trying to achieve with this setup? As you observe, doing this with PulseAudio needs a bit of prodding with null sinks, remap and loopback. Perhaps I could come up with a simpler solution if I understood what you were trying to do.

Unknown said...

avconv can mix audio using the amix filter:

I hope it helps =)

maarten256 said...

This is a very useful page and I wish I'd stumbled upon this earlier.

As it stands I recreated something much akin to this - using some of the inputs here about simultaneous recording of microphone (input) and speakers (output).

As an addition to the discussion, I'm using recordmydesktop to create the video portion - this has the added benefit that it allows you to specify a specific window.

I created a little script that allows the user to select the window; that input is given to recordmydesktop. At the same time it also kicks off arecord to record the audio.

After both recordmydesktop and arecord are terminated, I use avconv to put together the audio and video files...

I'm testing that right now to see if it works (good indications so far).

maarten256 said...

This is a very helpful page! Unfortunately I did not find this until I'd figured my own way through this myself.

I did end up using some of the suggestions on here for my current solution.

To add to the discussion - I'm using recordmydesktop to create the video recording. The advantage being that it allows me to specify a window to record.

I'm using arecord to create a simultaneous audio recording. With the tweaks on this page it allows me to record both the input (microphone) as well as output (speakers).

I created a little script that allows the user to select a window - it kicks off the recording applications...

Once arecord and recordmydesktop are terminated, I use avconv to put together the independent video and audio files.

I'm in the process of testing this...