Bogus paper pseudocode: Speex: A Free Codec For Free Speech (2006)

ferris blogs stuff Bogus paper pseudocode: Speex: A Free Codec For Free Speech (2006)

30 Jun 2021

Similar to my last post in this series, here’s another quickie about a strange filter implementation I found in Jean-Marc Valin’s “A Free Codec For Free Speech”. Of course, speex has been obsolete for some time now, but I think CELP (and LPC in general) is neat, and I’ve been studying up on some proper DSP theory lately, in particular filters and their corresponding z-transforms. So, when I found a simple filter description along with pseudocode in the speex paper, I thought I’d try deriving one from the other for practice.

The filter in question is described in section IV.C (“Common Layer 8 Problems”). It’s a bog-standard DC blocker with the following transfer function:

\[N(z)=\frac{1-z^{-1}}{1-\alpha z^{-1}}\]

Not terribly interesting, but the pseudocode listing (modified slightly for clarity) is:

#define ALPHA .98f
static float mem = 0f;
for(i = 0; i < frame_size; i++){
  mem = ALPHA * mem + (1 - ALPHA) * input[i];
  output[i] = output[i] - mem;
}

Right away, there’s something fishy here: output[i] is defined in terms of output[i], which makes no sense! But surely that’s the only issue, right?

Wrong.

If you convert the code to its corresponding difference equations (which only requires some explicit time shifts in this case) and work out/simplify the corresponding filter z-transform (like I tried to do), it won’t match the equation from the paper. TL;DR, the Output(z) terms end up cancelling, and you end up with everything being equal to zero. Fun times!

I even gave the paper the benefit of the doubt and tried with output[i] = input[i] - mem and got something super strange:

\[N(z)=-\frac{1-\alpha}{(1-z^{-1})(1-\alpha z^{-1})}\]

Yeah, this is clearly a very different filter.

So, I decided to go another route. From the original z-transform (which we know is good, because this kind of filter is listed all over the place), we can convert it to its corresponding time-domain difference equation:

\[output[n]=\alpha output[n-1]+input[n]-input[n-1]\]

Equivalently:

\[y[n]=\alpha y[n-1]+x[n]-x[n-1]\]

From this, I drew out the corresponding DF-I signal flow graph:

From that, I derived (by the commutative property of LTI systems) the corresponding DF-II graph:

Note that this part can be done entirely symbolically (and it’s arguably easier to do so), but I wanted to do it pictorially this time.

From that, I derived (by transposition) the corresponding TDF-II graph, which I suspected would be the closest to the original paper code, both because it only had one memory element (which suggested it was a type II form), and because it’s common to use TDF-II for numerical robustness:

Note that this step is something I actually don’t know how to do symbolically, but even if I did it would still probably be easier to do it pictorially. I also derived the TDF-I form and its corresponding difference equations, but this was just for additional practice.

Following the graph, the corresponding difference equations are:

\[mem[n]=\alpha mem[n-1]+(\alpha-1)input[n]\] \[output[n]=input[n]+mem[n-1]\]

Let’s see what the code looks like according to these equations instead:

#define ALPHA .98f
static float mem = 0f;
for(i = 0; i < frame_size; i++){
  // order has to be swapped to avoid contaminating mem before we output!
  output[i] = input[i] + mem;
  mem = ALPHA * mem + (ALPHA - 1) * input[i];
}

Aha, so some differences are already clear here. First, output[i] doesn’t look bogus anymore, and 1 - ALPHA has become ALPHA - 1 (among other things).

If you just want the punchline, I’ll save you some trouble: yes, this is the correct code and what should have been in the paper. Of course, I can’t make claims like that without proof, so let’s derive this new code’s corresponding z-transform so that we can compare it with what we expect for good measure.

From the difference equations, we have the following z-transforms:

\[Mem(z)=\alpha z^{-1}Mem(z)+(\alpha-1)Input(z)\] \[Output(z)=Input(z)+z^{-1}Mem(z)\]

If we move some terms around for Mem:

\[Mem(z)=\frac{\alpha-1}{1-\alpha z^{-1}}Input(z)\]

We can substitute that into our Output z-transform:

\[Output(z)=Input(z)+z^{-1}\frac{\alpha-1}{1-\alpha z^{-1}}Input(z)\]

This looks a bit janky, but if we simplify:

\[Output(z)=\frac{1-z^{-1}}{1-\alpha z^{-1}}Input(z)\]

Or, equivalently:

\[N(z)=\frac{1-z^{-1}}{1-\alpha z^{-1}}\]

Bingo.

Look, I get that this is a trivial filter that’s already listed/derived everywhere and I found it in a paper for an obsolete codec, and the filter isn’t even really related to the codec itself, and the only reason I was even looking was because I was curious about the outdated codec anyway and wanted to practice some new math I’m learning. Still, the paper was wrong, and here’s the fix.

30 Jun 2021

ferris blogs stuff

Bogus paper pseudocode: Speex: A Free Codec For Free Speech (2006)

Add a comment

Recent posts