07 Jun 2021
In our 64k intros, in addition to synthesis, we (among other groups) typically use samples for small one-shot sounds that are notoriously hard to synthesize, yet crucial for the overall impression of a track (for example, kick/snare drums, vocal chops, etc.).
Of course, we don’t just store raw PCM data, because not only is it quite large, it doesn’t compress well. Instead, since at least the mid-2000s, the most popular “good enough” approach has been to use a GSM 06.10 codec to compress the sample(s), and then use a decoder shipped with Windows via the ACM API to decode the sample(s) during initialization.
The resulting compression ratio and audio quality that we get by using this codec on this kind of data aren’t particularly great; in fact, they’re really bad by today’s standards. On top of that, even decoding a single sample takes ~10 ACM API calls, plus a bunch of data juggling. Gargaj and I actually looked into this a bit, and the decoding code apparently comes to ~265 bytes, which is admittedly decent, but not exactly free. It’s also not tweakable - you get the rate you get, and that’s it. Still, it pays for itself, it’s worked for a long time, and the fact that Windows ships an encoder for it as well means it doesn’t require using/integrating any additional 3rd party code into our tooling, which, when it comes to codecs, can be pretty important! So it’s no surprise that the technique has been left alone for so long, but we can surely do better.
30 May 2021
When implementing an algorithm from a paper/blog post/whatever, it’s not uncommon to transcribe the included (pseudo)code while trying to understand the ideas presented. Unfortunately, this code is often presented in a poor manner, and in some cases, is just plain wrong. This is particularly annoying when the paper is behind a paywall. I’ve encountered this enough times that I think I should start writing quick posts explaining my findings and how to fix them. This is the first of those posts.
The decoder pseudocode in T. Fischer’s “A pyramid vector quantizer” is awful. It’s presented in a near-unreadable fashion that’s difficult to transcribe into basically any programming language, and what’s worse, step 3 is missing a crucial detail: it doesn’t adjust xb to account for the decoded sign for dimension i, which causes the decoder to de-sync for the remaining dimension(s). The simplest pathological case I found was b = 4, L = 2, K = 2.
Actually, a really funny thing is that if you transcribe the code directly, it will work, so long as your environment allows integer overflows and you don’t mind waiting a while. But of course, this is not usable or helpful.
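To make the de-sync issue a bit more concrete, here is a minimal sketch of a pyramid vector index decoder (and a matching encoder) in Python. This is not Fischer’s pseudocode and does not necessarily use the paper’s codeword ordering; the names `count`, `decode`, and `encode` and the enumeration order (0 first, then +j, then -j for each magnitude j) are assumptions made for illustration. The part that matters is that the running index is reduced whenever a block of codewords is skipped, including the positive-sign block when the decoded value turns out to be negative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count(L, K):
    """Number of integer vectors of dimension L whose absolute values sum to K."""
    if K == 0:
        return 1
    if L == 0:
        return 0
    return count(L - 1, K) + count(L, K - 1) + count(L - 1, K - 1)


def decode(b, L, K):
    """Map index b to a vector, using this sketch's own (assumed) ordering."""
    if not 0 <= b < count(L, K):
        raise ValueError("index out of range")
    x = []
    for i in range(L):
        rest = L - i - 1              # dimensions remaining after this one
        n = count(rest, K)
        if b < n:                     # this dimension is 0
            x.append(0)
            continue
        b -= n
        j = 1
        while True:
            n = count(rest, K - j)
            if b < n:                 # value is +j
                x.append(j)
                break
            b -= n                    # skip the +j block so b stays in sync
            if b < n:                 # value is -j
                x.append(-j)
                break
            b -= n
            j += 1
        K -= j                        # pulses left for the remaining dimensions
    return x


def encode(x):
    """Inverse of decode: map a vector back to its index."""
    L, K, b = len(x), sum(abs(v) for v in x), 0
    for i, v in enumerate(x):
        rest = L - i - 1
        if v != 0:
            b += count(rest, K)                   # skip the "0" block
            for j in range(1, abs(v)):
                b += 2 * count(rest, K - j)       # skip smaller magnitudes
            if v < 0:
                b += count(rest, K - abs(v))      # skip the "+|v|" block
            K -= abs(v)
    return b
```

A quick sanity check for a sketch like this is to decode every index for a small `L` and `K`, confirm that each result has the right L1 norm and is unique, and confirm that `encode(decode(b, L, K)) == b` for all of them.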
31 Aug 2020
In an earlier post, I talked about building an rABS entropy decoder on 6502, which became the heart of my new 4k intro packer experiment for C64. While exploring several different entropy coder variants for simple LZ codecs, and in particular tANS or an approximation of it with adaptive probabilities (spoiler alert: this was a dead-end), I realized that rABS was not only a possible alternative, but actually an OK fit for the system, providing a high precision entropy coder with… well, good enough decode speed and a not-as-tricky-as-I-had-originally-feared implementation :) .
However, I intentionally left out several details of the rest of the codec. The earlier post only describes how the entropy decoder is implemented and some of the modeling, the combination of which is kind of the novel thing here. But essentially, it only gives us a way to pull modeled bits out of an encoded bitstream. The bits we actually encode/decode and how we model those is what a codec is really all about; the entropy coding part, while certainly important, is really just a piece of machinery that allows us to build the rest of the codec effectively and take as much advantage of the modeling choices we make as possible.
So, in this post I’d like to go into detail about the rest of the codec, including the LZ scheme chosen, how we model various symbols, how some of these techniques were chosen/tested, and several parts of the encoder. I’ll save the rABS-specific encoding details/intuitions for a future post where I can talk about that specifically in more depth.
11 Feb 2019
Of the various compression/packer experiments that I’ve toyed with, one of my favorites has to be a rABS-based packer on C64 for 4k intros. It’s an interesting challenge to try and beat state-of-the-art packers for the platform for this use case, and I’m quite pleased with the construction I came up with, as well as the results. The way the decoder is built and how probabilities are modeled are particularly interesting, and that’s what I’d like to focus on in this post.
Quick note: while rANS is a more likely fit for a resource-constrained system like C64, I went with rABS instead, the difference being that we’re going to work with binary symbols. The reason for this is that we typically observe better compression with adaptive modeling, and updating symbol predictions is much, much simpler with a binary alphabet (look ma, no divides!). This of course makes per-symbol decoding much more expensive as we have to break symbols down into individual bit components and perform decoding/model updates on those (so, much more frequently), but for this particular use case, this is a tradeoff worth making for the higher compression ratio achieved.
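As a rough illustration of why a binary alphabet makes adaptive modeling so cheap, here is a minimal sketch of a shift-based probability update in Python. It is not the model used in the packer; the names `PROB_BITS`, `RATE`, `update`, and `adapt`, as well as the chosen precision and adaptation rate, are assumptions for the sake of the example. The point is only that the update needs a subtract and a shift, with no divides.

```python
PROB_BITS = 8   # assumed precision: probabilities are integers in [0, 2**PROB_BITS)
RATE = 4        # assumed adaptation rate: larger values adapt more slowly


def update(p1, bit):
    """Nudge p1 (the scaled probability of a 1 bit) toward the observed bit."""
    if bit:
        p1 += ((1 << PROB_BITS) - p1) >> RATE
    else:
        p1 -= p1 >> RATE
    return p1


def adapt(bits, p1=1 << (PROB_BITS - 1)):
    """Run the model over a bit sequence and return the final estimate."""
    for bit in bits:
        p1 = update(p1, bit)
    return p1
```

With these particular constants, the estimate can never reach 0 or `2**PROB_BITS`, so neither bit value is ever assigned zero probability. With a multi-symbol alphabet, the equivalent update would have to adjust and renormalize a whole frequency table, which is where the extra cost comes from.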
10 Feb 2019
Hello world!! Time to start another blog. But yeah uhm why?
I’ve been working on a few (particularly compression-related) projects that are difficult to cover nicely in stream format, as I’d like to 1. have the information in writing, which should be easier to digest; and 2. not spend a lot of time preparing for such a stream/series that would be organized enough to be useful. I’d of course still like to do streams on these projects, and plan to do so, but they’d probably be more like high-level overviews with Q&A and would use the relevant blog post(s) as a rough outline and place to point to for learning more.