Abstraction, Compression, and Errors

January 12th, 2007

Todd Veldhuizen’s paper “Software Libraries and the Limits of Reuse: Entropy, Kolmogorov Complexity, and Zipf’s Law” keeps triggering insights. One is that high-level programming abstractions are essentially a compression mechanism: instead of writing a few dozen lines of C each time we want to invoke a function via a pointer embedded in a struct, for example, we derive one class from another and let the language do the dispatch. Similarly, if we want to construct one list by iterating over the elements of another, transforming each independently, we use a list comprehension instead of a loop, a conditional, and an assignment.
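
To make the list comprehension case concrete, here is a minimal Python sketch. The function names and the "square the even numbers" task are made up for illustration; the point is only that the two forms say the same thing, one in a fraction of the space.

```python
def squares_of_evens_loop(values):
    # The "uncompressed" form: a loop, a conditional, and an assignment.
    result = []
    for v in values:
        if v % 2 == 0:
            result.append(v * v)
    return result


def squares_of_evens_comprehension(values):
    # The "compressed" form: the same transformation as one expression.
    return [v * v for v in values if v % 2 == 0]
```

Both functions return the same list for any input; the second simply packs the loop, the test, and the append into a single line.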

But here’s the thing. Suppose you represent an image with three bytes per pixel (red, green, and blue). The worst thing a one-bit error can do is mess up one color channel at one point in the image (e.g., turn low green to high green). If you compress the image, though, then a one-bit error will almost certainly have a much greater effect: it can change the number of pixels of a particular color (if you’re using run-length encoding), or affect every other pixel “after” a certain point (if you’re using a more sophisticated adaptive encoding scheme).
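
A rough way to see this in Python, assuming a toy run-length format I made up for the purpose (each run stored as four bytes: count, red, green, blue) rather than any real image codec:

```python
def flip_bit(data, bit_index):
    # Return a copy of data with exactly one bit inverted.
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def raw_encode(pixels):
    # Three bytes per pixel: red, green, blue.
    return bytes(channel for pixel in pixels for channel in pixel)


def raw_decode(data):
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]


def rle_encode(pixels):
    # Toy run-length encoding (hypothetical format): count, r, g, b per run.
    out = bytearray()
    i = 0
    while i < len(pixels):
        run = 1
        while i + run < len(pixels) and pixels[i + run] == pixels[i] and run < 255:
            run += 1
        out += bytes([run, *pixels[i]])
        i += run
    return bytes(out)


def rle_decode(data):
    pixels = []
    for i in range(0, len(data), 4):
        count, r, g, b = data[i:i + 4]
        pixels.extend([(r, g, b)] * count)
    return pixels


def damage(original, corrupted):
    # Pixels that differ position by position, plus any change in length.
    differing = sum(1 for a, b in zip(original, corrupted) if a != b)
    return differing + abs(len(original) - len(corrupted))


# Eight green-ish pixels followed by eight blue ones.
pixels = [(10, 200, 30)] * 8 + [(0, 0, 255)] * 8

# Flip one bit in the green channel of the first raw pixel.
raw_damage = damage(pixels, raw_decode(flip_bit(raw_encode(pixels), 15)))

# Flip one bit in the first run's count byte of the compressed form.
rle_damage = damage(pixels, rle_decode(flip_bit(rle_encode(pixels), 4)))
```

With this particular image and these particular bit positions, `raw_damage` is 1 and `rle_damage` is 24: the flipped count turns a run of 8 into a run of 24, which shifts everything after it. Other bit positions give different numbers, but the asymmetry is the point.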

I think the same is true of programs. If you mess up a function in a C program, you’ve messed up one function. Mess up a method in a class near the root of your inheritance hierarchy, and you’ve affected dozens of things; mess up a metaclass, or a generic that you’re using in a bunch of different places, and the effects spread even more widely. I’m therefore wondering (after a particularly nasty debugging session) whether there’s some fundamental tradeoff at work: eventually, the cost of each error in the high-level program outweighs the time saved by using the abstractions involved. Equivalently, the redundancy in lower-level programs might actually be a good thing, for the same reasons that redundancy is good in other evolved and engineered artefacts: it limits the damage that can result from something going wrong at any particular point.
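
Here is a small Python sketch of what I mean, with deliberately planted bugs. All of the class and function names are hypothetical; the same stray `+ 1` is placed once in a shared base-class method and once in one of three redundant stand-alone functions.

```python
class Scaler:
    factor = 1

    def scaled(self, x):
        # Deliberate bug: the stray "+ 1" is inherited by every subclass.
        return x * self.factor + 1


class Doubler(Scaler):
    factor = 2


class Tripler(Scaler):
    factor = 3


class Quadrupler(Scaler):
    factor = 4


# The redundant, "uncompressed" version: three independent functions.
def double(x):
    return x * 2 + 1  # the same deliberate bug, but confined to this function


def triple(x):
    return x * 3


def quadruple(x):
    return x * 4


# (callable, multiplier it is supposed to apply)
shared = [(Doubler().scaled, 2), (Tripler().scaled, 3), (Quadrupler().scaled, 4)]
redundant = [(double, 2), (triple, 3), (quadruple, 4)]


def count_wrong(cases, x=5):
    # How many of the callables give the wrong answer for input x?
    return sum(1 for func, multiplier in cases if func(x) != x * multiplier)


shared_failures = count_wrong(shared)
redundant_failures = count_wrong(redundant)
```

One typo in the shared method breaks all three subclasses (`shared_failures` is 3), while the same typo in the redundant version breaks only the function it was typed into (`redundant_failures` is 1). Of course, fixing the shared version is also a one-line change, which is the other half of the tradeoff.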

  1. January 12th, 2007 at 16:35 | #1

    Another way to frame the question is the trade-off between the redundancy of code and the lack of a centrally managed operation. Changes to an abstraction that break subclasses probably mean that the constraints were not well defined. I suspect unit tests are the best way to buffer against this, along with limiting the interpretation of what a particular class or method does.

  2. January 12th, 2007 at 20:47 | #2

    Most of the problems I’ve wrangled into submission recently could be described as abstraction errors.

    Stated differently, many (most?) problems we have are rooted in one or both of:

    1) code that is intended for reuse, but has undesirable side effects in some situations. Sometimes this is because the code is implemented incorrectly compared to the intended design, and sometimes this is because the designer didn’t foresee some new reuse scenario.

    2) code that is not intended for reuse, but gets reused anyway :). This is much more insidious, since the maintainer of the code often doesn’t know that it is being used elsewhere, and makes incompatible changes that percolate throughout the code. Ugh.

    Have I mentioned that maintenance programming is fun?
