Where Bugs Come From

April 27, 2004 12:13 AM

Over the weekend, my father was rather embarrassedly relating a story from the late 80's. At the time, there was a school of management that believed zero-defect software was possible, so long as we made the effort, and paid attention to detail.

The story was from a workshop for programmers. To cut a long story short, the workshop was a classic tale of successive approximation: trying to find that ridiculous cut-off point where a program went from being short enough to be bug-free to long enough to be inevitably buggy. I vaguely remember the answer being 137 lines of code.

This, of course, is the promise of structured programming, of functions, of objects. If we can write 137 lines of code without a bug, then we can structure our programming style so that we're always writing units of fewer than 137 lines. We can build those units into components, and voila! No more bugs.

This is also the promise of unit testing. We divide our programs into small pieces, and test the buggery out of each piece individually. This is quite an effective approach (and a much better time/effort trade-off than a formal proof), but the failure conditions of unit testing are interesting:

  1. It is rarely possible (or at least practical) to unit-test exhaustively. The art of effective unit-testing lies in being able to predict where your code is likely to break, and testing those edge-cases. If you fail to predict where an edge-case might occur, you won't write a unit test for it, and it's a potential source of error.
  2. More dangerously, code is frequently rewritten in ways that are meant to leave its intended behaviour unchanged. A better algorithm is chosen, or the code is modified to fix a bug. Accepted wisdom is that the existing unit tests continuing to pass is sufficient proof that the code still works as intended. However, changing the code is likely to move the important edge-cases in ways that render existing tests pointless, and additional tests necessary.
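To make the second point concrete, here is a hypothetical Java sketch (the `Search` class and its methods are invented for illustration). A linear scan is replaced by a binary search over a sorted array. The tests written for the original still pass, but the rewrite has edge-cases of its own: with duplicate values it may return a different matching index, and the familiar `(low + high) / 2` midpoint can overflow `int` on very large arrays. Neither edge-case existed in the linear version, so nobody had a reason to test for them.

```java
/** Hypothetical example: the same search contract implemented two ways. */
final class Search {
    private Search() {}

    /** Original version: linear scan. Returns the index of key, or -1 if absent. */
    static int linear(int[] sorted, int key) {
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] == key) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Rewritten version: binary search over a sorted array.
     * The midpoint is computed as low + (high - low) / 2, because the
     * "obvious" (low + high) / 2 overflows int when low + high exceeds
     * Integer.MAX_VALUE: an edge-case the linear version never had.
     */
    static int binary(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (sorted[mid] < key) {
                low = mid + 1;
            } else if (sorted[mid] > key) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] distinct = {1, 3, 5, 7};
        // The tests written for the original: first, last, absent, empty.
        // Both versions agree on every one of them.
        for (int key : new int[] {1, 7, 4}) {
            System.out.println(key + ": " + linear(distinct, key) + " " + binary(distinct, key));
        }
        System.out.println("empty: " + linear(new int[0], 1) + " " + binary(new int[0], 1));

        // An edge-case the rewrite moved: with duplicate values, the linear
        // scan returns the first match, the binary search whichever match
        // it happens to land on. None of the old tests would notice.
        int[] withDuplicates = {1, 2, 2, 2, 3};
        System.out.println(linear(withDuplicates, 2) + " vs " + binary(withDuplicates, 2)); // prints "1 vs 2"
    }
}
```

The old suite is still green after the rewrite; it just no longer probes the places where the new code is most likely to be wrong.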

Ironically, at exactly the same time my father was flying around the Asia-Pacific giving these workshops, he recommended his son read Complexity by Mitchell Waldrop: a great little populist science book about the study of complex systems and emergent behaviours. The book described the fascinating and inherently unpredictable behaviour that arises when many small systems with clearly defined rules interact. I think you know where I'm going here.

Computer programs are complex. We manage this complexity by dividing programs into manageable pieces that can be isolated and scrutinised closely enough to find the errors of logic that will inevitably sneak in. We then create complexity by having these pieces interact with each other. The more points at which these pieces can interact, and the more variations in that interaction, the more chance for some chaotic, unwanted behaviour.

Encapsulation in objects; presenting façades for components; design by contract: ways in which we attempt to keep some control over the scope of these interactions. Part of programming is fighting the battle against unnecessary complexity, because there's already more than enough necessary complexity waiting to trip us up.
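One way to picture this is a hypothetical Java sketch (the `Inventory` and `SimpleInventory` names are invented): other components see only a small façade interface, and the implementation checks its preconditions on entry, in the spirit of design by contract. Java has no built-in contract support, so the checks here are ordinary runtime exceptions; the point is that a misbehaving caller fails at the boundary instead of leaving the component in a state nobody anticipated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** The façade: the only surface other components are allowed to talk to. */
interface Inventory {
    /** Precondition: sku is non-null and quantity > 0. */
    void add(String sku, int quantity);

    /** Precondition: sku is non-null, quantity > 0, and enough stock exists. */
    void remove(String sku, int quantity);

    int count(String sku);
}

/** Hidden implementation; the underlying map is never exposed to callers. */
final class SimpleInventory implements Inventory {
    private final Map<String, Integer> stock = new HashMap<>();

    @Override
    public void add(String sku, int quantity) {
        Objects.requireNonNull(sku, "sku");
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive: " + quantity);
        }
        stock.merge(sku, quantity, Integer::sum);
    }

    @Override
    public void remove(String sku, int quantity) {
        Objects.requireNonNull(sku, "sku");
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive: " + quantity);
        }
        int available = count(sku);
        if (quantity > available) {
            throw new IllegalStateException("only " + available + " of " + sku + " in stock");
        }
        // The check above guarantees the stored count never goes negative.
        stock.put(sku, available - quantity);
    }

    @Override
    public int count(String sku) {
        return stock.getOrDefault(Objects.requireNonNull(sku, "sku"), 0);
    }
}
```

Because callers can only reach the three façade methods, the number of ways they can interact with the stock data is small enough to reason about, and every one of them is guarded.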

Automated or scripted functional tests don't really help us here. Unlike our 137-line units, in which the edge-cases can be determined by inspection, the brittle edges of our complex systems are much harder to spot. Scripted functional tests serve a single purpose: they verify that one particular path through the maze works, and continues to work. Rarely does anyone write a new functional test without being sure beforehand whether it will succeed.

Generally, functional tests end up working as regression tests. They demonstrate a particular baseline of stability: execution paths that obviously must work, or that have caused enough trouble in the past to warrant a test, continue to work.

A certain number of "complexity bugs" can be found through programmer vigilance. Get to know your code. Get to know how the pieces work and how they talk to each other. The broader your view of the system being programmed, the more likely you are to catch those pieces of the puzzle that don't quite fit together, or to spot the place where a method on some object is being called for a purpose to which it might not be fully suited.

Bruce Schneier makes a good case for the absolute need to examine code for security flaws from this position of knowledge. When considering security-related bugs, we have to ensure the system is proof against someone who knows how it works and is deliberately trying to break it. That is, again, an order of magnitude harder to protect against than a user who might only occasionally stumble across the "wrong way to do things".

For the rest, you need QA. You need QA who know what they are trying to accomplish with the software and how that is being accomplished, but who want to find a way to break it, and who don't fall into the trap of writing tests that unconsciously follow the path of least resistance through the code.

The inevitability of bugs also means we have to pay attention to the failure modes of the software we write. If an ACID database encounters a bug, it's vitally important that data integrity be maintained above all else. If a NullPointerException occurs in a consumer application, the user might even be happier not knowing anything went wrong. (I've lost count of the number of times I've been presented with a stack-trace in IDEA and just clicked "ignore" because there's absolutely nothing I could possibly do about the error. In that case, am I any better off having seen the dialog in the first place?)
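The two failure policies can be sketched side by side. This is a hypothetical Java example, not any real database or IDE API, and the `Ledger` and `BackgroundTask` names are invented: `Ledger.transfer` treats any failure as a reason to leave the data exactly as it was (all-or-nothing, in the spirit of the atomicity in ACID), while `BackgroundTask.runQuietly` records the error and carries on rather than showing the user a stack trace. Copying the whole map to roll back is only workable for a toy; it stands in for the real transaction machinery a database would use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Integrity first: a transfer either fully happens or leaves no trace. */
final class Ledger {
    private final Map<String, Long> balances = new HashMap<>();

    void open(String account, long amount) {
        balances.put(account, amount);
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    void transfer(String from, String to, long amount) {
        // Snapshot so that a failure part-way through can be undone.
        Map<String, Long> snapshot = new HashMap<>(balances);
        try {
            debit(from, amount);
            credit(to, amount);
        } catch (RuntimeException e) {
            balances.clear();
            balances.putAll(snapshot); // roll back, then still tell the caller
            throw e;
        }
    }

    private void debit(String account, long amount) {
        long current = balance(account);
        if (current < amount) {
            throw new IllegalStateException("insufficient funds in " + account);
        }
        balances.put(account, current - amount);
    }

    private void credit(String account, long amount) {
        if (!balances.containsKey(account)) {
            throw new IllegalArgumentException("no such account: " + account);
        }
        balances.put(account, balance(account) + amount);
    }
}

/** Consumer-style failure handling: keep going, keep a record for the developers. */
final class BackgroundTask {
    static final List<String> errorLog = new ArrayList<>();

    static boolean runQuietly(Runnable task) {
        try {
            task.run();
            return true;
        } catch (RuntimeException e) {
            // No dialog and no stack trace for the user; just note it for a bug report.
            errorLog.add(e.toString());
            return false;
        }
    }
}
```

A transfer to a non-existent account debits the sender and then fails on the credit; the rollback means the sender's balance is untouched afterwards. The same exception thrown inside `runQuietly` would simply be logged.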

Support is another vital way to manage failure. If a bug is discovered in production, how is the problem communicated back to the developers, and how quickly can a fix be deployed or a patch released?

Alternatively, of course, you could just make sure you never write any program longer than 137 lines.
