Rob commented on my Degrees of ‘Works’ thus:
The next scary bit is how easy it is to think that you have something at a level 5, when really it didn't reach two....
How do you go about improving that?
As well as being a programmer, I am our company's network security guy. One of the central tenets of network security is defense in depth. Defense in depth is born of the inherent paranoia that comes from trying to secure a network, combined with a strong fear of the consequences of failure. It goes something like this:
Keeping the server patched may not close every hole. A firewall may be circumvented. Vulnerability scanning is pretty useless on its own. And above all else, I could screw up.
Therefore, security is layered so that if one layer is bypassed, or I screw it up, the next will stop the attack. So I keep my servers patched, have firewalls on my trust boundaries, and scan occasionally in case I've missed something.[1] How much of each I do depends on how effective it is and how much it's going to cost. And in turn, how much you spend on any security measure depends on how much you project you'd lose if you didn't have it.
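That last trade-off can be sketched as a back-of-the-envelope calculation. This is only an illustration: the function names and all the figures are made up for the example, and the estimate is the usual "loss per incident times incidents per year" approximation rather than anything I actually run.

```python
def expected_annual_loss(loss_per_incident, incidents_per_year):
    """Projected yearly loss: cost of one incident times how often it happens."""
    return loss_per_incident * incidents_per_year


def net_benefit(loss_per_incident, rate_without, rate_with, annual_cost):
    """Loss avoided per year by a measure, minus what the measure costs per year.

    A positive result suggests the measure pays for itself; a negative one
    suggests it costs more than it saves. All inputs are estimates.
    """
    avoided = (expected_annual_loss(loss_per_incident, rate_without)
               - expected_annual_loss(loss_per_incident, rate_with))
    return avoided - annual_cost


# Hypothetical figures: a 50,000 incident expected once every two years
# without the measure, once every eight years with it, for 5,000 a year.
benefit = net_benefit(50_000, rate_without=0.5, rate_with=0.125, annual_cost=5_000)
print(benefit)
```

The hard part, of course, is not the arithmetic but coming up with inputs you believe.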
So anyway, to bring this back to the point, the answer to ‘How do we ensure we're up the good end of the “works” spectrum?’ lies in Quality in Depth. You need automated unit tests. And functional tests. And a regular automated build. And tests against that automated build. And knowledgeable testers whose job it is to find new ways to break the application. And usability testing.
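To see why the layers add up, here is a rough sketch of the arithmetic. It assumes, unrealistically, that each layer misses bugs independently of the others, and the miss rates are invented for the example; real layers overlap, so treat the result as an optimistic bound rather than a measurement.

```python
from math import prod


def escape_probability(miss_rates):
    """Chance a bug slips past every layer, assuming the layers are independent.

    miss_rates holds, for each layer, the estimated fraction of bugs
    that layer fails to catch (0.0 = catches everything, 1.0 = catches nothing).
    """
    return prod(miss_rates)


# Hypothetical miss rates: unit tests, functional tests, human testers.
layers = [0.5, 0.5, 0.25]
print(escape_probability(layers))  # 0.0625
```

No single layer here is better than a coin toss or so, yet together they let through about one bug in sixteen. That is the whole argument for doing all of them rather than perfecting one.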
Above all, you need a culture where quality is considered important. (This, once again, parallels security.) If the company doesn't value quality, you're never going to achieve it because everyone will be looking for ways to work around whatever measures you put in place, and nobody will ever be called to account for doing so. People have to want to produce quality software, and to do so there needs to be a culture where people are enthusiastic about testing, and where breaking the build or having a bug filed against you is a spur to fix the problem immediately, and make sure it doesn't happen again.
Unfortunately, software engineering is often an enterprise where quality is not the highest priority. All of the above measures cost money, and it's hard to demonstrate in a concrete fashion whether any of them actually saves the enterprise more than it costs in the long run. How do you measure the impact of a large bug, against the advantage of being quickest to market, for example?
[1] Obviously I've skipped a lot of measures here, such as employee education and trust, for the sake of brevity.
Note: I'm not incredibly happy with how this essay turned out, but not quite unhappy enough to can it entirely. I'm going to have to rewrite it some day, though, when my mind is more focused.