Sunday, August 28, 2011

Learning About Robustness

Something I think about from time to time is, "What do Old Hands think about code that is different from what rookies think?" Here are some examples.
  • When I was a rookie, I believed programs should be precise. If the program parsed some input, that input should conform. I believed that a program was robust if it printed a detailed error message and halted, so I could quickly fix the input.

    As an experienced developer, I realized that processing the input was far more important than ensuring that every comma was in the right place. I came to believe that a program was robust if it accepted a generous superset of the expected input syntax. I remember hearing this advice as a rookie and snorting with derision at the thought of anything so sloppy. This is another difference between Old Hands and rookies.

    When I became an Old Hand, I realized that any program you run frequently provides a service. Any time the program halts prematurely, it fails in its reason for being, which is to provide the service. I discovered that even a program that is not working correctly may be more useful than a program that won't run. I learned that a program is robust if it provides its service, all the time, under the widest range of conditions. A robust program should not fail, and if it does fail it should recover, and if it cannot recover, it should restart, and if it can't restart, another program should restart it. The code to provide all this robustness amounted to as much as 50% of the total, but I no longer viewed that as wasteful excess.
  • When I was a student, I never checked return codes. "How could a system function possibly fail?" I thought. I was naive enough to assume that the people who wrote system functions never made mistakes, and that hardware never failed.

    When I was a rookie, I discovered that even if a function failed only once in a million calls, I would see it fail. A million calls just doesn't take that long when a program is running flat out all day. Such a program would terminate unexpectedly, usually saying no more than "segmentation fault".

    As an experienced developer, I learned that functions fail all the time, but I hadn't known it, because I wasn't checking the return codes! Usually functions failed because the argument values were illegal. I remembered spending hours looking for why my program wasn't working, when the functions were telling me exactly what was wrong. I meticulously checked every error return, and reported failure up to higher levels of the code. I also began writing functions that did more checking and logging, because when these functions failed, they told me what I was doing wrong.

    Once when I was old enough to know better, I wrote a library of meticulous functions that checked every return code, to check out some functionality in Windows. It took days. A colleague who was an Old Hand bodged together an informal but usable tool in a couple of hours because he didn't check return codes. I learned something that day.
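
The two habits described above, checking every return code and restarting a service that fails, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the programs in these stories: the names `checked`, `supervise`, and `ServiceError` are hypothetical, and "failure" is assumed to mean a nonzero return code.

```python
import logging
import time

log = logging.getLogger("supervisor")


class ServiceError(Exception):
    """Raised when a checked call reports failure (hypothetical name)."""


def checked(return_code, what):
    """Turn a nonzero return code into an error reported to the caller.

    Assumes the convention that 0 means success.
    """
    if return_code != 0:
        raise ServiceError(f"{what} failed with return code {return_code}")
    return return_code


def supervise(service, max_restarts=3, delay=0.0):
    """Run service(); if it fails, log the failure and restart it.

    Returns the number of restarts that were needed. Once max_restarts
    is exhausted the last error is re-raised, so that an outer
    supervisor (another program, say) can take over.
    """
    restarts = 0
    while True:
        try:
            service()
            return restarts
        except Exception as exc:
            log.warning("service failed: %s", exc)
            if restarts >= max_restarts:
                raise
            restarts += 1
            time.sleep(delay)
```

A service that wraps its calls in `checked(...)` says exactly which call went wrong, and `supervise` keeps the service running through transient failures. The colleague's quick tool in the last story is the reminder that this scaffolding has a cost, and that a throwaway experiment may not need it.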
