About a month ago I decided the time had come to find out why, when I attempted to blank a CD in my CD rewriter, cdrecord (the program I was using to do this) hung – and then could not be killed off, because the operating system thought it had outstanding I/O in progress.
This meant getting hold of a copy of the Linux source code, building a system with some debug statements in it and finding out what was going on.
It was a hard three weeks, but I eventually proved that there was a hardware problem with my drive. I must say, it was one of the most satisfying activities I have undertaken recently.
It reminded me of my time at college, when I would spend my evenings in the bar at the local halls of residence with a dump from the latest run of my program (this was the late ’60s, so interactive debugging was not possible) and use the information in it to figure out what was wrong.
Although I have been fascinated by computers since I was 11, and knew that I would have a career in them, it was not until much later that I realised that it was this aspect of computing which was my pleasure. In my early career, building bespoke systems for customers was the norm, and I was on several projects where a team of us would do just that. Coding for me was a chore to get through in order to get the basics of a system in place. Module testing the code to ensure it worked as it should was much better, but when we got to final integration, then I really shone. I was always the one who had an almost instinctive feeling for where a bug lay (even if it was not my code) and was regularly asked by my colleagues for advice. But my most enjoyable moments were when tackling really difficult problems.
One such project came when microprocessors were first available. Our customer had some strange paper tape readers (made by Plessey) which would communicate their eight-bit characters in a twelve-bit code. If an error was detected by the receiver, a signal sent back via the modem carrier would cause the paper tape reader to stop, back up 30 characters and restart (ignoring the 59 characters whilst the reader returned to the character in error).
Our job was to write a communications concentrator to receive multiple simultaneous transmissions from these readers and convert them to a protocol to talk to a mainframe (an ICL 2900). We used a minicomputer as the main controller, with special-purpose microprocessor cards to handle the paper tape reader protocol. These processors had a countdown timer that was set to go off in the middle of each bit (so the state of the bit could be read) and another interrupt on each state change on the data line (so we could synchronise from the bit edge).
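The sampling scheme just described can be sketched in a few lines of Python. This is only an illustration and not the original microprocessor code: the function names, the 1000 microsecond bit period and the 2% slow sender are all assumptions made up for the example. A timer samples the line in the middle of each bit, and every edge on the line re-arms that timer half a bit period later, so small clock differences never accumulate.

```python
def edge_times(bits, sender_period):
    """Times (in microseconds) at which the line changes state.

    Hypothetical helper: the sender emits one bit every `sender_period`.
    """
    return [k * sender_period
            for k in range(1, len(bits))
            if bits[k] != bits[k - 1]]


def recover_bits(edges, bit_period, start_level, end_time):
    """Recover bits by mid-bit sampling, resynchronising on every edge."""
    half = bit_period // 2
    level = start_level
    next_sample = half          # assume the first bit starts at time 0
    received = []
    i = 0
    while next_sample < end_time:
        if i < len(edges) and edges[i] <= next_sample:
            # "Edge interrupt": the line changed state, so a new bit has
            # just started. Re-arm the timer for the middle of that bit.
            level = 1 - level
            next_sample = edges[i] + half
            i += 1
            continue
        # "Timer interrupt": we are in the middle of a bit, read the line.
        received.append(level)
        next_sample += bit_period
    return received


if __name__ == "__main__":
    sent = [1, 0, 0, 1, 1, 1, 0, 1]
    # Assumed figures: receiver expects 1000 us bits, sender runs 2% slow.
    edges = edge_times(sent, sender_period=1020)
    got = recover_bits(edges, bit_period=1000, start_level=sent[0],
                       end_time=len(sent) * 1020)
    print(got == sent)
```

In this model the two events are handled strictly one after the other. On real hardware they are separate interrupts, so they only stay that tidy for as long as one service routine cannot be interrupted by the other.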
Everything was working, except that every 20 minutes the paper tape reader would go into a loop, continuously recycling back and forth through its 60 character error correction. It took me two weeks of digging to find a one in a million problem (one bit in a million worked out at one every 20 minutes). In the end I discovered that the middle-of-the-bit interrupt and the edge interrupt were happening within 20 microseconds of each other (presumably because of noise on the line), and there was a small window where interrupts were enabled whilst in an interrupt service routine, causing havoc.
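As a rough check on the "one bit in a million every 20 minutes" figure, the arithmetic can be written out as below. The function names are made up for illustration, and the result is only what those two numbers imply if taken as exact and as applying to a single line; the actual line speed is not something I can state with certainty.

```python
def implied_bit_rate(bits_per_error, interval_s):
    """Bits per second implied by one error per `bits_per_error` bits
    occurring once every `interval_s` seconds."""
    return bits_per_error / interval_s


def bit_period_us(bits_per_error, interval_s):
    """Duration of one bit in microseconds under the same assumption."""
    return interval_s * 1_000_000 / bits_per_error


if __name__ == "__main__":
    interval_s = 20 * 60  # 20 minutes
    print(f"{implied_bit_rate(1_000_000, interval_s):.0f} bits/s")  # 833 bits/s
    print(f"{bit_period_us(1_000_000, interval_s):.0f} us per bit")  # 1200 us per bit
```

On those assumptions a bit lasts a little over a millisecond, so two interrupts landing within 20 microseconds of each other would indeed be a rare coincidence.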
I got a real buzz when a small correction to prevent this meant that everything just worked. And despite the fact that each day had meant a two-hour commute on the train to the client's site, and long hours, I had also really enjoyed that period of intense detective work.
So I appear to have different motivations than most people in the open source world. Debugging is my pleasure.