Best Practice - Checklists

There is a great article called The Checklist by Atul Gawande in the December 2007 New Yorker magazine. The full text of the article is available for free online. If you count yourself serious about software engineering, you should probably take fifteen minutes and read it.

Here's the bit that gave me an epiphany - and resulted in my xeroxing the article and adding it to the required reading list for my managerial best practices training:

"Intensive Care medicine has become the art of managing extreme complexity - and a test of whether such complexity can, in fact, be humanly mastered."

The article goes on to provide fascinating detail about the complexity of modern medicine, as well as some old-school engineering analogs to this complexity. It then states a simple truth - checklists reduce rates of error.

Making software is a complex task with hundreds, maybe thousands of highly error-prone steps and tasks. As the article points out, humans have an annoying tendency to fail in their efforts to navigate such complexity.

The task of good engineering is to eliminate complexity and to break problems down into simpler, more manageable sub-problems.

If you work in software engineering you should do this as a matter of course.

If you work in software engineering in the context of a global team, with language and cultural barriers, you should do this as a matter of dire necessity.

Here are a couple of examples of complex procedures or sub-procedures and a partially developed sample checklist for each:

Sample Checklist #1: Stuff to do before you check in code

I've had to debug several sub-par offshore development teams through the years... In my experience, the situation is usually that US-based management is unhappy with the performance of remote engineers. They break the build too often, and don't do what "we" expect them to do. Often, those expectations have been conveyed only verbally, if at all. This list (or one like it, tailored to your shop) can help.
  1. Compile and link locally
  2. Ensure all new functions have Unit Test coverage
  3. Run Unit Tests locally
  4. Ensure Peer Code Review, using a Code Review Form (that's another checklist all to itself)
  5. Archive Code Review Forms
  6. Do Check-In
  7. Manually check to ensure you didn't miss files
  8. Watch automated build mail, to ensure you didn't break the build
  9. Fix it if you did
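
The gating part of the list above (steps 1 through 5, which should all be true before the check-in in step 6) could be sketched in Python roughly like this. It is a minimal illustration, not a real tool: the `Step` class, the `first_unfinished` function and the `PRE_CHECK_IN` list are hypothetical names, and each lambda is a placeholder for whatever your shop actually uses to verify that step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    description: str
    is_done: Callable[[], bool]  # returns True once the step has been completed


def first_unfinished(steps: List[Step]) -> Optional[str]:
    """Walk the checklist in order and return the first step that is not done.

    Returns None when every step is done.
    """
    for step in steps:
        if not step.is_done():
            return step.description
    return None


# Hypothetical wiring: each lambda stands in for a real check
# (a build script, a test runner, a look-up in the review archive).
PRE_CHECK_IN = [
    Step("Compile and link locally", lambda: True),
    Step("Ensure all new functions have Unit Test coverage", lambda: True),
    Step("Run Unit Tests locally", lambda: False),  # pretend this was skipped
    Step("Ensure Peer Code Review", lambda: True),
    Step("Archive Code Review Forms", lambda: True),
]

if __name__ == "__main__":
    blocker = first_unfinished(PRE_CHECK_IN)
    if blocker is None:
        print("OK to check in")
    else:
        print(f"Not yet: {blocker}")
```

The point of the sketch is only that the expectations are written down in one place and walked in the same order every time. A printed list on the cube wall does the same job.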

Sample Checklist #2: Stuff to include in a defect report

Incomplete defect reports have always infuriated me. I've worked with QA engineers who routinely wrote defect reports like "software crashed, fix it." Great. Very helpful. Thanks. To eliminate such useless clutter in the bug database I've always requested that my QA managers develop checklists of necessary information to include with a defect report. Here's an example for the back end server for a client-server product:

If the error is on the Server:
  1. Server configuration (standalone, mirrored, clustered, etc.)
  2. Server version and build number
  3. Server Operating System and patch level
  4. Steps to reproduce, if applicable.
  5. If the server issue involves a connection / session with a client, turn on session trace logging on both the server and the client. Include both client and server session logs.
  6. Userdump files.
  7. Zipped export of server logs (or of specific log entries)

If the error is on the Web management console:
  1. Server Operating System for core Server and Web Server (if hosted separately)
  2. Web management console configuration type (i.e., where is it installed?)
  3. Server version and build number
  4. IIS version and patch level
  5. IIS security setting details
  6. Internet Explorer (or other browser) version.
  7. Java VM version(s)
  8. Steps to reproduce, if applicable.
  9. Event logs from the server where IIS lives
  10. Userdump files, if applicable
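
If your bug database allows it, a list like the ones above can also be enforced at submission time. Here is a small Python sketch of that idea, assuming a defect report arrives as a plain dictionary. The field names and the `missing_fields` function are made up for illustration, and the required set covers only a subset of the server-side list; items marked "if applicable" are left out.

```python
from typing import Dict, List

# Hypothetical field names for a subset of the server-side list above.
# Items marked "if applicable" in the list are not in the required set.
REQUIRED_SERVER_FIELDS: List[str] = [
    "server_configuration",
    "server_version_and_build",
    "os_and_patch_level",
    "server_logs",
]


def missing_fields(report: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required fields that are absent or blank, in checklist order."""
    return [name for name in required if not report.get(name, "").strip()]


if __name__ == "__main__":
    report = {
        "summary": "software crashed, fix it",
        "server_configuration": "clustered",
        "server_version_and_build": "   ",  # present but blank
    }
    for name in missing_fields(report, REQUIRED_SERVER_FIELDS):
        print(f"Missing: {name}")
```

A report that comes back with an empty list has at least the basics. Whether those basics are useful is still up to the person who wrote it.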

These are just simple examples. These lists don't seem to represent highly complex tasks. But even these sample lists represent nine, seven and ten discrete things I expect an engineer to remember to do. Given the innate fallibility of busy humans, that's a lot to ask. People forget steps or, to be honest, they sometimes decide not to do a step they dislike. A checklist - something each engineer can print out and leave on his or her cube wall - can go a long way toward eliminating goofs and unmet expectations alike.

Best Practice:
  • Turn your complex, high-expectation tasks and processes into checklists.
  • Create a culture that understands that checklists are tools, and that it's acceptable and encouraged to refer to them every single day.
