4.2.08

Doomsday, the board game

At various points in my career, I've done this exercise - sometimes alone, sometimes with peers or with my staff.

I've come to refer to it as the Doomsday Game.

You can play along at home... It works like this:

You get together at a whiteboard, and you list out all the things that are important to you. You can do this at any level of granularity, so it might be about a feature or a module, a product release or a project, a contract you're setting up, or an entire branch of your business. Just write down the things you care about. It's a brainstorming exercise, so there are no wrong answers.

When you've got a list, stack-rank it. Line it up from the most important all the way down to the most trivial.

For each of those things you care about, brainstorm about what could impact, damage, impugn, destroy, or otherwise muck up that thing.

That is to say, think of Doomsday Scenarios.

This is really just a chance to let your paranoid nature come to the forefront, and to get creative about what might make a mess of your world.

Put differently -- what keeps you up at night? If the phone rings (or your pager goes off, or the little red light on the BlackBerry lights up) at midnight, what would it be about...

Articulate that fear or anxiety, and put it on the list.

When you've exhausted your paranoid creativity, go to the next item on the list. Lather, rinse, repeat.

If done well, after a couple of hours you've got a solid list of risks that represents the previously unarticulated fears of you and your organization.

The last step in this exercise is to "test" your organization's resilience by seeing if there are already abatement measures, mitigation strategies, or controls in place that in some way address the risks you listed. If there aren't already measures in place, get them in place.

This might become more clear with a few examples...

Let's say you care about this list, in stack-rank order:
  1. Schedule Adherence
  2. Code Quality
  3. Data Privacy
For any project, there are likely to be many more things on the list, but for the sake of example, I'll develop these three.

Schedule Adherence could be impacted by:
  • Feature creep and poorly understood requirements - we'll build the wrong thing and have a lot of re-work
  • Connectivity loss - the half of the team in India loses the link to the repository and can't do check-ins
  • Vacation - The Diwali festival comes right in the middle of the first integration sprint
  • Attrition - The team in India loses people every other week
  • Slow computers - compile and link takes forever - we lose days on test builds alone
Code quality could be impacted by:
  • Bad code - those engineers in India don't follow our coding standards
  • Poorly understood internal requirements - the new engineers on the team don't know what they don't know about the product's modules and sub-systems
  • No trusted code -- nobody writes unit tests
Data privacy could be impacted by:
  • Inadvertent disclosure - test data might have social security numbers in it, and it might be seen by the wrong person
  • Theft - some bad guys might get into production data and just steal it
  • Trap doors - some bad guys might code in a trap door that steals data later
Now that you have a list of what could happen (hopefully your list will be longer and more creative), you can test your organization. Hypothetically, here's how you might stack up:
  • Schedule Risk due to Feature Creep: We have no way to insulate against this. It's the "nature of the beast".
    • Okay, so you just figured out something -- you think that this risk is going to impact your schedule and you have no way to control it. You have to do something here.
    • The only thing we can do is to validate our planned features with our customers, and document and socialize exactly what we're building.
    • Well, it sounds like you've started to develop a mitigation strategy. Assign it to a few people and have them come back in a couple of days with a full mitigation plan for this one.
  • Schedule slip due to connectivity loss: If the link between here and India goes down for more than an hour, we're hosed. We have to post the builds every night, and they have to do check-ins. Without a 4 Mbps link, we're dead in the water.
    • Sounds like you've identified a pretty significant risk. What could you do about it?
    • We could implement a build farm in the lab in Bangalore, and let the team there do check-ins against a local repository. It would be tough, but it would mitigate the connectivity risk.
    • Or we could pay for redundant links, so if one provider drops a circuit we'd have a fall back.
    • Starting to see how this works? In this case, you have some work to do to get a plan in place in advance of a network outage. Whoever draws the straw to develop this one should consider the cost of the solution compared to the likelihood of the event... Maybe it's enough to have a redundant link at a very low bandwidth, but one which can be "turned up" on demand...
  • Schedule slip due to vacation:
    • We're covered here. Everyone knows their vacation plans for the next two quarters. We keep that information in a data file that Microsoft Project uses to track resource allocation. The only way we could get hosed here is if someone took an unexpected leave. That "death in the family" risk is the same across all our projects, so we're not really going to do anything about it yet.
    • It sounds like you're covered here, and while this is a real risk, you don't judge it to be significant enough to warrant action.
  • Schedule slip due to attrition
    • This one's tough. We keep losing our good people in India. We lost 2 last month, and some people are making noise about leaving this month. The only thing we can do is get better at on-boarding people. We've developed a training program that will bring someone up to speed on the basics in 2 weeks.
    • Also, if we lose too many people on the team in India, we can bring in short-term "hired guns" here in the 'States for the final few months of the project.
    • What about trying to make people want to stay?
    • Yeah, we don't do anything about that now. Maybe we should have all-hands, and buy the team in India dinner every night, like we do for ourselves...
    • Very quickly you've identified three actions -- a training program for easy on-boarding, a contingency plan for on-shore contractors, and a "hearts and minds" campaign to make people want to stay on your team. I think you're getting the hang of the game. Put this stuff into action and this risk gets less risky.
  • Schedule slip due to slow computers
    • It just takes too long to compile on our old P-3s, and the team in India has even crappier hardware.
    • Let's write a business case and take it to the CFO, to justify getting some big iron for all the developers. We should be able to make this problem go away by throwing some capital at it.
    • You're definitely getting the hang of it.
In the interest of brevity, I won't develop the hypothetical arguments and mitigations any further. But if you have an active imagination, and you're willing to spend some time letting yourself get paranoid, you can easily spot holes in your plans, see risks before they hit you, and put measures in place that will help insulate you.
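
If you want to take the game's output off the whiteboard and into something you can revisit, here's one way to structure it. This is just a minimal sketch in Python; the class names and fields are my own invention, and the entries are a couple of the examples from above, so treat it as an illustration rather than part of the game:

  from dataclasses import dataclass, field

  @dataclass
  class Risk:
      """One doomsday scenario attached to something you care about."""
      scenario: str
      mitigations: list = field(default_factory=list)  # measures already in place (or planned)

      def is_covered(self):
          # The "test" step: is there at least one measure against this fear?
          return len(self.mitigations) > 0

  @dataclass
  class Concern:
      """Something you care about, in stack-rank order."""
      name: str
      rank: int
      risks: list = field(default_factory=list)

  # A tiny register, seeded with a couple of the examples above.
  register = [
      Concern("Schedule Adherence", rank=1, risks=[
          Risk("Feature creep and poorly understood requirements"),
          Risk("Connectivity loss between here and India",
               mitigations=["Build farm in Bangalore", "Redundant network link"]),
      ]),
      Concern("Code Quality", rank=2, risks=[
          Risk("Nobody writes unit tests"),
      ]),
  ]

  # Walk the list in stack-rank order and flag anything you fear but haven't covered.
  for concern in sorted(register, key=lambda c: c.rank):
      for risk in concern.risks:
          if not risk.is_covered():
              print(f"UNCOVERED ({concern.name}): {risk.scenario}")

Run it and the uncovered fears fall out the bottom -- those are the ones that need an owner and a mitigation plan before the next time the phone rings at midnight.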

It's a fun game, and can be a great team-building and perspective-expanding exercise. And it ends up being an interesting way to approach work, and life in general. If you make this part of your approach to business, you'll find yourself doing it several times a year, maybe even more frequently, as your business or project evolves and your perception of risk changes...

If you got really creative, you could run a book and handicap the risks. I would never advocate gambling, but it would be an interesting and fun way to fund the release party...
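
If you actually wanted to handicap them (minus the betting), the simplest book is the classic likelihood-times-impact score. Here's a little sketch of that arithmetic -- the risks and the numbers are made up for illustration, and the scoring scheme is a common one, not something baked into the game:

  # A toy "handicapping" of the doomsday list: likelihood x impact.
  # The entries and numbers below are invented for illustration.
  risks = [
      # (risk, likelihood 0.0-1.0, impact in schedule-days if it happens)
      ("Feature creep",       0.7, 30),
      ("Connectivity loss",   0.3, 10),
      ("Attrition in India",  0.5, 20),
      ("Slow build machines", 0.9,  5),
  ]

  # Expected cost = likelihood * impact; the "favorites" float to the top.
  book = sorted(risks, key=lambda r: r[1] * r[2], reverse=True)

  for name, likelihood, impact in book:
      print(f"{name:20s}  odds {likelihood:.0%}  expected cost {likelihood * impact:4.1f} days")

The same arithmetic is also a reasonable way to answer the "cost of the solution compared to the likelihood of the event" question from the connectivity example above.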
