5 Whys

View Original

A Critical Chain of Bus Factors

I've spoken before about the meaning of a bus factor. It's basically the idea that if there is only one person in your team that can do a critical part of the work, and that person gets hit by a bus, the team stops functioning.

(I've been asked to call this the "Lottery" factor... which is a nice twist on it).

It's not just developers who are susceptible to this. In any team, IT teams in particular, this seems to happen very commonly. In larger organizations, you might even have an infrastructure group and an "IT" group, and they both have different areas of responsibility, and in each group, there's only one person who can do some critical task. 

 

A chain of bus factors happens when you have bus factors depending on bus factors:

Your one developer who knows how to configure the pipeline can't test the changes because the agent is down. The one guy in IT who has access to the agent needs to reboot it, but does not have access. The one person who has access to reboot it (in the Infra team) is sick, so now there are three people waiting, and there is nothing in this earth that can help that situation. 

Here is another example of a critical bus factor chain:

  • Developer can't check in the one feature only they can code so the application is released,
  • because the source control is down, and the source control person can't fix it,
  • because the private cloud node is down, and the one person who can fix it, can't, 
  • because the certificate is expired, and the one person who can fix it, can't,
  • because the person who acquires certificates is away etc...

 

Steps to avoid this hell on earth may include:

  • Giving more access to other groups for the same resource
  • Sharing knowledge and responsibility with other people
  • pairing while working on critical things so knowledge sharing is guaranteed
  • Create a company policy of (2 bus factors or more) required.

 

Look around you at work right now, and realize how many bus factors you see, and are a part of. Each one slows down the system more and more. Each one causes the system to be more and more fragile. 

When you are a bus factor you might think:

"I need to work only on this one thing, otherwise I'll have too much to do, and so will everyone else on my team".

But realize that you are also creating more work for yourself this way:

Whenever someone needs work done on this topic, you will be the one person they turn to. You will be the pain point for every project that needs this type of worl from now on.

Sharing the burden also means alleviating the load, and letting more people finish their work without needing you.

Yes, there is a balance to be had, but remember that balance means not only "I won't give admin rights to the whole world" but it also means "I won't be the only one with the ability to do something".

How much time, money, pain and sufferingin the workplace is caused by people just waiting for people who are waiting for people who are waiting for people?

The answer will shock you! (OK, bad ending, but hey, great punch lines are not my job...)