Everyone makes mistakes. CrowdStrike made a mistake. In my opinion, they are unlikely to ever make a mistake of this magnitude again.
Several years ago, I misconfigured a cyber security tool and blocked – in about a minute – more than 700 users from network access. I effectively shut down a call center with a few keystrokes. More than a few colleagues were unhappy with me. But we must learn from our missteps.
I learned from mine.
The company Vice President responsible for the entire cyber security division called me to a meeting the next day. It’s an understatement to say I was more than a little nervous about that meeting. The VP got on the call with me, along with one other person who was present to document the conversation. The VP’s first words were, “Everything is okay. The systems were all back up a few hours after the incident due to your corrective efforts. I just want your opinion on how we can stop this from happening again.”
I was shocked. I had expected him to tell me how much he didn’t appreciate my blunder and how much it had cost the company. I respected him for asking the key question, “How do you think we can stop this from happening again?”
I promptly told him that we needed more than one set of eyes on high-impact configuration changes while they are being made. Changes at that level need to be discussed beforehand with two or three engineers and/or administrators, much as a group would review a meeting agenda. The group needs to agree on each change and why it is necessary. That way, as the changes are entered, more than one person is checking that the inputs are correct.
The VP was happy with my immediate answer. He knew he was asking the right person about operationalizing the processes and procedures to ensure the tool was run correctly and with redundancy. This was a lesson I never forgot, and I teach it to operational teams today. Many times, there is pushback from an administrator claiming that no one else in the company understands the tool, so no one else is available or needed during configuration changes. When I hear that excuse, I take it to the responsible managers to highlight the importance of getting additional people trained on these critical cyber security tools.
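To make that two-person rule concrete, here is a minimal sketch of a pre-change gate. The ticket structure and names are hypothetical and not tied to any vendor's product; the point is that the tooling itself should refuse to apply a high-impact change that has not been signed off by at least two people other than the person making it.

```python
# Minimal sketch of a two-person gate for high-impact configuration changes.
# The ChangeTicket structure and approver names are illustrative only.
from dataclasses import dataclass, field


@dataclass
class ChangeTicket:
    description: str
    high_impact: bool
    approvers: set[str] = field(default_factory=set)


def apply_change(ticket: ChangeTicket, submitted_by: str) -> None:
    # The person making the change cannot count as one of the reviewers.
    reviewers = ticket.approvers - {submitted_by}
    if ticket.high_impact and len(reviewers) < 2:
        raise PermissionError(
            f"High-impact change '{ticket.description}' needs at least two "
            f"independent reviewers; found {len(reviewers)}."
        )
    print(f"Applying change: {ticket.description}")


# Example: this change would be rejected until two other engineers sign off.
ticket = ChangeTicket("Update network access policy for call center", high_impact=True)
ticket.approvers.add("engineer_a")  # only one reviewer so far
# apply_change(ticket, submitted_by="me")  # would raise PermissionError
```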
What have we learned from the CrowdStrike incident?
There is a lot of consolidation going on in the cyber security space today. This means that larger companies are buying smaller companies. More successful companies are gaining market share as smaller companies disappear or get rolled into larger ones. This also means fewer vendors for us to choose from, which can be seen as a good thing.
However, as we can see from this most recent global technology outage, having all your security tools from one vendor is not always a good situation.
When we had to delete the CrowdStrike agent from the impacted endpoints, what was protecting those devices? How does our layered defense hold up when this happens?
What tool can I use to check the performance of my layered defense?
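There is no single answer, but one low-tech starting point, sketched below, is to inventory which defensive layers are actually active on an endpoint so you can spot machines that are down to a single control. The layer names and checks here are placeholders for whatever EDR agent, host firewall, and encryption components an organization actually runs, not a real product's API.

```python
# Rough sketch: inventory the defensive layers present on an endpoint and warn
# when coverage drops to a single layer. Layer names and check commands are
# placeholders; substitute the controls your organization actually deploys.
import shutil
import subprocess

# Hypothetical mapping of defense layers to a simple presence check.
LAYER_CHECKS = {
    "edr_agent": lambda: shutil.which("edr-sensor") is not None,
    "host_firewall": lambda: subprocess.run(
        ["systemctl", "is-active", "--quiet", "firewalld"]).returncode == 0,
    "disk_encryption": lambda: shutil.which("cryptsetup") is not None,
}


def audit_layers() -> list[str]:
    """Return the list of layers that appear to be active on this endpoint."""
    active = []
    for name, check in LAYER_CHECKS.items():
        try:
            if check():
                active.append(name)
        except OSError:
            pass  # check command not available on this platform
    return active


if __name__ == "__main__":
    layers = audit_layers()
    print(f"Active defensive layers: {layers}")
    if len(layers) <= 1:
        print("WARNING: this endpoint is relying on a single layer of defense.")
```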
Another important issue we need to consider is the specific limitations of the tools we are using; knowing those limits lets us place each tool more effectively in our organization’s layered defense. Several bloggers commented on CrowdStrike’s Terms of Service agreement, which states that placing CrowdStrike’s agent on critical systems was not its intended use (paraphrasing here for brevity). This was surprising, as several people have told me they were considering using CrowdStrike for their manufacturing systems (which are critical) in their operational technology (OT) networks. Another talking head on LinkedIn advised not to use “non-OT cyber security tools on OT networks that they are not designed for.” I think this advice is key.
Was CrowdStrike using Productive Stupidity?
There is a great piece I read some time ago by Martin A. Schwartz, a microbiology researcher at the University of Virginia, called “The Importance of Stupidity in Scientific Research.” His article outlined how we should embrace feeling stupid and really learn from it. Those lessons are part and parcel of our success in defending our networks. Schwartz’s article is about learning to be productively stupid:
“Productive stupidity means being ignorant by choice. Focusing on important questions puts us in the awkward position of being ignorant. One of the beautiful things about science is that it allows us to bumble along, getting it wrong time after time, and feel perfectly fine as long as we learn something each time. No doubt, this can be difficult for students who are accustomed to getting the answers right. No doubt, reasonable levels of confidence and emotional resilience help, but I think scientific education might do more to ease what is a very big transition: from learning what other people once discovered to making your own discoveries. The more comfortable we become with being stupid, the deeper we will wade into the unknown and the more likely we are to make big discoveries.”
We are all students learning how to keep our operations stable at the lowest risk per unit cost. We can therefore learn from CrowdStrike’s mistake and move forward with more confidence.
So, how do we remain positive?
We must learn from this mistake and adjust our security architecture to deliver the most effective risk reduction per unit cost by layering our tools appropriately. We also need to raise the level of conversation around these adjustments: this incident opened a window for attackers, and it should now be closing rapidly.
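As a back-of-the-envelope illustration of “risk reduction per unit cost,” the sketch below compares a few hypothetical defense stacks. The miss probabilities and annual costs are invented numbers, and the layers are assumed to fail independently, which is itself an optimistic simplification; the point is only to show how to compare stacks on risk reduction per dollar rather than on coverage alone.

```python
# Back-of-the-envelope comparison of layered defense stacks.
# Miss probabilities and costs are invented; independence is assumed.
baseline_breach_prob = 1.0  # risk with no controls, for comparison

layers = {
    "edr_agent":       {"miss_prob": 0.10, "annual_cost": 50_000},
    "host_firewall":   {"miss_prob": 0.40, "annual_cost": 10_000},
    "email_filtering": {"miss_prob": 0.30, "annual_cost": 20_000},
}


def residual_risk(selected: list[str]) -> float:
    """Probability an attack slips past every selected layer (independence assumed)."""
    risk = baseline_breach_prob
    for name in selected:
        risk *= layers[name]["miss_prob"]
    return risk


for stack in (["edr_agent"],
              ["edr_agent", "host_firewall"],
              ["edr_agent", "host_firewall", "email_filtering"]):
    cost = sum(layers[n]["annual_cost"] for n in stack)
    reduction = baseline_breach_prob - residual_risk(stack)
    print(f"{stack}: residual risk {residual_risk(stack):.3f}, "
          f"risk reduction per $1k = {reduction / (cost / 1000):.4f}")
```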
I will continue to recommend CrowdStrike to my clients. And I will also ask them what other tools they have in place to protect the network if any one layer of their cyber security defense fails.