Changes Are Coming to the Microsoft Security Ecosystem
Morgan Holm
Sep 06, 2024
CrowdStrike Outage
On July 19, 2024, CrowdStrike distributed a defective update to its Falcon Sensor security software that caused significant problems for Microsoft Windows computers running the software. Approximately 8.5 million systems crashed and were unable to reboot. This resulted in what is said to be the largest outage in the history of information technology.
The outage affected a wide range of businesses and governments all over the world, including airports, airlines, banks, hotels, emergency services, hospitals, manufacturers, stock markets, retailers, and many more. Blue screens of death (BSODs) appeared everywhere, including in public spaces such as airport terminals. Initially, there were rumors that this was the result of a cyberattack, but the CrowdStrike CEO confirmed it was due to a faulty CrowdStrike update. The worldwide financial damage is estimated to be at least US$10 billion.
Within a few hours the bug was identified and a fix was released. However, many of the impacted computers had to be fixed manually. Outages continued for some time at numerous large organizations and services because many of the machines could not be fixed remotely. Lawsuits and government investigations are pending around the world due to the severity and scope of the incident.
Why was the impact so significant?
The Falcon Sensor from CrowdStrike is an endpoint sensor that operates at the system kernel level on each computer. The sensor monitors the system, stops threats, and prevents malware from turning off security software. The downside of running in kernel mode is that a failure in the sensor can cause Windows itself to crash. In this case, Windows did crash with a blue screen of death and then either went into a boot loop or booted into recovery mode, rendering the affected computers unusable.
So many computers were impacted because CrowdStrike distributed the update to all of its customers simultaneously. There was no phased rollout, no special handling even for critical infrastructure, and no way for IT staff at these organizations to intervene, schedule, or test the update. It appears that CrowdStrike's own testing was also lacking. The update should have been tested in a sandbox with both valid and invalid data, along with regression testing against older data formats, before it was pushed out.
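To make the idea of a phased rollout concrete, here is a minimal sketch in Python. It is purely illustrative and does not reflect how CrowdStrike or any other vendor actually ships updates: the ring sizes, the three-field content format, and the function names (validate_update, host_bucket, staged_rollout) are all assumptions made for the example. The sketch rejects malformed content before anything ships, then widens the deployment ring by ring and halts as soon as an updated host reports unhealthy.

```python
import hashlib

# Hypothetical rollout rings: cumulative fraction of the fleet updated at each stage.
ROLLOUT_RINGS = [0.01, 0.10, 0.50, 1.00]


def validate_update(content, expected_fields=3):
    """Reject malformed content before shipping (assumed format: a 'fields' list)."""
    fields = content.get("fields")
    return isinstance(fields, list) and len(fields) == expected_fields


def host_bucket(host_id):
    """Map a host ID to a stable value in [0, 1) so ring membership is repeatable."""
    digest = hashlib.sha256(host_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def hosts_in_ring(hosts, ring_fraction):
    """Return the hosts whose bucket falls inside the given cumulative ring."""
    return [h for h in hosts if host_bucket(h) < ring_fraction]


def staged_rollout(hosts, content, is_healthy):
    """Deploy ring by ring; return the list of hosts that received the update."""
    if not validate_update(content):
        return []  # invalid content never leaves the build pipeline
    updated = []
    for fraction in ROLLOUT_RINGS:
        for host in hosts_in_ring(hosts, fraction):
            if host not in updated:
                updated.append(host)  # stand-in for the real deployment step
        if not all(is_healthy(h) for h in updated):
            break  # halt the rollout before it reaches the next ring
    return updated


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(1000)]
    good = {"fields": [1, 2, 3]}
    reached = staged_rollout(fleet, good, is_healthy=lambda h: False)
    print(f"Rollout halted after {len(reached)} of {len(fleet)} hosts")
```

With a health check that always fails, the rollout stops after the first ring that contains any hosts, so only a small slice of the fleet is exposed to the bad update instead of every machine at once.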
Windows Endpoint Security Ecosystem Summit
Microsoft wants to address the issues that led to the outage and will host a Windows Endpoint Security Ecosystem Summit with key partners at its headquarters in Redmond, Washington, on Sept. 10, 2024. CrowdStrike and other partners that provide endpoint security solutions have been invited to the summit, along with government representatives. Topics will include safe deployment practices and resiliency to prevent similar outages in the future.
This will likely entail discussions about taking kernel mode access away from these vendors and moving their software to user mode. In user mode, a crash would more likely render only the application unusable rather than the entire operating system. This alone will not solve all the potential problems. Whatever comes of the summit will also have to apply to Microsoft's own Defender for Endpoint product, or the company would likely be in violation of antitrust agreements with the EU. Microsoft Corporate VP Aidan Marcuss said the company would share updates on the conversations after the event. Stay tuned, because changes are coming to the Microsoft security ecosystem.