The recent worldwide IT outage underscores the complexity of today's global digital infrastructure. Companies need to understand their dependencies and downstream impacts to prepare for future events, which could come from malicious threat actors.
Insight Focus | This Is Not A Drill
Hospital appointments canceled, flights delayed, city services put on hold: last Friday's IT outage highlighted how heavily critical services depend on digital systems, and how vulnerable that dependence makes them. This time, the disruption came from a single bad update to a third-party piece of security software. A routine update released without proper testing crashed over 8.5 million Windows computers, costing customers up to $15 billion by one estimate.
Foreign adversaries likely took close note of the nature and scale of the impact. Many have spent years and significant national resources developing the capability to achieve similar effects when ordered to do so. The discovery of Volt Typhoon activity in critical energy, transportation, and telecommunications infrastructure revealed the PRC's intent to use cyber capabilities to disrupt critical infrastructure in the event of a conflict with the West. The recent discovery of a backdoor in XZ Utils, an open-source data compression library included in many Linux distributions and used in critical systems, showed that bad actors are actively pursuing software supply chain attacks. The next outage could come from an ill-intentioned threat actor seeking the same level of chaos, or worse.
Companies need to understand where they fit in the complex relationships that make up the current global IT and infosec supply chain. Most produce software that affects other systems, rely on other companies' software to run their own operations, or both. Companies also need to evaluate their roles and their dependencies both upstream and downstream to prevent an accidental or intentional compromise from triggering a systemic risk. Architecture designs, update protocols, response and recovery plans, and resilience mitigations take on renewed importance after this incident.
Producing Trusted Software
For companies that produce software, the recent events underscore the importance of trust and transparency. Business leaders should evaluate the risks their company could pose to customers and partners, including the potential for cascading impacts. If your company becomes an attack vector, you need to know in advance who will be affected and how to address those impacts effectively. To build trust, software companies need to reconsider their emphasis on speed to market and prioritize safety, stability, and accuracy instead. This involves a thorough review of Quality Assurance (QA) and Continuous Integration/Continuous Deployment (CI/CD) practices. CISA's recent Secure by Design pledge is one way companies are publicly committing to high security standards in their development practices.
Preparing for Disruptions
At the same time, companies must understand the access their purchased products have to their networks and the risks associated with that access. To withstand potential disruptions, whether from product failures or malicious attacks, companies must have robust Business Continuity (BC) and Disaster Recovery (DR) plans. These plans should outline clear procedures for maintaining operations when systems go offline.
As acquisitions and company growth make tech stacks increasingly complex, companies must understand those stacks well enough to ensure that integration and resilience keep pace, closing security gaps rather than widening them. Companies should regularly test and update their BC/DR plans to adapt to evolving threats and business changes.
Lessons Learned for Future Outages
The recent IT outage serves as a stark reminder that technology risk is inherently business risk. Boards and C-suites can no longer afford to overlook procurement and IT operations. Risk management must be integrated into the core of business operations, with a focus on understanding and mitigating the potential impacts of technology failures.
Policymakers should also study these disruptions, which show the effects of a systemic failure at real-world scale. The speed of the cascading effects and the length of recovery times should prompt increased scrutiny of the most mission-critical software. Whether the next outage comes from malintent or malpractice, we have all now seen the devastation it can cause on a global scale.