Data Center World 2024
Lessons Learned from a Data Center Incident
James Monek (Director, Technology Infrastructure & Operations, Lehigh University)
Pass Type: AFCOM Solution Provider, All Access Conference, Industry Conference, Standard Conference
Track: Design, Build, Operate, Control
Session Type: Conference Session
Vault Recording: TBD
Audience Level: All Audiences
What would you do if you received a phone call at 5 a.m. saying there is a fire in the data center (besides thinking you just woke up from a nightmare and considering going back to sleep)? While our incident wasn't that extreme, it was enough to trigger an emergency power-down and cause the halon system to discharge its gas to protect the data center.
During this presentation, we'll cover the timeline, how we used our incident management and BCP/DR processes, how we responded rapidly to bring all services back online, and the retrospectives we held afterwards. We'll explore the surprises we encountered, the lessons we learned, and how well-prepared teams can work together under immense pressure.
Takeaway
- Tips on handling a major data center incident
- A closer look at how processes such as incident management, BCP/DR, and retrospectives are crucial to data center operations
- Lessons from our incident to help strengthen your data center resiliency, redundancy, and operations