Last night, as over 100 million people watched, the lights went out for 34 minutes at Super Bowl 47 in New Orleans. The incident prompted many to share their “wit and wisdom” (sarcasm intended) via Twitter. Some thought Beyoncé’s hot show blew a fuse. Great thought, but why didn’t that happen during her act? Others joked that there was some type of New Orleans voodoo involved. These folks are taking the Bud Light commercials a bit too seriously. Still others thought this was a conspiracy. Their point: Obviously, someone rooting for the 49ers, who trailed 28-6 at that point, was looking to shift the momentum. It’s hard to argue with conspiracy theorists. If you believe there’s a dark plot behind an event, no amount of evidence will convince you otherwise.
If we put aside the joking and social media banter, there’s a serious issue underlying this event. The Super Bowl is the crown jewel of American sporting events. It is a major source of revenue for the NFL and for broadcast networks (in this case CBS). The power outage, which stopped play and silenced the broadcast booth, was an embarrassment for the league, the network, and the Superdome. It’s hard to know whether it resulted in financial losses for any of these entities. But there is little question that it had the potential to do so. Had it lasted much longer, folks might have started to tune out, hurting ratings. Had the repair taken a significant amount of time (several hours, for example), it might have been necessary to postpone the game.
The NFL commissioner, Roger Goodell, boldly stated after the game that this incident wouldn’t impact New Orleans’ chances of hosting another Super Bowl. But I’d be very surprised if the NFL doesn’t have strong expectations of a thorough postmortem and set of remediation actions from SMG, the operators of the Superdome.
So far, SMG, along with Entergy, the utility that supplies power to the Superdome, has issued only a brief statement regarding the outage. They indicated that a piece of monitoring equipment sensed an “abnormality” and opened a breaker, causing the partial power cut. While I don’t have access to deeper information about the outage, or the power architecture of the Superdome, I see two possible explanations. First, the Superdome was not planned as a classic highly available facility. Alternatively, it has design or operational flaws that keep it from meeting its planned goal of being a highly available facility.
In a modern world, where “power outage equals service outage”, mission-critical facilities are designed for the highest levels of reliability. Data centers, health care facilities and 911 dispatch centers all share a common goal: the elimination of downtime due to utility abnormalities or failures of individual components within their infrastructure. When faced with an abnormality or component failure, the infrastructure is designed to reroute power instantaneously, with zero impact on services. While the underlying fault is resolved in the background, services continue, with customers blissfully unaware of any problem.
The irony here is that the Superdome, while originally built in 1967, just underwent a $320 million renovation in 2011. This money was spent both to repair the damage done by Hurricane Katrina and to modernize the facility. It’s quite possible that the modernization did not include a goal of making the facility truly highly available. Given the rarity of mid-game power failures, it might not have been considered economically practical. But given the high-profile nature of last night’s outage, the economic calculus may change.
Traditionally, the NFL has had particular expectations of Super Bowl host cities. They looked at stadium capacity, number of local hotel rooms and climate. In the wake of last night’s incident, I would not be surprised to see an additional criterion added: robustness of critical stadium infrastructure.
Let’s zoom out for a second and generalize some key findings from this event. The availability goals of a facility need to match its intended use. For these goals to be met, designers must understand requirements, and customers must understand capabilities (and limitations!). This goes for data centers, health care facilities and stadiums alike. A high school football stadium doesn’t warrant the extra expense of uninterruptible power. But an Olympic stadium very well might. A conventional office building doesn’t warrant that level of investment. But a medical office doing surgical procedures inside that building certainly would.
Another key takeaway from yesterday’s incident involves testing and ongoing management of critical facilities. It’s one thing to design a facility for levels of availability that meet your customer’s requirements. It’s another thing to make sure that the design is realized. Any critical facility needs to be “commission tested” prior to “going live” and servicing an actual population of customers. This typically involves simulating different types of failures or abnormal conditions and ensuring that the infrastructure continues functioning as designed.
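To make the idea of failure simulation a bit more concrete, here is a minimal sketch in Python. It is not how the Superdome, or any real facility, is actually tested. The topology, the component names (`utility_a`, `breaker_1` and so on) and the function names are all hypothetical, made up purely for illustration. The sketch fails one component at a time and reports which loads would go dark.

```python
from typing import Dict, List, Set

# Hypothetical topology (an assumption for illustration only): each load maps
# to its alternative supply paths. A path is the set of components that must
# all be healthy for that path to deliver power.
Topology = Dict[str, List[Set[str]]]

EXAMPLE_TOPOLOGY: Topology = {
    "field_lights": [{"utility_a", "breaker_1"}, {"utility_b", "breaker_2"}],
    "broadcast_booth": [{"utility_a", "breaker_1"}],
}


def is_powered(topology: Topology, load: str, failed: Set[str]) -> bool:
    """A load stays powered if at least one path avoids every failed component."""
    return any(path.isdisjoint(failed) for path in topology[load])


def single_failure_report(topology: Topology) -> Dict[str, List[str]]:
    """Fail each component on its own and list the loads that lose power."""
    components: Set[str] = set()
    for paths in topology.values():
        for path in paths:
            components |= path

    report: Dict[str, List[str]] = {}
    for component in sorted(components):
        dark = sorted(
            load for load in topology
            if not is_powered(topology, load, {component})
        )
        if dark:
            report[component] = dark
    return report


if __name__ == "__main__":
    for component, loads in single_failure_report(EXAMPLE_TOPOLOGY).items():
        print(f"Losing {component} would black out: {', '.join(loads)}")
```

In this made-up example, the field lights have two independent paths and survive any single failure, while the broadcast booth hangs off a single path and does not. A real commissioning test exercises physical equipment, of course, but the question it asks is the same: for each thing that can fail, what goes dark?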
While commission testing is important, it’s also critical to have a program of ongoing testing. For 24/7 facilities, there may never be a chance to replicate all aspects of commission testing. To do so would require an outage or put the supported services in jeopardy. There are, however, other types of tests that can be done to ensure that the facility continues to meet its design goals as it ages, and as it is upgraded.
The Superdome was originally designed back in the mid-1960s, in a time predating the first Super Bowl. Back then, no one could have imagined it would host a game watched by over a third of the country, with advertisers paying $4 million for a 30-second commercial. Firms frequently face the same dynamic. Infrastructure designed for another era is used to support services with markedly different availability requirements. As outages are relatively uncommon, this deficiency can go unnoticed for years, like an old, buried landmine. But given enough time, it will get “detonated”, “shutting off the lights” and causing serious disruptions.