Keeping Mission-Critical Systems Online

Userful’s control room video wall helps make critical decisions to ensure efficient operation of public transport in the city of Malaga, Spain. (Image credit: Userful)

The Montgomery County Emergency Operations Center (EOC) in Conroe, TX, serves as a centralized hub for data analysis and the management of response efforts related to large-scale events and natural disasters. The facility, which serves as a resource for government agencies, recently upgraded its AV/IT systems with the deployment of a Christie Terra SDVoE (Software Defined Video over Ethernet) solution, which enables signal distribution to any display in the EOC via 10Gb Ethernet.

Chris Lawson, regional vice president at Conference Technologies out of St. Louis, MO, the AV design and integration firm charged with deploying the solution, said that prior to this project, the Montgomery EOC was equipped with a fixed chassis system that limited the number of sources and options for distributing information. “It was important for them to be able to add sources and displays, and push information to them in real time with low latency in any arrangement of video and data required,” he said. “Choosing Terra enabled that.”

While scalability and flexibility are high priorities for anyone charged with deploying AV support for mission-critical facilities, so too is uptime. If a two-hour internet outage can wreak significant havoc on a small to mid-sized business, it can be downright catastrophic for organizations that deal with matters of life and death. But how can tech managers guarantee that their mission-critical systems will indeed work 24/7?

Sumanth Rayancha (Image credit: PepperDash Technology Corp.)

For Sumanth Rayancha, CTO at PepperDash Technology Corp., an enterprise AV solutions provider headquartered in Salem, MA, it starts with accepting that they won’t. “Everybody on the team has to understand that you define the goal as the system never going down, but the reality is, at some point the system will go down,” he said. And, he added, the general rule is that system failures will occur when one least expects them to. “The first part is, regardless of what you’re doing from a technical standpoint, you have to be operationally prepared for the downtime.”

The key word here, Rayancha underlined, is “operationally”: it’s not enough to have all the right equipment in place if those operating it have not prepared for an inevitable failure. “You can’t just activate the room and [only] use the room when you have an emergency,” he said. Instead, mission-critical operations center teams should hold drills that simulate failures and rehearse how best to respond to them. “You need to be drilling, you need to be preparing for failure.” He said that in his experience, the two industries with the most experience of this approach are broadcast (where dead air equals disaster) and the military (whose job is to deal with disaster).

Of course, it’s also necessary to select the right equipment for the job. “What we’ve seen from time to time is that people stick with what they know, and what they know depending on who the team is—and that might be conference rooms,” Rayancha said. “Typical conference room equipment might be good for a lot of circumstances, but it might not be the right product for something that’s really meant to be 24/7 mission critical.” 

Daniel Griffin (Image credit: Userful)

This means that it is not only the technology that has to perform well; so too does the manufacturer’s support team. While IT managers are accustomed to purchasing service-level contracts that include benefits such as parts replacement within two hours, this isn’t always possible with AV equipment. The solution: having cold spares on hand. “You figure out what has the heaviest hitting impact if you have a failure, keep spares around, and have your team trained on replacement,” Rayancha said.

As with drills, regular preventative maintenance is necessary to keep things up and running. “The other problem that we’ve run into is where the customer builds out the system and then the system sits there,” Rayancha said. In this scenario, the technology performs well during relatively calm periods, but when “the customer has a real emergency and then they discover a problem, processes and procedures aren’t set up to actually work through the room properly.” Regular testing and maintenance help prevent this situation.

Daniel Griffin is vice president of marketing at Userful, a visual networking platform developer headquartered in San Ramon, CA. He argues that his company’s software-based solution contributes to maintaining uptime because it’s built on the same standards that tech managers are used to working with—thus, keeping things simple. “What needs to be paid more attention is the human side,” he said. “What happens when you’ve made things so complicated that only one engineer understands how your video wall works? Keep this standards-based, simple, and intuitive, and operating on the infrastructure every other IT person in the world is accustomed to.” If the tech is 100-percent redundant, but only a limited number of team members know how to operate it, eventually there will be problems, he added.

Howard Nunes (Image credit: PepperDash Technology Corp.)

Another issue that exists in mission-critical deployments is the tendency to over-engineer solutions in order to compensate for potential failures, Rayancha noted. “The problem with that is eventually, your engineered solutions for failure are more likely to cause a failure than to actually help you through one,” he said. One-hundred percent redundancy is one way people try to address failure, which isn’t necessarily a good approach when it comes to AV. “Part of the problem is, a lot of the equipment that we deal with isn’t designed for redundancy—not for true, active/active fail-over scenarios,” Rayancha said. “People will try to engineer their way around one, but there’s a reason [these solutions] don’t always exist off the shelf—they’re difficult to build.” Sometimes, he added, the best workaround is simpler: move the entire team to another equally equipped mission-critical space down the hall. 

At PepperDash—a technology firm, after all—tech obviously plays a key role in mission-critical deployments, but it’s not necessarily the silver bullet. “It comes back to the human aspect of it, especially in circumstances where you have lives on the line,” Rayancha said. “If you’re not mentally prepared for that, the technology is not going to get you out of that situation.”

One of the best ways to ensure uptime (almost) all the time is to drill team members on what to do when the system fails. Usually, this starts with defining likely failure scenarios, and then creating run books that detail things like who is responsible for what; what must be visible in the room at all times; and what outside sources, such as teleconferences, are involved. Then, the AV/IT manager activates the failure scenario, and team members run through simulations.

Rayancha uses an example from the oil and gas sector: “They have specific emergencies that they know they’ll have to deal with—it could be a natural disaster, it could be drill rig problems—and they simulate it,” he said. “They run through: this is happening here—what do you do? And the operators in the room have to keep it together and do what they’re supposed to do, which could be: call this person up and get a video feed so we know what’s going on from here.”

“It’s basically scenario or business case-based,” said Howard Nunes, CEO at PepperDash. “And the concept of the run book is a very important one, where you define ahead of time what the likely scenarios are that you’ll have to face, and how you would address them, and then run drills against that. And the drill may not be perfect—your predictions may not be perfect—but you’re at least somewhat prepared.”
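For teams that keep their run books in a structured form, the elements described above (the scenario, who is responsible for what, what must stay visible in the room, and which outside sources are involved) could be captured along the following lines. This is only a minimal sketch in Python: the class, the field names, the `run_drill` helper, and the sample scenario are all hypothetical illustrations, not anything the people quoted here describe using.

```python
from dataclasses import dataclass, field


@dataclass
class RunBookEntry:
    """One failure scenario and its planned response (hypothetical structure)."""
    scenario: str
    responsibilities: dict[str, str]   # role -> task that role owns
    must_stay_visible: list[str]       # content that must remain on screen
    outside_sources: list[str] = field(default_factory=list)


def run_drill(entry: RunBookEntry, completed_roles: set[str]) -> list[str]:
    """Return the roles whose tasks were NOT completed during the drill."""
    return sorted(
        role for role in entry.responsibilities if role not in completed_roles
    )


# Illustrative scenario; every value below is made up for the example.
entry = RunBookEntry(
    scenario="Video wall controller failure",
    responsibilities={
        "av_manager": "Swap in the cold spare",
        "operator": "Move key feeds to desktop monitors",
        "shift_lead": "Notify partner agencies",
    },
    must_stay_visible=["incident map", "weather feed"],
    outside_sources=["teleconference bridge"],
)

# After the drill, record which roles finished their tasks and list the gaps.
gaps = run_drill(entry, completed_roles={"av_manager", "operator"})
print(gaps)  # ['shift_lead']
```

Recording the gaps after each drill gives the team a concrete list to work on before the next one, which fits Nunes’s point that the drill may not be perfect but still leaves the team somewhat prepared.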