For IT teams, dealing with critical incidents is like a fire department’s response to a three-alarm blaze. Both need an immediate response from the right people with the right tools, so damage can be limited and the surroundings protected. Both can see what’s happening, but they need to determine what’s at the heart of the problem to properly frame their response. Then, once the crisis is addressed, both need to figure out how the incident happened so they can keep it from happening again.
Whether it’s a customer-facing e-commerce system that’s gone offline, a network that has stopped connecting users, or a natural disaster like a severe storm that has rendered part of the system unusable, the response is generally predictable. The right people come together in some kind of conference setting to identify the problem, devise a solution, and implement and test that solution. They need to keep stakeholders apprised throughout the process. Finally, they need to provide reports so top management understands what happened and how it was fixed.
To help our clients respond more quickly, Brightworks recently deployed a new platform that digitally transforms the critical incident management process. Exigence uses advanced technology to address every aspect of the response to an incident, adding clarity, structure, and speed. The platform coordinates all stakeholders and systems, manages workflows, simplifies the post-mortem analysis, and feeds lessons learned into the plan for the next incident.
Exigence provides a virtual war room for managing major incidents. The platform integrates with existing IT monitoring systems and then uses automation to manage the response. Users define the specific conditions that constitute a critical incident, because those conditions are different for every company. Suppose the user is an e-commerce company that determines being offline for longer than 10 minutes represents a critical incident.
Once the monitoring determines an outage has reached the 10-minute mark, Exigence automatically launches a response. The platform establishes a war room for the incident and invites a predetermined list of people into that room. It also maintains a timeline of every action related to the response, starting with the detection of the critical incident. Exigence generates a record of who was notified, when those notifications were issued, and when each participant entered the war room. If the participants determine a need for additional help, such as a vendor, the platform issues invitations. As the participants work on the solution, the platform continues to collect updates from the system and can track conversations in chat systems like Slack.
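For readers who think in code, the trigger-and-timeline flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Exigence's actual API or data model: the `Incident` class, the `check_outage` function, and the responder names are all hypothetical, and the 10-minute threshold comes from the e-commerce example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; not Exigence's data model."""
    detected_at: datetime
    timeline: list = field(default_factory=list)

    def log(self, when, event):
        # Append a timestamped entry so the response can be reviewed later.
        self.timeline.append((when, event))


def check_outage(outage_started, now, responders, threshold=timedelta(minutes=10)):
    """Return an Incident once the outage has lasted `threshold`, else None."""
    if now - outage_started < threshold:
        return None  # still below the company-defined critical threshold
    incident = Incident(detected_at=now)
    incident.log(now, "critical incident detected")
    for person in responders:
        # Record who was notified and when, as the first timeline entries.
        incident.log(now, f"notified {person}")
    return incident
```

A monitoring loop would call `check_outage` on each poll. It returns nothing while the outage is under the threshold, then produces an incident whose timeline already contains the detection event and one notification entry per responder.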
Another important feature of the Exigence platform is a status page. As work proceeds in the war room, the participants can share updates through the status page with key stakeholders so they can keep up with what’s happening with no need to interrupt the team with calls or emails.
After the incident has been resolved, the team needs to perform a root-cause analysis. Because Exigence documents every step of the process with a timeline, it’s much easier to review what happened and how efficient the response was. With a couple of clicks, the platform generates an executive incident report summarizing the length of the incident, who was involved, how long it took them to resolve it, and what was learned from the root-cause analysis.
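To show why a complete timeline makes reporting straightforward, here is a minimal Python sketch that derives summary figures from a list of timestamped events. It is an assumption-laden illustration rather than a description of how Exigence builds its reports: the `summarize` function, the `(timestamp, actor, event)` tuple layout, and the `"system"` actor label are all hypothetical.

```python
from datetime import datetime


def summarize(timeline):
    """Summarize a timeline of (timestamp, actor, event) tuples.

    Hypothetical format: automated entries use the actor name "system".
    """
    ordered = sorted(timeline, key=lambda entry: entry[0])
    start, end = ordered[0][0], ordered[-1][0]
    # Collect the distinct human participants, ignoring automated entries.
    participants = sorted({actor for _, actor, _ in ordered if actor != "system"})
    return {
        "duration_minutes": (end - start).total_seconds() / 60,
        "participants": participants,
        "events": len(ordered),
    }
```

Because every action is already timestamped, the incident length and the list of people involved fall out of the record directly instead of being reconstructed from memory after the fact.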
In reviewing the analysis, the team can adjust how Exigence will respond to similar incidents in the future. For example, the team may determine that a specific participant or vendor needs to be added to the initial notification.
Exigence is a powerful tool that can improve a company’s management of critical incidents, which is why we decided to make it available to Brightworks clients. It’s another example of how our team finds ways to streamline and simplify operations and processes for our clients’ internal teams.