- Process engineering
- Operational resilience
- Program management
Windows, AIX, Solaris, Dell, HP, IBM, Sun, Cisco, Avaya, EMC, NetApp, VMWare
The client was growing rapidly and had reached the inflection point where a disaster recovery data center was needed. During the migration to a multiple data center environment supported by a single technical staff, systems infrastructure complexities began surfacing as outages. Before undertaking an operational resilience effort, the organization needed to quickly regain control of the environment.
What We Did
Using best practices as the target and building on existing processes where sensible, we implemented a full incident management program. All systems were classified and assigned severities, notification protocols and communications norms (including conference bridges) were established, and management oversight roles were defined. The incident management process included an incident report with root cause analysis and an after-action review. With the program designed, we trained all IT department members in their goals, roles and responsibilities during an incident.
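As an illustration only, the classification and notification scheme described above could be sketched as a pair of lookup tables. The system names, severity levels, recipients and the default for unclassified systems are all hypothetical; they are assumptions made for this sketch and not the client's actual configuration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; 1 is the most urgent."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


# Hypothetical classification of systems by severity.
SYSTEM_SEVERITY = {
    "order-entry": Severity.SEV1,
    "email": Severity.SEV2,
    "intranet-wiki": Severity.SEV3,
}

# Hypothetical notification protocol: who is notified for each severity,
# and whether a conference bridge is opened.
NOTIFICATION_PROTOCOL = {
    Severity.SEV1: {
        "notify": ["on-call engineer", "incident manager", "IT leadership"],
        "open_bridge": True,
    },
    Severity.SEV2: {
        "notify": ["on-call engineer", "incident manager"],
        "open_bridge": True,
    },
    Severity.SEV3: {
        "notify": ["on-call engineer"],
        "open_bridge": False,
    },
}


@dataclass
class IncidentPlan:
    system: str
    severity: Severity
    notify: list
    open_bridge: bool


def plan_incident(system: str) -> IncidentPlan:
    """Look up an affected system's severity and the matching notification steps."""
    # Assumption: unclassified systems fall back to the lowest severity.
    severity = SYSTEM_SEVERITY.get(system, Severity.SEV3)
    protocol = NOTIFICATION_PROTOCOL[severity]
    return IncidentPlan(system, severity, list(protocol["notify"]), protocol["open_bridge"])


if __name__ == "__main__":
    print(plan_incident("order-entry"))
```

Keeping the classification and the protocol in separate tables means a system can be reclassified without touching the escalation rules, which mirrors the separation between technical and management roles in the program.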
Close collaboration with the client teams was achieved by sourcing external developers to make administrative systems changes on-site. The work of an Ireland-based firm's internal development team was used to deliver the voice response system and web front-end.
What We Achieved
The client began using the process before the training was completed, with immediate favorable results. Incidents were escalated promptly to the appropriate parties, with the technical teams working on technical issues and management teams focusing on business impacts/notifications, change management and staff allocation. Root cause analysis and mitigation improved the stability of the environment. Staff morale improved with a structured approach to problem diagnosis and resolution.
|Scope:|Corporate-level exposure, with $160M annual IT spend|
|Impact:|Reduced total number of incidents by 75% after 6 months|
|Duration:|2 quarters to analyze, design, deploy, test and document|