- Business analysis
- Project management
IT Assessment - Production Stability Assessment
Harvard Partners was engaged to provide Incident Management training after the company experienced a major data center outage. While there, we observed that the company was constantly battling smaller outages, which prevented it from tackling the bigger issues. Outages (incidents) typically involved more than 75 people, including senior executives. Communication was ineffective. Staff was tired and frustrated and looked to management to help them "get out of the trenches."
What We Did
Harvard Partners quickly performed a production stability assessment to identify and prioritize the remediation efforts that would yield the greatest relief for the staff. We interviewed staff, held Rapid Envisioning Sessions (RES), analyzed Remedy tickets, and reviewed incident and change logs.
Our findings were reviewed with the division head, and a decision was made to expand the Incident Management process and add a Communications Officer as a way to reduce the number of people involved in each incident. This was followed by the creation of an Incident Review Committee and strengthening of the Change and Resource Management processes.
What We Achieved
Adding the Communications Officer provided managed, targeted communications to executives, customers, and staff not involved in the incident. Incident participation dropped from more than 75 people to only those directly involved in the incident. Staff found incidents easier to manage, and customers noticed improved communication and commended the company on delivering better service.
Implementing an Incident Review meeting and setting a target of always keeping remediation projects at a 50% completion level (at any point in time, half of all incident remediation projects were finished) began to address root-cause issues and reduce the overall number of incidents.
Customers commented they were seeing improvements in stability and response.
Harvard Partners was asked to provide data center capacity planning assistance.
|Industry:||Cloud and Web Hosting|
|Impact:||Incidents were reduced, people's time was used more efficiently, core issues were being addressed, and staff, customers, and management were all happier.|