Some thoughts on designing diagnostic escalation chains...
One of the major bones of contention in most service desks (and incident management teams) is how to separate the areas of accountability for different levels of escalation. People often get confused about which level should be addressing a particular issue. Conflicts arise from unclear boundaries between L1 and L2 issues, from uncertainty about what exactly triggers escalation to L2, and the like.
Often, in training programs, learners ask about appropriate steps for designing an escalation chain for incident management or service desks. The two basic classifications, functional and hierarchical escalation, are merely nomenclature: they name the types of escalation but provide little guidance on how to set up good escalation chains.
Hence I am sharing my thoughts and experience here in the form of a K model. There is no such thing as a BEST DESIGN, and other views are possible. I am presenting a hierarchical K model that I feel would serve most IT Service Management situations; it is generic enough to be used as a guide in almost all of them. If incidents are reported against its levels properly, it gives a strong diagnostic view of the gaps in skill sets for managing infrastructures.
For escalating incidents, I propose the following factors for distinguishing between levels:
1. Level K1: Incident level: This level would cover the actual effort required to manage a particular incident. This effort would involve impact management and restoration of service levels.
2. Level K2: Systemic level: This level would involve identifying any systemic issues lurking in the system for which some investigation or root-cause analysis is required. The ITIL Problem Management process would map here.
3. Level K3: Architecture level: This level would identify and uncover any architecture or design issues or limitations which have to be addressed by changing the architecture of the system as a whole. Often, in the initial period, service levels are satisfactory because of the huge capacity margins installed in anticipation of volume increases. At these initial volumes, even inadequately designed architectures deliver acceptable service levels. But over a period of time, as volumes increase, these architectural issues will present themselves, and it will be far beyond the capability of incident / problem management teams to identify (and rectify!) them.
4. Level K4: Vendor level: This level identifies issues to be escalated to the vendors (Oracle, Microsoft, BMC, etc.) for fixes which can be made ONLY by the vendors. Generally, this is likely to involve the vendor's next update cycle, but in dedicated support situations (such as applications), fixes can have much faster turnaround times.
Notice that the different levels change based on the resolution approach. Every time the "WHAT NEEDS TO BE DONE" game changes, we have the next level in the model.
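To make the "game changes" idea concrete, here is a minimal Python sketch. It is only an illustration: the action labels (`restore_service`, `root_cause_analysis`, and so on) and the function names are my own assumptions, not part of any ITIL or tool vocabulary. The point it shows is that the K level is decided by what needs to be done, and escalation happens only when that answer moves to a higher level.

```python
from enum import IntEnum


class KLevel(IntEnum):
    """The four K levels described above."""
    K1_INCIDENT = 1
    K2_SYSTEMIC = 2
    K3_ARCHITECTURE = 3
    K4_VENDOR = 4


# Hypothetical labels for "what needs to be done" (assumed for illustration).
REQUIRED_ACTION_TO_LEVEL = {
    "restore_service": KLevel.K1_INCIDENT,
    "root_cause_analysis": KLevel.K2_SYSTEMIC,
    "redesign": KLevel.K3_ARCHITECTURE,
    "vendor_fix": KLevel.K4_VENDOR,
}


def classify(required_action: str) -> KLevel:
    """Return the K level that owns the given required action."""
    try:
        return REQUIRED_ACTION_TO_LEVEL[required_action]
    except KeyError:
        raise ValueError(f"unknown required action: {required_action!r}")


def should_escalate(current: KLevel, required_action: str) -> bool:
    """Escalate only when the required action belongs to a higher K level."""
    return classify(required_action) > current
```

For example, `should_escalate(KLevel.K1_INCIDENT, "vendor_fix")` is true, while a K2 team asked only to restore service has no reason to escalate.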
This K model of escalation helps to understand the “WHAT is happening” part more than the “HOW to handle it” aspects. In this K model, escalation happens every time the game changes. Typically, one would find two or more levels of “How to handle it” within each of these K levels.
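The two dimensions can be kept apart in a ticket record. The sketch below is one possible representation, not a prescribed one: the `Assignment` type, the `tier` field, and the choice to restart at tier 1 after a game change are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Assignment:
    k_level: int  # 1..4: WHAT is happening (K1..K4)
    tier: int     # 1..n: HOW it is handled within that K level (assumed field)


def escalate_within(a: Assignment) -> Assignment:
    """Same kind of work, more experienced handler: the K level is unchanged."""
    return replace(a, tier=a.tier + 1)


def escalate_game_change(a: Assignment) -> Assignment:
    """The required work changes: move to the next K level.

    Restarting at tier 1 is an assumption of this sketch.
    """
    if a.k_level >= 4:
        raise ValueError("K4 is the highest level in this model")
    return Assignment(k_level=a.k_level + 1, tier=1)
```

Keeping the two fields separate means that reports can count game changes (K level moves) without mixing them up with ordinary hand-offs inside a level.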
I hope this model helps to produce diagnostic management statistics about the stability and longevity of the infrastructure by tracking trends at each K level. It also highlights gaps in the skill sets needed to manage a given infrastructure.
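As a rough sketch of what such trend reporting could look like, the Python snippet below counts resolved tickets per K level for each month. The ticket records are made-up sample data, and the record shape (month, K level) is an assumption; a real service desk tool would supply its own export format.

```python
from collections import Counter, defaultdict

# Hypothetical closed-ticket records: (month, K level at which it was resolved).
tickets = [
    ("2011-06", "K1"), ("2011-06", "K1"), ("2011-06", "K2"),
    ("2011-07", "K1"), ("2011-07", "K3"), ("2011-07", "K3"),
]


def k_level_trend(records):
    """Count resolved tickets per K level for each month."""
    trend = defaultdict(Counter)
    for month, level in records:
        trend[month][level] += 1
    return trend


trend = k_level_trend(tickets)
for month in sorted(trend):
    # Print the per-level counts for each month in level order.
    print(month, dict(sorted(trend[month].items())))
```

A rising share of K3 or K4 tickets over successive months would be the kind of signal the model is meant to surface.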
Comments welcome.
Friday, August 12, 2011