The reality of IT support is that engineers cannot avoid downtime. No matter how diligent managers are about scheduling regular maintenance and repair, incidents will happen. Sites will go down. Servers will fill up. APIs will fail. When these incidents occur, IT teams need to be well trained and properly equipped to respond rapidly.
However, incident response is not as simple as creating a checklist for teams to follow. When incidents occur, restoring availability often conflicts with investigating the cause of the incident. For example, security incident response teams and infrastructure teams operate with different assumptions and priorities when resolving issues. If these competing priorities are not managed effectively beforehand, they can lead to duplicated work, delayed handoffs, and faulty results.