Incident Response: How to Conduct a Post-Mortem
A security incident post-mortem is an opportunity for improvement. A good post-mortem will highlight what went wrong — as well as what went right. The largest benefit may be learning from past experiences. It is also an opportunity to prevent future mistakes.
An incident post-mortem brings people together to discuss the details of an incident: why it happened, its impact, what actions were taken to mitigate and resolve it, and what should be done to prevent it from happening again. When the right people are brought to the table, processes and procedures can also be drafted to prevent issues that have not happened yet.
Avoiding an Incident Post-Mortem
Several practices help reduce the need for post-mortems. Still, post-mortems will inevitably be needed when an incident occurs that requires teams to work together on a longer-term fix.
Avoiding them requires the right atmosphere. Openness and honesty are virtues of individual employees, but the workplace has to set a tone that allows for them. Employees should not be punished for mistakes unless the mistakes are frequent and negligent. Punishment tends to lead people to hide mistakes, and eventually to a company culture of hiding them. Hidden mistakes can lead to incidents.
From a technical perspective, version control and change management go hand in hand to help quickly reverse a change that caused an incident. Storing configuration changes and code in version control allows you to easily review and roll back changes. Change management, for its part, helps catch issues before they reach production. It also helps with rollback planning, since every good change management program requires each change to have a rollback plan. Often these rollback plans never end up being used. However, it is easier to plan a rollback ahead of time than in the moment you need it.
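The rollback-plan requirement can be enforced mechanically. The following is a minimal sketch, assuming change requests are tracked as simple records; the ChangeRequest class and ready_for_approval function are hypothetical names used for illustration, not part of any particular change management tool.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Hypothetical record for a proposed production change."""
    description: str
    rollback_plan: str = ""


def ready_for_approval(change):
    # A change is only ready for review once it states how it would be reversed.
    return bool(change.rollback_plan.strip())
```

A check like this could run before a change is accepted for review, so that a missing rollback plan is caught long before the moment it is needed.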
Automated deployments also help avoid the manual errors that cause code- or application-related incidents. Once your organization is able to deploy automatically, automation can also make rollbacks easier.
Feature flags in the code may allow you to dynamically enable and disable new functionality without redeploying code. This is not always possible, but when it is, it allows a much shorter rollback time because the change can be made on the fly.
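As an illustration of the idea, here is a minimal sketch of a feature flag check, assuming flags are stored in a JSON file that operators can edit while the application is running. The FeatureFlags class, the checkout function, and the "new_checkout" flag name are all hypothetical; real systems often use a dedicated flag service instead of a file.

```python
import json


class FeatureFlags:
    """Reads on/off flags from a JSON file each time a flag is checked."""

    def __init__(self, path):
        self.path = path

    def is_enabled(self, name, default=False):
        # Re-read the file on every check so that a change takes effect
        # without restarting or redeploying the application.
        try:
            with open(self.path, encoding="utf-8") as handle:
                flags = json.load(handle)
        except (OSError, json.JSONDecodeError):
            # If the flag store is unreadable, fall back to the default.
            return default
        if not isinstance(flags, dict):
            return default
        return bool(flags.get(name, default))


def checkout(flags):
    # "new_checkout" is a hypothetical flag guarding a new code path.
    if flags.is_enabled("new_checkout"):
        return "new checkout flow"
    return "legacy checkout flow"
```

With this arrangement, setting "new_checkout" to false in the file sends the next request back to the legacy path, which is the on-the-fly rollback described above. Defaulting to the old behavior when the file cannot be read is a design choice made here for safety, not a requirement of feature flags in general.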
Benefits of an Incident Post-Mortem Analysis
Inevitably, every organization will need some sort of post-mortem analysis. This does not mean the organization failed. It does mean that a process, a procedure, or in some cases a team is not working effectively and a post-mortem needs to be called.
Post-mortems can be a great opportunity for an organization to learn from a mistake, plan how to avoid similar issues that may not have happened yet, and work on a go-forward plan.
Perhaps the incident was not a complete failure and some things went right. This is an opportunity to discuss those positives and reaffirm the policies and procedures that worked for this incident. When that is neglected, employees may not believe the procedures work or understand why they exist, and may forget to follow them in future incidents.
Incident post-mortems can also help build customer trust, since the output of a post-mortem can be disclosed to clients. They may be frustrated that the incident happened, but knowing the details of the incident and what will be done to help prevent a repeat can go a long way. Transparency helps with gaining customer trust and retaining it.
Best Practices for an Incident Post-Mortem Analysis
The first best practice for an incident post-mortem analysis is to have a policy or procedure on what triggers such a meeting. That policy or procedure should also indicate the person responsible for calling the meeting and organizing it. It may be the Chief Information Security Officer (CISO) or the Chief Technology Officer (CTO) in many organizations.
Timing is important. The post-mortem should be called as quickly as possible, while the issue is still fresh. At the same time, you also want the people involved to be in the right frame of mind. For example, if the incident occurs over the weekend but staff are not usually accessible then, Monday may be the best fit for the post-mortem. If a post-mortem cannot happen immediately, notes should be taken by those involved in the resolution so the incident can be properly discussed.
Touching on the work atmosphere, the post-mortem should be blameless. People are human and do make mistakes. An open atmosphere should be provided so that people feel comfortable coming forward with any mistakes they have made. Focus on the issue and the breakdown, not on the person who caused it. Criticism should be given, but it must be genuinely constructive and aimed at helping the person who needs it. It is important to focus on the lessons learned. A mistake one person made may be a mistake another person never has to make because of this lessons-learned session.
The main desire and goal should always be to gain understanding of the event first. Then try to address that issue if it is correctable. An issue cannot be properly addressed if it is not fully and adequately understood.
This process should be an open-door process that allows discussion of security issues. This may be the first time certain people have a voice. They may have had concerns for some time but no platform to voice them. Not every concern needs to be addressed, but each should be heard at least once and documented.
Incident Post-Mortem Documentation
When dealing with the incident, investigate whether the security response plans need to be updated. Is the threat intelligence sufficient? Are better tools needed? Are the preventative measures adequate or can they be tuned?
Documentation is key. Create a template that captures the details important to the business and the process. This will help keep the documentation on point and avoid tangents. Post-mortems always tend to bring up tangents, and they can be very important even if they are not related to the current incident. Rather than pursuing them in the post-mortem itself, schedule a follow-up meeting dedicated to those issues.
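One way to keep a template consistent is to define it as a structured record. The sketch below is only an assumption about what such a template might hold: the PostMortemReport class and its field names are hypothetical, drawn from the topics this article says a post-mortem covers (why the incident happened, its impact, how it was resolved, what went right, and tangents set aside for later).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostMortemReport:
    """Hypothetical post-mortem template with fixed sections."""
    title: str
    summary: str
    impact: str
    root_cause: str
    resolution: str
    what_went_right: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)
    # Tangents are recorded here and handled in a separate meeting.
    follow_up_topics: List[str] = field(default_factory=list)

    def missing_sections(self):
        # Report which required text sections are still blank.
        required = {
            "summary": self.summary,
            "impact": self.impact,
            "root_cause": self.root_cause,
            "resolution": self.resolution,
        }
        return [name for name, value in required.items() if not value.strip()]
```

Keeping a separate follow_up_topics list gives tangents a place to be written down without letting them take over the discussion of the current incident.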
Someone should be tasked with a continual review of the post-mortems. This can be a review board or an incident manager. It is easy to fall into a rut of holding post-mortems while the same issues keep coming up. Someone needs to oversee the process to ensure issues are not recurring and that the process is working.
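Part of that review can be automated if past post-mortems record their causes in a consistent way. The following is a rough sketch under that assumption: each post-mortem is represented as a dictionary with a "causes" list, and the recurring_causes function and its threshold parameter are hypothetical names chosen for this example.

```python
from collections import Counter


def recurring_causes(post_mortems, threshold=2):
    """Return causes that appear in at least `threshold` post-mortems."""
    counts = Counter()
    for report in post_mortems:
        # Count each cause once per post-mortem, even if listed twice.
        counts.update(set(report["causes"]))
    return sorted(cause for cause, seen in counts.items() if seen >= threshold)
```

A reviewer could run this over the archive before each review meeting. A cause that keeps reappearing suggests that earlier action items were not completed or did not address the real problem.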
In any environment, a post-mortem happens because something broke down. The post-mortem seeks to understand that breakdown and brings the right people to the table to make that possible. An effective post-mortem analysis not only helps prevent the past issue from recurring but also helps prevent other incidents. It promotes teamwork by allowing the team to openly discuss the issue in a manner where blame is not placed. The focus is on finding solutions and lessons learned.
Many times, things go right, and a post-mortem should be used to solidify the procedures and policies that worked. People often see the need for a post-mortem as a failure. They may want to avoid having one, but in today's world of data breaches and strict Service Level Agreements, post-mortems are necessary. They are better viewed as a process of continual improvement and transparency to clients.