In Part 1 of this blog series, we highlighted the benefits of CrowdStrike’s investigative approach and CrowdStrike Falcon® Real Time Response capabilities in avoiding a significant incident in the first place and in minimizing the damage should an attacker gain entry into your environment. We also explored a range of governance and process-oriented steps that are often left out of technology-centric discussions on incident response preparedness.
In this post, we cover containment and recovery considerations in the context of large-scale enterprise remediation efforts. MOXFIVE’s lessons learned in this area are based on responding to over 700 incidents across a range of industry sectors and company sizes.
Before beginning containment, the incident response team should intentionally define a strategy based on factors such as the known impact to the environment, the threat actor’s likely motivation, and experts’ experience with this or similar threat actors in other environments.
If the attacker likely intends to maintain a long-term presence in the environment, and the scope of the attacker’s access is not well understood, responders should avoid piecemeal containment actions. To avoid a lengthy “whack-a-mole” game in those situations, it is crucial to first understand all of the mechanisms by which the attacker maintains persistence and then execute a package of containment actions simultaneously.
For many of today’s attacks, particularly those with obvious destructive impacts, it’s wise to immediately begin containment and recovery activities in parallel. Simply stated, the goal of containment is to deny the attacker access to the environment. While there is a variety of means to accomplish this, some of the most important activities to consider are listed below.
It’s important to strike the right balance between the sometimes competing goals of preserving evidence and containing the threat. The most effective way to do this is to establish close coordination between the investigation and containment teams. By working from a single source of truth related to system status, the investigation team can flag systems for immediate triage data collection and be included in decisions that may affect evidence. Keep in mind that the evidence referred to here is primarily going to be used by the investigation team to fully understand the intrusion and its impact — it’s rare that such evidence is used in criminal legal proceedings. If you’re preserving forensic images of every system that may have been accessed by an attacker, you’re likely going above and beyond what is truly needed.
When planning containment, focus on measures that are immediately disruptive to the attacker and relatively low effort. Beyond those immediate containment measures, resist the urge to implement additional enhancements until business operations have been substantially restored.
Think of this as triaging an injured accident victim. One of the most immediate concerns is to remove them from the overturned vehicle and stabilize them for treatment. While they may benefit from additional treatment en route to the hospital, the key is getting them safely on their way.
The objective of recovery is to restore the business to a pre-incident level of function. Success directly correlates with how closely the IT teams align with their business colleagues and the degree to which business priority is infused into all recovery activities. The remainder of this section is written from the perspective of a typical ransomware scenario, where the victim organization’s systems are widely disrupted after being encrypted with ransomware.
During lengthy, high-pressure projects, you may hear the adage “it’s a marathon, not a sprint” used to emphasize setting a pace that can be sustained over the long haul. While generally apt, that analogy does not precisely fit enterprise incident response, which does indeed require periods of “sprint” among the steady pace of the “marathon.” If you’ve never been through such an incident, the unknown factor is unsettling. Speaking of running analogies, if we are about to run a race, what does the course look like?
Recovery efforts follow a predictable pattern of activity, as illustrated in Figure 1, which is based on MOXFIVE’s experience assisting large enterprises to recover from ransomware attacks. This chart depicts days across the X axis — a typical large enterprise remediation effort lasts around 80 calendar days. The Y axis depicts the relative level of effort for the incident management and engineering functions, with 10 being maximum. (Note that the raw levels of effort will vary between environments and incidents, with engineering requiring many more resources in terms of person-hours than incident management.)
Note the following when reviewing the chart:
Now onward to the nuts and bolts of recovery. In a ransomware scenario, three key considerations when developing the initial recovery plan are:
With the rubber about to meet the road, ensure that engineering teams operate with a shared understanding of the environment. As discussed in the planning section (Part One), this system tracker is a single source of truth and is critical to running an effective recovery. In the absence of a robust asset management system, a shared spreadsheet hosted in a collaboration environment will suffice. Give every server and end-user system its own row, and populate the initial version with your best asset management snapshot from before the incident. If there isn’t an asset list, an Active Directory computers list can provide a rough substitute for Windows systems. The tracker should have at least the following fields:
For optimal usage with a large team, put together a tutorial on how engineers and project managers should update it. (If using a spreadsheet, use picklists to normalize available choices for as many fields as possible to keep the data clean and usable.)
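To make the tracker concrete, below is a minimal sketch in Python that seeds an initial tracker CSV from an exported Active Directory computers list. The column names, status values, and file names are illustrative assumptions, not a prescribed schema; align them with whatever fields your teams agree on.

```python
import csv

# Hypothetical tracker columns; adjust to the fields your teams actually track.
TRACKER_FIELDS = [
    "hostname", "site", "operating_system", "business_function",
    "priority", "recovery_status", "assigned_engineer", "notes",
]

# Normalized status values (the picklist equivalent in a spreadsheet).
RECOVERY_STATUSES = ["Not Started", "In Progress", "Restored", "Validated", "Signed Off"]

def build_tracker(ad_export_path: str, tracker_path: str) -> None:
    """Seed one tracker row per system from an AD computers export (CSV with a Name column)."""
    with open(ad_export_path, newline="") as src, open(tracker_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=TRACKER_FIELDS)
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerow({
                "hostname": row["Name"],
                "operating_system": row.get("OperatingSystem", ""),
                "recovery_status": RECOVERY_STATUSES[0],  # everything starts as "Not Started"
            })

if __name__ == "__main__":
    build_tracker("ad_computers_export.csv", "system_tracker.csv")
```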
These major sub-workstreams typically comprise the core of the recovery effort:
When starting restoration from backups, have the most seasoned engineers develop detailed work instructions that walk through the process step by step, including updating the tracking document at key points so others will know each system’s status.
Identify tasks that can be done in parallel so more engineers can be added to speed up progress where possible. This is not always possible, however. For example, if available bandwidth only supports pulling down two virtual system images from cloud backup simultaneously, adding 10 more engineers to this process will only add confusion.
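As a rough illustration of that constraint, the sketch below caps concurrent restores at two using a thread pool; adding more engineers (or threads) beyond the bandwidth limit buys nothing. The restore_image() function is a placeholder for your backup platform’s restore command or API call, and the names and limit are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT_RESTORES = 2  # matches what the available bandwidth can actually support

def restore_image(hostname: str) -> str:
    # Placeholder: invoke the backup platform's restore for this system, then
    # update the system tracker so others can see its status.
    ...
    return hostname

def restore_all(hostnames: list) -> None:
    # The thread pool enforces the bandwidth-driven cap on parallel restores.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_RESTORES) as pool:
        futures = {pool.submit(restore_image, h): h for h in hostnames}
        for future in as_completed(futures):
            print(f"Restore finished: {future.result()}")
```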
Include phases in which a restored server is handed off to application teams for configuration and validation, since server engineers may not have application-specific knowledge. The final validation phase should include signoff by the business users. Consider breaking the planning down by functional group, tied to priority, and then further by operating system, which may determine which teams are involved.
For server decryption, create detailed work instructions as done for restorations. Wherever possible, plan to run the decryption tool on a copy of the image rather than on the encrypted system or files themselves. In some cases, the tool may encounter errors, so it’s important to have the original available to make another copy and try again. Most organizations need additional storage capacity at this stage.
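The sketch below shows one hedged way to implement a decrypt-on-copy loop: each attempt works on a fresh copy so the original encrypted image is never modified, and failed copies are discarded to conserve storage. The decryptor binary, its flags, and the key file are hypothetical placeholders for whatever tool you are actually provided.

```python
import shutil
import subprocess
from pathlib import Path

def decrypt_with_retry(encrypted_image: Path, work_dir: Path, attempts: int = 3) -> Path:
    """Run the decryptor against a copy so the original encrypted image stays untouched."""
    for attempt in range(1, attempts + 1):
        working_copy = work_dir / f"{encrypted_image.stem}.attempt{attempt}{encrypted_image.suffix}"
        shutil.copy2(encrypted_image, working_copy)  # only the copy is ever touched
        result = subprocess.run(
            ["./decryptor", "--key-file", "key.bin", str(working_copy)],  # hypothetical tool and flags
            capture_output=True,
        )
        if result.returncode == 0:
            return working_copy  # hand off for validation
        working_copy.unlink(missing_ok=True)  # discard the failed copy to conserve storage
    raise RuntimeError(f"Decryption failed after {attempts} attempts: {encrypted_image}")
```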
The approach for end-user systems will differ from servers because it is more efficient to reimage or replace encrypted workstations and laptops.
Systems that are functional but contain some remnants of the attack should have a next-generation cybersecurity platform like CrowdStrike Falcon installed and should be cleaned up; there’s no need to reimage or replace them. These systems should be recovered remotely, using real-time response capabilities that can perform cleanup actions directly on the compromised systems. Systems that are rendered unusable due to encryption will need to be reimaged. This process differs for users in offices versus remote users. For offices, a common approach is to set up an imaging server and then mass reimage systems site by site. Remote users are more complicated, typically requiring IT-managed system swaps.
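As a simple illustration of that triage logic, the snippet below splits end-user systems from the tracker into a remote-cleanup path and a reimage-or-swap path. The device_type and impact columns are assumed additions to the hypothetical tracker sketched earlier, not fields from any particular tool.

```python
import csv
from collections import defaultdict

def plan_endpoint_remediation(tracker_path: str) -> dict:
    """Split end-user systems into remediation paths based on tracker data."""
    plan = defaultdict(list)
    with open(tracker_path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("device_type") != "workstation":
                continue  # servers follow the restore/decrypt workstreams above
            if row.get("impact") == "encrypted":
                # Unusable: reimage on site, or swap hardware for remote users.
                plan["reimage_or_swap"].append(row["hostname"])
            else:
                # Functional with remnants: deploy the sensor and clean up remotely.
                plan["remote_cleanup"].append(row["hostname"])
    return plan
```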
If you are considering purchasing replacement end-user hardware, which can be attractive as a way to minimize delays, be aware that your cyber insurance policy may not cover these purchases.
Common pitfalls related to recovery include those below. We strongly recommend solving these challenges prior to the stressful days of an incident.
Without an accurate system inventory, your responses to these questions will be a moving target.
Many organizations execute a range of preparation activities that focus on the response components of incident response — for example, red/purple team tests of security controls, incident tabletop simulations, and after-action reviews. When it comes to the recovery part of incident response, however, few organizations proactively prepare. Consider adding the below activities to your IT and security plan.
MOXFIVE provides the clarity and peace of mind attack victims need during the incident response process. Our platform approach pairs victims with a Technical Advisor who provides the expertise and guidance needed in a time of crisis and facilitates delivery of all required technical services, consistently and efficiently.
With experience on the front lines responding to incidents daily, MOXFIVE Technical Advisors have the unique ability to connect the dots between business, information technology, and security objectives to help you quickly identify the gaps and build a more resilient environment.