As we’ve talked about many times before on this blog, speed is of the essence during incident response and endpoint triage. To get quick answers, digital forensics and incident response (DFIR) tools can’t collect or analyze all data. They need to make decisions and focus on the data most likely to be relevant. Sometimes a tool skips a file during the initial collection, and you realize only later that it could be relevant. The latest Cyber Triage release (2.8) adds, among other features, the ability to go back to a live host and collect additional files.
This blog post explains why this feature is useful and how to use it.
Precision and Recall in Incident Response
Let’s start with a concrete example.
An intruder has installed malware and various tools in an AppData folder. Some files in that folder are referred to in the registry as a persistence mechanism (RunOnce or a scheduled task for example). There are other configuration files in that same folder that may contain passwords, IP addresses, or other relevant data. The responder will likely want to see everything in that folder.
Ideally, all of those files are collected during the initial response. But not all collection tools are capable of that. Each has its own way of determining what to collect. For example:
- A simple tool may be hardcoded to collect all files from only C:\Windows\System32. None of the relevant files in the previous example would be collected.
- A more advanced tool will parse registry keys and configuration files to identify data to collect (from any folder). The startup items in the previous example would be collected but perhaps not the configuration files.
- An even more advanced tool will use static and dynamic analysis techniques to reverse engineer the executables to find all of their dependencies and data they use. This may collect all items in the previous example.
- A continuous monitoring system may be watching process and file creation actions and focus on them. This may collect all items in the previous example.
Often, the initial endpoint collection does not include all of the evidence, and it can include non-relevant data.
Balancing Precision and Recall
This problem is not unique to DFIR. The information retrieval community refers to this as Precision and Recall:
- Precision: Refers to the percentage of results that were relevant. If a lot of non-relevant data was collected (i.e. data not related to an incident), then we have low precision.
- Recall: Refers to the percentage of the relevant items that were collected. If we get only a small number of the total items from the incident, then we have low recall.
Ideally, we want high precision (all of the results are evidence) and high recall (we get all of the evidence).
But to get high recall (all of the evidence), tools often give up precision and collect lots of non-evidence. This forces the investigator to sift through a lot of non-relevant data.
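To make the two definitions concrete, here is a minimal Python sketch that scores two hypothetical collection strategies against the AppData example from earlier. The file names and the counts are invented for illustration, and the function is not part of any tool; it only applies the definitions above.

```python
def precision_recall(collected, relevant):
    """Return (precision, recall) for a collection, given the truly relevant items."""
    collected, relevant = set(collected), set(relevant)
    hits = collected & relevant  # relevant items that were actually collected
    # Convention (an assumption): an empty set scores 0.0 instead of dividing by zero.
    precision = len(hits) / len(collected) if collected else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical contents of the intruder's AppData folder (all of it is evidence).
relevant = {
    "appdata/malware.exe",   # referenced by a RunOnce key
    "appdata/helper.dll",    # referenced by a scheduled task
    "appdata/config.ini",    # configuration file with IP addresses
    "appdata/creds.txt",     # configuration file with passwords
}

# A registry-driven collector that picks up only the startup items.
narrow = {"appdata/malware.exe", "appdata/helper.dll"}

# A broad collector that grabs the whole folder plus 16 unrelated files.
broad = relevant | {f"system32/file{i}.dll" for i in range(16)}

print(precision_recall(narrow, relevant))  # high precision, low recall
print(precision_recall(broad, relevant))   # low precision, high recall
```

In this toy setup the narrow collector returns only evidence but misses half of it, while the broad collector finds everything and buries it under four times as much unrelated data.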
Triage is all about getting results quickly, and that often means focusing on high precision to make sure that the responder spends their time reviewing only relevant data.
With high precision often comes low recall, which means not all of the evidence is collected. That’s where the need to go back and get more data comes into play.
Once the responder has decided the endpoint is worth the time, they need to go back and get the rest of the relevant evidence.
Going Back for Seconds with Cyber Triage
To save time, Cyber Triage collects file content only for files that it thinks are suspicious. It collects metadata for all files, but it doesn’t waste time reading and transmitting every file.
You may see a reference to a file that was not collected. This most often happens when you’re in the timeline or file explorer views. For example, you may see a file in the timeline that was created before some other suspicious activity. Or, you may see other files in a suspicious folder while in the file explorer view.
To get the content for that file, simply right-click on the file and choose “Collect File Content.”
This works if Cyber Triage still has access to the endpoint data. Namely, if the initial collection was performed using:
- “Live – Automatic” and the host is still online. In this scenario, PsExec will be used again to get the file and send it back to Cyber Triage over the network.
- “Disk Image.” In this scenario, Cyber Triage will open the disk image and extract the file.
If the initial collection was performed from a USB drive, then Cyber Triage does not have access to the endpoint to get more data.
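The rules above amount to a small decision table. The sketch below models it in Python purely as an illustration; the enum, the function, and the `source_reachable` parameter are hypothetical names and do not correspond to any Cyber Triage API.

```python
from enum import Enum


class CollectionMethod(Enum):
    # Hypothetical labels for the three initial collection methods in the text.
    LIVE_AUTOMATIC = "live_automatic"
    DISK_IMAGE = "disk_image"
    USB = "usb"


def can_collect_more(method, source_reachable):
    """Return True if additional file content could be collected later.

    source_reachable: for a live collection, the host is still online;
    for a disk image, the image can still be opened (an assumption).
    """
    if method is CollectionMethod.USB:
        # A USB-based collection leaves no path back to the endpoint.
        return False
    # Live and disk-image collections work only while the source is reachable.
    return bool(source_reachable)


print(can_collect_more(CollectionMethod.LIVE_AUTOMATIC, source_reachable=True))
print(can_collect_more(CollectionMethod.USB, source_reachable=True))
```

The point of the model is that a USB collection never allows a second pass, regardless of whether the host is still up.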
Once the file is collected, it will be analyzed as though it had been collected during the initial pass. This includes malware analysis, YARA rules, and other analytics.
Try it Out
If you want the ability to remotely collect endpoint data and go back for more files, then try out Cyber Triage.
You can get a free evaluation download of Cyber Triage Standard from here.