Do you know the differences between MUICache, ShimCache, AMCache, and PMCache without the help of Google?
Did you know that one of them is made up?
You’re not alone if you didn’t. It’s hard to keep track of all of the artifact types that can be relevant for a forensics investigation.
As part of our efforts to make investigations easier, we avoid the above terms as much as possible. Instead, we use simpler terms, such as “Process” and “Inbound Logons.”
We call these higher-level concepts “Information Artifacts.” This post explains why we did this and how you can use them in your own work.
At a high level:
- Data Artifacts represent data stored by an application on disk or in memory, such as Prefetch or MUICache.
- Information Artifacts represent one or more data artifacts that have the same meaning and are for the same event, such as a “Process.”
First Principles: Basic Computing Concepts
At the end of the day, all of us are investigating what happened on various types of computers, such as laptops, servers, or cell phones. To make investigations easier, it can help to focus on the core computing concepts.
There is no single list of core computing concepts, but I think of them as concepts that:
- Can be found in a computer science textbook or an operating system user manual
- Are not unique to any single operating system or application
- Answer a digital investigative question
This list will evolve, but here are a few examples:
- Operating System Configuration: All computers have an operating system with various settings, some of which are relevant during an investigation.
- Users: Nearly all operating systems have a concept of user accounts, which define who can access the system and what permissions they have.
- Processes: All computers run processes with the permissions of a given user.
- Network Connections: Processes can connect to other hosts by making network connections.
- Messages: Users can send messages to each other using protocols and applications.
The benefit of using these simple concepts is that:
- They are similar across all computers (Windows, Linux, macOS, etc.).
- There are fewer than 20 of them (versus hundreds of other artifact types).
- We use them in our normal daily lives, so they come naturally. We log in, launch processes, connect to servers, etc.
You should be thinking about investigations with these high-level concepts in mind and not always worrying about Prefetch and Event ID 4688.
Data Artifacts Are About Observations
Artifact is a word frequently used in the DFIR space to represent data from a system being investigated. We use the term “data artifact” to refer to these items.
Data artifacts are data stored by the operating system or another process for its own purposes. The application observed something and recorded it. Digital investigators then leverage that data for our own purposes. We call these data artifacts because they are data written by some application.
For example, Prefetch exists because Microsoft wanted to track which processes recently ran so that Windows could preload data into memory. Event ID 4688 exists because audit settings cause Windows to record when a process launches.
When a process launches, there are other applications besides Prefetch that also record that launch. Each application may record its data at a slightly different time and use a different format. They are all ultimately different observations of the same event and mean the same thing: a process launched.
We define a data artifact based on two concepts:
- Data Structure: Where and how the data is stored. If the application moves the data or changes the format, it’s a new data artifact (or at least a new version of it).
- Meaning: What the data means. These map to the high-level computing concepts we previously discussed. Is it a process launch, user logon, or OS configuration?
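To make the two-part definition concrete, here is a minimal Python sketch. The class names, field names, and the short structure descriptions are assumptions made for illustration; they are not taken from any tool's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Meaning(Enum):
    """High-level computing concepts (an illustrative subset, not a full list)."""
    PROCESS = "Process"
    INBOUND_LOGON = "Inbound Logon"
    OS_CONFIGURATION = "OS Configuration"


@dataclass(frozen=True)
class DataArtifactType:
    """A data artifact is defined by its data structure and its meaning."""
    name: str         # the name investigators know it by
    structure: str    # where and how the data is stored (hypothetical description)
    meaning: Meaning  # which computing concept the data represents


# Two different data structures that carry the same meaning.
PREFETCH = DataArtifactType("Prefetch", "Prefetch file on disk", Meaning.PROCESS)
EVENT_4688 = DataArtifactType("Event ID 4688", "Windows event log record", Meaning.PROCESS)


def same_meaning(a: DataArtifactType, b: DataArtifactType) -> bool:
    """True when two data artifact types describe the same computing concept."""
    return a.meaning is b.meaning


print(same_meaning(PREFETCH, EVENT_4688))  # different structures, same meaning
```

In this sketch, changing the `structure` of an entry produces a different data artifact type even though its `meaning` stays the same, which mirrors the "new data artifact (or at least a new version of it)" rule above.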
Data artifacts are critical to an investigation because they will ultimately be the evidence that something happened, but they are not efficient to use:
- They have confusing and complicated names.
- They overlap each other when they are observations of the same event.
- Their storage locations roll over at different times, so the sources sometimes overlap and sometimes do not.
It’s not efficient for a human to de-dupe and process all of the lower-level data artifacts.
Information Artifacts Are About Meaning
An information artifact represents one or more data artifacts that have the same computing meaning. Data artifacts that describe the same event are merged into a single information artifact.
For example, if we have data artifacts from Prefetch, Event ID 4688, UserAssist, and a list of currently running processes that all show a process at a given path launching at 1:34 PM on Dec 2, 2024, then we’d create a single Information Artifact of type “Process” with that timestamp and path.
An investigation tool that uses information artifacts must:
- Map a data artifact to at least one information artifact.
- Show the data artifacts used to define the information artifact (we call them “sources”).
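The merge and the two requirements above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names, the case-insensitive path comparison, the 60-second matching window, and the example path are all hypothetical, and a real tool would likely use more careful matching rules.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DataArtifact:
    source: str            # e.g. "Prefetch" or "Event ID 4688"
    meaning: str           # the computing concept, e.g. "Process"
    path: str              # executable path recorded by the source
    observed_at: datetime  # time recorded by the source


@dataclass
class InformationArtifact:
    kind: str
    path: str
    time: datetime
    sources: list = field(default_factory=list)  # data artifacts behind this item


def merge(observations, tolerance=timedelta(seconds=60)):
    """Group observations of the same event into one information artifact each.

    Assumption: two observations are "the same event" when meaning and path
    match and their times fall within `tolerance` of the first observation.
    """
    merged = []
    for obs in sorted(observations, key=lambda o: o.observed_at):
        for info in merged:
            if (info.kind == obs.meaning
                    and info.path.lower() == obs.path.lower()
                    and abs(obs.observed_at - info.time) <= tolerance):
                info.sources.append(obs.source)  # keep the source visible
                break
        else:
            # No existing artifact matched, so this observation starts a new one.
            merged.append(
                InformationArtifact(obs.meaning, obs.path, obs.observed_at, [obs.source])
            )
    return merged


launch = datetime(2024, 12, 2, 13, 34)
path = r"C:\Tools\example.exe"  # hypothetical path
observations = [
    DataArtifact("Prefetch", "Process", path, launch + timedelta(seconds=2)),
    DataArtifact("Event ID 4688", "Process", path, launch),
    DataArtifact("UserAssist", "Process", path, launch + timedelta(seconds=1)),
    DataArtifact("Running process list", "Process", path, launch),
]
artifacts = merge(observations)
for item in artifacts:
    print(item.kind, item.path, item.time, item.sources)
```

Every data artifact ends up in exactly one information artifact, and the `sources` list preserves where each item came from, so the underlying evidence stays one step away.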
The benefit of using Information Artifacts is that you do not need to de-dupe the overlapping data artifacts, and you can focus on the high-level concepts.
For example, if you want to investigate who had access to the system, focus on “Inbound Logons.” You do not need to look at the individual Security, Local Session Manager, and Terminal Services event logs.
Cyber Triage Example
Here’s an example in our Cyber Triage application. It collects and parses dozens of data artifacts and represents them as information artifacts.
You can find the Information Artifacts on the left-hand side of the application, organized under headings based on user activity or processes.
Let’s say you want to see what processes ran in the past 30 days. Instead of looking at all the data artifacts representing processes, you can simply choose “Processes” and pick the Run Date filter for “Last 30 days.”
That will show you the unique combinations of executables and arguments.
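Conceptually, that filter behaves like the short Python sketch below. This is not how Cyber Triage implements it; the record layout, the function name, and the case-insensitive path handling are assumptions made only to illustrate the idea of a date filter followed by de-duplication.

```python
from datetime import datetime, timedelta


def recent_unique_processes(processes, now, days=30):
    """Return unique (executable, arguments) pairs that ran within `days` of `now`."""
    cutoff = now - timedelta(days=days)
    seen = set()
    unique = []
    for proc in processes:
        # Assumption: Windows paths are compared case-insensitively.
        key = (proc["executable"].lower(), proc["arguments"])
        if proc["run_time"] >= cutoff and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical process records, already merged from their data artifacts.
NOW = datetime(2024, 12, 31, 12, 0)
PROCESSES = [
    {"executable": r"C:\Windows\System32\cmd.exe", "arguments": "/c whoami",
     "run_time": datetime(2024, 12, 20, 9, 0)},
    {"executable": r"c:\windows\system32\cmd.exe", "arguments": "/c whoami",
     "run_time": datetime(2024, 12, 21, 9, 0)},   # same combination, later run
    {"executable": r"C:\Windows\System32\cmd.exe", "arguments": "/c ipconfig",
     "run_time": datetime(2024, 12, 22, 9, 0)},   # same executable, new arguments
    {"executable": r"C:\Tools\old.exe", "arguments": "",
     "run_time": datetime(2024, 10, 1, 9, 0)},    # outside the 30-day window
]

RESULT = recent_unique_processes(PROCESSES, NOW)
for executable, arguments in RESULT:
    print(executable, arguments)
```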
If any of these are of interest to the investigation, you can look at the “Sources” tab to see where the data came from.
In this case, we can see it was from Prefetch:
Conclusion
Information Artifacts are a more efficient way to conduct investigations, especially for people who do not do DFIR daily. They allow you to identify what happened on a host and not get bogged down in terminology 99% of the population doesn’t know about.
If you want to do investigations at a higher level, try out Cyber Triage. You can get a 7-day eval from here.