Intro to ImpHash for DFIR: “Fuzzy” Malware Matching

June 10, 2024

The Cyber Triage 3.10 release introduced the ability to use ImpHash for malware scanning and correlation. Our goal was to let customers gauge whether an unknown binary is interesting and worth a deeper dive without having to upload file content to our malware service provider (ReversingLabs). Users have asked what ImpHash is and how to interpret ImpHash results, so we are sharing more information about it.

The goal of this blog is to give users a better understanding of ImpHash by discussing the following:

  • What is an ImpHash
  • Why ImpHash is Useful for DFIR
  • When to use ImpHash
  • Using ImpHash with Cyber Triage

We will publish a follow-up post in the coming weeks that delves into some of the common issues users should be aware of when using ImpHash.

What is an ImpHash

ImpHash (import hash) is a hashing method for Portable Executable (PE) files, such as EXEs and DLLs, that allows you to make fuzzy matches. It lets you easily identify executables that are similar, but not necessarily exactly the same. It was defined by Mandiant.

The traditional way to fingerprint files is to use a cryptographic hash, such as MD5 or SHA256, on the entire file content. Changing any bit in the file results in a very different hash value. 
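As a rough illustration of that sensitivity, the sketch below hashes a small buffer with SHA256, flips a single bit, and hashes it again. The buffer is a made-up stand-in for file content, not a real PE file.

```python
import hashlib

# Made-up stand-in for file content (not a valid PE file).
original = bytearray(b"MZ" + bytes(62))

# Copy the content and flip one bit in the last byte.
modified = bytearray(original)
modified[-1] ^= 0x01

hash_original = hashlib.sha256(original).hexdigest()
hash_modified = hashlib.sha256(modified).hexdigest()

print(hash_original)
print(hash_modified)
print(hash_original == hash_modified)  # False
```

The two digests share no useful relationship, which is why an exact-match hash lookup cannot tell you that the two inputs are nearly identical.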

ImpHash focuses only on the executable’s import table, which contains information about the external functions and libraries used by the executable. Executables will often have the same import table if they are variations of each other or were compiled using the same basic build infrastructure.

As a result, binaries do not need to be exact matches to match based on ImpHash. This allows investigators to find malware samples that are likely to be the same at their core but have been altered slightly over time.

The ImpHash computation process involves the following steps:

  1. Extract the DLL names and function names from the import table
  2. Lowercase the DLL and function names
  3. Concatenate the DLL and function names into a comma-separated list, preserving the order in which they appear in the import table
  4. Calculate the MD5 hash of the resulting string
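The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the imports have already been extracted from the PE file as an ordered list of (DLL name, function name) pairs, it assumes each pair is written as `dll.function`, and it assumes the `.dll`, `.ocx`, and `.sys` extensions are dropped from the DLL name, as common open-source implementations do. The function name `imphash_from_imports` is hypothetical, and imports by ordinal are not handled.

```python
import hashlib

# Assumption: these extensions are stripped from the library name,
# mirroring common open-source implementations.
STRIPPED_EXTENSIONS = ("dll", "ocx", "sys")


def imphash_from_imports(imports):
    """Compute an ImpHash-style MD5 from ordered (dll, function) pairs."""
    entries = []
    for library, function in imports:
        # Step 2: lowercase the DLL and function names.
        library = library.lower()
        stem, _, extension = library.rpartition(".")
        if stem and extension in STRIPPED_EXTENSIONS:
            library = stem
        entries.append(f"{library}.{function.lower()}")
    # Step 3: comma-separated list in import table order.
    joined = ",".join(entries)
    # Step 4: MD5 of the resulting string.
    return hashlib.md5(joined.encode("ascii")).hexdigest()


imports = [("KERNEL32.dll", "CreateFileA"), ("USER32.dll", "MessageBoxA")]
print(imphash_from_imports(imports))
```

Because the order is preserved, the same imports listed in a different order produce a different value. In practice you would rarely write this yourself; for example, the open-source pefile library for Python provides a `get_imphash()` method on a parsed PE file that also handles the import table parsing.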

For a more detailed description of the ImpHashing process, check out Mandiant’s post.

Why ImpHash is Useful for DFIR

The primary problem that ImpHash solves is that it is easy for an attacker to change a bit in a file, which results in a different content-based cryptographic hash even though the file still has the same malicious behavior. It is hard for a security analyst or tool to know whether a file with a never-before-seen hash value is good or bad.

ImpHash allows you to find similar files, which may or may not be derivatives of the one you are focusing on. It’s important to note that this fuzzy hash is not based on the executable’s behavior or code sequences. It’s entirely based on what the executable declares that it depends on (which could be a lie) and the order in which those dependencies are declared.

When To Use ImpHash

ImpHash can be used in many ways. However, if you do not understand what ImpHash is and its pitfalls (which will be discussed more in our next blog post), it may not be as effective as you hope. The following are the scenarios where we believe ImpHash can best be leveraged:

  • Malware scanning and file upload restrictions
  • Hunting for malicious binaries
  • Threat intelligence sharing 
  • Tracking threat actor tooling

Malware Scanning and File Upload Restrictions

The main DFIR use case for ImpHash is malware analysis when the investigator can’t upload files to external analysis platforms, such as ReversingLabs or VirusTotal.

An example of this is as follows: 

  • An alert is raised for a process executing from a suspicious location. An analyst performs a SHA256 hash lookup and finds nothing. 
  • Company policy dictates that files are not allowed to be uploaded to 3rd party services like VirusTotal and no on-prem solutions are available. 
  • Instead, an ImpHash lookup is performed and the service returns SHA256 hashes of other files that it has seen that have that ImpHash. 
  • Those hashes can then be looked up to see if they were good or bad. 

The above scenario illustrates the power of ImpHash. It allows for a quick triage of a file when normal hash-based lookups fail, without having to upload the file and potentially tip off the threat actor.

Hunting For Malicious Binaries

Our second DFIR scenario where ImpHash can provide significant value is hunting malware. The primary advantage that ImpHash provides is that it can find matches in a network even when each host has a unique variation of the file. This can be illustrated in the following two scenarios. 

  1. An EDR alert goes off and malware is detected on a system within your environment. You grab its SHA256 hash, filename, and path and search across your systems to find out whether it has spread. Nothing is found. You do an ImpHash search and get hits on 6 systems. Each instance of the malware has a unique file hash but is functionally identical. As a result, the SHA256 search failed but the ImpHash search was able to find all of the instances.
  2. Your threat intelligence feed indicates that there is a new campaign targeting the energy sector. You note down the SHA256 and ImpHash IOCs. A week later you get several hits based on your ImpHash IOC but nothing for SHA256. This is because the threat actor had unique binaries made for each targeted organization. However, the tools that were used were the same ones mentioned in the original report.
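Both scenarios come down to searching the same set of file records with two different IOC types. The sketch below shows that idea over a made-up, in-memory inventory. The record keys (`host`, `sha256`, `imphash`), the host names, the placeholder hash values, and the `hunt` function are all hypothetical and do not reflect any particular product’s data model.

```python
def hunt(records, sha256_ioc, imphash_ioc):
    """Return the hosts that match each IOC type.

    records: iterable of dicts with the hypothetical keys
    'host', 'sha256', and 'imphash'.
    """
    hits = {"sha256": set(), "imphash": set()}
    for record in records:
        if record["sha256"] == sha256_ioc:
            hits["sha256"].add(record["host"])
        # Skip records without an ImpHash (for example, non-PE files).
        if imphash_ioc and record.get("imphash") == imphash_ioc:
            hits["imphash"].add(record["host"])
    return hits


# Made-up inventory: two hosts carry unique builds of the same tool.
inventory = [
    {"host": "ws-01", "sha256": "a" * 64, "imphash": "1" * 32},
    {"host": "ws-02", "sha256": "b" * 64, "imphash": "1" * 32},
    {"host": "ws-03", "sha256": "c" * 64, "imphash": "2" * 32},
]

result = hunt(inventory, sha256_ioc="d" * 64, imphash_ioc="1" * 32)
print(sorted(result["sha256"]))   # []
print(sorted(result["imphash"]))  # ['ws-01', 'ws-02']
```

The SHA256 IOC finds nothing because no host has that exact file, while the ImpHash IOC surfaces both hosts that carry a variant with the same import table.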

Threat Intelligence Sharing

ImpHash facilitates sharing of malware intelligence among DFIR teams and organizations. Instead of exchanging actual malware samples, which may be sensitive or restricted, professionals can share ImpHash values to identify common threats. Furthermore, sharing just file hashes as an IOC may not be helpful in situations where targeted attacks are producing unique tool binaries for each target. Adding an ImpHash to the report allows others to find those same tools in their environment regardless of the minor binary changes made. 

An example of this might look like the following: A threat actor launches a campaign against the education sector where each target is attacked using the same set of tools, but the tools have unique file hashes for each target. One organization discovers the attack and publishes some of the IOCs, including both file hashes and ImpHashes. Other educational institutions see the intel report and start looking for the IOCs. Organizations that focused only on the file hashes fail to find anything, but the few that look at ImpHash find hits within their network and can perform remediation.

Tracking Threat Actor Tooling

Lastly, ImpHash can be used to help track tools used by various threat actor groups. This was one of the use cases Mandiant first wrote about in its ImpHash blog to help track backdoors used by various groups. The advantage comes back to the fact that ImpHash can find similar binaries even though they are no longer exact matches. Threat groups modify binaries as new IOCs are shared, add new functionality, or create targeted binaries for each target. As long as the import table stays the same, researchers will be able to track the binaries and group them together as the same tool.

Using ImpHash with Cyber Triage

There are four places in Cyber Triage where you will benefit from ImpHash: 

  • Collecting ImpHashes
  • Malware Scanning
  • File Correlation and Other Occurrences Viewer
  • Searching

Collecting ImpHashes

The Cyber Triage Collector adaptive collection tool will now always collect ImpHash values for PE files. Previously, the collector would report MD5, SHA1, and SHA256 hash values only for files that were thought to be interesting and relevant to a system along with any custom file collection rules set up by a user. 

In 3.10 we now additionally report the ImpHash for any PE files. This is true even if you specify to the Collector that you do not want to save file content. 

Malware Scanning

Cyber Triage’s malware scanning integration with ReversingLabs previously worked by first checking NSRL hits, then doing hash lookups to ReversingLabs, and finally uploading the file for analysis if the user enabled file upload and the hash was unknown.

The issue that many customers ran into was that file upload was not an option due to security and privacy policies and regulations. This meant that customers missed out on getting any additional insight on files that had unknown hash values.

With ImpHash, these customers can now request an ImpHash lookup in place of a file upload to get some context about the file without having to share its content. The result will show up to 50 related SHA1 hashes that match based on the provided ImpHash.

We will score the file as suspicious if at least 5 of the related SHA1 hashes are known to be bad. More details on how to use ImpHash in malware scanning can be found in our 3.10 release blog.
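The scoring rule described above can be pictured with a short sketch. This is only an illustration of the threshold logic, not Cyber Triage’s actual code. The function name, parameter names, and the "unknown" fallback label are assumptions. The limit of 50 related hashes and the threshold of 5 bad hashes come from the description above.

```python
def score_from_related_hashes(related_sha1s, known_bad_sha1s,
                              bad_threshold=5, max_related=50):
    """Return 'suspicious' if enough related hashes are known bad.

    related_sha1s: SHA1 hashes returned for an ImpHash lookup.
    known_bad_sha1s: set of SHA1 hashes already known to be bad.
    """
    # Only the first max_related hashes are considered.
    considered = related_sha1s[:max_related]
    bad_count = sum(1 for sha1 in considered if sha1 in known_bad_sha1s)
    # Assumption: anything below the threshold stays unscored ("unknown").
    return "suspicious" if bad_count >= bad_threshold else "unknown"


# Made-up placeholder values standing in for SHA1 hashes.
related = [f"hash-{i}" for i in range(10)]
known_bad = {"hash-0", "hash-1", "hash-2", "hash-3", "hash-4"}

print(score_from_related_hashes(related, known_bad))  # suspicious
```

A threshold like this trades sensitivity for fewer false positives, since a single bad file sharing an ImpHash with many benign ones does not on its own flag the file.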

File Correlation and Other Occurrences Viewer

We also integrated ImpHash into our file correlation. This allows you to see whether another file with the same ImpHash is found on any of your ingested hosts.

In the below example we have a file (adfr.exe) that has a scheduled task artifact and a process artifact. The artifacts are flagged because they ran from unusual locations and the file is unsigned. Unfortunately, this hash is unknown and we are unable to perform any malware analysis.

We can check the “Other Occurrences” tab to see if this file has been seen anywhere else within our incident to get more context. An exact match for a file is based on its SHA256 hash while a partial match is based on a file’s ImpHash or file path. We can see that two other hosts have a hit for this ImpHash with an item score of bad.
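The exact versus partial distinction can be expressed as a small comparison function. The sketch below is a simplified illustration and not how Cyber Triage implements it. The record keys (`sha256`, `imphash`, `path`), the placeholder values, and the case-insensitive path comparison are assumptions.

```python
def classify_match(candidate, reference):
    """Classify how a candidate file record relates to a reference record.

    Returns 'exact', 'partial', or None when there is no match.
    """
    if candidate["sha256"] == reference["sha256"]:
        return "exact"
    if candidate.get("imphash") and candidate["imphash"] == reference.get("imphash"):
        return "partial"
    # Assumption: paths are compared case-insensitively, as on Windows.
    if candidate["path"].lower() == reference["path"].lower():
        return "partial"
    return None


# Made-up records with placeholder hash values.
reference = {"sha256": "a" * 64, "imphash": "1" * 32,
             "path": "C:/Users/Public/adfr.exe"}
variant = {"sha256": "b" * 64, "imphash": "1" * 32,
           "path": "C:/Temp/other.exe"}

print(classify_match(variant, reference))  # partial
```

A partial match is a lead rather than a verdict, so it is worth checking the score and details of the matching file before drawing conclusions.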

 

If we jump to one of those hosts we can see that the file is identified as a Meterpreter backdoor which has a different file hash but a matching ImpHash. The “File Details” tab has the various hashes of the file on the other system:

And the Malware Scan Results tab has the conclusion from the various engines:

Searching

Both our host-level and incident-level searches have been updated to include ImpHash as a searchable field. This allows our customers to quickly find any items within their incidents or hosts that have a matching ImpHash. From there they can dig deeper to see if the match is something they are actually interested in or an unrelated match (which we will discuss more in our follow-up blog).

Using our previous example for the Meterpreter backdoor, we can search for the ImpHash “b4c6fff030479aa3b12625be67bf4914” and find all items and all hosts that have a matching ImpHash.

Conclusion

In summary, ImpHashing is a method of fingerprinting a binary based on its import table. This has an advantage over traditional hashing as it allows for matching of binaries that are functionally or near functionally identical but are not exact copies of each other. ImpHashing can be useful for threat intel sharing, tracking threat actor tooling, and hunting down ever-changing malware.

If you are looking to speed up your investigations with features like ImpHash, then try Cyber Triage. You can download a free one-week trial version here: https://www.cybertriage.com/download-eval/