While there are many tools and resources for performing automated dynamic analysis on malware samples, there are few that focus on automating static analysis. At ShmooCon this year we were pleased to find a project focused on exactly this, called MASTIFF.
Created by Tyler Hudak (@SecShoggoth), MASTIFF aims to be the framework for automating static analysis. To learn more about the project, visit its SourceForge page and blog. Over at TekDefense I demoed using MASTIFF with Maltrieve in a video; if you are interested, check out TekTip ep 23.
Of course, the biggest news story of last week was Mandiant's report on APT1. Because Mandiant released IOCs along with the report, users quickly discovered that many of the hashes it listed matched samples available on VirusShare. VirusShare was nice enough to put out a torrent containing 281 samples that match APT1 hashes.
A better use case could not have presented itself, so let's show how efficiently MASTIFF performs static analysis on a large number of samples. If you would like to play along at home, download the samples from VirusShare; if you don't have an account, email VirusShare to request one.
To appreciate what MASTIFF does for us, you first need to understand the manual process of basic static analysis. A typical analyst will hash a sample, run a file identification tool against it, and then, based on the file type, run tools specific to that type. As you can probably imagine, this takes time: for 200 samples it could take hours, perhaps days, if done manually.
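To make those manual steps concrete, here is a minimal Python sketch of the per-sample loop an analyst repeats by hand. The magic-byte table and the NEXT_STEPS mapping are illustrative assumptions, not what MASTIFF or any particular file identification tool uses; the point is that every sample needs hashing, identification, and a type-specific decision, which is exactly what gets tedious at scale.

```python
import hashlib

# Illustrative magic-byte prefixes; a real file identification tool knows far more.
MAGIC = [
    (b"MZ", "PE executable"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF", "PDF document"),
]

# Hypothetical follow-up steps per file type, standing in for the
# type-specific tools an analyst would pick by hand.
NEXT_STEPS = {
    "PE executable": ["review PE headers and imports", "run strings"],
    "ZIP archive": ["extract and triage each member"],
    "PDF document": ["review PDF objects and streams", "run strings"],
}


def triage_bytes(data):
    """Hash a sample, identify its type from magic bytes, and suggest next steps."""
    hashes = {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    file_type = "unknown"
    for magic, name in MAGIC:
        if data.startswith(magic):
            file_type = name
            break
    return {
        "hashes": hashes,
        "type": file_type,
        "next_steps": NEXT_STEPS.get(file_type, ["run strings"]),
    }


def triage(path):
    """Read a sample from disk and triage it."""
    with open(path, "rb") as fh:
        return triage_bytes(fh.read())
```

Calling triage('/opt/malware/<sample>') once per file is the manual workflow in miniature; MASTIFF's value is doing this and much more for you.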
I am not going to get into how to install and configure MASTIFF as there are other documents and articles that cover that. I will instead show you how to run MASTIFF against the APT1 samples and review the results. With all the APT1 samples downloaded and extracted to a directory (I used /opt/malware/), you can simply run:
sudo mas.py filename
This will run MASTIFF against a single sample. MASTIFF currently (v0.5) does not natively run against more than one sample at a time, so while you could run it individually on each of the almost 300 samples, I wouldn't recommend it. Instead, I created a quick Python script that runs mas.py against every file in a specific directory, in my case /opt/malware/.
Here is the script:
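#!/usr/bin/python
# MASTIFF Autorun
# @TekDefense
# www.TekDefense.com
# Quick script to autorun samples from maltrieve to MASTIFF
import os
import subprocess

malwarePath = '/opt/malware/'
# mas.py is expected to sit in the same directory as this script
masPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'mas.py')

for r, d, f in os.walk(malwarePath):
    for files in f:
        # Join with the current walk root so files in subdirectories resolve correctly
        malware = os.path.join(r, files)
        print(malware)
        # Pass arguments as a list so paths containing spaces are handled safely
        subprocess.call([masPath, malware])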
In order for this to run, the script must be in the same directory as MASTIFF's mas.py. It will then run MASTIFF against all of the files in the malware directory. For me, it took MASTIFF 4 minutes and 50 seconds to churn through all 281 samples.
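Put another way, that run averaged roughly one second per sample. The arithmetic, using only the figures from this run:

```python
# Throughput for the run described above: 281 samples in 4 minutes 50 seconds.
samples = 281
elapsed_seconds = 4 * 60 + 50  # 290 seconds

per_sample = elapsed_seconds / float(samples)
print("%.2f seconds per sample" % per_sample)  # prints "1.03 seconds per sample"
```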
Pro-Tip: In your MASTIFF config, set the zip password to "infected" so MASTIFF automatically extracts most shared malware samples. Also, when working on a project like this, give yourself a separate log directory from the one you normally use, to keep yourself organized.
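If you ever need to unpack a password-protected sample archive outside of MASTIFF, a minimal Python sketch of the same idea is below. This is not MASTIFF's extraction code: extract_samples is a hypothetical helper, and the paths in the usage line are only examples of keeping a project's files apart from your normal work area. Note that Python's zipfile module can only decrypt traditional ZipCrypto-protected archives, not AES-encrypted ones.

```python
import os
import zipfile


def extract_samples(zip_path, dest_dir, password=b"infected"):
    """Extract a (possibly password-protected) sample archive into dest_dir.

    Hypothetical helper: zipfile only decrypts traditional ZipCrypto archives,
    and the password is ignored for members that are not encrypted.
    Returns the list of member names in the archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(path=dest_dir, pwd=password)
        return zf.namelist()
```

For example, extract_samples('/opt/malware/apt1.zip', '/opt/work/apt1/samples') (both paths hypothetical) would unpack a project's samples into their own working directory.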
With all of these samples run through MASTIFF, get acquainted with them by looking through the SQLite database (mastiff.db by default) with your favorite SQLite manager. I prefer the Firefox plugin "SQLite Manager." The database has two main tables, "files" and "mastiff". The files table holds information about the path, size, and frequency of the samples, while the mastiff table shows the hashes, file type, and fuzzy hash. Looking through the file types, you can start to get a feel for how this group (APT1) packages its malware.
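If you would rather explore the database from a script than a GUI, Python's built-in sqlite3 module can list the tables and their columns. This sketch deliberately assumes nothing about MASTIFF's schema beyond the file name; it just reports whatever tables and columns are present, so you can confirm the files and mastiff tables before writing queries. Be aware that sqlite3.connect silently creates an empty database if the path does not exist, so double-check the path you pass in.

```python
import sqlite3


def describe_db(db_path="mastiff.db"):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        schema = {}
        for table in tables:
            # PRAGMA table_info returns one row per column; index 1 is the column name.
            quoted = '"%s"' % table.replace('"', '""')
            cols = conn.execute("PRAGMA table_info(%s)" % quoted).fetchall()
            schema[table] = [c[1] for c in cols]
        return schema
    finally:
        conn.close()
```

Printing describe_db() from your MASTIFF working directory gives a quick map of the database before you start querying it.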
Running the following SQL Query will show a count of the file types used:
SELECT type, count(*) AS total FROM mastiff GROUP BY type ORDER BY total DESC
The results of this query show that the vast majority of the samples are standard PE32 executables, with a small number being archives or containing archives:
"['Generic', 'EXE']","274"
"['Generic', 'EXE', 'ZIP']","7"
"['Generic', 'ZIP']","1"
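Judging by the output above, the type column stores each sample's type list as a string such as "['Generic', 'EXE', 'ZIP']". Assuming that holds, a short Python sketch can run the same query and then break the lists apart, counting how many samples carry each individual tag. That makes it easier to see, for example, how many samples involve a ZIP at all. ast.literal_eval is used rather than eval so the stored strings are parsed as plain literals and never executed.

```python
import ast
import sqlite3
from collections import Counter


def type_counts(conn):
    """Run the GROUP BY query and return [(type_string, total), ...], largest first."""
    return conn.execute(
        "SELECT type, count(*) AS total FROM mastiff GROUP BY type ORDER BY total DESC"
    ).fetchall()


def tag_counts(rows):
    """Split each stored type list (e.g. "['Generic', 'EXE', 'ZIP']") into tags
    and count how many samples carry each tag."""
    tags = Counter()
    for type_string, total in rows:
        for tag in ast.literal_eval(type_string):
            tags[tag] += total
    return tags
```

Against the counts above, tag_counts(type_counts(sqlite3.connect('mastiff.db'))) would report 282 samples tagged Generic, 281 tagged EXE, and 8 tagged ZIP.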
With a better understanding of what we are looking at, let's take a look at the analysis results for one of the samples. For those still playing at home, I will be looking at eef80511aa490b2168ed4c9fa5eafef0. The results directory for this sample contains the following files:
- fuzzy.txt: The fuzzy hash of the sample, plus how closely it matches other samples we have scanned.
- mastiff.log: A log of the MASTIFF run. Any errors that occurred during the scan will show up here.
- mastiff-run.config: A copy of the config file MASTIFF used to scan the sample.
- peinfo-full.txt: The full PE details.
- peinfo-quick.txt: A condensed version of the PE details.
- strings.txt: A dump of the strings command run against the file.
- VirusShare_eef80511aa490b2168ed4c9fa5eafef0.VIR: A copy of the actual sample that was scanned.
If this file were a PDF, there would be a different set of artifacts to work with, as MASTIFF would have analyzed it with different tools. In the results for other samples you may also see a sig.txt, containing certificate details if any were found, or a resources folder if MASTIFF pulled out resources such as icons or cursors.
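When reviewing hundreds of result directories, it can help to check quickly which artifacts each one contains. The sketch below compares a directory listing against the file names described above. It assumes a PE sample (a PDF would produce a different set), treats sig.txt and resources as optional, and picks out the sample copy by its .VIR extension, which only holds for files named the way VirusShare distributes them. The inventory helper is hypothetical, not part of MASTIFF.

```python
import os

# Artifact names taken from the PE results listing above.
EXPECTED = ["fuzzy.txt", "mastiff.log", "mastiff-run.config",
            "peinfo-full.txt", "peinfo-quick.txt", "strings.txt"]
# Only present for some samples (certificate details, extracted resources).
OPTIONAL = ["sig.txt", "resources"]


def inventory(result_dir):
    """Report which MASTIFF artifacts are present in one sample's result directory."""
    present = set(os.listdir(result_dir))
    return {
        "missing": [n for n in EXPECTED if n not in present],
        "optional_found": [n for n in OPTIONAL if n in present],
        # The sample copy keeps its original name; VirusShare files end in .VIR.
        "samples": sorted(n for n in present if n.endswith(".VIR")),
    }
```

Looping inventory over every subdirectory of your log directory (/opt/work/apt1/log in my case) quickly flags runs with missing output, which you can then check against mastiff.log.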
Running cat against fuzzy.txt shows that this particular sample is an 85% match for another APT1 sample we scanned, fb671e6de6e301c892d2fdaa58f9cd9a:
tekmalinux@TekMALinux:/opt/work/apt1/log/eef80511aa490b2168ed4c9fa5eafef0$ cat fuzzy.txt
Fuzzy Hash: 384:AuBQ7dQNUO0COZqjPM7mV+fEdSn7XDOyP1i1JIC3crcUEG299Wy7HVqYJ4+izujI:1BQeNUOViqjAmV+f0SXdiJNsYU0WwVqf
This file is similar to the following files:
MD5 Percent
fb671e6de6e301c892d2fdaa58f9cd9a 85
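Because fuzzy.txt is plain text, the similarity results can be pulled out programmatically, for example to group related APT1 samples. This sketch assumes the layout shown above (a "Fuzzy Hash:" line followed by MD5/score pairs, one per line); parse_fuzzy is a hypothetical helper, not part of MASTIFF, and another MASTIFF version might format the file differently.

```python
import re

# A line holding an MD5 followed by a similarity score, e.g. "fb67... 85".
MATCH_LINE = re.compile(r"^\s*([0-9a-fA-F]{32})\s+(\d+)\s*$")


def parse_fuzzy(text):
    """Return (fuzzy_hash, [(md5, score), ...]) from the contents of fuzzy.txt."""
    fuzzy_hash = None
    matches = []
    for line in text.splitlines():
        if line.startswith("Fuzzy Hash:"):
            # Everything after the first colon is the ssdeep-style hash.
            fuzzy_hash = line.split(":", 1)[1].strip()
        else:
            m = MATCH_LINE.match(line)
            if m:
                matches.append((m.group(1).lower(), int(m.group(2))))
    return fuzzy_hash, matches
```

Running this over every sample's fuzzy.txt and keeping pairs above a threshold you choose (80, say) gives a rough list of closely related samples to review together.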
A cat on peinfo-quick.txt tells us a lot about the DLLs this malware leverages and the particular functions it is likely to call. The first few are very telling as to what we can expect from this file:
KERNEL32.dll Sleep 0x406010
KERNEL32.dll GetTempPathA 0x406014
KERNEL32.dll GetTempFileNameA 0x406018
KERNEL32.dll CreateProcessA 0x40601c
ADVAPI32.dll RegOpenKeyExA 0x406000
ADVAPI32.dll RegSetValueExA 0x406004
ADVAPI32.dll RegCloseKey 0x406008
WININET.dll InternetOpenA 0x40607c
WININET.dll InternetOpenUrlA 0x406080
WININET.dll InternetReadFile 0x406084
WININET.dll InternetCloseHandle 0x406088
urlmon.dll URLDownloadToFileA 0x406090
This is great info to have when we want to set up an environment for dynamic analysis later. For instance, we know that the sample will likely attempt to create files, create processes, manipulate the registry, and connect to a URL.
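The same reasoning can be scripted as a first pass over peinfo-quick.txt. In this sketch, the BEHAVIOR table is a hypothetical, hand-written mapping that only covers the imports listed above, and the parser assumes the three-column "DLL function address" layout shown there. Anything it does not recognize (Sleep, for instance) is simply skipped, so it supplements rather than replaces reading the imports yourself.

```python
# Hypothetical mapping from imported API names to the behavior they hint at.
# It only covers the functions listed above and is not exhaustive.
BEHAVIOR = {
    "GetTempPathA": "file system", "GetTempFileNameA": "file system",
    "CreateProcessA": "process creation",
    "RegOpenKeyExA": "registry", "RegSetValueExA": "registry",
    "RegCloseKey": "registry",
    "InternetOpenA": "network", "InternetOpenUrlA": "network",
    "InternetReadFile": "network", "InternetCloseHandle": "network",
    "URLDownloadToFileA": "network",
}


def behaviors(import_lines):
    """Group 'DLL Function Address' lines from peinfo-quick.txt by behavior."""
    found = {}
    for line in import_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # not an import line in the assumed layout
        dll, func, _addr = parts
        category = BEHAVIOR.get(func)
        if category:
            found.setdefault(category, []).append("%s!%s" % (dll, func))
    return found
```

Fed the twelve import lines above, this groups them into file system, process creation, registry, and network buckets, which maps directly onto what to monitor during dynamic analysis.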
Reviewing strings.txt, we can pick out some more clues as to what this sample will do. Some examples:
5728 A http://www.rbaparts(.)com/images/li.gif : This will be a URL of interest.
56dc A IMSCMig.exe : Perhaps this is the process it will create.
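Picking out items like these by eye works for one sample, but a regular expression pass scales better across all 281. This sketch assumes each line of strings.txt looks like the examples above: a hex offset, a one-letter string type such as the A shown, then the string itself. That layout, the helper name interesting_strings, and the two patterns are assumptions for illustration, and simple regexes like these will miss obfuscated or unusual indicators.

```python
import re

# Assumed strings.txt layout: "<hex offset> <type letter> <string>".
LINE = re.compile(r"^\s*([0-9a-fA-F]+)\s+([A-Za-z])\s+(.*\S)\s*$")
URL = re.compile(r"https?://\S+", re.IGNORECASE)
EXE = re.compile(r"\b[\w.-]+\.exe\b", re.IGNORECASE)


def interesting_strings(lines):
    """Pull URLs and .exe names out of strings.txt lines, keeping their offsets."""
    hits = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        offset, text = int(m.group(1), 16), m.group(3)
        for pattern, kind in ((URL, "url"), (EXE, "exe")):
            for found in pattern.findall(text):
                hits.append((offset, kind, found))
    return hits
```

Run across every sample's strings.txt, the URL hits in particular make a handy starting list of network indicators to watch for during dynamic analysis.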
While basic static analysis techniques are not going to give us the full story, we are able to learn much that can be applied to our basic dynamic analysis as well as advanced static and dynamic analysis.
For your convenience, I have made the APT1 MASTIFF results available at https://www.tekdefense.com/downloads/reports/.
Today’s post pic is from Twitter.com.