Subparse is a modular framework developed by Josh Strochein and Aaron Baker. It parses malware files, indexes them, and displays the results in a searchable web viewer. The framework consists of a core parsing engine and parsing modules, plus enrichers that add more information to the malware indexes.
The main input to the framework is a directory of malware files, which are parsed by the core parsing engine or by a user-specified parsing engine. The framework then adds information from any user-specified enrichment engines before indexing the results into an Elasticsearch index.
You can search and view the collected information through a web viewer, which also lets you filter on values gathered from the files. Three default parsing engines (ELFParser, OLEParser, and PEParser) and four enrichment modules (ABUSEEnricher, CAPEEnricher, STRINGEnricher, and YARAEnricher) are currently available.
Getting Started
Software requirements
Before you can use Subparse, some prerequisite and recommended programs must be installed.
Other Requirements
After installing the recommended software, a few more steps are needed to install Subparse.
Python Requirements
Subparse depends on several Python packages. Complete the Python setup from the *parser* folder in the Subparse directory. To install the packages, run the following commands:
Docker Requirements
Subparse relies on Docker to provide its web interface and backend, so the Docker containers must be set up before the program can be used. Navigate to the Subparse root directory and run the following command to set up the Docker instances:
Installation steps
Usage
Command Line Options
Command line options that are available for subparse/parser/subparse.py:
Argument | Alternate | Required | Description |
---|---|---|---|
-h | -help | No | Displays the help menu |
-d SAMPLES_DIR | -directory SAMPLES_DIR | Yes | Directory of samples to parse |
-e ENRICHER_MODULES | -enrichers ENRICHER_MODULES | No | Enricher modules to run in addition to parsing |
-r | -reset | No | Resets (destroys) all data in the Elasticsearch cluster |
-v | -verbose | No | Shows verbose command-line output |
-s | -service-mode | No | Enters service mode, allowing more samples to be added to SAMPLES_DIR during processing |
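To make the table concrete, here is a minimal `argparse` sketch that mirrors it. It is an illustration only, not the code in subparse.py: the option spellings are copied from the table as printed, the `dest` names are hypothetical, and treating ENRICHER_MODULES as a single value is an assumption, since the table does not say how several enrichers are passed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the option table (spellings as printed there)."""
    # add_help=False so the help option can be declared with the table's spellings.
    parser = argparse.ArgumentParser(prog="subparse.py", add_help=False)
    parser.add_argument("-h", "-help", action="help",
                        help="Displays the help menu")
    parser.add_argument("-d", "-directory", dest="samples_dir",
                        metavar="SAMPLES_DIR", required=True,
                        help="Directory of samples to parse")
    # Assumption: a single value; the real script may accept a list instead.
    parser.add_argument("-e", "-enrichers", dest="enricher_modules",
                        metavar="ENRICHER_MODULES", default=None,
                        help="Enricher modules to run in addition to parsing")
    parser.add_argument("-r", "-reset", dest="reset", action="store_true",
                        help="Resets all data in the Elasticsearch cluster")
    parser.add_argument("-v", "-verbose", dest="verbose", action="store_true",
                        help="Shows verbose command-line output")
    parser.add_argument("-s", "-service-mode", dest="service_mode",
                        action="store_true",
                        help="Keeps watching SAMPLES_DIR for new samples")
    return parser


if __name__ == "__main__":
    # Example invocation with a hypothetical samples directory.
    args = build_parser().parse_args(["-d", "samples", "-e", "STRINGEnricher", "-v"])
    print(args)
```

Only `-d` is marked as required in the table, so the sketch sets `required=True` for that option alone.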
View Results
Navigate to localhost:8080 to view Subparse’s results. If you have trouble viewing the site, make sure Docker is running and that no other process is using port 8080.
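When the page does not load, a quick first check is whether anything is listening on the port at all. The sketch below uses only the Python standard library; the function name `port_is_open` is hypothetical and not part of Subparse. It reports whether a TCP connection succeeds, not whether the listener is actually the Subparse viewer.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8080,
                 timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:8080")
    else:
        print("Nothing is listening on localhost:8080; check that Docker is running")
```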
Collecting general information
Before any parser runs, general information is collected about each sample, regardless of its file type. This information includes:
- MD5 hash of the sample
- SHA256 hash of the sample
- Sample name
- Sample size
- Extension of the sample
- Derived extension of the sample
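As a rough illustration of this step, the sketch below gathers most of the listed fields with the Python standard library. The function name and dictionary keys are hypothetical and do not reflect Subparse's actual index schema. The derived extension is left out because the way Subparse derives it is not described here.

```python
import hashlib
from pathlib import Path


def collect_general_info(path: str) -> dict:
    """Collect file-type-independent facts about one sample."""
    sample = Path(path)
    data = sample.read_bytes()  # fine for a sketch; large files may need chunked hashing
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "name": sample.name,
        "size": len(data),                         # size in bytes
        "extension": sample.suffix.lstrip("."),    # extension taken from the file name
    }
```

For example, `collect_general_info("samples/dropper.exe")` (a hypothetical path) would return a dictionary whose `name` is `dropper.exe` and whose `extension` is `exe`.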
Parser Modules
A parser runs only against samples whose file type matches the file types that the parser supports. For example, PE files automatically have the PEParser run against them because the PEParser is set up to examine PE file types.
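The matching idea can be sketched as a lookup from file type to parser name. How Subparse actually identifies file types is not described here, so the sketch below stands in with well-known leading magic bytes. The table and function names are hypothetical, and only the parser names come from this document.

```python
from typing import Optional

# Well-known leading bytes for each format, paired with the parser named in the text.
MAGIC_TO_PARSER = (
    (b"\x7fELF", "ELFParser"),                            # ELF executables
    (b"MZ", "PEParser"),                                  # MS-DOS / PE executables
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLEParser"),   # OLE2 compound files
    (b"{\\rtf", "OLEParser"),                             # RTF documents
)


def select_parser(header: bytes) -> Optional[str]:
    """Return the name of the parser whose magic bytes match, or None."""
    for magic, parser_name in MAGIC_TO_PARSER:
        if header.startswith(magic):
            return parser_name
    return None  # no parser is compatible with this sample
```

A sample that matches no entry simply gets no parser, which mirrors the rule that parsers run only against compatible file types.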
Default Modules
ELFParser
The default parsing module for ELF files. Information collected:
OLEParser
The default parsing module for OLE and RTF formatted files. It uses OLETools to retrieve data. Information collected:
PEParser
The default parsing module for PE files. It runs against files whose type matches or includes one of the following: PE32, MS-Dos. Information collected:
Enricher Modules
Enricher modules are optional and run only when specified with the -e or -enrichers flag on the command line.
Default Modules
ABUSEEnricher
This enricher uses the Abuse.ch API and Malware Bazaar to gather more information about the sample(s). The information is then aggregated into the Elastic database.
CAPEEnricher
This enricher communicates with a CAPEv2 sandbox instance to gather more information through dynamic analysis. The information is then aggregated into the Elastic database, using the Kafka messaging service for background processing.
STRINGEnricher
This smart string enricher analyzes the sample for potentially interesting strings. It searches for the following categories: Images, Audio, Text, Executable Files, Code Calls, Compressed Files, IP addresses, IP address + port, website URLs, and command-line arguments.
YARAEnricher
This enricher uses a pre-compiled YARA file located at parser/src/enrichers/yara_rules. The pre-compiled file contains rules starting with and.
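To give a feel for what a string enricher does, the sketch below pulls printable runs out of raw bytes and keeps the ones that look like IP addresses (with an optional port) or website URLs. It is not the STRINGEnricher implementation. The patterns are deliberately loose, the names are hypothetical, and only two of the listed categories are covered.

```python
import re

# Runs of four or more printable ASCII characters, similar to the Unix `strings` tool.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Loose IPv4 pattern with an optional :port suffix; octet ranges are not validated.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d{1,5})?\b")
# Loose http/https URL pattern.
URL = re.compile(r"https?://[^\s\"'<>]+")


def find_interesting_strings(data: bytes) -> dict:
    """Return IP-like and URL-like strings found in the sample bytes."""
    found = {"ip": [], "url": []}
    for run in PRINTABLE_RUN.findall(data):
        text = run.decode("ascii")
        found["ip"].extend(IPV4.findall(text))
        found["url"].extend(URL.findall(text))
    return found
```

Extracting printable runs first keeps the regular expressions away from binary noise, at the cost of missing strings stored in wide-character encodings.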
Developing Custom Parsers & Enrichers
Subparse’s web view is built with Bootstrap for its CSS, so any Bootstrap CSS works when you create the Vue.js file for a custom Parser/Enricher. To help you get started, we provide an example of each, along with a few widgets that help standardize how information is presented. Each Vue.js file acts as a data template and dynamically displays the information produced by its Parser/Enricher.
Notice: You must strictly adhere to the naming conventions for both file and class names. If your Parser/Enricher fails to execute, check this first: the Parser/Enricher name must be the same across all files and classes.
Logging
The logger object is a singleton implementation of the default Python logger. The Python logging documentation provides more information. Subparse uses the logger's level-based output methods. These are:
- Debug
- Warning
- Error
- Critical
- Exception
- Log
- Info
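The methods listed above correspond to the standard library's `logging.Logger` methods. The sketch below calls each one through a logger obtained with `logging.getLogger`. How Subparse hands its logger object to a parser or enricher is not shown in this document, so the logger name used here is hypothetical.

```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s %(name)s: %(message)s")

# getLogger returns the same object every time it is called with the same name,
# which is what gives the logger its singleton behaviour.
logger = logging.getLogger("subparse.example")  # hypothetical logger name

logger.debug("detailed diagnostic output")
logger.info("normal progress message")
logger.warning("something unexpected, processing continues")
logger.error("an operation failed")
logger.critical("the program may not be able to continue")
logger.log(logging.INFO, "same as info(), with the level passed explicitly")

try:
    int("not a number")
except ValueError:
    # exception() logs at ERROR level and appends the current traceback.
    logger.exception("could not convert value")
```

Note that `exception` is meant to be called from inside an `except` block so that it has a traceback to attach.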
ACKNOWLEDGEMENTS
- This research and its co-authors were supported by NSA Grant H98230-20-1-0426.