A medium-data framework for security research and development teams.
Workbench focuses on simplicity, transparency, and easy on-site customization. As an open source Python project, it provides lightweight task management, execution, and pipelining for a loosely coupled set of Python classes.
The workbench project takes the workbench metaphor seriously. It's a platform that allows you to do work; it provides a flat work surface that supports your ability to combine tools (Python modules) together. In general a workbench never constrains you (oh no! you can't use those 3 tools together!); on the flip side, it doesn't hold your hand either. Using the workbench software is a bit like using a Lego set: you can put the pieces together however you want, AND adding your own pieces is super easy!
The workbench server is extremely robust to worker failure. In fact it can run without many of the dependencies, so you can set up a server quickly with 'Minimum Install' and then later do a 'Full Install'.
$ brew install mongodb
$ sudo apt-get install mongodb
$ sudo apt-get install python-dev
$ sudo apt-get install g++
$ pip install workbench --pre
$ workbench_server
That's it, the workbench server will come up and is ready to start servicing requests. Note: Some workers will fail to load, but that is fine; to have all workers run, see 'Full Install'.
$ pip install workbench --pre
$ workbench (this runs the Workbench CLI)
That’s it!
If you have a workbench server set up (somewhere), you can now start the workbench CLI client, or any of the existing clients (in workbench/clients), or even start writing your own clients against that server (see Making your own Client).
$ brew install mongodb
$ brew install yara
$ brew install libmagic
$ brew install bro
Important
Put the bro executable in your PATH (/usr/local/bin or wherever bro is)
$ sudo apt-get install mongodb
$ sudo apt-get install python-dev
$ sudo apt-get install g++
$ sudo apt-get install libssl0.9.8
- Bro IDS: In general, the Bro Debian package files are WAY too locked down, with dependencies on exact versions of libc6 and python2.6. We have a more 'flexible' version: Bro-2.2-Linux-x86_64_flex.deb.
sudo dpkg -i Bro-2.2-Linux-x86_64_flex.deb
- If using the Debian package above doesn’t work out:
- Check out the Installation tutorial bro_install
- or this one bro_starting
- or go to official Bro Downloads www.bro.org/download/
Important
Put the bro executable in your PATH (/opt/bro/bin or wherever bro is)
The indexers 'Neo4j' and 'ElasticSearch' are optional. We strongly suggest you install both of them, but we also appreciate that there are cases where that's not feasible.
$ brew install elasticsearch
$ pip install -U elasticsearch
$ brew install neo4j
- Note: You may need to install the Java JDK 1.7 (Oracle JDK 1.7 DMG for Macs).
- Neo4j: See official instructions for Neo4j here
- Note: You may need to install Java JDK 1.7. If you have Java 1.7 installed and the error says otherwise, run
$ update-alternatives --config java
and select Java 1.7.
ElasticSearch:
- wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.deb
- sudo dpkg -i elasticsearch-1.2.1.deb
- sudo update-rc.d elasticsearch defaults 95 10
- sudo /etc/init.d/elasticsearch start
- If you hit any issues, see the official ElasticSearch webpage
Note: Workbench is continuously tested with python 2.7. We’re currently working on Python 3 support (Issue 92).
For quick spinup just pull Workbench down from pip. If you're going to do development:
$ pip install workbench --pre
$ workbench_server

OR

$ cd workbench
$ python setup.py develop
$ workbench_server
Robomongo
Robomongo is a shell-centric cross-platform MongoDB management tool. Simply put, it is a handy GUI to inspect your mongodb.
- http://robomongo.org/
- download and follow install instructions
- create a new connection to localhost (default settings fine). Name it as you wish.
Python Modules
Note: If you get a bunch of clang errors about unknown arguments or 'cannot link a simple C program', add the following flags:
$ export CFLAGS=-Qunused-arguments
$ export CPPFLAGS=-Qunused-arguments

**Errors when running Tests**

If, when running the worker tests, you get errors like 'MagicError: regexec error 17, (illegal byte sequence)', it's an issue with libmagic 5.17; revert to libmagic 5.16. Using brew on Mac:
$ cd /usr/local
$ brew versions libmagic
# Copy the line for version 5.16, then paste it (for me it looked like the following line)
$ git checkout bfb6589 Library/Formula/libmagic.rb
$ brew uninstall libmagic
$ brew install libmagic
$ pip install workbench --pre
$ workbench_server
There are about a dozen example clients showing how to use workbench on PCAPs, PE files, PDFs, and log files. We even have a simple Node.js client (looking for Node devs to pop some pull requests :).
$ cd workbench/clients
$ python simple_workbench_client.py [-s tcp://mega.server.com]
PCAP to Graph (A short teaser)
Adding a new Worker (super hawt)
WIP Notebooks
Workers should adhere to the following naming conventions (not enforced):
- If a worker operates on a specific sample type, start the name with that type.
Examples: pcap_bro.py, pe_features.py, log_meta.py
- If a worker is experimental, prefix the name with 'x_' (x_pcap_razor.py).
- If a worker produces a view (report), prefix the name with 'view_'.
Examples: view_log_meta.py, view_pdf.py, view_pe.py
Although the Workbench repository has dozens of clients (see workbench/clients), there is NO official client to workbench. Clients are examples of how YOU can just use ZeroRPC from the Python, Node.js, or CLI interfaces. See ZeroRPC.
import zerorpc

# Connect to the workbench server (local host, default port)
c = zerorpc.Client()
c.connect("tcp://127.0.0.1:4242")

# Store a sample and request the 'pcap_meta' worker output for it
with open('evil.pcap','rb') as f:
    md5 = c.store_sample('evil.pcap', f.read())
print c.work_request('pcap_meta', md5)
Output from above ‘client’:
{'pcap_meta': {
'encoding': 'binary',
'file_size': 54339570,
'file_type': 'tcpdump (little-endian) - version 2.4 (Ethernet, 65535)',
'filename': 'evil.pcap',
'import_time': '2014-02-08T22:15:50.282000Z',
'md5': 'bba97e16d7f92240196dc0caef9c457a',
'mime_type': 'application/vnd.tcpdump.pcap'
}}
brew install freetype
brew install gfortran
pip install -r requirements_notebooks.txt
Go to Starbucks...
Unit testing, sub-pipeline tests, and full pipeline tests
$ tox
We have no idea why you occasionally see this pop up in the server output. To our knowledge it has literally no impact on any functionality or robustness. If you know anything about this, please help us out by opening an issue and pull request. :)
ERROR:zerorpc.channel:zerorpc.ChannelMultiplexer, unable to route event:
_zpc_more {'response_to': '67d7df3f-1f3e-45f4-b2e6-352260fa1507', 'zmqid':
['\x00\x82*\x01\xea'], 'message_id': '67d7df42-1f3e-45f4-b2e6-352260fa1507',
'v': 3} [...]
The vt_query.py worker uses a shared ‘low-volume’ API key provided by SuperCowPowers LLC. When running the vt_query worker the following warning happens quite often:
"VirusTotal Query Error, no valid response... past per min quota?"
If you’d like to use the vt_query worker on a regular basis, you’ll have to put your own VirusTotal API key in the workbench/server/config.ini file.
When you first run workbench, it copies default.ini to config.ini within the workbench/server directory. You can make local changes to this file without worrying about it getting overwritten on the next 'git pull'. You can also store API keys in it, because it never gets pushed back to the repository.
# Example/default configuration for the workbench server
[workbench]
# Server URI (server machine ip or name)
# Example: mybigserver or 12.34.56.789
server_uri = localhost
# DataStore URI (datastore machine ip or name)
# Example: mybigserver or 12.34.56.789
datastore_uri = localhost
# Neo4j URI (Neo4j Graph DB machine ip or name)
# Example: mybigserver or 12.34.56.789
neo4j_uri = localhost
# ElasticSearch URI (ELS machine ip or name)
# Example: mybigserver or 12.34.56.789
els_uri = localhost
# DataStore Database
# Example: customer123, ml_talk, pdf_deep
database = workbench
# Storage Limits (in MegaBytes, 0 for no limit)
worker_cap = 10
samples_cap = 200
# VT API Key
# Example: 93748163412341234v123947
vt_apikey = 123
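If you want to pull these values into Python (say, for a custom client or tooling), a minimal sketch using Python 2.7's standard ConfigParser looks like this; the path and section name simply mirror the example file above:

# Minimal sketch (illustrative, not the server's actual loading code)
import ConfigParser

config = ConfigParser.ConfigParser()
config.read('workbench/server/config.ini')  # path assumes the repo layout above

server_uri = config.get('workbench', 'server_uri')
datastore_uri = config.get('workbench', 'datastore_uri')
worker_cap = config.getint('workbench', 'worker_cap')  # in MegaBytes
vt_apikey = config.get('workbench', 'vt_apikey')
print 'Workbench server: %s (datastore: %s)' % (server_uri, datastore_uri)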
The developers of Workbench feel Medium-Data is a sweet spot: large enough to be meaningful for model generation, statistics, and predictive performance, but small enough to allow for low latency, fast interaction, and streaming 'hyperslabs' from server to client.
Many of our examples (notebooks) illustrate the streaming generator chains that let a client (Python script, IPython notebook, Node.js, CLI) pull a filtered subset of the data from the server.
Once you efficiently (streaming with zero-copy) populate a Pandas dataframe, you have access to a very large set of statistics, analysis, and machine learning Python modules (statsmodels, Pandas, Scikit-Learn).
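As a rough illustration of that pattern, the sketch below feeds a generator of result rows straight into a DataFrame; the stream_results generator is a hypothetical stand-in for a Workbench client stream:

import pandas as pd

def stream_results():
    """Hypothetical stand-in for a streamed Workbench result set."""
    for i in range(1000):
        yield {'md5': 'md5_%04d' % i, 'file_size': 1024 * i, 'packer_id': i % 3}

# from_records consumes the generator row by row
df = pd.DataFrame.from_records(stream_results())
print df.describe()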
Workbench server will run great on a laptop, but when you're working with a group of researchers the most effective model is a shared group server. A beefy Dell server with 192 GB of memory and a 100 TB disk array will allow the workbench server to effectively process in the neighborhood of a million samples (PE files, PDFs, PCAPs, SWFs, etc.).
As you’ve noticed from many of the documents and notebooks, Workbench often defaults to using a local server. There are several reasons for this approach:
All clients have a -s, --server argument:
$ python pcap_bro_indexer.py # Hit local server
$ python pcap_bro_indexer.py -s my_server # Hit remote server
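A minimal sketch of a client honoring that flag might look like this (illustrative only; list_samples is one of the Workbench commands documented later on this page):

import argparse
import zerorpc

parser = argparse.ArgumentParser()
parser.add_argument('-s', '--server', default='localhost',
                    help='machine where the workbench server is running')
args = parser.parse_args()

# Connect to whichever server was requested (4242 is the port shown earlier)
c = zerorpc.Client()
c.connect('tcp://%s:4242' % args.server)
print c.list_samples()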
If you always hit a remote server, simply change the config.ini in the clients directory to point to the group server:
server_uri = localhost (change this to whatever)
Okay, I've changed my config.ini file, and now it shows up when I do a '$ git status'. How do I get git to ignore it?
git update-index --assume-unchanged workbench/clients/config.ini
git update-index --assume-unchanged workbench/server/config.ini
In general Workbench should be treated like any other Python module, and it shouldn't add any complexity to existing development/QA/deployment models. One suggestion (to be taken with a grain of salt) is simply to use git branches:
$ git checkout develop (on develop server)
$ git checkout master (on prod server)
Please go to the GitHub Issues page: https://github.com/SuperCowPowers/workbench/issues.
Warning
Caution!: The repository contains malicious data samples; be careful, exclude the workbench directory from AV, etc...
git clone https://github.com/supercowpowers/workbench.git
Workbench uses the 'GitHub Flow' model.
- Fork the repo on GitHub
- git clone git@github.com:your_name_here/workbench.git
$ git checkout -b my-awesome
$ git push -u origin my-awesome
$ <code for a bit>; git push
$ <code for a bit>; git push
$ tox (this will run all the tests)
- Go to github and hit ‘New pull request’
- Someone reviews it and says ‘AOK’
- Merge the pull request (green button)
Warning
Make sure workbench/data/memory_images/exemplar4.vmem isn't there; remove it if necessary!
$ pip install -e .
$ python setup.py sdist
$ cd dist
$ tar xzvf workbench-0.x.y.tar.gz
$ cd workbench-0.x.y/
$ python setup.py install
$ workbench_server
$ pip install tox
$ tox (pass all tests)
$ python setup.py publish
$ pip install workbench --pre
$ workbench_server (in one terminal)
$ pip install pytest-cov
$ cd workbench/workers
$ ./runtests (in another terminal)
This client generates customer reports on all the samples in workbench.
This client calls a bunch of help commands from workbench
This client gets metadata about log files.
This client pushes PCAPs -> Bro -> ELS Indexer.
This client gets the raw bro logs from PCAP files.
This client extracts URLs from PCAP files (via Bro logs).
This client pulls PCAP 'views' (views summarize what's in a sample).
This client pulls PCAP meta data.
This client pushes PCAPs -> MetaData -> ELS Indexer.
This client pushes PE Files -> ELS Indexer.
This client looks for PEid signatures in PE Files.
This client generates a similarity graph from features in PE Files.
Add the given file_list to workbench as samples, also adding them as nodes.
Parameters: file_list – the list of files to be added as samples and nodes.
Returns: a list of md5s.
Compute Jaccard similarities between all the observations in the feature list.
Parameters: feature_list – a list of dictionaries, each having structure as {'md5': String, 'features': list of Strings}.
Returns: a list of dictionaries with structure as {'source': md5 String, 'target': md5 String, 'sim': Jaccard similarity Number}.
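For reference, a simplified sketch of that computation (a stand-in that matches the structure described above, not the client's exact code):

def jaccard_sims(feature_list):
    """Pairwise Jaccard similarities.
    Input:  [{'md5': str, 'features': [str, ...]}, ...]
    Output: [{'source': md5, 'target': md5, 'sim': float}, ...]
    """
    sims = []
    for i, source in enumerate(feature_list):
        for target in feature_list[i + 1:]:
            a, b = set(source['features']), set(target['features'])
            sim = len(a & b) / float(len(a | b)) if (a | b) else 0.0
            sims.append({'source': source['md5'], 'target': target['md5'], 'sim': sim})
    return sims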
This client pushes a big directory of different files into Workbench.
This client pushes a file into Workbench.
This encapsulates some boilerplate workbench client code.
This client shows workbench extracting files from a zip file.
Workbench Clients.
This module handles the mechanics around easily pulling in Bro Log data.
The read_log method is a generator (in the Python sense) for rows in a Bro log; because of this, it's memory efficient and does not read the entire file into memory.
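The pattern looks roughly like the sketch below (a simplified illustration of a Bro log row generator, not the module's actual implementation):

def read_log(log_path, delimiter='\t'):
    """Yield one Bro log row at a time as a dict (simplified sketch)."""
    with open(log_path) as log_file:
        field_names = None
        for line in log_file:
            line = line.rstrip('\n')
            if line.startswith('#fields'):
                field_names = line.split(delimiter)[1:]  # column names
            elif line.startswith('#') or not field_names:
                continue  # skip other header/footer lines
            else:
                yield dict(zip(field_names, line.split(delimiter)))

# Because it's a generator, even multi-gigabyte logs stream row by row:
#   for row in read_log('http.log'):
#       print row['host']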
DataStore class for WorkBench.
Bases: object
DataStore for Workbench.
Currently tied to MongoDB, but making this class 'abstract' should be straightforward, and we could consider another backend.
Initialization for the Workbench data store class.
Store a sample into the datastore.
Parameters: filename – name of the file, input_bytes – the actual bytes of the sample, type_tag – the type of the sample ('exe', 'pcap', 'pdf', ...).
Returns: the md5 digest of the sample.
Clean data in preparation for serialization.
Deletes items whose value is a BSON, datetime, dict, or list instance, or whose key starts with __.
Parameters: data – sample data to be serialized.
Returns: cleaned data dictionary.
Clean data in preparation for storage.
Deletes items whose key contains a '.' or is '_id', as well as items whose value is a dictionary or a list.
Parameters: data – sample data dictionary to be cleaned.
Returns: cleaned data dictionary.
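In dictionary-comprehension form, that storage cleaning amounts to something like the following sketch (illustrative, not the class's exact code):

def clean_for_storage(data):
    """Drop keys MongoDB can't store ('.' in the key, or '_id') and
    container values (dicts/lists), per the description above."""
    return {key: value for key, value in data.items()
            if '.' not in key and key != '_id'
            and not isinstance(value, (dict, list))}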
Get the sample from the data store.
This method first fetches the data from the datastore, then cleans it for serialization, and finally updates it with a 'raw_bytes' item.
Parameters: md5 – the md5 digest of the sample to be fetched from the datastore.
Returns: the sample dictionary.
Raises: RuntimeError – either the sample is not found or the gridfs file is missing.
Get a window of samples not to exceed size (in MB).
Parameters: type_tag – the type of samples, size – the maximum total size of the window in MB.
Returns: a list of md5s.
Checks if data store has this sample.
Parameters: md5 – the md5 digest of the required sample.
Returns: True if a sample with this md5 is present, else False.
List all samples that meet the predicate or all if predicate is not specified.
Parameters: predicate – match samples against this predicate (or all if not specified).
Returns: a list of dictionaries with matching samples, e.g. {'md5': md5, 'filename': 'foo.exe', 'type_tag': 'exe'}.
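For example, a ZeroRPC client could pull just the PE files; the MongoDB-style predicate shape here is an assumption for illustration:

import zerorpc

c = zerorpc.Client()
c.connect('tcp://127.0.0.1:4242')

# Predicate matching samples with type_tag 'exe' (assumed predicate form)
for sample in c.list_samples({'type_tag': 'exe'}):
    print sample['md5'], sample['filename']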
Store the output results of the worker.
Get the results of the worker.
Returns: dictionary of the worker result.
Return a list of all md5s matching the type_tag ('exe', 'pdf', etc.).
Parameters: type_tag – the type of sample.
Returns: a list of md5s for the matching samples.
Run periodic operations on the data store.
Operations like making sure collections are capped and indexes are set up.
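With MongoDB as the backend (see above), those operations look roughly like this in pymongo; the collection name 'samples' is an assumption for illustration:

import pymongo

client = pymongo.MongoClient('localhost')
db = client['workbench']

# Make sure md5 lookups stay fast
db['samples'].create_index('md5')

# Cap the collection's size (samples_cap from config.ini, in MB);
# convertToCapped errors out if the collection is already capped
samples_cap_mb = 200
db.command('convertToCapped', 'samples', size=samples_cap_mb * 1024 * 1024)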
ELSIndexer class for WorkBench.
Bases: object
ELS Stub.
Stub Indexer Initialization.
Bases: object
ELSIndexer class for WorkBench.
Initialization for the Elastic Search Indexer.
Parameters: hosts – list of connection settings.
NeoDB class for WorkBench.
Bases: object
NeoDB Stub.
NeoDB Stub.
Bases: object
NeoDB indexer for Workbench.
Initialization for NeoDB indexer.
Parameters: uri – the URI to connect to NeoDB.
Raises: RuntimeError – when the connection to NeoDB fails.
Add the node with name and labels.
Parameters: node_id – the unique id of the node, name – the display name of the node, labels – a list of labels.
Raises: NotImplementedError – when adding labels is not supported.
Checks if the node is present.
Parameters: node_id – id for the node.
Returns: True if the node with node_id is present, else False.
A simple plugin manager. Rolling my own for three reasons: 1) Environmental scan did not give me quite what I wanted. 2) The super simple examples didn’t support automatic/dynamic loading. 3) I kinda wanted to understand the process :)
Workbench: Open Source Security Framework
Bases: object
Workbench: Open Source Security Framework.
Initialize the Framework.
Store a sample into the DataStore.
Parameters: filename – name of the file (used purely as metadata, not for lookup), input_bytes – the actual bytes of the sample, e.g. f.read(), type_tag – 'exe', 'pcap', 'pdf', 'json', 'swf', or ...
Returns: the md5 of the sample.
Get a sample from the DataStore.
Parameters: md5 – the md5 of the sample.
Returns: a dictionary of metadata about the sample, which includes a ['raw_bytes'] key containing the raw bytes.
Get samples from the DataStore.
Parameters: type_tag – the type of samples ('pcap', 'exe', 'pdf'), size – the size of the window in MegaBytes (10 = 10MB).
Returns: a list of md5s representing the newest samples within the size window.
Do we have this sample in the DataStore?
Parameters: md5 – the md5 of the sample.
Returns: True or False.
List all samples that meet the predicate or all if predicate is not specified.
Parameters: predicate – match samples against this predicate (or all if not specified).
Returns: a list of dictionaries with matching samples, e.g. {'md5': md5, 'filename': 'foo.exe', 'type_tag': 'exe'}.
Index a stored sample with the Indexer.
Parameters: md5 – the md5 of the sample, index_name – the name of the index.
Returns: Nothing.
Index worker output with the Indexer.
Parameters: worker_name – 'strings', 'pe_features', whatever, md5 – the md5 of the sample, index_name – the name of the index, subfield – index just this subfield (None for all).
Returns: Nothing.
Search a particular index in the Indexer.
Parameters: index_name – the name of the index, query – the query against the index.
Returns: all matches to the query.
Add a node to the graph with name and labels.
Parameters: node_id – the unique node_id, e.g. 'www.evil4u.com', name – the display name of the node, e.g. 'evil4u', labels – a list of labels, e.g. ['domain', 'evil'].
Returns: Nothing.
Does the Graph DB have this node?
Parameters: node_id – the unique node_id, e.g. 'www.evil4u.com'.
Returns: True/False.
Add a relationship between two nodes; source and target must already exist (see add_node), and 'rel' is the name of the relationship, 'contains' or whatever.
Parameters: source_id – the unique node_id of the source, target_id – the unique node_id of the target, rel – name of the relationship.
Returns: Nothing.
Make a work request for an existing stored sample.
Parameters: worker_name – 'strings', 'pe_features', whatever, md5 – the md5 of the sample, subkeys – just return a subfield, e.g. 'foo' or 'foo.bar' (None for all).
Returns: the output of the worker, or just the requested subfield of the worker output.
Store a sample set (which is just a list of md5s).
Note: All md5s must already be in the data store.
Parameters: md5_list – a list of the md5s in this set (all must exist in the data store).
Returns: the md5 of the set (the actual md5 of the set).
Store a sample set (which is just a list of md5s).
Parameters: md5_list – a list of the md5s in this set (all must exist in the data store).
Returns: the md5 of the set (the actual md5 of the set).
Gives you the current datastore URI.
Returns: the URI of the data store currently being used by Workbench.
rekall_adapter: Helps Workbench utilize the Rekall Memory Forensic Framework. See the Google GitHub: http://github.com/google/rekall. All credit for good stuff goes to them, all credit for bad stuff goes to us. :)
Bases: object
RekallAdapter: Helps utilize the Rekall Memory Forensic Framework.
Initialization.
JSON Meta worker
Logfile Meta worker
Memory Image base worker. This worker utilizes the Rekall Memory Forensic Framework. See Google Github: http://github.com/google/rekall All credit for good stuff goes to them, all credit for bad stuff goes to us. :)
Memory Image ConnScan worker. This worker utilizes the Rekall Memory Forensic Framework. See Google Github: http://github.com/google/rekall All credit for good stuff goes to them, all credit for bad stuff goes to us. :)
Memory Image DllList worker. This worker utilizes the Rekall Memory Forensic Framework. See Google Github: http://github.com/google/rekall All credit for good stuff goes to them, all credit for bad stuff goes to us. :)
Memory Image Meta worker. This worker utilizes the Rekall Memory Forensic Framework. See Google Github: http://github.com/google/rekall All credit for good stuff goes to them, all credit for bad stuff goes to us. :)
Memory Image ProcDump worker. This worker utilizes the Rekall Memory Forensic Framework. See Google Github: http://github.com/google/rekall All credit for good stuff goes to them, all credit for bad stuff goes to us. :)
Memory Image PSList worker. This worker utilizes the Rekall Memory Forensic Framework. See Google Github: http://github.com/google/rekall All credit for good stuff goes to them, all credit for bad stuff goes to us. :)
Meta worker
MetaDeep worker
PcapBro worker
pcap_graph worker
pcap_http_graph worker
PE Classify worker (just a placeholder, not a real classifier at this point)
PE SSDeep Similarity worker
PE Features worker. This class pulls static features out of a PE file using the python pefile module.
Bases: object
Create instance of PEFileWorker class. This class pulls static features out of a PE file using the python pefile module.
Init method
Set the dense feature list that the Python pefile module should extract. This is really just sanity-check functionality: these are the features you are expecting to get, and a warning will be emitted if some of them are missing.
Set the dense feature list that the Python pefile module should extract.
Set the sparse feature list that the Python pefile module should extract. This is really just sanity-check functionality: these are the features you are expecting to get, and a warning will be emitted if some of them are missing.
Set the sparse feature list that the Python pefile module should extract.
This Python class codifies a bunch of rules around suspicious static features in a PE file. The rules don't indicate malicious behavior; they simply flag things that may be used by a malicious binary. Many of the indicators used were inspired by the material in the 'Practical Malware Analysis' book by Sikorski and Honig, ISBN-13: 978-1593272906 (available on Amazon :)
Description:
PE_WARNINGS = PE module warnings verbatim
MALFORMED = the PE file is malformed
COMMUNICATION = network activities
CREDENTIALS = activities associated with elevating or attaining new privileges
KEYLOGGING = activities associated with keylogging
SYSTEM_STATE = file system or registry activities
SYSTEM_PROBE = getting information from the local system (file system, OS config)
SYSTEM_INTEGRITY = compromises the security state of the local system
PROCESS_MANIPULATION = indicators associated with process manipulation/injection
PROCESS_SPAWN = indicators associated with creating a new process
STEALTH_LOAD = indicators associated with loading libraries, resources, etc. in a sneaky way
ENCRYPTION = any indicators related to encryption
COM_SERVICES = COM functionality or running as a service
ANTI_DEBUG = anti-debugging indicators
Bases: object
Create instance of Indicators class. This class uses the static features from the pefile module to look for weird stuff.
Note: All methods that start with ‘check’ will be automatically included as part of the checks that happen when ‘execute’ is called.
Init method of the Indicators class.
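The auto-inclusion mentioned in the note above can be implemented with simple introspection; a minimal sketch of the pattern (hypothetical stub checks, not the actual class):

class Indicators(object):
    def check_checksum_mismatch(self):
        return {'category': 'MALFORMED', 'hit': False}  # illustrative stub

    def check_image_size(self):
        return {'category': 'MALFORMED', 'hit': False}  # illustrative stub

    def execute(self):
        """Run every method whose name starts with 'check'."""
        return [getattr(self, name)() for name in dir(self)
                if name.startswith('check')]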
Checking for a checksum that doesn’t match the generated checksum
Checking if the reported image size matches the actual image size
Checking if any of the sections go past the total size of the image
Checking if the PE imports known methods associated with elevating or attaining new privileges
Checking if the PE imports known methods associated with elevating or attaining new privileges
Checking if the PE imports known methods associated with changing system state
Checking if the PE imports known methods associated with probing the system
Checking if the PE imports known methods associated with system security or integrity
Checking if the PE imports known methods associated with anti-debug
Checking if the PE imports known methods associated with COM or services
Checking if the PE imports known methods associated with process manipulation/injection
Checking if the PE imports known methods associated with spawning a new process
PE peid worker, uses the peid_userdb.txt database of signatures
Strings worker
SWFMeta worker: This is a stub; the real class (under the experimental directory) has too many dependencies.
Unzip worker
URLS worker: Tries to extract URLs from strings output.
view worker
view_customer worker
view_log_meta worker
view_memory worker
view_meta worker
view_pcap worker
view_pcap_details worker
view_pdffile worker
view_pe worker
view_zip worker
VTQuery worker
Yara worker