Monday, March 6, 2017

The CrossSearch Algorithm

When I first started building PacketTotal I wanted to give users the ability to find packet-captures similar to the one they uploaded.

Early in the project I began writing a suite of algorithms to accomplish this in a reasonable amount of time. Unfortunately, at the time of the initial release I did not have enough data to test the algorithms effectively, and I was hesitant to release them to the community. With the number of submissions growing, I have now been able to test the tool more thoroughly.

The algorithms themselves are fairly straightforward. In a nutshell, they:
  1. Use a simple heuristic to identify promising search terms within the metadata of a particular capture.
  2. De-duplicate and aggregate these terms, constructing Lucene queries which can be used to search the Elastic backend.
  3. Run the queries against Elasticsearch.
  4. Group results by matched packet-capture, sorted so that the captures containing the most matched terms come first.
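
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual PacketTotal code: the heuristic (keep values that look like IP addresses or domain names), the function names, and the metadata field names are all hypothetical, and the `search` argument stands in for whatever call runs a Lucene query against Elasticsearch and returns the IDs of the matching captures.

```python
import re
from collections import Counter

# Assumed heuristic: values shaped like IPv4 addresses or domain names
# are treated as promising search terms. The real heuristic is not
# described in this post.
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
DOMAIN = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$", re.I)


def pick_terms(metadata):
    """Step 1: keep (field, value) pairs that look like useful pivots."""
    terms = []
    for record in metadata:
        for field, value in record.items():
            value = str(value)
            if IPV4.match(value) or DOMAIN.match(value):
                terms.append((field, value))
    return terms


def build_queries(terms):
    """Step 2: de-duplicate the terms and render each one as a Lucene
    field:"value" phrase query. The patterns above never admit quotes or
    backslashes, so no escaping is needed in this sketch."""
    return [f'{field}:"{value}"' for field, value in sorted(set(terms))]


def cross_search(metadata, search):
    """Steps 3 and 4: run every query through the caller-supplied
    search(query) function, then rank captures by how many distinct
    terms they matched, highest first."""
    hits = Counter()
    for query in build_queries(pick_terms(metadata)):
        for capture_id in set(search(query)):
            hits[capture_id] += 1
    return hits.most_common()
```

Passing the backend in as a plain function keeps the ranking logic testable without a running cluster; in practice `search` would wrap a query-string search against the Elastic index.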

The result is a powerful view that lets users pivot between captures that share similar attributes.
From a malware analysis standpoint, the CrossSearch algorithm could give researchers the ability to tie together two or more seemingly unrelated campaigns if, say, they shared a common C2 server or another IOC that could easily be missed in a manual analysis.

The first version of CrossSearch will be released later this week, once final integration testing is complete and the heuristic has had a bit more tuning. The tool will be available within the analysis console and will work against the currently selected view. For example, if the "Connections" tab is open, running a CrossSearch will use the terms found within "Connections" to search the backend.

CrossSearch is the first of many tools that will greatly improve the value of the data being gathered on