Your data model can make or break your application, but it is secondary to your infrastructure. In theory, your infrastructure should dictate your initial design decisions and shape your data model. In practice, developers often find themselves tweaking both in parallel, which can lead to some bizarre and terrible design patterns.
|PacketTotal 1.0 ran on a two-node ElasticSearch cluster, with a local retention backend for raw PCAPs.|
This was the case with the original version of PacketTotal, and it has led to some scalability issues. Since the initial release we have addressed several of these by adding redundancy, load balancing, and caching, but that doesn't solve the underlying issue: our infrastructure was designed for a few concurrent users, not dozens.
|PacketTotal 2.0 ran on a much more robust ElasticSearch cluster, and migrated much of its raw PCAP processing and retention to AWS serverless infrastructure.|
To date, PacketTotal has focused primarily on static PCAP analysis. As we collect, categorize, and enrich this data, it has become clear that the data also has value in aggregate. A few use cases:
|Malware Archive gives you insight into malicious traffic from a variety of sources.|
- Understanding how the tactics, techniques, and procedures (TTPs) of malicious adversaries evolve over time.
- Identifying top threats and their targeted sectors.
- Dynamically detecting IOCs through a heuristic-based approach.
- Dynamically creating new signatures based on "known bad" and "likely bad" traffic.
- Creating archives that categorize types of traffic of interest to students and researchers.
To accomplish this, we have taken two steps. First, we have begun migrating our existing data to a higher-availability ElasticSearch cluster and removing some previous bottlenecks on our network. Second, we have re-indexed our data and mapped it to field-specific data types. This dramatically improves search performance and accuracy, as well as our ability to correlate across datasets, allowing us to start delivering on some of the use cases above.
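As a rough illustration of what mapping to field-specific data types can look like, here is a sketch of an ElasticSearch index mapping for connection records extracted from a PCAP. The field names and the `suspicious_traffic_query` helper are hypothetical and are not PacketTotal's actual schema; the field types (`date`, `ip`, `integer`, `keyword`, `long`) are standard ElasticSearch types. The benefit shows up at query time: because `dst_ip` is mapped as `ip`, a `term` query can match an entire CIDR block, which a dynamically mapped string field (indexed as `text` with a `keyword` sub-field) cannot do.

```python
import json

# Hypothetical mapping for PCAP-derived connection records. Field names are
# illustrative assumptions; the "type" values are standard ElasticSearch types.
CONNECTION_MAPPING = {
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "src_ip": {"type": "ip"},        # enables CIDR and range matching
            "dst_ip": {"type": "ip"},
            "src_port": {"type": "integer"},
            "dst_port": {"type": "integer"},
            "protocol": {"type": "keyword"},  # exact-match values, e.g. "tcp"
            "bytes": {"type": "long"},
            "pcap_sha256": {"type": "keyword"},
            "tags": {"type": "keyword"},      # e.g. "known bad", "likely bad"
        }
    }
}


def suspicious_traffic_query(cidr, since):
    """Hypothetical helper: connections into `cidr` with timestamp >= `since`.

    Both clauses sit in a bool filter, so they restrict results without
    affecting relevance scoring.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # On an `ip` field, a term query accepts CIDR notation.
                    {"term": {"dst_ip": cidr}},
                    # ElasticSearch date math such as "now-7d" works here.
                    {"range": {"timestamp": {"gte": since}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(CONNECTION_MAPPING, indent=2))
    print(json.dumps(suspicious_traffic_query("10.0.0.0/8", "now-7d"), indent=2))
```

In this sketch, `CONNECTION_MAPPING` would be supplied as the body when creating an index, and the query body would be sent to that index's search endpoint to find connections into `10.0.0.0/8` over the last seven days.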
The new infrastructure is still undergoing testing and will not be put into production until mid-March. In the meantime, stay tuned for the beta API release later this month, which we will be making available to anyone interested!