My doctoral research is on 'Applications of Deep Learning in Cyber Security using Ensemble Methods', and is being supervised at the Institute for Artificial Intelligence, De Montfort University (DMU), Leicester. It focuses on combining deep learning with blended machine learning techniques to improve anomaly detection in large event flow datasets. There is a range of sub-problems within the scope, including feature selection optimisation, topology optimisation, analysis of optimal hybrid models, and large-scale geo-temporal data handling. This extends a project started during my master's in Intelligent Systems & Robotics, also at DMU. Related papers will be posted here.
Flow Simulation
The supporting project uses a mix of live traffic feeds, public datasets and simulated data. It includes a discrete event protocol flow simulator and a modular detection engine, written in Go and Python, and makes use of Go's excellent concurrency model.
This framework generates high rates of network traffic as aggregated flows, with modelled protocol states and behaviour. All timing, protocol definitions, anomalies, and event distributions can be controlled using user-defined policies, including the ability to model endpoint distributions and classes. The simulator can introduce subtle changes in asymmetry, burstiness, entropy, misuse of state and so on. It also allows command sequencing to be examined when investigating masquerade detection strategies.
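As a rough illustration of the policy-driven idea described above, the following Python sketch generates timestamped flow events from a user-defined policy, with exponential inter-arrival times and an injected burst of anomalous traffic. It is not taken from the project's code: `FlowPolicy`, `generate_flows`, and all field names are hypothetical, and the real simulator models far more (protocol state, endpoint classes, entropy, and so on).

```python
import random
from dataclasses import dataclass


@dataclass
class FlowPolicy:
    """Hypothetical policy record; not the project's actual schema."""
    mean_rate: float      # baseline flow arrivals per second
    burst_start: float    # time (s) at which the anomalous burst begins
    burst_end: float      # time (s) at which the burst ends
    burst_factor: float   # rate multiplier applied during the burst
    protocols: tuple = ("tcp", "udp", "icmp")
    weights: tuple = (0.7, 0.25, 0.05)  # relative protocol mix


def generate_flows(policy, duration, seed=0):
    """Yield (timestamp, protocol, in_burst) tuples in time order.

    Arrivals follow a piecewise Poisson process. The rate for each
    step is chosen from the current time, so the switch into and out
    of the burst is approximate by at most one inter-arrival gap.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    t = 0.0
    while True:
        in_burst = policy.burst_start <= t < policy.burst_end
        rate = policy.mean_rate * (policy.burst_factor if in_burst else 1.0)
        t += rng.expovariate(rate)  # exponential inter-arrival time
        if t >= duration:
            return
        proto = rng.choices(policy.protocols, weights=policy.weights)[0]
        yield (t, proto, policy.burst_start <= t < policy.burst_end)
```

For example, `FlowPolicy(mean_rate=10.0, burst_start=5.0, burst_end=6.0, burst_factor=20.0)` passed to `generate_flows` with `duration=10.0` produces a ten-second trace whose sixth second is far denser than the rest, which gives a detector a labelled burst to find.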
Models can be calibrated directly with, and even fed from, live traffic traces, including translation and statistical analysis of public datasets. The framework includes both machine learning and domain expert detection and classification modules, interfaced using standardised APIs. Modules can be arranged as linear 'stacks' or clusters, with or without feedback, enabling tests to be performed on the relative merits of ensemble detection models versus isolated detectors.
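To make the idea of a standardised module interface more concrete, here is a minimal Python sketch in which a rule-based detector and a statistical detector share one `score` method and are combined by weighted averaging. Everything in it is an assumption for illustration: the `Detector` interface, the class names, the flow fields `pps` and `bytes`, and the averaging rule are hypothetical and do not describe the project's actual API. Feedback between modules is not modelled.

```python
from typing import Protocol, Sequence


class Detector(Protocol):
    """Hypothetical common interface: return an anomaly score in [0, 1]."""
    def score(self, flow: dict) -> float: ...


class RateThresholdDetector:
    """Domain-expert style rule: flag flows above a packet-rate limit."""
    def __init__(self, max_pps: float):
        self.max_pps = max_pps

    def score(self, flow: dict) -> float:
        return 1.0 if flow["pps"] > self.max_pps else 0.0


class ZScoreDetector:
    """Simple statistical detector, standing in for a learned model."""
    def __init__(self, mean: float, std: float):
        self.mean = mean
        self.std = std

    def score(self, flow: dict) -> float:
        z = abs(flow["bytes"] - self.mean) / self.std
        return min(z / 3.0, 1.0)  # 3 standard deviations maps to 1.0


class Ensemble:
    """Weighted average of member scores.

    It exposes the same score() method as its members, so ensembles
    can themselves be nested inside other ensembles.
    """
    def __init__(self, detectors: Sequence[Detector], weights=None):
        self.detectors = list(detectors)
        self.weights = list(weights) if weights else [1.0] * len(self.detectors)

    def score(self, flow: dict) -> float:
        total = sum(w * d.score(flow)
                    for w, d in zip(self.weights, self.detectors))
        return total / sum(self.weights)
```

Because each detector can be scored on its own or through `Ensemble`, the same labelled trace can be run through both arrangements to compare an isolated detector against the combined one.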
Further Work
It is likely that some of the code will be made available as open source once the simulation model is sufficiently well calibrated.