Do the flow-bot: applying machine learning to internet-scale security analytics
The analysis of network flows for security is not new; both the network and security industries have used it for more than a decade. Flow technology was originally invented for high-speed switching, but it has also been used to detect Distributed Denial-of-Service (DDoS) attacks.
Recently, thanks to advances in machine learning and streaming analytics, flow analysis is getting renewed attention as a countermeasure to rapidly evolving cyber-attacks, beyond DDoS, launched by globally syndicated adversaries. Additionally, as internet traffic is encrypted at an accelerating rate, meta information such as network flow is becoming the only data available for analysis.
New large-scale network analytics
By applying machine learning to internet-scale network analytics, it is possible to produce high-quality, real-time blacklists of Command and Control (C&C) servers, detecting them up to two weeks earlier than major vendors do.
Internet-scale network analytics provides a more complete understanding of botnet infrastructures in real time, as they are being formed. This includes the location and nature of C&C servers, the bots under their control, and ultimately who is behind the infrastructure. It facilitates the quick detection of targeted attacks by flagging whether a given attack is random (i.e., it flows to many organisations) or targeted (i.e., it flows to a specific organisation).
The analysis enables us to block and ultimately take down the botnet infrastructure from which attackers launch many types of attacks, such as DDoS and ransomware distribution.
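The random-versus-targeted distinction can be illustrated with a minimal sketch. It assumes each flow record has already been reduced to a (source IP, destination organisation) pair; the function name `classify_sources`, the record layout and the `targeted_max_orgs` threshold are hypothetical choices for illustration, not a description of the production analytics.

```python
from collections import defaultdict

def classify_sources(flows, targeted_max_orgs=1):
    """Label each source by how many distinct organisations it contacts.

    flows: iterable of (src_ip, dst_org) pairs (hypothetical record layout).
    targeted_max_orgs: assumed cut-off; sources contacting at most this many
    organisations are labelled 'targeted', all others 'random'.
    """
    orgs_by_src = defaultdict(set)
    for src_ip, dst_org in flows:
        orgs_by_src[src_ip].add(dst_org)
    return {
        src: "targeted" if len(orgs) <= targeted_max_orgs else "random"
        for src, orgs in orgs_by_src.items()
    }

# Example data using documentation-reserved IP ranges.
flows = [
    ("203.0.113.5", "org-a"),
    ("203.0.113.5", "org-a"),
    ("198.51.100.7", "org-a"),
    ("198.51.100.7", "org-b"),
    ("198.51.100.7", "org-c"),
]
labels = classify_sources(flows)
```

A real system would need visibility across many organisations' traffic for this count to be meaningful, which is why internet-scale flow data matters here.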
Advancements in network flow analysis
So what is the key to effectively enhancing security with network flows? Access to a large amount of flow data helps; however, advances in the following two areas prove to be crucial:
Big streaming analytics platform
Open source projects in streaming analytics, such as Apache Kafka, Spark and Flink, have recently become very active and are producing innovative software. NTT Group, including companies such as NTT Security, Dimension Data and NTT DATA, actively participates in those projects and has built a scalable, fast streaming platform by leveraging best-of-breed open source software. The platform can easily handle pipeline processing of hundreds of thousands of flows per second and enables us to apply advanced analytics to large volumes of streaming data in a massively scalable manner.
Data quality improvement
Arguably, for machine learning, the quality of the training data is the most important factor determining overall performance. Correlating flow analysis with results from passive Domain Name System (DNS) data, to improve accuracy and coverage, is therefore imperative.
There is also now the possibility of integrating with internet-scale active scanning (in the IPv6 space), as well as catering for Operational Technology (OT). In OT environments, non-IP proprietary protocols are often used, but flow patterns are statistically predictable. Because of this predictability, OT environments lend themselves well to flow-based anomaly detection.
The number of ways in which Internet of Things (IoT) devices can help people and organisations is boundless. However, IoT devices pose new and unique security challenges due to their massive and ubiquitous installed base, as well as their limited computing resources. Continuing to invest in enhancing large-scale network analytics is essential, not only for IoT but also for other disruptive technologies.
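As a rough illustration of correlating flows with passive DNS, the sketch below attaches to each flow the domains that have been observed resolving to its destination IP. The record fields (`src_ip`, `dst_ip`, `bytes`), the in-memory `passive_dns` mapping and the function name `enrich_flows` are assumptions made for this example only.

```python
def enrich_flows(flows, passive_dns):
    """Attach passive-DNS domains to each flow by destination IP.

    flows: list of dicts with a 'dst_ip' key (hypothetical layout).
    passive_dns: dict mapping an IP address to the set of domains
    previously observed resolving to it (hypothetical layout).
    """
    enriched = []
    for flow in flows:
        # Sort for a stable order; unknown IPs get an empty list.
        domains = sorted(passive_dns.get(flow["dst_ip"], ()))
        enriched.append({**flow, "domains": domains})
    return enriched

passive_dns = {
    "192.0.2.10": {"update.example.net", "cdn.example.net"},
}
flows = [
    {"src_ip": "10.0.0.4", "dst_ip": "192.0.2.10", "bytes": 512},
    {"src_ip": "10.0.0.9", "dst_ip": "192.0.2.99", "bytes": 64},
]
enriched = enrich_flows(flows, passive_dns)
```

Flows whose destination has no passive DNS history come back with an empty domain list, which can itself be a useful feature for a downstream model.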
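To show why statistically predictable flows suit anomaly detection, here is a minimal sketch that learns a baseline from historical per-interval flow counts and flags observations that deviate strongly from it. The z-score method, the `z_threshold` of 3.0 and the function name `find_anomalies` are illustrative assumptions; the article does not state which detection technique is actually used.

```python
from statistics import mean, stdev

def find_anomalies(baseline, observed, z_threshold=3.0):
    """Return (index, value) pairs in `observed` that deviate from the baseline.

    baseline: historical per-interval flow counts (at least two values).
    observed: new per-interval flow counts to check.
    z_threshold: assumed cut-off on the absolute z-score.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    anomalies = []
    for i, value in enumerate(observed):
        if sigma > 0:
            z = (value - mu) / sigma
        else:
            # A perfectly constant baseline: any change is anomalous.
            z = 0.0 if value == mu else float("inf")
        if abs(z) > z_threshold:
            anomalies.append((i, value))
    return anomalies

baseline = [100, 102, 98, 101, 99, 100, 103, 97]
observed = [101, 99, 250, 100]
anomalies = find_anomalies(baseline, observed)
```

With this data the baseline has mean 100 and sample standard deviation 2, so only the interval with 250 flows exceeds the threshold. The tighter the normal traffic pattern, as is typical of OT networks, the smaller the deviation that can be detected.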