There are gaps in your cyber security, and they are not where you think they are. No doubt you have invested in all the required protection elements, and everything should be just fine. But when an attack happens, the defenses may fail, systems crash, and services go down.
Why does this happen? In large networks especially, different specialists contribute to building the system at different times, which can leave gaps. Software updates may lag behind. Most importantly, the world evolves fast, and so do the hackers.
How can this be avoided? What can be improved? The answer is to test your network while it is actually operational. A controlled version of a real attack, run against the operational network, lets you test cyber security and find and remove the gaps. The following example shows how unexpected gaps were found, and how fixing them leads not only to stronger security but also to improved capacity.
Usually, systems are stress tested before they go into operational use. Sometimes this early testing is omitted for lack of time or resources. Stress testing a network can seem like a waste of time. However, fixing problems in a crashed network takes longer and costs more. On the bright side, even an operational network can be tested, and the results can be used to improve both the cyber resilience and the capacity of the system.
Case study: DDoS attack on an operational network
An attack against an operator’s network had knocked down its defense systems. We suggested testing the operational network to find out what had caused the crash. The test was scheduled for the night, when traffic was at its lowest. Our equipment was set up, experts from both parties were present, and a good amount of coffee was available. The cyber security stress test was ready to start.
Together we selected the top five priority DDoS scenarios to run against the system. We agreed to run each scenario in load steps of 20%, 40%, 60%, 80% and 100%. Each sprint would take a few minutes, giving us a clear picture of how the network reacted to the attack load. We monitored various components in the network, tracking CPU load, memory consumption, the number of transmitted and received packets, alarms, stateful message responses, and so on.
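The stepped test plan described above can be sketched in a few lines of Python. This is only an illustration of the structure, not the tooling used in the test: the scenario names are placeholders (the article does not name the five scenarios), and the 180-second sprint length is an assumed stand-in for "a few minutes".

```python
from dataclasses import dataclass
from itertools import product

# Load steps from the article, as a percentage of the full attack load.
LOAD_STEPS_PERCENT = (20, 40, 60, 80, 100)


@dataclass(frozen=True)
class TestRun:
    scenario: str       # placeholder name, not a real scenario from the test
    load_percent: int   # attack load for this sprint
    duration_s: int     # sprint length in seconds (assumed value)


def build_test_plan(scenarios, duration_s=180):
    """Return one sprint per (scenario, load step), scenario by scenario."""
    return [
        TestRun(scenario, load, duration_s)
        for scenario, load in product(scenarios, LOAD_STEPS_PERCENT)
    ]


# Five hypothetical scenario names standing in for the top five DDoS scenarios.
scenarios = [f"scenario_{i}" for i in range(1, 6)]
plan = build_test_plan(scenarios)

for run in plan[:5]:
    print(run.scenario, f"{run.load_percent}%", f"{run.duration_s}s")
```

Five scenarios at five load steps give 25 sprints in total, which at a few minutes each fits comfortably into a single night-time test window.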
As the load grew, network operation became unstable, but the alarm systems did not indicate it. The first problems appeared at load levels as low as 30-40%. We also found clear individual bottlenecks in the network, along with some peculiar packet drops, memory leaks, and a significant performance slowdown during the test run. In an extreme example, network performance was worse at 40% load than at 80% load.
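Both observations, degradation without an alarm and worse performance at a lower load than at a higher one, are easy to check for automatically once per-step results have been recorded. The following Python sketch shows one possible way to do it. The `StepResult` fields, the 0.95 threshold, and all the numbers in the sample data are invented for illustration and are not measurements from the test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResult:
    load_percent: int
    success_ratio: float  # share of legitimate requests answered, 0.0 to 1.0
    alarm_raised: bool


def silent_degradations(results, min_success_ratio=0.95):
    """Load steps where service degraded but no alarm was raised."""
    return [
        r.load_percent
        for r in results
        if r.success_ratio < min_success_ratio and not r.alarm_raised
    ]


def inversions(results):
    """Pairs (low, high) where the lower load performed worse than the higher."""
    ordered = sorted(results, key=lambda r: r.load_percent)
    return [
        (low.load_percent, high.load_percent)
        for i, low in enumerate(ordered)
        for high in ordered[i + 1:]
        if low.success_ratio < high.success_ratio
    ]


# Invented sample data, for illustration only.
results = [
    StepResult(20, 0.99, False),
    StepResult(40, 0.70, False),
    StepResult(60, 0.85, False),
    StepResult(80, 0.80, True),
    StepResult(100, 0.60, True),
]

print(silent_degradations(results))  # [40, 60]
print(inversions(results))           # [(40, 60), (40, 80)]
```

An inversion such as (40, 80) is worth investigating on its own, because performance that does not fall steadily with load usually points to a specific component misbehaving rather than to plain overload.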
The test produced two important findings: the alarm configuration did not raise an alert about the attack early enough, and a bottleneck was limiting capacity. Reconfiguring the alarm system will improve security before the next attack. The bottleneck was found in an unexpected place. Although it was not the main cause of the crash, it was limiting network performance. By fixing this single bottleneck, the network capacity could be doubled at minimal cost. In summary, the stress test provided tangible results on how to strengthen security and gave a view of the capacity bottlenecks as well.
Hannu Saarenpää, Rugged Tooling