These days, many organizations have established reasonably efficient Security Operations capabilities, collecting security events from their systems, profiling user and machine behavior, and running investigative searches to support their incident response.
A major cybersecurity incident normally requires an organization to involve digital forensics experts, who usually need to spend a significant amount of time analyzing a specific system and presenting their findings in the form of a timeline (when it happened), forensic artifacts (what happened), and, if they are lucky enough to get the data for it, attacker tools, techniques and procedures (how it happened).
Traditionally, digital forensics assumes the paradigm of a ‘crime scene’ that needs to be investigated: a ‘post mortem’ analysis of prior events in an attempt to reconstruct how an endpoint or a user account was compromised. It is usually performed under the presumption that the threat has already been contained and that the affected endpoint is now isolated from the network. A forensic examination of a single endpoint can take hours or even days, and it does not scale well as the number of affected or suspected endpoints increases. Furthermore, these days the affected endpoints can be anywhere, from physical PCs to virtual containers in a multi-cloud environment.
What if you have 200 compromised endpoints (out of your 10,000-endpoint fleet), and you don’t know what else the threat actor could have deployed in your environment to ensure persistence?
Most organizations simply do not have the capabilities (tools and resources) to conduct security incident investigations at a deeper, forensic level. They have to accept the paradigm of an opportunistic attack and assume that the problem is completely solved if they can simply wipe the affected endpoint remotely.
At the same time, many threat actors these days are actively seeking long-term persistence. Can you ever be confident that the threat has not propagated to other parts of your network, perhaps in a completely different form? If you have ever been involved in an incident response, you know how hard it is to rule out all the possibilities, and how much guesswork is usually involved as everyone rushes to declare that everything is back to normal. How many times have you wondered whether other endpoints and users might also be affected, and whether there is a practical way to verify it?
While SIEM platforms usually provide good coverage in terms of breadth of visibility, many security events typically remain uncovered. Do you know of many organizations monitoring all east-west network traffic? Or all browser history for all users? Performing full PowerShell payload analysis? Tracking every file accessed by every user on the network? Inventorying all software running on remote workers’ laptops?
Building an efficient Security Operations and Detection & Response capability is a significant challenge in itself. Whatever your efficiency criteria are, your SecOps is likely running in a naturally established balance between the volume of available information (security event data) and its utility (the organization’s ability to practically extract value in the form of high-fidelity detections).
Getting more data means more visibility, which is generally a good thing: your chances of detecting something malicious increase as you can see more events across your attack surface. However, SOC operators have to detect anomalies (catch the fish) in a very large volume of data (a big ocean), so more data also means more noise (false positives). In practice, information utility is a function of the available people, processes and technologies. Unable to process all available data, day-to-day SecOps simply seeks to cover as much of the attack surface as possible (more log sources), hoping to detect something there, while a forensic-like investigation capability relies on deeper visibility.
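The "more data means more noise" point is easy to quantify with back-of-the-envelope math. The sketch below uses entirely hypothetical numbers (event volume and false-positive rate are assumptions, not measurements from any real SOC) to show how even a tiny misfire rate on a large event stream produces an alert queue no team can triage:

```python
def daily_false_positives(events_per_day: int, fp_rate: float) -> float:
    """Expected number of false alerts per day for a given event volume.

    A simple expected-value estimate: each benign event independently
    triggers a false alert with probability fp_rate.
    """
    return events_per_day * fp_rate

# Hypothetical figures for illustration only: 50 million events/day and a
# detection pipeline that misfires on just 0.01% of benign events.
alerts = daily_false_positives(50_000_000, 0.0001)
print(f"{alerts:.0f} false alerts per day")  # prints "5000 false alerts per day"
```

Doubling the log sources at the same false-positive rate doubles that queue, which is exactly the visibility-versus-utility balance described above.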