In the “Hot” Seat: Stuck with Cold Data Tiering
There seems to be no limit to the ingenuity of today’s cyber attacks, which are increasingly capable of finding and exploiting any weakness in your infrastructure. To defend against the next generation of cyber security threats, you may need instant access to the totality of your company’s data. This means rethinking some very entrenched paradigms in information management, such as data temperature tiering, to make all data immediately accessible all the time, or always ‘hot.’
Before we talk about “rethinking” data temperature tiering, it’s important to understand why it’s currently done the way it is. Simply put, “tiering” is prioritizing storage according to the importance of the data. Its primary purpose is to reduce the cost of storage, and its second, related purpose is to ensure that critical data can be accessed quickly. To understand how these two purposes are connected, let’s talk about data temperature.
Hot and Cold data
Hot data is typically critical information that needs to be accessed frequently and quickly, and consequently is stored on the fastest (and most expensive) medium available. At the other end of the spectrum, “cold” data is so designated because it’s accessed less frequently, and when it does need to be accessed it’s typically OK if it takes a while to retrieve. As logic would dictate, it’s stored on cheaper, slower media, often even tape. And of course there’s “warm” data that falls somewhere in between these two. The tiering process, according to MapR’s William Peterson, requires a deep understanding of how data is used, and analyst John O’Brien points out that it often requires multiple departments to reconcile conflicting policies.
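To make the hot/warm/cold distinction concrete, here is a minimal sketch of a tiering rule based on the two factors described above: how often data is accessed and how long a reader can wait for it. The `Dataset` fields, the threshold values, and the `assign_tier` function are all assumptions made up for illustration; real tiering policies are negotiated per organization and are usually far more involved.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    reads_per_day: float   # how often the data is accessed
    max_latency_s: float   # how long a reader can tolerate waiting for it


# Hypothetical thresholds, chosen only for illustration.
HOT_READS_PER_DAY = 100
COLD_READS_PER_DAY = 1
HOT_LATENCY_S = 1.0
COLD_LATENCY_S = 3600.0


def assign_tier(ds: Dataset) -> str:
    """Return 'hot', 'warm', or 'cold' for a dataset."""
    # Frequently read or latency-sensitive data goes on the fastest medium.
    if ds.reads_per_day >= HOT_READS_PER_DAY or ds.max_latency_s <= HOT_LATENCY_S:
        return "hot"
    # Rarely read data that can wait a long time goes on the cheapest medium.
    if ds.reads_per_day <= COLD_READS_PER_DAY and ds.max_latency_s >= COLD_LATENCY_S:
        return "cold"
    # Everything else lands in between.
    return "warm"
```

Note that nothing in a rule like this looks at security relevance; it only encodes how important the data is to day-to-day operations.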
Discarding traditional notions of Hot vs. Cold data
So far we’ve talked about tiering in terms of the internal importance of the data, but how do you tier for security purposes? The hard truth is that when it comes to security, our notions of hot vs. cold are irrelevant because they’re based on how critical the datasets are to us. If you think like a hacker, you’re much less concerned with how important the data might be to an organization than you are with finding a way into the system. So the data targeted by hackers might not be what you’d normally access all of the time. In fact, in some cases data that’s considered “cold” might be a preferred attack vector simply because an intrusion there is less likely to be detected (much as a burglar is more likely to sneak in through your basement window than your front door). Additionally, malware “sleeper cells” may sit dormant alongside infrequently used data, waiting for the opportune time to launch an attack. ProjectSauron, for instance, went undetected for five years.
In order to analyze and prevent attacks, organizations must be able to very quickly identify anomalies in all kinds of datasets that might have come from any endpoint. To gain a clear picture, they have to be able to immediately compare incoming data to historical data to determine what’s normal and what isn’t, and the lengthy procedure of retrieving datasets from cold or warm storage is just not tenable under these circumstances. This may help explain the results of a recent survey we conducted with 451 Research, which found that “speed” topped the list of companies’ requirements for machine data analytics: 69 percent of respondents want machine real-time (latency of less than five seconds).
If all data is to be ‘always hot’, how do you manage the system without bankrupting your company paying for premium storage? Isn’t it kind of like saying “every employee has to fly first class all the time”? And, if our survey with 451 Research is any indicator, it’s a challenge that companies aren’t anywhere near ready to take up: 53 percent said their technology wasn’t even capable of human real-time analytics (latency of five seconds to five minutes).
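As a rough illustration of why historical data has to be at hand, here is a minimal sketch of a baseline comparison: a new measurement is flagged when it sits far outside what the history says is normal. The z-score rule, the `is_anomalous` function, the threshold of 3.0, and the sample counts are all assumptions for illustration; this is not a description of how any particular product detects anomalies.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough history to define "normal"
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # History is perfectly flat, so any deviation is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical hourly failed-login counts for a single endpoint.
history = [4, 6, 5, 7, 5, 6, 4, 5]
typical = is_anomalous(history, 6)     # within the usual range
suspicious = is_anomalous(history, 40) # far outside it
```

The check itself is trivial; the hard part is that `history` has to be available in seconds rather than hours, which is exactly what cold storage can’t offer.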
This major hurdle must be addressed at the architecture level. In building the Logtrust platform we tackled this problem by leveraging high data compression (15:1 to 20:1) combined with native cloud compute to enable our clients to store extremely high volumes of data in an immediately queryable form. By doing so, we’ve lowered the cost per volume for storage to a point where you can have “always on” service and “always hot data” that’s immediately queryable without any hot/warm/cold storage segregation. In sum:
- The complexity and ingenuity of today’s attacks have rendered traditional data tiering unfeasible
- An architecture that allows you to keep all of your data hot all of the time is one that will allow you to gain the kind of clear, complete picture necessary to combat the next generation of cyber attackers
- Keeping data in “always on, always hot” mode is economically unfeasible, unless you address some fundamental issues at the architecture level
- Logtrust has leveraged data compression along with native cloud compute to solve this problem and deliver a truly “tierless” system
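The economics behind the compression argument come down to simple arithmetic: if data compresses N:1, each terabyte of original data occupies only 1/N of a terabyte on disk. The sketch below works that through for the two ratios quoted above. The price figure is a made-up placeholder, not a real quote, and the calculation deliberately ignores compute, indexing, and replication overhead.

```python
def effective_cost_per_tb(raw_price_per_tb, compression_ratio):
    """Cost of holding 1 TB of uncompressed data once it is stored
    compressed at `compression_ratio`:1."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_price_per_tb / compression_ratio


# Hypothetical price of fast storage, in arbitrary currency units per TB.
FAST_STORAGE_PRICE = 100.0

for ratio in (15, 20):
    cost = effective_cost_per_tb(FAST_STORAGE_PRICE, ratio)
    print(f"{ratio}:1 compression -> {cost:.2f} per TB of original data")
```

Under these assumptions, fast storage ends up costing between a fifteenth and a twentieth of its list price per terabyte of original data, which is the kind of gap that can make a single hot tier competitive with a tiered setup.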
Blowing the lid off of security analytics with an always on, always hot system
With the Logtrust platform, storing all data in always on, always hot mode makes it feasible to analyze historical data alongside streaming data in real time, performing forensic operations that detect, hunt and counter hackers as they progress through the cyber kill chain. It also boosts the speed and effectiveness of investigations through interactive, intuitive visual analytics that let you correlate a variety of datasets and that can be quickly deployed and modified as hackers alter their tactics. And because time is of the essence, companies gain maximum benefit from the fact that our architecture lets them conduct real-time advanced search and analytics without complex coding.
For example, engineers scrambling to secure their organizations against the impact of the “Heartbleed” flaw mostly looked at the problem in terms of the secure web server. However, as the flaw made it possible to steal encryption keys, they really needed to look at the whole infrastructure. Logtrust’s “always on, always hot” storage architecture allows visibility over the entire infrastructure regardless of when the data was generated or its volume, so that it can be analyzed through a single point. Because Logtrust can look at data over very long periods of time, organizations can see patterns associated with Heartbleed that might otherwise go undetected.
Eliminating the need for tiered storage opens up a world of possibilities for real-time security analytics. So while attackers may be getting more sophisticated, keeping data always on, always hot will enable you to stay ahead by employing a variety of new real-time security analytics techniques to stop them in their tracks.