Data Management at Scale: What’s Possible vs. What’s Practical
When it comes to executing technology solutions at scale, enterprises typically end up weighing two things: what’s possible, given no constraints, and what’s practical. Possible scale is whatever is feasible technically, financially, or otherwise. Practical scale is what’s actually achievable within real-world constraints. Just because one can throw a hundred servers at a data operations solution does not mean it’s practical to do so. Total cost of ownership (TCO), including hardware, people, and infrastructure costs, is among the criteria that define what is practical for implementing a solution at scale.
The requirements for ingesting, storing, and analyzing machine data have changed dramatically over the last decade-plus. Fifteen years ago, having to manage 100 MB of inbound data per day, as well as terabytes of data at rest, was a big deal. Even then, while it was viable to use a relational database (RDBMS) as a data management solution, the cost, resources, and infrastructure needed to support those solutions became a problem. Eventually the technology evolved, for example by moving away from relational databases, in an effort to provide more practical approaches to managing large-scale data, both real-time streams and data at rest.
We see similar issues today when talking about managing multiple terabytes of streaming data per day with petabytes and more at rest. While some solutions can technically handle this type of scale, their requirements make them a non-starter in practice. Hundreds of servers and a team of experts to manage them drive up TCO and eat into the bottom line.
Logtrust was created as an evolutionary solution that takes a new approach to the underlying technologies needed to address today’s and tomorrow’s scale requirements in ways that are both viable and practical. For example, Logtrust has moved beyond older techniques typically used for ingest, such as parsing and indexing during loading, to create a highly optimized and efficient architecture that can ingest more than 2 TB/day on each server, with sub-second data availability for query. This is what true real-time data access at scale looks like, and it offers much lower TCO.
The graphs below, produced during an internal Logtrust performance benchmark, show data ingest rates reaching 100 TB/day with a maximum CPU utilization of only 20%. This was achieved on six standard data node servers.
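To put the benchmark figure in per-server terms, the arithmetic can be sketched as follows. This is only an illustration: it assumes decimal units (1 TB = 10^12 bytes) and an even spread of load across the six servers, neither of which is stated in the benchmark, and the function name is hypothetical.

```python
# Back-of-the-envelope conversion of the benchmark figure into per-server rates.
# Assumptions (not from the benchmark): decimal units and evenly balanced servers.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # assumption: decimal terabytes
BYTES_PER_MB = 10**6   # assumption: decimal megabytes


def per_server_ingest(total_tb_per_day: float, servers: int) -> dict:
    """Split a cluster-wide daily ingest volume evenly across servers."""
    if servers <= 0:
        raise ValueError("servers must be a positive integer")
    tb_per_day = total_tb_per_day / servers
    mb_per_second = tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB
    return {"tb_per_day": tb_per_day, "mb_per_second": mb_per_second}


if __name__ == "__main__":
    rates = per_server_ingest(total_tb_per_day=100, servers=6)
    print(f"{rates['tb_per_day']:.1f} TB/day per server")
    print(f"{rates['mb_per_second']:.0f} MB/s sustained per server")
```

Under those assumptions, 100 TB/day over six servers works out to roughly 16.7 TB/day, or about 193 MB/s sustained, per server.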
When it comes to data retention, Logtrust compression capabilities bring storage costs way down. The platform’s average 10:1 compression ratio across all types of data, combined with extremely efficient data classification, squeezes the most capacity out of storage solutions. Logtrust does not index data at ingest; instead, it tags stream data and writes it to disk, compressing the raw data files on disk. For almost zero processing cost, Logtrust creates a first-level natural index by using the tagging information to generate the directory structure in which those compressed files are stored. Logtrust also creates secondary indexes that tokenize all data. These secondary indexes take up roughly 2% of the size of the raw data. For example, if you bring in 100 GB of data, Logtrust compresses it to 10 GB and builds a secondary index of 2 GB, for a total storage cost of about 12 GB. In contrast, other solutions would consume 50 GB, or more, for the same input.
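The storage arithmetic in this example can be reproduced with a short calculation. The sketch below simply applies the two figures quoted above, an average 10:1 compression ratio and a secondary index of about 2% of the raw data size; the function and parameter names are illustrative and are not part of any Logtrust API.

```python
# Estimate on-disk footprint from the two ratios quoted in the text.
# Names are hypothetical; real ratios will vary with the data being stored.

def storage_footprint(raw_gb: float,
                      compression_ratio: float = 10.0,
                      index_fraction: float = 0.02) -> dict:
    """Return compressed size, secondary index size, and their total, in GB."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    compressed_gb = raw_gb / compression_ratio  # raw files after compression
    index_gb = raw_gb * index_fraction          # index sized against raw data
    return {
        "compressed_gb": compressed_gb,
        "index_gb": index_gb,
        "total_gb": compressed_gb + index_gb,
    }


if __name__ == "__main__":
    result = storage_footprint(raw_gb=100)
    print(f"compressed: {result['compressed_gb']:.0f} GB")
    print(f"index:      {result['index_gb']:.0f} GB")
    print(f"total:      {result['total_gb']:.0f} GB")
```

For 100 GB of input this gives 10 GB of compressed data plus a 2 GB index, about 12 GB in total, which is under a quarter of the 50 GB cited for other solutions.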
Query and analysis are the drivers behind data operations requirements. Because the Logtrust ingest, compression, and indexing architectures are so efficient, query engines can reside on the same servers that provide those services. No external search servers are required, and the platform still delivers scan rates of more than 1 million events per second (EPS) over petabytes of data at rest. Again, this lowers TCO at scale. This is in addition to Logtrust’s easy-to-use self-service analytics, which eliminate the need for specialized analytics expertise.
The graph below shows typical per-server query performance: a scan rate of 1.5 million EPS against 340 GB of data.
Can your solution provide the same performance? Find out by speaking with a Logtrust expert, or learn more about the Logtrust performance stress test by downloading the first Logtrust Performance Perspectives report at https://www.logtrust.com/logtrust-perspectives-performance/ or reaching out to firstname.lastname@example.org.