Ten “Design First” Principles from Strata/Hadoop World NYC
Several themes emerged from my conversations with more than 35 vendors at the Strata/Hadoop World show in New York City this week. (See list of vendors I visited at the end.) Topping the list: automated machine learning services, simplified streaming platforms, end-to-end data lake management software, and powerful, low-cost data and hardware infrastructure. More importantly, I spotted the emergence of ten “design first” principles that will guide the development of data-driven applications in the future.
If you want to see and feel the pace of innovation in the analytics field, just spend a few hours at a Strata+Hadoop World conference.
I’ve been covering business intelligence, analytics, and data management for 20+ years and Strata+Hadoop World made me feel like a novice. Of the more than 100 exhibitors on the show floor, I only recognized about two-thirds. Where did all the new vendors come from? Whose needs are they serving?
For that matter, I hardly recognized many established vendors. The oldest ones are pivoting abruptly, embracing open source, cloud, streaming, subscription pricing, and freemium business models. Most speak a language of Apache projects that I can hardly understand or keep up with. It’s all quite dizzying.
Themes and Market Segments
Before I describe the ten “design first” principles that emerged from the show, let me categorize in more detail the types of capabilities that vendors are delivering:
- Simplified streaming and stream-based analytic processing and alerting for the real-time enterprise.
- Automated creation of machine learning models that eliminate the need for data scientists or increase their productivity significantly.
- End-to-end machine learning development and operational environments that make predictive analytics more accessible to non-data scientists.
- Rent-a-data-scientist services via auctions or competitions (Kaggle and Experfy).
- Software that simplifies and manages the population of data lakes with trustworthy, governed data that is easily accessible to business users.
- Software or infrastructure that reduces the complexity of big data architectures (e.g., Lambda).
- Hybrid transaction-analytic processing databases (HTAP) that deliver fast queries against real-time data.
- Low-cost, fast, scalable databases and hardware designed for large volumes of streaming data.
In my years covering the space, I’ve discovered that the vendor community is usually about five to seven years ahead of the early majority market. But given the velocity of change in the technology and the eagerness of companies to reduce their IT costs with faster, better, cheaper tools, platforms, and infrastructure, I’m going to halve that number.
It’s pretty clear that there is a revolution going on in the analytics space. There is a lot of dust and debris flying everywhere, but the silhouette of the future is gradually emerging. The development of data-driven applications is starting to adhere to a number of key design principles.
Ten Design-First Principles
In the next several years, organizations will begin designing analytic environments around the following “design first” principles. Design first for:
- Real-time. Even if you need batch applications, build them on a streaming, event-driven infrastructure. It’s fast, cheap, and flexible.
- Prediction. Build analytic models into all business applications, creating a proactive data-driven enterprise that monetizes its data assets.
- APIs. Build applications using microservices and integrate them via standard application programming interfaces, creating highly flexible, extensible applications supported by a community of developers.
- Platform. With API-based applications, your environment is ultimately flexible. It can integrate with or support a multiplicity of internal and third-party applications or be embedded in other applications to create high-value, customized data-driven applications and ecosystems.
- Multiple Engines. Rather than force one engine to support many diverse workloads, run each workload on an optimal engine.
- Stationary Data. Once you ingest data, never move it. Query or process data where it lies, using query federation to unify disparate data on the fly or push-down optimization to match workloads with embedded engines.
- Multiple Analytic Tools. Standardize where it counts—on flexible semantic models—rather than toolsets. But where possible, choose tools with open APIs that don’t replicate data.
- Cloud. Design your application for the cloud and hybrid data processing.
- Web. This is an oldie-but-goodie that is now just a given: never use a desktop client.
- Mobile. Another oldie-but-goodie: design your application for mobile delivery using a responsive design…
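To make the "Real-time" principle a little more concrete, here is a minimal sketch in Python of what building batch on top of an event-driven design can look like. It is only an illustration under my own assumptions: the event shape (a customer ID and an order amount) and the function names `running_totals` and `batch_totals` are hypothetical and do not come from any vendor at the show. The idea is that the logic is written once against a stream of events, and a batch job is simply a replay of stored events through that same logic.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for illustration: (customer_id, order_amount)
Event = Tuple[str, float]


def running_totals(events: Iterable[Event]) -> Iterator[Dict[str, float]]:
    """Process events one at a time, yielding a snapshot of totals after each."""
    totals: Dict[str, float] = defaultdict(float)
    for customer_id, amount in events:
        totals[customer_id] += amount
        # Emit a copy so callers can react to every event in real time.
        yield dict(totals)


def batch_totals(stored_events: Iterable[Event]) -> Dict[str, float]:
    """Run a 'batch job' by replaying stored events through the streaming logic."""
    result: Dict[str, float] = {}
    for result in running_totals(stored_events):
        pass  # Keep only the final snapshot.
    return result


if __name__ == "__main__":
    history = [("acme", 10.0), ("globex", 5.0), ("acme", 2.5)]
    # Streaming use: act on each intermediate snapshot as events arrive.
    for snapshot in running_totals(history):
        print(snapshot)
    # Batch use: same code path, only the final result is kept.
    print(batch_totals(history))
```

Because both paths share one function, there is no separate batch code base to keep in sync with the streaming one, which is one reading of why an event-driven foundation is described above as flexible.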
…Logtrust – Analytical tool for log and text data; a more modern Splunk