
The Two Layers of Managed Analytics: Why Only One Is Solved

By Ravi Chandran, CEO, XtremeData

When Redshift first appeared, the value proposition was simple: we'll take the operational burden of running an analytical data warehouse off your plate. No hardware to provision. No software to patch. No clusters to manage. You write SQL; we run it. The market responded. Redshift, Snowflake, BigQuery, and Databricks have together transformed enterprise analytics from an infrastructure project into a managed service.

That model is genuinely valuable, and we should be precise about what it accomplishes. But we should be equally precise about what it does not accomplish, because that gap appears in every analytics budget as a hidden cost, paid not to the cloud vendor but to the customer's own staff.

 

Two layers of management

Running a database, whether for transactions or analytics, requires two distinct categories of operational work. The first category is what we'll call infrastructure management: provisioning servers, installing and patching software, configuring replication, taking backups, handling failover, monitoring system health. These tasks are essential, and they don't depend on what's stored in the database; they are the same whether the workload is a billing system or a sales-forecasting warehouse.

The second category is what we'll call performance management: designing schemas, choosing sort keys, deciding partitioning and distribution strategies, defining indexes, tuning queries, redistributing data as scale grows. This work is intimately tied to the specific workload. The right partitioning strategy for an event-analytics workload is different from the right one for a multi-table BI workload; the right sort key for one query pattern is the wrong one for another.

Today's managed analytical services handle the first category beautifully. The second category is still on the customer. Read carefully through the documentation for Snowflake's clustering keys, Redshift's distribution and sort keys, or Databricks' Z-ordering and partition design, and the pattern becomes clear: the engine helps, and sometimes automates parts of the work, but the strategic choices and ongoing maintenance remain the customer's responsibility. They require staff, expertise, and continuous attention.

The hidden cost

The economic consequence of this gap is significant but rarely shows up as a line item on the analytics-service invoice. It shows up on the customer's payroll, in the form of a database engineering team whose job is to make the managed service perform.

A mid-market enterprise running a substantial analytics workload typically carries a team of DBAs or data engineers whose primary job is performance tuning. As data volumes grow, that team grows with it. The cost compounds with data: the bigger the workload, the more performance-tuning labor is required, and that labor sits on the customer's books, not the cloud provider's.

There is a second hidden cost that is harder to quantify but real. Every newly ingested batch of data has to be assimilated into the existing performance design. If data is to remain sorted, each batch must be sort-merged into the existing data. As the distribution of values shifts, clustering keys go stale and have to be redesigned. New query patterns require new tuning. This isn't a one-time setup cost; it's a continuous tax that grows with the size and complexity of the workload.
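The continuous cost of keeping data sorted can be illustrated with a deliberately simplified Python model, assuming the whole table is a single sorted run of integer keys. Real engines typically merge within partitions or blocks, which shrinks the effect but does not remove it. The function names (`merge_batch_into_sorted`, `ingest`) are hypothetical and do not describe any vendor's implementation. What the sketch shows is that each batch is sorted and then merged against the rows already stored, so the work per batch depends on the size of the table, not just the size of the batch.

```python
import heapq
from typing import List, Tuple


def merge_batch_into_sorted(table: List[int], batch: List[int]) -> Tuple[List[int], int]:
    """Sort an incoming batch on its key and merge it into an already-sorted table.

    Returns the new table and the number of rows written. In this simplified
    single-run model (an assumption), that is every existing row plus every new row.
    """
    sorted_batch = sorted(batch)                      # sort the new batch on the sort key
    merged = list(heapq.merge(table, sorted_batch))   # sort-merge against the stored rows
    return merged, len(merged)


def ingest(batches: List[List[int]]) -> Tuple[List[int], int]:
    """Ingest batches one at a time, keeping the table fully sorted after each one."""
    table: List[int] = []
    total_rows_written = 0
    for batch in batches:
        table, written = merge_batch_into_sorted(table, batch)
        total_rows_written += written
    return table, total_rows_written


if __name__ == "__main__":
    table, written = ingest([[5, 1], [4, 2], [6, 3]])
    print(table, written)  # [1, 2, 3, 4, 5, 6] 12
```

In this model, ingesting six rows in three batches writes twelve rows. With N equal batches, the total number of rows written grows roughly with N squared, which is the growing tax the paragraph describes.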

Customers pay this cost because they have to. None of the major managed services has eliminated it, and the assumption, until now, has been that this is simply how analytical databases work.

What an engine that solves both layers looks like

The technical question worth asking is whether the performance layer can be designed away. It can, but only if the engine is built from the ground up around that goal. Retrofitting an existing engine is difficult because the relevant assumptions are deeply embedded in the algorithms that create and execute query plans. It is the architectural choices in the original design that create the dependency on customer-side tuning.

An engine designed to manage the performance layer automatically must be built on an architecture that directly addresses the reasons DBAs intervene with sorting, clustering, and hash distribution in the first place. All of these interventions exist to minimize the volume of data moving through two hardware subsystems, storage and network, that are slow relative to the compute subsystems (CPU and memory). Sorting and clustering let the engine skip blocks of irrelevant rows before they are read from storage. Hash distribution reduces data movement between cluster nodes by co-locating rows that share a join key on the same node. The goal of minimizing data volumes is the right one, but these interventions require skilled DBAs with a priori knowledge of both the data and the query patterns. And as data volumes grow, the interventions need constant reassessment and re-tuning, which is expensive in both labor and system time.
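The two mechanisms named above can be sketched in Python, assuming zone maps (a min/max entry per storage block) as the pruning structure and a simple modulo-of-hash rule for node placement. These are common textbook forms, not a description of how Redshift, Snowflake, or dbX implements them, and every name here is hypothetical. The sketch shows why both depend on a priori knowledge. Pruning only works when rows are stored in an order that matches the query's predicate. A join avoids network traffic only when both tables were distributed on the join key.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Block:
    """One storage block plus its zone-map entry (min and max of the stored values)."""
    rows: List[int]
    lo: int
    hi: int


def build_blocks(values: List[int], block_size: int) -> List[Block]:
    """Cut a column into fixed-size blocks in the order the rows are stored."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def blocks_read(blocks: List[Block], lo: int, hi: int) -> int:
    """Count blocks that must be read for the predicate lo <= x <= hi.

    A block whose [min, max] range does not overlap the predicate is skipped
    using only its zone-map entry, without reading its rows from storage.
    """
    return sum(1 for b in blocks if b.hi >= lo and b.lo <= hi)


def hash_node(key: int, num_nodes: int) -> int:
    """Hash distribution: place a row on the node chosen by hashing its join key."""
    return hash(key) % num_nodes


def rows_moved_for_join(keys: List[int],
                        placement: Callable[[int, int], int],
                        num_nodes: int) -> int:
    """Count rows of one table that must cross the network to meet their join
    partners, assuming the other table is hash-distributed on the same key.

    `placement(i, key)` returns the node where row i currently lives.
    """
    return sum(1 for i, key in enumerate(keys)
               if placement(i, key) != hash_node(key, num_nodes))


if __name__ == "__main__":
    in_key_order = list(range(100))
    interleaved = [(i % 10) * 10 + i // 10 for i in range(100)]
    print(blocks_read(build_blocks(in_key_order, 10), 20, 29))  # 1
    print(blocks_read(build_blocks(interleaved, 10), 20, 29))   # 10

    keys = [3, 1, 2, 0, 7, 5, 6, 4]
    print(rows_moved_for_join(keys, lambda i, k: i % 4, 4))            # round-robin: 4
    print(rows_moved_for_join(keys, lambda i, k: hash_node(k, 4), 4))  # hash on key: 0
```

With rows stored in key order, the range predicate touches one block out of ten. With the same values stored in an interleaved order, every block's min/max range overlaps the predicate and nothing is pruned. Likewise, round-robin placement forces half of these rows across the network for the join, while hash placement on the join key moves none.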

An engine that delivers performance without such DBA interventions must have several key properties. First, to make data access efficient at the storage level, data must be stored natively in compressed columns, asynchronous I/O must overlap storage access with processing, and the engine must support flexible, dynamic partitioning. Second, to mitigate the effect of slow networks, compressed columns are again essential, along with efficient transfers using well-sized payload packets and overlapping of network transfers with processing. These properties are easy to understand, but implementing them requires careful up-front design; they cannot be bolted on later.
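The storage-side properties above can be sketched in Python, assuming run-length encoding as the column compression scheme and a single background thread as the asynchronous reader. The article does not say which compression scheme or I/O mechanism dbX uses, and `rle_encode`, `read_chunk`, and `sum_column_pipelined` are hypothetical names. The loop issues the read for chunk i+1 before processing chunk i, so the (modeled) storage latency overlaps with computation. The same double-buffering pattern applies to network transfers.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from typing import List, Tuple

Run = Tuple[int, int]  # (value, number of consecutive repeats)


def rle_encode(column: List[int]) -> List[Run]:
    """Run-length encode a column chunk: [5, 5, 5, 2] -> [(5, 3), (2, 1)]."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def read_chunk(storage: List[List[Run]], index: int, latency_s: float) -> List[Run]:
    """Stand-in for a storage read; the sleep models device latency."""
    time.sleep(latency_s)
    return storage[index]


def sum_column_pipelined(storage: List[List[Run]], latency_s: float = 0.0) -> int:
    """Sum a compressed column, reading chunk i+1 in the background while
    chunk i is being processed (double buffering)."""
    if not storage:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(read_chunk, storage, 0, latency_s)
        for i in range(len(storage)):
            runs = pending.result()  # wait until chunk i has arrived
            if i + 1 < len(storage):
                # issue the next read before processing, so I/O overlaps compute
                pending = io_pool.submit(read_chunk, storage, i + 1, latency_s)
            # aggregate directly on the runs (value * run length), no decompression
            total += sum(value * length for value, length in runs)
    return total


if __name__ == "__main__":
    column = [5, 5, 5, 2, 2, 9, 9, 9, 9, 9]
    storage = [rle_encode(column[i:i + 4]) for i in range(0, len(column), 4)]
    print(storage)  # [[(5, 3), (2, 1)], [(2, 1), (9, 3)], [(9, 2)]]
    print(sum_column_pipelined(storage, latency_s=0.01))  # 64
```

Because the consumer never waits for a read it could have issued earlier, total time roughly approaches the larger of I/O time and compute time rather than their sum. Compression helps both slow subsystems: fewer bytes are read from storage and fewer bytes are sent between nodes.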

This is why most analytical engines stop short of solving the performance layer. An engine that does solve it eliminates the DBA interventions entirely. No sort keys. No hash distribution choices. No clustering keys to maintain.

Why this matters now, for three distinct industries

The reason this is worth writing about in 2026, rather than 2020 or 2030, is that three industries are simultaneously confronting the same downstream consequence.

Services firms face AI-driven pressure on their labor-leverage business model. They need software IP that produces non-linear, managed-service revenue, and analytical engines are a natural adjacency. HCL's acquisition of Actian in 2018 was an early instance of this play; the playbook is now visible to every services-firm CEO who reads industry analysis.

Storage vendors face commoditization of their core business and competition from the hyperscalers. Their customers' analytical workloads defect to Snowflake or Redshift, taking with them analytics spend that should have stayed on the storage vendor's platform. NetApp's acquisition of Instaclustr in 2022 and VAST Data's build-out of VAST DB are responses to the same observation.

Specialized clouds, particularly GPU and AI clouds, face the same defection problem at the analytical-workload layer. AI customers need data-prep workloads at scale; today that prep happens on Snowflake or Databricks outside the specialized cloud, even though the customer's data and compute live inside it.

All three industries are looking, today, for the same kind of asset: an analytical engine that runs as a fully managed service within the operating model they already have, for the customers they already serve. That engine has to solve both layers of management, because solving only the first is what created the gap in the first place.

Closing

XtremeData has built dbX as exactly this kind of engine. It is in production at Walmart, running a 25 TB analytical data warehouse on Microsoft Azure, and at Shirley Ryan AbilityLab, supporting predictive outcomes management on Cerner EMR data on AWS. In both deployments, the property that matters most is the same: the engine performs without DBAs.

If you are a services firm, a storage vendor, or a specialized cloud thinking about what managed-analytics offering would extend your business, we'd welcome the conversation.

Ravi Chandran is co-founder and CEO of XtremeData.