Complexity Simplified

Engine for high-speed scanning and transforming data at scale

Collect detailed metadata in a single pass
Easily customizable via user-defined functions (UDFs)
No learning curve: PostgreSQL-compatible SQL

Highlights

Simplicity: No query-aware data models (such as star schemas) or data placement choices (distribution keys, sort keys) are required; performance comes out of the box
Scale: Scale up or out, and scale compute and storage independently
Speed: Very fast ingest and execution of complex SQL on high-volume data

Parallel Everything Architecture

Like many competitors, XtremeData began with an open-source database software package. But unlike others, we then re-engineered the core query execution code with a truly parallel, vectorized SQL engine developed from first principles. The reasons for this are simple. Legacy database software, including all open-source packages, was developed decades ago and is not optimized for the key computing resources of today: many-core CPUs, large amounts of memory and high-speed networks.

Unlike “federated” systems, where multiple complete instances of a database run in parallel, XtremeData offers a single database instance that contains a truly parallel SQL execution engine. The core software layer manages all peer-to-peer communication and data exchange between nodes. It has been designed to excel at what other databases find difficult or impossible: running complex SQL against complex schemas on big data. XtremeData is data-model agnostic and does not require careful data partitioning or placement to deliver performance. This enables us to excel at complex n-way joins and aggregates against multiple big tables, at scales from 1 TB to hundreds of TB.

At XtremeData we have benchmarked our engine against federation-based competitors and also against “NoSQL” solutions like Hive, and measured performance gains of 10x. What does this mean? Put simply, the federated systems will need 10x the hardware resources in order to match XtremeData.

Scale vertically and horizontally to meet workload requirements

Logical nodes are decoupled from hardware and can be mapped onto differently sized physical nodes with different resources, such as the number of cores and disks. This makes it possible to customize CPU-to-I/O ratios, and systems can be scaled one node at a time to support growth.
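
As an illustration of mapping logical nodes onto unevenly sized hardware, the sketch below hands out logical nodes in proportion to each physical node's core count. It is a minimal sketch, not XtremeData's placement logic: the function name map_logical_nodes, the host names and the choice of core count as the only weight are all assumptions made for this example.

```python
def map_logical_nodes(physical_cores, logical_count):
    """Assign logical nodes to physical nodes in proportion to core count.

    physical_cores: dict of hypothetical host name -> number of cores.
    logical_count:  total number of logical nodes to place.
    Uses the largest-remainder method so the counts always sum to logical_count.
    """
    total_cores = sum(physical_cores.values())
    # Ideal (possibly fractional) share of logical nodes for each host.
    quotas = {
        host: logical_count * cores / total_cores
        for host, cores in physical_cores.items()
    }
    # Start from the whole-number part of each share.
    assignment = {host: int(quota) for host, quota in quotas.items()}
    leftover = logical_count - sum(assignment.values())
    # Give the remaining logical nodes to the hosts with the largest fractions.
    by_fraction = sorted(
        quotas, key=lambda host: quotas[host] - assignment[host], reverse=True
    )
    for host in by_fraction[:leftover]:
        assignment[host] += 1
    return assignment
```

Calling map_logical_nodes({"small-1": 8, "large-1": 16, "small-2": 8}, 8) places twice as many logical nodes on the 16-core host as on each 8-core host. Adding one more physical node only requires re-running the mapping with the extra entry.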

Vector Execution Model

XtremeData is built to leverage the technology of today and tomorrow.

The core engine uses a vector execution model to maximize the payload at every stage of processing and to minimize overheads. This model is fundamentally different from the row-at-a-time execution model of traditional database engines. Legacy engines, open-source as well as proprietary, were all developed in computing eras when CPU cycles, memory and network bandwidth were in short supply.

These constraints drove the architecture of the execution engines towards operating on a few rows at a time and minimizing network traffic.

Since these hardware constraints no longer apply, the XtremeData architecture takes advantage of modern processor capacity. The dbX software stack is highly multi-threaded, which allows it to use as many CPU cores as are available. The software operates on large vectors of rows, and all external I/O to disk or network is performed via large memory buffers to minimize overheads and maximize throughput.
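
The difference between row-at-a-time and vector execution can be illustrated with a toy pipeline. This is a minimal sketch and not dbX code: the Operator class, run_pipeline and build_pipeline are hypothetical names, and plain Python lists stand in for vectors of rows. Each operator counts how often it is invoked, which is a rough proxy for the per-call overhead that large vectors amortize.

```python
class Operator:
    """Wraps a batch function and counts how many times it is invoked."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, rows):
        self.calls += 1
        return self.fn(rows)


def run_pipeline(rows, operators, batch_size):
    """Push rows through the operators in batches of batch_size.

    batch_size=1 mimics a row-at-a-time engine; a large batch_size
    mimics a vector engine.
    """
    output = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        for op in operators:
            batch = op(batch)
        output.extend(batch)
    return output


def build_pipeline():
    """A filter followed by a projection, each working on a whole batch."""
    keep_even = Operator(lambda batch: [x for x in batch if x % 2 == 0])
    double = Operator(lambda batch: [2 * x for x in batch])
    return [keep_even, double]
```

Running the same 10,000 input rows with batch_size=1 and with batch_size=1024 gives identical output, but the first run invokes each operator once per row while the second invokes it once per batch.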

Dynamic Data Redistribution

Legacy databases were architected to minimize data movement and exchange. This legacy remains in today's market solutions and shows up as a significant performance penalty when data is exchanged between nodes in a parallel system. This penalty has imposed implementation constraints: data models that are query-aware, and data placement that is sympathetic to queries. The consequences are serious: large amounts of time, effort and money have been spent developing and supporting one-off point solutions. Worse, these point solutions are still unable to support newer types of analyses and ad-hoc queries.

XtremeData implements an innovative and highly efficient system for dynamically redistributing data as needed, using industry-standard network technology. Data exchanges between nodes occur as transient peer-to-peer transfers at runtime, driven by the needs of each query. The data exchanges are carefully pipelined with processing stages, so that transfer times on the network are effectively hidden and do not significantly affect query execution time. XtremeData ensures that all joins perform at near the speed of co-located joins. Dynamic data redistribution eliminates the need for query-aware data models and sympathetic placement of data, thus significantly reducing implementation time, labor and costs.
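
One common way to redistribute data at query time is to hash-partition both inputs on the join key, so that matching rows meet on the same node and each node can then join locally. The sketch below shows that general technique only. It is an assumption that it resembles what XtremeData does internally; the function names are hypothetical, lists of dicts stand in for per-node table fragments, and the pipelining of transfers with processing is not modeled.

```python
from collections import defaultdict


def redistribute(partitions, key, n_nodes):
    """Route every row to the node chosen by hashing its join key."""
    routed = [[] for _ in range(n_nodes)]
    for partition in partitions:
        for row in partition:
            routed[hash(row[key]) % n_nodes].append(row)
    return routed


def local_hash_join(left_rows, right_rows, key):
    """Ordinary single-node hash join: build on the right, probe with the left."""
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)
    return [
        {**left, **right}
        for left in left_rows
        for right in index.get(left[key], [])
    ]


def distributed_join(left_parts, right_parts, key):
    """Join two tables whose rows were placed on nodes without regard to the key."""
    n_nodes = len(left_parts)
    left_routed = redistribute(left_parts, key, n_nodes)
    right_routed = redistribute(right_parts, key, n_nodes)
    joined = []
    for node in range(n_nodes):
        # After redistribution every node holds all rows for its share of keys.
        joined.extend(local_hash_join(left_routed[node], right_routed[node], key))
    return joined
```

Because rows are routed at query time, the initial placement of either table does not have to anticipate the join: the same code works whichever column is passed as key.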

XtremeData allows users to simply "load and go" with any data model and any placement. No longer does a team of DBAs need to fully understand the usage patterns and try to match the placement with queries to obtain performance. XtremeData provides high performance out of the box, at all scales.

SQL Acceleration Model

In addition to the vector-oriented execution model, XtremeData implements acceleration of SQL operators using modern techniques such as real-time code generation and just-in-time compilation. Highly optimized libraries have been built for the key operators required to implement SQL query plans.

These optimized libraries take full advantage of modern CPUs:

many cores
multi-level cache architecture
internal SIMD (Single Instruction, Multiple Data) vector units
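
The idea behind runtime code generation can be shown in a few lines: instead of interpreting a generic predicate for every row, the engine emits source code specialized for the query at hand and compiles it once. This is a minimal sketch using Python's built-in compile and exec. The function compile_filter and the row-as-dict layout are assumptions for illustration; a production engine would generate native code rather than Python.

```python
ALLOWED_OPS = {"<", "<=", ">", ">=", "==", "!="}


def compile_filter(column, op, constant):
    """Generate and compile a filter function specialized for one predicate.

    The column name, comparison operator and constant are baked into the
    generated source, so nothing is looked up or dispatched per row at run time.
    """
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    source = (
        "def generated_filter(rows):\n"
        f"    return [row for row in rows if row[{column!r}] {op} {constant!r}]\n"
    )
    namespace = {}
    # Compile the generated source once, then reuse the resulting function.
    exec(compile(source, "<generated_filter>", "exec"), namespace)
    return namespace["generated_filter"]
```

For a predicate such as price > 100, compile_filter("price", ">", 100) returns a function that can be applied to any number of row batches without regenerating or recompiling the code.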

How can we help you?

847.871.0379