  • QUANTUM IMPROVEMENT IN BIG DATA ELT
  • PEAK PERFORMANCE ON COMPLEX AD HOC QUERIES
  • CONTINUOUS INGEST

Parallel Everything Architecture

Like almost every other recent vendor, XtremeData began development from the base of an open-source database package. Unlike the others, however, XtremeData completely replaced the open-source SQL execution engine with a truly parallel, vectorized SQL engine developed from first principles. The reason is simple: the open-source engine was developed decades ago and was therefore not optimized for the key computing resources of today, namely many-core CPUs, large amounts of memory, and high-speed networks.

The reason other vendors did not take our approach is also simple: developing an optimized parallel SQL engine is non-trivial and requires deep expertise, a competent team, and significant time. A simple shortcut is to use the open-source engine as-is and obtain parallelism by creating a "federation" of individual instances of the engine, with higher-level software written to manage those instances. But this approach has several inherent disadvantages. First, as already mentioned, the open-source database engine limits performance: it is single-threaded code that cannot effectively leverage today's many-core CPUs.

At XtremeData, we have benchmarked our dbX database engine against the open-source engine and measured performance gains of 10x. What does this mean? Simply put, a federated system built on the open-source engine will need 10x the hardware resources to match dbX.

The federated approach does provide parallel execution and scalability, but at the price of redundant processing and many choke points at large scale. One-time processing of each SQL query, such as parsing and compiling it into an execution plan, is repeated unnecessarily in every database instance. All communication and data exchange between instances has to be implemented outside the database in higher-level software, incurring the significant overhead of entering and exiting the database software stack. Because the penalty for communication and data exchange between instances is large, end users are forced to spend considerable effort carefully crafting the data models, the data partitioning, and the physical placement of the data. These labor costs are often larger than the cost of acquiring the database system!
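The redundancy described above can be made concrete with a toy model. The following is a minimal sketch, not a description of how dbX or any federated product is implemented: the names `WorkCounter`, `compile_plan`, `run_federated`, and `run_single_instance` are hypothetical, and the "engine" is just a Python function that sums numbers. It only counts how often a query is compiled and how often the database stack is entered or exited under each design.

```python
from dataclasses import dataclass


@dataclass
class WorkCounter:
    """Tallies the overhead that the text attributes to each architecture."""
    plans_compiled: int = 0
    stack_crossings: int = 0


def compile_plan(sql: str, counter: WorkCounter) -> tuple:
    """Stand-in for parsing and planning; here it only tokenizes the SQL."""
    counter.plans_compiled += 1
    return tuple(sql.lower().split())


def run_federated(sql: str, partitions: list, counter: WorkCounter) -> int:
    """Hypothetical federation: every instance re-plans the same query."""
    partials = []
    for rows in partitions:
        compile_plan(sql, counter)        # repeated once per instance
        counter.stack_crossings += 2      # query enters, partial result exits
        partials.append(sum(rows))
    return sum(partials)                  # merged by higher-level software


def run_single_instance(sql: str, partitions: list, counter: WorkCounter) -> int:
    """Hypothetical single instance: plan once, exchange data internally."""
    compile_plan(sql, counter)            # planned exactly once
    counter.stack_crossings += 2          # one entry, one exit
    return sum(sum(rows) for rows in partitions)


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]   # four partitions of a toy table
    query = "SELECT SUM(x) FROM t"
    fed, single = WorkCounter(), WorkCounter()
    print(run_federated(query, data, fed), fed)
    print(run_single_instance(query, data, single), single)
```

Both functions return the same answer. In this model the federated path compiles the plan once per partition and crosses the stack boundary twice per partition, while the single-instance path does each only once, whatever the partition count.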

Unlike federated systems, dbX is a single database instance with a truly parallel SQL execution engine that manages all peer-to-peer communication and data exchange within its core layer. dbX has been specifically designed to excel at what other databases find difficult or impossible: handling the Big Data problem of complex SQL against complex schemas. dbX is data-model agnostic and does not require careful data partitioning or placement to deliver performance. For instance, dbX excels at performing complex n-way joins and aggregates against multiple big tables, at scales from 1 TB to hundreds of TB.
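For readers unfamiliar with the term, an "n-way join with aggregates" looks like the query in the sketch below. This is an illustration of the query shape only: it runs on Python's built-in `sqlite3` module as a stand-in, not on dbX, and the tables (`customers`, `orders`, `line_items`), their columns, and the sample rows are all invented for the example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create three tiny, made-up tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE line_items (order_id INTEGER, quantity INTEGER,
                                 unit_price INTEGER);
        INSERT INTO customers VALUES (1, 'east'), (2, 'west');
        INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO line_items VALUES (10, 2, 5), (11, 1, 7), (12, 3, 4);
        """
    )
    return conn


def revenue_by_region(conn: sqlite3.Connection) -> list:
    """Three-way join across all tables, then an aggregate per region."""
    query = """
        SELECT c.region, SUM(li.quantity * li.unit_price) AS revenue
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        JOIN line_items AS li ON li.order_id = o.order_id
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(revenue_by_region(build_demo_db()))
    # → [('east', 17), ('west', 12)]
```

In a federated system, a query of this shape performs well only if matching rows of the joined tables are placed on the same instance, which is why data partitioning and placement demand so much care there.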