Voltron Data
When people think about query engines and databases, they typically expect some support for concurrent queries, since large pipelines and multiple requests hit the engine simultaneously.
In the past we adopted a policy of one session to one cluster (a group of GPU workers) per query, since the GPU workers were so fast. But spinning up new GPU clusters carries setup-time costs, and customers sometimes want to submit large batches of queries to the engine runtime all at once, whether for one large workload or on behalf of multiple users. Combined with our improved performance at smaller scales, this has made concurrency increasingly consequential for our users.
To solve this, we created an abstraction inside Theseus that lets us define multiple policies for concurrency, the first three of which are:
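As a purely illustrative sketch (Theseus's internals aren't shown here, and every name below is hypothetical), a pluggable concurrency-policy abstraction might boil down to an admission-control question: how many query streams may share one cluster of GPU workers at a time?

```python
class ConcurrencyPolicy:
    """Hypothetical base class: decides how many query streams
    may share a cluster of GPU workers concurrently."""

    def max_concurrent_streams(self, num_workers: int) -> int:
        raise NotImplementedError


class ExclusiveCluster(ConcurrencyPolicy):
    """The original model: one session per cluster, one query at a time."""

    def max_concurrent_streams(self, num_workers: int) -> int:
        return 1


class SharedCluster(ConcurrencyPolicy):
    """Admit a fixed number of streams onto the same GPU workers."""

    def __init__(self, limit: int):
        self.limit = limit

    def max_concurrent_streams(self, num_workers: int) -> int:
        return self.limit


class PerWorkerFanout(ConcurrencyPolicy):
    """Scale the number of admitted streams with cluster size."""

    def __init__(self, streams_per_worker: int):
        self.streams_per_worker = streams_per_worker

    def max_concurrent_streams(self, num_workers: int) -> int:
        return num_workers * self.streams_per_worker
```

The benefit of this shape is that the scheduler only ever asks one question of the policy, so new policies can be added without touching the scheduling code.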
Concurrency testing and benchmarking results are typically presented in a results report (e.g., page 3 of the HPE SF10K TPC-H results report), so we adopted a similar report structure for our concurrency results.
While not limited to AWS specifically, this experiment was run on AWS. A power run of TPC-H SF10K completed in 403 seconds; running it nine times sequentially would therefore take 3,627 seconds at best.
| Power Run | Runtime (s) | Runtime x 9 (s) |
|---|---|---|
| 1 | 403 | 3,627 |
As you can see, running 9 streams simultaneously provides a roughly 10% improvement on average across the board, and some streams actually run almost 30% faster.
| | Runtime (s) | % Better than Power Run x 9 |
|---|---|---|
| Average Concurrent Run | 3,229.64 | 10.96% |
What this means is that there is leftover GPU, networking, and IO capacity in Theseus on AWS, and that we can process large workloads with hundreds or thousands of queries even more efficiently.
| Concurrency Stream | Runtime (s) | % Better than Power Run x 9 |
|---|---|---|
| 1 | 3,379.99 | 6.81% |
| 2 | 3,077.37 | 15.15% |
| 3 | 3,608.19 | 0.52% |
| 4 | 3,365.73 | 7.20% |
| 5 | 3,598.78 | 0.78% |
| 6 | 3,370.95 | 7.06% |
| 7 | 2,757.30 | 23.98% |
| 8 | 2,572.95 | 29.06% |
| 9 | 3,335.45 | 8.04% |
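The percentages above follow directly from the 403-second power run; a quick sanity check of the arithmetic:

```python
# Per-stream concurrent runtimes in seconds, from the table above.
stream_runtimes = [
    3379.99, 3077.37, 3608.19, 3365.73, 3598.78,
    3370.95, 2757.30, 2572.95, 3335.45,
]

# Nine sequential power runs at 403 s each: the 3,627 s baseline.
baseline = 403 * 9

def pct_better(runtime: float) -> float:
    """Percentage improvement of a runtime over the sequential baseline."""
    return (baseline - runtime) / baseline * 100

average = sum(stream_runtimes) / len(stream_runtimes)
print(f"average runtime: {average:.2f} s")                       # ≈ 3229.63 s
print(f"average improvement: {pct_better(average):.2f}%")        # ≈ 10.96%
print(f"best stream: {pct_better(min(stream_runtimes)):.2f}%")   # ≈ 29.06%
```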
This demonstrates that concurrency is not only supported in Theseus, but can dramatically improve performance. On average, Theseus provides a roughly 11% improvement across all streams, with a peak benefit of 29% for the top-performing stream. Support for concurrent users and jobs removes idle GPU cycles, improves efficiency, and reduces costs. We'll continue to develop new optimizations, run more experiments, and add concurrency tests to our nightly benchmarking.