Save Time. Save Energy. Save Space. Save Money.
90x
Performance Increase.
Jobs done in 2 minutes rather than 3 hours.
100x
Decrease in Server Count.
Go from 200 servers down to 2.
100x
Cost Savings.
Spend $2 rather than $200 to run the same job.
Theseus redefines the data processing SPACE - Scale, Performance, and Cost Efficiency. Learn how big data systems perform as you scale out infrastructure and normalize on cost.
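The headline multipliers above follow directly from the before/after figures quoted with them. A minimal sketch of that arithmetic, using only the numbers stated on this page (the variable names are illustrative):

```python
# Before/after figures exactly as quoted in the headline stats.
baseline_minutes = 3 * 60   # "3 hours"
theseus_minutes = 2         # "2 minutes"
speedup = baseline_minutes / theseus_minutes

servers_before, servers_after = 200, 2
server_reduction = servers_before / servers_after

cost_before, cost_after = 200, 2   # dollars to run the same job
cost_savings = cost_before / cost_after

print(f"{speedup:.0f}x faster, "
      f"{server_reduction:.0f}x fewer servers, "
      f"{cost_savings:.0f}x cheaper")
```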
Get Started with Theseus Today
Request a Demo
Accelerate Data Analytics on GPUs
Theseus is the fastest SQL engine for large-scale ETL, from 30 TB to petabytes and beyond. Squash data silos and eliminate bottlenecks in critical data pipelines.
Maximize GPU Utilization
Maximize your investment and extend the effective life of all your GPUs. Theseus slots into your existing stack.
Deploy Anywhere
Deploy on-premise or in the cloud where Kubernetes is supported. Your Data. Your Kubernetes. Your Authentication.
Raw Data-Ready
Indexing is not required. Suitable for tabular machine or log data. No sorting, indexing, caching, or warm-up runs (cold queries).
Data Preprocessing for Machine Learning and GenAI
Accelerate preprocessing tasks and scale beyond big data bottlenecks by transforming, sampling, and labeling ML workloads with the massive parallelism of multiple distributed GPUs.
Built Composable with No Lock-in
Leverages open source standards like Arrow and Ibis. Swap between DuckDB and Snowflake, and back to Theseus when needed without changing a single line of code.
Financial Services
For teams developing large-scale data platforms to support internal customers, Theseus offers robust solutions for managing complex and costly workloads used in prediction, simulations, and strategic planning, while adhering to strict SLAs.
- Don’t make compromises such as downsampling datasets and developing custom workflows.
- Increase workload processing speed and capability without increasing infrastructure.
- Handle more sophisticated modeling, tune across a broader spectrum of data, conduct extensive stress tests, and accurately predict the impact of social, economic, and environmental factors on portfolios.
Cybersecurity
For security teams overwhelmed by excessive alerts where UEBA falls short, Voltron Data's Theseus offers a unique edge: hardware-accelerated processing and detection capabilities that find the needle in the haystack.
- Analyze vast amounts of case data simultaneously, identifying and addressing complex, long-standing threats within the network, thereby reducing overall enterprise risk.
- Store less data in the SIEM and leverage Theseus for large-scale data processing to reduce costs, gain visibility, and reduce risk.
- Eliminate the need for specialized tool-specific skills and hire analysts with broad cyber and IT expertise.
We help companies integrate open source standards
We are thrilled to be working with Voltron Data. G-Research has long recognized the potential of Apache Arrow and helped to fund its early development by sponsoring Ursa Labs. With Arrow underpinning some of our most crucial production systems at G-Research, we know we can count on the team at Voltron Data to provide world-class support.
Alex Scammon
Head of Open Source Development, G-Research
Pedro Pedreira
Software Engineer, Meta
Snowflake adopted Arrow for faster database access clients, keeping the data in standard columnar data structures all the way, and now added ADBC support for cross-language API support working closely with the Voltron Data Team. Voltron Data helps enterprises design and build composable data systems with open standards like ADBC, Arrow, Ibis, Substrait, and more.
Anurag Gupta
Product Manager, Snowflake
FAQ
Is Theseus right for your organization?
Anywhere that supports Kubernetes. Theseus can be deployed on-premise or in the cloud where Kubernetes is supported. We regularly deploy on GKE, EKS, and AKS, but as an accelerator-native engine, our customers who deploy their technology on-premise have the greatest TCO and performance gains.
Theseus is not a database. Theseus doesn’t store data. Theseus works with data in the Arrow memory format – so long as your data speaks Arrow, Theseus can operate on it wherever it is. Theseus can read CSV files from network-attached storage, Parquet files from S3, ORC and Avro files, and data from warehouses like Snowflake and beyond.
Theseus delivers exceptional value for queries exceeding 30TB. There are numerous products and open-source projects for data under 30TB. We understand you have workloads below and above 30TB, so we've designed Theseus to work alongside other solutions through our supported open-source projects and standards.
If your team likes Python, they can leverage Ibis, a popular data frame library that supports over 20 different engines (including Theseus). If your team likes SQL, Theseus can support a myriad of different dialects, from PostgreSQL to Standard SQL (function support notwithstanding) via Ibis.
If you wrote your Spark jobs in Ibis (which supports Spark), then yes; otherwise, not yet.
On the largest queries, it’s expensive not to run on GPUs. As our benchmark report demonstrates, the costs for incredibly large queries quickly balloon out of control. Queries on GPUs can dramatically reduce costs. Moreover, the initial investment in GPUs for AI/ML use cases, where GPUs are often underutilized, can be repurposed for analytics. Move your most demanding analytics workloads to GPUs and get more value out of your investment. Our benchmarks show analytics workloads are dramatically less expensive on Theseus compared to CPU-native engines — over 50x cheaper than Spark.
We support all NVIDIA data center class GPUs from Volta onwards. The more HBM on the GPU, the better.