Theseus for RAG Workflows

Maximize Data Freshness and Tokenize on the Fly

Tokenize and search live data at query time, so retrieval always reflects the freshest data and enterprise-scale RAG runs without a standing preprocessing pipeline.

Embed vector search directly in SQL via UDFs and eliminate external pipelines and orchestration.

Scale to petabyte-sized datasets with GPU acceleration, delivering results in seconds, not hours.

Feed live production data as structured context to LLMs for up-to-the-minute, domain-specific insights.

Build RAG Pipelines with SQL Statements

Python
from pprint import pprint

# Number of nearest neighbors to retrieve as context.
k = 100
user_question = "Where are earthquakes causing damage?"

# Vector search runs as a SQL expression: the UDF computes the distance
# between each stored embedding and the question, and ORDER BY + LIMIT
# keep the k closest articles.
result = con.sql(f"""
    SELECT
        source_url,
        source_text,
        rag.find_nearest_neighbor_distances(
            embedding, '{user_question}'
        ) AS distance_result
    FROM gdelt_text_embeddings
    ORDER BY distance_result ASC
    LIMIT {k}
""").to_pyarrow()

# Pass the retrieved articles to the LLM as context for the question.
agent_response = chat.ask(user_question, result['source_text'])
pprint(agent_response[0].as_py())
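Because the distance is an ordinary SQL expression, retrieval composes with the rest of the query. As a hedged variation on the example above, metadata filters and joins can run in the same statement; the published_at column and sources table here are illustrative assumptions, not part of the original schema.

Python
# Illustrative variation: filter on recency and join publisher metadata
# in the same retrieval query. `sources` and `published_at` are assumed,
# hypothetical schema elements; everything else reuses the objects above.
result = con.sql(f"""
    SELECT
        e.source_url,
        e.source_text,
        s.publisher,
        rag.find_nearest_neighbor_distances(
            e.embedding, '{user_question}'
        ) AS distance_result
    FROM gdelt_text_embeddings AS e
    JOIN sources AS s ON e.source_url = s.url
    WHERE e.published_at >= DATE '2024-01-01'
    ORDER BY distance_result ASC
    LIMIT {k}
""").to_pyarrow()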

Drawbacks of Traditional RAG Approaches

Traditional RAG setups require heavy Python orchestration and external vector stores (e.g., Pinecone, FAISS, Chroma), which complicates SQL integration and drives up retrieval costs at scale.

Introducing SQL operations such as joins, sorts, aggregations, or filters across multiple sources degrades RAG pipeline performance, because vector stores do not execute relational operators and that work falls back to Python or a separate engine.
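For contrast, here is a minimal sketch of the external-store pattern described above, using FAISS and a Hugging Face sentence-embedding model (the model name and sample documents are illustrative). Embedding, indexing, and search all run outside the database, and the index must be rebuilt or updated as data changes.

Python
# Minimal sketch of a traditional external-store RAG retrieval step.
# Documents must be embedded and indexed ahead of time; keeping the
# index fresh requires a separate pipeline.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "A magnitude 6.1 earthquake damaged homes near the coast.",
    "Markets rallied after the central bank held rates steady.",
    "Aftershocks continued to rattle the damaged port district.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(documents, normalize_embeddings=True)

# Inner product equals cosine similarity on normalized vectors.
index = faiss.IndexFlatIP(int(doc_vecs.shape[1]))
index.add(np.asarray(doc_vecs, dtype=np.float32))

question = "Where are earthquakes causing damage?"
q_vec = model.encode([question], normalize_embeddings=True)
scores, ids = index.search(np.asarray(q_vec, dtype=np.float32), 2)

context = [documents[i] for i in ids[0]]
# Any join, filter, or aggregation over metadata now has to happen here,
# in Python, before the context is handed to an LLM framework.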

Advantages of Using Theseus

Theseus combines SQL-native vector search with GPU-accelerated query execution, making it suited to production-scale, performance-critical applications over structured and semi-structured data.



|                  | Theseus                                        | Others                                   |
|------------------|------------------------------------------------|------------------------------------------|
| Retrieval Method | SQL dialect, structured & vector               | Similarity search with embeddings        |
| Infrastructure   | GPU-accelerated, SQL-native engine             | Python libraries, vector DBs             |
| Scale            | Petabyte scale, structured and semi-structured | Document-level, small-to-medium scale    |
| Target User      | Data engineers, SQL analysts                   | AI developers, data scientists           |
| Use Cases        | Enterprise analytics, SQL pipelines            | Document retrieval, chatbots, QA systems |

Example RAG Pipeline with JIT Tokenization and Embedding

1. Pull raw data into GPU memory (CSV, Parquet, JSON)

For example, news articles with metadata and a URL to the news source.

2. Generate embeddings in situ

Scrape text from the news articles and generate embeddings with a GPU tokenizer, using tools like Hugging Face and NVIDIA Triton Inference Server (a sketch of this step follows the list).

3. Search embeddings in situ

Use a vector search tool or library such as Pinecone, Qdrant, AstraDB, or NVIDIA cuVS to search the embeddings for articles relevant to the question asked.

4. Inference/LLM in situ

Feed the relevant articles alongside the user question into an LLM and generate a response using LangChain, Ray Serve, or AWS Bedrock.
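To make step 2 concrete, here is a minimal sketch of generating article embeddings with Hugging Face transformers. The model name, mean pooling, and sample articles are illustrative assumptions, not part of Theseus; in production this step would run batched through a GPU tokenizer and Triton Inference Server as described above.

Python
# Hedged sketch of step 2: embed scraped article text with a Hugging Face
# model. The model choice and mean pooling are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # assumed model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

articles = [
    "A magnitude 6.1 earthquake damaged homes near the coast.",
    "Aftershocks continued to rattle the damaged port district.",
]

inputs = tokenizer(articles, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, dim)

# Mean-pool over non-padding tokens to get one vector per article.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# These vectors populate the `embedding` column that
# rag.find_nearest_neighbor_distances searches in the SQL example above.
print(embeddings.shape)  # torch.Size([2, 384]) for this model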

Contact us to get started

We'll help you start using Theseus and accelerate your data processing workflows.