Theseus for RAG Workflows
Maximize Data Freshness and Tokenize on the Fly
Tokenize and search live data at query runtime to retrieve the freshest insights and power enterprise-scale RAG with no separate indexing or ETL overhead.
Embed vector search directly in SQL via UDFs and eliminate external pipelines and orchestration.
Scale to petabyte-sized datasets with GPU acceleration, delivering results in seconds, not hours.
Feed live production data as structured context to LLMs for up-to-the-minute, domain-specific insights.
Build RAG Pipelines with SQL Statements
```python
from pprint import pprint

k = 100
user_question = "Where are earthquakes causing damage?"

# Retrieve the k nearest articles to the question, directly in SQL.
result = con.sql(f"""
    SELECT
        source_url,
        source_text,
        rag.find_nearest_neighbor_distances(
            embedding, '{user_question}'
        ) AS distance_result
    FROM gdelt_text_embeddings
    ORDER BY distance_result ASC
    LIMIT {k}
""").to_pyarrow()

agent_response = chat.ask(user_question, result['source_text'])

pprint(agent_response[0].as_py())
```
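In the snippet above, `con` is assumed to be an open Theseus connection and `chat` a small helper that sends the retrieved `source_text` rows plus the question to an LLM; neither is defined by the snippet itself. The sketch below shows one possible shape for such a helper, using the OpenAI Python client as a stand-in backend. The `Chat` class, model name, and prompt are illustrative assumptions, not Theseus APIs.

```python
# Illustrative only: a minimal chat helper compatible with the
# chat.ask(user_question, result['source_text']) call above.
# The OpenAI client is a stand-in; any LLM backend would work.
import pyarrow as pa
from openai import OpenAI


class Chat:
    def __init__(self, model="gpt-4o-mini"):
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model = model

    def ask(self, question, context):
        # `context` is the retrieved source_text column (a pyarrow ChunkedArray);
        # join the rows into a single context block for the prompt.
        context_block = "\n\n".join(chunk.as_py() for chunk in context)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system",
                 "content": "Answer the question using only the provided context."},
                {"role": "user",
                 "content": f"Context:\n{context_block}\n\nQuestion: {question}"},
            ],
        )
        # Return a pyarrow array so the calling code's [0].as_py() works unchanged.
        return pa.array([response.choices[0].message.content])


chat = Chat()
```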
Drawbacks of Traditional RAG Approaches
Traditional RAG setups require heavy Python orchestration and external vector stores (e.g., Pinecone, FAISS, Chroma), which complicate SQL integration and drive up retrieval costs at scale.
Introducing SQL operations such as joins, sorts, aggregations, or filters across multiple sources degrades RAG pipeline performance.
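For comparison, a typical external-store setup looks roughly like the sketch below: documents are embedded in Python, pushed into a separate vector index, and queried outside the database, so any joins or filters against other tables require extra round trips back to SQL. FAISS and sentence-transformers are illustrative stand-ins here, not a specific required stack.

```python
# Illustrative traditional setup: embed documents in Python, index them in an
# external vector store (FAISS here), and query it outside the SQL engine.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Earthquake damages buildings in coastal city.",
    "Central bank raises interest rates.",
    "Aftershocks reported near the epicenter.",
]

# Embed and normalize so inner product behaves like cosine similarity.
doc_vectors = model.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vectors.shape[1])
index.add(np.asarray(doc_vectors, dtype=np.float32))

# Retrieval happens here, outside the database; joining the results against
# other tables means shuttling data between the vector store and SQL.
query = model.encode(["Where are earthquakes causing damage?"],
                     normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype=np.float32), k=2)
print([documents[i] for i in ids[0]])
```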
Advantages of Using Theseus
Theseus combines SQL-native vector search with GPU-accelerated query execution for production-scale, structured, performance-critical applications.
|  | Theseus | Others |
|---|---|---|
| Retrieval Method | SQL dialect, structured & vector | Similarity search with embeddings |
| Infrastructure | GPU-accelerated, SQL-native engine | Python libraries, vector DBs |
| Scale | Petabyte scale, structured and semi-structured | Document-level, small-to-medium scale |
| Target User | Data engineers, SQL analysts | AI developers, data scientists |
| Use Cases | Enterprise analytics, SQL pipelines | Document retrieval, chatbots, QA systems |
Example RAG Pipeline with JIT Tokenization and Embedding

1. Pull raw data (CSV, Parquet, JSON) into GPU memory: for example, news articles with metadata and URLs to the original sources.
2. Generate embeddings in situ: scrape text from the news articles and generate embeddings with a GPU tokenizer, using tools such as Hugging Face and NVIDIA Triton Inference Server.
3. Search embeddings in situ: use a vector search tool or library such as Pinecone, Qdrant, AstraDB, or NVIDIA cuVS to find the articles most relevant to the question asked.
4. Run inference/LLM in situ: feed the relevant articles alongside the user question and generate a response using LangChain, Ray Serve, or AWS Bedrock (an end-to-end sketch follows this list).
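The four steps can be prototyped end to end in a few lines. The sketch below is illustrative only: in Theseus these steps run in situ on the GPU inside the SQL engine, while here pandas, a Hugging Face model, and brute-force cosine search stand in to show the data flow. The file name, model choice, and column names are assumptions for the example.

```python
# Illustrative end-to-end prototype of the four steps above.
import numpy as np
import pandas as pd
import torch
from transformers import AutoModel, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Pull raw data: news articles with metadata and source URLs.
articles = pd.read_parquet("news_articles.parquet")  # columns: source_url, source_text

# 2. Tokenize and embed the article text (on the GPU when available).
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2").to(device)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      return_tensors="pt").to(device)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    # Mean-pool token embeddings, then L2-normalize for cosine similarity.
    mask = batch["attention_mask"].unsqueeze(-1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(pooled, dim=1).cpu().numpy()

doc_vectors = embed(articles["source_text"].tolist())

# 3. Search embeddings for articles relevant to the user question
#    (brute-force cosine similarity here; swap in cuVS, Pinecone, or Qdrant at scale).
question = "Where are earthquakes causing damage?"
scores = doc_vectors @ embed([question])[0]
top_ids = np.argsort(scores)[::-1][:5]
context = "\n\n".join(articles["source_text"].iloc[i] for i in top_ids)

# 4. Feed the retrieved articles plus the question to an LLM
#    (via LangChain, Ray Serve, AWS Bedrock, or the chat helper shown earlier).
prompt = f"Context:\n{context}\n\nQuestion: {question}"
print(prompt)
```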