ScyllaDB Vector Search: A New Chapter for Real-Time AI Infrastructure

Millisecond Vector Retrieval for Real-Time AI at Scale


Introduction

Real-time AI is now a business requirement. Workloads such as retrieval-augmented generation, semantic search, personalisation, fraud detection, and recommendations demand fast reads, predictable latency, and scale without disruption.

ScyllaDB has long been trusted for low-latency transactional data. With its new vector search engine, ScyllaDB brings millisecond vector retrieval to the same environment: a strong step toward unified AI infrastructure.


Why ScyllaDB Built Vector Search

Teams building AI features often paired ScyllaDB with a separate vector database for similarity search. That solved one problem but created others: extra systems, extra cost, and extra latency.

ScyllaDB’s goal was clear: deliver performance, accuracy, and scale while removing operational complexity. The result is an integrated vector search engine that fits naturally with ScyllaDB’s real-time workloads.


How the Design Works

ScyllaDB avoided embedding HNSW directly inside the database core. Instead, it built a dedicated Vector Store in Rust, placed next to each ScyllaDB replica in the same availability zone.

ScyllaDB stores vectors and metadata; the Vector Store reads data through Change Data Capture (CDC), builds indexes, and serves similarity results over HTTP.
Clients simply query ScyllaDB using CQL and the ANN OF clause; ScyllaDB handles the heavy lifting behind the scenes.

This design keeps ingestion fast while the Vector Store processes intensive ANN queries asynchronously.


What’s Different for Operators

The database and Vector Store scale independently.
You can fine-tune hardware per role: storage-optimised nodes for SSTables, RAM-optimised nodes for the Vector Store. Traffic stays local to the availability zone, keeping network costs under control.
Regular queries remain predictable, while ANN workloads scale separately.


Performance Highlights

In benchmark tests, ScyllaDB’s vector engine delivered outstanding numbers:

    • 65,000 QPS (P99 < 20 ms) on openai_small_50k

    • 12,000 QPS (P99 < 40 ms) on laion_large_100m

    • Over 97% recall, even under heavy concurrency

These results confirm ScyllaDB’s consistency in low-latency, high-throughput workloads.

Tuning Findings That Matter in Production

ScyllaDB’s engineers discovered that TCP delayed ACK combined with Nagle’s algorithm inflated latency.
Disabling Nagle (TCP_NODELAY) dropped latencies from around 50 ms to single-digit milliseconds.
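In client code, the fix is a single call. A minimal, standard-library-only Rust sketch (the local listener merely stands in for a server endpoint):

```rust
use std::net::{TcpListener, TcpStream};

/// Disable Nagle's algorithm on a connection and report the resulting flag.
fn enable_nodelay(stream: &TcpStream) -> std::io::Result<bool> {
    stream.set_nodelay(true)?; // sets TCP_NODELAY at the socket level
    stream.nodelay()
}

fn main() -> std::io::Result<()> {
    // A local listener stands in for a remote service.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let stream = TcpStream::connect(listener.local_addr()?)?;
    println!("TCP_NODELAY enabled: {}", enable_nodelay(&stream)?);
    Ok(())
}
```

With Nagle disabled, small request packets are flushed immediately instead of waiting for the peer's delayed acknowledgement, which is exactly the interaction that inflated latency here.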

They also tested thread layouts within Rust’s Tokio runtime.

    • Async-only delivered the highest throughput.

    • Mixed async/sync reduced latency under concurrency.

    • Yielding tasks before heavy compute further improved P99 stability.

These tuning insights matter when squeezing maximum performance from real-time AI systems.


Working with the Vector Type and ANN Queries

Creating a keyspace, table, and index is straightforward.
You define a vector column, create a custom index, and query with ANN OF to find the nearest neighbours.
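As a sketch of that flow (keyspace, table, and index names are illustrative, and the custom index class string may differ across ScyllaDB releases, so check the docs for your version):

```cql
-- Keyspace and a table with a 3-dimensional vector column
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3};

CREATE TABLE demo.items (
  id int PRIMARY KEY,
  embedding vector<float, 3>
);

-- Custom index backing ANN queries (class name per your ScyllaDB release)
CREATE CUSTOM INDEX items_ann ON demo.items (embedding)
  USING 'vector_index';

-- Nearest-neighbour query: the 5 items closest to the given vector
SELECT id FROM demo.items
  ORDER BY embedding ANN OF [0.1, 0.2, 0.3] LIMIT 5;
```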


Client-Side Tip: Remove Extra Latency

When using the Java driver with Netty, enable TCP_NODELAY to avoid waiting for delayed acknowledgements.
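With the 4.x Java driver, this is typically a one-line setting in application.conf. A sketch based on the driver's socket options (verify the option name against your driver version's reference configuration):

```hocon
datastax-java-driver {
  advanced.socket {
    # Disable Nagle's algorithm so small CQL frames are flushed immediately
    tcp-no-delay = true
  }
}
```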


Optional: Calling the Vector Store Directly

While the Vector Store runs internally, the call pattern is simple: an HTTP request carrying the query vector and the number of neighbours to return.
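A minimal Rust sketch of such a request, using only the standard library. The /ann path, port, and JSON field names are assumptions for illustration, not the documented Vector Store API:

```rust
/// Build a raw HTTP/1.1 ANN request. The /ann path, host, and JSON field
/// names ("vector", "k") are illustrative assumptions, not the documented
/// Vector Store API.
fn build_ann_request(host: &str, vector: &[f32], k: usize) -> String {
    let elems: Vec<String> = vector.iter().map(|v| v.to_string()).collect();
    let body = format!(r#"{{"vector":[{}],"k":{}}}"#, elems.join(","), k);
    format!(
        "POST /ann HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\n\
         Content-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

fn main() {
    let request = build_ann_request("vector-store.internal:6080", &[0.1, 0.2, 0.3], 5);
    println!("{request}");
    // Sending it is then a plain write, e.g. std::net::TcpStream::connect
    // followed by write_all(request.as_bytes()); the store replies with JSON.
}
```

In practice you would use an HTTP client library rather than hand-rolled framing, but the shape of the exchange is the same: one small POST per query, one JSON result list back.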

Important Note: In-Memory Indexing in the Beta

In this beta, vector indexes are held entirely in memory. Keeping the whole index in RAM is what enables the Vector Store's sub-10 ms responses, but it also means the entire index must fit into a single node's RAM.

For example, with a benchmark of 100 million vectors at 768 dimensions, the raw vectors alone require approximately 307 GB; with HNSW index overhead, the real-world memory footprint is around 333 GB. Supporting very large datasets therefore calls for substantial RAM-optimised machines for each index node.
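The capacity arithmetic is easy to reproduce. A small Rust sketch (the HNSW overhead factor here is back-derived from the 307 GB and 333 GB figures above, not a published constant):

```rust
/// Raw storage for f32 vectors, in decimal gigabytes: vectors x dims x 4 bytes.
fn raw_vector_gb(vectors: u64, dims: u64) -> f64 {
    (vectors * dims * 4) as f64 / 1e9
}

fn main() {
    let raw = raw_vector_gb(100_000_000, 768); // 307.2 GB
    // HNSW graph overhead factor inferred from the article's 307 -> 333 GB figures
    let with_index = raw * (333.0 / 307.2);
    println!("raw: {raw:.1} GB, with HNSW index: {with_index:.1} GB");
}
```

Running the same estimate against your own vector count and embedding dimension tells you whether a given RAM-optimised instance type can hold the index at all.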

It would be great to see future versions add hybrid in-memory/disk indexing, offering expanded capacity while maintaining speed.

What This Means for Enterprises

For teams building search layers for RAG, personalisation, or fraud detection, this launch is significant.

It brings: 

    • Millisecond-level retrieval

    • Predictable latency at scale

    • Integrated management for vector + tabular data

    • Reduced complexity and cost

    • Familiar CQL interface for existing teams

ScyllaDB’s approach simplifies AI-driven workloads and reduces operational friction.


Where to Start: A Short Checklist

    • Define target k, acceptable P99, and required recall.

    • Map embedding dimensions to RAM per node.

    • Decide on similarity function and tuning parameters.

    • Enable TCP_NODELAY in client drivers.

    • Run benchmarks with your own data distribution.

    • Validate concurrency behaviour, then tune for cost and scale.

    • Plan for future hybrid indexing once available.


Datanised Insight

At Datanised, we view this as an important move toward unifying transactional, analytical, and AI-driven workloads under one platform.

The architecture is elegant, the early metrics impressive. However, the in-memory limitation is a real operational factor. Teams must plan capacity and shard strategy carefully.

Our experts are already helping enterprises evaluate ScyllaDB Vector Search, designing, tuning, and optimising configurations across ScyllaDB, Cassandra, MongoDB, PostgreSQL, and modern streaming systems.

If you’re exploring AI-ready NoSQL architectures, we can help you assess, design, and optimise the right setup for your workloads.

Ricardo Gil

Writer & Blogger

Copyright © 2025 Datanised Limited