
Build a RAG chatbot with ScyllaDB¶

This tutorial shows you how to build a Retrieval-Augmented Generation (RAG) chatbot using ScyllaDB, Ollama, and LlamaIndex.

The chatbot runs in your terminal and lets you ask questions about ScyllaDB documentation.

Source code is available on GitHub.

Prerequisites¶

  • ScyllaDB Cloud account

  • Python 3.9 or newer

Install Python requirements¶

  1. Create and activate a new Python virtual environment:

    virtualenv env && source env/bin/activate
    
  2. Install requirements:

    pip install -r requirements.txt
    

    The requirements include:

    • ScyllaDB Python driver

    • LlamaIndex

    • Ollama

    • SpaCy
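
    If you are not working from the tutorial's GitHub repository, a minimal requirements.txt along these lines should work (package names are assumptions based on the list above; pin versions as needed):

    scylla-driver
    llama-index
    ollama
    spacy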

Set up ScyllaDB as a vector store¶

  1. Create a new ScyllaDB Cloud instance with vector search enabled.

  2. Create config.py and add your database connection details (hosts, username, password, etc.):

    SCYLLADB_CONFIG = {
        "hosts": ["node-0.aws-us-east-1.xxxxxxxxxxx.clusters.scylla.cloud",
                "node-1.aws-us-east-1.xxxxxxxxxxx.clusters.scylla.cloud",
                "node-2.aws-us-east-1.xxxxxxxxxxx.clusters.scylla.cloud"],
        "port": "9042",
        "username": "scylla",
        "password": "passwd",
        "datacenter": "AWS_US_EAST_1",
        "keyspace": "rag"
    }
    
  3. Create migrate.py:

    import os
    from scylladb import ScyllaClient
    
    client = ScyllaClient()
    session = client.get_session()
    
    def absolute_file_path(relative_file_path):
        current_dir = os.path.dirname(__file__)
        return os.path.join(current_dir, relative_file_path)
    
    print("Creating keyspace and tables...")
    with open(absolute_file_path("schema.cql"), "r") as file:
        for query in file.read().split(";"):
            if query.strip():
                session.execute(query)
    print("Migration completed.")
    
    client.shutdown()
    

    This migration script creates a keyspace, a table for text chunks and embeddings, and a vector index for similarity search in ScyllaDB:

    CREATE KEYSPACE rag WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': '3'} AND TABLETS = {'enabled': 'false'};
    
    CREATE TABLE rag.chunks (
        chunk_id UUID PRIMARY KEY,
        text TEXT,
        embedding vector<float, 768>
    ) WITH cdc = {'enabled': 'true'};
    
    
    CREATE INDEX IF NOT EXISTS ann_index ON rag.chunks(embedding)
    USING 'vector_index'
    WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
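
    Run the migration once with python migrate.py. Note that ScyllaClient (shown in the next section) connects directly to the keyspace named in config.py, so on the very first run, before the rag keyspace exists, you may need to connect without a keyspace (for example, self.cluster.connect() with no argument) or create the keyspace from cqlsh first. Afterwards, a quick sanity check (a sketch):

    from scylladb import ScyllaClient

    # confirm the keyspace and table exist and are queryable
    with ScyllaClient() as client:
        rows = client.query_data("SELECT chunk_id FROM rag.chunks LIMIT 1;")
        print("rag.chunks is ready; rows found:", len(rows))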
    

Download documentation files from GitHub¶

For this example, you will use documentation stored in the ScyllaDB GitHub repository (.md and .rst files).

  1. Create a shell script (./download_docs.sh) to download files only from the scylladb/docs folder:

    git clone --no-checkout --depth=1 --filter=tree:0 \
    https://github.com/scylladb/scylladb.git
    cd scylladb
    git sparse-checkout set --no-cone /docs
    git checkout
    

After running this script, the documents are saved locally in the scylladb/docs folder. The RAG ingestion component uses this folder in the next step.


Build a complete RAG application¶

In this step, you’ll build a complete RAG application including loading documents, chunking, embedding, storing, and retrieval.

1. ScyllaDB client¶

ScyllaDB acts as a persistent store for the document chunk embeddings, enabling scalable vector storage and semantic search.

  1. Create a helper module called scylladb.py to insert data into ScyllaDB and query results from it:

    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.query import dict_factory
    import config
    
    class ScyllaClient():
    
        def __init__(self):
            scylla_config = config.SCYLLADB_CONFIG
            self.cluster = self._get_cluster(scylla_config)
            self.session = self.cluster.connect(scylla_config["keyspace"])
    
        def __enter__(self):
            return self
    
        def __exit__(self, exc_type, exc_value, traceback):
            self.shutdown()
    
        def shutdown(self):
            self.cluster.shutdown()
    
        def _get_cluster(self, config: dict) -> Cluster:
            profile = ExecutionProfile(
                load_balancing_policy=TokenAwarePolicy(
                    DCAwareRoundRobinPolicy(local_dc=config["datacenter"])
                ),
                row_factory=dict_factory,
            )
            return Cluster(
                execution_profiles={EXEC_PROFILE_DEFAULT: profile},
                contact_points=config["hosts"],
                port=config["port"],
                auth_provider=PlainTextAuthProvider(username=config["username"],
                                                    password=config["password"]))
    
        def print_metadata(self):
            for host in self.cluster.metadata.all_hosts():
                print(f"Datacenter: {host.datacenter}; Host: {host.address}; Rack: {host.rack}")
    
        def get_session(self):
            return self.session
    
        def insert_data(self, table, data: dict):
            columns = list(data.keys())
            values = list(data.values())
            insert_query = f"""
            INSERT INTO {table} ({','.join(columns)}) 
            VALUES ({','.join(['%s' for c in columns])});
            """
            self.session.execute(insert_query, values)
    
        def query_data(self, query, values=[]):
            rows = self.session.execute(query, values)
            return rows.all()
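
    A quick way to verify connectivity (a sketch; it assumes config.py is filled in):

    from scylladb import ScyllaClient

    # prints one line per node when the cluster is reachable
    with ScyllaClient() as client:
        client.print_metadata()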
    

2. Document ingestion¶

  1. Create a file called scylla_rag.py with the following content:

    from llama_index.core.node_parser import (
        SemanticDoubleMergingSplitterNodeParser,
        LanguageConfig,
    )
    from llama_index.core import SimpleDirectoryReader
    
    class ScyllaRag():
    
  2. Add the create_chunks() function and implement document loading first:

        def create_chunks(self, dir_path: str, files_limit=1):
            documents = SimpleDirectoryReader(input_dir=dir_path,
                                              recursive=True,
                                              num_files_limit=files_limit,
                                              required_exts=[".md", ".rst"],
                                              exclude_empty=True,
                                              exclude_hidden=True).load_data()
            # Filter out docs with no text
            documents = [doc for doc in documents if doc.text.strip()]
    
  3. Then split the documents into semantically meaningful chunks:

        def create_chunks(self, dir_path: str, files_limit=1):
            documents = SimpleDirectoryReader(input_dir=dir_path,
                                              recursive=True,
                                              num_files_limit=files_limit,
                                              required_exts=[".md", ".rst"],
                                              exclude_empty=True,
                                              exclude_hidden=True).load_data()
            # Filter out docs with no text
            documents = [doc for doc in documents if doc.text.strip()]

            splitter = SemanticDoubleMergingSplitterNodeParser(
                language_config=LanguageConfig(spacy_model="en_core_web_md"),
                initial_threshold=0.4,   # merge sentences to create chunks
                appending_threshold=0.5, # merge a chunk with the following sentence
                merging_threshold=0.5,   # merge chunks into bigger chunks
                max_chunk_size=2048,
            )
            return splitter.get_nodes_from_documents(documents, show_progress=True)
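
    The splitter loads the en_core_web_md spaCy model; if it is missing, install it once with python -m spacy download en_core_web_md. A quick dry run (a sketch; the path and limit are assumptions):

    # chunk a small sample of the downloaded docs and inspect the result
    rag = ScyllaRag()
    nodes = rag.create_chunks("../scylladb/docs", files_limit=5)
    print(f"Created {len(nodes)} chunks")
    print(nodes[0].get_content()[:200])  # preview the first chunk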
    

3. Embedding generation¶

  1. Add a function that turns a text chunk into an embedding using Ollama:

    # at the top of scylla_rag.py
    import ollama

    # inside the ScyllaRag class
    EMBEDDING_MODEL = "hf.co/CompendiumLabs/bge-base-en-v1.5-gguf"

    def create_embedding(self, content):
        return ollama.embed(model=self.EMBEDDING_MODEL, input=content)["embeddings"][0]
    
  2. Add a function that inserts each chunk and its embedding into the ScyllaDB table created earlier:

    # requires: import uuid and from scylladb import ScyllaClient
    def vectorize(self, nodes, target_table: str):
        db_client = ScyllaClient()
        for node in nodes:
            chunk_id = uuid.uuid4()
            text = node.get_content()
            embedding = self.create_embedding(text)
            db_client.insert_data(target_table, {
                "text": text,
                "chunk_id": chunk_id,
                "embedding": embedding,
            })
    

4. Retrieval and semantic search¶

  1. Implement a function that searches ScyllaDB for the chunks most relevant to the user's question:

    def fetch_chunks(self, table: str, user_query: str, top_k=5):
        db_client = ScyllaClient()
        user_query_embedding = self.create_embedding(user_query)
        db_query = f"""SELECT chunk_id, text
                    FROM {table} 
                    ORDER BY embedding ANN OF %s LIMIT %s;
                   """
        values = [user_query_embedding, top_k]
        return db_client.query_data(db_query, values)
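
    For example, a quick retrieval test (a sketch; it assumes chunks were already ingested into rag.chunks):

    rows = ScyllaRag().fetch_chunks("rag.chunks", "How does ScyllaDB replicate data?", top_k=3)
    for row in rows:
        # rows are dicts thanks to dict_factory in ScyllaClient
        print(row["chunk_id"], row["text"][:80])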
    
  2. Add a function that sends the request to the LLM, combining the user’s question with the retrieved chunks:

    def query_llm(self, user_query: str, chunks: list[str]) -> None:
        context = ""
        for i, chunk in enumerate(chunks):
            context += f"\n\n Item {i+1}: {chunk}"
        system_prompt = f"""You are an AI assistant that answers user 
        questions by combining your reasoning ability with the information 
        provided below: \n
        {context}
        """
        stream = ollama.chat(
            model=self.LANGUAGE_MODEL,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_query},
            ],
            stream=True,
        )
        print("Chatbot response:")
        for chunk in stream:
            print(chunk["message"]["content"], end="", flush=True)
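
    Note that query_llm reads self.LANGUAGE_MODEL, which is never defined in the snippets above. A sketch of the missing class attribute (the model name is an assumption; use any chat model you have pulled into Ollama):

    # inside the ScyllaRag class; "llama3.2" is an assumption,
    # not necessarily the model used by the GitHub source
    LANGUAGE_MODEL = "llama3.2"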
    
  3. Finally, put it all together:

    if __name__ == "__main__":
        scylla_rag = ScyllaRag()
    
        # ingest documents (only needs to run once)
        # nodes = scylla_rag.create_chunks("../scylladb/docs", files_limit=200)
        # scylla_rag.vectorize(nodes, target_table="rag.chunks")
    
        user_input = input("What do you want to know about ScyllaDB? ")
    
        nodes = scylla_rag.fetch_chunks("rag.chunks", user_input, top_k=3)
    
        chunk_ids = [node["chunk_id"] for node in nodes]
        print("Retrieved chunk IDs:", chunk_ids)
    
        scylla_rag.query_llm(user_input, [node["text"] for node in nodes])
    

The complete RAG application file is available on GitHub.

Relevant resources¶

  • ScyllaDB Cloud

  • ScyllaDB Documentation

  • Ollama

  • LlamaIndex
