Memory Guide

Detailed guide on using the MemoryManager and ChromaDBService.

Guide: Persistent Memory & RAG

Karo includes components to equip agents with persistent memory, allowing them to recall information across sessions and leverage external knowledge bases. This is often used for Retrieval-Augmented Generation (RAG).

Core Concepts

  • Vector Database: Stores textual information (memories, document chunks) along with their numerical representations (embeddings). This allows for efficient semantic search – finding information based on meaning rather than just keywords. Karo currently uses ChromaDB for this.
  • Embeddings: Numerical vectors representing the semantic meaning of text. Generated by embedding models (Karo uses OpenAI's text-embedding-3-small by default via ChromaDBService).
  • Memory Record (MemoryRecord): A Pydantic model representing a single piece of information stored in the memory (text content, metadata like source/timestamp, optional importance score). Defined in karo.memory.memory_models.
  • ChromaDB Service (ChromaDBService): A low-level service (karo.memory.services.chromadb_service) that handles the direct connection and interaction with the ChromaDB database (local or remote), including adding documents and performing vector similarity searches. It also manages the embedding function.
  • Memory Manager (MemoryManager): A higher-level abstraction (karo.memory.memory_manager) that provides a simpler interface for the agent to interact with the memory system. It uses the ChromaDBService internally to perform operations like adding memories (add_memory) and retrieving relevant ones (retrieve_relevant_memories).
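
For orientation, a single stored memory might be constructed roughly like this. This is a minimal sketch: the field names (text, metadata, importance_score) follow the description above but are assumptions; check karo.memory.memory_models for the actual MemoryRecord definition.

    from karo.memory.memory_models import MemoryRecord

    # Hypothetical MemoryRecord; field names are illustrative assumptions.
    record = MemoryRecord(
        text="Electronics can be returned within 15 days if unopened.",
        metadata={"source": "return_policy.md", "timestamp": "2025-01-01T00:00:00Z"},
        importance_score=0.8,
    )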

How it Works (RAG Example)

The most common use case for the memory system is Retrieval-Augmented Generation (RAG), where the agent retrieves relevant information from a knowledge base before answering a user's query.

  1. Ingestion (Offline Step):

  • First, you need to populate the memory store (ChromaDB) with your knowledge base documents (e.g., FAQs, policies, product manuals).
  • This typically involves a separate script, which you can create by copying and adapting karo/utils/base_ingestion_script.py. This base script provides a template for:
    • Loading environment variables (e.g., OPENAI_API_KEY).
    • Initializing ChromaDBService and MemoryManager.
    • Using DocumentReaderTool to read files from a directory.
    • Basic chunking strategy.
    • Adding document chunks to the MemoryManager with metadata.
  • To create your own ingestion script:
    1. Copy karo/utils/base_ingestion_script.py to your project (e.g., into a 'scripts' directory).
    2. Modify the CONFIGURATION section to point to your knowledge base directory, database path, and collection name.
    3. Customize the chunking strategy and metadata as needed.
    4. Ensure necessary dependencies are installed (karo, python-dotenv, pypdf, python-docx).
    5. Set your OPENAI_API_KEY (or other provider key) in a .env file accessible from where you run the script.
    6. Run the script: python path/to/your/copied_ingestion_script.py
  • During storage, the ChromaDBService automatically uses its configured embedding function (defaulting to OpenAI's text-embedding-3-small) to create a vector embedding for each chunk. The text, metadata, and embedding are stored together in ChromaDB.
  • This ingestion process only needs to be run once initially, and then re-run whenever your knowledge base documents change.
  2. Runtime (Agent Interaction):
    • A user asks the agent a question (e.g., "What's the return policy for electronics?").

    • The BaseAgent (if configured with a MemoryManager in its BaseAgentConfig) automatically triggers the retrieval step before calling the LLM.

    • It calls memory_manager.retrieve_relevant_memories(), passing the user's question as the query_text.

    • MemoryManager uses ChromaDBService to:

      • Generate an embedding for the user's query text using the same embedding model used during ingestion.
      • Perform a similarity search in the ChromaDB collection, finding stored chunks whose embeddings are semantically closest to the query embedding.
    • The service returns the top N most relevant chunks (as configured by memory_query_results in BaseAgentConfig, default is 3).

    • BaseAgent receives these MemoryQueryResult objects.

    • It uses the SystemPromptBuilder (via _create_initial_prompt) to format the retrieved text chunks (e.g., mem.record.text) and include them as context within the system prompt sent to the LLM. A typical format might look like:

      You are a helpful assistant...
      
      ## Relevant Previous Information
      -----------------------------
      - (YYYY-MM-DD HH:MM UTC): Relevant text chunk 1...
      - (YYYY-MM-DD HH:MM UTC): Relevant text chunk 2...
      
      ## Available Tools
      -----------------
      - tool_name_1: description...
      
    • The LLM receives the user's question along with the relevant context retrieved from the knowledge base.

    • The LLM generates an informed answer based on both the question and the provided context (e.g., "According to our return policy document, electronics can be returned within 15 days if unopened...").

    • The BaseAgent returns this final response.
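
The retrieval and prompt-assembly steps above happen inside BaseAgent, but conceptually they amount to something like the following sketch. It is illustrative only, not BaseAgent's actual source: retrieve_relevant_memories, query_text, and mem.record.text come from the descriptions above, while the result handling and formatting logic are assumptions.

    # Conceptual sketch of the automatic retrieval step performed before the LLM call
    # (not BaseAgent's actual implementation).
    query = "What's the return policy for electronics?"

    # Embed the query and find the semantically closest stored chunks.
    results = memory_manager.retrieve_relevant_memories(query_text=query)

    # Format the retrieved chunks into the "Relevant Previous Information" section
    # of the system prompt (in practice this is handled by SystemPromptBuilder).
    context_lines = [f"- {mem.record.text}" for mem in results]
    context_block = "## Relevant Previous Information\n" + "\n".join(context_lines)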


Using the Memory System in Your Agent

  1. Initialize Components:

    • Ensure you have chromadb installed (pip install chromadb).
    • Import necessary classes: ChromaDBService, ChromaDBConfig, MemoryManager.
    • Load your OpenAI API key (needed for embeddings) via .env or other means.
    • Configure ChromaDBConfig, specifying at least the path for local storage and a collection_name.
    • Instantiate the service and then the manager.
    from karo.memory.services.chromadb_service import ChromaDBService, ChromaDBConfig
    from karo.memory.memory_manager import MemoryManager
    from dotenv import load_dotenv
    import os
    
    load_dotenv() # Load OPENAI_API_KEY
    
    # Configure ChromaDB
    db_path = "./my_agent_karo_db" # Choose a persistent path
    collection = "main_kb"
    chroma_config = ChromaDBConfig(path=db_path, collection_name=collection)
    
    try:
        chroma_service = ChromaDBService(config=chroma_config)
        memory_manager = MemoryManager(chroma_service=chroma_service)
        print(f"Memory system initialized using DB at: {db_path}")
    except Exception as e:
        print(f"Failed to initialize memory system: {e}")
        memory_manager = None # Handle error
    
  2. Configure Agent:

    • When creating your BaseAgentConfig, pass the initialized memory_manager instance to the memory_manager argument.
    • You can also set memory_query_results to control how many chunks are retrieved.
    from karo.core.base_agent import BaseAgent, BaseAgentConfig
    # ... other imports (provider, prompt_builder, tools)
    
    if memory_manager: # Only configure if initialization succeeded
        agent_config = BaseAgentConfig(
            provider=my_provider,
            prompt_builder=my_prompt_builder,
            memory_manager=memory_manager, # Assign the manager
            memory_query_results=5 # Retrieve top 5 results
            # tools=...
        )
        agent = BaseAgent(config=agent_config)
    else:
        # Handle case where memory couldn't be initialized
        print("Agent created without memory capabilities.")
        # agent = BaseAgent(...) # without memory_manager
    
  3. Ingest Data (Separate Script):

    • Create a separate script (like ingest_kb.py in the tutorial) to load your documents, chunk them as needed, and use memory_manager.add_memory() to store them. Run this script whenever your knowledge base needs to be created or updated. A minimal sketch is shown after this list.
  4. Agent Interaction:

    • When you call agent.run(), the agent will automatically perform the retrieval step using the memory_manager and include the results in the prompt sent to the LLM (as handled by BaseAgent._create_initial_prompt and SystemPromptBuilder).
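
A minimal ingestion sketch, assuming the memory_manager initialized in step 1. The add_memory keyword arguments (text, metadata) and the naive fixed-size chunking are illustrative assumptions; the base ingestion script instead uses DocumentReaderTool and its own chunking strategy.

    # Simplified standalone ingestion sketch (see karo/utils/base_ingestion_script.py for the full template).
    import pathlib

    kb_dir = pathlib.Path("./knowledge_base")
    chunk_size = 1000  # naive fixed-size character chunking

    for doc_path in kb_dir.glob("*.txt"):
        text = doc_path.read_text(encoding="utf-8")
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        for idx, chunk in enumerate(chunks):
            # Each chunk is embedded automatically by ChromaDBService when stored.
            # The add_memory keyword names here are assumed for illustration.
            memory_manager.add_memory(
                text=chunk,
                metadata={"source": doc_path.name, "chunk_index": idx},
            )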

Memory Tools (KnowledgeBaseQueryTool)

While the BaseAgent handles automatic retrieval for RAG context, you might also want tools that interact with memory explicitly. The tutorial example includes a KnowledgeBaseQueryTool, which uses the MemoryManager internally. This lets the LLM decide to query the knowledge base explicitly when it needs specific information not covered by the automatic retrieval, or when it wants to verify something, providing another layer of control.
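
As a rough illustration of the idea, the sketch below wraps the MemoryManager in a simple query helper. It is not the tutorial's KnowledgeBaseQueryTool and does not use Karo's actual tool base class or schemas; the method name and return shape are assumptions.

    # Hypothetical explicit knowledge-base query helper (not Karo's tool interface).
    class SimpleKnowledgeBaseQuery:
        def __init__(self, memory_manager):
            self.memory_manager = memory_manager

        def run(self, query: str) -> list[str]:
            # Delegate the semantic search to the MemoryManager and return the raw text chunks.
            results = self.memory_manager.retrieve_relevant_memories(query_text=query)
            return [mem.record.text for mem in results]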