# Memory Guide

Detailed guide on using the `MemoryManager` and `ChromaDBService`.

## Guide: Persistent Memory & RAG
Karo includes components to equip agents with persistent memory, allowing them to recall information across sessions and leverage external knowledge bases. This is often used for Retrieval-Augmented Generation (RAG).
### Core Concepts
- **Vector Database:** Stores textual information (memories, document chunks) along with their numerical representations (embeddings). This allows for efficient semantic search – finding information based on meaning rather than just keywords. Karo currently uses ChromaDB for this.
- **Embeddings:** Numerical vectors representing the semantic meaning of text. Generated by embedding models (Karo uses OpenAI's `text-embedding-3-small` by default via `ChromaDBService`).
- **Memory Record (`MemoryRecord`):** A Pydantic model representing a single piece of information stored in memory (text content, metadata like source/timestamp, optional importance score). Defined in `karo.memory.memory_models`.
- **ChromaDB Service (`ChromaDBService`):** A low-level service (`karo.memory.services.chromadb_service`) that handles the direct connection and interaction with the ChromaDB database (local or remote), including adding documents and performing vector similarity searches. It also manages the embedding function.
- **Memory Manager (`MemoryManager`):** A higher-level abstraction (`karo.memory.memory_manager`) that provides a simpler interface for the agent to interact with the memory system. It uses the `ChromaDBService` internally to perform operations like adding memories (`add_memory`) and retrieving relevant ones (`retrieve_relevant_memories`). A short usage sketch follows this list.
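To make the relationship between these pieces concrete, here is a minimal round-trip sketch. The `ChromaDBConfig` arguments and the `retrieve_relevant_memories(query_text=...)` call follow the usage shown later in this guide; the `add_memory()` argument names and the exact attributes on the returned results are assumptions, so check `karo.memory` for the real signatures.

```python
from karo.memory.services.chromadb_service import ChromaDBService, ChromaDBConfig
from karo.memory.memory_manager import MemoryManager

# Initialize a small local store (see "Using the Memory System in Your Agent" below).
config = ChromaDBConfig(path="./karo_demo_db", collection_name="demo")
memory_manager = MemoryManager(chroma_service=ChromaDBService(config=config))

# Store one memory, then retrieve it by meaning rather than by keyword.
# NOTE: the add_memory() argument names and result attributes are assumptions.
memory_manager.add_memory(
    text="Electronics can be returned within 15 days if unopened.",
    metadata={"source": "returns_policy.md"},
)
results = memory_manager.retrieve_relevant_memories(
    query_text="Can I return a laptop I bought last week?"
)
for mem in results:
    print(mem.record.text)  # each result wraps a stored MemoryRecord
```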
### How it Works (RAG Example)
The most common use case for the memory system is Retrieval-Augmented Generation (RAG), where the agent retrieves relevant information from a knowledge base before answering a user's query.
**1. Ingestion (Offline Step)**

- First, you need to populate the memory store (ChromaDB) with your knowledge base documents (e.g., FAQs, policies, product manuals).
- This typically involves a separate script, which you can create by inheriting from `karo/utils/base_ingestion_script.py`. This base script provides a template for:
  - Loading environment variables (e.g., `OPENAI_API_KEY`).
  - Initializing `ChromaDBService` and `MemoryManager`.
  - Using `DocumentReaderTool` to read files from a directory.
  - A basic chunking strategy.
  - Adding document chunks to the `MemoryManager` with metadata.
- To create your own ingestion script:
  1. Copy `karo/utils/base_ingestion_script.py` to your project (e.g., into a `scripts` directory).
  2. Modify the CONFIGURATION section to point to your knowledge base directory, database path, and collection name.
  3. Customize the chunking strategy and metadata as needed (a minimal standalone example is sketched after this list).
  4. Ensure the necessary dependencies are installed (`karo`, `python-dotenv`, `pypdf`, `python-docx`).
  5. Set your `OPENAI_API_KEY` (or other provider key) in a `.env` file accessible from where you run the script.
  6. Run the script: `python path/to/your/copied_ingestion_script.py`
- During storage, the `ChromaDBService` automatically uses its configured embedding function (defaulting to OpenAI's `text-embedding-3-small`) to create a vector embedding for each chunk. The text, metadata, and embedding are stored together in ChromaDB.
- This ingestion process only needs to be run once initially and then again whenever your knowledge base documents change.
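As referenced above, a standalone ingestion script can be as simple as the sketch below: read files, chunk them naively, and store each chunk with metadata. The directory layout, the `chunk()` helper, and the `add_memory()` argument names are assumptions made for illustration; the shipped `base_ingestion_script.py` (with `DocumentReaderTool` and PDF/DOCX support) is the canonical starting point.

```python
import os
from karo.memory.services.chromadb_service import ChromaDBService, ChromaDBConfig
from karo.memory.memory_manager import MemoryManager

KB_DIR = "./knowledge_base"  # directory of plain-text documents (illustrative)

config = ChromaDBConfig(path="./karo_kb_db", collection_name="faq_kb")
memory_manager = MemoryManager(chroma_service=ChromaDBService(config=config))

def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Naive fixed-size character chunking with a small overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

for filename in os.listdir(KB_DIR):
    if not filename.endswith(".txt"):
        continue
    with open(os.path.join(KB_DIR, filename), encoding="utf-8") as f:
        content = f.read()
    for i, piece in enumerate(chunk(content)):
        # add_memory() argument names are assumptions; see the base ingestion
        # script shipped with Karo for the canonical call.
        memory_manager.add_memory(
            text=piece,
            metadata={"source": filename, "chunk_index": i},
        )
```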
**2. Runtime (Agent Interaction)**

1. A user asks the agent a question (e.g., "What's the return policy for electronics?").
2. The `BaseAgent` (if configured with a `MemoryManager` in its `BaseAgentConfig`) automatically triggers the retrieval step before calling the LLM.
3. It calls `memory_manager.retrieve_relevant_memories()`, passing the user's question as the `query_text`.
4. `MemoryManager` uses `ChromaDBService` to:
   - Generate an embedding for the user's query text using the same embedding model used during ingestion.
   - Perform a similarity search in the ChromaDB collection, finding stored chunks whose embeddings are semantically closest to the query embedding.
5. The service returns the top N most relevant chunks (as configured by `memory_query_results` in `BaseAgentConfig`, default is 3).
6. `BaseAgent` receives these `MemoryQueryResult` objects.
7. It uses the `SystemPromptBuilder` (via `_create_initial_prompt`) to format the retrieved text chunks (e.g., `mem.record.text`) and includes them as context within the system prompt sent to the LLM. A typical format might look like:

   ```
   You are a helpful assistant...

   ## Relevant Previous Information
   -----------------------------
   - (YYYY-MM-DD HH:MM UTC): Relevant text chunk 1...
   - (YYYY-MM-DD HH:MM UTC): Relevant text chunk 2...

   ## Available Tools
   -----------------
   - tool_name_1: description...
   ```

   (A rough illustration of this formatting step appears after this walkthrough.)

8. The LLM receives the user's question along with the relevant context retrieved from the knowledge base.
9. The LLM generates an informed answer based on both the question and the provided context (e.g., "According to our return policy document, electronics can be returned within 15 days if unopened...").
10. The `BaseAgent` returns this final response.
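As noted in step 7, this formatting is handled for you by `BaseAgent` and `SystemPromptBuilder`; you never call anything like it yourself. The sketch below is only a rough, hand-rolled illustration of what the injected context boils down to. It is not the actual Karo implementation, and any attribute other than `record.text` (such as a timestamp on the record) is an assumption.

```python
from datetime import datetime

def format_memory_context(results) -> str:
    """Mimic the 'Relevant Previous Information' block shown above
    from a list of MemoryQueryResult-like objects (illustrative only)."""
    lines = ["## Relevant Previous Information", "-" * 29]
    for mem in results:
        ts = getattr(mem.record, "timestamp", None)  # timestamp field is an assumption
        stamp = ts.strftime("%Y-%m-%d %H:%M") if isinstance(ts, datetime) else "unknown time"
        lines.append(f"- ({stamp} UTC): {mem.record.text}")
    return "\n".join(lines)
```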
### Using the Memory System in Your Agent
**1. Initialize Components**

- Ensure you have `chromadb` installed (`pip install chromadb`).
- Import the necessary classes: `ChromaDBService`, `ChromaDBConfig`, `MemoryManager`.
- Load your OpenAI API key (needed for embeddings) via `.env` or other means.
- Configure `ChromaDBConfig`, specifying at least the `path` for local storage and a `collection_name`.
- Instantiate the service and then the manager:

```python
from karo.memory.services.chromadb_service import ChromaDBService, ChromaDBConfig
from karo.memory.memory_manager import MemoryManager
from dotenv import load_dotenv
import os

load_dotenv()  # Load OPENAI_API_KEY

# Configure ChromaDB
db_path = "./my_agent_karo_db"  # Choose a persistent path
collection = "main_kb"
chroma_config = ChromaDBConfig(path=db_path, collection_name=collection)

try:
    chroma_service = ChromaDBService(config=chroma_config)
    memory_manager = MemoryManager(chroma_service=chroma_service)
    print(f"Memory system initialized using DB at: {db_path}")
except Exception as e:
    print(f"Failed to initialize memory system: {e}")
    memory_manager = None  # Handle error
```
**2. Configure Agent**

- When creating your `BaseAgentConfig`, pass the initialized `memory_manager` instance to the `memory_manager` argument.
- You can also set `memory_query_results` to control how many chunks are retrieved.

```python
from karo.core.base_agent import BaseAgent, BaseAgentConfig
# ... other imports (provider, prompt_builder, tools)

if memory_manager:  # Only configure if initialization succeeded
    agent_config = BaseAgentConfig(
        provider=my_provider,
        prompt_builder=my_prompt_builder,
        memory_manager=memory_manager,  # Assign the manager
        memory_query_results=5  # Retrieve top 5 results
        # tools=...
    )
    agent = BaseAgent(config=agent_config)
else:
    # Handle case where memory couldn't be initialized
    print("Agent created without memory capabilities.")
    # agent = BaseAgent(...)  # without memory_manager
```
**3. Ingest Data (Separate Script)**

- Create a separate script (like `ingest_kb.py` in the tutorial) to load your documents, potentially chunk them, and use `memory_manager.add_memory()` to store them. Run this script whenever your knowledge base needs to be created or updated.
**4. Agent Interaction**

- When you call `agent.run()`, the agent will automatically perform the retrieval step using the `memory_manager` and include the results in the prompt sent to the LLM (as handled by `BaseAgent._create_initial_prompt` and `SystemPromptBuilder`).
### Memory Tools (`KnowledgeBaseQueryTool`)

While the `BaseAgent` handles automatic retrieval for RAG context, you might also want tools that explicitly interact with memory. The tutorial example includes a `KnowledgeBaseQueryTool` which uses the `MemoryManager` internally. This allows the LLM to decide to query the KB explicitly if it needs specific information not covered by the automatic retrieval, or if it wants to verify something, providing another layer of control.
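The tutorial defines the actual tool against Karo's tool interface; the standalone sketch below only shows the core idea, i.e. wrapping `retrieve_relevant_memories()` behind an explicit, LLM-callable query. The class and schema names here are illustrative rather than the tutorial's exact code, and slicing the results to `n_results` is an assumption about how the result count would be limited.

```python
from pydantic import BaseModel, Field

class KBQueryInput(BaseModel):
    query: str = Field(..., description="Question to look up in the knowledge base")

class KBQueryOutput(BaseModel):
    chunks: list[str]

class SimpleKnowledgeBaseQuery:
    """Illustrative stand-in for the tutorial's KnowledgeBaseQueryTool.
    The real tool implements Karo's tool interface so the LLM can call it;
    the core logic is just an explicit MemoryManager query."""

    def __init__(self, memory_manager, n_results: int = 3):
        self.memory_manager = memory_manager
        self.n_results = n_results

    def run(self, params: KBQueryInput) -> KBQueryOutput:
        results = self.memory_manager.retrieve_relevant_memories(query_text=params.query)
        return KBQueryOutput(chunks=[m.record.text for m in results[: self.n_results]])
```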