How to Add Memory to LlamaIndex Agents with AI Memory MCP Server (2026)
Last updated: June 9, 2026 · 15 min read
LlamaIndex is one of the most powerful frameworks for building data-aware AI agents, and it excels at RAG (Retrieval-Augmented Generation) over private data sources. But when it comes to persistent conversation memory, LlamaIndex's built-in options fall short. In this guide, we'll show you how to add persistent, cross-session memory to your LlamaIndex agents using the AI Memory MCP server.
Why LlamaIndex Agents Need Persistent Memory
LlamaIndex agents leverage context windows from modern LLMs, typically 128K-200K tokens for models like Claude 3.5 Sonnet or GPT-4o. But even with large context windows, there are fundamental problems:
- Session boundaries: When your application restarts, all conversation history is gone
- Context window limits: Long conversations eventually exceed the window, forcing truncation
- RAG is not memory: LlamaIndex's strength is querying documents, not remembering conversations
- No cross-agent sharing: Agent A cannot access what Agent B learned from a previous conversation
- Cost explosion: Passing full conversation history in every prompt is expensive
Persistent memory addresses each of these problems. Your agent can recall relevant context from past conversations without loading everything into the context window. This is complementary to LlamaIndex's RAG capabilities: RAG queries your data, while memory captures your interactions.
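To put rough numbers on the cost point, the sketch below compares cumulative prompt tokens when every turn resends the full history against a setup that sends only the current turn plus a few retrieved memories. The turn and memory sizes are illustrative assumptions, not measurements.

```python
def full_history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total prompt tokens if turn i resends all i turns so far."""
    return tokens_per_turn * turns * (turns + 1) // 2


def retrieved_memory_tokens(
    turns: int,
    tokens_per_turn: int,
    memories_per_turn: int,
    tokens_per_memory: int,
) -> int:
    """Total prompt tokens if each turn sends itself plus a few retrieved memories."""
    return turns * (tokens_per_turn + memories_per_turn * tokens_per_memory)


if __name__ == "__main__":
    # Assumed workload: 100 turns of ~200 tokens, 5 memories of ~100 tokens each.
    print(full_history_tokens(100, 200))              # 1010000
    print(retrieved_memory_tokens(100, 200, 5, 100))  # 70000
```

Under these assumptions the full-history approach grows quadratically with conversation length, while the retrieval approach grows linearly.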
LlamaIndex's Built-In Memory Options (And Their Limits)
LlamaIndex offers several memory abstractions out of the box:
| Memory Class | How It Works | Limitation |
|---|---|---|
| ChatMemoryBuffer | Stores recent messages up to a token limit | Lost on restart, simple buffer only |
| ChatSummaryMemoryBuffer | Summarizes older messages to save tokens | Lost on restart, loses detail |
| VectorMemory | Embeds messages in vector store for retrieval | Persistent if vector store is, but no cross-platform |
| SimpleComposableMemory | Composes multiple memory sources together | Complex setup, still limited to LlamaIndex |
| AI Memory MCP Server | Full-text search + semantic retrieval via MCP | Persistent, cross-agent, cross-platform |
The key insight: LlamaIndex's memory classes are in-process abstractions. Apart from VectorMemory backed by a persistent vector store, their contents do not survive a restart, and none of them is an external memory service that other tools can reach. The AI Memory MCP server provides exactly that: a persistent, searchable memory backend accessible via the Model Context Protocol.
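The difference between an in-process buffer and a persistent backend can be illustrated with the standard library alone. This is a minimal sketch, not the server's implementation: the `memories` table and the `remember`/`recall` helpers are hypothetical names chosen for the example.

```python
import os
import sqlite3
import tempfile


def remember(db_path: str, text: str) -> None:
    """Append one memory to a SQLite file on disk."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE IF NOT EXISTS memories (text TEXT)")
        conn.execute("INSERT INTO memories (text) VALUES (?)", (text,))
    conn.close()


def recall(db_path: str) -> list[str]:
    """Open a fresh connection, as a restarted process would, and read everything back."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS memories (text TEXT)")
    rows = conn.execute("SELECT text FROM memories").fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "memory.db")
    in_process_buffer = ["prefers Python"]  # lives only as long as this process
    remember(db_path, "prefers Python")     # written to disk
    print(recall(db_path))                  # read back through a new connection
```

Because `recall` opens its own connection, it behaves the same whether it runs in the original process or in a new one started later.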
What is the AI Memory MCP Server?
The aimemory-mcp-server is an open-source MCP (Model Context Protocol) server that provides persistent memory tools for any MCP-compatible client, including LlamaIndex, Claude Desktop, Cursor, Windsurf, and 113+ other tools.
- 12 memory tools: save, search, list, update, delete, get, stats, export, import, batch_save, get_all_tags, inject_memory
- SQLite + FTS5: Full-text search with no external dependencies
- Cross-platform: Same memory accessible from ChatGPT, Claude, Gemini, DeepSeek, and LlamaIndex agents
- Free and open-source: pip install aimemory-mcp-server
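To give a feel for what SQLite + FTS5 full-text search looks like, here is a small standalone sketch using Python's built-in `sqlite3` module. The server's actual schema is not shown in this guide, so the table, columns, and helper functions below are hypothetical. It also assumes your Python build ships SQLite with the FTS5 extension enabled, which is common but not guaranteed.

```python
import sqlite3


def build_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a store with one FTS5-indexed table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(content, tags)"
    )
    return conn


def save(conn: sqlite3.Connection, content: str, tags: list[str]) -> None:
    """Store a memory; tags are kept as a space-separated, searchable column."""
    conn.execute(
        "INSERT INTO memories (content, tags) VALUES (?, ?)",
        (content, " ".join(tags)),
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    """Return matching memories, best match first (FTS5's built-in rank)."""
    rows = conn.execute(
        "SELECT content FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_store()
    save(conn, "User prefers Python over JavaScript for backend work", ["preferences"])
    save(conn, "Q2 earnings report indexed", ["finance"])
    print(search(conn, "python"))
```

The default FTS5 tokenizer is case-insensitive, so a query for `python` matches `Python` in the stored text.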
Step-by-Step Setup Guide
Step 1: Install the MCP Server
pip install aimemory-mcp-server
This installs the MCP server with all dependencies. The server uses SQLite for storage, so no database setup is required.
Step 2: Configure MCP in Your Project
Create or update your MCP configuration file. For LlamaIndex projects using MCP tools, add the server to your mcp.json:
{
  "mcpServers": {
    "ai-memory": {
      "command": "aimemory-mcp-server",
      "args": [],
      "env": {}
    }
  }
}
Step 3: Connect LlamaIndex to the MCP Server
Use LlamaIndex's MCP tool adapter to connect your agent to the memory server:
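import asyncio

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec


async def main():
    # Launch the AI Memory MCP server as a subprocess and talk to it over stdio
    mcp_client = BasicMCPClient("aimemory-mcp-server", args=[])
    mcp_tool_spec = McpToolSpec(client=mcp_client)

    # Load all 12 memory tools
    tools = await mcp_tool_spec.to_tool_list_async()

    # Create agent with memory capabilities.
    # Note: ReActAgent.from_tools() and achat() belong to the legacy agent API.
    # Newer LlamaIndex releases expose ReActAgent from
    # llama_index.core.agent.workflow and use `await agent.run(...)` instead.
    llm = OpenAI(model="gpt-4o")
    agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)

    # Agent can now save and search memories!
    response = await agent.achat(
        "Remember that I prefer Python over JavaScript for backend work"
    )
    print(response)


asyncio.run(main())
Step 4: Save Conversations Automatically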
Add a memory-saving hook to your agent workflow. After each conversation, save the key context:
# After conversation completes, save to memory
response = await agent.achat("""
Save this conversation to memory with these details:
- conversation_id: chat-001
- content: [conversation summary]
- tags: ["python", "project-setup", "preferences"]
- source: "llamaindex-agent"
""")
Step 5: Inject Relevant Memories into New Conversations
At the start of each new conversation, search for relevant past context:
# At conversation start, inject relevant memories
memory_context = await agent.achat("""
Search my memories for anything related to "Python project setup"
and summarize the most relevant findings.
""")
# Use the memory context to prime the conversation
system_prompt = f"""You are a helpful assistant.
Here is relevant context from past conversations:
{memory_context}
Use this context to provide personalized responses."""
Combining RAG + Memory: The Best of Both Worlds
LlamaIndex's killer feature is RAG: querying your own data sources. When you combine RAG with persistent memory from the AI Memory MCP server, you get an agent that can:
- Query your data: Use LlamaIndex's index and query engine to search documents, APIs, and databases
- Remember past interactions: Use AI Memory to recall what you've discussed before
- Connect the dots: Combine retrieved data with conversational context for richer responses
import asyncio

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec


async def main():
    # Load your documents into a RAG index
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()

    # Connect to AI Memory MCP server (launched as a subprocess over stdio)
    mcp_tool_spec = McpToolSpec(client=BasicMCPClient("aimemory-mcp-server", args=[]))
    memory_tools = await mcp_tool_spec.to_tool_list_async()

    # Wrap the query engine as a tool; the name and description are illustrative
    rag_tools = [
        QueryEngineTool.from_defaults(
            query_engine=query_engine,
            name="document_search",
            description="Answers questions about the documents in ./data",
        )
    ]

    # Combine RAG tools + memory tools
    all_tools = rag_tools + memory_tools

    # Create an agent with both RAG and memory (legacy agent API, as in Step 3)
    agent = ReActAgent.from_tools(all_tools, llm=OpenAI(model="gpt-4o"))

    # Agent can query documents AND remember conversations
    response = await agent.achat(
        "Search my memories for what I asked about last week regarding "
        "the Q2 financial report, then query the report for those specific metrics."
    )
    print(response)


asyncio.run(main())
Advanced: Multi-Agent Memory Sharing with LlamaIndex Workflows
LlamaIndex Workflows let you orchestrate multiple agents. When every agent in the workflow loads its tools from the same AI Memory MCP server, they all share one persistent memory store:
- Ingestion Agent: Processes and indexes new documents
- Query Agent: Answers questions using RAG
- Summary Agent: Generates summaries and saves key findings to memory
# Assumes ingestion_agent and query_agent were each created as in Step 3,
# with tools loaded from the same AI Memory MCP server, and that this code
# runs inside an async function.

# Ingestion Agent saves a finding
await ingestion_agent.achat("""
Save to memory: "New document indexed: Q2-2026-earnings.pdf
Key metrics: Revenue $4.2B, EPS $3.15, 12% YoY growth"
tags: ["finance", "q2-2026", "earnings"]
""")
# Query Agent retrieves it later
await query_agent.achat("""
Search memories for "Q2 earnings". I need the latest
financial findings from the ingestion pipeline.
""")
Comparison: LlamaIndex Memory vs AI Memory MCP Server
| Feature | LlamaIndex Built-in | AI Memory MCP |
|---|---|---|
| Persistent storage | No (session-only by default) | Yes (SQLite + FTS5) |
| Cross-agent sharing | No (per-agent) | Yes (shared store) |
| Full-text search | No (manual only) | Yes (FTS5 powered) |
| Cross-platform access | No (LlamaIndex only) | Yes (113+ MCP clients) |
| Tag organization | No | Yes (tags + categories) |
| Web UI for browsing | No | Yes (aimemory.pro web app) |
| Works with RAG | Yes (native integration) | Yes (complementary) |
| Setup complexity | Import and go | pip install + config |
| Cost | Free (in-process) | Free (local server) |
Real-World Use Cases
1. Document Research Assistant with Memory
A research agent that combines LlamaIndex's document querying with persistent memory. It remembers which documents you've already reviewed, your key findings, and your research preferences across sessions.
2. Enterprise Knowledge Base with Conversational Context
An enterprise agent that queries internal documentation via LlamaIndex RAG while maintaining a memory of each employee's past questions and preferences. New queries are automatically enriched with relevant historical context.
3. Data Analysis Agent with Accumulated Insights
An agent that analyzes data sources, saves key insights to memory, and builds on previous analysis sessions. Each session adds new findings, and future sessions can search and extend previous work.
Troubleshooting
MCP server not found
Make sure aimemory-mcp-server is installed in your Python environment. Run which aimemory-mcp-server to verify.
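The same check can be done from Python, which is handy when the agent runs in a different environment from your shell. This is a small sketch: `find_server` is a hypothetical helper name, and it only confirms that the executable is on the PATH of the current process.

```python
import shutil
from typing import Optional


def find_server(command: str = "aimemory-mcp-server") -> Optional[str]:
    """Return the full path of the server executable, or None if it is not on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_server()
    if path is None:
        print("aimemory-mcp-server not found; is the right virtualenv active?")
    else:
        print(f"Found server at {path}")
```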
Tools not loading
Check your mcp.json configuration. The server should be listed under mcpServers with the correct command. Also ensure you have llama-index-tools-mcp installed.
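A quick way to sanity-check the configuration is to parse it and confirm the expected keys are present. The sketch below assumes the layout shown in Step 2 (an `ai-memory` entry under `mcpServers`). `check_mcp_config` is a hypothetical helper, not part of any library.

```python
import json


def check_mcp_config(config_text: str, server_name: str = "ai-memory") -> list[str]:
    """Return a list of problems found in an mcp.json document (empty if none)."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return ["missing 'mcpServers' object"]

    entry = servers.get(server_name)
    if not isinstance(entry, dict):
        return [f"no entry named '{server_name}' under 'mcpServers'"]

    problems = []
    command = entry.get("command")
    if not isinstance(command, str) or not command:
        problems.append("'command' must be a non-empty string")
    if not isinstance(entry.get("args", []), list):
        problems.append("'args' must be a list")
    return problems


if __name__ == "__main__":
    with open("mcp.json", encoding="utf-8") as handle:
        print(check_mcp_config(handle.read()) or "mcp.json looks fine")
```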
Memories not persisting
By default, the MCP server stores data in a local SQLite file. Make sure the server process has write permissions to the working directory.
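If you suspect a permissions problem, you can test whether a directory is writable by trying to create a temporary file in it. This sketch assumes the database lives in the working directory, as described above. `can_write` is a hypothetical helper.

```python
import tempfile


def can_write(directory: str = ".") -> bool:
    """Return True if a file can be created in the given directory."""
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Working directory writable:", can_write("."))
```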
Async errors with LlamaIndex
LlamaIndex's MCP integration is async-first. Make sure you're using await agent.achat() inside an async function, or use asyncio.run() for top-level calls.
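The general pattern looks like the sketch below. To keep it runnable without an LLM or API key, `chat_once` is a hypothetical stand-in for `await agent.achat(...)`; in a real script you would call the agent there instead.

```python
import asyncio


async def chat_once(message: str) -> str:
    """Stand-in for `await agent.achat(message)` (hypothetical stub)."""
    await asyncio.sleep(0)  # yield to the event loop, as a real call would
    return f"echo: {message}"


async def main() -> str:
    # All awaits live inside an async function...
    return await chat_once("hello")


if __name__ == "__main__":
    # ...and asyncio.run() is the single synchronous entry point.
    print(asyncio.run(main()))
```

In a notebook an event loop is usually already running, so there you can typically `await main()` directly instead of calling `asyncio.run()`.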
Next Steps
- AI Memory MCP Server Setup Guide
- LangChain Memory Integration Guide
- CrewAI Memory Integration Guide
- MCP Memory Server: Complete Guide
Start Building with Persistent Memory
AI Memory MCP server is free and open-source. Give your LlamaIndex agents the memory they deserve.