How to Add Memory to LlamaIndex Agents with AI Memory MCP Server (2026)

Last updated: June 9, 2026 · 15 min read

LlamaIndex is one of the most powerful frameworks for building data-aware AI agents, excelling at RAG (Retrieval-Augmented Generation) over private data sources. But when it comes to persistent conversation memory, LlamaIndex's built-in options fall short. In this guide, we'll show you how to add persistent, cross-session memory to your LlamaIndex agents using the AI Memory MCP server.

Why LlamaIndex Agents Need Persistent Memory

LlamaIndex agents leverage context windows from modern LLMs, typically 128K-200K tokens for models like Claude 3.5 Sonnet or GPT-4o. But even with large context windows, there are fundamental problems:

  • Session boundaries: When your application restarts, all conversation history is gone
  • Context window limits: Long conversations eventually exceed the window, forcing truncation
  • RAG is not memory: LlamaIndex's strength is querying documents, not remembering conversations
  • No cross-agent sharing: Agent A cannot access what Agent B learned from a previous conversation
  • Cost explosion: Passing full conversation history in every prompt is expensive

Persistent memory solves all of these problems. Your agent can remember relevant context from past conversations without loading everything into the context window. This is complementary to LlamaIndex's RAG capabilities: RAG queries your data, while memory captures your interactions.
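
To make the cost point concrete, here is a rough back-of-the-envelope sketch. The turn count, tokens per turn, and size of the retrieved memory snippet are illustrative assumptions, not measurements from a real agent.

```python
def prompt_tokens_full_history(turns: int, tokens_per_turn: int) -> int:
    """Total prompt tokens when every request resends all previous turns."""
    # Request k carries k turns of history, so the total is 1 + 2 + ... + turns.
    return tokens_per_turn * turns * (turns + 1) // 2


def prompt_tokens_with_memory(turns: int, tokens_per_turn: int, memory_tokens: int) -> int:
    """Total prompt tokens when each request carries the new turn plus a fixed-size memory snippet."""
    return turns * (tokens_per_turn + memory_tokens)


# Assumed numbers: 50 turns, 200 tokens per turn, 400 tokens of retrieved memory.
full = prompt_tokens_full_history(50, 200)
lean = prompt_tokens_with_memory(50, 200, 400)
print(f"full history: {full} tokens, with memory retrieval: {lean} tokens")
```

Under these assumptions the full-history approach grows quadratically with the number of turns, while the retrieval approach grows linearly.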

LlamaIndex's Built-In Memory Options (And Their Limits)

LlamaIndex offers several memory abstractions out of the box:

Memory Class | How It Works | Limitation
ChatMemoryBuffer | Stores recent messages up to a token limit | Lost on restart, simple buffer only
ChatSummaryMemoryBuffer | Summarizes older messages to save tokens | Lost on restart, loses detail
VectorMemory | Embeds messages in a vector store for retrieval | Persistent if the vector store is, but not cross-platform
SimpleComposableMemory | Composes multiple memory sources together | Complex setup, still limited to LlamaIndex
AI Memory MCP Server | Full-text search + semantic retrieval via MCP | ✅ Persistent, cross-agent, cross-platform

The key insight: LlamaIndex's memory classes are in-process abstractions. Apart from a VectorMemory backed by a persistent vector store, they don't survive restarts, and they don't connect to an external memory service that other tools can share. The AI Memory MCP server provides exactly that: a persistent, searchable memory backend accessible via the Model Context Protocol.

What is the AI Memory MCP Server?

The aimemory-mcp-server is an open-source MCP (Model Context Protocol) server that provides persistent memory tools for any MCP-compatible client, including LlamaIndex, Claude Desktop, Cursor, Windsurf, and 113+ other tools.

  • 12 memory tools: save, search, list, update, delete, get, stats, export, import, batch_save, get_all_tags, inject_memory
  • SQLite + FTS5: Full-text search with no external dependencies
  • Cross-platform: Same memory accessible from ChatGPT, Claude, Gemini, DeepSeek, and LlamaIndex agents
  • Free and open-source: pip install aimemory-mcp-server
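
To give a feel for what "SQLite + FTS5" means in practice, here is a minimal sketch of full-text search using Python's built-in sqlite3 module. The table name, columns, and sample rows are hypothetical and are not the server's actual schema. It also assumes your Python's SQLite build includes FTS5, which is common but not guaranteed.

```python
import sqlite3

# In-memory database for the demo; a persistent store would use a file path.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: an FTS5 virtual table indexing content and tags.
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(content, tags)")
conn.executemany(
    "INSERT INTO memories (content, tags) VALUES (?, ?)",
    [
        ("User prefers Python over JavaScript for backend work", "python preferences"),
        ("Project uses PostgreSQL for the main database", "project-setup"),
    ],
)


def search_memories(query: str) -> list[str]:
    """Return the content of matching rows, ordered by FTS5's built-in rank."""
    rows = conn.execute(
        "SELECT content FROM memories WHERE memories MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]


results = search_memories("python")
print(results)
```

The default FTS5 tokenizer is case-insensitive, so a query for "python" matches "Python" in the stored text.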

Step-by-Step Setup Guide

Step 1: Install the MCP Server

pip install aimemory-mcp-server

This installs the MCP server with all dependencies. The server uses SQLite for storage, so no database setup is required.

Step 2: Configure MCP in Your Project

Create or update your MCP configuration file. For LlamaIndex projects using MCP tools, add the server to your mcp.json:

{
  "mcpServers": {
    "ai-memory": {
      "command": "aimemory-mcp-server",
      "args": [],
      "env": {}
    }
  }
}
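
If you already have an mcp.json with other servers in it, you may prefer to add the entry programmatically rather than by hand. The helper below is a small sketch of that idea; the function name add_memory_server is our own, and the demo writes to a temporary directory rather than your real config.

```python
import json
import tempfile
from pathlib import Path


def add_memory_server(config_path: Path) -> dict:
    """Add the ai-memory entry to an MCP config file, keeping existing servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["ai-memory"] = {"command": "aimemory-mcp-server", "args": [], "env": {}}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file that already lists another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-server"}}}))
    merged = add_memory_server(path)
    print(json.dumps(merged, indent=2))
```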

Step 3: Connect LlamaIndex to the MCP Server

Use LlamaIndex's MCP tool adapter to connect your agent to the memory server:

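from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

# Note: the awaits below must run inside an async function (or a notebook).

# Launch the AI Memory MCP server as a subprocess and talk to it over stdio.
# McpToolSpec wraps an MCP client rather than taking the command directly.
mcp_client = BasicMCPClient("aimemory-mcp-server", args=[])
mcp_tool_spec = McpToolSpec(client=mcp_client)

# Load the memory tools exposed by the server
tools = await mcp_tool_spec.to_tool_list_async()

# Create agent with memory capabilities.
# from_tools/achat is the classic agent API; newer LlamaIndex releases may
# instead expose agents under llama_index.core.agent.workflow with agent.run().
llm = OpenAI(model="gpt-4o")
agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)

# Agent can now save and search memories
response = await agent.achat(
    "Remember that I prefer Python over JavaScript for backend work"
)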

Step 4: Save Conversations Automatically

Add a memory-saving hook to your agent workflow. After each conversation, save the key context:

# After conversation completes, save to memory
response = await agent.achat("""
    Save this conversation to memory with these details:
    - conversation_id: chat-001
    - content: [conversation summary]
    - tags: ["python", "project-setup", "preferences"]
    - source: "llamaindex-agent"
""")

Step 5: Inject Relevant Memories into New Conversations

At the start of each new conversation, search for relevant past context:

# At conversation start, inject relevant memories
memory_context = await agent.achat("""
    Search my memories for anything related to "Python project setup"
    and summarize the most relevant findings.
""")

# Use the memory context to prime the conversation
system_prompt = f"""You are a helpful assistant.
Here is relevant context from past conversations:
{memory_context}

Use this context to provide personalized responses."""
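
Injected memories still count against the context window, so it can help to cap how much you include. The sketch below builds the same system prompt from a list of memory snippets and stops before a character budget is exceeded. The helper name and the budget of 2000 characters are assumptions for illustration; a production version would more likely count tokens.

```python
def build_system_prompt(memories: list[str], max_chars: int = 2000) -> str:
    """Build a system prompt from memory snippets, stopping before the budget is exceeded."""
    selected = []
    used = 0
    for snippet in memories:
        line = f"- {snippet.strip()}"
        if used + len(line) > max_chars:
            break  # keep earlier (assumed more relevant) snippets, drop the rest
        selected.append(line)
        used += len(line)
    context = "\n".join(selected) if selected else "(no relevant memories found)"
    return (
        "You are a helpful assistant.\n"
        "Here is relevant context from past conversations:\n"
        f"{context}\n\n"
        "Use this context to provide personalized responses."
    )


print(build_system_prompt(["Prefers Python for backend work", "Uses PostgreSQL"]))
```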

Combining RAG + Memory: The Best of Both Worlds

LlamaIndex's killer feature is RAG: querying structured data sources. When you combine RAG with persistent memory from AI Memory MCP server, you get an agent that can:

  • Query your data: Use LlamaIndex's index and query engine to search documents, APIs, and databases
  • Remember past interactions: Use AI Memory to recall what you've discussed before
  • Connect the dots: Combine retrieved data with conversational context for richer responses

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

# Load your documents into a RAG index
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Connect to AI Memory MCP server over stdio
mcp_client = BasicMCPClient("aimemory-mcp-server", args=[])
mcp_tool_spec = McpToolSpec(client=mcp_client)
memory_tools = await mcp_tool_spec.to_tool_list_async()

# Wrap the query engine as a tool (name and description are illustrative)
rag_tools = [
    QueryEngineTool.from_defaults(
        query_engine=query_engine,
        name="document_search",
        description="Answers questions about the documents in ./data",
    )
]

# Combine RAG tools + memory tools
all_tools = rag_tools + memory_tools

# Create an agent with both RAG and memory
agent = ReActAgent.from_tools(all_tools, llm=OpenAI(model="gpt-4o"))

# Agent can query documents AND remember conversations
response = await agent.achat("""
    Search my memories for what I asked about last week regarding 
    the Q2 financial report, then query the report for those specific metrics.
""")

Advanced: Multi-Agent Memory Sharing with LlamaIndex Workflows

LlamaIndex Workflows let you orchestrate multiple agents. When combined with AI Memory MCP server, all agents in the workflow can share a persistent memory store:

  • Ingestion Agent: Processes and indexes new documents
  • Query Agent: Answers questions using RAG
  • Summary Agent: Generates summaries and saves key findings to memory

# Ingestion Agent saves a finding
await ingestion_agent.achat("""
    Save to memory: "New document indexed: Q2-2026-earnings.pdf
    Key metrics: Revenue $4.2B, EPS $3.15, 12% YoY growth"
    tags: ["finance", "q2-2026", "earnings"]
""")

# Query Agent retrieves it later
await query_agent.achat("""
    Search memories for "Q2 earnings". I need the latest
    financial findings from the ingestion pipeline.
""")

Comparison: LlamaIndex Memory vs AI Memory MCP Server

Feature | LlamaIndex Built-in | AI Memory MCP
Persistent storage | ❌ Session-only by default | ✅ SQLite + FTS5
Cross-agent sharing | ❌ Per-agent | ✅ Shared store
Full-text search | ❌ Manual only | ✅ FTS5 powered
Cross-platform access | ❌ LlamaIndex only | ✅ 113+ MCP clients
Tag organization | ❌ No | ✅ Tags + categories
Web UI for browsing | ❌ No | ✅ aimemory.pro web app
Works with RAG | ✅ Native integration | ✅ Complementary
Setup complexity | ✅ Import and go | ⚡ pip install + config
Cost | ✅ Free (in-process) | ✅ Free (local server)

Real-World Use Cases

1. Document Research Assistant with Memory

A research agent that combines LlamaIndex's document querying with persistent memory. It remembers which documents you've already reviewed, your key findings, and your research preferences across sessions.

2. Enterprise Knowledge Base with Conversational Context

An enterprise agent that queries internal documentation via LlamaIndex RAG while maintaining a memory of each employee's past questions and preferences. New queries are automatically enriched with relevant historical context.

3. Data Analysis Agent with Accumulated Insights

An agent that analyzes data sources, saves key insights to memory, and builds on previous analysis sessions. Each session adds new findings, and future sessions can search and extend previous work.

Troubleshooting

MCP server not found

Make sure aimemory-mcp-server is installed in your Python environment. Run which aimemory-mcp-server to verify.
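
You can run the same check from Python, which is handy when the agent runs in a different virtual environment than your shell. This is a minimal sketch using only the standard library; find_executable is our own helper name.

```python
import shutil
import sys
from typing import Optional


def find_executable(name: str) -> Optional[str]:
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


server_path = find_executable("aimemory-mcp-server")
if server_path is None:
    # Printing the interpreter helps spot a mismatched virtual environment.
    print(f"aimemory-mcp-server not found on PATH (interpreter: {sys.executable})")
else:
    print(f"Found server at {server_path}")
```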

Tools not loading

Check your mcp.json configuration. The server should be listed under mcpServers with the correct command. Also ensure you have llama-index-tools-mcp installed.

Memories not persisting

By default, the MCP server stores data in a local SQLite file. Make sure the server process has write permissions to the working directory.
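
A quick way to test write access is to try creating a temporary file in the directory. The sketch below assumes, as described above, that the database lives in the working directory; if your setup stores it elsewhere, pass that directory instead.

```python
import os
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a temporary file can be created in the directory."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers missing directories and permission errors alike.
        return False


print(f"Working directory writable: {can_write(os.getcwd())}")
```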

Async errors with LlamaIndex

LlamaIndex's MCP integration is async-first. Make sure you're using await agent.achat() inside an async function, or use asyncio.run() for top-level calls.
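
The pattern looks like this. To keep the example runnable without API keys, it uses a stand-in StubAgent class of our own instead of a real LlamaIndex agent; only the async structure is the point here.

```python
import asyncio


class StubAgent:
    """Stand-in for a LlamaIndex agent so the example runs without any API keys."""

    async def achat(self, message: str) -> str:
        await asyncio.sleep(0)  # simulate awaiting an LLM or tool call
        return f"echo: {message}"


async def main() -> str:
    agent = StubAgent()
    # await is only valid inside an async function like this one.
    return await agent.achat("Remember that I prefer Python")


# asyncio.run() starts an event loop, runs main() to completion, and closes the loop.
reply = asyncio.run(main())
print(reply)
```

In a Jupyter notebook an event loop is already running, so you would await main() directly instead of calling asyncio.run().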

Next Steps

Start Building with Persistent Memory

AI Memory MCP server is free and open-source. Give your LlamaIndex agents the memory they deserve.
