Does LlamaIndex have built-in persistent memory?

LlamaIndex provides ChatMemoryBuffer and other memory classes for managing conversation context within a session. By default they keep their state in process, so memories are lost when the application restarts unless you persist the backing chat store or vector store yourself. For persistent cross-session memory that other tools can also reach, you need an external memory store such as the AI Memory MCP server.

What is the AI Memory MCP server for LlamaIndex?

AI Memory MCP server (aimemory-mcp-server) is a Model Context Protocol server that provides persistent memory tools for LlamaIndex agents. It offers 12 tools including save_conversation, search_memories, inject_memory, and batch operations. It stores memories in SQLite with full-text search (FTS5) and can be installed via pip.

How do I install AI Memory MCP server for LlamaIndex?

Install with pip install aimemory-mcp-server. Then configure it as an MCP server in your LlamaIndex project by adding it to your mcp.json configuration file. The server runs locally via stdio transport and requires no cloud account for basic usage.

Can LlamaIndex agents share memory with other frameworks?

Yes. AI Memory MCP server supports shared memory across multiple agents and frameworks. A LlamaIndex agent and a LangChain agent connected to the same MCP server instance can save, search, and retrieve from the same memory store. You can use tags to organize memories by agent or topic.

How to Add Memory to LlamaIndex Agents with AI Memory MCP Server (2026)

Last updated: June 9, 2026

LlamaIndex is one of the most powerful frameworks for building data-aware AI agents — excelling at RAG (Retrieval-Augmented Generation) over private data sources. But when it comes to persistent conversation memory, LlamaIndex's built-in options fall short. In this guide, we'll show you how to add persistent, cross-session memory to your LlamaIndex agents using the AI Memory MCP server.

Why LlamaIndex Agents Need Persistent Memory

LlamaIndex agents leverage context windows from modern LLMs — typically 128K-200K tokens for models like Claude 3.5 Sonnet or GPT-4o. But even with large context windows, there are fundamental problems:

Session boundaries: When your application restarts, all conversation history is gone
Context window limits: Long conversations eventually exceed the window, forcing truncation
RAG is not memory: LlamaIndex's strength is querying documents, not remembering conversations
No cross-agent sharing: Agent A cannot access what Agent B learned from a previous conversation
Cost explosion: Passing full conversation history in every prompt is expensive

Persistent memory addresses each of these problems. Your agent can recall relevant context from past conversations without loading everything into the context window. This is complementary to LlamaIndex's RAG capabilities: RAG queries your data, memory captures your interactions.
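To make the cost point concrete, here is a back-of-the-envelope sketch. The figures (200 tokens per turn, 3 injected memories of 80 tokens each) are illustrative assumptions, not measurements from any real agent.

```python
def full_history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total prompt tokens when every turn resends all earlier turns."""
    # Turn i carries i turns' worth of text, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns).
    return tokens_per_turn * turns * (turns + 1) // 2


def memory_injection_tokens(
    turns: int, tokens_per_turn: int, memories: int, tokens_per_memory: int
) -> int:
    """Total prompt tokens when each turn sends itself plus a few memories."""
    return turns * (tokens_per_turn + memories * tokens_per_memory)


if __name__ == "__main__":
    print(full_history_tokens(50, 200))             # 255000
    print(memory_injection_tokens(50, 200, 3, 80))  # 22000
```

Under these assumptions, replaying the full history grows quadratically with conversation length, while injecting a fixed number of memories grows linearly.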

LlamaIndex's Built-In Memory Options (And Their Limits)

LlamaIndex offers several memory abstractions out of the box:

Memory Class	How It Works	Limitation
ChatMemoryBuffer	Stores recent messages up to a token limit	Lost on restart, simple buffer only
ChatSummaryMemoryBuffer	Summarizes older messages to save tokens	Lost on restart, loses detail
VectorMemory	Embeds messages in vector store for retrieval	Persistent if the vector store is, but not shared across platforms
SimpleComposableMemory	Composes multiple memory sources together	Complex setup, still limited to LlamaIndex
AI Memory MCP Server	Full-text search + semantic retrieval via MCP	✅ Persistent, cross-agent, cross-platform

The key insight: most of LlamaIndex's memory classes are in-process abstractions. Unless you wire them to a persistent chat store or vector store yourself, their state does not survive a restart, and none of them is designed to be shared with tools outside LlamaIndex. The AI Memory MCP server fills that gap with a persistent, searchable memory backend accessible via the Model Context Protocol.

What is the AI Memory MCP Server?

The aimemory-mcp-server is an open-source MCP (Model Context Protocol) server that provides persistent memory tools for any MCP-compatible client — including LlamaIndex, Claude Desktop, Cursor, Windsurf, and 113+ other tools.

12 memory tools: save, search, list, update, delete, get, stats, export, import, batch_save, get_all_tags, inject_memory
SQLite + FTS5: Full-text search with no external dependencies
Cross-platform: Same memory accessible from ChatGPT, Claude, Gemini, DeepSeek, and LlamaIndex agents
Free and open-source: pip install aimemory-mcp-server
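If you have not used SQLite's FTS5 before, the sketch below shows the general idea of a full-text memory table. The schema and the function names (open_store, save_memory, search_memories) are hypothetical and chosen for illustration; they are not the server's actual implementation, and the sketch assumes your Python build ships SQLite with FTS5 enabled.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database with one FTS5 table for memories."""
    conn = sqlite3.connect(path)
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(content, tags)"
    )
    return conn


def save_memory(conn: sqlite3.Connection, content: str, tags: list[str]) -> None:
    """Insert one memory; tags are stored as a space-separated string."""
    conn.execute(
        "INSERT INTO memories (content, tags) VALUES (?, ?)",
        (content, " ".join(tags)),
    )
    conn.commit()


def search_memories(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    """Return matching memory texts, best match first (FTS5 rank order).

    The query string uses FTS5 query syntax, so plain words work as-is.
    """
    rows = conn.execute(
        "SELECT content FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

Matching against the table name searches every column, so a query can hit either the memory text or its tags.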

Step-by-Step Setup Guide

Step 1: Install the MCP Server

pip install aimemory-mcp-server

This installs the MCP server with all dependencies. The server uses SQLite for storage — no database setup required.

Step 2: Configure MCP in Your Project

Create or update your MCP configuration file. If your project or MCP client reads an mcp.json, add the server there; a LlamaIndex script can also launch the same command directly, as Step 3 shows:

{
  "mcpServers": {
    "ai-memory": {
      "command": "aimemory-mcp-server",
      "args": [],
      "env": {}
    }
  }
}
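If you already have an mcp.json with other servers in it, you may prefer to merge the entry rather than edit the file by hand. The sketch below is one way to do that with the standard library; the helper names (add_mcp_server, update_config_file) are hypothetical, and the file location depends on which MCP client you use.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, command: str) -> dict:
    """Return a copy of config with one server entry added under mcpServers."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "args": [], "env": {}}
    return {**config, "mcpServers": servers}


def update_config_file(path: Path, name: str, command: str) -> None:
    """Merge the entry into an existing JSON file, or create the file."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_mcp_server(config, name, command), indent=2))
```

For example, update_config_file(Path("mcp.json"), "ai-memory", "aimemory-mcp-server") produces the entry shown above while leaving any other servers in place.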

Step 3: Connect LlamaIndex to the MCP Server

Use LlamaIndex's MCP tool adapter to connect your agent to the memory server:

import asyncio

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec


async def main():
    # Launch the AI Memory MCP server as a subprocess over stdio
    mcp_client = BasicMCPClient("aimemory-mcp-server", args=[])
    mcp_tool_spec = McpToolSpec(client=mcp_client)

    # Load the memory tools exposed by the server
    tools = await mcp_tool_spec.to_tool_list_async()

    # Create agent with memory capabilities
    # (ReActAgent.from_tools is the classic agent API; depending on your
    # LlamaIndex version, agents may live in llama_index.core.agent.workflow)
    llm = OpenAI(model="gpt-4o")
    agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)

    # The agent can now call the save and search tools
    response = await agent.achat(
        "Remember that I prefer Python over JavaScript for backend work"
    )
    print(response)


asyncio.run(main())

Step 4: Save Conversations Automatically

Add a memory-saving hook to your agent workflow. After each conversation, save the key context:

# After conversation completes, save to memory
response = await agent.achat("""
    Save this conversation to memory with these details:
    - conversation_id: chat-001
    - content: [conversation summary]
    - tags: ["python", "project-setup", "preferences"]
    - source: "llamaindex-agent"
""")
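Rather than hand-writing that prompt each time, you can build it from structured fields. The helper below is a small sketch; build_save_prompt is a hypothetical name, and the field layout simply mirrors the prompt shown above.

```python
import json


def build_save_prompt(
    conversation_id: str, summary: str, tags: list[str], source: str
) -> str:
    """Format a save instruction for the agent from structured fields."""
    return (
        "Save this conversation to memory with these details:\n"
        f"- conversation_id: {conversation_id}\n"
        f"- content: {summary}\n"
        f"- tags: {json.dumps(tags)}\n"
        f"- source: {json.dumps(source)}"
    )
```

You would then pass the result to the agent, for example await agent.achat(build_save_prompt("chat-001", summary, ["python"], "llamaindex-agent")), where summary is text you have produced yourself.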

Step 5: Inject Relevant Memories into New Conversations

At the start of each new conversation, search for relevant past context:

# At conversation start, inject relevant memories
memory_context = await agent.achat("""
    Search my memories for anything related to "Python project setup"
    and summarize the most relevant findings.
""")

# Use the memory context to prime the conversation
system_prompt = f"""You are a helpful assistant.
Here is relevant context from past conversations:
{memory_context}

Use this context to provide personalized responses."""
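If you retrieve several memory snippets, it helps to cap how much of them reaches the prompt. The sketch below formats a list of snippets into the same system prompt under a character budget; build_system_prompt and the max_chars default are assumptions for illustration.

```python
def build_system_prompt(memories: list[str], max_chars: int = 2000) -> str:
    """Build a system prompt that embeds memory snippets within a size budget."""
    kept: list[str] = []
    used = 0
    for text in memories:
        line = f"- {text.strip()}"
        if used + len(line) > max_chars:
            break  # Stop before exceeding the budget; earlier items win.
        kept.append(line)
        used += len(line)
    context = "\n".join(kept) if kept else "(no relevant memories found)"
    return (
        "You are a helpful assistant.\n"
        "Here is relevant context from past conversations:\n"
        f"{context}\n\n"
        "Use this context to provide personalized responses."
    )
```

Because earlier items win, pass the snippets in relevance order so the most useful ones survive truncation.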

Combining RAG + Memory: The Best of Both Worlds

LlamaIndex's killer feature is RAG: querying your own documents and data sources. When you combine RAG with persistent memory from the AI Memory MCP server, you get an agent that can:

Query your data: Use LlamaIndex's index and query engine to search documents, APIs, and databases
Remember past interactions: Use AI Memory to recall what you've discussed before
Connect the dots: Combine retrieved data with conversational context for richer responses

import asyncio

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec


async def main():
    # Load your documents into a RAG index
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()

    # Connect to AI Memory MCP server over stdio
    mcp_client = BasicMCPClient("aimemory-mcp-server", args=[])
    memory_tools = await McpToolSpec(client=mcp_client).to_tool_list_async()

    # Wrap the query engine as a tool (name and description are placeholders)
    rag_tools = [
        QueryEngineTool.from_defaults(
            query_engine=query_engine,
            name="document_search",
            description="Answers questions about the documents in ./data",
        )
    ]
    all_tools = rag_tools + memory_tools

    # Create an agent with both RAG and memory
    agent = ReActAgent.from_tools(all_tools, llm=OpenAI(model="gpt-4o"))

    # Agent can query documents AND remember conversations
    response = await agent.achat("""
        Search my memories for what I asked about last week regarding
        the Q2 financial report, then query the report for those specific metrics.
    """)
    print(response)


asyncio.run(main())

Advanced: Multi-Agent Memory Sharing with LlamaIndex Workflows

LlamaIndex Workflows let you orchestrate multiple agents. When combined with AI Memory MCP server, all agents in the workflow can share a persistent memory store:

Ingestion Agent: Processes and indexes new documents
Query Agent: Answers questions using RAG
Summary Agent: Generates summaries and saves key findings to memory

# Ingestion Agent saves a finding
await ingestion_agent.achat("""
    Save to memory: "New document indexed: Q2-2026-earnings.pdf
    Key metrics: Revenue $4.2B, EPS $3.15, 12% YoY growth"
    tags: ["finance", "q2-2026", "earnings"]
""")

# Query Agent retrieves it later
await query_agent.achat("""
    Search memories for "Q2 earnings" — I need the latest 
    financial findings from the ingestion pipeline.
""")
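The sharing itself comes down to several agents reading and writing one store. The sketch below imitates that with two SQLite connections to the same file and a simple tag filter. The table layout and function names are hypothetical, not the server's real schema, and the LIKE-based tag match assumes tags contain no % or _ characters.

```python
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS memories (agent TEXT, content TEXT, tags TEXT)"


def connect(path: str) -> sqlite3.Connection:
    """Open the shared database file, creating the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save(conn: sqlite3.Connection, agent: str, content: str, tags: list[str]) -> None:
    """Store one memory with its tags wrapped in commas: ',finance,q2-2026,'."""
    conn.execute(
        "INSERT INTO memories VALUES (?, ?, ?)",
        (agent, content, "," + ",".join(tags) + ","),
    )
    conn.commit()


def find_by_tag(conn: sqlite3.Connection, tag: str) -> list[tuple[str, str]]:
    """Return (agent, content) pairs whose tag list contains the exact tag."""
    rows = conn.execute(
        "SELECT agent, content FROM memories WHERE tags LIKE ? ORDER BY rowid",
        (f"%,{tag},%",),
    )
    return rows.fetchall()
```

Wrapping the tag list in commas lets the filter match whole tags only, so searching for "fin" does not match "finance".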

Comparison: LlamaIndex Memory vs AI Memory MCP Server

Feature	LlamaIndex Built-in	AI Memory MCP
Persistent storage	❌ In-process by default	✅ SQLite + FTS5
Cross-agent sharing	❌ Per-agent	✅ Shared store
Full-text search	❌ Manual only	✅ FTS5 powered
Cross-platform access	❌ LlamaIndex only	✅ 113+ MCP clients
Tag organization	❌ No	✅ Tags + categories
Web UI for browsing	❌ No	✅ aimemory.pro web app
Works with RAG	✅ Native integration	✅ Complementary
Setup complexity	✅ Import and go	⚡ pip install + config
Cost	✅ Free (in-process)	✅ Free (local server)

Real-World Use Cases

1. Document Research Assistant with Memory

A research agent that combines LlamaIndex's document querying with persistent memory. It remembers which documents you've already reviewed, your key findings, and your research preferences across sessions.

2. Enterprise Knowledge Base with Conversational Context

An enterprise agent that queries internal documentation via LlamaIndex RAG while maintaining a memory of each employee's past questions and preferences. New queries are automatically enriched with relevant historical context.

3. Data Analysis Agent with Accumulated Insights

An agent that analyzes data sources, saves key insights to memory, and builds on previous analysis sessions. Each session adds new findings, and future sessions can search and extend previous work.

Troubleshooting

MCP server not found

Make sure aimemory-mcp-server is installed in the same Python environment your agent runs in. Run which aimemory-mcp-server (or where aimemory-mcp-server on Windows) to verify.
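You can run the same check from Python, which is handy when the agent runs inside a virtual environment that differs from your shell. This is a minimal sketch using only the standard library; find_server and describe_environment are hypothetical helper names.

```python
import shutil
import sys
from typing import Optional


def find_server(command: str = "aimemory-mcp-server") -> Optional[str]:
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(command)


def describe_environment(command: str = "aimemory-mcp-server") -> str:
    """Summarize whether the command is visible to this interpreter."""
    path = find_server(command)
    if path is None:
        return f"{command} not found on PATH (interpreter: {sys.executable})"
    return f"{command} found at {path}"
```

Printing describe_environment() from the same process that starts your agent shows which interpreter is in use and whether it can see the server command.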

Tools not loading

Check your mcp.json configuration. The server should be listed under mcpServers with the correct command. Also ensure you have llama-index-tools-mcp installed.

Memories not persisting

By default, the MCP server stores data in a local SQLite file. Make sure the server process has write permissions to the working directory.
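A quick way to test write access is to create and delete a temporary file in the directory in question. The sketch below does that; can_write is a hypothetical helper, and it assumes the working directory is where the server writes, which may differ in your setup.

```python
import tempfile


def can_write(directory: str = ".") -> bool:
    """Try to create and delete a temporary file in the directory."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers missing directories and permission errors alike.
        return False
```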

Async errors with LlamaIndex

LlamaIndex's MCP integration is async-first. Make sure you're using await agent.achat() inside an async function, or use asyncio.run() for top-level calls.
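The pattern looks like this in a plain script. EchoAgent is a stand-in defined here so the sketch runs without LlamaIndex installed; in your code you would pass the real agent instead.

```python
import asyncio


async def ask(agent, message: str) -> str:
    """Await the agent's async chat method and return the reply as text."""
    response = await agent.achat(message)
    return str(response)


class EchoAgent:
    """Stand-in for a real agent so the sketch runs without LlamaIndex."""

    async def achat(self, message: str) -> str:
        return f"echo: {message}"


def main() -> str:
    # asyncio.run creates the event loop at a plain script's entry point.
    return asyncio.run(ask(EchoAgent(), "hello"))
```

In a notebook, an event loop is usually already running, so you would await ask(...) directly instead of calling asyncio.run.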

Next Steps

Start Building with Persistent Memory

AI Memory MCP server is free and open-source. Give your LlamaIndex agents the memory they deserve.
