LlamaIndex + Neo4j Integration

Overview

LlamaIndex is an open source data orchestration framework for building LLM-powered applications. It provides data connectors for ingesting from diverse sources, indexing and retrieval mechanisms, query engines and chat interfaces, event-driven workflows for agentic applications, and integrations with vector stores, databases, and other LLM frameworks.

Installation:

pip install llama-index-core llama-index-tools-mcp llama-index-vector-stores-neo4jvector

Key Features:

Event-driven Workflows and FunctionAgent for building multi-agent applications
Native Neo4j integrations via the llama-index-vector-stores-neo4jvector package
MCP server support through llama-index-tools-mcp
Custom tool creation with FunctionTool.from_defaults()
Support for virtually every major LLM provider (OpenAI, Anthropic, Google, Cohere, Mistral, AWS Bedrock, Azure, and more)
LlamaCloud tools for document parsing (LlamaParse), classification (LlamaClassify), and extraction (LlamaExtract)

Examples

Notebook	Description
llamaindex_functionagent.ipynb	Building a company research agent using LlamaIndex with Neo4j MCP server, custom tools, vector search, and FunctionAgent workflow
build_knowledge_graph_with_neo4j_llamacloud.ipynb	End-to-end pipeline for legal document processing using LlamaClassify, LlamaExtract, and Neo4j knowledge graph construction

Extension Points

1. MCP Integration

LlamaIndex supports MCP servers via the llama-index-tools-mcp package. Use BasicMCPClient and McpToolSpec to connect to MCP servers and retrieve tools.

Neo4j MCP Server: Leverage the official Neo4j MCP server for schema reading and Cypher query execution

2. Direct Neo4j Integrations

LlamaIndex provides native Neo4j integrations:

Neo4jVectorStore: Vector store integration via llama-index-vector-stores-neo4jvector for semantic search over graph data with support for hybrid search, metadata filtering, and custom retrieval queries
Neo4j Python Driver: You can always use the Neo4j Python driver directly for executing Cypher queries within custom tools
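
As a rough illustration of the Neo4jVectorStore option above, the sketch below gathers the constructor arguments from environment-style settings and enables hybrid search. The variable names EMBEDDING_DIMENSION and NEO4J_VECTOR_INDEX, the default dimension of 1536, and the index name "vector" are assumptions made for this example, not values taken from the notebooks.

```python
import os
from typing import Mapping


def vector_store_kwargs(env: Mapping[str, str]) -> dict:
    """Collect Neo4jVectorStore arguments from environment-style settings."""
    return {
        "username": env["NEO4J_USERNAME"],
        "password": env["NEO4J_PASSWORD"],
        "url": env["NEO4J_URI"],
        "database": env.get("NEO4J_DATABASE", "neo4j"),
        # Must match the output size of your embedding model (1536 is assumed here).
        "embedding_dimension": int(env.get("EMBEDDING_DIMENSION", "1536")),
        # Hypothetical setting name; point it at your own vector index.
        "index_name": env.get("NEO4J_VECTOR_INDEX", "vector"),
        # Combine vector similarity with a full-text keyword index.
        "hybrid_search": True,
    }


def build_vector_store(env: Mapping[str, str] = os.environ):
    """Create the store; needs llama-index-vector-stores-neo4jvector and a reachable database."""
    # Imported lazily so vector_store_kwargs() works without the package installed.
    from llama_index.vector_stores.neo4jvector import Neo4jVectorStore

    return Neo4jVectorStore(**vector_store_kwargs(env))
```

The resulting store can then be passed to VectorStoreIndex.from_vector_store() and queried through a query engine, which is the usual route to wrapping it as a QueryEngineTool.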

3. Custom Tools/Functions

Define custom Neo4j tools using FunctionTool.from_defaults():

Implement functions that execute Cypher queries via the Neo4j Python driver
Wrap Neo4j vector stores as tools with QueryEngineTool
Combine MCP tools with custom tools in a single FunctionAgent
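
A minimal sketch of the first item, assuming a graph with Organization nodes that carry name and summary properties (a hypothetical schema; adjust the Cypher to your own data). The function and tool names are illustrative. The query helper only relies on the Neo4j Python driver's execute_query() method, and the wrapper uses FunctionTool.from_defaults().

```python
from typing import Any, Optional

# Hypothetical query: assumes Organization nodes with `name` and `summary` properties.
ORG_LOOKUP = """
MATCH (o:Organization)
WHERE toLower(o.name) CONTAINS toLower($name)
RETURN o.name AS name, o.summary AS summary
LIMIT $limit
"""


def find_organizations(
    driver: Any, name: str, limit: int = 5, database: Optional[str] = None
) -> list:
    """Return organizations whose name contains `name` (case-insensitive)."""
    result = driver.execute_query(
        ORG_LOOKUP,
        parameters_={"name": name, "limit": limit},
        database_=database,
    )
    # Each driver record converts to a plain dict keyed by the RETURN aliases.
    return [record.data() for record in result.records]


def make_organization_tool(driver: Any, database: Optional[str] = None):
    """Wrap the lookup as a LlamaIndex tool that an agent can call."""
    # Imported lazily so the query helper stays usable without llama-index installed.
    from llama_index.core.tools import FunctionTool

    def search_organizations(name: str) -> list:
        """Look up organizations in Neo4j by partial name match."""
        return find_organizations(driver, name, database=database)

    return FunctionTool.from_defaults(
        fn=search_organizations,
        name="search_organizations",
        description="Find organizations in the Neo4j graph by partial name.",
    )
```

With a driver created by GraphDatabase.driver(uri, auth=(user, password)), the tool returned by make_organization_tool(driver) can be appended to the list of MCP tools and the combined list passed to a FunctionAgent.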

4. LlamaCloud Tools

Build knowledge graphs from documents using LlamaCloud services:

LlamaParse: Parse complex document formats (PDFs, presentations, etc.)
LlamaClassify: AI-powered document classification with custom rules
LlamaExtract: Extract structured data using Pydantic schemas

5. Text-to-Cypher and GraphRAG Retrieval

LlamaIndex provides TextToCypherRetriever and VectorContextRetriever for building GraphRAG agents that combine semantic search with natural language Cypher generation. Both retrievers work against a Neo4jPropertyGraphStore (from the llama-index-graph-stores-neo4j package) and can be composed into a single query engine that is exposed as an agent tool.

from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
from llama_index.core.indices.property_graph import (
    PGRetriever,
    VectorContextRetriever,
    TextToCypherRetriever,
)
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.tools import QueryEngineTool

graph_store = Neo4jPropertyGraphStore(
    username="companies",
    password="companies",
    url="neo4j+s://demo.neo4jlabs.com:7687",
    database="companies",
)

# Semantic search over article chunks linked to company nodes
# (uses Settings.embed_model unless embed_model is passed)
vector_retriever = VectorContextRetriever(
    graph_store,
    include_text=True,
    similarity_top_k=3,
)

# Natural language → Cypher for structured graph queries
# (uses Settings.llm unless llm is passed)
cypher_retriever = TextToCypherRetriever(graph_store)

# Combine both retrievers, then build a query engine on top
retriever = PGRetriever(sub_retrievers=[vector_retriever, cypher_retriever])
query_engine = RetrieverQueryEngine.from_args(retriever)

research_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="company_research",
    description=(
        "Search news and relationships in the companies knowledge graph. "
        "Use for questions about organizations, industries, leadership, and recent articles."
    ),
)

6. Neo4j Query Engine Tools

The llama-index-tools-neo4j package provides a Neo4jQueryToolSpec that creates ready-made query engines over a Neo4j graph. Available engine types include vector-based entity retrieval, keyword-based retrieval, hybrid retrieval, raw vector index retrieval, KnowledgeGraphQueryEngine, and KnowledgeGraphRAGRetriever. Each type is exposed as a callable tool that an agent can select at runtime.

pip install llama-index-tools-neo4j

MCP Authentication

Supported Mechanisms:

✅ Environment Variables (STDIO transport) - For local MCP servers, set environment variables before spawning the process. The BasicMCPClient can connect to local processes via stdio transport.

✅ HTTP Headers (HTTP/SSE transport) - For remote MCP servers, pass API keys or bearer tokens via the headers parameter (e.g., Authorization: Basic ${CREDENTIALS} or Authorization: Bearer ${API_TOKEN}).

✅ OAuth 2.0 (in-client) - The BasicMCPClient supports OAuth 2.0 authentication via the with_oauth() method with configurable token storage.

Configuration Example (HTTP transport):

import os
import base64
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

# Settings for the Neo4j MCP server. They only take effect if the server
# is started from this same environment.
os.environ["NEO4J_URI"] = "neo4j+s://demo.neo4jlabs.com"
os.environ["NEO4J_DATABASE"] = "companies"
os.environ["NEO4J_MCP_TRANSPORT"] = "http"

# Credentials travel in the HTTP Authorization header (Basic auth).
# NEO4J_USERNAME and NEO4J_PASSWORD must already be set in the environment.
credentials = base64.b64encode(
    f"{os.environ['NEO4J_USERNAME']}:{os.environ['NEO4J_PASSWORD']}".encode()
).decode()

mcp_client = BasicMCPClient(
    "http://localhost:80/mcp",
    headers={"Authorization": f"Basic {credentials}"},
)

# Top-level await works in a notebook; in a script, run this inside an
# async function (for example via asyncio.run).
mcp_tool_spec = McpToolSpec(client=mcp_client)
tools = await mcp_tool_spec.to_tool_list_async()
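
Configuration Example (STDIO transport):

For the environment-variable mechanism listed under Supported Mechanisms, a local setup might look like the sketch below. It assumes the server executable is available on the PATH as neo4j-mcp and that it reads the NEO4J_* variables used in the HTTP example; both are assumptions to check against the server's own documentation. Only the Neo4j settings are forwarded to the child process.

```python
import os
from typing import Mapping

# Variables the spawned server is assumed to read.
SERVER_VARS = ("NEO4J_URI", "NEO4J_DATABASE", "NEO4J_USERNAME", "NEO4J_PASSWORD")
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def server_env(source: Mapping[str, str]) -> dict:
    """Pick only the Neo4j settings to hand to the child process."""
    missing = [name for name in REQUIRED_VARS if name not in source]
    if missing:
        raise KeyError(f"missing required settings: {', '.join(missing)}")
    return {name: source[name] for name in SERVER_VARS if name in source}


def build_stdio_client(command: str = "neo4j-mcp", source: Mapping[str, str] = os.environ):
    """Create a client that launches the server as a local subprocess over stdio."""
    # Imported lazily so server_env() can be used without llama-index-tools-mcp.
    from llama_index.tools.mcp import BasicMCPClient

    # A non-URL first argument is treated as a command to spawn.
    return BasicMCPClient(command, args=[], env=server_env(source))
```

The client returned by build_stdio_client() can be passed to McpToolSpec(client=...) exactly as in the HTTP example.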