LlamaIndex + Neo4j Integration

Overview

LlamaIndex is an open-source data orchestration framework for building LLM-powered applications. It provides data connectors for ingesting from diverse sources, indexing and retrieval mechanisms, query engines and chat interfaces, event-driven workflows for agentic applications, and integrations with vector stores, databases, and other LLM frameworks.

Installation:

pip install llama-index-core llama-index-tools-mcp llama-index-vector-stores-neo4jvector

Key Features:

  • Event-driven Workflows and FunctionAgent for building multi-agent applications

  • Native Neo4j integrations via the llama-index-vector-stores-neo4jvector package

  • MCP server support through llama-index-tools-mcp

  • Custom tool creation with FunctionTool.from_defaults()

  • Support for virtually every major LLM provider (OpenAI, Anthropic, Google, Cohere, Mistral, AWS Bedrock, Azure, and more)

  • LlamaCloud tools for document parsing (LlamaParse), classification (LlamaClassify), and extraction (LlamaExtract)

Examples

  • llamaindex_functionagent.ipynb: Building a company research agent using LlamaIndex with Neo4j MCP server, custom tools, vector search, and FunctionAgent workflow

  • build_knowledge_graph_with_neo4j_llamacloud.ipynb: End-to-end pipeline for legal document processing using LlamaClassify, LlamaExtract, and Neo4j knowledge graph construction

Extension Points

1. MCP Integration

LlamaIndex supports MCP servers via the llama-index-tools-mcp package. Use BasicMCPClient and McpToolSpec to connect to MCP servers and retrieve tools.

  • Neo4j MCP Server: Leverage the official Neo4j MCP server for schema reading and Cypher query execution

2. Direct Neo4j Integrations

LlamaIndex provides native Neo4j integrations:

  • Neo4jVectorStore: Vector store integration via llama-index-vector-stores-neo4jvector for semantic search over graph data with support for hybrid search, metadata filtering, and custom retrieval queries

  • Neo4j Python Driver: You can always use the Neo4j Python driver directly for executing Cypher queries within custom tools
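As a rough illustration of the vector store integration, the sketch below collects constructor arguments for Neo4jVectorStore and builds a VectorStoreIndex on top of it. The index name, node label, property names, and embedding dimension are assumptions made for this example and must match whatever actually exists in your database; the helper names neo4j_vector_store_kwargs and build_vector_index are hypothetical.

```python
from typing import Any


def neo4j_vector_store_kwargs(username: str, password: str) -> dict[str, Any]:
    """Collect constructor arguments for Neo4jVectorStore.

    All values below except the credentials are illustrative assumptions.
    """
    return {
        "username": username,
        "password": password,
        "url": "neo4j+s://demo.neo4jlabs.com",
        "database": "companies",
        "embedding_dimension": 1536,  # assumption: must match your embedding model
        "index_name": "news",  # assumption: name of an existing vector index
        "node_label": "Chunk",  # assumption: label of the indexed nodes
        "text_node_property": "text",
        "embedding_node_property": "embedding",
        "hybrid_search": False,  # True additionally needs a full-text index
    }


def build_vector_index(kwargs: dict[str, Any]):
    """Create a VectorStoreIndex backed by Neo4j (opens a database connection)."""
    # Imported lazily so the helper above stays usable without the packages.
    from llama_index.core import VectorStoreIndex
    from llama_index.vector_stores.neo4jvector import Neo4jVectorStore

    vector_store = Neo4jVectorStore(**kwargs)
    return VectorStoreIndex.from_vector_store(vector_store)
```

The returned index can then be turned into a query engine with as_query_engine() and wrapped with QueryEngineTool. Keeping the arguments in a plain dictionary makes it easy to switch databases between environments.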

3. Custom Tools/Functions

Define custom Neo4j tools using FunctionTool.from_defaults():

  • Implement functions that execute Cypher queries via the Neo4j Python driver

  • Wrap Neo4j vector stores as tools with QueryEngineTool

  • Combine MCP tools with custom tools in a single FunctionAgent
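The points above could be sketched roughly as follows: a plain Python function runs a parameterized Cypher query through the Neo4j Python driver, is wrapped with FunctionTool.from_defaults(), and is then passed to a FunctionAgent together with tools obtained from an MCP server. The graph schema in the query (Organization, Person, HAS_CEO), the tool name find_ceo, and the helper names make_find_ceo and build_agent are assumptions for illustration only.

```python
from typing import Any, Callable

# Assumption: the graph contains (:Organization)-[:HAS_CEO]->(:Person).
CEO_QUERY = """
MATCH (o:Organization)-[:HAS_CEO]->(p:Person)
WHERE toLower(o.name) CONTAINS toLower($name)
RETURN o.name AS organization, p.name AS ceo
LIMIT $limit
"""


def make_find_ceo(driver: Any, database: str = "companies") -> Callable[..., list]:
    """Bind a Neo4j driver to a function that an agent can call as a tool."""

    def find_ceo(name: str, limit: int = 5) -> list:
        """Look up organizations whose name contains `name` and return their CEOs."""
        # Query parameters keep user input out of the Cypher text itself.
        records, _summary, _keys = driver.execute_query(
            CEO_QUERY,
            parameters_={"name": name, "limit": limit},
            database_=database,
        )
        return [record.data() for record in records]

    return find_ceo


def build_agent(driver: Any, mcp_tools: list, llm: Any):
    """Combine MCP tools and the custom Cypher tool in one FunctionAgent."""
    # Imported lazily so make_find_ceo can be tested without LlamaIndex installed.
    from llama_index.core.agent.workflow import FunctionAgent
    from llama_index.core.tools import FunctionTool

    ceo_tool = FunctionTool.from_defaults(
        fn=make_find_ceo(driver),
        name="find_ceo",
        description="Find the CEO of an organization by full or partial name.",
    )
    return FunctionAgent(
        tools=[*mcp_tools, ceo_tool],
        llm=llm,
        system_prompt="You are a company research assistant backed by Neo4j.",
    )
```

Here driver would typically come from neo4j.GraphDatabase.driver(...), and the agent is run with await agent.run(...). Binding the driver through a closure keeps it out of the tool signature, so the LLM only sees the name and limit arguments.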

4. LlamaCloud Tools

Build knowledge graphs from documents using LlamaCloud services:

  • LlamaParse: Parse complex document formats (PDFs, presentations, etc.)

  • LlamaClassify: AI-powered document classification with custom rules

  • LlamaExtract: Extract structured data using Pydantic schemas

5. Text-to-Cypher and GraphRAG Retrieval

LlamaIndex provides TextToCypherRetriever and VectorContextRetriever for building GraphRAG agents that combine semantic search with natural language Cypher generation. Both retrievers work against a Neo4jPropertyGraphStore and can be composed in a single query engine exposed as an agent tool.

# Requires: pip install llama-index-graph-stores-neo4j
from llama_index.core import PropertyGraphIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import (
    TextToCypherRetriever,
    VectorContextRetriever,
)
from llama_index.core.tools import QueryEngineTool
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

graph_store = Neo4jPropertyGraphStore(
    username="companies",
    password="companies",
    url="neo4j+s://demo.neo4jlabs.com:7687",
    database="companies",
)

# Wrap the existing graph in an index; sub-retrievers are composed through
# the index, since the graph store itself has no as_retriever() method
index = PropertyGraphIndex.from_existing(property_graph_store=graph_store)

# Semantic search over article chunks linked to company nodes
# (uses Settings.embed_model unless embed_model is passed)
vector_retriever = VectorContextRetriever(
    graph_store,
    include_text=True,
    similarity_top_k=3,
)

# Natural language → Cypher for structured graph queries
# (uses Settings.llm unless llm is passed)
cypher_retriever = TextToCypherRetriever(graph_store)

# Combine into a query engine and wrap as an agent tool
query_engine = RetrieverQueryEngine.from_args(
    index.as_retriever(
        sub_retrievers=[vector_retriever, cypher_retriever]
    )
)

research_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="company_research",
    description=(
        "Search news and relationships in the companies knowledge graph. "
        "Use for questions about organizations, industries, leadership, and recent articles."
    ),
)

6. Neo4j Query Engine Tools

The llama-index-tools-neo4j package provides a Neo4jQueryToolSpec that creates ready-made query engines over a Neo4j graph. Available engine types include vector-based entity retrieval, keyword-based retrieval, hybrid retrieval, raw vector index retrieval, KnowledgeGraphQueryEngine, and KnowledgeGraphRAGRetriever. Each type is exposed as a callable tool that an agent can select at runtime.

pip install llama-index-tools-neo4j

MCP Authentication

Supported Mechanisms:

  • Environment Variables (STDIO transport): For local MCP servers, set environment variables before spawning the process. The BasicMCPClient can connect to local processes via stdio transport.

  • HTTP Headers (HTTP/SSE transport): For remote MCP servers, pass API keys or bearer tokens via the headers parameter (e.g., Authorization: Basic ${CREDENTIALS} or Authorization: Bearer ${API_TOKEN}).

  • OAuth 2.0 (in-client): The BasicMCPClient supports OAuth 2.0 authentication via the with_oauth() method with configurable token storage.
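For the stdio case, a minimal sketch might look like the following: the Neo4j settings are merged into a copy of the current environment and handed to BasicMCPClient, which spawns the server as a child process. The command name neo4j-mcp is an assumption about how the server binary is installed on your machine, and the helper names neo4j_server_env and connect_stdio are hypothetical.

```python
import os
from typing import Mapping, Optional


def neo4j_server_env(
    uri: str,
    username: str,
    password: str,
    database: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return an environment for the MCP server process.

    Starts from a copy of base_env (default: the current environment) so that
    PATH and similar variables remain available to the child process.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {
            "NEO4J_URI": uri,
            "NEO4J_USERNAME": username,
            "NEO4J_PASSWORD": password,
            "NEO4J_DATABASE": database,
        }
    )
    return env


def connect_stdio(env: dict, command: str = "neo4j-mcp"):
    """Create a tool spec for a local MCP server started over stdio.

    Assumption: `command` is the name of the server executable on PATH.
    """
    # Imported lazily so neo4j_server_env works without llama-index-tools-mcp.
    from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

    client = BasicMCPClient(command, args=[], env=env)
    return McpToolSpec(client=client)
```

Passing the credentials through env keeps them out of the command line, and the tools can then be listed with to_tool_list_async() on the returned spec.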

Configuration Example (HTTP transport):

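import os
import base64
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

# Settings read by the Neo4j MCP server. Setting them here only takes effect
# if the server is launched from this process; for a server that is already
# running, export them in the server's own environment instead.
os.environ["NEO4J_URI"] = "neo4j+s://demo.neo4jlabs.com"
os.environ["NEO4J_DATABASE"] = "companies"
os.environ["NEO4J_MCP_TRANSPORT"] = "http"

# Credentials are passed via HTTP headers as Basic auth.
# NEO4J_USERNAME and NEO4J_PASSWORD must already be set in the environment.
credentials = base64.b64encode(
    f"{os.environ['NEO4J_USERNAME']}:{os.environ['NEO4J_PASSWORD']}".encode()
).decode()

mcp_client = BasicMCPClient(
    "http://localhost:80/mcp",
    headers={"Authorization": f"Basic {credentials}"},
)

mcp_tool_spec = McpToolSpec(client=mcp_client)
# Top-level await works in a notebook; in a script, wrap this call in an
# async function and run it with asyncio.run()
tools = await mcp_tool_spec.to_tool_list_async()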