# Documentation for local deployments
## Prerequisites
You will need a Neo4j database, version 5.18 or later, with APOC installed to use this Knowledge Graph Builder. You can use any Neo4j Aura database (including the free tier). Neo4j Aura automatically includes APOC and runs on the latest Neo4j version, making it a great choice to get started quickly.

You can also use the free trial in Neo4j Sandbox, which also includes Graph Data Science.

If you want to use Neo4j Desktop instead, you need to set `NEO4J_URI=bolt://host.docker.internal` to allow the Docker container to reach the database running on your computer.
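For example, a minimal connection block in your .env might look like this (a sketch only; the port and credentials are placeholders to adjust for your setup):

```env
# Connection to a local Neo4j Desktop instance from inside Docker
# (placeholder credentials -- replace with your own)
NEO4J_URI="bolt://host.docker.internal:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="your-password"
```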
## Docker-compose
By default, only OpenAI and Diffbot are enabled, since Gemini requires extra GCP configuration.

In your root folder, create a .env file with your OpenAI and Diffbot keys (if you want to use both):

```env
OPENAI_API_KEY="your-openai-key"
DIFFBOT_API_KEY="your-diffbot-key"
```
If you only want OpenAI:

```env
LLM_MODELS="gpt-3.5,gpt-4o"
OPENAI_API_KEY="your-openai-key"
```
If you only want Diffbot:

```env
LLM_MODELS="diffbot"
DIFFBOT_API_KEY="your-diffbot-key"
```
You can then run Docker Compose to build and start all components:

```bash
docker-compose up --build
```
## Configuring LLM Models
You can configure the following LLM models besides the ones supported out of the box:

- OpenAI GPT-3.5 and GPT-4o (default)
- VertexAI (Gemini 1.0) (default)
- VertexAI (Gemini 1.5)
- Diffbot
- Bedrock models
- Anthropic
- OpenAI API compatible models like Ollama, Groq, Fireworks
To achieve that, you need to set a number of environment variables. In your .env file, add the following lines. You can of course also add other model configurations from these providers or any OpenAI API compatible provider.

```env
LLM_MODEL_CONFIG_azure_ai_gpt_35="gpt-35,https://<deployment>.openai.azure.com/,<api-key>,<version>"
LLM_MODEL_CONFIG_anthropic_claude_35_sonnet="claude-3-5-sonnet-20240620,<api-key>"
LLM_MODEL_CONFIG_fireworks_llama_v3_70b="accounts/fireworks/models/llama-v3-70b-instruct,<api-key>"
LLM_MODEL_CONFIG_bedrock_claude_35_sonnet="anthropic.claude-3-5-sonnet-20240620-v1:0,<api-key>,<region>"
LLM_MODEL_CONFIG_ollama_llama3="llama3,http://host.docker.internal:11434"
LLM_MODEL_CONFIG_fireworks_qwen_72b="accounts/fireworks/models/qwen2-72b-instruct,<api-key>"

# Optional Frontend config
VITE_LLM_MODELS="diffbot,gpt-3.5,gpt-4o,azure_ai_gpt_35,azure_ai_gpt_4o,groq_llama3_70b,anthropic_claude_35_sonnet,fireworks_llama_v3_70b,bedrock_claude_35_sonnet,ollama_llama3,fireworks_qwen_72b"
```
In your docker-compose.yml, you sadly need to pass the variables through:

```yaml
- LLM_MODEL_CONFIG_anthropic_claude_35_sonnet=${LLM_MODEL_CONFIG_anthropic_claude_35_sonnet-}
- LLM_MODEL_CONFIG_fireworks_llama_v3_70b=${LLM_MODEL_CONFIG_fireworks_llama_v3_70b-}
- LLM_MODEL_CONFIG_azure_ai_gpt_4o=${LLM_MODEL_CONFIG_azure_ai_gpt_4o-}
- LLM_MODEL_CONFIG_azure_ai_gpt_35=${LLM_MODEL_CONFIG_azure_ai_gpt_35-}
- LLM_MODEL_CONFIG_groq_llama3_70b=${LLM_MODEL_CONFIG_groq_llama3_70b-}
- LLM_MODEL_CONFIG_bedrock_claude_35_sonnet=${LLM_MODEL_CONFIG_bedrock_claude_35_sonnet-}
- LLM_MODEL_CONFIG_fireworks_qwen_72b=${LLM_MODEL_CONFIG_fireworks_qwen_72b-}
- LLM_MODEL_CONFIG_ollama_llama3=${LLM_MODEL_CONFIG_ollama_llama3-}
```
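These entries belong in the `environment` section of the backend service. A minimal sketch of the placement, assuming the service is named `backend` (check the service name in your own docker-compose.yml):

```yaml
# Sketch only: service name and surrounding structure are assumptions,
# adjust to match your actual compose file
services:
  backend:
    environment:
      - LLM_MODEL_CONFIG_ollama_llama3=${LLM_MODEL_CONFIG_ollama_llama3-}
      # ...one line per additional LLM_MODEL_CONFIG_* variable, as listed above
```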
## Additional configs
By default, the input sources will be: Local files, YouTube, Wikipedia and AWS S3. This is the default config applied if you do not overwrite it in your .env file:

```env
VITE_SOURCES="local,youtube,wiki,s3"
```

If however you want the Google GCS integration, add `gcs` and your Google client ID:

```env
VITE_SOURCES="local,youtube,wiki,s3,gcs"
GOOGLE_CLIENT_ID="xxxx"
```

`VITE_SOURCES` should be a comma-separated list of the sources you want to enable. You can of course combine all (local, youtube, wiki, s3 and gcs) or remove any you don't want or need.
## Development (Separate Frontend and Backend)
Alternatively, you can run the backend and frontend separately:
- For the frontend:
  - Create the frontend/.env file by copying frontend/example.env.
  - Change values as needed (a minimal sketch follows these steps).
  - Run:

    ```bash
    cd frontend
    yarn
    yarn run dev
    ```
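A minimal frontend/.env sketch (the variable names follow the Front-End Configuration table below; the backend URL assumes uvicorn's default port 8000, and example.env remains the authoritative template):

```env
# Assumed values -- adjust to your setup
VITE_BACKEND_API_URL="http://localhost:8000"
VITE_SOURCES="local,youtube,wiki,s3"
VITE_LLM_MODELS="diffbot,gpt-3.5,gpt-4o"
```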
- For the backend:
  - Create the backend/.env file by copying backend/example.env.
  - Change values as needed (a minimal sketch follows these steps).
  - Run:

    ```bash
    cd backend
    python -m venv envName
    source envName/bin/activate
    pip install -r requirements.txt
    uvicorn score:app --reload
    ```
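A minimal backend/.env sketch (placeholder credentials; the variables are described in the tables below, and example.env remains the authoritative template):

```env
# Placeholder values -- replace with your own
NEO4J_URI="neo4j://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="your-password"
OPENAI_API_KEY="your-openai-key"
EMBEDDING_MODEL="all-MiniLM-L6-v2"
```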
## ENV

### Processing Configuration
| Env Variable Name | Mandatory/Optional | Default Value | Description |
|---|---|---|---|
| IS_EMBEDDING | Optional | true | Flag to enable text embedding for chunks |
| ENTITY_EMBEDDING | Optional | False | Flag to enable entity embedding (id and description) |
| KNN_MIN_SCORE | Optional | 0.94 | Minimum score for the KNN algorithm for connecting similar chunks |
| NUMBER_OF_CHUNKS_TO_COMBINE | Optional | 6 | Number of chunks to combine when extracting entities |
| UPDATE_GRAPH_CHUNKS_PROCESSED | Optional | 20 | Number of chunks processed before writing to the database and updating progress |
| ENV | Optional | DEV | Environment variable for the app |
| TIME_PER_CHUNK | Optional | 4 | Time per chunk for processing |
| CHUNK_SIZE | Optional | 5242880 | Size of each chunk for processing |
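These can be overridden in your .env file. For example (illustrative values only, using the variable names from the table above):

```env
# Tune chunk processing -- values shown are just illustrations
KNN_MIN_SCORE="0.94"
NUMBER_OF_CHUNKS_TO_COMBINE="6"
```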
### Front-End Configuration
| Env Variable Name | Mandatory/Optional | Default Value | Description |
|---|---|---|---|
| VITE_BACKEND_API_URL | Optional | http://localhost:8000 | URL for backend API |
| VITE_SOURCES | Optional | local,youtube,wiki,s3 | List of input sources that will be available |
| VITE_BLOOM_URL | Optional | | URL for Bloom visualization |
| VITE_LLM_MODELS | Mandatory | diffbot,gpt-3.5,gpt-4o | Comma separated list of LLM Model names to show in the selector |
### GCP Cloud Integration
| Env Variable Name | Mandatory/Optional | Default Value | Description |
|---|---|---|---|
| GEMINI_ENABLED | Optional | False | Flag to enable Gemini |
| GCP_LOG_METRICS_ENABLED | Optional | False | Flag to enable Google Cloud logs |
| GOOGLE_CLIENT_ID | Optional | | Client ID for Google authentication for GCS upload |
| GCS_FILE_CACHE | Optional | False | If set to True, saves the files to process into GCS; if set to False, saves them locally |
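For example, to enable the GCP integrations in your .env (a sketch; the variable names follow the table above and the client ID is a placeholder):

```env
# GCP integration sketch -- placeholder client ID
GEMINI_ENABLED="True"
GCP_LOG_METRICS_ENABLED="True"
GOOGLE_CLIENT_ID="your-google-client-id"
GCS_FILE_CACHE="True"
```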
### LLM Model Configuration
| Env Variable Name | Mandatory/Optional | Default Value | Description |
|---|---|---|---|
| LLM_MODELS | Optional | diffbot,gpt-3.5,gpt-4o | Models available for selection on the frontend, used for entity extraction and the Q&A chatbot |
| OPENAI_API_KEY | Optional | | API key for OpenAI (if enabled) |
| DIFFBOT_API_KEY | Optional | | API key for Diffbot (if enabled) |
| EMBEDDING_MODEL | Optional | all-MiniLM-L6-v2 | Model for generating the text embedding (all-MiniLM-L6-v2, openai, vertexai) |
| GROQ_API_KEY | Optional | | API key for Groq |
| GEMINI_ENABLED | Optional | False | Flag to enable Gemini |
| LLM_MODEL_CONFIG_<model_name> | Optional | | Configuration for additional LLM models |
### LangChain and Neo4j Configuration
| Env Variable Name | Mandatory/Optional | Default Value | Description |
|---|---|---|---|
| NEO4J_URI | Optional | neo4j://database:7687 | URI of the Neo4j database for the backend to connect to |
| NEO4J_USERNAME | Optional | neo4j | Username for the Neo4j database for the backend to connect to |
| NEO4J_PASSWORD | Optional | password | Password for the Neo4j database for the backend to connect to |
| LANGCHAIN_API_KEY | Optional | | API key for LangSmith |
| LANGCHAIN_PROJECT | Optional | | Project for LangSmith |
| LANGCHAIN_TRACING_V2 | Optional | false | Flag to enable LangSmith tracing |
| LANGCHAIN_ENDPOINT | Optional | https://api.smith.langchain.com | Endpoint for LangSmith API |
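For example, to enable LangSmith tracing in your .env (the endpoint is LangSmith's standard public API endpoint; the API key and project name are placeholders):

```env
# LangSmith tracing sketch -- placeholder key and project name
LANGCHAIN_TRACING_V2="true"
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_API_KEY="your-langsmith-key"
LANGCHAIN_PROJECT="knowledge-graph-builder"
```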