
A GraphRAG-powered chatbot that builds a Neo4j knowledge graph from documents, including text, images, and tables, and uses LLM retrieval to answer questions with rich, structured context.


yammdd/GraphRAG-Chatbot


GraphRAG-Augmented Vietnamese Legal Chatbot

A Graph-enhanced Retrieval-Augmented Generation system for grounded, explainable
and multi-hop legal reasoning on Vietnamese law texts


✨ Key Features

  • GraphRAG Architecture

    • Hybrid retrieval: Vector Search + Knowledge Graph traversal
    • Neo4j used as a true hybrid database (graph + vectors)
  • Rule Engine & Domain Classification

    • Smart Routing: Classifies queries as Criminal, Civil, or Both using regex heuristics
    • Entity Extraction: Pre-computes Articles (Điều), Penalties, and Monetary values to aid Graph traversal
    • Context-Aware: Dynamically injects the correct legal code (BLHS vs BLDS) context into the LLM
  • Legal-Aware Processing

    • Hierarchical chunking tailored to Vietnamese legal structure (Chương → Điều)
    • Schema-driven entity & relation extraction
    • Strong emphasis on grounding and hallucination avoidance
  • Explainability First

    • Graph-enriched context injected into the LLM
    • Optional graph visualization endpoint
    • Designed to explain why an answer is given, not just what the answer is
  • Dockerized & Reproducible

    • One-command setup
    • Local-first, cloud-optional (AuraDB supported)
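
The Smart Routing and Entity Extraction bullets above can be illustrated with a small sketch. This is not the project's actual rule engine: the keyword patterns, the function names `classify_domain` and `extract_articles`, and the fallback to "both" when nothing matches are all assumptions made for illustration.

```python
import re

# Hypothetical keyword patterns; the project's real heuristics may differ.
CRIMINAL_PATTERN = re.compile(r"hình sự|BLHS|phạt tù|tội", re.IGNORECASE)
CIVIL_PATTERN = re.compile(r"dân sự|BLDS|hợp đồng|thừa kế|bồi thường", re.IGNORECASE)
# Matches article references such as "Điều 173".
ARTICLE_PATTERN = re.compile(r"Điều\s+(\d+)", re.IGNORECASE)


def classify_domain(query: str) -> str:
    """Route a query to 'criminal', 'civil', or 'both' using regex heuristics."""
    criminal = bool(CRIMINAL_PATTERN.search(query))
    civil = bool(CIVIL_PATTERN.search(query))
    if criminal and not civil:
        return "criminal"
    if civil and not criminal:
        return "civil"
    # Either both domains matched or neither did; assumed fallback is to search both codes.
    return "both"


def extract_articles(query: str) -> list[int]:
    """Return the article (Điều) numbers mentioned in the query, in order of appearance."""
    return [int(number) for number in ARTICLE_PATTERN.findall(query)]
```

The routing result could then decide whether BLHS context, BLDS context, or both are passed to the LLM, while the extracted article numbers could seed the graph traversal.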

📦 Tech Stack

  • Backend: Python, Flask, LangChain
  • Database: Neo4j (AuraDB or local)
  • LLMs: Gemini 2.5 Flash Lite (Q/A + extraction)
  • Reranking: Cohere Rerank v3.5
  • Embeddings: Vietnamese law-specific embedding model (DEk21_hcmute_embedding)
  • Deployment: Docker & Docker Compose

🤔 How to Use

1. Create the .env File

Create a .env file in the project root and add the following environment variables:

NEO4J_URI=NULL # replace NULL with your Neo4j URI (local instance or AuraDB)
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password # change as needed
NEO4J_DATABASE=neo4j

GOOGLE_API_KEY=your-google-api-key-here # get an API key at https://aistudio.google.com/app/api-keys
COHERE_API_KEY=your-cohere-api-key-here # get an API key at https://dashboard.cohere.com/api-keys

VECTOR_INDEX_NAME=chunk_embedding
FULLTEXT_INDEX_NAME=entity_text
ALLOWED_ORIGINS=*

Replace the placeholders with your actual keys.
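
Before starting the containers, it can help to check that no placeholder was left in place. The following is a minimal sketch, not part of the project: the helper name `find_unset_settings` and the choice of which variables count as required are assumptions based on the `.env` template above.

```python
import os

# Variables from the .env template that are assumed to be mandatory.
REQUIRED_KEYS = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "GOOGLE_API_KEY",
    "COHERE_API_KEY",
)
# Values that indicate the template was not filled in.
PLACEHOLDERS = {"", "NULL", "your-google-api-key-here", "your-cohere-api-key-here"}


def find_unset_settings(env=None):
    """Return the required variable names that are missing or still placeholders."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if env.get(key, "").strip() in PLACEHOLDERS]


if __name__ == "__main__":
    unset = find_unset_settings()
    if unset:
        print("Still unset:", ", ".join(unset))
```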


2. Download Embedding Model

If you want to use a local (offline) embedding model, download it from the following link:
👉 https://huggingface.co/huyydangg/DEk21_hcmute_embedding

After downloading, set the configuration as follows:

LOCAL_MODEL_PATH = "/app/models/DEk21_hcmute_embedding"
# True  → Use Local (Offline) Embedding
# False → Use Google Embedding
USE_LOCAL_EMBEDDING = True
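
How the two settings might interact can be sketched as follows. This is an assumption about the intended behavior, not the project's code: the function name `choose_embedding_backend` and the decision to fail early when the local model directory is missing are hypothetical.

```python
from pathlib import Path


def choose_embedding_backend(use_local: bool, local_model_path: str) -> str:
    """Return 'local' or 'google' depending on the configuration flags."""
    if not use_local:
        return "google"
    # Fail early if the downloaded model is not where the config says it is.
    if not Path(local_model_path).is_dir():
        raise FileNotFoundError(f"Local embedding model not found at {local_model_path}")
    return "local"
```

Checking the path up front surfaces a missing download at startup rather than on the first query.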

3. Build and Start Docker

Ensure Docker is installed on your system, then run:

docker-compose up -d --build

This will build and start all required services.


4. Usage Guide

After Docker finishes building, wait a short moment for services to initialize.

You can view container logs using:

docker logs <container_name_or_id>

Once everything is ready, open your browser and navigate to:

http://localhost

to start interacting with the chatbot.
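
Instead of guessing how long initialization takes, the wait can be automated. The sketch below polls the URL above with the standard library only; the helper name `wait_until_ready` and the timeout values are illustrative assumptions, and it treats any successful HTTP response as "ready".

```python
import http.client
import time
import urllib.request


def wait_until_ready(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers successfully or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # urlopen raises on connection errors and on 4xx/5xx responses,
            # so reaching the return means the service answered successfully.
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except (OSError, http.client.HTTPException):
            pass  # service not up yet; retry after a short pause
        time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_until_ready("http://localhost"):
        print("Chatbot is ready")
    else:
        print("Timed out; check the container logs")
```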


⚠ Intended Use & Disclaimer

  • This is not a production-ready legal system

  • This is a research / academic project

  • Designed to study:

    • Hallucination control

    • Legal grounding

    • Graph-augmented reasoning

  • Human-in-the-loop is mandatory for any real legal use.

  • If your chatbot confidently invents laws, this project exists to prove why that’s unacceptable.


👥 Contributors


Thanh Dan Bui
Project Manager

Nguyen Dan Vu
Backend Developer

Tien Dung Pham
Frontend Developer
