Final Year Project — 2024/2025

AI Legal Chain
Resolver II

A Sinhala legal question answering system combining intent-aware retrieval, hybrid search, and grounded response generation with citations back to Sri Lankan commercial law.

Flask Application · Gemini Intent + Answer Gen · FAISS + BM25 Hybrid Retrieval · Cross-Encoder Re-ranking · Sinhala Legal QA · RAG Pipeline

Project Summary

An end-to-end RAG pipeline for legal assistance in Sinhala. A user query flows through intent classification and hybrid retrieval of candidate legal chunks from the indexed corpus; a cross-encoder re-ranks the candidates, and the system generates a structured Sinhala answer with citations and downloadable source PDFs — making legal knowledge accessible to Sri Lankan citizens in their native language.

01

Abstract

Overview

General-purpose language models are unreliable for legal queries in low-resource languages like Sinhala — they hallucinate sections, misquote provisions, and provide no source traceability. This project grounds answer generation entirely on retrieved legal context from Sri Lankan commercial law, initially focused on the Consumer Affairs Authority Act No. 9 of 2003. Gemini handles intent classification and answer generation in Sinhala, while a hybrid pipeline combining FAISS dense retrieval, BM25 lexical search, and cross-encoder re-ranking handles candidate selection — exposed via a Flask API with structured JSON responses, citation grouping, and in-browser PDF download. The result is an auditable, citation-grounded legal assistant for everyday citizens.

03

Methodology

STEP 01

Query Input

User submits a Sinhala legal question via the Flask API or web frontend

STEP 02

Intent Classification

Gemini classifies intent: section lookup, hybrid search, title search, or non-legal fallback

STEP 03

Hybrid Retrieval

FAISS dense + BM25 lexical retrieval runs in parallel; results are merged and deduplicated
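The merge-and-deduplicate step can be sketched as follows — a minimal pure-Python version, assuming each candidate carries a unique chunk `id` (the field name is illustrative):

```python
def merge_candidates(faiss_hits, bm25_hits):
    """Merge dense and lexical candidate lists, dropping duplicates.

    Each hit is assumed to be a dict with a unique chunk "id".
    FAISS results are kept first; unseen BM25 results follow.
    """
    merged, seen = [], set()
    for hit in faiss_hits + bm25_hits:
        if hit["id"] not in seen:
            seen.add(hit["id"])
            merged.append(hit)
    return merged

dense = [{"id": "s9_1"}, {"id": "s12_2"}]
lexical = [{"id": "s12_2"}, {"id": "s3_1"}]
candidates = merge_candidates(dense, lexical)
# three unique chunks remain: s9_1, s12_2, s3_1
```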

STEP 04

Re-rank + Generate

Cross-encoder re-ranks top-k; Gemini generates a Sinhala answer citing only retrieved context

Data Preparation & Chunking

Acts are parsed from PDFs and cleaned. Section and subsection boundaries are detected using regex patterns, then recursive character splitting creates retrieval units that preserve legal structure — ensuring no section spans multiple unrelated chunks.
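The section-boundary detection can be illustrated with a simplified regex pass — a sketch only, since the real pipeline also handles subsections, Sinhala numerals, and the subsequent recursive character split:

```python
import re

# Matches section headings such as "9. " or "32A. " at the start of a line.
SECTION_RE = re.compile(r"(?m)^(\d+[A-Z]?)\.\s")

def split_into_sections(act_text):
    """Split cleaned act text at section-number boundaries so each
    retrieval unit covers exactly one section."""
    starts = [m.start() for m in SECTION_RE.finditer(act_text)]
    if not starts:
        return [act_text]
    bounds = starts + [len(act_text)]
    return [act_text[bounds[i]:bounds[i + 1]].strip()
            for i in range(len(starts))]

sample = "1. Short title.\nThis Act may be cited...\n2. Establishment.\nThere shall be..."
sections = split_into_sections(sample)
# two chunks, one per section
```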

Index Construction

build_faiss.py converts chunks into LangChain documents and stores them in a FAISS index with rich metadata: act name, section number, section title, and source file path for PDF download.
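The per-chunk record build_faiss.py produces can be sketched with a plain-dict stand-in for a LangChain Document — field names and the example values below are illustrative, not the script's exact schema:

```python
def to_document(chunk_text, act, section_no, section_title, source_path):
    """Pair chunk text with the metadata attached at index time, so any
    retrieved chunk can be cited and its source PDF served for download.
    (Plain-dict stand-in for a LangChain Document; fields assumed.)"""
    return {
        "page_content": chunk_text,
        "metadata": {
            "act": act,
            "section": section_no,
            "title": section_title,
            "source": source_path,
        },
    }

# Illustrative values only.
doc = to_document(
    "Any aggrieved consumer may make a complaint...",
    act="Consumer Affairs Authority Act No. 9 of 2003",
    section_no="32",
    section_title="Complaints",
    source_path="data/caa_act_9_2003.pdf",
)
```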

Intent-Aware Retrieval Routing

The intent classifier routes each query to the most appropriate retrieval path — section number lookup, full hybrid FAISS/BM25 search, section title search, or a polite non-legal fallback when the query is outside scope.
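The routing itself reduces to a dispatch over the four intent labels — a sketch in which the intent string would come from Gemini in the real system, and the handlers are hypothetical stand-ins:

```python
def route_query(query, intent, handlers):
    """Dispatch a classified query to its retrieval path; unknown
    intents fall through to the polite non-legal fallback."""
    fallback = handlers["non_legal"]
    return handlers.get(intent, fallback)(query)

# Stand-in handlers for the four routes described above.
handlers = {
    "section_lookup": lambda q: f"lookup:{q}",
    "hybrid_search":  lambda q: f"hybrid:{q}",
    "title_search":   lambda q: f"title:{q}",
    "non_legal":      lambda q: "polite out-of-scope reply",
}
route_query("section 32?", "section_lookup", handlers)
```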

Re-ranking & Answer Generation

After merging and deduplicating candidates, a cross-encoder scores each chunk jointly with the query. Gemini then produces a structured Sinhala answer that cites only the top-ranked retrieved context, keeping generation grounded in what was actually retrieved rather than in the model's parametric memory.
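The re-ranking step can be sketched with the scoring function injected — in the real pipeline the scorer is a Sentence-Transformers cross-encoder predicting over (query, chunk) pairs; here a toy token-overlap scorer stands in so the ranking logic is runnable:

```python
def rerank(query, chunks, score_fn, top_k=3):
    """Score each candidate chunk jointly with the query and keep the
    top_k highest-scoring chunks (descending order)."""
    scored = [(score_fn(query, c), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

# Toy scorer: shared-token count stands in for the cross-encoder score.
overlap = lambda q, c: len(set(q.split()) & set(c.split()))
top = rerank(
    "consumer complaint authority",
    ["consumer complaint procedure", "company registration", "authority powers"],
    overlap, top_k=2,
)
```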

Fig 1 — High-level system architecture of the AI Legal Chain Resolver II pipeline
04

Experiment Setup & Implementation

Application Layer

app.py exposes three endpoints: synchronous answer generation, streaming answers for large responses, and PDF download by source. The frontend renders structured answer cards, inline citation badges with section numbers, and a PDF download flow — all over standard REST.
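The citation grouping behind those badges can be sketched as follows — the response schema and field names are assumptions for illustration, not the exact app.py contract:

```python
def build_response(answer_si, chunks):
    """Assemble the structured JSON payload: the Sinhala answer plus
    citation entries deduplicated by (act, section), each carrying the
    source path used by the PDF download flow."""
    citations = {}
    for c in chunks:
        key = (c["act"], c["section"])
        citations.setdefault(key, {"act": c["act"],
                                   "section": c["section"],
                                   "pdf": c["source"]})
    return {"answer": answer_si, "citations": list(citations.values())}

resp = build_response(
    "answer text in Sinhala...",
    [{"act": "CAA Act", "section": "32", "source": "caa.pdf"},
     {"act": "CAA Act", "section": "32", "source": "caa.pdf"},
     {"act": "CAA Act", "section": "13", "source": "caa.pdf"}],
)
# two citation badges: sections 32 and 13
```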

Core Orchestration Agent

collab_agent.py orchestrates the full retrieval pipeline, optionally expands full sections when a summary chunk is top-ranked, and logs per-query performance metrics to retrieval_log.jsonl for offline evaluation.
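The per-query logging can be sketched as an append-only JSONL writer — the record fields below are illustrative, not the exact schema of retrieval_log.jsonl:

```python
import json
import time

def log_query(path, query, intent, latency_ms, top_ids):
    """Append one per-query metrics record as a JSON line, so offline
    evaluation can stream the log without loading it whole."""
    record = {"ts": time.time(), "query": query, "intent": intent,
              "latency_ms": latency_ms, "top_ids": top_ids}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```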

Evaluation Methodology

Two benchmarks were used: an MCQ benchmark testing whether the correct section appears in the top-k retrieved chunks (hit rate), and a short-answer XLSX evaluation scoring the generated Sinhala answer against reference answers by section coverage and accuracy.
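The MCQ hit-rate metric reduces to a simple top-k membership check — a sketch assuming each benchmark item pairs a gold section id with the ranked list of retrieved chunk ids:

```python
def hit_rate(results, k=5):
    """Fraction of benchmark questions whose gold section id appears
    among the top-k retrieved chunk ids."""
    hits = sum(1 for gold, retrieved in results if gold in retrieved[:k])
    return hits / len(results)

bench = [("s32", ["s32", "s13"]),
         ("s9",  ["s4", "s9"]),
         ("s5",  ["s1", "s2"])]
hit_rate(bench, k=2)  # 2 of 3 questions hit in the top-2
```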

Tech Stack

Python · Flask · LangChain · FAISS · BM25 (rank_bm25) · Sentence-Transformers (cross-encoder) · Google Gemini API · PDFMiner · Pandas · Vanilla JS frontend. Deployable as a single-process Flask app with pre-built FAISS indices.

05

Results & Analysis

Hybrid Retrieval — 98% hit rate on MCQ benchmark
BM25 Alone — 92% hit rate on MCQ benchmark
FAISS Alone — 74% hit rate on MCQ benchmark
XLSX Evaluation — 62% short-answer hit rate
Fig 2 — Retrieval benchmark comparison: FAISS vs BM25 vs Hybrid
Fig 3 — End-to-end evaluation overview
🔑 Key Observation

Lexical matching (BM25) substantially outperforms dense-only retrieval on rigid legal phrasing — legal text contains precise terminology that dense embeddings may paraphrase away. This validates the hybrid design decision.

⚡ Practical Impact

Hybrid + re-ranking maximises recall while filtering noise. In legal QA, missing the correct section is costlier than over-retrieving — making the 98% hybrid hit rate the most operationally significant result.

06

Team

Dr. Damayanthi Herath — Supervisor

Ms. Yasodha Vimukthi — Supervisor

T.L.B Mapagedara — Team Member

R.J Yogesh — Team Member