Final Year Project — 2024/2025

AI Legal Chain
Resolver II

A Sinhala legal question answering system combining intent-aware retrieval, hybrid search, and grounded response generation with citations back to Sri Lankan commercial law.

Flask Application · Gemini Intent + Answer Gen · FAISS + BM25 Hybrid Retrieval · Cross-Encoder Re-ranking · Sinhala Legal QA · RAG Pipeline

Project Summary

An end-to-end RAG pipeline for legal assistance in Sinhala. A user query flows through intent classification and hybrid retrieval of candidate legal chunks from the indexed corpus; a cross-encoder re-ranks the candidates, and the system generates a structured Sinhala answer with citations and downloadable source PDFs — making legal knowledge accessible to Sri Lankan citizens in their native language.

01

Abstract

Overview

General-purpose language models are unreliable for legal queries in low-resource languages like Sinhala — they hallucinate sections, misquote provisions, and provide no source traceability. This project grounds answer generation entirely on retrieved legal context from Sri Lankan commercial law, initially focused on the Consumer Affairs Authority Act No. 9 of 2003. Gemini handles intent classification and answer generation in Sinhala, while a hybrid pipeline combining FAISS dense retrieval, BM25 lexical search, and cross-encoder re-ranking handles candidate selection — exposed via a Flask API with structured JSON responses, citation grouping, and in-browser PDF download. The result is an auditable, citation-grounded legal assistant for everyday citizens.

03

Methodology

STEP 01

Query Input

User submits a Sinhala legal question via the Flask API or web frontend

STEP 02

Intent Classification

Gemini classifies intent: section lookup, hybrid search, title search, or non-legal fallback

STEP 03

Hybrid Retrieval

FAISS dense + BM25 lexical retrieval runs in parallel; results are merged and deduplicated
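The merge-and-deduplicate step can be sketched as follows — a minimal pure-Python version, assuming each candidate carries a unique chunk `id` (the field name is illustrative):

```python
def merge_candidates(faiss_hits, bm25_hits):
    """Merge dense and lexical candidate lists, dropping duplicates.

    Each hit is assumed to be a dict with a unique chunk "id".
    FAISS results are kept first; unseen BM25 results follow.
    """
    merged, seen = [], set()
    for hit in faiss_hits + bm25_hits:
        if hit["id"] not in seen:
            seen.add(hit["id"])
            merged.append(hit)
    return merged

dense = [{"id": "s9_1"}, {"id": "s12_2"}]
lexical = [{"id": "s12_2"}, {"id": "s3_1"}]
candidates = merge_candidates(dense, lexical)
# three unique chunks remain: s9_1, s12_2, s3_1
```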

STEP 04

Re-rank + Generate

Cross-encoder re-ranks top-k; Gemini generates a Sinhala answer citing only retrieved context

Data Preparation & Chunking

Acts are parsed from PDFs and cleaned. Section and subsection boundaries are detected using regex patterns, then recursive character splitting creates retrieval units that preserve legal structure — ensuring no section spans multiple unrelated chunks.
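The section-boundary detection can be illustrated with a simplified regex pass — a sketch only, since the real pipeline also handles subsections, Sinhala numerals, and the subsequent recursive character split:

```python
import re

# Matches section headings such as "9. " or "32A. " at the start of a line.
SECTION_RE = re.compile(r"(?m)^(\d+[A-Z]?)\.\s")

def split_into_sections(act_text):
    """Split cleaned act text at section-number boundaries so each
    retrieval unit covers exactly one section."""
    starts = [m.start() for m in SECTION_RE.finditer(act_text)]
    if not starts:
        return [act_text]
    bounds = starts + [len(act_text)]
    return [act_text[bounds[i]:bounds[i + 1]].strip()
            for i in range(len(starts))]

sample = "1. Short title.\nThis Act may be cited...\n2. Establishment.\nThere shall be..."
sections = split_into_sections(sample)
# two chunks, one per section
```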

Index Construction

build_faiss.py converts chunks into LangChain documents and stores them in a FAISS index with rich metadata: act name, section number, section title, and source file path for PDF download.
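The per-chunk record build_faiss.py produces can be sketched with a plain-dict stand-in for a LangChain Document — field names and the example values below are illustrative, not the script's exact schema:

```python
def to_document(chunk_text, act, section_no, section_title, source_path):
    """Pair chunk text with the metadata attached at index time, so any
    retrieved chunk can be cited and its source PDF served for download.
    (Plain-dict stand-in for a LangChain Document; fields assumed.)"""
    return {
        "page_content": chunk_text,
        "metadata": {
            "act": act,
            "section": section_no,
            "title": section_title,
            "source": source_path,
        },
    }

# Illustrative values only.
doc = to_document(
    "Any aggrieved consumer may make a complaint...",
    act="Consumer Affairs Authority Act No. 9 of 2003",
    section_no="32",
    section_title="Complaints",
    source_path="data/caa_act_9_2003.pdf",
)
```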

Intent-Aware Retrieval Routing

The intent classifier routes each query to the most appropriate retrieval path — section number lookup, full hybrid FAISS/BM25 search, section title search, or a polite non-legal fallback when the query is outside scope.
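The routing itself reduces to a dispatch over the four intent labels — a sketch in which the intent string would come from Gemini in the real system, and the handlers are hypothetical stand-ins:

```python
def route_query(query, intent, handlers):
    """Dispatch a classified query to its retrieval path; unknown
    intents fall through to the polite non-legal fallback."""
    fallback = handlers["non_legal"]
    return handlers.get(intent, fallback)(query)

# Stand-in handlers for the four routes described above.
handlers = {
    "section_lookup": lambda q: f"lookup:{q}",
    "hybrid_search":  lambda q: f"hybrid:{q}",
    "title_search":   lambda q: f"title:{q}",
    "non_legal":      lambda q: "polite out-of-scope reply",
}
route_query("section 32?", "section_lookup", handlers)
```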

Re-ranking & Answer Generation

After merging and deduplicating candidates, a cross-encoder scores each chunk jointly with the query. Gemini then produces a structured Sinhala answer that cites only the top-ranked retrieved context, keeping generation grounded in what was actually retrieved rather than in the model's parametric memory.
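The re-ranking step can be sketched with the scoring function injected — in the real pipeline the scorer is a Sentence-Transformers cross-encoder predicting over (query, chunk) pairs; here a toy token-overlap scorer stands in so the ranking logic is runnable:

```python
def rerank(query, chunks, score_fn, top_k=3):
    """Score each candidate chunk jointly with the query and keep the
    top_k highest-scoring chunks (descending order)."""
    scored = [(score_fn(query, c), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

# Toy scorer: shared-token count stands in for the cross-encoder score.
overlap = lambda q, c: len(set(q.split()) & set(c.split()))
top = rerank(
    "consumer complaint authority",
    ["consumer complaint procedure", "company registration", "authority powers"],
    overlap, top_k=2,
)
```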

Fig 1 — High-level system architecture of the AI Legal Chain Resolver II pipeline
04

Experiment Setup & Implementation

Application Layer

app.py exposes three endpoints: synchronous answer generation, streaming answers for large responses, and PDF download by source. The frontend renders structured answer cards, inline citation badges with section numbers, and a PDF download flow — all over standard REST.
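The citation grouping behind those badges can be sketched as follows — the response schema and field names are assumptions for illustration, not the exact app.py contract:

```python
def build_response(answer_si, chunks):
    """Assemble the structured JSON payload: the Sinhala answer plus
    citation entries deduplicated by (act, section), each carrying the
    source path used by the PDF download flow."""
    citations = {}
    for c in chunks:
        key = (c["act"], c["section"])
        citations.setdefault(key, {"act": c["act"],
                                   "section": c["section"],
                                   "pdf": c["source"]})
    return {"answer": answer_si, "citations": list(citations.values())}

resp = build_response(
    "answer text in Sinhala...",
    [{"act": "CAA Act", "section": "32", "source": "caa.pdf"},
     {"act": "CAA Act", "section": "32", "source": "caa.pdf"},
     {"act": "CAA Act", "section": "13", "source": "caa.pdf"}],
)
# two citation badges: sections 32 and 13
```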

Core Orchestration Agent

collab_agent.py orchestrates the full retrieval pipeline, optionally expands full sections when a summary chunk is top-ranked, and logs per-query performance metrics to retrieval_log.jsonl for offline evaluation.
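The per-query logging can be sketched as an append-only JSONL writer — the record fields below are illustrative, not the exact schema of retrieval_log.jsonl:

```python
import json
import time

def log_query(path, query, intent, latency_ms, top_ids):
    """Append one per-query metrics record as a JSON line, so offline
    evaluation can stream the log without loading it whole."""
    record = {"ts": time.time(), "query": query, "intent": intent,
              "latency_ms": latency_ms, "top_ids": top_ids}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```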

Evaluation Methodology

Two benchmarks were used: an MCQ benchmark testing whether the correct section appears in the top-k retrieved chunks (hit rate), and a short-answer XLSX evaluation scoring the generated Sinhala answer against reference answers by section coverage and accuracy.
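The MCQ hit-rate metric reduces to a simple top-k membership check — a sketch assuming each benchmark item pairs a gold section id with the ranked list of retrieved chunk ids:

```python
def hit_rate(results, k=5):
    """Fraction of benchmark questions whose gold section id appears
    among the top-k retrieved chunk ids."""
    hits = sum(1 for gold, retrieved in results if gold in retrieved[:k])
    return hits / len(results)

bench = [("s32", ["s32", "s13"]),
         ("s9",  ["s4", "s9"]),
         ("s5",  ["s1", "s2"])]
hit_rate(bench, k=2)  # 2 of 3 questions hit in the top-2
```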

Tech Stack

Python · Flask · LangChain · FAISS · BM25 (rank_bm25) · Sentence-Transformers (cross-encoder) · Google Gemini API · PDFMiner · Pandas · Vanilla JS frontend. Deployable as a single-process Flask app with pre-built FAISS indices.

05

Results & Analysis

Hybrid Retrieval — 98% hit rate on MCQ benchmark
BM25 Alone — 92% hit rate on MCQ benchmark
FAISS Alone — 74% hit rate on MCQ benchmark
XLSX Evaluation — 62% short-answer hit rate
Fig 2 — Retrieval benchmark comparison: FAISS vs BM25 vs Hybrid
Fig 3 — End-to-end evaluation overview
🔑 Key Observation

Lexical matching (BM25) substantially outperforms dense-only retrieval on rigid legal phrasing — legal text contains precise terminology that dense embeddings may paraphrase away. This validates the hybrid design decision.

⚡ Practical Impact

Hybrid + re-ranking maximises recall while filtering noise. In legal QA, missing the correct section is costlier than over-retrieving — making the 98% hybrid hit rate the most operationally significant result.

06

Team

Dr. Damayanthi Herath — Supervisor

Ms. Yasodha Vimukthi — Supervisor

T.L.B Mapagedara — Team Member

R.J Yogesh — Team Member