Abstract
General-purpose language models are unreliable for legal queries in low-resource languages like Sinhala — they hallucinate sections, misquote provisions, and provide no source traceability. This project grounds answer generation entirely on retrieved legal context from Sri Lankan commercial law. Gemini handles intent classification and answer generation in Sinhala, while a hybrid pipeline combining a Knowledge Graph, FAISS dense retrieval, BM25 lexical search, and cross-encoder re-ranking handles candidate selection — exposed via a Flask API with structured JSON responses, citation grouping, and in-browser PDF download. The result is an auditable, citation-grounded legal assistant for everyday citizens.
Methodology
Query Input
User submits a Sinhala legal question via the Flask API or web frontend
Intent Classification
Gemini classifies intent: section lookup, hybrid search, title search, or non-legal fallback
Hybrid Retrieval
FAISS dense + BM25 lexical retrieval runs in parallel; results are merged and deduplicated
Re-rank + Generate
Cross-encoder re-ranks top-k; Gemini generates a Sinhala answer citing only retrieved context
Data Preparation & Chunking
Acts are parsed from PDFs and cleaned. Section and subsection boundaries are detected using regex patterns, then recursive character splitting creates retrieval units that preserve legal structure.
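The chunking step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the section-number regex and the 400-character limit are assumptions, and the real pipeline uses LangChain's recursive character splitter rather than the simple fixed-width split shown here.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: assumes Acts number sections as "12." at line start.
SECTION_RE = re.compile(r"^(\d+)\.\s", re.MULTILINE)

@dataclass
class Chunk:
    section: str
    text: str

def split_act(raw: str, max_chars: int = 400) -> list[Chunk]:
    """Detect section boundaries, then split each section into
    size-bounded chunks so no retrieval unit crosses a section."""
    chunks = []
    matches = list(SECTION_RE.finditer(raw))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        body = raw[m.start():end].strip()
        # Split within the section only, preserving legal structure
        for j in range(0, len(body), max_chars):
            chunks.append(Chunk(section=m.group(1), text=body[j:j + max_chars]))
    return chunks
```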
Index Construction
build_faiss.py converts chunks into LangChain documents and stores them in a FAISS index with rich metadata: act name, section number, section title, and source file path.
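A stand-in sketch of what build_faiss.py stores: each chunk is paired with its metadata, and embeddings are searched by inner product. The real script uses LangChain documents in a FAISS index; NumPy is used here only to keep the illustration self-contained, and the dict field names are assumptions.

```python
import numpy as np

def build_index(chunks):
    """Pair each chunk with the metadata build_faiss.py records
    (act name, section number, section title, source file path)."""
    docs = [{"page_content": c["text"],
             "metadata": {"act": c["act"], "section": c["section"],
                          "title": c["title"], "source": c["path"]}}
            for c in chunks]
    vecs = np.array([c["embedding"] for c in chunks], dtype="float32")
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # normalise for cosine search
    return docs, vecs

def dense_search(docs, vecs, query_vec, k=3):
    """Inner-product search over normalised vectors, as FAISS IndexFlatIP does."""
    q = np.asarray(query_vec, dtype="float32")
    scores = vecs @ (q / np.linalg.norm(q))
    return [(float(scores[i]), docs[i]["metadata"]) for i in np.argsort(-scores)[:k]]
```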
Intent-Aware Retrieval Routing
The intent classifier routes each query to the most appropriate retrieval path — section number lookup, full hybrid FAISS/BM25 search, section title search, or a polite non-legal fallback.
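The routing logic can be sketched as a dispatch table keyed on the classifier's label, with unknown intents falling through to the polite refusal. The handler names and return shapes below are illustrative stand-ins for the real retrieval paths.

```python
def make_router(retrievers):
    """Map each classified intent to its retrieval path; unknown or
    non-legal intents fall back to a polite refusal."""
    def route(intent, query):
        handler = retrievers.get(intent, retrievers["non_legal"])
        return handler(query)
    return route

# Hypothetical handlers standing in for the real retrieval paths
router = make_router({
    "section_lookup": lambda q: ("section_lookup", q),
    "hybrid_search":  lambda q: ("hybrid_search", q),
    "title_search":   lambda q: ("title_search", q),
    "non_legal":      lambda q: ("fallback",
                                 "This assistant answers Sri Lankan legal questions only."),
})
```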
Re-ranking & Answer Generation
After merging and deduplicating candidates, a cross-encoder scores each chunk against the query jointly. Gemini then produces a structured Sinhala answer citing only the top-ranked retrieved context.
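A minimal sketch of the merge, deduplication, and re-ranking steps. The actual system scores query-chunk pairs with a sentence-transformers cross-encoder; `overlap_score` below is a toy stand-in so the example stays self-contained, and deduplicating on `(act, section)` is an assumption about the candidate key.

```python
def rerank(query, candidates, score_fn, k=5):
    """Deduplicate merged FAISS/BM25 candidates by (act, section),
    then keep the top-k by joint query-chunk score."""
    seen, unique = set(), []
    for c in candidates:
        key = (c["act"], c["section"])
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return sorted(unique, key=lambda c: score_fn(query, c["text"]), reverse=True)[:k]

def overlap_score(query, text):
    """Toy stand-in for the cross-encoder: shared-token count."""
    return len(set(query.lower().split()) & set(text.lower().split()))
```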
Experiment Setup & Implementation
Application Layer
app.py exposes three endpoints: synchronous answer generation, streaming answers for large responses, and PDF download by source. The frontend renders structured answer cards, inline citation badges, and a PDF download flow.
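The structured response the frontend renders might be assembled as below. This is a sketch of the payload shape only — the field names and the `/download?source=` URL scheme are assumptions, not the project's confirmed API contract.

```python
def build_payload(answer, chunks):
    """Assemble the structured JSON for the frontend: the answer card
    plus citation badges grouped by Act, each with a PDF download link."""
    citations = {}
    for c in chunks:
        citations.setdefault(c["act"], []).append({
            "section": c["section"],
            "title": c["title"],
            "pdf": f"/download?source={c['source']}",
        })
    return {"answer": answer, "citations": citations}
```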
Core Orchestration Agent
collab_agent.py orchestrates the full retrieval pipeline, optionally expands full sections when a summary chunk is top-ranked, and logs per-query performance metrics to retrieval_log.jsonl.
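The per-query logging to retrieval_log.jsonl can be sketched as one JSON object appended per line. The exact fields recorded are assumptions; the JSONL append pattern is the standard approach.

```python
import json
import time

def log_query(path, query, intent, latency_ms, top_sections):
    """Append one JSON record per query (hypothetical field names)."""
    record = {"ts": time.time(), "query": query, "intent": intent,
              "latency_ms": latency_ms, "top_sections": top_sections}
    # ensure_ascii=False keeps Sinhala query text readable in the log
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```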
Evaluation Methodology
Two benchmarks were used: an MCQ benchmark testing whether the correct section appears in the top-k retrieved chunks (hit rate), and a short-answer XLSX evaluation scoring the generated Sinhala answer against reference answers by section coverage and accuracy.
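The MCQ hit-rate metric reduces to a simple computation: for each question, check whether the gold section id appears among the top-k retrieved section ids. A minimal sketch:

```python
def hit_rate(retrieved, gold, k):
    """Fraction of questions whose gold section id appears
    among the top-k retrieved section ids."""
    hits = sum(1 for qid, secs in retrieved.items() if gold[qid] in secs[:k])
    return hits / len(gold)
```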
Tech Stack
Python · Flask · LangChain · FAISS · BM25 (rank_bm25) · Sentence-Transformers (cross-encoder) · Google Gemini API · PDFMiner · Pandas · Vanilla JS frontend. Deployable as a single-process Flask app with pre-built FAISS indices.
Results & Analysis
Embedding Model Evaluation
| Metric | Score |
|---|---|
| MRR@10 | 0.8604 |
| Recall@1 | 0.7576 |
| Recall@3 | 0.9576 |
| Recall@5 | 0.9879 |
| Recall@10 | 1.0000 |
KG + RAG vs RAG Only
| System Configuration | Total | Correct | Accuracy |
|---|---|---|---|
| KG + RAG (Full System) | 50 | 48 | 96% |
| RAG Only (Baseline) | 50 | 44 | 88% |
Comparative Analysis With Existing Systems
| System | MCQ Accuracy | Short Answer Score | Citation Accuracy (MCQ) |
|---|---|---|---|
| Our System | 96% | 4.8 | 96% |
| ChatGPT | 76% | 4.6 | 20% |
| Gemini 3 | 96% | 4.8 | 90% |
| NotebookLM | 96% | 4.9 | 92% |
| LankaLaw | 74.4% | 4.7 | 55.8% |
Lexical matching (BM25) substantially outperforms dense-only retrieval on rigid legal phrasing — legal text contains precise terminology that dense embeddings may paraphrase away. This validates the hybrid design decision.
Hybrid + re-ranking maximises recall while filtering noise. In legal QA, missing the correct section is costlier than over-retrieving — making the 98% hybrid hit rate the most operationally significant result.
System Demo
Impact & Limitations
Impact of the Research
- Improves access to legal information in Sinhala
- Helps citizens and small businesses understand commercial law
- Provides a structured legal knowledge representation
- Serves as a foundation for future legal AI systems
Limitations
- Limited legal coverage
- Limited to Sinhala-language queries
- Depends on available legal datasets (Acts and Gazettes)
- Response time: the multi-stage retrieval and re-ranking pipeline adds latency
Team
Supervisors
- Dr. Damayanthi Herath
- Ms. Yasodha Vimukthi
Team Members
- T.L.B Mapagedara
- R.J Yogesh