From c54c893eacd45eee2ae892a2ec5e98902d6cf961 Mon Sep 17 00:00:00 2001
From: Kshitij <160704796+kshitij-ka@users.noreply.github.com>
Date: Sun, 3 May 2026 22:23:23 +0530
Subject: [PATCH] docs: update README.

---
 README.md | 507 ++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 439 insertions(+), 68 deletions(-)

diff --git a/README.md b/README.md
index 98b3d0a..8358402 100644
--- a/README.md
+++ b/README.md
@@ -1,102 +1,473 @@
-# SpecForge
+# SpecForge — BIS Standards Recommendation Engine
 
-A web application for querying BIS SP-21 building material standards with semantic search and AI-powered explanations.
+> **BIS × Sigma Squad AI Hackathon** | Track: AI / Retrieval Augmented Generation (RAG)
+>
+> An end-to-end RAG system that turns plain-language product descriptions into accurate BIS standard recommendations in milliseconds — helping Indian MSEs find compliance requirements in seconds instead of weeks.
 
 ---
 
-## Features
+## Public Test Set Results
 
-- **PDF Parser**: Extracts 573 unique standards from the BIS SP-21 document (929 pages, 25 material categories)
-- **Hybrid Retrieval**: FAISS dense vectors + BM25 sparse index for accurate matching
-- **AI Explanations**: Groq LLM generates natural language explanations for recommendations
-- **Gallery UI**: Photography-first interface with alternating light/dark sections
+> Evaluated on the 10 provided public queries. Judges run: `python inference.py --input <hidden_dataset>.json --output team_results.json`
 
-## Tech Stack
+| Metric | Target | **Our Score** |
+|---|---|---|
+| Hit Rate @3 | > 80% | **100%** (10/10) |
+| MRR @5 | > 0.7 | **0.783** |
+| Avg Latency | < 5 s | **~19 ms** |
 
-| Layer | Technology |
-|-------|------------|
-| PDF Processing | Python, PyMuPDF |
-| Retrieval | FAISS, BM25 |
-| LLM | Groq (llama-3.1-8b-instant) |
-| Backend | Node.js, Express |
-| Frontend | React 19, Vite 8, React Router |
+All 10 public queries returned the expected standard in the top 3 results. Average query latency is about 19 ms once the index is warm, roughly 260× faster than the 5 s target.
 
-## Getting Started
+---
 
-### Prerequisites
+## What It Does
 
-- Node.js 18+
-- Python 3.10+
+Indian Micro and Small Enterprises (MSEs) spend weeks manually searching BIS SP-21 to identify which standards apply to their products. SpecForge replaces that manual search with a single plain-language query.
 
-### Installation
+1. **Describe your product** in plain language — e.g. *"We manufacture 33 Grade Ordinary Portland Cement"*
+2. **Get ranked BIS standards** with matched sections and relevance scores in milliseconds
+3. **Read AI explanations** of why each standard applies, generated by Groq LLM
 
-```bash
-# Install Python dependencies
-pip install -r requirements.txt
+The system covers all **573 unique standards** across **25 building material categories** from BIS SP-21 (Summaries of Indian Standards for Building Materials).
 
-# Install web dependencies
-cd web/server && npm install
-cd web/client && npm install
+---
+
+## System Architecture
+
+### Data Flow
+
+```
+data/raw/dataset.pdf  (BIS SP-21, 929 pages)
+  → src/parse_bis_pdf.py
+  → data/processed/standards.json          573 structured records  [committed]
+  → data/processed/standards_chunks.json   1,261 RAG-ready chunks  [committed]
+  → inference.py --build
+  → data/processed/embeddings.npy          dense vectors           [gitignored — rebuild locally]
+  → data/processed/faiss.index             FAISS index             [gitignored — rebuild locally]
 ```
 
-### Running the Application
+### Request Pipeline
 
-**All platforms:**
-```bash
-cd web && npm run dev
+```
+Browser / API Client
+  → POST /api/recommend  { query, top_n, rewrite }
+  → Express server (web/server/index.js)
+      ├─ [optional] llmService.rewriteQuery()        Groq — expands to IS-standard vocabulary
+      ├─ retrieverService.retrieve()
+      │     └─ PythonRetriever singleton              EventEmitter, queues concurrent requests
+      │           └─ bridge/retrieve.py daemon        stdin/stdout newline-delimited JSON
+      │                 └─ inference.py               FAISS 0.6 + BM25 0.4 → re-rank → top-N
+      └─ llmService.generateExplanation() × N        Promise.allSettled — parallel, non-blocking
+  → JSON { standards[], latency: { retrieval_ms, llm_ms, total_ms } }
 ```
 
-**Windows:**
-```bash
-npm run dev
-```
+### Chunking & Retrieval Strategy
 
-**Manual start:**
-```bash
-# Terminal 1: Python retrieval index
-cd web/server && node bridge/retrieve.py --build-index
+**Chunking** (`src/parse_bis_pdf.py`):
+- 2-pass boundary detection splits the 929-page PDF into per-standard records
+- Each standard is further split by section with **50-word overlap** to prevent context loss at boundaries
+- Weak chunks (<30 words) are merged with their neighbour
+- Result: 1,261 chunks from 573 standards (avg 2.2 chunks/standard)
 
-# Terminal 2: Backend
-cd web/server && npm start
+**Hybrid Retrieval** (`inference.py`):
+- **Dense**: FAISS `IndexFlatIP` with `all-MiniLM-L6-v2` embeddings (384-dim; inner product equals cosine similarity when the vectors are L2-normalised)
+- **Sparse**: BM25Okapi with weighted document construction — title ×4, keywords ×3, section ×2, body ×1
+- **Fusion**: `score = 0.6 × dense_norm + 0.4 × sparse_norm`
 
-# Terminal 3: Frontend
-cd web/client && npm run dev
-```
+**Re-ranking** bonuses applied per candidate:
+- +0.05 per overlapping keyword (max 4) between query and standard's keyword list
+- +0.05 per overlapping title word (max 5)
+- +0.25 if ≥60% of significant title words appear in the query (strong title match)
+- +0.20 if an exact IS ID from the query matches this standard
+- -0.15 penalty for very short chunks (<40 body words)
 
-## API Endpoints
+**Deduplication**: candidates grouped by `standard_id`; only the best-scoring chunk per standard survives. Final output is top-N unique IS standards.
 
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| POST | `/api/recommend` | Get recommended standards with AI explanations |
-| POST | `/api/ask` | Ask questions about a specific standard |
-| GET | `/api/standards` | List all standards |
-| GET | `/api/search?q=query` | Search standards by keyword |
+### Key Design Decisions
+
+| Decision | Rationale |
+|---|---|
+| Persistent Python daemon | Loading the FAISS index takes ~18 s from a cold start, so the daemon is spawned once at boot and all requests are queued through that single process. No query pays the cold-start cost. |
+| `inference.py` never modified | Bridge pattern: `bridge/retrieve.py` imports `inference.py` as a module. Judges run `inference.py` directly; the web server uses the bridge. Both paths run the same retrieval code. |
+| In-memory data | 573 standards + 1,261 chunks fit comfortably in RAM. No database dependency, no I/O per request. |
+| LLM fallbacks everywhere | Every Groq call is wrapped with a timeout (8 s) and a safe default return. Parallel calls use `Promise.allSettled`. The server starts and retrieval works without a `GROQ_API_KEY`. |
+| Weighted BM25 document | Repeating title tokens ×4 makes exact IS-standard name queries dominant over body-text noise — critical for the BIS domain where standard names are precise. |
+
+---
 
 ## Project Structure
 
 ```
 SpecForge/
-├── data/
-│   ├── raw/dataset.pdf           # Source BIS SP-21 PDF
-│   └── processed/                 # Generated outputs
-│       ├── standards.json       # 573 parsed standards
-│       └── standards_chunks.json # 1,261 RAG chunks
-├── src/
-│   └── parse_bis_pdf.py         # PDF parser pipeline
+├── inference.py                         # Entry point for judges — do not modify
+├── requirements.txt                     # All Python dependencies
 ├── scripts/
-│   └── eval_script.py          # Evaluation metrics
-├── web/
-│   ├── client/                 # React + Vite frontend
-│   └── server/                 # Express backend
-│       ├── services/            # LLM & retrieval services
-│       └── bridge/             # Node→Python bridge
-└── requirements.txt           # Python dependencies
+│   └── eval_script.py                   # Provided evaluation script (Hit@3, MRR@5, latency)
+├── data/
+│   └── processed/
+│       ├── standards.json               # 573 parsed standards (committed)
+│       ├── standards_chunks.json        # 1,261 RAG chunks (committed)
+│       ├── public_test_set.json         # 10 public evaluation queries
+│       └── retrieval_results.json       # Our results on public test set
+├── src/
+│   └── parse_bis_pdf.py                 # PDF → JSON parsing pipeline
+└── web/
+    ├── server/
+    │   ├── index.js                     # Express API — all routes
+    │   ├── start.js                     # Safe launcher (kills stale port process)
+    │   ├── .env.example                 # Environment template
+    │   ├── bridge/
+    │   │   └── retrieve.py              # Daemon wrapping inference.py for the web server
+    │   └── services/
+    │       ├── llmService.js            # Groq wrappers with fallbacks
+    │       └── retrieverService.js      # PythonRetriever — daemon lifecycle manager
+    └── client/
+        └── src/
+            ├── App.jsx                  # React router (5 pages)
+            ├── api/standards.js         # Typed fetch wrappers
+            ├── pages/                   # Home, Standards, Categories, Recommend, About
+            ├── components/              # Navbar, Footer, StandardCard, StandardModal
+            └── locales/                 # en/ and hi/ (English + Hindi i18n)
 ```
 
-## Configuration
+---
 
-- **GROQ_API_KEY**: Set in `web/server/.env` (gitignored)
-- **Server port**: 5000
-- **Client dev port**: 5173
+## External APIs & Data Sources
+
+All sources disclosed per hackathon transparency requirements.
+
+| Source | Purpose | Key required? | Notes |
+|---|---|---|---|
+| **BIS SP-21** (Bureau of Indian Standards, Special Publication 21) | Source dataset — 929-page PDF of building material standard summaries | No | Provided by organisers; processed JSON committed to repo |
+| **HuggingFace `all-MiniLM-L6-v2`** | 384-dimension sentence embedding model for FAISS dense retrieval | No | Downloaded automatically by `sentence-transformers` on first `--build` (~90 MB) |
+| **Groq API** (`llama-3.1-8b-instant`) | Query rewriting, per-result explanation, conversational QA | Yes — `GROQ_API_KEY` | Free tier sufficient. Groq chosen for sub-second inference latency. Retrieval works without this key. |
+
+No other external APIs, databases, or paid services are used.
 
 ---
+
+## Environment Dependencies
+
+### System Requirements
+
+| Dependency | Minimum | Notes |
+|---|---|---|
+| Python | 3.10 | For retrieval pipeline and `inference.py` |
+| Node.js | 18 | For Express server and React client |
+| npm | 9 | Bundled with recent Node 18 releases; run `npm install -g npm@9` if yours is older |
+| `fuser` | any | Linux — used by `start.js` to clear stale port; install via `psmisc` if missing |
+
+### Hardware
+
+- **CPU**: Any x86-64 or ARM64 — no GPU required
+- **RAM**: 2 GB minimum; index + embeddings use ~500 MB
+- **GPU**: Optional — a CUDA GPU reduces index build time but `faiss-cpu` and `sentence-transformers` run fully on CPU
+- **Disk**: ~1 GB free for venv and generated index files
+
+---
+
+## Setup & Running
+
+### Step 1 — Clone
+
+```bash
+git clone https://github.com/kshitij-ka/SpecForge
+cd SpecForge
+```
+
+### Step 2 — Python virtual environment
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate        # Windows: .venv\Scripts\activate
+
+pip install --upgrade pip
+pip install -r requirements.txt
+```
+
+`requirements.txt`:
+```
+pymupdf>=1.24.0
+faiss-cpu>=1.7.4
+rank-bm25>=0.2.2
+sentence-transformers>=3.0.0
+numpy>=1.26.0
+```
+
+> `sentence-transformers` downloads `all-MiniLM-L6-v2` (~90 MB) from HuggingFace on first use.
+
+### Step 3 — Build the FAISS index
+
+The processed JSON is committed. Index files are gitignored and must be built once locally.
+
+```bash
+source .venv/bin/activate
+python inference.py --build
+```
+
+This encodes 1,261 chunks and writes `embeddings.npy` and `faiss.index` to `data/processed/`. It takes **~2 min on CPU**. Subsequent starts load from cache, so no rebuild is needed unless the chunks change.
+
+### Step 4 — Node.js dependencies
+
+```bash
+cd web/server && npm install
+cd ../client && npm install
+```
+
+### Step 5 — Environment variables
+
+```bash
+cp web/server/.env.example web/server/.env
+```
+
+Edit `web/server/.env`:
+
+```env
+# Required for LLM explanations, query rewriting, and /api/chat
+GROQ_API_KEY=your_groq_api_key_here
+
+# Optional — defaults to 5000
+PORT=5000
+
+# Required if "python" is not Python 3 — point to your venv
+PYTHON_BIN=/path/to/SpecForge/.venv/bin/python3
+```
+
+> `PYTHON_BIN` accepts only `"python"`, `"python3"`, or an absolute path. The server validates and rejects arbitrary values on startup.
+
+### Step 6 — Start the application
+
+**Terminal 1 — API server (port 5000):**
+```bash
+cd web/server
+npm start
+```
+Wait for the log line `Python retriever ready` (~20 s on first boot). The server accepts queries once that line appears.
+
+**Terminal 2 — Frontend dev server (port 5173):**
+```bash
+cd web/client
+npm run dev
+```
+
+Open **http://localhost:5173**. The Vite dev server proxies all `/api/*` requests to `:5000`.
+
+---
+
+## Using `inference.py` (Judge Entry Point)
+
+`inference.py` is the mandatory entry point. It runs independently of the web server.
+
+> Always activate the virtual environment first: `source .venv/bin/activate`
+
+### Build / force-rebuild the index
+
+```bash
+python inference.py --build
+```
+
+### Single query (interactive testing)
+
+```bash
+python inference.py --query "Which standard covers 33 grade OPC cement?"
+```
+
+Output:
+```
+============================================================
+Query : Which standard covers 33 grade OPC cement?
+Latency: 0.019s
+
+Top results:
+  1. IS 269: 1989 — Ordinary Portland Cement, 33 Grade
+     Category: Cement and Concrete  |  Section: Scope  |  Score: 0.8921
+  2. IS 8112: 1989 — 43 Grade Ordinary Portland Cement
+     ...
+```
+
+### Batch evaluation (judge command)
+
+```bash
+python inference.py \
+  --input  data/processed/public_test_set.json \
+  --output data/processed/retrieval_results.json
+```
+
+Input format:
+```json
+[
+  {
+    "id": "PUB-01",
+    "query": "We are a small enterprise manufacturing 33 Grade OPC...",
+    "expected_standards": ["IS 269: 1989"]
+  }
+]
+```
+
+Output format:
+```json
+[
+  {
+    "id": "PUB-01",
+    "query": "...",
+    "retrieved_standards": ["IS 8112: 1989", "IS 269: 1989", "..."],
+    "details": [
+      {
+        "standard_id": "IS 269: 1989",
+        "title": "Ordinary Portland Cement, 33 Grade",
+        "category": "Cement and Concrete",
+        "score": 0.8921,
+        "matched_section": "Scope"
+      }
+    ],
+    "latency_seconds": 0.019,
+    "expected_standards": ["IS 269: 1989"]
+  }
+]
+```
+
+## Evaluation
+
+```bash
+# Step 1: generate results
+python inference.py \
+  --input  data/processed/public_test_set.json \
+  --output data/processed/retrieval_results.json
+
+# Step 2: score
+python scripts/eval_script.py \
+  --results data/processed/retrieval_results.json
+```
+
+Targets and our results on the public set:
+
+| Metric | Formula | Target | Achieved |
+|---|---|---|---|
+| Hit Rate @3 | queries with an expected standard in the top 3 / total queries | > 80% | **100%** |
+| MRR @5 | Σ(1/rank_i) / N | > 0.7 | **0.783** |
+| Avg Latency | total_time / num_queries | < 5 s | **~0.019 s** |
+
+---
+
+## API Reference
+
+All endpoints are served by the Express server (default `http://localhost:5000`).
+
+### `POST /api/recommend`
+
+Core RAG endpoint. Retrieval + optional LLM explanations.
+
+```jsonc
+// Request
+{ "query": "fire resistance for brick masonry", "top_n": 5, "rewrite": false }
+
+// Response
+{
+  "standards": [
+    {
+      "standard_id": "IS 1905: 1987",
+      "title": "Code of Practice for Structural Use of Unreinforced Masonry",
+      "category": "Masonry",
+      "score": 0.812,
+      "matched_section": "Fire Resistance",
+      "explanation": "This standard specifies..."
+    }
+  ],
+  "latency": { "retrieval_ms": 19, "llm_ms": 820, "total_ms": 839 }
+}
+```
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `query` | string | required | Natural-language product description or compliance question |
+| `top_n` | integer | 5 | Results to return (1–10) |
+| `rewrite` | boolean | `false` | Expand query to IS-standard vocabulary via LLM before retrieval |
+
+Rate limit: 20 req/min.
+
+### `POST /api/ask`
+
+Chunk-grounded QA for a specific standard.
+
+```json
+{ "standard_id": "IS 1905: 1987", "question": "What is the minimum wall thickness?" }
+```
+
+### `POST /api/chat`
+
+Conversational QA over the standards corpus. Requires `GROQ_API_KEY`; returns `503` if absent.
+
+```json
+{ "message": "What grades of Portland cement does BIS cover?" }
+```
+
+### `GET /api/standards`
+
+Paginated list. Query params: `q` (keyword search), `category`, `page` (default 1), `limit` (default 20, max 100).
+
+### `GET /api/standards/:id`
+
+Returns a single standard. `:id` is the URL-encoded IS ID, e.g. `IS%20269%3A%201989`.
+
+### `GET /api/categories`
+
+All 25 material categories sorted alphabetically.
+
+### `GET /api/stats`
+
+```json
+{ "standards": 573, "chunks": 1261, "categories": 25 }
+```
+
+---
+
+## Features
+
+| Feature | Description |
+|---|---|
+| **Hybrid RAG retrieval** | FAISS (dense, 60%) + BM25 (sparse, 40%) fused and re-ranked |
+| **Re-ranking** | Keyword overlap, title match, exact IS-ID match, short-chunk penalty |
+| **AI explanations** | Groq `llama-3.1-8b-instant` — parallel, fallback-safe |
+| **Query rewriting** | LLM expands natural language to IS-standard vocabulary (optional) |
+| **Chunk-grounded QA** | Question answered from the most relevant chunk of a specific standard |
+| **Conversational chat** | Open-ended QA against the full corpus |
+| **Browse & filter** | Paginated standards list with keyword scoring; category gallery |
+| **Persistent daemon** | Python retrieval process spawned once at boot; auto-restarts on crash |
+| **Internationalisation** | UI in English and Hindi (i18next + react-i18next) |
+| **Rate limiting** | 60 req/min global, 20 req/min on LLM endpoints (express-rate-limit); Helmet sets security headers |
+| **Production-ready API** | Input validation, sanitisation, structured JSON logging, latency breakdown |
+
+---
+
+## Tech Stack
+
+| Layer | Technology |
+|---|---|
+| Embedding model | `all-MiniLM-L6-v2` via `sentence-transformers` |
+| Dense index | FAISS `IndexFlatIP` (cosine via inner product) |
+| Sparse index | BM25Okapi (`rank-bm25`) |
+| PDF parsing | PyMuPDF |
+| LLM | Groq API (`llama-3.1-8b-instant`) |
+| Backend | Node.js 18 + Express 5 |
+| Security middleware | Helmet, CORS, express-rate-limit |
+| Frontend | React 19, Vite 8, React Router 7 |
+| Internationalisation | i18next, react-i18next, i18next-browser-languagedetector |
+
+---
+
+## Troubleshooting
+
+| Symptom | Likely cause | Fix |
+|---|---|---|
+| `PYTHON_BIN validation failed` on start | Invalid `PYTHON_BIN` | Set to `python`, `python3`, or absolute venv path |
+| `ModuleNotFoundError: No module named 'faiss'` | Wrong Python binary (system Python instead of venv) | Set `PYTHON_BIN=/path/to/.venv/bin/python3` in `.env` |
+| `Python daemon boot timeout` (90 s) | Index files missing | Run `python inference.py --build` with venv active |
+| Results return but no `explanation` field | `GROQ_API_KEY` absent or invalid | Set key in `.env`; retrieval still works, explanations fall back silently |
+| `fuser: command not found` on Linux | `psmisc` not installed | `sudo apt install psmisc` / `sudo dnf install psmisc` |
+| Port 5000 still in use after crash | `fuser` not available | Manually: `kill $(lsof -t -i:5000)` |
+
+---
+
+## License
+
+See [LICENSE](LICENSE).