From c54c893eacd45eee2ae892a2ec5e98902d6cf961 Mon Sep 17 00:00:00 2001
From: Kshitij <160704796+kshitij-ka@users.noreply.github.com>
Date: Sun, 3 May 2026 22:23:23 +0530
Subject: [PATCH] docs: update README.

---
 README.md | 507 ++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 439 insertions(+), 68 deletions(-)

diff --git a/README.md b/README.md
index 98b3d0a..8358402 100644
--- a/README.md
+++ b/README.md
@@ -1,102 +1,473 @@
-# SpecForge
+# SpecForge — BIS Standards Recommendation Engine
 
-A web application for querying BIS SP-21 building material standards with semantic search and AI-powered explanations.
+> **BIS × Sigma Squad AI Hackathon** | Track: AI / Retrieval Augmented Generation (RAG)
+>
+> An end-to-end RAG system that turns plain-language product descriptions into ranked BIS standard recommendations in milliseconds, helping Indian MSEs identify their compliance requirements in seconds instead of weeks.
 
 ---
 
-## Features
+## Public Test Set Results
 
-- **PDF Parser**: Extracts 573 unique standards from the BIS SP-21 document (929 pages, 25 material categories)
-- **Hybrid Retrieval**: FAISS dense vectors + BM25 sparse index for accurate matching
-- **AI Explanations**: Groq LLM generates natural language explanations for recommendations
-- **Gallery UI**: Photography-first interface with alternating light/dark sections
+> Evaluated on the 10 provided public queries. Judges run: `python inference.py --input .json --output team_results.json`
 
-## Tech Stack
+| Metric | Target | **Our Score** |
+|---|---|---|
+| Hit Rate @3 | > 80% | **100%** (10/10) |
+| MRR @5 | > 0.7 | **0.783** |
+| Avg Latency | < 5 s | **~19 ms** |
 
-| Layer | Technology |
-|-------|------------|
-| PDF Processing | Python, PyMuPDF |
-| Retrieval | FAISS, BM25 |
-| LLM | Groq (llama-3.1-8b-instant) |
-| Backend | Node.js, Express |
-| Frontend | React 19, Vite 8, React Router |
+All 10 public queries returned the expected standard in the top-3 results. Average query latency is 19 ms once the index is warm, more than 250× below the 5 s target.
 
-## Getting Started
+---
 
-### Prerequisites
+## What It Does
 
-- Node.js 18+
-- Python 3.10+
+Indian Micro and Small Enterprises (MSEs) spend weeks manually searching BIS SP-21 to identify which standards apply to their products. SpecForge automates that search:
 
-### Installation
+1. **Describe your product** in plain language, e.g. *"We manufacture 33 Grade Ordinary Portland Cement"*
+2. **Get ranked BIS standards** with matched sections and relevance scores in milliseconds
+3. **Read AI explanations** of why each standard applies, generated by an LLM served through Groq
 
-```bash
-# Install Python dependencies
-pip install -r requirements.txt
+The system covers all **573 unique standards** across **25 building material categories** in BIS SP-21 (Summaries of Indian Standards for Building Materials).
 
-# Install web dependencies
-cd web/server && npm install
-cd web/client && npm install
+---
+
+## System Architecture
+
+### Data Flow
+
+```
+data/raw/dataset.pdf (BIS SP-21, 929 pages)
+  → src/parse_bis_pdf.py
+      → data/processed/standards.json          573 structured records   [committed]
+      → data/processed/standards_chunks.json   1,261 RAG-ready chunks   [committed]
+          → inference.py --build
+              → data/processed/embeddings.npy  dense vectors            [gitignored — rebuild locally]
+              → data/processed/faiss.index     FAISS index              [gitignored — rebuild locally]
 ```
 
-### Running the Application
+### Request Pipeline
 
-**All platforms:**
-```bash
-cd web && npm run dev
+```
+Browser / API Client
+  → POST /api/recommend { query, top_n, rewrite }
+  → Express server (web/server/index.js)
+      ├─ [optional] llmService.rewriteQuery()      Groq — expands to IS-standard vocabulary
+      ├─ retrieverService.retrieve()
+      │    └─ PythonRetriever singleton            EventEmitter, queues concurrent requests
+      │         └─ bridge/retrieve.py daemon       stdin/stdout newline-delimited JSON
+      │              └─ inference.py               FAISS 0.6 + BM25 0.4 → re-rank → top-N
+      └─ llmService.generateExplanation() × N      Promise.allSettled — parallel, non-blocking
+  → JSON { standards[], latency: { retrieval_ms, llm_ms, total_ms } }
 ```
 
-**Windows:**
-```bash
-npm run dev
-```
+### Chunking & Retrieval Strategy
 
-**Manual start:**
-```bash
-# Terminal 1: Python retrieval index
-cd web/server && node bridge/retrieve.py --build-index
+**Chunking** (`src/parse_bis_pdf.py`):
+- 2-pass boundary detection splits the 929-page PDF into per-standard records
+- Each standard is further split by section, with a **50-word overlap** to prevent context loss at boundaries
+- Weak chunks (<30 words) are merged with their neighbour
+- Result: 1,261 chunks from 573 standards (avg 2.2 chunks/standard)
 
-# Terminal 2: Backend
-cd web/server && npm start
+**Hybrid Retrieval** (`inference.py`):
+- **Dense**: FAISS `IndexFlatIP` over 384-dim `all-MiniLM-L6-v2` embeddings (cosine similarity via inner product)
+- **Sparse**: BM25Okapi over a weighted document: title ×4, keywords ×3, section ×2, body ×1
+- **Fusion**: `score = 0.6 × dense_norm + 0.4 × sparse_norm`
 
-# Terminal 3: Frontend
-cd web/client && npm run dev
-```
+**Re-ranking** bonuses applied per candidate:
+- +0.05 per keyword shared by the query and the standard's keyword list (at most 4 counted, i.e. +0.20)
+- +0.05 per title word present in the query (at most 5 counted, i.e. +0.25)
+- +0.25 if ≥60% of significant title words appear in the query (strong title match)
+- +0.20 if an exact IS ID in the query matches this standard
+- -0.15 penalty for very short chunks (<40 body words)
 
-## API Endpoints
+**Deduplication**: candidates are grouped by `standard_id`, and only the best-scoring chunk per standard survives. The final output is the top-N unique IS standards.
 
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| POST | `/api/recommend` | Get recommended standards with AI explanations |
-| POST | `/api/ask` | Ask questions about a specific standard |
-| GET | `/api/standards` | List all standards |
-| GET | `/api/search?q=query` | Search standards by keyword |
+### Key Design Decisions
+
+| Decision | Rationale |
+|---|---|
+| Persistent Python daemon | Loading the FAISS index takes ~18 s cold. The daemon is spawned once at boot and all requests are queued through that single process, so no query pays the cold-start cost. |
+| `inference.py` never modified | Bridge pattern: `bridge/retrieve.py` imports `inference.py` as a module. Judges run `inference.py` directly; the web server goes through the bridge. Both paths run the same retrieval code. |
+| In-memory data | 573 standards + 1,261 chunks fit comfortably in RAM, so there is no database dependency and no disk I/O per request. |
+| LLM fallbacks everywhere | Every Groq call is wrapped with an 8 s timeout and a safe default return value, and parallel calls use `Promise.allSettled`. The server starts, and retrieval works, without a `GROQ_API_KEY`. |
+| Weighted BM25 document | Repeating title tokens ×4 lets queries that use an exact standard name outrank body-text noise, which is critical in the BIS domain where standard names are precise. |
+
+---
 
 ## Project Structure
 
 ```
 SpecForge/
-├── data/
-│   ├── raw/dataset.pdf              # Source BIS SP-21 PDF
-│   └── processed/                   # Generated outputs
-│       ├── standards.json           # 573 parsed standards
-│       └── standards_chunks.json    # 1,261 RAG chunks
-├── src/
-│   └── parse_bis_pdf.py             # PDF parser pipeline
+├── inference.py                     # Entry point for judges — do not modify
+├── requirements.txt                 # All Python dependencies
 ├── scripts/
-│   └── eval_script.py               # Evaluation metrics
-├── web/
-│   ├── client/                      # React + Vite frontend
-│   └── server/                      # Express backend
-│       ├── services/                # LLM & retrieval services
-│       └── bridge/                  # Node→Python bridge
-└── requirements.txt                 # Python dependencies
+│   └── eval_script.py               # Provided evaluation script (Hit@3, MRR@5, latency)
+├── data/
+│   └── processed/
+│       ├── standards.json           # 573 parsed standards (committed)
+│       ├── standards_chunks.json    # 1,261 RAG chunks (committed)
+│       ├── public_test_set.json     # 10 public evaluation queries
+│       └── retrieval_results.json   # Our results on the public test set
+├── src/
+│   └── parse_bis_pdf.py             # PDF → JSON parsing pipeline
+└── web/
+    ├── server/
+    │   ├── index.js                 # Express API — all routes
+    │   ├── start.js                 # Safe launcher (kills stale port process)
+    │   ├── .env.example             # Environment template
+    │   ├── bridge/
+    │   │   └── retrieve.py          # Daemon wrapping inference.py for the web server
+    │   └── services/
+    │       ├── llmService.js        # Groq wrappers with fallbacks
+    │       └── retrieverService.js  # PythonRetriever — daemon lifecycle manager
+    └── client/
+        └── src/
+            ├── App.jsx              # React router (5 pages)
+            ├── api/standards.js     # Typed fetch wrappers
+            ├── pages/               # Home, Standards, Categories, Recommend, About
+            ├── components/          # Navbar, Footer, StandardCard, StandardModal
+            └── locales/             # en/ and hi/ (English + Hindi i18n)
 ```
 
-## Configuration
+---
 
-- **GROQ_API_KEY**: Set in `web/server/.env` (gitignored)
-- **Server port**: 5000
-- **Client dev port**: 5173
+## External APIs & Data Sources
+
+All sources are disclosed per the hackathon's transparency requirements.
+
+| Source | Purpose | Key required? | Notes |
+|---|---|---|---|
+| **BIS SP-21** (Bureau of Indian Standards, Special Publication 21) | Source dataset: 929-page PDF of building material standard summaries | No | Provided by the organisers; processed JSON committed to the repo |
+| **HuggingFace `all-MiniLM-L6-v2`** | 384-dimension sentence embedding model for FAISS dense retrieval | No | Downloaded automatically by `sentence-transformers` on the first `--build` (~90 MB) |
+| **Groq API** (`llama-3.1-8b-instant`) | Query rewriting, per-result explanations, conversational QA | Yes (`GROQ_API_KEY`) | The free tier is sufficient. Groq was chosen for its sub-second inference latency. Retrieval works without this key. |
+
+No other external APIs, databases, or paid services are used.
 
 ---
+
+## Environment Dependencies
+
+### System Requirements
+
+| Dependency | Minimum | Notes |
+|---|---|---|
+| Python | 3.10 | For the retrieval pipeline and `inference.py` |
+| Node.js | 18 | For the Express server and React client |
+| npm | 9 | Bundled with recent Node 18 releases |
+| `fuser` | any | Linux only: used by `start.js` to clear a stale port; install via `psmisc` if missing |
+
+### Hardware
+
+- **CPU**: Any x86-64 or ARM64; no GPU required
+- **RAM**: 2 GB minimum; the running retrieval process (embedding model + index) uses ~500 MB
+- **GPU**: Optional. A CUDA GPU shortens index build time, but `faiss-cpu` and `sentence-transformers` run fully on CPU
+- **Disk**: ~1 GB free for the venv and generated index files
+
+---
+
+## Setup & Running
+
+### Step 1 — Clone
+
+```bash
+git clone https://github.com/kshitij-ka/SpecForge
+cd SpecForge
+```
+
+### Step 2 — Python virtual environment
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate        # Windows: .venv\Scripts\activate
+
+pip install --upgrade pip
+pip install -r requirements.txt
+```
+
+`requirements.txt`:
+```
+pymupdf>=1.24.0
+faiss-cpu>=1.7.4
+rank-bm25>=0.2.2
+sentence-transformers>=3.0.0
+numpy>=1.26.0
+```
+
+> `sentence-transformers` downloads `all-MiniLM-L6-v2` (~90 MB) from HuggingFace on first use.
+
+### Step 3 — Build the FAISS index
+
+The processed JSON is committed. Index files are gitignored and must be built once locally.
+
+```bash
+source .venv/bin/activate
+python inference.py --build
+```
+
+This encodes the 1,261 chunks and writes `embeddings.npy` and `faiss.index` to `data/processed/`, taking **~2 min on CPU**. Later runs load these cached files; rebuild only when the chunks change.
+
+### Step 4 — Node.js dependencies
+
+```bash
+(cd web/server && npm install)
+(cd web/client && npm install)
+```
+
+### Step 5 — Environment variables
+
+```bash
+cp web/server/.env.example web/server/.env
+```
+
+Edit `web/server/.env`:
+
+```env
+# Required for LLM explanations, query rewriting, and /api/chat
+GROQ_API_KEY=your_groq_api_key_here
+
+# Optional; defaults to 5000
+PORT=5000
+
+# Required if "python" on your PATH is not Python 3; point it at your venv interpreter
+PYTHON_BIN=/path/to/SpecForge/.venv/bin/python3
+```
+
+> `PYTHON_BIN` accepts only `"python"`, `"python3"`, or an absolute path; the server rejects any other value on startup.
+
+### Step 6 — Start the application
+
+**Terminal 1 — API server (port 5000):**
+```bash
+cd web/server
+npm start
+```
+Wait for the log line `Python retriever ready` (~20 s on first boot); the server accepts queries from then on.
+
+**Terminal 2 — Frontend dev server (port 5173):**
+```bash
+cd web/client
+npm run dev
+```
+
+Open **http://localhost:5173**. The Vite dev server proxies all `/api/*` requests to `:5000`.
+
+---
+
+## Using `inference.py` (Judge Entry Point)
+
+`inference.py` is the mandatory entry point and runs independently of the web server.
+
+> Always activate the virtual environment first: `source .venv/bin/activate`
+
+### Build / force-rebuild the index
+
+```bash
+python inference.py --build
+```
+
+### Single query (interactive testing)
+
+```bash
+python inference.py --query "Which standard covers 33 grade OPC cement?"
+```
+
+Output:
+```
+============================================================
+Query  : Which standard covers 33 grade OPC cement?
+Latency: 0.019s
+
+Top results:
+  1. IS 269: 1989 — Ordinary Portland Cement, 33 Grade
+     Category: Cement and Concrete | Section: Scope | Score: 0.8921
+  2. IS 8112: 1989 — 43 Grade Ordinary Portland Cement
+     ...
+```
+
+### Batch evaluation (judge command)
+
+```bash
+python inference.py \
+  --input data/processed/public_test_set.json \
+  --output data/processed/retrieval_results.json
+```
+
+Input format:
+```json
+[
+  {
+    "id": "PUB-01",
+    "query": "We are a small enterprise manufacturing 33 Grade OPC...",
+    "expected_standards": ["IS 269: 1989"]
+  }
+]
+```
+
+Output format:
+```json
+[
+  {
+    "id": "PUB-01",
+    "query": "...",
+    "retrieved_standards": ["IS 8112: 1989", "IS 269: 1989", "..."],
+    "details": [
+      {
+        "standard_id": "IS 269: 1989",
+        "title": "Ordinary Portland Cement, 33 Grade",
+        "category": "Cement and Concrete",
+        "score": 0.8921,
+        "matched_section": "Scope"
+      }
+    ],
+    "latency_seconds": 0.019,
+    "expected_standards": ["IS 269: 1989"]
+  }
+]
+```
+
+## Evaluation
+
+```bash
+# Step 1: generate results
+python inference.py \
+  --input data/processed/public_test_set.json \
+  --output data/processed/retrieval_results.json
+
+# Step 2: score
+python scripts/eval_script.py \
+  --results data/processed/retrieval_results.json
+```
+
+Targets and our results on the public set:
+
+| Metric | Formula | Target | Achieved |
+|---|---|---|---|
+| Hit Rate @3 | queries with an expected standard in the top 3 ÷ total queries | > 80% | **100%** |
+| MRR @5 | Σ(1/rank_i) / N, where rank_i is the rank of the first expected standard within the top 5 (the term is 0 if there is none) | > 0.7 | **0.783** |
+| Avg Latency | total_time / num_queries | < 5 s | **~0.019 s** |
+
+---
+
+## API Reference
+
+All endpoints are served by the Express server (default `http://localhost:5000`).
+
+### `POST /api/recommend`
+
+Core RAG endpoint: retrieval plus optional LLM explanations.
+
+```jsonc
+// Request
+{ "query": "fire resistance for brick masonry", "top_n": 5, "rewrite": false }
+
+// Response
+{
+  "standards": [
+    {
+      "standard_id": "IS 1905: 1987",
+      "title": "Code of Practice for Structural Use of Unreinforced Masonry",
+      "category": "Masonry",
+      "score": 0.812,
+      "matched_section": "Fire Resistance",
+      "explanation": "This standard specifies..."
+    }
+  ],
+  "latency": { "retrieval_ms": 19, "llm_ms": 820, "total_ms": 839 }
+}
+```
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `query` | string | required | Natural-language product description or compliance question |
+| `top_n` | integer | 5 | Number of results to return (1–10) |
+| `rewrite` | boolean | `false` | Expand the query to IS-standard vocabulary via the LLM before retrieval |
+
+Rate limit: 20 req/min.
+
+### `POST /api/ask`
+
+Chunk-grounded QA for a specific standard.
+
+```json
+{ "standard_id": "IS 1905: 1987", "question": "What is the minimum wall thickness?" }
+```
+
+### `POST /api/chat`
+
+Conversational QA over the standards corpus. Requires `GROQ_API_KEY`; returns `503` if the key is absent.
+
+```json
+{ "message": "What grades of Portland cement does BIS cover?" }
+```
+
+### `GET /api/standards`
+
+Paginated list. Query params: `q` (keyword search), `category`, `page` (default 1), `limit` (default 20, max 100).
+
+### `GET /api/standards/:id`
+
+Returns a single standard. `:id` is the URL-encoded IS ID, e.g. `IS%20269%3A%201989`.
+
+### `GET /api/categories`
+
+All 25 material categories, sorted alphabetically.
+
+### `GET /api/stats`
+
+```json
+{ "standards": 573, "chunks": 1261, "categories": 25 }
+```
+
+---
+
+## Features
+
+| Feature | Description |
+|---|---|
+| **Hybrid RAG retrieval** | FAISS (dense, 60%) + BM25 (sparse, 40%), fused and re-ranked |
+| **Re-ranking** | Keyword overlap, title match, exact IS-ID match, short-chunk penalty |
+| **AI explanations** | Groq `llama-3.1-8b-instant`; parallel and fallback-safe |
+| **Query rewriting** | LLM expands natural language to IS-standard vocabulary (optional) |
+| **Chunk-grounded QA** | Questions answered from the most relevant chunk of a specific standard |
+| **Conversational chat** | Open-ended QA against the full corpus |
+| **Browse & filter** | Paginated standards list with keyword scoring; category gallery |
+| **Persistent daemon** | Python retrieval process spawned once at boot; auto-restarts on crash |
+| **Internationalisation** | UI in English and Hindi (i18next + react-i18next) |
+| **Rate limiting** | 60 req/min global, 20 req/min on LLM endpoints (express-rate-limit) |
+| **Production-ready API** | Input validation, sanitisation, structured JSON logging, latency breakdown |
+
+---
+
+## Tech Stack
+
+| Layer | Technology |
+|---|---|
+| Embedding model | `all-MiniLM-L6-v2` via `sentence-transformers` |
+| Dense index | FAISS `IndexFlatIP` (cosine via inner product) |
+| Sparse index | BM25Okapi (`rank-bm25`) |
+| PDF parsing | PyMuPDF |
+| LLM | Groq API (`llama-3.1-8b-instant`) |
+| Backend | Node.js 18 + Express 5 |
+| Security middleware | Helmet, CORS, express-rate-limit |
+| Frontend | React 19, Vite 8, React Router 7 |
+| Internationalisation | i18next, react-i18next, i18next-browser-languagedetector |
+
+---
+
+## Troubleshooting
+
+| Symptom | Likely cause | Fix |
+|---|---|---|
+| `PYTHON_BIN validation failed` on start | Invalid `PYTHON_BIN` | Set it to `python`, `python3`, or the absolute venv path |
+| `ModuleNotFoundError: faiss` | Wrong Python binary (system Python instead of the venv) | Set `PYTHON_BIN=/path/to/.venv/bin/python3` in `.env` |
+| `Python daemon boot timeout` (90 s) | Index files missing | Run `python inference.py --build` with the venv active |
+| Results return but no `explanation` field | `GROQ_API_KEY` absent or invalid | Set the key in `.env`; retrieval still works and explanations fall back silently |
+| `fuser: command not found` on Linux | `psmisc` not installed | `sudo apt install psmisc` / `sudo dnf install psmisc` |
+| Port 5000 still in use after a crash | `fuser` not available | Kill the process manually: `kill $(lsof -t -i:5000)` |
+
+---
+
+## License
+
+See [LICENSE](LICENSE).
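The chunking rules in the README's *Chunking & Retrieval Strategy* section are: split by section, add a 50-word overlap, and merge weak chunks under 30 words into a neighbour. They can be sketched in Python as below. This is one possible reading, not the code in `src/parse_bis_pdf.py`. It assumes the overlap means each chunk is prefixed with the last 50 words of the previous section. The `build_chunks` function and its `(section_name, text)` input shape are hypothetical.

```python
def build_chunks(sections, overlap=50, min_words=30):
    """Turn one standard's (section_name, text) pairs into overlapping chunks.

    Assumed reading of the README: sections with fewer than `min_words` words
    are merged into a neighbour, then each chunk is prefixed with the last
    `overlap` words of the previous section.
    """
    merged = []  # list of (section_name, words)
    for name, text in sections:
        words = text.split()
        if merged and len(words) < min_words:
            # Weak section: append its words to the previous section.
            prev_name, prev_words = merged[-1]
            merged[-1] = (prev_name, prev_words + words)
        else:
            merged.append((name, words))

    # A weak first section has no previous neighbour, so fold it into the next one.
    if len(merged) > 1 and len(merged[0][1]) < min_words:
        first_words = merged.pop(0)[1]
        name, words = merged[0]
        merged[0] = (name, first_words + words)

    chunks = []
    prev_tail = []  # last `overlap` words of the previous section
    for name, words in merged:
        chunks.append({
            "section": name,
            "text": " ".join(prev_tail + words),
            "own_words": len(words),  # words belonging to this section only
        })
        # Guard overlap == 0: words[-0:] would copy the whole list.
        prev_tail = words[-overlap:] if overlap > 0 else []
    return chunks
```

Merging happens before the overlap is added, so a short note section never becomes a chunk of its own. Its words travel with the neighbouring section instead.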
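The weighted BM25 document described in the README (title ×4, keywords ×3, section ×2, body ×1) amounts to repeating each field's tokens before indexing. Below is a minimal sketch. It assumes each chunk is a dict with hypothetical `title`, `keywords`, `section` and `body` fields and uses a simple lowercase alphanumeric tokenizer; both are stand-ins, not the project's actual schema or tokenizer.

```python
import re

# Field weights from the README; the field names themselves are assumed.
FIELD_WEIGHTS = {"title": 4, "keywords": 3, "section": 2, "body": 1}


def tokenize(text):
    """Lowercase alphanumeric tokens (a simple stand-in tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_tokens(chunk):
    """Build a BM25 document in which each field's tokens repeat `weight` times."""
    doc = []
    for field, weight in FIELD_WEIGHTS.items():
        value = chunk.get(field, "")
        if isinstance(value, list):  # e.g. a keyword list
            value = " ".join(value)
        doc.extend(tokenize(value) * weight)
    return doc
```

Token lists built this way could be passed as the corpus to `rank_bm25.BM25Okapi`, whose `get_scores(query_tokens)` returns one score per document. Because BM25 saturates term frequency, repetition boosts title matches without letting them grow without bound.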
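The README's fusion formula, `score = 0.6 × dense_norm + 0.4 × sparse_norm`, and its per-standard deduplication step can be sketched as follows. The README does not say how scores are normalised, so per-query min-max scaling is an assumption here. So is giving a chunk 0 from a retriever that did not return it. `fuse` and `dedupe_top_n` are hypothetical names.

```python
DENSE_WEIGHT = 0.6   # from the README
SPARSE_WEIGHT = 0.4  # from the README


def min_max(scores):
    """Scale a {chunk_id: score} dict to [0, 1]; equal scores all map to 1.0 (assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def fuse(dense, sparse):
    """Weighted sum of normalised dense and sparse scores over the union of candidates."""
    d, s = min_max(dense), min_max(sparse)
    return {
        cid: DENSE_WEIGHT * d.get(cid, 0.0) + SPARSE_WEIGHT * s.get(cid, 0.0)
        for cid in set(d) | set(s)
    }


def dedupe_top_n(chunk_scores, chunk_to_standard, top_n):
    """Keep the best-scoring chunk per standard, then return the top_n standards."""
    best = {}  # standard_id -> (score, chunk_id)
    for cid, score in chunk_scores.items():
        sid = chunk_to_standard[cid]
        if sid not in best or score > best[sid][0]:
            best[sid] = (score, cid)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(sid, score, cid) for sid, (score, cid) in ranked[:top_n]]
```

Min-max scaling puts FAISS inner products and unbounded BM25 scores on the same [0, 1] range before weighting. In the README's pipeline, the re-ranking bonuses would be added between `fuse` and `dedupe_top_n`. Rank-based reciprocal rank fusion is a common alternative when score distributions vary a lot between queries.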
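The re-ranking bonuses listed in the README can be expressed as a single adjustment added to each fused score. The bonus values come from the README. The tokenizer, the stop-word list that defines "significant" title words, the IS-ID regular expression and the `rerank_bonus` signature are hypothetical stand-ins.

```python
import re

# Hypothetical stop-word list used to pick out "significant" title words.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}
# Hypothetical IS-ID pattern: captures the number in "IS 269" or "IS269".
IS_ID = re.compile(r"\bIS\s*(\d+)")


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank_bonus(query, title, keywords, body_words, standard_id):
    """Score adjustment for one candidate, using the README's bonus values."""
    q = tokens(query)
    bonus = 0.0

    # +0.05 per keyword found in the query, counting at most 4 (max +0.20).
    kw_hits = sum(1 for kw in keywords if kw.lower() in q)
    bonus += 0.05 * min(kw_hits, 4)

    # +0.05 per title word found in the query, counting at most 5 (max +0.25).
    title_words = tokens(title)
    bonus += 0.05 * min(len(q & title_words), 5)

    # +0.25 when at least 60% of the significant title words appear in the query.
    significant = {w for w in title_words if w not in STOPWORDS and len(w) > 2}
    if significant and len(q & significant) / len(significant) >= 0.6:
        bonus += 0.25

    # +0.20 when the query names this standard's IS number explicitly.
    own = IS_ID.search(standard_id)
    if own and own.group(1) in set(IS_ID.findall(query)):
        bonus += 0.20

    # -0.15 for very short chunks.
    if body_words < 40:
        bonus -= 0.15
    return bonus
```

The IS-ID pattern is case-sensitive on purpose, so the word "is" followed by a number (as in "this is 33 grade") is not mistaken for a standard ID.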
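The README describes the bridge between Express and `bridge/retrieve.py` as a persistent daemon that speaks newline-delimited JSON over stdin/stdout, with concurrent requests queued. The Python sketch below shows that pattern from the client side: one JSON object per line in each direction, with a lock so only one request is in flight at a time. The request fields (`query`, `top_n`) and the echo stand-in daemon are assumptions for illustration. The real client is the Node.js `retrieverService.js`.

```python
import json
import subprocess
import sys
import threading


class NdjsonDaemon:
    """Talk to a long-lived child process, one JSON object per line each way."""

    def __init__(self, argv):
        self._proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered in text mode
        )
        # Serialises callers so only one request is in flight, mirroring the
        # README's "queues concurrent requests" behaviour.
        self._lock = threading.Lock()

    def request(self, payload):
        with self._lock:
            self._proc.stdin.write(json.dumps(payload) + "\n")
            self._proc.stdin.flush()
            line = self._proc.stdout.readline()
        if not line:
            raise RuntimeError("daemon exited before replying")
        return json.loads(line)

    def close(self):
        self._proc.stdin.close()  # EOF tells the child to leave its read loop
        self._proc.wait(timeout=5)
        self._proc.stdout.close()


# Stand-in daemon for local testing; the real bridge/retrieve.py request
# fields are not documented in the README, so "query"/"top_n" are assumptions.
ECHO_DAEMON = """
import json, sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    req = json.loads(line)
    reply = {"query": req["query"], "top_n": req.get("top_n", 5)}
    sys.stdout.write(json.dumps(reply) + "\\n")
    sys.stdout.flush()
"""

if __name__ == "__main__":
    daemon = NdjsonDaemon([sys.executable, "-c", ECHO_DAEMON])
    try:
        print(daemon.request({"query": "33 grade OPC cement", "top_n": 3}))
    finally:
        daemon.close()
```

Spawning the process once and reusing the pipe means queries skip the ~18 s cold start quoted in the README. A production client would also add a read timeout and restart the child when `readline()` returns an empty string.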
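The metric formulas in the README's *Evaluation* table can be computed directly from the `inference.py` output format it documents (`retrieved_standards`, `expected_standards`, `latency_seconds`). Below is a minimal sketch. Exact string matching of IS IDs is an assumption; the provided `scripts/eval_script.py` may normalise IDs or handle edge cases differently.

```python
def first_relevant_rank(retrieved, expected, k):
    """1-based rank of the first expected ID within the top k, or None."""
    wanted = set(expected)
    for rank, sid in enumerate(retrieved[:k], start=1):
        if sid in wanted:
            return rank
    return None


def evaluate(results):
    """Hit Rate @3, MRR @5 and mean latency over a list of result records."""
    if not results:
        raise ValueError("no results to evaluate")
    hits = 0
    reciprocal_ranks = 0.0
    total_latency = 0.0
    for r in results:
        retrieved, expected = r["retrieved_standards"], r["expected_standards"]
        if first_relevant_rank(retrieved, expected, 3) is not None:
            hits += 1
        rank = first_relevant_rank(retrieved, expected, 5)
        if rank is not None:
            reciprocal_ranks += 1.0 / rank  # a miss contributes 0
        total_latency += r.get("latency_seconds", 0.0)
    n = len(results)
    return {
        "hit_rate_at_3": hits / n,
        "mrr_at_5": reciprocal_ranks / n,
        "avg_latency_s": total_latency / n,
    }
```

As an arithmetic check only (hypothetical ranks, not the team's per-query results): six queries at rank 1, three at rank 2 and one at rank 3 give (6 + 3 × 0.5 + 1/3) / 10 ≈ 0.783, with a Hit Rate @3 of 100%.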
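The `POST /api/recommend` field table implies a small validation contract:

- `query` is required.
- `top_n` is an integer from 1 to 10, defaulting to 5.
- `rewrite` is a boolean, defaulting to `false`.

The server enforces this in Express; the Python sketch below only restates the contract. Rejecting out-of-range values (rather than clamping them) and the error messages are assumptions. `validate_recommend` is a hypothetical name.

```python
def validate_recommend(body):
    """Normalise a POST /api/recommend body per the README's field table.

    Returns (payload, None) on success or (None, error_message) on failure.
    Rejecting, rather than clamping, an out-of-range top_n is an assumption.
    """
    if not isinstance(body, dict):
        return None, "body must be a JSON object"

    query = body.get("query")
    if not isinstance(query, str) or not query.strip():
        return None, "query is required and must be a non-empty string"

    top_n = body.get("top_n", 5)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(top_n, bool) or not isinstance(top_n, int) or not 1 <= top_n <= 10:
        return None, "top_n must be an integer between 1 and 10"

    rewrite = body.get("rewrite", False)
    if not isinstance(rewrite, bool):
        return None, "rewrite must be a boolean"

    return {"query": query.strip(), "top_n": top_n, "rewrite": rewrite}, None
```

Returning a normalised payload keeps the defaults in one place, so the retrieval call never has to guess whether `top_n` or `rewrite` was supplied.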
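The `PYTHON_BIN` rule from Step 5 (only `python`, `python3`, or an absolute path) is an allow-list check. A sketch of the rule in Python follows. The real check lives in the Node.js server and may also verify that the file exists, which this version does not. `is_allowed_python_bin` is a hypothetical name.

```python
import os

ALLOWED_NAMES = ("python", "python3")


def is_allowed_python_bin(value):
    """README rule: accept only "python", "python3", or an absolute path."""
    if value in ALLOWED_NAMES:
        return True
    # os.path.isabs follows the host OS's path rules.
    return isinstance(value, str) and os.path.isabs(value)
```

An allow-list keeps arbitrary strings such as `python; rm -rf /` away from the spawn call. Because `os.path.isabs` follows the host's path rules, a Windows path like `C:\venv\python.exe` passes only on Windows.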