11 Commits

Author SHA1 Message Date
Kshitij e6fc2590c9 chore(data): add 5 new standards, enrich chunks with full_title/keywords, update eval scores. 2026-05-04 15:45:33 +05:30
Kshitij fdae5d2318 chore: update evaluation results with revised scores. 2026-05-04 15:11:16 +05:30
Kshitij b055edbbc0 feat: add 10 new standards and chunks for steel, wood, electrical categories. 2026-05-04 15:10:53 +05:30
Kshitij 42fed21586 fix: correct erroneous scope descriptions for 7 standards. 2026-05-04 15:10:27 +05:30
Kshitij 3fbf91c706 feat(retrieval): add grade matching and same-family part disambiguation.
Boost scores when query grade matches standard title grade, penalize mismatches. Add part disambiguation to correctly route queries to specific standard parts (e.g., IS 12269 (Part 1) vs (Part 2)). Regenerate retrieval results with improved ranking.
2026-05-04 00:20:19 +05:30
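
The grade matching and part disambiguation described in 3fbf91c706 could be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function names, the regular expressions, the `boost`/`penalty` multipliers, and the placeholder title `IS 0000` are all assumptions made for the example.

```python
import re

# Hypothetical patterns: a two-digit grade ("53 grade") and a part number ("Part 2").
GRADE_RE = re.compile(r"\b(\d{2})\s*grade\b", re.IGNORECASE)
PART_RE = re.compile(r"\bpart\s*(\d+)", re.IGNORECASE)


def extract_grade(text):
    """Return the grade mentioned in text, or None if there is none."""
    m = GRADE_RE.search(text)
    return m.group(1) if m else None


def extract_part(text):
    """Return the part number mentioned in text, or None if there is none."""
    m = PART_RE.search(text)
    return m.group(1) if m else None


def adjust_score(query, title, score, boost=1.2, penalty=0.8):
    """Boost the score on a grade/part match and penalize a mismatch.

    The score is left unchanged when either side lacks the attribute,
    so queries that mention no grade or part are not affected.
    """
    q_grade, t_grade = extract_grade(query), extract_grade(title)
    if q_grade and t_grade:
        score *= boost if q_grade == t_grade else penalty

    q_part, t_part = extract_part(query), extract_part(title)
    if q_part and t_part:
        score *= boost if q_part == t_part else penalty
    return score


# Placeholder title, not a real standard.
title = "IS 0000 (Part 2): 53 grade example"
print(adjust_score("53 grade part 2", title, 1.0))  # both attributes match
print(adjust_score("43 grade", title, 1.0))         # grade mismatch
```

Multiplicative factors keep the adjustment proportional to the base retrieval score; an additive bonus would be an equally plausible design.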
Kshitij 28bb4ca1de fix(parser): recover stolen scope text and truncate next-standard bleed
Add Pass 3 to recover scope text incorrectly placed in previous block, and Pass 4 to truncate bleed from the following standard. Regenerate standards.json and standards_chunks.json with the improved parser.
2026-05-04 00:18:17 +05:30
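
The "truncate next-standard bleed" step in 28bb4ca1de might look something like the sketch below. The header pattern, the function name `truncate_bleed`, and the sample text are hypothetical; the actual parser's Pass 4 may detect the boundary differently.

```python
import re

# Assumed header shape: a line starting with "IS <number> : <year>",
# optionally with a "(Part N)" qualifier. Inline citations on the same
# line as other text are deliberately not treated as a boundary.
NEXT_STANDARD_RE = re.compile(
    r"\n\s*IS\s+\d+(?:\s*\(Part\s*\d+\))?\s*:\s*\d{4}"
)


def truncate_bleed(scope_text):
    """Cut scope text at the first line that looks like the next standard's header."""
    m = NEXT_STANDARD_RE.search(scope_text)
    if m is None:
        return scope_text
    return scope_text[:m.start()].rstrip()


# Placeholder text, not taken from any real standard.
sample = (
    "Covers requirements for an example product.\n"
    "IS 9999 : 2000 ANOTHER EXAMPLE\n"
    "Scope of the following standard."
)
print(truncate_bleed(sample))
```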
Kshitij 844973fb39 chore: add updated results in json format; output of inference.py 2026-05-03 01:45:18 +05:30
Kshitij 0440b76111 Revert "chore: remove unnecessary files from /data directory and move /data/processed/retrieval_results.json to /data/results.json"
This reverts commit 1efc0e3482.
2026-05-03 01:32:10 +05:30
Kshitij 1efc0e3482 chore: remove unnecessary files from /data directory and move /data/processed/retrieval_results.json to /data/results.json 2026-05-03 01:18:14 +05:30
Kshitij 29b32dfcac fix: complete requirements.txt and inference.py output correctness.
- Add faiss-cpu, rank-bm25, sentence-transformers, numpy to requirements.txt.
  (previously only pymupdf was listed; other deps were manual-install only)
- Cast score to float() before round() to avoid numpy type serialization errors.
- Pass expected_standards through _format_result for eval script compatibility.
- Update retrieval_results.json with expected_standards per query for eval.
2026-05-03 00:03:03 +05:30
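
The `float()` cast mentioned in 29b32dfcac addresses a common pitfall: similarity scores returned as `numpy.float32` are not JSON serializable, and `round()` on such a value generally keeps the NumPy type. A minimal sketch of the fix, assuming a hypothetical helper name `format_score` and a precision of 4 digits:

```python
import json

import numpy as np


def format_score(score, ndigits=4):
    """Convert a NumPy scalar to a built-in float before rounding."""
    return round(float(score), ndigits)


raw = np.float32(0.87654321)  # stand-in for a score from a vector index

try:
    # Rounding alone is expected to leave a NumPy scalar behind.
    json.dumps({"score": round(raw, 4)})
except TypeError as exc:
    print("not serializable:", exc)

# After the cast, the value is a plain Python float and serializes cleanly.
print(json.dumps({"score": format_score(raw)}))
```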
Kshitij f65185b91e feat: add data processing outputs. 2026-04-28 23:55:59 +05:30