AI Systems Project

SpecPilot RAG

SpecPilot RAG is the kind of assistant I would build for long engineering manuals, silicon bring-up notes, and platform runbooks, where the real problem is not "chat" but finding the right source quickly and answering without drifting away from the evidence. The working version behind this page uses lexical retrieval, a Hugging Face reranker in PyTorch, and a lightweight local language model to return citation-backed answers over a small technical corpus.

PyTorch · RAG · LLM · Hugging Face · Lexical Retrieval · Evaluation

Why this project

Generic chat interfaces usually break down on dense technical documentation because the answer quality depends more on retrieval quality than on clever wording. This project is shaped around that reality: stronger candidate retrieval, an explicit reranking stage, grounded generation, and clearer citations. It is especially relevant for product specs, API references, EDA flow notes, and bring-up documents that engineers revisit under time pressure.

Model stack

Lexical retrieval over chunked technical documents
PyTorch cross-encoder reranker for relevance refinement
Hugging Face generation constrained by retrieved evidence
Evaluation loop for citation quality and factual drift

Interactive Mock

Retrieval candidates, reranking, grounded answer

User Query

Which sequence enables low-power mode without breaking the wake-up interrupt path?

Grounded Answer

LLM + citations

The safe sequence is: program the retention bits first, enable the wake-up interrupt source, flush outstanding register writes, and only then assert the low-power control bit. The retrieved notes also warn that gating the peripheral clock before the interrupt mux is latched can suppress the wake signal.

Candidate Retrieval

PMU Integration Guide (0.86)

Retention configuration must be written before LP_MODE_EN is asserted. Wake interrupt routing should be validated before clock gating.

SoC Bring-up Notes (0.81)

A previous failure mode occurred when the peripheral clock was gated early, preventing the wake interrupt from propagating.

Register Map Appendix (0.73)

LP_MODE_EN is located in PMU_CTRL[4]. WAKE_SRC_SEL selects the interrupt source and must be non-zero.

PyTorch Reranker Output

PMU Integration Guide (0.94)

Most relevant because it describes the exact ordering constraints around retention, interrupt routing, and low-power entry.

SoC Bring-up Notes (0.91)

Kept in the top two after reranking because it captures the concrete failure mode that caused missed wake-ups in hardware validation.

lexical retrieval · PyTorch cross-encoder · Hugging Face generation · citation grounding

System architecture

Document ingestion

Chunk large PDFs, runbooks, and wiki pages into retrieval-sized passages while preserving section headers, product names, and source metadata for downstream citation.
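
The ingestion step can be sketched roughly as follows. This is a minimal illustration, not the code behind this page: it assumes the documents have already been converted to plain text in which section headers start with "#", and the names Chunk, chunk_document, and max_words are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str   # document title, kept for downstream citation
    section: str  # nearest section header above the passage
    text: str     # passage body


def chunk_document(source: str, text: str, max_words: int = 120) -> list[Chunk]:
    """Split text into passages of at most max_words words, each tagged with its section."""
    chunks: list[Chunk] = []
    section = "(no section)"
    buffer: list[str] = []

    def flush() -> None:
        # Emit the buffered words as one chunk under the current section.
        if buffer:
            chunks.append(Chunk(source, section, " ".join(buffer)))
            buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Assumed header convention: close the open chunk, then switch section.
            flush()
            section = stripped.lstrip("#").strip()
            continue
        for word in stripped.split():
            buffer.append(word)
            if len(buffer) >= max_words:
                flush()
    flush()
    return chunks
```

Splitting on word count keeps the sketch short; a real ingestion step would more likely split on sentence or paragraph boundaries and count tokens rather than words.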

Candidate retrieval

Use lexical retrieval over chunked technical notes to pull a focused candidate set before the more expensive ranking stage runs.
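
The page does not name the lexical scorer, so the sketch below assumes Okapi BM25, a common choice for this stage. The function names and the k1 and b defaults are illustrative. The tokenizer keeps underscores so that register names such as LP_MODE_EN survive as single terms.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Keep underscores so identifiers such as LP_MODE_EN stay one token.
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_search(query: str, passages: list[str], top_k: int = 3,
                k1: float = 1.5, b: float = 0.75) -> list[tuple[int, float]]:
    """Return (passage index, score) pairs, best first."""
    docs = [tokenize(p) for p in passages]
    avg_len = sum(len(d) for d in docs) / max(len(docs), 1)
    # Number of passages containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```

Rare terms carry most of the weight here, which suits queries that mention a specific register or signal name.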

PyTorch reranking

Use a cross-encoder reranker in PyTorch to rescore the top retrieved passages and push the most relevant evidence into the prompt window.
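
A reranking stage along these lines could look like the sketch below. It uses the Hugging Face transformers classes AutoTokenizer and AutoModelForSequenceClassification. The checkpoint name is an assumption (a publicly available MS MARCO cross-encoder, not necessarily the model behind this page), and load_cross_encoder and rerank are hypothetical names. The scorer is passed in as a function so the ranking logic can be exercised without downloading a model.

```python
from typing import Callable, Sequence

Scorer = Callable[[str, Sequence[str]], list[float]]


def load_cross_encoder(model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2") -> Scorer:
    """Build a scorer backed by a Hugging Face cross-encoder (assumed checkpoint)."""
    # Imported lazily so rerank() below works without torch or transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    def score(query: str, passages: Sequence[str]) -> list[float]:
        # A cross-encoder reads the query and each passage together as one paired input.
        batch = tokenizer([query] * len(passages), list(passages),
                          padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**batch).logits
        # The assumed checkpoint emits a single relevance logit per pair.
        return logits.squeeze(-1).tolist()

    return score


def rerank(query: str, passages: Sequence[str], scorer: Scorer,
           keep: int = 2) -> list[tuple[str, float]]:
    """Rescore candidate passages and keep the best ones for the prompt."""
    if not passages:
        return []
    scores = scorer(query, passages)
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:keep]
```

Typical use would be rerank(query, candidates, load_cross_encoder()). The raw logits are unbounded, so scores on a 0 to 1 scale would need an extra step such as a sigmoid.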

Grounded generation

Send only the best evidence into the answer stage, require citations, and measure whether the response stays inside the retrieved material instead of inventing details.
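
One way to make the citation requirement checkable is to number the evidence in the prompt and validate the bracketed references in the answer afterwards. The sketch below assumes a "[n]" citation format; the prompt wording and the names build_prompt and check_citations are illustrative. It does not call a language model.

```python
import re


def build_prompt(question: str, evidence: list[tuple[str, str]]) -> str:
    """Format (source title, passage text) pairs as numbered sources ahead of the question."""
    lines = [
        "Answer using only the numbered sources below.",
        "Cite each claim as [n]. If the sources do not cover it, say so.",
        "",
    ]
    for n, (title, text) in enumerate(evidence, start=1):
        lines.append(f"[{n}] {title}: {text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


def check_citations(answer: str, num_sources: int) -> dict:
    """Report which source numbers the answer cites and which fall outside the evidence."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    invalid = sorted(n for n in cited if not 1 <= n <= num_sources)
    return {"cited": sorted(cited), "invalid": invalid, "has_citation": bool(cited)}
```

A check like this only confirms that citations exist and point at real sources. Whether the cited passage actually supports the claim still has to be measured separately.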

Evaluation goals

Citation hit-rate on long technical questions
Lower hallucination rate on part numbers and register names
Latency budget below 2.5 s for top-k retrieval plus reranking
Answer helpfulness measured against manually written reference responses
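
Two of these goals can be turned into simple metrics, sketched below. The definitions are assumptions rather than the ones used for this project: citation hit-rate is taken as the share of questions whose answer cites at least one manually labelled gold source, and drift on register names is approximated by flagging upper-case, underscore-separated identifiers that appear in the answer but in none of the retrieved passages.

```python
import re

# Register-style names such as LP_MODE_EN or PMU_CTRL (assumed naming pattern).
IDENTIFIER = re.compile(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b")


def citation_hit_rate(cited: list[set[str]], gold: list[set[str]]) -> float:
    """Share of questions whose answer cites at least one gold source."""
    if not gold:
        return 0.0
    hits = sum(1 for c, g in zip(cited, gold) if c & g)
    return hits / len(gold)


def unsupported_identifiers(answer: str, evidence: list[str]) -> set[str]:
    """Register-style names in the answer that never appear in the evidence."""
    seen: set[str] = set()
    for passage in evidence:
        seen.update(IDENTIFIER.findall(passage))
    return set(IDENTIFIER.findall(answer)) - seen
```

The identifier pattern is deliberately narrow. Part numbers and mixed-case names would need their own patterns, and an empty result does not prove the answer is free of invented details.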