rag-assistant
RAG assistant over a synthetic internal document corpus
File Browser:
- 📁 data
- 📁 scripts
- 📁 src
- 📄 .env.example
- 📄 .gitignore
- 📄 PLAN.md
- 📄 README.md
- 📄 REPORT.md
- 📄 requirements.txt
README
Reinsurance RAG Assistant
CLI prototype for asking questions over a synthetic internal reinsurance advisory
knowledge base.
The corpus contains fictional documents for a reinsurance consulting and analytics
company: meeting notes, proposals, actuarial reports, team notes, CSV extracts, and JSON
summaries.
What Is Included
Implemented:
- 30 synthetic corpus documents in
data/corpus/ - 15 evaluation questions in
data/eval/questions.jsonl - ChromaDB index builder
- Local embeddings with
sentence-transformers/all-MiniLM-L6-v2 - CLI question answering with one-shot and interactive chat modes
- OpenRouter-backed answer generation
- Retrieval evaluation
- LLM-as-judge answer evaluation
Setup
Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Create a local .env file:
OPENROUTER_API_KEY=your_key_here
GENERATION_MODEL=openai/gpt-5-mini
JUDGE_MODEL=openai/gpt-5-mini
The default generation AND judge model is openai/gpt-5-mini through OpenRouter.
Build The Index
The ChromaDB index is not committed. Build it locally from the corpus:
python scripts/build_index.py
This creates data/index/.
Ask Questions
Ask one question:
python -m src.app ask "What attachment point was finally selected for Triglav Adriatic's property catastrophe renewal?"
To inspect which chunks were retrieved:
python -m src.app ask "Was there conflicting guidance about Triglav Adriatic's attachment point?" --show-retrieved
Run interactive chat mode:
python -m src.app chat
Type exit or quit to stop.
Interactive mode also supports retrieval debugging:
python -m src.app chat --show-retrieved
Run Retrieval Evaluation
python scripts/run_eval.py --retrieval-only --top-k 7
Current retrieval result:
pass: 13
partial: 0
fail: 0
review: 2
The 2 review cases are intentionally unanswerable questions and require manual answer
review.
The retrieval evaluation checks required evidence coverage. A question passes when all
expected source documents appear in the retrieved set. Extra retrieved documents are
allowed because retrieval provides context for the answer generator; the final LLM prompt
is responsible for citing only directly supporting sources.
Run Answer Evaluation
python scripts/run_eval.py --answers
Answer evaluation runs the full RAG pipeline and then uses an OpenRouter judge model to
grade the generated answer. The judge receives the question, expected answer, key facts,
expected sources, retrieved sources, and actual answer.
Current answer evaluation result:
pass: 15
partial: 0
fail: 0
Evaluation runs also save machine-readable JSON files:
data/eval/results/retrieval_eval_latest.json
data/eval/results/answer_eval_latest.json
Example Questions
What attachment point was finally selected for Triglav Adriatic's property catastrophe renewal?
Was there conflicting guidance about Triglav Adriatic's attachment point?
Which consultant should answer bordereaux quality questions for Balkan Motor Pool and why?
What cyber catastrophe treaty did Nova Kredit buy?
RAG Pipeline
- Load documents from
data/corpus/. - Split documents into chunks.
- Embed chunks with
sentence-transformers/all-MiniLM-L6-v2. - Store embeddings and metadata in ChromaDB.
- Embed the user question.
- Retrieve a moderately broad candidate set from ChromaDB.
- Rerank candidates using normalized, capped keyword-overlap scoring.
- Select final chunks for generation.
- Send the question and retrieved context to OpenRouter.
- Return a grounded answer with source citations.
The keyword rerank uses only meaningful terms from the user question. It calculates the
percentage of question terms that appear in each candidate chunk, then applies a capped
boost to the Chroma distance. This lets exact wording help without allowing long
questions to overpower vector similarity.
Date and numeric tokens are included in keyword matching because exact values such as
meeting dates, percentages, limits, deductibles, and attachment points matter in this
domain.
Chunking Strategy
Markdown documents shorter than 900 words are kept as one full-document chunk.
Longer Markdown documents are split by headings. If a section is still too long, it is
split by paragraphs.
CSV, JSON, and TXT files are kept as full-document chunks.
This keeps short notes readable as complete evidence while still splitting long reports
into focused sections.
Known Limitations
- The answer generator depends on OpenRouter being configured.
- The corpus is synthetic, so performance should not be treated as production quality.
- Exact CSV/table-cell questions can still be brittle when rows contain sparse or blank
values. - The CLI prints only text output; there is no web interface.