rag-assistant

RAG assistant over a synthetic internal document corpus

Reinsurance RAG Assistant

CLI prototype for asking questions over a synthetic internal reinsurance advisory
knowledge base.

The corpus contains fictional documents for a reinsurance consulting and analytics
company: meeting notes, proposals, actuarial reports, team notes, CSV extracts, and JSON
summaries.

What Is Included

Implemented:

30 synthetic corpus documents in data/corpus/
15 evaluation questions in data/eval/questions.jsonl
ChromaDB index builder
Local embeddings with sentence-transformers/all-MiniLM-L6-v2
CLI question answering with one-shot and interactive chat modes
OpenRouter-backed answer generation
Retrieval evaluation
LLM-as-judge answer evaluation

Setup

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Create a local .env file:

OPENROUTER_API_KEY=your_key_here
GENERATION_MODEL=openai/gpt-5-mini
JUDGE_MODEL=openai/gpt-5-mini

The default generation and judge model is openai/gpt-5-mini, accessed through OpenRouter.
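As an illustration of how these settings might be read at startup, here is a minimal
sketch. It assumes the .env values have already been exported into the process
environment; how the project actually loads them is not stated here. The function name
load_settings and the returned dictionary shape are hypothetical.

```python
import os

# Default model name taken from the README text above.
DEFAULT_MODEL = "openai/gpt-5-mini"


def load_settings(env=None):
    """Read OpenRouter settings from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of failing on the first API call.
        raise RuntimeError("OPENROUTER_API_KEY is not set; add it to .env")
    return {
        "api_key": api_key,
        "generation_model": env.get("GENERATION_MODEL", DEFAULT_MODEL),
        "judge_model": env.get("JUDGE_MODEL", DEFAULT_MODEL),
    }
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise without
touching real credentials.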

Build The Index

The ChromaDB index is not committed to the repository. Build it locally from the corpus:

python scripts/build_index.py

This creates data/index/.

Ask Questions

Ask one question:

python -m src.app ask "What attachment point was finally selected for Triglav Adriatic's property catastrophe renewal?"

To inspect which chunks were retrieved:

python -m src.app ask "Was there conflicting guidance about Triglav Adriatic's attachment point?" --show-retrieved

Run interactive chat mode:

python -m src.app chat

Type exit or quit to stop.

Interactive mode also supports retrieval debugging:

python -m src.app chat --show-retrieved

Run Retrieval Evaluation

python scripts/run_eval.py --retrieval-only --top-k 7

Current retrieval result:

pass: 13
partial: 0
fail: 0
review: 2

The 2 review cases are intentionally unanswerable questions, so they require manual
review of the generated answer.

The retrieval evaluation checks required evidence coverage. A question passes when all
expected source documents appear in the retrieved set. Extra retrieved documents are
allowed because retrieval provides context for the answer generator; the final LLM prompt
is responsible for citing only directly supporting sources.
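The coverage check described above can be sketched as a set comparison. Only the "pass"
rule (all expected sources retrieved) comes from this README; the meanings of "partial" and
"fail", and the rule that a question with no expected sources is marked "review", are
assumptions made for illustration. The function name and file names are hypothetical.

```python
def grade_retrieval(expected_sources, retrieved_sources):
    """Grade one question by required evidence coverage (illustrative sketch)."""
    expected = set(expected_sources)
    retrieved = set(retrieved_sources)
    if not expected:
        # Assumption: unanswerable questions list no expected sources.
        return "review"
    found = expected & retrieved
    if found == expected:
        # Extra retrieved documents do not affect the grade.
        return "pass"
    if found:
        # Assumption: some, but not all, expected sources were retrieved.
        return "partial"
    return "fail"


# Hypothetical document names, not taken from the corpus.
print(grade_retrieval(["notes_a.md"], ["notes_a.md", "report_b.md"]))
```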

Run Answer Evaluation

python scripts/run_eval.py --answers

Answer evaluation runs the full RAG pipeline and then uses an OpenRouter judge model to
grade the generated answer. The judge receives the question, expected answer, key facts,
expected sources, retrieved sources, and actual answer.

Current answer evaluation result:

pass: 15
partial: 0
fail: 0

Evaluation runs also save machine-readable JSON files:

data/eval/results/retrieval_eval_latest.json
data/eval/results/answer_eval_latest.json

Example Questions

What attachment point was finally selected for Triglav Adriatic's property catastrophe renewal?
Was there conflicting guidance about Triglav Adriatic's attachment point?
Which consultant should answer bordereaux quality questions for Balkan Motor Pool and why?
What cyber catastrophe treaty did Nova Kredit buy?

RAG Pipeline

Load documents from data/corpus/.
Split documents into chunks.
Embed chunks with sentence-transformers/all-MiniLM-L6-v2.
Store embeddings and metadata in ChromaDB.
Embed the user question.
Retrieve a moderately broad candidate set from ChromaDB.
Rerank candidates using normalized, capped keyword-overlap scoring.
Select final chunks for generation.
Send the question and retrieved context to OpenRouter.
Return a grounded answer with source citations.
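
The last two steps depend on how the retrieved chunks are presented to the model. The
sketch below shows one plausible way to assemble a grounded prompt. The function name,
the "source" and "text" keys, and the instruction wording are all assumptions, not the
project's actual prompt.

```python
def build_prompt(question, chunks):
    """Assemble a grounded prompt from retrieved chunks (illustrative sketch).

    chunks: list of dicts with hypothetical keys "source" and "text".
    """
    context_blocks = []
    for i, chunk in enumerate(chunks, start=1):
        # Number each chunk so the answer can cite sources by label.
        context_blocks.append(f"[{i}] {chunk['source']}\n{chunk['text']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the context below. "
        "Cite only the sources that directly support the answer, "
        "and say so if the context does not contain the answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```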

The keyword rerank uses only meaningful terms from the user question. It calculates the
fraction of question terms that appear in each candidate chunk, then applies a capped
boost to that chunk's Chroma distance. Because the overlap is normalized and the boost is
capped, exact wording helps without letting long questions overpower vector similarity.
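
A minimal sketch of this kind of rerank follows. The weight and cap values, the function
names, and the choice to subtract the boost from the distance (treating a smaller distance
as a closer match) are assumptions for illustration; the project's actual constants are
not stated here.

```python
def keyword_overlap(question_terms, chunk_text):
    """Fraction of question terms that occur in the chunk, between 0.0 and 1.0."""
    if not question_terms:
        return 0.0
    chunk_terms = set(chunk_text.lower().split())
    return len(question_terms & chunk_terms) / len(question_terms)


def rerank(question_terms, candidates, weight=0.3, cap=0.15):
    """Reorder (chunk_text, distance) pairs; weight and cap are made-up values."""
    scored = []
    for text, distance in candidates:
        # The boost is capped, so keyword matches can nudge the order
        # but cannot outweigh a large difference in vector distance.
        boost = min(cap, weight * keyword_overlap(question_terms, text))
        scored.append((distance - boost, text))
    scored.sort(key=lambda pair: pair[0])
    return [text for _, text in scored]
```

Because the overlap is a fraction rather than a raw match count, a question with twenty
terms cannot earn a larger boost than a question with three.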

Date and numeric tokens are included in keyword matching because exact values such as
meeting dates, percentages, limits, deductibles, and attachment points matter in this
domain.
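
One way to keep such values intact during term extraction is a tokenizer that matches
dates and numbers before plain words. The regular expression, the stopword list, and the
function name below are assumptions, shown only to make the idea concrete.

```python
import re

# Small illustrative stopword list; the project's actual list is not shown here.
STOPWORDS = {
    "a", "an", "and", "about", "are", "did", "for", "in", "is", "of",
    "on", "the", "there", "to", "was", "what", "which", "who", "why",
}

# Alternatives are tried in order: ISO dates, then numbers with an optional
# percent sign or unit suffix, then plain words.
TOKEN_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}"
    r"|\d+(?:[.,]\d+)*(?:%|[a-z]+)?"
    r"|[a-z]+(?:'[a-z]+)?"
)


def extract_question_terms(question):
    """Return the meaningful terms of a question, keeping dates and numbers whole."""
    tokens = TOKEN_RE.findall(question.lower())
    return {token for token in tokens if token not in STOPWORDS}
```

With this pattern, a date such as 2024-03-14 or a value such as 7.5% stays a single term
instead of being broken at punctuation.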

Chunking Strategy

Markdown documents shorter than 900 words are kept as one full-document chunk.

Longer Markdown documents are split by headings. If a section is still too long, it is
split by paragraphs.

CSV, JSON, and TXT files are kept as full-document chunks.

This keeps short notes readable as complete evidence while still splitting long reports
into focused sections.
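
The rules above can be sketched roughly as follows. The 900-word threshold for whole
documents comes from this README; reusing the same limit for "too long" sections, the
greedy paragraph packing, and all function names are assumptions.

```python
import re

MAX_WORDS = 900  # whole-document threshold stated in the README


def word_count(text):
    return len(text.split())


def split_by_headings(text):
    """Start a new section at every Markdown heading line."""
    sections, current = [], []
    for line in text.splitlines():
        if re.match(r"#{1,6}\s", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [section for section in sections if section]


def split_by_paragraphs(text, max_words):
    """Greedily pack paragraphs into chunks of at most max_words where possible."""
    chunks, current, count = [], [], 0
    for para in re.split(r"\n\s*\n", text):
        if not para.strip():
            continue
        n = word_count(para)
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def chunk_document(text, suffix, max_words=MAX_WORDS):
    """Return chunks for one file; suffix is the extension, such as ".md"."""
    # CSV, JSON, TXT, and short Markdown files stay as one full-document chunk.
    if suffix.lower() != ".md" or word_count(text) < max_words:
        return [text]
    chunks = []
    for section in split_by_headings(text):
        if word_count(section) <= max_words:
            chunks.append(section)
        else:
            chunks.extend(split_by_paragraphs(section, max_words))
    return chunks
```

A single paragraph longer than the limit is kept whole in this sketch rather than being
cut mid-sentence.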

Known Limitations

The answer generator depends on OpenRouter being configured.
The corpus is synthetic, so the evaluation results above should not be read as evidence of
production-quality performance.
Exact CSV/table-cell questions can still be brittle when rows contain sparse or blank
values.
The CLI prints only text output; there is no web interface.