Live · Generative AI · NLP

Document Q&A · grounded RAG

Type a question about AI implementation, governance, RAG, MLOps, or the EU AI Act. The system embeds your question with a small ONNX sentence-transformer (BGE-small), scores it by cosine similarity against sentence chunks drawn from 12 curated paragraphs, and returns the best-matching sentence with its supporting citations.

Knowledge base

12 curated paragraphs on AI implementation, governance, RAG, and MLOps.

How it works

Embed corpus

Twelve original AI-consulting paragraphs are split into sentence chunks and embedded once at boot with BGE-small via fastembed (ONNX).
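
The boot-time step above can be sketched roughly as follows. This is an illustration, not the demo's actual code: the regex splitter, the `Chunk` type, the `embed` callable, and the two sample paragraphs are all assumptions made for the example. In the setup described here, `embed` would wrap the BGE-small model loaded through fastembed; a toy stand-in is used so the sketch runs without a model download.

```python
import re
from typing import Callable, List, NamedTuple, Sequence

Vector = List[float]


class Chunk(NamedTuple):
    paragraph_id: int  # index of the source paragraph, kept so hits can be cited
    text: str          # one sentence


def split_sentences(paragraph: str) -> List[str]:
    # Naive splitter (assumption): break on whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def build_chunks(paragraphs: Sequence[str]) -> List[Chunk]:
    return [
        Chunk(pid, sentence)
        for pid, para in enumerate(paragraphs)
        for sentence in split_sentences(para)
    ]


def embed_corpus(
    chunks: Sequence[Chunk],
    embed: Callable[[List[str]], List[Vector]],
) -> List[Vector]:
    # Called once at startup; the result is the "chunk matrix" (one row per sentence).
    return embed([c.text for c in chunks])


def toy_embed(texts: List[str]) -> List[Vector]:
    # Stand-in only: [character count, word count]. A real embedder returns
    # dense semantic vectors instead.
    return [[float(len(t)), float(len(t.split()))] for t in texts]


# Placeholder paragraphs invented for this example.
paragraphs = [
    "RAG grounds answers in retrieved text. It reduces unsupported claims.",
    "MLOps covers deployment and monitoring.",
]
chunks = build_chunks(paragraphs)
chunk_matrix = embed_corpus(chunks, toy_embed)
```

Keeping `paragraph_id` on every chunk is one simple way to trace a retrieved sentence back to its source paragraph when citations are shown.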

Embed the question

The same model embeds the live question, and the resulting vector is L2-normalised.
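
L2-normalisation simply rescales a vector to unit length. A minimal pure-Python sketch, where the function name `l2_normalise` and the zero-vector handling are assumptions for illustration:

```python
import math
from typing import List, Sequence


def l2_normalise(vec: Sequence[float]) -> List[float]:
    # Divide every component by the Euclidean norm so the result has length 1.
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # Assumption: leave an all-zero vector unchanged rather than divide by zero.
        return list(vec)
    return [x / norm for x in vec]


unit = l2_normalise([3.0, 4.0])  # approximately [0.6, 0.8]
```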

Cosine retrieval

With unit-length vectors on both sides, the dot product against the chunk matrix equals cosine similarity, and the top-k scores are returned as the closest semantic matches.
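
A small sketch of this scoring step, assuming every vector is already unit length. The name `top_k` and the list-of-lists matrix are illustrative choices; a NumPy matrix-vector product would be the vectorised equivalent.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k(
    query: Sequence[float],
    chunk_matrix: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    # One dot product per chunk row; for unit vectors this is the cosine similarity.
    scores = [sum(q * c for q, c in zip(query, row)) for row in chunk_matrix]
    # Return (chunk_index, score) pairs, best first.
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


matrix = [
    [0.0, 1.0],  # orthogonal to the query
    [1.0, 0.0],  # identical to the query
    [0.6, 0.8],  # partially aligned
]
hits = top_k([1.0, 0.0], matrix, k=2)
```

Here `hits` lists chunk 1 first (score 1.0), followed by chunk 2 (score 0.6).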

Extractive answer

The highest-scoring sentence is returned as the answer. Other top hits accompany it as citations.
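
Turning ranked hits into an answer plus citations might look like the sketch below. The response shape (`answer`, `score`, `citations`) is a hypothetical one chosen for the example, not the demo's documented API.

```python
from typing import Any, Dict, List, Sequence, Tuple


def extractive_answer(
    hits: Sequence[Tuple[int, float]],  # (chunk_index, score), best first
    sentences: Sequence[str],
) -> Dict[str, Any]:
    if not hits:
        return {"answer": None, "score": None, "citations": []}
    best_index, best_score = hits[0]
    # The remaining top hits are attached as supporting citations.
    citations: List[Dict[str, Any]] = [
        {"text": sentences[i], "score": s} for i, s in hits[1:]
    ]
    return {"answer": sentences[best_index], "score": best_score, "citations": citations}


# Placeholder sentences invented for this example.
sentences = ["First sentence.", "Second sentence.", "Third sentence."]
result = extractive_answer([(2, 0.91), (0, 0.74)], sentences)
```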

No LLM in this demo

Pure retrieval. In production we feed the citations to an LLM that synthesises an answer constrained to the retrieved context.
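
One way to constrain an LLM to the retrieved context is to number the citations in the prompt and instruct the model to answer only from them. The sketch below only builds the prompt string; the wording and the function name `build_grounded_prompt` are assumptions, and no model is called.

```python
from typing import Sequence


def build_grounded_prompt(question: str, citations: Sequence[str]) -> str:
    # Number each retrieved passage so the model can cite it as [n].
    context = "\n".join(f"[{n}] {text}" for n, text in enumerate(citations, start=1))
    return (
        "Answer the question using only the numbered context below. "
        "Cite sources as [n]. If the context does not contain the answer, "
        "say that you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Placeholder passages invented for this example.
prompt = build_grounded_prompt(
    "What does RAG reduce?",
    ["RAG grounds answers in retrieved text.", "It reduces unsupported claims."],
)
```

The resulting string would be sent as the user message to whichever LLM is used in production.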

Production swap

Replace the corpus, add BM25 hybrid retrieval, plug in Cohere reranker + Anthropic Claude — same API shape, real answers.
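
Hybrid retrieval needs a way to merge the BM25 ranking with the dense ranking. The text does not say which method is used; reciprocal rank fusion (RRF) is one common option, sketched here under that assumption. The function name and the constant `k=60` (a widely used default) are illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]],
    k: int = 60,
) -> List[Hashable]:
    # Each ranking lists document ids, best first. A document earns
    # 1 / (k + rank) from every ranking it appears in (rank starts at 1).
    fused: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


bm25_ranking = ["a", "b", "c"]   # hypothetical keyword results
dense_ranking = ["b", "c", "a"]  # hypothetical embedding results
merged = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

In this example `merged` is `["b", "a", "c"]`: document `b` ranks near the top of both lists, so it wins. A reranker would then rescore the merged candidates before they reach the LLM.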

Want this over your own corpus?

We build production RAG with your documents, hybrid retrieval (BM25 + dense + reranker), eval gates on groundedness and faithfulness, and citation-grade outputs your auditor will sign off on.

Talk to us · Generative AI & RAG services

Ready to start

Turn one AI use case into measurable production value.

Book a 30-minute consultation. We will walk through the use case, sketch the value case, and tell you honestly whether we can help.

Book a consultation · See all services