Live · Generative AI · NLP

Document Q&A · grounded RAG

Type a question about AI implementation, governance, RAG, MLOps, or the EU AI Act. The system embeds your question with a small ONNX sentence-transformer (BGE-small), scores it by cosine similarity against sentence-level chunks drawn from 12 curated paragraphs, and returns the best-matching sentence together with its supporting citations.

The first query loads the embedding model and takes 5–10 s. Subsequent queries return in under 200 ms.


Knowledge base

12 curated paragraphs on AI implementation, governance, RAG, and MLOps.

    How it works

    01

    Embed corpus

    Twelve original AI-consulting paragraphs are split into sentence chunks and embedded once at boot with BGE-small via fastembed (ONNX).
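    A minimal sketch of the chunking half of this step. The `Chunk` record and `chunk_corpus` helper are hypothetical names, and the regex splitter is a deliberately naive stand-in for whatever segmenter the demo actually uses. Each sentence keeps its paragraph index so it can later be cited; the resulting sentences would then be passed to the embedding model once at startup.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One sentence-level chunk, tagged with the paragraph it came from (hypothetical record)."""
    paragraph_id: int
    sentence: str


# Naive boundary rule (an assumption): split on whitespace that follows ".", "!" or "?".
# It mis-splits abbreviations such as "e.g. RAG"; a production pipeline would
# use a proper sentence segmenter.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def chunk_corpus(paragraphs: list[str]) -> list[Chunk]:
    """Split every paragraph into sentence chunks, preserving paragraph ids for citations."""
    chunks: list[Chunk] = []
    for paragraph_id, paragraph in enumerate(paragraphs):
        for sentence in _SENTENCE_BOUNDARY.split(paragraph.strip()):
            if sentence:  # skip empty strings, e.g. from an empty paragraph
                chunks.append(Chunk(paragraph_id, sentence))
    return chunks


if __name__ == "__main__":
    corpus = ["RAG grounds answers in documents. It also cites sources!", "MLOps needs monitoring."]
    for chunk in chunk_corpus(corpus):
        print(chunk.paragraph_id, chunk.sentence)
```

    Keeping `paragraph_id` on every chunk is what lets a retrieved sentence be traced back to its source paragraph when citations are shown.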

    02

    Embed the question

    The same model embeds the live question, and the resulting vector is L2-normalised.
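    The question vector itself comes from BGE-small; the sketch below shows only the normalisation, applied to a plain list of floats standing in for an embedding. `l2_normalise` is a hypothetical helper name.

```python
import math


def l2_normalise(vec: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # An all-zero vector has no direction; return it unchanged instead of dividing by zero.
        return list(vec)
    return [x / norm for x in vec]


if __name__ == "__main__":
    print(l2_normalise([3.0, 4.0]))  # [0.6, 0.8]
```

    Normalising once here means the retrieval step can use a plain dot product instead of computing norms for every comparison.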

    03

    Cosine retrieval

    Because the question and the chunk vectors are unit length, a dot product against the chunk matrix gives their cosine similarity; the top-k chunks by that score are returned.
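    A pure-Python sketch of this scoring step, assuming every vector is already L2-normalised. `top_k` is a hypothetical helper, not the demo's actual code; with NumPy the scoring line would collapse into a single matrix-vector product, `chunk_matrix @ query`.

```python
import heapq


def top_k(query: list[float], chunk_matrix: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k highest dot-product scores, best first.

    Assumes the query and every row of chunk_matrix are L2-normalised,
    so each dot product is the cosine similarity.
    """
    scores = [sum(q * c for q, c in zip(query, row)) for row in chunk_matrix]
    # nlargest keeps the k best (index, score) pairs, ordered by score descending.
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


if __name__ == "__main__":
    matrix = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
    print(top_k([0.0, 1.0], matrix, k=2))
```

    `heapq.nlargest` avoids sorting the full score list, which matters little for a 12-paragraph corpus but keeps the same shape when the corpus grows.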

    04

    Extractive answer

    The highest-scoring sentence is returned verbatim as the answer; the remaining top-k hits accompany it as citations.
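    A sketch of turning retrieval hits into this answer shape: the top sentence verbatim, the rest as citations. The `Answer` record and `extractive_answer` function are hypothetical, and the input is assumed to be (sentence, score) pairs from the retrieval step.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str             # the single best sentence, returned verbatim
    citations: list[str]  # the other top-k sentences, shown as supporting evidence


def extractive_answer(hits: list[tuple[str, float]]) -> Answer:
    """Pick the highest-scoring sentence as the answer; the rest become citations."""
    if not hits:
        return Answer(text="", citations=[])
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return Answer(text=ranked[0][0], citations=[sentence for sentence, _ in ranked[1:]])


if __name__ == "__main__":
    print(extractive_answer([("b", 0.4), ("a", 0.9), ("c", 0.7)]))
```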

    05

    No LLM in this demo

    Pure retrieval. In production we feed the citations to an LLM that synthesises an answer constrained to the retrieved context.
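    In the production variant, the retrieved sentences become the only context the LLM may draw on. The sketch below assembles such a prompt as a plain string; the function name and the instruction wording are illustrative assumptions, and the model call itself is deliberately left out rather than guessed at.

```python
def build_grounded_prompt(question: str, citations: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved passages.

    The wording is illustrative only; it is not the prompt used in production.
    """
    # Number the passages so the model can cite them as [1], [2], ...
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(citations, start=1))
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passage numbers in square brackets. "
        "If the passages do not contain the answer, say you cannot answer from the provided context.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    print(build_grounded_prompt("What is RAG?", ["RAG retrieves passages.", "It grounds answers."]))
```

    Numbering the passages gives the model stable handles to cite, which is what makes the synthesised answer checkable against the retrieved context.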

    06

    Production swap

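    Replace the corpus, add BM25 for hybrid retrieval, and plug in a Cohere reranker and Anthropic Claude for generation. The API shape stays the same, but answers are synthesised from the retrieved context instead of extracted.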
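    BM25 adds exact keyword matching alongside the dense vectors. The page does not say how the two rankings are merged; the sketch below assumes reciprocal rank fusion (RRF), one common choice, with the commonly used defaults k1 = 1.5, b = 0.75 and an RRF constant of 60. Tokenisation, the Cohere reranker and the Claude call are out of scope, and all function names are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every tokenised document against a tokenised query."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores: list[float] = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF (the "+1 inside the log" variant keeps it positive).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge best-first rankings of document ids; each id earns 1 / (k + rank) per list."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


def hybrid_rank(bm25: list[float], dense: list[float]) -> list[int]:
    """Rank documents by BM25 and by dense cosine score separately, then fuse with RRF."""
    by_bm25 = sorted(range(len(bm25)), key=lambda i: bm25[i], reverse=True)
    by_dense = sorted(range(len(dense)), key=lambda i: dense[i], reverse=True)
    return reciprocal_rank_fusion([by_bm25, by_dense])


if __name__ == "__main__":
    docs = [["rag", "grounds", "answers"], ["mlops", "pipelines"], ["rag", "rag", "eval"]]
    lexical = bm25_scores(["rag"], docs)
    print(hybrid_rank(lexical, [0.2, 0.9, 0.8]))
```

    RRF fuses ranks rather than raw scores, so it needs no calibration between BM25 values and cosine similarities; the reranker would then reorder the fused shortlist.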

    Want this over your own corpus?

    We build production RAG over your documents: hybrid retrieval (BM25 + dense + reranker), eval gates on groundedness and faithfulness, and citation-grade outputs your auditor will sign off on.

    Ready to start

    Turn one AI use case into measurable production value.

    Book a 30-minute consultation. We will walk through the use case, sketch the value case, and tell you honestly whether we can help.