Hybrid retrieval (BM25 + dense), cross-encoder reranking, knowledge graph entity boost, evidence tier scoring, freshness decay. Not a prototype — production-grade.
Picking Pinecone or Weaviate takes a day. Then the real work starts: chunking, reranking, scoring, freshness, observability. The RAG tax is underestimated.
The model invents facts when the retrieved context isn't relevant enough. Without a relevance gate, evidence tiers, and source attribution, this happens regularly.
Too many chunks dilute the context; too few leave answers incomplete. Smart selection is its own engineering discipline.
Searches tickets, docs, and release notes. A cross-encoder picks the 8 most relevant chunks; the LLM answers only from them. Hallucinations are kept out.
Wikipedia-style bot for the whole company. Per-team scoping, freshness alerts, audit logs for compliance.
Legal, medical, financial — domains where accuracy is critical. Evidence tier scoring puts regulation above blog posts.
BM25 (lexical) + dense embeddings (semantic) + Reciprocal Rank Fusion. Finds both exact terms ("Article 6(1)(a)") and intent.
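A minimal sketch of the fusion step, assuming each retriever returns a ranked list of document IDs. The function name, the sample IDs, and the constant k=60 (a commonly used default) are illustrative assumptions, not the production implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the lexical and the semantic retriever.
bm25_hits = ["gdpr-art6", "faq-consent", "blog-privacy"]
dense_hits = ["gdpr-art6", "handbook-data", "faq-consent"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because RRF works on ranks rather than raw scores, BM25 and embedding similarities never need to be normalized against each other.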
The top 30 candidates are reordered by a cross-encoder for true relevance. The top 8 go to the LLM.
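The rerank step could look roughly like this. It is a sketch with a pluggable scorer: `rerank`, `score_pairs`, and `toy_overlap_scorer` are hypothetical names, and the word-overlap scorer is only a stand-in so the example runs without a model. In practice `score_pairs` would wrap a real cross-encoder that scores (query, passage) pairs.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    pool: int = 30,
    keep: int = 8,
) -> List[Tuple[str, float]]:
    """Score the first `pool` candidates against the query, return the best `keep`."""
    pairs = [(query, passage) for passage in candidates[:pool]]
    scores = score_pairs(pairs)
    ranked = sorted(
        zip((passage for _, passage in pairs), scores),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:keep]


def toy_overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Stand-in scorer: fraction of query words that occur in the passage."""
    scores = []
    for query, passage in pairs:
        query_words = set(query.lower().split())
        passage_words = set(passage.lower().split())
        scores.append(len(query_words & passage_words) / max(len(query_words), 1))
    return scores


passages = [
    "release notes for version 2",
    "password policy",
    "how to reset your password",
]
top = rerank("reset password", passages, toy_overlap_scorer, keep=2)
print(top)
```

Keeping the pool small (30) matters because a cross-encoder runs one forward pass per query-passage pair, unlike the retrievers, which score against a prebuilt index.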
Regulation > structured > text > image. Fresh (<30d: 1.0) to old (>365d: 0.80). Composite score = rerank × tier × decay.
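A sketch of the composite score under stated assumptions. Only the tier order, the two decay endpoints (1.0 under 30 days, 0.80 over 365 days), and the product formula come from the text above. The tier weights and the linear interpolation between the endpoints are assumptions made for illustration.

```python
# Assumed weights; only the ordering regulation > structured > text > image is given.
TIER_WEIGHT = {
    "regulation": 1.0,
    "structured": 0.9,
    "text": 0.8,
    "image": 0.7,
}

FRESH_DAYS, OLD_DAYS = 30, 365
FRESH_FACTOR, OLD_FACTOR = 1.0, 0.80


def freshness_decay(age_days: float) -> float:
    """1.0 up to 30 days, 0.80 from 365 days, linear in between (assumed shape)."""
    if age_days <= FRESH_DAYS:
        return FRESH_FACTOR
    if age_days >= OLD_DAYS:
        return OLD_FACTOR
    fraction = (age_days - FRESH_DAYS) / (OLD_DAYS - FRESH_DAYS)
    return FRESH_FACTOR - fraction * (FRESH_FACTOR - OLD_FACTOR)


def composite_score(rerank_score: float, tier: str, age_days: float) -> float:
    """Composite score = rerank x tier x decay."""
    return rerank_score * TIER_WEIGHT[tier] * freshness_decay(age_days)


# An old regulation versus a fresh plain-text post with similar rerank scores.
old_regulation = composite_score(0.80, "regulation", age_days=400)
fresh_post = composite_score(0.75, "text", age_days=10)
print(old_regulation > fresh_post)
```

With these assumed weights the tier factor outweighs the freshness penalty, so a year-old regulation still ranks above a fresh post of similar relevance.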