The cheapest part of RAG is the embeddings
Why most RAG systems return crap from a working retrieval pipeline, and the corpus-curation work that actually fixes it.
How to build the evaluation pipeline a RAG team will actually keep using, in five pieces, with the loop that prevents abandonment.
A guest post from the SaaSPerform team on the five places production RAG latency actually lives, and why the embeddings are usually fine.