RAG is at the heart of many AI deployments today, and perhaps because of my own interest in the topic and my engagement with it, I keep running into articles and news items related to it.

These are some of the primary aspects of improving RAG:

  1. Semantic understanding of documents, including applying Visual LLMs to make sense of embedded pictures, diagrams and other non-textual content
  2. Chunking
  3. Embedding (a minimal chunk-and-embed sketch follows this list)
  4. Having a good evaluation (“eval”)
  5. Not ignoring search as the first line of attack; I find Jo Bergum a reliable authority for understanding search better. Most recently, he said: “Search becomes the infrastructure for agents; agents are the new users.”
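To make items 2 and 3 concrete, here is a minimal chunk-and-embed sketch. It assumes the sentence-transformers package and a small local model; the chunk size, overlap, model name, and file name are illustrative choices, not recommendations.

```python
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking with a small overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed small local embedding model
doc = open("handbook.txt").read()                 # hypothetical source document
pieces = chunk(doc)
vectors = model.encode(pieces)                    # one vector per chunk, ready to index
```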

RAG Companies ^fb946e

If you think of RAG as an Information Architecture problem, then there are ways to improve the value without changing the model:

  1. Audit user queries to look for high-frequency searches that return irrelevant results, zero-result queries, and repeated questions in the face of such results (a sign of frustration).
  2. Cluster unmet needs: what is common among failed queries? Internal docs? Product metadata? (A small clustering sketch follows this list.)
  3. Fix the inventory (of documents indexed)
  4. Tighten the feedback loop by tracking query success/failure, satisfaction, and document retrieval coverage.
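A hedged sketch of steps 1 and 2: pull zero-result queries out of a log, count the frequent offenders, and cluster them to see what the unmet needs have in common. The log format and the choice of KMeans over sentence embeddings are assumptions, not a prescribed design.

```python
from collections import Counter

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

query_log = [
    {"query": "vacation carryover policy", "results": 0},
    {"query": "carry over unused PTO", "results": 0},
    {"query": "expense report template", "results": 12},
]  # hypothetical query log entries

# Step 1: high-frequency searches that returned nothing.
failed = [entry["query"] for entry in query_log if entry["results"] == 0]
print(Counter(failed).most_common(10))

# Step 2: cluster the failures to find common unmet needs.
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(failed)
labels = KMeans(n_clusters=min(2, len(failed)), n_init="auto").fit_predict(vectors)
for query, label in zip(failed, labels):
    print(label, query)
```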

The pain of chunking and bad context retrieval can be mitigated by using better search tools that provide the context the foundation models need. RAG isn’t dead; the context being fed to the model is just hindered when chunking and retrieval aren’t done right.
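One way to read “better search tools”: combine keyword and vector signals instead of relying on chunk-and-embed alone. A minimal hybrid-retrieval sketch, assuming the rank_bm25 and sentence-transformers packages; the 50/50 weighting and the toy documents are arbitrary.

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = ["refund policy for annual plans", "how to rotate API keys", "SSO setup guide"]
query = "reset my api key"

# Keyword side: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([d.split() for d in docs])
kw_scores = np.array(bm25.get_scores(query.split()))

# Vector side: cosine similarity over normalized sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)
q_vec = model.encode(query, normalize_embeddings=True)
vec_scores = doc_vecs @ q_vec

def norm(x: np.ndarray) -> np.ndarray:
    """Scale scores to [0, 1] so the two signals are comparable."""
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

hybrid = 0.5 * norm(kw_scores) + 0.5 * norm(vec_scores)
print(docs[int(hybrid.argmax())])  # best candidate context to hand to the model
```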

To quote Jo Bergum:

A significant portion of the rag is dead movement is imho related to the pain of chunking and how bad the vector databases are at managing the overall context of the source document. Splitting data across rows and records and reconciliation at run time never scales, plus you need to manage that state in case the source document changes. — via

RAG Explorer by Respeak: “An interactive journey through the Retrieval Augmented Generation pipeline — explore each component, understand pitfalls, and master implementation.” Has a good step-by-step explanation of how the various phases of a RAG system flow from one to another.

Start thinking of LLMs as infrastructure. Lean into integrating the LLM into scalable system design much like any other important piece. Remember:

RAG is the evolution of search.

Search is the natural abstraction for augmenting AI with moving context — Jo Bergum

90% of RAG deployments are running with chunk and embed using 3.5 with 8K context window

(on why embedding seems to be disfavored these days) … Part is that many were misled in the late 2022 days to think that embedding retrieval was the only way to do RAG and that chunk-and-embed is a pain to manage.

Task-specific sub-agents, each with their own set of tools and context window, are a strong trend.
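An illustrative-only sketch of that pattern: a router hands a task to one of several sub-agents, and each sub-agent carries its own tools and its own separate context. The names and the routing rule are made up; a real system would put an LLM call behind handle().

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SubAgent:
    name: str
    tools: dict[str, Callable]                          # tools only this agent may call
    context: list[str] = field(default_factory=list)    # this agent's private history

    def handle(self, task: str) -> str:
        self.context.append(task)                       # context never leaks across agents
        # A real implementation would call an LLM here with self.tools exposed.
        return f"[{self.name}] handled: {task}"

search_agent = SubAgent("search", tools={"web_search": lambda q: f"results for {q}"})
code_agent = SubAgent("code", tools={"run_python": lambda src: "ok"})

def route(task: str) -> SubAgent:
    """Toy routing rule; in practice an LLM or classifier decides."""
    return code_agent if "code" in task.lower() else search_agent

print(route("find docs on chunking").handle("find docs on chunking"))
```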

Armand Ruiz on LinkedIn mentioned IBM Granite; I have to investigate whether I can plug this into the RAG pipeline to extract content from tables, charts, and diagrams for structured data analysis.

Elysia: Building an end-to-end agentic RAG app | Weaviate; Note: Elysia has a hard dependency on Weaviate Cloud.

what if your AI could dynamically decide not just what to say, but how to show it? What if it could learn from your preferences, intelligently categorize, label, and search through your data, and provide complete transparency into its decision-making process?

A few things that make Elysia interesting:

  1. Decision Trees and Decision Agents
  2. Displaying data sources in dynamic formats
  3. An automatic expert on your data — “an LLM analyzes your collections to examine the data structure, create summaries, generate metadata, and choose display types. This isn’t just useful information for users to see - it significantly enhances Elysia’s ability to handle complex queries and provide knowledgeable responses”
  4. Feedback system — “Each user maintains their own set of feedback examples stored within their Weaviate instance. When you make a query, Elysia first searches for similar past queries you’ve rated positively using vector similarity matching.” (A simplified sketch of this lookup follows the list.)
  5. Elysia serves content through Static HTML — this pleases me.
  6. Multi Model Strategy — “routes different tasks to appropriate model sizes based on task complexity”
  7. DSPy serves as the LLM interaction layer.
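A simplified illustration of the feedback lookup in item 4: store past queries the user rated positively and, for a new query, retrieve the most similar one by vector similarity. This is not Elysia’s actual code (Elysia keeps these examples in Weaviate); it just mirrors the idea with sentence-transformers and numpy.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

rated_good = [
    "show open support tickets as a table",
    "summarize last week's product feedback",
]  # hypothetical queries the user rated positively
good_vecs = model.encode(rated_good, normalize_embeddings=True)

new_query = "list this week's support tickets"
q_vec = model.encode(new_query, normalize_embeddings=True)

# Cosine similarity against the positively rated history; the best match can be
# fed back to the model as a few-shot example of what this user prefers.
sims = good_vecs @ q_vec
best = int(np.argmax(sims))
print(rated_good[best], float(sims[best]))
```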

Another Summary of Elysia by X/victorialslocum

Building Graph RAG Pipelines with Kuzu, DSPy and marimo - YouTube by Prashanth Rao

run-llama/semtools: Semantic search and document parsing tools for the command line

  • parse - Parse documents (PDF, DOCX, etc.) into markdown format using, by default, the LlamaParse API (cloud-hosted). TODO: how to parse using a different model? (A hedged usage sketch follows this list.)
  • search - Local semantic keyword search using multilingual embeddings with cosine similarity matching and per-line context matching
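A hedged usage sketch for the two commands, driven from Python. The exact arguments are assumptions based on the descriptions above (parse emits markdown via LlamaParse; search does local semantic matching over files); check each command’s --help for the real flags.

```python
import subprocess

# Parse a PDF into markdown (uses the cloud-hosted LlamaParse API by default).
subprocess.run(["parse", "quarterly-report.pdf"], check=True)

# Local semantic keyword search over the parsed markdown.
subprocess.run(["search", "revenue by region", "quarterly-report.md"], check=True)
```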