We Compared The Features of 82 RAG Tools: Here's What We Found
Last updated: May 25, 2026
RAG tools look mature at the pipeline layer, but the features that make them operationally useful are still unevenly distributed. We built a dataset of 82 RAG tools, classified every feature with a seven-label availability scheme, and ran the aggregates to figure out what actually matters if you are shipping your own RAG tool.
The dataset spans seven workflow families: RAG application frameworks, managed RAG platforms, knowledge chat applications, retrieval and ranking components, evaluation and observability tools, memory and graph retrieval tools, and document ingestion and parsing tools. For each tool we recorded a comparable feature taxonomy across ingestion, retrieval, grounding, evaluation, memory, and governance, then classified availability based on actual packaging rather than marketing claims.
If you want to compare these RAG tool patterns against proven feature decisions in other markets, our database of 300 profitable internet businesses breaks down what each one shipped, gated, or skipped.
Summary
This study analyzes the feature landscape of 82 RAG tools captured from public product, documentation, pricing, and repository information. The dataset covers RAG application frameworks, managed RAG platforms, knowledge chat applications, retrieval components, evaluation tools, memory and graph retrieval products, and document ingestion tools, with each feature classified by availability status.
Deployment, governance, and enterprise controls form the most widely present feature category in RAG tools. It appears in 68 of 82 tools (82.9%), which means operational packaging has become the broadest expectation in the category.
Document parsing and citations are tied as the second-most common capabilities, each appearing in 65 of 82 tools. This suggests that ingestion and source-grounding now define buyer expectations for most RAG products.
Core retrieval primitives are close to table stakes in RAG tools. Chunking appears in 64 tools, while embeddings and hybrid search each appear in 60, which means a product without these capabilities reads as partial unless it targets a narrow ingestion or evaluation workflow.
Free-full availability is strongest around lower-level technical primitives. Among tools with embedding and vector index management, 35 of 60 provide it as free full, which suggests the basic retrieval substrate is more open than the application and enterprise layers around it.
Document parsing is widely available but rarely fully free. Of the 65 tools that include parsing, 41 classify it as free limited and only 16 as free full, which means ingestion is common but usually capped, constrained, or tied to deployment mode.
Citations are common but under-specified. They appear in 65 tools, but 26 of those implementations have unclear availability, which makes source-grounded answers one of the hardest features to compare from public packaging.
Agentic workflows are present in 53 tools but have the highest restricted-access count in the dataset, with 16 restricted implementations, which suggests tool calling is often tied to specific environments, integrations, hosted layers, or beta-style access.
Knowledge graph and memory features remain scarce in RAG tools. Knowledge graph and entity extraction appears in only 21 tools, while memory and personalization appears in 23, which means most products still stop at retrieval rather than persistent knowledge modeling.
RAG evaluation and data connectors both appear in 44 tools, but their packaging differs. Evaluation has a higher paid-only count at 8 of 44, while connectors have a stronger free-limited pattern at 24 of 44, which points to two different monetization mechanics.
Workflow family explains many of the sharpest gaps. Memory and graph retrieval tools have 100% knowledge graph coverage, evaluation tools have 100% evaluation coverage, and document ingestion tools have 100% parsing coverage, which means benchmarking RAG tools without workflow context produces misleading comparisons.
The full feature comparison table
We built this dataset from scratch. For each of the 82 RAG tools, we inspected public feature information and recorded the availability of 12 feature categories: document parsing and layout understanding, data connectors and source synchronization, chunking and semantic segmentation, embedding and vector index management, hybrid search and retrieval orchestration, reranking and relevance optimization, citations and source-grounded answers, agentic workflows and tool calling, knowledge graph and entity extraction, memory and personalization, RAG evaluation and quality monitoring, and deployment, governance, and enterprise controls. Each feature was classified with one of seven standardized availability labels: Free full, Free limited, Paid only, Trial only, Restricted, Unclear, and Absent. The full comparison table is below.
| Name | Primary Workflow | Business Model | Document parsing and layout understanding | Data connectors and source synchronization | Chunking and semantic segmentation | Embedding and vector index management | Hybrid search and retrieval orchestration | Reranking and relevance optimization | Citations and source-grounded answers | Agentic workflows and tool calling | Knowledge graph and entity extraction | Memory and personalization layer | RAG evaluation and quality monitoring | Deployment, governance, and enterprise controls |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LangChain | RAG application frameworks | Free, pay for advanced features | Free limited | Free full | Free full | Free full | Free full | Free limited | Free limited | Free full | Free limited | Free limited | Free limited | Paid only |
| LlamaIndex | RAG application frameworks | Free, pay for advanced features | Free limited | Free full | Free full | Free full | Free full | Free limited | Free full | Free full | Free limited | Free limited | Free limited | Paid only |
| Haystack | RAG application frameworks | Free, pay for advanced features | Free limited | Free limited | Free full | Free full | Free full | Free full | Free limited | Free full | Absent | Free limited | Free limited | Paid only |
| LightRAG | RAG application frameworks | 100% free | Free limited | Free limited | Free limited | Free full | Free full | Free limited | Free limited | Restricted | Free full | Absent | Absent | Free limited |
| RAGLite | RAG application frameworks | 100% free | Free limited | Absent | Free full | Free full | Free limited | Free limited | Free limited | Free limited | Absent | Absent | Absent | Free limited |
| ragbits | RAG application frameworks | 100% free | Free limited | Free limited | Free full | Free full | Free full | Free limited | Free limited | Free full | Absent | Absent | Free full | Free limited |
| Embedchain | RAG application frameworks | 100% free | Free limited | Free full | Free full | Free full | Free limited | Absent | Free limited | Free limited | Absent | Free limited | Absent | Free limited |
| txtai | RAG application frameworks | 100% free | Free limited | Free limited | Free limited | Free full | Free full | Free limited | Free limited | Free full | Free limited | Free limited | Absent | Free limited |
| R2R | Managed RAG platforms | 100% free | Free full | Free limited | Free full | Free full | Free full | Free limited | Free full | Free full | Free full | Free limited | Free limited | Free full |
| RAGFlow | Managed RAG platforms | Free, pay for advanced features | Free full | Free limited | Free full | Free full | Free full | Free full | Free full | Free full | Free full | Absent | Free limited | Free limited |
| Verba | Knowledge chat applications | 100% free | Free limited | Free limited | Free full | Free full | Free full | Free limited | Free full | Absent | Absent | Absent | Absent | Free limited |
| RAGatouille | Retrieval and ranking components | 100% free | Absent | Absent | Free limited | Free full | Free limited | Free full | Absent | Absent | Absent | Absent | Absent | Free limited |
| FlashRAG | Evaluation and observability | 100% free | Absent | Absent | Free limited | Free full | Free full | Free full | Absent | Absent | Absent | Absent | Free full | Free limited |
| dsRAG | RAG application frameworks | 100% free | Free limited | Absent | Free full | Free full | Free full | Free full | Free limited | Absent | Absent | Absent | Free limited | Free limited |
| AutoRAG | Managed RAG platforms | 100% free | Free limited | Absent | Free full | Free full | Free full | Free full | Unclear | Absent | Absent | Absent | Free full | Free full |
| Dcup | Managed RAG platforms | 100% free | Free full | Free full | Free full | Free full | Free full | Free full | Free full | Unclear | Unclear | Absent | Unclear | Free full |
| Agentset | Managed RAG platforms | Free but limited, subscribe for more | Free limited | Paid only | Free limited | Free limited | Free limited | Free limited | Free limited | Free limited | Absent | Absent | Free limited | Free limited |
| TrustGraph | Memory and graph retrieval | 100% free | Unclear | Restricted | Free full | Free full | Free full | Free full | Free full | Free full | Free full | Free full | Free full | Free full |
| Graphlit | Managed RAG platforms | Free but limited, subscribe for more | Free limited | Free limited | Free limited | Free limited | Free limited | Free limited | Free limited | Free limited | Free limited | Free limited | Unclear | Free limited |
| Ragie | Managed RAG platforms | Free but limited, subscribe for more | Free limited | Free limited | Free limited | Free limited | Free limited | Free limited | Unclear | Restricted | Free limited | Absent | Unclear | Free limited |
| Vectara | Managed RAG platforms | Free trial, then subscription | Paid only | Paid only | Paid only | Paid only | Paid only | Paid only | Paid only | Paid only | Absent | Paid only | Paid only | Paid only |
| Contextual AI | Managed RAG platforms | Pay per use | Paid only | Paid only | Paid only | Paid only | Paid only | Paid only | Paid only | Paid only | Absent | Absent | Paid only | Paid only |
| ZeroEntropy | Retrieval and ranking components | Pay per use | Paid only | Absent | Paid only | Paid only | Paid only | Paid only | Absent | Restricted | Absent | Absent | Paid only | Paid only |
| Nuclia | Managed RAG platforms | Free trial, then subscription | Trial only | Trial only | Trial only | Trial only | Trial only | Trial only | Trial only | Trial only | Trial only | Unclear | Trial only | Trial only |
| Raggenie | Knowledge chat applications | 100% free | Free full | Free full | Free full | Free full | Free limited | Unclear | Unclear | Free full | Absent | Absent | Unclear | Free limited |
| Gurubase | Knowledge chat applications | Free trial, then subscription | Paid only | Paid only | Paid only | Paid only | Paid only | Paid only | Paid only | Paid only | Absent | Paid only | Paid only | Paid only |
| SWIRL | RAG application frameworks | Free, pay for advanced features | Free limited | Free limited | Restricted | Restricted | Free limited | Free limited | Free limited | Free limited | Absent | Free limited | Unclear | Free, pay for advanced features |
| Kotaemon | Knowledge chat applications | 100% free | Free full | Free limited | Free full | Free full | Free full | Free full | Free full | Absent | Absent | Absent | Absent | Free full |
| AnythingLLM | Knowledge chat applications | Free, pay for advanced features | Free full | Free limited | Free full | Free full | Free limited | Unclear | Free full | Free full | Absent | Free full | Absent | Free limited |
| PrivateGPT | Knowledge chat applications | 100% free | Free limited | Absent | Free full | Free full | Free limited | Absent | Unclear | Absent | Absent | Absent | Absent | Free full |
| Quivr | Knowledge chat applications | 100% free | Free limited | Free limited | Free full | Free full | Free limited | Unclear | Unclear | Free limited | Absent | Free limited | Absent | Free full |
| Khoj | Knowledge chat applications | Free, pay for advanced features | Free limited | Free limited | Unclear | Free full | Free limited | Absent | Free limited | Free limited | Absent | Free full | Absent | Free limited |
| DocsGPT | Knowledge chat applications | Free, pay for advanced features | Free limited | Free limited | Unclear | Unclear | Free limited | Unclear | Free full | Free limited | Absent | Unclear | Paid only | Free limited |
| Onyx / Danswer | Knowledge chat applications | Free, pay for advanced features | Free limited | Free limited | Unclear | Free full | Free full | Unclear | Free full | Paid only | Absent | Unclear | Paid only | Free limited |
| CocoIndex | Document ingestion and parsing | Free, pay for advanced features | Free limited | Free full | Free full | Free full | Free limited | Absent | Absent | Restricted | Absent | Absent | Absent | Free limited |
| Chonkie | Retrieval and ranking components | 100% free | Absent | Free limited | Free full | Free limited | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Free full |
| Chunkr | Document ingestion and parsing | Pay per use | Free limited | Absent | Free limited | Absent | Absent | Absent | Free full | Absent | Absent | Absent | Unclear | Paid only |
| pdfmux | Document ingestion and parsing | 100% free | Free full | Absent | Free full | Absent | Absent | Absent | Unclear | Restricted | Absent | Absent | Free full | Free full |
| LlamaParse | Document ingestion and parsing | Pay per use | Free limited | Absent | Paid only | Paid only | Paid only | Absent | Paid only | Restricted | Absent | Absent | Unclear | Paid only |
| Unstructured | Document ingestion and parsing | Pay per use | Paid only | Paid only | Paid only | Paid only | Absent | Absent | Unclear | Absent | Absent | Absent | Absent | Paid only |
| Reducto | Document ingestion and parsing | Pay per use | Free limited | Absent | Paid only | Absent | Absent | Absent | Free limited | Restricted | Absent | Absent | Paid only | Free limited |
| Docling | Document ingestion and parsing | 100% free | Free full | Absent | Free limited | Restricted | Absent | Absent | Unclear | Restricted | Absent | Absent | Absent | Free full |
| Tensorlake | Document ingestion and parsing | Free but limited, subscribe for more | Unclear | Absent | Unclear | Absent | Absent | Absent | Unclear | Free limited | Absent | Absent | Absent | Free limited |
| LandingAI ADE | Document ingestion and parsing | Pay per use | Free limited | Absent | Absent | Absent | Absent | Absent | Free limited | Restricted | Absent | Absent | Unclear | Paid only |
| Marker | Document ingestion and parsing | 100% free | Free full | Absent | Free full | Absent | Absent | Absent | Unclear | Restricted | Absent | Absent | Absent | Restricted |
| PyMuPDF4LLM | Document ingestion and parsing | 100% free | Free full | Absent | Free limited | Absent | Absent | Absent | Unclear | Absent | Absent | Absent | Absent | Absent |
| MinerU | Document ingestion and parsing | 100% free | Free full | Absent | Free limited | Absent | Absent | Absent | Unclear | Absent | Absent | Absent | Absent | Absent |
| MegaParse | Document ingestion and parsing | 100% free | Free full | Absent | Unclear | Absent | Absent | Absent | Unclear | Absent | Absent | Absent | Absent | Absent |
| DocETL | Document ingestion and parsing | 100% free | Free limited | Free limited | Free limited | Absent | Absent | Absent | Unclear | Free full | Absent | Absent | Unclear | Free full |
| Extractous | Document ingestion and parsing | 100% free | Free limited | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Absent |
| Unstract | Document ingestion and parsing | Free, pay for advanced features | Free limited | Free limited | Unclear | Absent | Absent | Absent | Free limited | Free limited | Absent | Absent | Unclear | Paid only |
| ChatDOC PDF Parser | Document ingestion and parsing | Free but limited, subscribe for more | Free limited | Restricted | Unclear | Absent | Absent | Absent | Free limited | Restricted | Absent | Absent | Absent | Unclear |
| Zerox | Document ingestion and parsing | 100% free | Free full | Absent | Absent | Absent | Absent | Absent | Absent | Restricted | Absent | Absent | Absent | Absent |
| OpenParse | Document ingestion and parsing | 100% free | Free full | Absent | Free full | Absent | Absent | Absent | Unclear | Absent | Absent | Absent | Absent | Absent |
| llmsherpa | Document ingestion and parsing | 100% free | Free limited | Absent | Free full | Absent | Absent | Absent | Unclear | Absent | Absent | Absent | Absent | Restricted |
| Ragas | Evaluation and observability | 100% free | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Free full | Free limited |
| TruLens | Evaluation and observability | 100% free | Absent | Absent | Absent | Absent | Unclear | Absent | Unclear | Free limited | Absent | Absent | Free full | Free limited |
| DeepEval | Evaluation and observability | Free, pay for advanced features | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Free full | Absent | Absent | Free full | Paid only |
| zbench | Evaluation and observability | 100% free | Absent | Absent | Absent | Unclear | Free limited | Free full | Absent | Absent | Absent | Absent | Free full | Absent |
| rag-select | Evaluation and observability | 100% free | Absent | Absent | Unclear | Free limited | Free limited | Free limited | Absent | Absent | Absent | Absent | Free full | Absent |
| syftr | Evaluation and observability | 100% free | Absent | Absent | Unclear | Unclear | Free limited | Free limited | Absent | Free full | Absent | Absent | Free full | Unclear |
| Rankify | Retrieval and ranking components | 100% free | Absent | Absent | Absent | Free limited | Free full | Free full | Absent | Absent | Absent | Absent | Free full | Absent |
| FlashRank | Retrieval and ranking components | 100% free | Absent | Absent | Absent | Absent | Free limited | Free full | Absent | Absent | Absent | Absent | Absent | Absent |
| ColBERT | Retrieval and ranking components | 100% free | Absent | Absent | Absent | Free full | Free full | Free full | Absent | Absent | Absent | Absent | Absent | Free limited |
| ColPali | Retrieval and ranking components | 100% free | Free limited | Absent | Absent | Free full | Free limited | Free full | Free limited | Absent | Absent | Absent | Absent | Absent |
| Airweave | Document ingestion and parsing | Free, pay for advanced features | Free limited | Free full | Free limited | Free full | Free full | Unclear | Free full | Free limited | Unclear | Absent | Absent | Unclear |
| Neum AI | Document ingestion and parsing | Free, pay for advanced features | Free limited | Free full | Free full | Free full | Free limited | Unclear | Unclear | Absent | Absent | Absent | Free limited | Unclear |
| Firecrawl | Document ingestion and parsing | Pay per use | Free limited | Restricted | Absent | Absent | Restricted | Absent | Unclear | Restricted | Absent | Absent | Absent | Paid only |
| Unbody | Managed RAG platforms | Free but limited, subscribe for more | Free limited | Restricted | Unclear | Free limited | Free limited | Unclear | Unclear | Free limited | Unclear | Absent | Absent | Paid only |
| Embedbase | Managed RAG platforms | Free, pay for advanced features | Absent | Free limited | Unclear | Free limited | Free limited | Absent | Unclear | Absent | Absent | Absent | Absent | Unclear |
| Zep | Memory and graph retrieval | Free but limited, subscribe for more | Absent | Absent | Absent | Free limited | Free full | Unclear | Free full | Free limited | Free full | Free full | Absent | Paid only |
| Mem0 | Memory and graph retrieval | Free but limited, subscribe for more | Absent | Absent | Absent | Free limited | Free limited | Free limited | Unclear | Free limited | Unclear | Free full | Absent | Paid only |
| Cognee | Memory and graph retrieval | Free but limited, subscribe for more | Free limited | Free limited | Free limited | Free limited | Free limited | Unclear | Unclear | Free limited | Free limited | Free limited | Free limited | Paid only |
| Memori | Memory and graph retrieval | Free but limited, subscribe for more | Absent | Free limited | Absent | Free limited | Free limited | Unclear | Free limited | Free limited | Free limited | Free limited | Paid only | Paid only |
| Graphiti | Memory and graph retrieval | 100% free | Free limited | Free limited | Absent | Free full | Free full | Unclear | Free full | Restricted | Free full | Free full | Unclear | Absent |
| Microsoft GraphRAG | Memory and graph retrieval | 100% free | Free limited | Absent | Free limited | Free full | Free full | Unclear | Free limited | Absent | Free full | Absent | Free limited | Absent |
| WhyHow.AI Knowledge Graph Studio | Memory and graph retrieval | Free, pay for advanced features | Free limited | Free limited | Absent | Free limited | Free limited | Unclear | Unclear | Restricted | Free full | Absent | Absent | Custom priced |
| Haiku.rag | RAG application frameworks | 100% free | Free full | Free full | Free full | Free full | Free full | Free full | Free full | Free full | Absent | Free full | Free limited | Free full |
| Archive Agent | Knowledge chat applications | 100% free | Free full | Restricted | Free full | Free full | Free full | Free full | Unclear | Restricted | Free limited | Absent | Absent | Free full |
| MidrasAI | Retrieval and ranking components | 100% free | Free limited | Absent | Absent | Free full | Free limited | Absent | Absent | Absent | Absent | Absent | Absent | Restricted |
| zchunk | Retrieval and ranking components | 100% free | Absent | Absent | Free full | Absent | Absent | Absent | Absent | Absent | Absent | Absent | Free limited | Absent |
| Hydrot | RAG application frameworks | 100% free | Free limited | Absent | Free full | Free full | Free limited | Absent | Unclear | Absent | Absent | Absent | Absent | Free full |
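The headline figures in this article reduce to simple counts over the table above. Below is a minimal sketch of that arithmetic. It assumes a feature counts as present whenever its label is anything other than Absent, which is the convention that reproduces the 60 present embedding implementations reported here. The three rows are copied from the table. `ROWS`, `presence`, and `free_full_share` are our own illustrative names.

```python
from collections import Counter

# The seven availability labels used in the comparison table.
LABELS = {"Free full", "Free limited", "Paid only", "Trial only",
          "Restricted", "Unclear", "Absent"}

# Three rows copied from the table: tool -> {feature: availability label}.
ROWS = {
    "LangChain": {"Embedding": "Free full", "Citations": "Free limited"},
    "Vectara": {"Embedding": "Paid only", "Citations": "Paid only"},
    "Ragas": {"Embedding": "Absent", "Citations": "Absent"},
}


def label_counts(rows, feature):
    """Count each availability label for one feature, rejecting unknown labels."""
    counts = Counter(row[feature] for row in rows.values())
    unknown = set(counts) - LABELS
    if unknown:
        raise ValueError(f"labels outside the seven-label scheme: {sorted(unknown)}")
    return counts


def presence(rows, feature):
    """Number of tools where the feature is present (any label except Absent)."""
    return sum(n for label, n in label_counts(rows, feature).items() if label != "Absent")


def free_full_share(rows, feature):
    """Share of present implementations that are labeled Free full."""
    present = presence(rows, feature)
    if present == 0:
        return 0.0
    return label_counts(rows, feature)["Free full"] / present


print(presence(ROWS, "Embedding"), free_full_share(ROWS, "Embedding"))  # 2 0.5
print(f"{35 / 60:.1%}")  # 58.3%
```

Applied to the full embedding column, the same arithmetic gives 35 free-full cases out of 60 present, or 58.3%.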
Questions on features of RAG tools
These are the questions we kept returning to while building the dataset. They matter if you are deciding which RAG tool features are non-negotiable, which ones can differentiate, which ones to gate, and what to build first.
Which features are commoditized in RAG tools?
The most commoditized features in RAG tools are deployment and governance, document parsing, citations, chunking, embeddings, and hybrid search. All six appear in at least 60 of the 82 tools, which makes them the core feature surface buyers expect to see.
Deployment, governance, and enterprise controls are the clearest category-wide capability, appearing in 68 tools. That does not mean every implementation is generous or complete, but it does mean the market now treats operational controls as part of the RAG product surface.
Document parsing and citations both appear in 65 tools, which is a strong signal that RAG tooling has moved beyond raw vector search. Buyers expect a product to ingest source material and preserve enough traceability to support grounded answers.
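A source-grounded answer is only as good as the provenance each retrieved chunk carries. Below is a minimal sketch. It assumes a hypothetical `Chunk` record that stores a document id and character offsets. Fixed-size character chunking is a deliberate simplification, and none of these names come from a listed tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # identifier of the source document
    start: int   # character offset where the chunk begins
    end: int     # character offset where the chunk ends (exclusive)
    text: str


def chunk_with_offsets(doc_id, text, size):
    """Split text into fixed-size character chunks, recording source offsets."""
    return [
        Chunk(doc_id, i, min(i + size, len(text)), text[i:i + size])
        for i in range(0, len(text), size)
    ]


def format_citations(chunks):
    """Render one citation marker per chunk used in an answer."""
    return " ".join(f"[{c.doc_id}:{c.start}-{c.end}]" for c in chunks)


doc = "Refunds are issued within 30 days of purchase."
chunks = chunk_with_offsets("policy.pdf", doc, 20)
print(format_citations(chunks[:1]))  # [policy.pdf:0-20]
```

Each chunk stores its offsets alongside its text. An answer can therefore point back to the exact span it relied on, which is what makes citation behavior auditable.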
Chunking is almost as widespread, appearing in 64 tools. It is universal in RAG application frameworks, managed RAG platforms, and knowledge chat applications: every tool in those three workflow families includes it.
Embedding and vector index management, plus hybrid search and retrieval orchestration, each appear in 60 tools. That pairing is important because it shows that RAG tools are no longer competing on vector search alone, but on the orchestration layer around retrieval.
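Reciprocal rank fusion (RRF) is one common way to orchestrate hybrid search. It merges a keyword ranking and a vector ranking by summing 1/(k + rank) for each document. The sketch below assumes both rankings are already computed. `k=60` is the constant conventionally used with RRF, not a value measured in this dataset, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Higher fused scores rank first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score descending; break ties by id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from a BM25 index (assumed)
vector_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. from an embedding index (assumed)
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF needs only ranks, not comparable scores. That makes it a convenient default when keyword and vector scores live on different scales.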
The builder takeaway is simple: a general-purpose RAG tool that skips parsing, chunking, embeddings, hybrid retrieval, source-grounding, or deployment controls will look incomplete. Exceptions only make sense for narrow products like rerankers, parsers, or evaluation-only tools.
Which features are usually free by default in RAG tools?
In RAG tools, the features most likely to be free by default are embedding and vector index management, chunking, and hybrid search. Embeddings lead the category with 35 free-full cases among 60 present implementations.
Lower-level retrieval primitives carry the strongest free-full posture. Embeddings are free full in 58.3% of present implementations, and chunking is free full in 45.3%, which makes these the most open parts of the RAG stack.
This pattern is strongest in application frameworks and open technical components. LangChain, LlamaIndex, Haystack, RAGFlow, R2R, Kotaemon, and many smaller open-source tools expose core indexing or chunking functionality without a hard commercial gate.
Hybrid search is widely available but more often constrained than embeddings. Among tools with hybrid retrieval, 24 are free full and 28 are free limited, which suggests vendors are willing to expose retrieval orchestration but often cap scale, hosting, or integrations.
Document parsing is free in many tools, but usually not free full. Only 16 of 65 parsing implementations are free full, while 41 are free limited, which makes ingestion a more common freemium surface than a fully open one.
The rule for builders is to make the technical starting loop accessible: ingest a small corpus, chunk it, embed it, and retrieve from it. Charging too early for these primitives fights the category norm, but unlimited scale does not need to be free.
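The starting loop described above can be sketched end to end in a few functions. This is a toy under stated assumptions. Chunking splits on word count, and `embed` is a term-frequency vector standing in for a real embedding model, so retrieval here is lexical rather than semantic.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=8):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: a term-frequency vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(corpus):
    """Ingest: chunk every document and store (chunk, vector) pairs."""
    return [(c, embed(c)) for doc in corpus for c in chunk(doc)]


def retrieve(index, query, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:top_k]]


index = build_index([
    "Invoices are exported as CSV every night.",
    "Refunds are issued within 30 days of purchase.",
])
print(retrieve(index, "how long do refunds take"))
```

Swapping `embed` for a real model and the list index for a vector store changes the quality of the loop but not its shape. That shape is what a free tier needs to let users try.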
Which features are most often limited, paywalled, or premium-only in RAG tools?
The most gated features in RAG tools are deployment and governance, RAG evaluation, data connectors, and agentic workflows. Governance has the largest paid-only count with 20 paid-only implementations among 68 present cases, while agentic workflows have the largest restricted-access count.
Deployment and governance are both widespread and heavily monetized. Only 14 of 68 implementations are free full, while 20 are paid only, which makes this the strongest broad paywall in the dataset.
RAG evaluation is a smaller but cleaner premium signal. It appears in 44 tools, and 8 of those implementations are paid only, which reflects how monitoring, benchmarking, and quality assurance become valuable once a RAG system is in production.
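A common first production check is hit rate at k: the share of labeled queries whose top-k results include at least one relevant chunk. Below is a minimal sketch. It assumes a hypothetical `retrieve(query, k)` callable that returns ranked chunk ids. The stub results are invented for illustration.

```python
from typing import Callable, Sequence


def hit_rate_at_k(
    eval_set: Sequence[tuple[str, set[str]]],
    retrieve: Callable[[str, int], list[str]],
    k: int = 3,
) -> float:
    """Fraction of queries whose top-k results contain at least one relevant id."""
    if not eval_set:
        return 0.0
    hits = sum(1 for query, relevant in eval_set if relevant & set(retrieve(query, k)))
    return hits / len(eval_set)


# Stub retriever standing in for a real pipeline (hypothetical results).
FAKE_RESULTS = {
    "refund window": ["c7", "c2", "c9"],
    "export format": ["c4", "c1", "c3"],
}


def stub_retrieve(query, k):
    return FAKE_RESULTS.get(query, [])[:k]


eval_set = [("refund window", {"c2"}), ("export format", {"c8"})]
print(hit_rate_at_k(eval_set, stub_retrieve, k=3))  # 0.5
```

Tracking this number as the corpus, chunking, or embedding model changes is the basic loop that evaluation and monitoring products build on.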
Data connectors use a softer gate. Connectors appear in 44 tools, but only 9 are free full, while 24 are free limited and 5 are restricted. That means vendors often gate the number of sources, the sync frequency, the hosted connector layer, or the enterprise integrations.
Agentic workflows have the most unusual gating profile. They appear in 53 tools, but 16 are restricted, which is more than any other feature. In practice, tool calling is often available only in specific deployment modes, hosted products, integrations, or advanced workflow environments.
Paid-only and restricted gates also cluster by workflow. Memory and graph retrieval tools make deployment and governance paid only in 5 of 6 present cases, while document ingestion tools make it paid only in 6 of 17 present cases.
If you want to see how premium features are gated outside RAG tools, our database of 300 profitable internet businesses shows what companies chose to keep free, limit, or paywall.
Which features are still strong differentiators in RAG tools?
The strongest differentiators in RAG tools are knowledge graph and entity extraction, memory and personalization, RAG evaluation, and agentic workflows. They either sit below 65% penetration or have unusually fragmented access, which makes them more useful as positioning signals than basic pipeline features.
Knowledge graph and entity extraction is the cleanest differentiator because it appears in only 21 of 82 tools. When a product like Graphiti, Microsoft GraphRAG, TrustGraph, Cognee, or WhyHow.AI leans into graph retrieval, it is signaling a different architecture from standard vector-first RAG.
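The architectural difference is easiest to see in code. This sketch builds an entity co-occurrence graph from chunks and retrieves an entity's one-hop neighbors. Entity extraction is faked with a fixed vocabulary. That is an assumption: production systems usually rely on an NER model or an LLM for this step. The example sentences are invented.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fixed vocabulary standing in for a real entity extractor.
KNOWN_ENTITIES = {"Acme", "Globex", "Jane Doe", "Berlin"}


def extract_entities(text):
    """Return the known entities mentioned in the text (naive substring match)."""
    return sorted(e for e in KNOWN_ENTITIES if e in text)


def build_graph(chunks):
    """Link every pair of entities that co-occur in the same chunk."""
    graph = defaultdict(set)
    for text in chunks:
        for a, b in combinations(extract_entities(text), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighbors(graph, entity):
    """Entities one hop away: the basis of simple graph-based retrieval."""
    return sorted(graph.get(entity, set()))


graph = build_graph([
    "Jane Doe is CEO of Acme.",
    "Acme acquired Globex in 2024.",
    "Globex is based in Berlin.",
])
print(neighbors(graph, "Acme"))  # ['Globex', 'Jane Doe']
```

In the example, Jane Doe and Globex never share a chunk, yet they are two hops apart through Acme. A chunk-only retriever cannot follow that kind of connection.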
Memory and personalization is similarly differentiating, appearing in 23 tools. It is not heavily paywalled when present, but its scarcity means that products with persistent user, entity, or conversation memory can stand apart from generic document chat products.
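At its simplest, a persistent memory layer is a per-user store of facts consulted before retrieval. The sketch below assumes a hypothetical in-memory `UserMemory` class with keyword recall. Real products persist this state and decide what is worth remembering, which is not modeled here.

```python
from collections import defaultdict


class UserMemory:
    """Hypothetical per-user memory: stores short facts and recalls them by keyword."""

    def __init__(self):
        self._facts = defaultdict(list)

    def remember(self, user_id, fact):
        self._facts[user_id].append(fact)

    def recall(self, user_id, query):
        """Return this user's facts that share at least one word with the query."""
        words = set(query.lower().split())
        return [f for f in self._facts.get(user_id, []) if words & set(f.lower().split())]

    def personalize(self, user_id, query):
        """Prepend recalled facts so downstream retrieval sees user context."""
        return " ".join(self.recall(user_id, query) + [query])


mem = UserMemory()
mem.remember("u1", "prefers answers in German")
mem.remember("u1", "works on the billing team")
print(mem.recall("u1", "billing export question"))  # ['works on the billing team']
```

The same query from a different user recalls nothing. That per-user context is what separates memory-aware products from generic document chat.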
RAG evaluation is a differentiator because it changes the buyer from a builder experimenting with retrieval to a team operating quality-sensitive systems. Evaluation appears in 44 tools overall, but it reaches 100% inside evaluation and observability tools.
Agentic workflows sit in a middle zone: common enough to matter, but restricted enough to differentiate. A product that makes tool calling broadly usable can stand out from products that mention agents but limit access to specific deployment or partner conditions.
The best differentiators are not random add-ons. They extend the RAG system from retrieval into reasoning over entities, persistent context, production quality, or workflow execution.
If you are trying to figure out what makes a product genuinely different in its category, our database of 300 proven internet businesses shows how companies turned specific features into differentiation.
Which features are rarely offered in RAG tools?
The rarest features in RAG tools are knowledge graph and entity extraction, memory and personalization, and RAG evaluation when viewed outside specialist workflows. Knowledge graph features appear in only 25.6% of tools, and memory appears in only 28.0%.
Knowledge graph and entity extraction is rare because most RAG tools still treat retrieval as document or chunk retrieval. Only 21 tools include graph or entity extraction, and 8 of those sit in the memory and graph retrieval workflow.
Memory and personalization is also rare, with 23 present cases. Knowledge chat applications include it in 6 of 11 tools, but document ingestion, retrieval components, and evaluation tools largely avoid it.
RAG evaluation looks moderately common overall, appearing in 44 tools, but the distribution is highly uneven. It is universal in evaluation and observability tools, yet appears in only 4 of 11 knowledge chat applications and 3 of 9 retrieval and ranking components.
Data connectors are not rare at the category level, but they are rare in several technical workflows. Evaluation tools have zero connector coverage, and retrieval and ranking components include connectors in only 1 of 9 cases.
The important reading rule is that rarity in RAG tools is often workflow-driven. A feature can be rare across the full dataset and still be table stakes inside a narrow sub-category.
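Workflow-conditioned coverage is a group-by over the same table. Below is a minimal sketch using five rows copied from the knowledge graph column (Zep, Mem0, Verba, Archive Agent, Ragas). `coverage_by_family` is our own helper name. Unclear is counted as present, which is how the 8-of-8 memory and graph figure is reached.

```python
from collections import defaultdict

# Five rows copied from the table: workflow family and knowledge graph label.
ROWS = [
    {"tool": "Zep", "family": "Memory and graph retrieval", "kg": "Free full"},
    {"tool": "Mem0", "family": "Memory and graph retrieval", "kg": "Unclear"},
    {"tool": "Verba", "family": "Knowledge chat applications", "kg": "Absent"},
    {"tool": "Archive Agent", "family": "Knowledge chat applications", "kg": "Free limited"},
    {"tool": "Ragas", "family": "Evaluation and observability", "kg": "Absent"},
]


def coverage(rows, feature):
    """Share of rows where the feature is present (any label except Absent)."""
    return sum(r[feature] != "Absent" for r in rows) / len(rows) if rows else 0.0


def coverage_by_family(rows, feature):
    """Coverage of one feature computed separately inside each workflow family."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["family"]].append(r)
    return {family: coverage(members, feature) for family, members in groups.items()}


print(coverage(ROWS, "kg"))  # 0.6 across all five rows
print(coverage_by_family(ROWS, "kg"))
```

Across the full table, the same split gives 21 of 82 tools (25.6%) overall but 8 of 8 within memory and graph retrieval.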
Which missing features create the biggest opportunity in RAG tools?
The biggest feature opportunities in RAG tools sit where established workflow boundaries leave obvious gaps: memory in managed RAG platforms, knowledge graphs in knowledge chat applications, and evaluation in document ingestion tools. These gaps are large enough to create differentiated products without requiring a completely new category.
Managed RAG platforms are strong across the core stack, but only 4 of 12 include memory and personalization. A managed platform that combines hosted ingestion, retrieval, evaluation, and persistent memory would occupy a more complete application infrastructure position.
Knowledge chat applications have full coverage across parsing, chunking, embeddings, hybrid search, citations, and deployment, but only 1 of 11 includes knowledge graph and entity extraction. That creates room for chat products that understand entities and relationships rather than only retrieving source chunks.
Document ingestion tools have universal parsing and strong chunking coverage, but only 8 of 23 include RAG evaluation. A parser that measures downstream retrieval quality could move from an ingestion utility into a quality-sensitive RAG infrastructure product.
Retrieval and ranking components have strong embeddings and hybrid search coverage, but no memory or graph coverage in this dataset. That absence creates an opening for retrieval components that optimize not only for relevance but also for entity continuity and user context.
The opportunity pattern is not to add every missing feature everywhere. It is to find the adjacent feature that buyers already expect in the next workflow over, then bring it into a product where competitors still treat it as out of scope.
If you want to spot feature gaps buyers may pay to close, our internet business database surfaces the same build-versus-skip patterns across 300 different markets.
What should be free versus paid in RAG tools?
In RAG tools, the free layer should cover the basic RAG loop: parsing a limited corpus, chunking it, embedding it, retrieving from it, and producing grounded answers. The paid layer should concentrate around scale, connectors, evaluation, governance, hosted operations, and advanced agentic workflows.
The basic RAG loop is where buyers expect hands-on exploration. Chunking, embeddings, and hybrid search all have large free-full or free-limited footprints, so hiding them completely behind a paywall makes a new tool harder to evaluate.
Parsing should be free enough to prove quality, but not unlimited. The dataset shows parsing as free limited in 41 of 65 present implementations, which makes usage caps, page caps, file-size limits, and hosted-processing limits normal for the category.
Connectors are safe to limit because they create operational value beyond the core algorithm. A small set of free connectors can reduce friction, while enterprise sources, sync controls, or managed connector infrastructure can sit behind paid plans.
Evaluation should shift toward paid plans as a product moves from prototype to production. Since 8 of 44 evaluation implementations are paid only, the category already treats monitoring, quality gates, and advanced benchmarking as monetizable.
Governance is the clearest paid surface. Deployment, governance, and enterprise controls have 20 paid-only implementations, so buyers already expect security, admin, compliance, and managed deployment depth to map to commercial plans.
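One way to encode this free-versus-paid split is a small entitlement table. The Python sketch below is hypothetical: the plan names, the 1,000-page monthly parsing cap, and the two-connector free limit are illustrative assumptions, not figures from the dataset. It only shows the shape of the idea: the core RAG loop on every plan, usage caps on parsing and connectors, and evaluation and governance reserved for paid.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_page_cap: Optional[int]   # None means no cap on parsed pages
    max_connectors: Optional[int]     # None means no connector limit
    features: frozenset = field(default_factory=frozenset)


# Hypothetical feature names; the core loop is available on every plan.
CORE_LOOP = frozenset({"parsing", "chunking", "embeddings", "hybrid_search", "citations"})

FREE = Plan("free", monthly_page_cap=1_000, max_connectors=2, features=CORE_LOOP)
PAID = Plan("paid", monthly_page_cap=None, max_connectors=None,
            features=CORE_LOOP | {"evaluation", "governance", "managed_deployment"})


def can_use(plan: Plan, feature: str) -> bool:
    """Feature-level gate: is this capability included in the plan at all?"""
    return feature in plan.features


def can_parse(plan: Plan, pages_used: int, pages_requested: int) -> bool:
    """Usage-level gate: would this parsing job stay within the monthly page cap?"""
    if plan.monthly_page_cap is None:
        return True
    return pages_used + pages_requested <= plan.monthly_page_cap


def can_add_connector(plan: Plan, connectors_in_use: int) -> bool:
    """Connector gate: a few free connectors, unlimited on the paid plan."""
    if plan.max_connectors is None:
        return True
    return connectors_in_use < plan.max_connectors


print(can_parse(FREE, pages_used=900, pages_requested=200))  # False: would exceed the cap
print(can_use(FREE, "evaluation"))  # False: evaluation sits on the paid plan
```

Keeping feature gates and usage caps as separate checks makes it easier to loosen one (for example, raising the page cap) without moving a whole feature across the paywall.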
Which features make users upgrade to paid plans in RAG tools?
Users upgrade in RAG tools when they move from proving retrieval quality to operating a real knowledge system. The strongest upgrade triggers are governance, managed deployment, data connectors, evaluation, and scaled document parsing.
Governance creates the clearest upgrade path because it maps directly to organizational risk. Security controls, admin controls, hosted deployment, and enterprise packaging are paid-only in 20 of the 68 tools that include them.
Connectors trigger upgrades when the buyer needs real data rather than demo documents. Since 24 of 44 connector implementations are free limited, the free tier often proves the workflow while paid plans unlock more sources, sync volume, or enterprise integrations.
Evaluation drives upgrades after the system starts affecting real users. RAG quality monitoring, test suites, and observability become more valuable once teams need to compare retrieval changes and catch regressions.
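A regression check of this kind can be sketched as recall@k over a labeled query set. The Python example below assumes a hypothetical gold set that maps each query to the chunk IDs that should be retrieved, two hypothetical retrieval runs (for instance, before and after a chunking change), and a hypothetical 0.02 tolerance. A production setup would likely track more than recall alone.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant chunk IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall(runs: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int) -> float:
    """Average recall@k over every query in the gold set."""
    scores = [recall_at_k(runs.get(q, []), rel, k) for q, rel in gold.items()]
    return sum(scores) / len(scores)


def passes_gate(baseline: float, candidate: float, max_drop: float = 0.02) -> bool:
    """Reject a retrieval change if mean recall drops by more than max_drop."""
    return candidate >= baseline - max_drop


# Hypothetical gold labels and two retrieval runs.
gold = {"q1": {"c1", "c2"}, "q2": {"c7"}}
baseline_run = {"q1": ["c1", "c5", "c2"], "q2": ["c7", "c3", "c4"]}
candidate_run = {"q1": ["c1", "c5", "c9"], "q2": ["c3", "c4", "c8"]}

base = mean_recall(baseline_run, gold, k=3)
cand = mean_recall(candidate_run, gold, k=3)
print(f"baseline={base:.2f} candidate={cand:.2f} pass={passes_gate(base, cand)}")
# baseline=1.00 candidate=0.25 pass=False
```

The gate turns "catch regressions" into a yes/no decision a team can run on every retrieval change, which is the kind of repeated, production-stage workload that tends to justify a paid tier.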
Parsing upgrades are usually tied to volume and document complexity. LlamaParse, Unstructured, Reducto, LandingAI ADE, and similar ingestion-focused products show how document processing can become a paid infrastructure layer rather than a simple free utility.
Agentic workflows can also drive upgrades, but usually through access conditions rather than simple pricing. Their 16 restricted implementations suggest vendors use deployment mode, integrations, or advanced environments as part of the upgrade mechanic.
If you are building a RAG product and designing upgrade paths, our database of 300 proven internet businesses includes SaaS examples showing exactly which features companies gated at upgrade.
What should the MVP of a RAG tool include and what should it skip?
The MVP of a general RAG tool should include document ingestion, chunking, embeddings, hybrid retrieval, citations, and basic deployment controls. It should skip knowledge graphs, persistent memory, advanced evaluation, and broad connector depth unless one of those is the workflow anchor.
The non-negotiable MVP surface is the core RAG loop. Parsing, chunking, embeddings, hybrid search, and citations all appear in at least 60 tools, so omitting one creates a visible gap for a general-purpose product.
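The core loop can be illustrated end to end in a few dozen lines. This is a minimal Python sketch under loud assumptions: fixed-size word windows stand in for semantic chunking, a hashed bag-of-words vector stands in for a real embedding model, term overlap stands in for BM25, and reciprocal rank fusion (one common way to merge rankings, not the only one) combines the lexical and dense lists. It returns retrieved passages tagged with chunk IDs as citations instead of calling an LLM. All function names are hypothetical.

```python
import math
import re
import zlib
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source: str   # document name, used for the citation
    text: str


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_document(source: str, text: str, max_words: int = 40) -> List[Chunk]:
    """Fixed-size word-window chunking (a stand-in for semantic segmentation)."""
    words = text.split()
    return [Chunk(f"{source}#{i // max_words}", source, " ".join(words[i:i + max_words]))
            for i in range(0, len(words), max_words)]


def toy_embed(text: str, dims: int = 64) -> List[float]:
    """Hashed bag-of-words vector, L2-normalized; a placeholder for a real embedding model."""
    vec = [0.0] * dims
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % dims] += 1.0  # stable hash across runs
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def lexical_score(query: str, chunk: Chunk) -> float:
    """Term-overlap count (a stand-in for BM25)."""
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk.text))
    return float(sum(min(q[t], c[t]) for t in q))


def dense_score(query_vec: List[float], chunk_vec: List[float]) -> float:
    return sum(a * b for a, b in zip(query_vec, chunk_vec))


def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: sum 1 / (k + rank) for each ID across rankings."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            fused[cid] = fused.get(cid, 0.0) + 1.0 / (k + rank)
    return fused


def hybrid_search(query: str, chunks: List[Chunk], top_k: int = 2) -> List[Chunk]:
    by_id = {c.chunk_id: c for c in chunks}
    qvec = toy_embed(query)
    # A real system would embed and index chunks ahead of time instead of per query.
    lexical = sorted(chunks, key=lambda c: lexical_score(query, c), reverse=True)
    dense = sorted(chunks, key=lambda c: dense_score(qvec, toy_embed(c.text)), reverse=True)
    fused = rrf([[c.chunk_id for c in lexical], [c.chunk_id for c in dense]])
    best = sorted(fused, key=fused.get, reverse=True)[:top_k]
    return [by_id[cid] for cid in best]


def answer_with_citations(query: str, chunks: List[Chunk]) -> str:
    """Return retrieved passages tagged with their chunk IDs (no LLM call here)."""
    return "\n".join(f"{h.text} [{h.chunk_id}]" for h in hybrid_search(query, chunks))


docs = {
    "refunds.md": "Refunds are issued within 14 days of a cancellation request.",
    "shipping.md": "Orders ship from the warehouse within two business days.",
}
chunks = [c for src, txt in docs.items() for c in chunk_document(src, txt)]
print(answer_with_citations("How are refunds issued?", chunks))
```

Each stage is deliberately swappable: a credible MVP would replace the toy embedding and lexical scorer with real components, but the parse, chunk, retrieve, fuse, and cite sequence stays the same.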
Basic deployment controls belong in the MVP because the feature is present in 82.9% of the dataset. A developer-focused tool still needs a credible story for configuration, deployment, or governance, even if advanced controls come later.
The MVP should include only the workflow anchor that matches the product. A document ingestion tool needs excellent parsing. An evaluation tool needs quality monitoring. A memory and graph retrieval product needs graph or memory depth. A retrieval component needs ranking performance, not a chat UI.
Knowledge graph and memory features should usually be skipped at launch for a broad RAG product. Their penetration is low enough that they differentiate, but not high enough to be required for a first credible version.
Broad connector coverage should also wait unless the product is a managed platform or knowledge chat application. Connectors are table stakes in those workflows, but they add operational complexity that can distract from proving retrieval quality.
If you want to see what an MVP looks like across businesses that actually shipped and grew, our database of 300 profitable internet businesses lets you compare launch scopes across markets.
What are other interesting feature patterns in RAG tools?
Beyond the headline patterns, RAG tools have several quieter feature dynamics that reveal how the category bundles, hides, and monetizes capabilities.
Citations have one of the largest marketing-versus-packaging gaps in RAG tools. The feature appears in 65 tools, but 26 of those implementations are unclear, which means vendors often claim grounding without making plan-level access obvious.
Reranking has a similar clarity problem. It appears in 50 tools, but 14 of those implementations are unclear, and the ambiguity is especially visible in knowledge chat, memory and graph retrieval, and document ingestion workflows.
Document ingestion tools are more application-adjacent than their category label implies. Of the 23 document ingestion tools, 20 include citations and 17 include governance features, even though embeddings and hybrid search are much less common in that workflow.
Memory and graph retrieval tools are not uniformly memory-first. All 8 include knowledge graph features, embeddings, hybrid search, and citations, but only 6 include memory and personalization, which shows that graph retrieval and memory are related but not identical product promises.
Evaluation and observability tools are narrower than they first appear. They all include RAG evaluation, but none include document parsing or data connectors, which makes them add-on infrastructure rather than end-to-end RAG platforms.
Insights
We collected and analyzed the features of 82 RAG tools, then read the aggregates as a full market map rather than as isolated feature counts. These are the higher-order patterns that emerge from the dataset.
- Workflow is the strongest predictor of feature presence in RAG tools. The same feature can be universal in one workflow and nearly absent in another, which means category-wide averages are useful only after the workflow boundary is clear.
- RAG tools split into four broad feature archetypes: open frameworks, managed platforms, specialist components, and operational products. Each archetype has a different default packaging logic, so copying another tool's pricing only works when the workflow matches.
- The RAG tools market is open at the primitive layer and commercial at the operational layer. Embeddings, chunking, and hybrid search show strong free availability, while governance, evaluation, connectors, and hosted deployment carry stronger paid or limited patterns.
- Source-grounding is widely advertised but poorly packaged in RAG tools. Citations have high penetration and high unclear availability, which makes them a trust signal in marketing but a due-diligence problem in buying.
- The clearest feature boundary in RAG tools is not between open source and commercial products. It is between products that stop at retrieval and products that extend into memory, graph structure, evaluation, or operations.
- Restricted access acts as a hidden monetization layer across RAG tools. Agentic workflows and connectors are often gated by integration, deployment mode, hosted environment, or partner access, not only by price.
- Document ingestion has become its own RAG tools sub-market rather than a simple preprocessing step. The strongest ingestion products package parsing, chunking, citations, and governance, even when they do not own retrieval end to end.
- Memory and graph retrieval tools prove that advanced retrieval still depends on basic retrieval infrastructure. Their full coverage of embeddings and hybrid search shows that graph and memory products add layers rather than replace the retrieval stack.
- Evaluation is the most production-coded feature family in RAG tools. It does not need to be universal at prototype time, but once teams operate RAG systems, monitoring and quality measurement become natural paid expansion surfaces.
- Feature scarcity in RAG tools often signals strategic focus, not immaturity. A parser without memory or an evaluator without connectors can be coherent when the workflow is narrow, but incoherent when the product claims to be an end-to-end RAG platform.
Methodology
We analyzed 82 RAG, retrieval, document-ingestion, knowledge-chat, memory/graph, and evaluation tools based on publicly available information from their homepages, documentation, feature pages, product pages, GitHub repositories, and pricing pages.
We include tools whose primary value proposition is to help developers or teams build, manage, evaluate, or optimize retrieval-augmented generation systems, including document ingestion, chunking, embeddings, vector search, retrieval pipelines, grounding, citation, knowledge connectors, and RAG evaluation.
We exclude generic vector databases, AI chatbots, search tools, knowledge bases, LLM app platforms, document management tools, and data pipelines unless RAG system development or operation is a central advertised feature. We include ambiguous tools only if the product is clearly used to connect LLMs to external knowledge through retrieval, not merely to store data, chat with documents, or search content.
The final dataset contains 82 tools. The goal of the dataset is not to capture every marginal open-source experiment, internal library, or small regional product, but to represent the most visible, relevant, and commercially meaningful products in the category. A small number of niche, newly launched, deprecated, or lightly documented tools may have been missed, but the sample is designed to support a rigorous market-level comparison of the tools most likely to appear in buyer, builder, or competitive research.
The RAG and retrieval tooling market contains many overlapping features, often described with inconsistent terminology across vendors. To make the analysis readable and comparable, we grouped product capabilities into 12 broader feature categories: document parsing and layout understanding; data connectors and source synchronization; chunking and semantic segmentation; embedding and vector index management; hybrid search and retrieval orchestration; reranking and relevance optimization; citations and source-grounded answers; agentic workflows and tool calling; knowledge graph and entity extraction; memory and personalization; RAG evaluation and quality monitoring; and deployment, governance, and enterprise controls.
This categorization avoids two common problems: treating every vendor-specific phrase as a separate feature, which would make the analysis too fragmented, and using overly broad buckets, which would obscure meaningful differences between products. The resulting structure provides a market-level view while preserving enough specificity to identify differences in product positioning, monetization, and maturity.
For each feature, we applied a standardized availability label based on information published by each vendor. Absent means the feature is not available, or does not appear to be available, based on public information. Free full means the feature is available for free without meaningful usage, volume, functionality, or access limits. Free limited means the feature is available for free, but with usage limits, volume caps, reduced functionality, limited integrations, self-hosting requirements, or other meaningful constraints.
Paid only means the feature is available only through a paid plan, paid API, paid cloud service, enterprise plan, or custom-priced commercial agreement. Trial only means the feature is available only during a free trial or temporary evaluation period. Restricted means the feature depends on a specific integration, deployment mode, partner, region, API key, hosted environment, beta program, or other conditional access requirement. Unclear means the feature appears to be present, but public information does not clearly indicate whether it is free, paid, trial-based, limited, or restricted.
When public information was incomplete or ambiguous, we avoided inferring availability beyond what could reasonably be supported by the vendor's own materials. In those cases, we used the Unclear label rather than assuming that a feature was free, paid, fully available, or commercially restricted.
For each feature, we calculated two types of metrics. First, we measured feature coverage: the number and percentage of tools in the dataset where the feature appears to be available. Second, among the tools where the feature appears to be available, we measured the distribution of availability labels: free full, free limited, paid only, trial only, restricted, and unclear. We also reviewed the same patterns by primary workflow category to identify differences between frameworks, managed platforms, knowledge-chat applications, retrieval components, document-ingestion tools, memory/graph products, and evaluation tools.
To keep the analysis consistent, percentages for feature coverage are calculated against the full dataset of 82 tools, while percentages for pricing and access labels are calculated only among tools where the feature appears to be present. This distinction prevents absent features from distorting the interpretation of how available, limited, or monetized a feature is among vendors that actually offer it.
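The two denominators can be shown in a short Python sketch. The `Label` enum mirrors the seven availability labels defined above, and the two helper functions are hypothetical names for illustration. The five-tool list is invented purely to exercise the functions, while the final lines use the citation figures reported in this study (present in 65 of 82 tools, 26 of them unclear).

```python
from collections import Counter
from enum import Enum
from typing import Dict, List


class Label(Enum):
    ABSENT = "absent"
    FREE_FULL = "free full"
    FREE_LIMITED = "free limited"
    PAID_ONLY = "paid only"
    TRIAL_ONLY = "trial only"
    RESTRICTED = "restricted"
    UNCLEAR = "unclear"


def coverage_pct(labels: List[Label]) -> float:
    """Share of ALL tools where the feature is present (any label except ABSENT)."""
    present = sum(1 for lab in labels if lab is not Label.ABSENT)
    return 100.0 * present / len(labels)


def label_mix_pct(labels: List[Label]) -> Dict[Label, float]:
    """Distribution of availability labels among tools where the feature is present."""
    present = [lab for lab in labels if lab is not Label.ABSENT]
    counts = Counter(present)
    return {lab: 100.0 * n / len(present) for lab, n in counts.items()}


# Hypothetical five-tool example, for illustration only.
toy = [Label.ABSENT, Label.FREE_FULL, Label.FREE_LIMITED, Label.FREE_LIMITED, Label.UNCLEAR]
print(coverage_pct(toy))  # 80.0
print(label_mix_pct(toy))

# Citations: present in 65 of 82 tools, 26 of those implementations unclear.
print(f"{100 * 65 / 82:.1f}% coverage, {100 * 26 / 65:.1f}% unclear among present")
# 79.3% coverage, 40.0% unclear among present
```

Dividing the 26 unclear citation implementations by all 82 tools would report about 31.7% instead of 40.0%, understating how often a buyer who finds the feature also finds ambiguous packaging.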
Who wrote this?
STEAL WHAT WORKS TEAM
We study profitable internet businesses, take them apart, and write down what actually works: pricing, distribution, growth, packaging. We turn 300+ proven examples into a database so founders can stop testing random ideas and start from proof.