Mastra RAG
The simplest way to build a complete RAG pipeline for agents in TypeScript
Mastra handles the complete RAG pipeline, enhancing LLM outputs with relevant context from your own data sources. Without a standardized pipeline, retrieval means stitching together chunking, embedding, storage and retrieval across separate tools. Mastra brings those steps into one framework so agents can give responses grounded in your data.
Build your full RAG pipeline
Mastra provides standardized APIs for every step of the RAG pipeline in a single framework. Chunk documents using recursive or sliding window strategies, generate embeddings, store them in your preferred vector database and retrieve relevant context at query time. Mastra includes observability for tracking embedding and retrieval performance.
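The four steps above (chunk, embed, store, retrieve) can be sketched in plain TypeScript. This is a minimal illustration, not Mastra's API: every function name here is hypothetical, the "embedding" is a toy term-count vector standing in for a real embedding model, and an array stands in for a vector database.

```typescript
// Illustrative in-memory pipeline. None of these names are Mastra APIs.
type TermVector = Record<string, number>;
type IndexedChunk = { id: number; text: string; vector: TermVector };

// Step 1, chunk: split the document into sentences.
function chunkBySentence(text: string): string[] {
  const sentences: string[] = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return sentences.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Step 2, embed: a toy stand-in for an embedding model (lowercased term counts).
function embed(text: string): TermVector {
  const vector: TermVector = Object.create(null);
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) vector[word] = (vector[word] ?? 0) + 1;
  return vector;
}

// Cosine similarity between two sparse term vectors.
function cosine(a: TermVector, b: TermVector): number {
  const norm = (v: TermVector): number =>
    Math.sqrt(Object.keys(v).reduce((sum, term) => sum + v[term] * v[term], 0));
  const dot = Object.keys(a).reduce((sum, term) => sum + a[term] * (b[term] ?? 0), 0);
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// Step 3, store: keep each chunk next to its vector (an array stands in for a vector database).
function indexDocument(text: string): IndexedChunk[] {
  return chunkBySentence(text).map((chunk, id) => ({ id, text: chunk, vector: embed(chunk) }));
}

// Step 4, retrieve: score every chunk against the query and keep the best topK.
function retrieve(index: IndexedChunk[], query: string, topK = 2): IndexedChunk[] {
  const queryVector = embed(query);
  return index
    .map((chunk) => ({ chunk, score: cosine(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.chunk);
}
```

In a real pipeline each step would be swapped for a production component (a chunking strategy, an embedding model, a vector store), while the overall shape of the flow stays the same.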
Embed, query and rerank
Transform document chunks into vector embeddings with your preferred embedding model, retrieve semantically similar chunks from your vector store at query time, then rerank the results for more accurate, context-aware responses
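To make the reranking step concrete, here is a hedged sketch of one simple approach: blending each candidate's vector similarity score with a keyword-overlap score. The scoring formula, the weights and all names are assumptions for illustration. Production rerankers typically use a dedicated model, and this is not Mastra's reranking API.

```typescript
// Illustrative reranker: blends vector similarity with keyword overlap.
// All names and the scoring formula are hypothetical, not a library API.
type Candidate = { text: string; vectorScore: number };
type RankedCandidate = Candidate & { finalScore: number };

// Fraction of query terms that also appear in the candidate text.
function keywordOverlap(query: string, text: string): number {
  const tokens = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const queryTerms = tokens(query);
  if (queryTerms.length === 0) return 0;
  const textTerms = tokens(text);
  const matched = queryTerms.filter((term) => textTerms.indexOf(term) !== -1).length;
  return matched / queryTerms.length;
}

// Rescore the candidates returned by vector search and keep the best topK.
function rerank(
  query: string,
  candidates: Candidate[],
  weights = { vector: 0.5, keyword: 0.5 },
  topK = 3
): RankedCandidate[] {
  return candidates
    .map((candidate) => ({
      ...candidate,
      finalScore:
        weights.vector * candidate.vectorScore +
        weights.keyword * keywordOverlap(query, candidate.text),
    }))
    .sort((a, b) => b.finalScore - a.finalScore)
    .slice(0, topK);
}
```

A candidate with a slightly lower vector score can move to the top when it matches the query terms directly, which is the kind of correction reranking is meant to provide.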
Give agents knowledge of your data
When agents need answers grounded in your data, Mastra RAG adds relevant context from your own sources to the prompt behind each LLM response. Control how documents are chunked, choose your embedding strategy and store vectors in the database you prefer. Use metadata filters to narrow retrieval to the documents that matter for each query.
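As a rough illustration of filtered retrieval, the sketch below keeps only chunks whose metadata matches every key in a filter, then returns the highest-scoring matches. The types, the function names and the equality-only filter semantics are assumptions. Real vector stores usually offer richer filter operators.

```typescript
// Illustrative metadata filtering. Names and semantics are hypothetical.
type ChunkMetadata = Record<string, string | number>;
type StoredChunk = { text: string; score: number; metadata: ChunkMetadata };

// A chunk matches when every key in the filter equals the chunk's metadata value.
function matchesFilter(metadata: ChunkMetadata, filter: ChunkMetadata): boolean {
  return Object.keys(filter).every((key) => metadata[key] === filter[key]);
}

// Apply the filter first, then keep the topK highest-scoring chunks.
function filteredTopK(chunks: StoredChunk[], filter: ChunkMetadata, topK: number): StoredChunk[] {
  return chunks
    .filter((chunk) => matchesFilter(chunk.metadata, filter))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

For example, a filter of `{ source: "handbook", year: 2024 }` would exclude a high-scoring chunk from a blog post, so the agent only sees context from the intended source.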
Advanced RAG techniques
Mastra goes beyond standard retrieval with context engineering techniques for more accurate, grounded responses. Reasoning-Augmented Generation (ReAG) lets models reason directly over your documents rather than retrieving pre-embedded chunks. Graph RAG adds structured knowledge and agentic RAG adds agent-driven retrieval.
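The ReAG idea can be sketched as a two-pass flow: the model first reads each full document and extracts what is relevant to the question, then answers from those extracts, with no embedding or vector search involved. The sketch below is an assumption-laden illustration, not Mastra's implementation. `ReasoningModel` is a hypothetical interface, and `stubModel` is a keyword-matching stand-in so the example runs without an LLM. Real model calls would be asynchronous.

```typescript
// Illustrative ReAG flow. The interface and stub are hypothetical, not a library API.
type SourceDoc = { id: string; content: string };
type Assessment = { relevant: boolean; extract: string };

// In practice both methods would be (asynchronous) LLM calls.
type ReasoningModel = {
  assess(question: string, doc: SourceDoc): Assessment;
  answer(question: string, context: string[]): string;
};

// Pass 1: assess every full document. Pass 2: answer from the relevant extracts.
function reagAnswer(model: ReasoningModel, question: string, docs: SourceDoc[]): string {
  const context = docs
    .map((doc) => model.assess(question, doc))
    .filter((assessment) => assessment.relevant)
    .map((assessment) => assessment.extract);
  return model.answer(question, context);
}

// Stand-in "model": treats sentences containing a longer question term as relevant.
const stubModel: ReasoningModel = {
  assess(question, doc) {
    const terms: string[] = question.toLowerCase().match(/[a-z0-9]+/g) ?? [];
    const sentences: string[] = doc.content.match(/[^.!?]+[.!?]*/g) ?? [];
    const extract = sentences
      .map((sentence) => sentence.trim())
      .filter((sentence) =>
        terms.some((term) => term.length > 3 && sentence.toLowerCase().indexOf(term) !== -1)
      )
      .join(" ");
    return { relevant: extract.length > 0, extract };
  },
  answer(_question, context) {
    return context.length > 0 ? context.join(" ") : "No relevant information found.";
  },
};
```

The trade-off in this design is cost: every document is read in full for each question, in exchange for skipping the chunking and embedding steps.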
Reasoning-Augmented Generation
Reason directly over documents for more accurate, context-aware answers with ReAG
Advanced Context Engineering
Shape context using memory, history and RAG for results grounded in real information
Frequently asked questions
How does RAG work in Mastra?
Mastra RAG enhances LLM outputs by incorporating relevant context from your own data sources. The pipeline chunks documents, generates embeddings, stores them in a vector database and retrieves relevant context at query time. Mastra provides standardized APIs for each step.
What vector databases does Mastra RAG support?
Mastra RAG supports multiple vector stores including pgvector, Pinecone, Qdrant and MongoDB. Configure your preferred vector database in the storage layer of your RAG pipeline. Mastra's standardized APIs work consistently across all supported vector stores.
What chunking strategies does Mastra RAG support?
Mastra RAG supports multiple document chunking strategies including recursive and sliding window approaches. Documents can be enriched with metadata during chunking. Configure chunk size and overlap to optimize embedding quality and retrieval accuracy.
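To show how chunk size and overlap interact, here is a minimal sliding window chunker that also attaches metadata to each chunk. It is a sketch under stated assumptions: `size` and `overlap` are counted in words, and the function and field names are hypothetical rather than Mastra's API.

```typescript
// Illustrative sliding window chunker. Names are hypothetical, not a library API.
type ChunkWithMetadata = {
  text: string;
  metadata: { source: string; chunkIndex: number; startWord: number };
};

// size and overlap are word counts; each window starts (size - overlap) words after the last.
function slidingWindowChunks(
  text: string,
  source: string,
  size: number,
  overlap: number
): ChunkWithMetadata[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be at least 0 and less than size");
  }
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const step = size - overlap;
  const chunks: ChunkWithMetadata[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push({
      text: words.slice(start, start + size).join(" "),
      metadata: { source, chunkIndex: chunks.length, startWord: start },
    });
    // Stop once a window has reached the end, to avoid a trailing duplicate chunk.
    if (start + size >= words.length) break;
  }
  return chunks;
}
```

With `size` 4 and `overlap` 2, the text "a b c d e f g" yields the chunks "a b c d", "c d e f" and "e f g". Larger overlap preserves more context across chunk boundaries at the cost of storing more vectors.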
How does Mastra RAG integrate with agents?
Mastra RAG gives agents access to relevant context from your own data sources at query time. Connect your vector store to an agent and Mastra retrieves semantically similar chunks to include in the LLM prompt, grounding agent responses in real information.
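The grounding step described above, placing retrieved chunks into the LLM prompt, can be sketched as a small prompt builder. The prompt wording, the `RetrievedChunk` shape and the function name are assumptions for illustration. A framework would normally handle this assembly for you.

```typescript
// Illustrative prompt assembly. Names and prompt wording are hypothetical.
type RetrievedChunk = { text: string; source: string };

// Number each chunk and label its source so the model can refer back to it.
function buildGroundedPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((chunk, i) => `[${i + 1}] (${chunk.source}) ${chunk.text}`)
    .join("\n");
  return [
    "Answer using only the context below. If the context is insufficient, say so.",
    "",
    "Context:",
    context.length > 0 ? context : "(no relevant context found)",
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the chunks and naming their sources makes it easier to ask the model for citations and to trace an answer back to the data that produced it.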
What is ReAG and how does Mastra support it?
Mastra supports Reasoning-Augmented Generation, or ReAG, which enables models to reason directly over your documents rather than retrieving pre-embedded chunks. Mastra also supports Graph RAG and agentic RAG for more advanced context engineering when standard retrieval does not provide sufficient accuracy.