Infrastructure for Extended Scale

Four fundamental capabilities for building persistent intelligence, each designed to work independently or as part of an integrated system.

01/ Qdrant Client

Managed Qdrant connection with pooling, batch upserts, and collection management. Production-ready with zero boilerplate.
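Batch upserts usually come down to splitting a large write into bounded requests. The sketch below shows only that splitting step. It assumes points shaped like the objects passed to store() in the quick start. The toBatches helper and the batch size of 256 are hypothetical illustrations, not Extensa's documented API.

```typescript
// Point shape assumed from the quick start's store() call (hypothetical type name).
interface Point {
  id: string;
  vector: number[];
  payload?: Record<string, unknown>;
}

// Split items into consecutive batches of at most batchSize elements,
// so each upsert request stays within a bounded size.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 1,050 toy points in batches of 256: four full batches plus one of 26.
const points: Point[] = Array.from({ length: 1050 }, (_, i) => ({
  id: `p${i}`,
  vector: [i, i + 1, i + 2],
  payload: { index: i },
}));
const batches = toBatches(points, 256);
console.log(batches.length, batches[batches.length - 1].length); // 5 26
```

Each batch would then be sent as a single upsert call.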

02/ Matryoshka Embeddings

Generate adaptive-dimension vectors from any text. Run search at 128d for speed or 1536d for precision — same corpus, no re-indexing required.
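With a Matryoshka-trained model, searching at a lower dimension amounts to keeping a prefix of each vector and re-normalizing it. The following is a minimal sketch that assumes cosine search. The truncate and cosine helpers are hypothetical. The sine-based 1536-d vectors are toy data standing in for real model output. Truncating embeddings from a model not trained with MRL generally degrades quality much more.

```typescript
// Keep the first `dims` components of an embedding and L2-normalize them,
// so cosine similarity stays well-defined at the reduced dimension.
function truncate(vector: number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims <= 0 || dims > vector.length) {
    throw new RangeError(`dims must be an integer in 1..${vector.length}`);
  }
  const head = vector.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Cosine similarity of two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 1536-d vectors standing in for real Matryoshka embeddings.
const doc = Array.from({ length: 1536 }, (_, i) => Math.sin(i + 1));
const query = Array.from({ length: 1536 }, (_, i) => Math.sin(i + 1.25));

// 128-d comparison: 12x fewer multiply-adds than the full 1536-d one.
const fastScore = cosine(truncate(doc, 128), truncate(query, 128));
const preciseScore = cosine(doc, query);
console.log(fastScore.toFixed(3), preciseScore.toFixed(3));
```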

03/ Hybrid Retrieval

Dense vector search combined with sparse keyword matching in a single API call. Semantic recall with keyword precision — no dual-pipeline setup.
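The text does not say how Extensa merges the dense and sparse result lists. One common technique is reciprocal rank fusion (RRF). RRF combines rankings without calibrating their raw scores against each other. The sketch below assumes RRF with the usual constant k = 60. The document ids are made up.

```typescript
// A ranked list of document ids from one retriever, best first.
type Ranking = string[];

// Reciprocal rank fusion: each document scores the sum over all lists of
// 1 / (k + rank), where rank starts at 1. Absent documents contribute 0.
function reciprocalRankFusion(rankings: Ranking[], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

const dense: Ranking = ["doc-a", "doc-b", "doc-c"]; // semantic neighbors
const sparse: Ranking = ["doc-c", "doc-a", "doc-d"]; // keyword matches
const fused = reciprocalRankFusion([dense, sparse]);
console.log(fused.map(([id]) => id).join(", ")); // doc-a, doc-c, doc-b, doc-d
```

Documents that rank well in both lists rise to the top. A strong match in only one list still survives in the fused results.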

04/ Provenance Tracking

Every vector traces back to its source document, chunk, and timestamp. Auditable RAG pipelines out of the box — no extra tooling required.
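One way to make every vector traceable is to store provenance fields in its payload at ingestion time. The following is a minimal sketch that assumes fixed-size character chunking. The Provenance field names, chunkWithProvenance, and the docs/handbook.md source id are hypothetical, not Extensa's actual payload schema.

```typescript
// Hypothetical provenance fields stored alongside each vector's payload.
interface Provenance {
  sourceId: string; // identifier of the source document (path, URL, ...)
  chunkIndex: number; // position of the chunk within that document
  ingestedAt: string; // ISO-8601 ingestion timestamp
}

interface ChunkRecord {
  id: string;
  text: string;
  provenance: Provenance;
}

// Split a document into fixed-size character chunks and tag each one with
// its source, its position, and the time it was ingested.
function chunkWithProvenance(
  sourceId: string,
  text: string,
  chunkSize: number,
  now: Date = new Date(),
): ChunkRecord[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const ingestedAt = now.toISOString();
  const records: ChunkRecord[] = [];
  for (let start = 0, index = 0; start < text.length; start += chunkSize, index++) {
    records.push({
      id: `${sourceId}#${index}`,
      text: text.slice(start, start + chunkSize),
      provenance: { sourceId, chunkIndex: index, ingestedAt },
    });
  }
  return records;
}

const records = chunkWithProvenance(
  "docs/handbook.md",
  "abcdefghij",
  4,
  new Date("2024-01-01T00:00:00Z"),
);
// A search hit on the last chunk traces back to "docs/handbook.md#2" ("ij").
console.log(records.map((r) => `${r.id}: ${r.text}`).join("\n"));
```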


Infrastructure created to scale without limits or compromise

From a single instance to a distributed cluster, with zero-config horizontal scaling.

01/ Vectors

High-dimensional embeddings stored with Matryoshka encoding — full resolution when you need it, compressed when you don't.

02/ Indexes

HNSW indexes with multi-tenant isolation. Each agent's memory space stays separate, clean, and fast.
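Per-agent isolation means a query is scored only against the requesting agent's points. The sketch below shows the idea with brute-force scoring over an in-memory array rather than an HNSW index. The StoredPoint shape, the agentId field, and searchForAgent are hypothetical. In an HNSW-backed store, the same rule would usually be applied as a filter during the search itself.

```typescript
// Hypothetical stored point tagged with the agent that owns it.
interface StoredPoint {
  id: string;
  agentId: string;
  vector: number[];
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Score only points owned by `agentId`, so one agent's query can never
// surface another agent's memories. Vectors are assumed unit-length,
// which makes the dot product equal to cosine similarity.
function searchForAgent(
  points: StoredPoint[],
  agentId: string,
  query: number[],
  limit: number,
): { id: string; score: number }[] {
  return points
    .filter((p) => p.agentId === agentId)
    .map((p) => ({ id: p.id, score: dot(p.vector, query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const store: StoredPoint[] = [
  { id: "a1", agentId: "agent-a", vector: [1, 0] },
  { id: "a2", agentId: "agent-a", vector: [0, 1] },
  { id: "b1", agentId: "agent-b", vector: [1, 0] }, // same vector, other tenant
];
const hits = searchForAgent(store, "agent-a", [1, 0], 5);
console.log(hits.map((h) => h.id).join(", ")); // a1, a2
```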

03/ Queries

Semantic search with oversampling and reranking built in. Better recall, lower noise, one API call.

DOCUMENTATION

Frequently Asked Questions

Everything you need to run high-performance vector retrieval in production.

01/ What is Extensa?

Extensa is a composable vector infrastructure layer for AI memory systems. It handles Matryoshka embeddings, binary quantization, and multi-stage retrieval — giving you production-grade vector search without building the plumbing yourself.

02/ How do I install it?

npm install @cartisien/extensa

Full TypeScript support. Use the memory adapter for testing and the Qdrant adapter for production.

03/ Quick start — how does it work?

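import { createExtensa } from '@cartisien/extensa';

// Placeholder 768-dimension vectors; in practice these come from your
// embedding model and must match the collection's dimension.
const embedding: number[] = Array.from({ length: 768 }, () => Math.random());
const queryVector: number[] = Array.from({ length: 768 }, () => Math.random());

// Top-level await requires an ES module context.
const extensa = createExtensa({ adapter: 'memory' });
await extensa.connect();

// Create a collection of 768-dimension vectors compared by cosine similarity
await extensa.createCollection('memories', 768, 'cosine');

// Store vectors, each with an id and a payload
await extensa.store('memories', [
  { id: 'm1', vector: embedding, payload: { text: 'Hello' } }
]);

// Search for the 5 nearest neighbors of the query vector
const results = await extensa.search('memories', queryVector, { limit: 5 });
await extensa.disconnect();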

04/ What are Matryoshka embeddings?

Matryoshka Representation Learning (MRL) trains embeddings so that truncated versions remain semantically meaningful. Extensa uses full vectors for precision and truncated vectors for speed — letting you tune the accuracy/latency tradeoff without re-embedding.

05/ What is binary quantization and why does it matter?

Binary quantization compresses each 32-bit float component of a vector to a single bit, a 32x reduction in vector memory. Extensa uses a multi-stage pipeline. It first oversamples candidates using the quantized vectors, then rescores them with the full-precision vectors. You get near-full accuracy at a fraction of the memory footprint.
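The oversample-then-rescore pipeline can be sketched in a few dozen lines. This is a brute-force illustration under stated assumptions, not Extensa's implementation. Each dimension is quantized to 1 if positive and 0 otherwise. A shortlist of limit * oversample candidates is built by Hamming distance between bit codes. That shortlist is then rescored with full-precision cosine similarity. The helper names, the oversampling factor, and the sine-based toy vectors are hypothetical.

```typescript
// Quantize a float vector to 1 bit per dimension (1 if > 0), packed into
// 32-bit words: 1536 float32s (6,144 bytes) become 48 words (192 bytes).
function quantize(vector: number[]): Uint32Array {
  const bits = new Uint32Array(Math.ceil(vector.length / 32));
  vector.forEach((x, i) => {
    if (x > 0) bits[i >>> 5] |= 1 << (i & 31);
  });
  return bits;
}

// Count set bits in a 32-bit word (Kernighan's method).
function popcount(word: number): number {
  let x = word >>> 0;
  let count = 0;
  while (x !== 0) {
    x = (x & (x - 1)) >>> 0;
    count++;
  }
  return count;
}

// Hamming distance between two bit codes of equal length.
function hamming(a: Uint32Array, b: Uint32Array): number {
  let distance = 0;
  for (let i = 0; i < a.length; i++) distance += popcount(a[i] ^ b[i]);
  return distance;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stage 1: shortlist limit * oversample candidates by Hamming distance on the
// compact codes. Stage 2: rescore that shortlist with full-precision cosine
// similarity and keep the best `limit`.
function searchQuantized(
  vectors: number[][],
  codes: Uint32Array[],
  query: number[],
  limit: number,
  oversample = 4,
): { index: number; score: number }[] {
  const queryCode = quantize(query);
  return codes
    .map((code, index) => ({ index, distance: hamming(code, queryCode) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, limit * oversample)
    .map(({ index }) => ({ index, score: cosine(vectors[index], query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Toy corpus of 10 vectors with 64 dimensions; codes are computed once, at index time.
const vectors = Array.from({ length: 10 }, (_, j) =>
  Array.from({ length: 64 }, (_, i) => Math.sin((i + 1) * (j + 1))),
);
const codes = vectors.map(quantize);
const results = searchQuantized(vectors, codes, vectors[2], 3, 2);
console.log(results.map((r) => r.index)); // first entry is 2, the query itself
```

Only the Hamming pass touches every point. The more expensive float comparison runs on just limit * oversample candidates.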

06/ What adapters are supported?

Memory adapter: in-process with zero dependencies, ideal for testing and prototyping.
Qdrant adapter: production-grade, with support for filtering, payloads, and collection management.
Swap adapters without changing application code.
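Swapping adapters without touching application code is the classic adapter pattern. Application code depends on a small interface, and each backend implements that interface. The following is a minimal synchronous sketch, assuming an interface loosely modeled on the quick start's calls. VectorAdapter, InMemoryAdapter, and rememberAndRecall are hypothetical names. The real Extensa methods are asynchronous, since they are awaited in the quick start.

```typescript
type Hit = { id: string; score: number };

// Hypothetical adapter contract; any backend implementing it can be swapped in.
interface VectorAdapter {
  createCollection(name: string, dims: number): void;
  store(name: string, points: { id: string; vector: number[] }[]): void;
  search(name: string, query: number[], limit: number): Hit[];
}

// In-process adapter: brute-force cosine search over a Map, no dependencies.
class InMemoryAdapter implements VectorAdapter {
  private collections = new Map<string, { dims: number; points: Map<string, number[]> }>();

  createCollection(name: string, dims: number): void {
    this.collections.set(name, { dims, points: new Map() });
  }

  store(name: string, points: { id: string; vector: number[] }[]): void {
    const collection = this.get(name);
    for (const p of points) {
      if (p.vector.length !== collection.dims) {
        throw new Error(`expected ${collection.dims} dimensions, got ${p.vector.length}`);
      }
      collection.points.set(p.id, p.vector); // storing an existing id overwrites it
    }
  }

  search(name: string, query: number[], limit: number): Hit[] {
    const hits: Hit[] = [];
    this.get(name).points.forEach((vector, id) => {
      hits.push({ id, score: cosine(vector, query) });
    });
    return hits.sort((a, b) => b.score - a.score).slice(0, limit);
  }

  private get(name: string) {
    const collection = this.collections.get(name);
    if (!collection) throw new Error(`unknown collection: ${name}`);
    return collection;
  }
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Application code sees only the interface, so passing in a different
// VectorAdapter implementation requires no changes here.
function rememberAndRecall(adapter: VectorAdapter): Hit[] {
  adapter.createCollection("memories", 3);
  adapter.store("memories", [
    { id: "m1", vector: [1, 0, 0] },
    { id: "m2", vector: [0.9, 0.1, 0] },
    { id: "m3", vector: [0, 0, 1] },
  ]);
  return adapter.search("memories", [1, 0, 0], 2);
}

console.log(rememberAndRecall(new InMemoryAdapter()).map((h) => h.id).join(", ")); // m1, m2
```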

07/ How do I get early access?

The packages are available on npm, so you can install and use them today; the source is private. Submitting the early access form gets you hosted API credentials, a provisioned environment, and integration docs, all sent within 24 hours.