Hybrid Search

pgmemory uses a multi-stage search pipeline that combines pgvector cosine similarity with PostgreSQL full-text search for high-quality retrieval.

How the hybrid pipeline works

Query → [Vector Search (pgvector)] + [Full-Text Search (tsvector)] → RRF Fusion → MMR Re-ranking → Top-K

1. Meaning-based search (semantic)

The query embedding is compared against stored knowledge using pgvector's HNSW index with cosine distance:

  • Quality pre-filtering — items with quality_score < 0.05 are excluded (new, unscored items are kept)
  • Source scoping — optional prefix filter on the source field restricts results to specific knowledge sources
  • Oversampling — fetches extra candidates to give the fusion and diversity stages a rich pool

pgmemory uses an HNSW index (m=16, ef_construction=64) for fast approximate nearest neighbor search.
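The candidate-selection logic described above can be sketched in memory as plain Python. This is an illustrative toy, not pgmemory's implementation: the item fields (`quality_score`, `source`, `embedding`), the oversampling factor, and the function names are assumptions, and a real deployment would push all of this into a single SQL query against the HNSW index rather than scanning in application code.

```python
import math

def cosine_distance(a, b):
    """Cosine distance in the sense pgvector uses: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def vector_candidates(query_emb, items, source_prefix=None, limit=10, oversample=3):
    """Toy sketch of the semantic stage: quality pre-filter, optional
    source scoping, then nearest-by-cosine with oversampling."""
    pool = [
        it for it in items
        # keep unscored items (quality_score is None); drop scores below 0.05
        if (it["quality_score"] is None or it["quality_score"] >= 0.05)
        and (source_prefix is None or it["source"].startswith(source_prefix))
    ]
    pool.sort(key=lambda it: cosine_distance(query_emb, it["embedding"]))
    # return extra candidates so the fusion and diversity stages have a rich pool
    return pool[: limit * oversample]
```

The oversampling factor of 3 here is illustrative; the point is only that this stage returns more rows than the final Top-K so RRF and MMR have something to choose from.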

2. Keyword-based search (lexical)

A parallel PostgreSQL full-text search runs using ts_rank and plainto_tsquery against a GIN-indexed tsvector column. This catches exact keyword matches — acronyms, error codes, class names, specific config values — that embedding similarity alone might not rank highly.

3. Reciprocal Rank Fusion (RRF)

The two result lists are combined using RRF with smoothing constant k = 60:

$$\text{score}(d) = \sum_{L \in \{\text{vector},\, \text{text}\}} \frac{1}{\text{rank}_L(d) + k + 1}$$

RRF is rank-based, not score-based, so it works naturally across the different scoring scales of vector similarity (0-1) and full-text relevance (unbounded). An item ranked highly by both searches scores highest; an item found by only one search still appears.
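The fusion step above can be written as a few lines of Python. This is a minimal sketch of the standard RRF formula with k = 60 and 0-based ranks (hence the `+ 1` in the denominator), not pgmemory's actual code:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of
    1 / (rank(d) + k + 1), with rank 0-based within each list."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k + 1)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Note that only ranks enter the formula, never the raw similarity or `ts_rank` scores, which is why the two incompatible scoring scales fuse cleanly.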

4. Maximal Marginal Relevance (MMR)

The fused results are re-ranked to maximize diversity:

$$\text{MMR}(d) = \lambda \cdot \text{relevance}(d) - (1 - \lambda) \cdot \max_{d_j \in S} \text{sim}(d, d_j)$$

With λ = 0.7, MMR puts 70% weight on relevance and 30% on diversity. This prevents the top results from being five variations of the same knowledge. Instead, the AI tool gets context from different angles: a deployment procedure, a related debugging insight, and an architecture decision.
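The greedy selection loop implied by the MMR formula can be sketched as follows. The `relevance` mapping and `sim` function here are hypothetical inputs (in pgmemory's pipeline they would come from the fused RRF scores and embedding similarity); the sketch shows only the selection logic:

```python
def mmr_rerank(candidates, relevance, sim, lam=0.7, top_k=5):
    """Greedy MMR: repeatedly pick the candidate maximizing
    lam * relevance(d) - (1 - lam) * max similarity to already-selected items."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < top_k:
        best = max(
            remaining,
            key=lambda d: lam * relevance[d]
            - (1 - lam) * max((sim(d, s) for s in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ = 0.7, a near-duplicate of an already-selected item pays a penalty of up to 0.3, which is usually enough to let a moderately relevant but novel item overtake it.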

Why hybrid?

Consider a developer asking about ERR_CONN_REFUSED. A meaning-based search finds knowledge about connection errors in general — useful, but not specific. A keyword-based search finds the exact item that mentions that precise error code. Hybrid search combines both signals to deliver the best of each.

This matters especially for teams: the shared knowledge store contains a mix of high-level architectural context and specific technical details. Hybrid search surfaces both.

What this means in practice

  • Specific technical details (error codes, config values, API endpoints) are found even when the question is phrased broadly — thanks to full-text search
  • Conceptual knowledge (architecture decisions, design rationale) is found even when the question uses different terminology — thanks to vector search
  • Results are diverse — the AI tool gets context from multiple angles, not five versions of the same thing — thanks to MMR
  • Low-quality noise is filtered out — knowledge that was captured but never proved useful doesn't clutter results — thanks to quality pre-filtering

PostgreSQL indexes

pgmemory creates three indexes on the memories table:

| Index | Type | Purpose |
| --- | --- | --- |
| memories_embedding_idx | HNSW (pgvector, cosine) | Fast approximate nearest neighbor for vector search |
| memories_content_fts | GIN (tsvector) | Full-text search on content |
| memories_source_idx | B-tree | Source prefix filtering |

These are created automatically during table migration — no manual setup required.

See Architecture & Design Decisions for the technical rationale behind the specific thresholds and algorithms.