May 1, 2025
Combining Traditional Keyword Logic with Cutting-Edge Vector Text Embedding Technology
Why hybrid search—combining traditional keyword logic with cutting-edge vector technology—has become essential for any data-driven product.
You know the moment: you punch “latest Volvo electric SUV safety reviews” into a site search, and—despite the fact that you know the documents exist—you’re staring at page three of irrelevant hits. Classic keyword search has failed you, yet pure “AI” search often misses the exact phrase you needed. The fix isn’t more synonyms or a bigger model. It’s teaching your stack to think in both words and meaning at the same time.
Welcome to hybrid search—the strategy that lets your application find the right document whether the user’s wording is precise, sloppy, or somewhere in between. In this thought-leadership tour we’ll unpack why hybrid search matters, how it actually works, and what trade-offs you face when you implement it in popular platforms like PostgreSQL + pgvector versus MongoDB Atlas.
To learn more about improving search and RAG for generative AI (finding those needles in the haystack), check out this article on vector search, reranking, and BM25.
Why We Outgrew “Either-Or” Search
Traditional Full-Text Search (FTS) engines excel when the query contains the exact tokens stored in your inverted index. They use ranking formulas such as BM25 (think: TF-IDF on steroids) to reward documents that repeat rarer terms in just the right proportion.
But FTS is lexically blind: switch “auto” to “car,” “regulation” to “compliance,” or change word order, and recall collapses.
Modern vector (semantic) search flips the script. Large Language Models (LLMs) embed whole sentences into a “dense” vector space where meaning dictates proximity. Ask for “ways to cross water” and a vector search happily proposes “ferry,” “boat,” and “canoe”—even though none of those words appear in your query.
Yet vectors struggle with needle-in-a-haystack keywords—serial numbers, medical codes, part IDs—that never appeared in the model’s training data.
Hybrid search fuses these strengths. You run two (or three) retrieval routes, then blend the ranked lists so users see answers that are both semantically on-topic and lexically precise.
Quick Jargon Decoder
| Term | What It Really Means |
|---|---|
| BM25 | A ranking formula that balances term frequency and rarity. |
| FTS (Full-Text Search) | Classic keyword search using an inverted index. |
| Embedding | A numeric vector representing meaning. |
| ANN (Approximate Nearest Neighbor) | Algorithms (e.g., HNSW, IVFFlat) that find vectors close—but not exactly nearest—fast. |
| Sparse vs. Dense | Sparse vectors are huge but mostly zeros (great for word presence); dense vectors are smaller arrays of floats (great for meaning). |
| RAG (Retrieval-Augmented Generation) | An LLM workflow that pulls external facts before answering. |
| HyDE | “Hypothetical Document Embeddings”: ask an LLM to draft a guess first, then embed that guess to improve retrieval. |
Lexical Search 101: Why BM25 Still Pays the Bills
Full-text search works because of an inverted index: a map from each term to the documents that contain it. BM25 then scores matches by three ingredients (a toy scoring function follows the list):
- TF (Term Frequency): how often the term appears.
- IDF (Inverse Document Frequency): how rare the term is.
- Length normalisation: prevents 10-page docs from always winning.
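To make those ingredients concrete, here is a toy term score in Python. It is a sketch of the standard BM25 weighting with common defaults (k1 = 1.2, b = 0.75), not any particular engine’s exact implementation, and the numbers in the example are invented:

```python
import math

def bm25_term_score(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """Score one term in one document with the classic BM25 weighting."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))          # rarity reward
    tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm

# A term appearing 3x in a 120-word doc, found in 5 of 1,000 docs (avg length 100):
print(bm25_term_score(tf=3, df=5, doc_len=120, avg_doc_len=100, n_docs=1000))
```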
On PostgreSQL you store tokens in a `tsvector` column, index it with GIN, and query via the `@@` operator. MongoDB Atlas Search, OpenSearch, and Elastic all rely on Lucene’s implementation. Strengths: bullet-proof for IDs, SKUs, or strict phrasing. Weaknesses: synonyms kill it; context is invisible.
Semantic Search 101: Embeddings & ANN
Dense embeddings collapse a sentence like “Flying insects can pollinate crops” into, say, 768 floating-point numbers. Similar sentences cluster nearby. To search efficiently you build an ANN index such as HNSW; think of it as a tiny social network of vectors where you “hop” toward closest friends. Strengths: handles paraphrases, typos, and concept drift. Weaknesses: may skip over rare but crucial tokens; black-box scoring.
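To see “meaning dictates proximity” in miniature, here is a NumPy sketch; the four-dimensional vectors and the document snippets are invented for illustration (real models emit hundreds or thousands of dimensions):

```python
import numpy as np

# Tiny made-up 4-dimensional embeddings standing in for real model output.
query = np.array([0.1, 0.9, 0.2, 0.4])   # "ways to cross water"
docs = np.array([
    [0.1, 0.8, 0.3, 0.5],                # "ferry schedules across the bay"
    [0.9, 0.1, 0.7, 0.0],                # "quarterly tax compliance form"
])

# Cosine similarity: direction (meaning) matters, magnitude does not.
sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
print(sims)  # the on-topic doc scores markedly higher
```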
Two Routes, One Result List
The secret sauce isn’t running two searches—it’s fusing their rankings.
Reciprocal Rank Fusion (RRF)
A dead-simple trick: if a document is rank r in a result set, give it a score of `1 / (k + r)` (with k ≈ 60). Sum the scores from both lists. RRF ignores raw scores (which live on different scales) and rewards anything that cracks the top 100 of either search.
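In code, RRF is only a few lines. A minimal Python sketch, assuming each route hands back an ordered list of document IDs:

```python
def rrf_fuse(keyword_ids, vector_ids, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked in (keyword_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Placing high in BOTH lists beats topping just one:
print(rrf_fuse(["d1", "d2", "d3"], ["d2", "d4", "d1"]))  # d2 wins
```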
Relative Score Fusion (RSF)
Normalise each score to 0-1, weight them (e.g., 70 % vector, 30 % keyword), and add. RSF lets you emphasize precision or recall, but you must tame outliers.
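A minimal RSF sketch in Python, assuming each route hands back a dict of raw scores; min-max normalisation keeps it simple, though production systems often clip outliers first:

```python
def rsf_fuse(keyword_scores, vector_scores, w_vec=0.7, w_kw=0.3):
    """Relative Score Fusion: min-max normalise each list, then weighted sum."""
    def normalise(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0          # guard against a uniform list
        return {doc: (s - lo) / span for doc, s in scores.items()}

    kw, vec = normalise(keyword_scores), normalise(vector_scores)
    fused = {doc: w_kw * kw.get(doc, 0.0) + w_vec * vec.get(doc, 0.0)
             for doc in kw.keys() | vec.keys()}
    return sorted(fused, key=fused.get, reverse=True)

# BM25 scores and cosine similarities live on different scales; normalise first:
print(rsf_fuse({"d1": 12.0, "d2": 4.5}, {"d2": 0.91, "d3": 0.80}))
```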
Cross-Encoder Rerankers
Once you have your top 100, pass each (query, document) pair to a heavier BERT-style model that reads both together and re-orders the list. Cohere Rerank or Hugging Face cross-encoder checkpoints plug in here. This step is optional but often lifts precision another 5-15 %.
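Here’s one way to wire that in with the open-source sentence-transformers library; the MS MARCO checkpoint and the sample texts are illustrative, and a hosted reranker like Cohere’s slots into the same spot:

```python
from sentence_transformers import CrossEncoder

# A public MS MARCO cross-encoder checkpoint; swap in your own.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "latest Volvo electric SUV safety reviews"
candidates = [
    "Volvo EX90 crash-test results and safety review roundup.",
    "How to change the oil on a 1998 sedan.",
]

# The model reads each (query, document) pair together and emits a relevance score.
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```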
Choosing Your Stack
PostgreSQL + pgvector
- Pros: ACID guarantees; SQL; easy to join search results with relational data.
- pgvector now supports HNSW, plus a `sparsevec` type for learned sparse models.
- Gotchas: you write two queries (FTS and vector), then fuse them yourself—often in a CTE or in the app layer.
- Scaling is vertical first; sharding takes elbow grease.
Mini-Snippet
```sql
-- RRF in plain SQL: take the top 100 from each route, then fuse.
WITH kw AS (                                   -- route 1: keyword (FTS) search
    SELECT id,
           ts_rank_cd(search_vector, q) AS score,
           row_number() OVER (ORDER BY ts_rank_cd(search_vector, q) DESC) AS r
    FROM docs, to_tsquery('english', :query) q
    WHERE search_vector @@ q
    ORDER BY score DESC
    LIMIT 100
),
vec AS (                                       -- route 2: vector (ANN) search
    SELECT id,
           1 - (embedding <=> :query_vector) AS score,
           row_number() OVER (ORDER BY embedding <=> :query_vector) AS r
    FROM docs
    ORDER BY embedding <=> :query_vector
    LIMIT 100
)
SELECT id,
       SUM(1.0 / (60 + r)) AS rrf_score        -- RRF: 1 / (k + rank), k = 60
FROM (SELECT * FROM kw UNION ALL SELECT * FROM vec) x
GROUP BY id
ORDER BY rrf_score DESC
LIMIT 20;
```
MongoDB Atlas Search + Vector Search
- Pros: One aggregation pipeline; Atlas handles RRF or RSF for you (the `$rankFusion` and `$scoreFusion` stages in MongoDB 8.1+).
- Sharding and index builds are managed.
- Gotchas: Eventual consistency; BSON document model may not suit relational joins.
Pipeline Sketch
```javascript
// Sketch using $rankFusion (MongoDB 8.1+); index names are placeholders.
db.collection.aggregate([
  {
    $rankFusion: {
      input: {
        pipelines: {
          // Route 1: ANN search over the embedding field
          vector: [
            { $vectorSearch: { index: "vector_index", path: "embedding",
                               queryVector: qVec, numCandidates: 500, limit: 100 } }
          ],
          // Route 2: Lucene keyword search over title and content
          keyword: [
            { $search: { index: "text",
                         text: { query: qText, path: ["title", "content"] } } },
            { $limit: 100 }
          ]
        }
      },
      // Weight the vector route 2:1 over the keyword route
      combination: { weights: { vector: 2, keyword: 1 } }
    }
  },
  { $limit: 20 }
]);
```
Hybrid Search Supercharges RAG (and HyDE)
RAG pipelines live or die on context quality. If your retriever misses just one key paragraph, the LLM hallucinates or hedges. Hybrid search:
- Boosts recall: the vector path finds relevant-but-differently-worded docs.
- Anchors facts: the keyword path guarantees IDs, names, and numbers appear.
With HyDE, you can double-dip (a sketch follows the steps):
- Ask the LLM to draft a hypothetical answer.
- Embed that draft and mine its key nouns as a keyword query.
- Run hybrid search, fuse, rerank, then feed the top snippets back to the LLM.
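A sketch of that loop in Python, where `llm`, `embed`, and `hybrid_search` are hypothetical callables standing in for your model client and the fused retrieval described above:

```python
def hyde_retrieve(question, llm, embed, hybrid_search, top_k=8):
    # 1. Draft a hypothetical answer; it may be wrong, but it is worded
    #    like the documents we hope to find.
    draft = llm(f"Write a short passage that answers: {question}")
    # 2. Embed the draft (not the raw question) for the vector route, and
    #    reuse the draft's wording as extra fodder for the keyword route.
    query_vector = embed(draft)
    keyword_query = f"{question} {draft}"
    # 3. Run both routes, fuse, rerank; return the top snippets for the LLM.
    return hybrid_search(keyword_query, query_vector, limit=top_k)
```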
Teams report 2-3× reductions in hallucinations versus single-route retrieval.
Future Directions
- Learned sparse models (e.g., SPLADE, ELSER) promise lexical precision without classic BM25, blurring the line between sparse and dense.
- Unified indices that store both vector and token payloads in one HNSW graph are in active research—a holy grail for latency.
- On-the-fly query rewriting (an LLM suggests synonyms) may shrink the need for deep fusion tuning.
But for 2025 roadmaps, the pragmatic recipe is clear: two routes + smart fusion + (optional) reranker. It’s battle-tested, open-source, and deploys on anything from a single Postgres replica to a global MongoDB cluster.
Parting Thoughts
Hybrid search isn’t another fad feature—it’s a recognition that language is messy and users are unpredictable. By letting your system think in exact words and in concepts, you deliver the search experiences people secretly expect—those moments where the first hit feels like mind-reading.
If your product ships any kind of search box—and especially if you plan to bolt on RAG—you owe it to your users (and your future sanity) to make hybrid the new default.
Because finding the right answer should feel magical, not lucky.
About the Author
Rick Hightower is a software architect and technology enthusiast specializing in search technologies and distributed systems. With extensive experience in implementing enterprise-scale search solutions, Rick combines practical engineering insights with a deep understanding of information retrieval theory.
When not exploring the latest developments in vector search and natural language processing, Rick can be found mentoring junior developers and contributing to open-source projects in the search ecosystem.
Connect with Rick: LinkedIn | GitHub