
Implementing hybrid search: combining semantic and keyword retrieval
Semantic search transformed information retrieval by understanding meaning rather than matching keywords. Yet pure semantic search has blind spots. Exact terms, product codes, technical identifiers, and rare words often get lost in the semantic space. Hybrid search combines the contextual understanding of dense retrieval with the precision of traditional keyword matching, delivering better results than either approach alone.
The limitations of pure semantic search
Dense retrieval using embedding models excels at understanding intent and finding conceptually related content. A query about "reducing employee turnover" will match documents discussing "staff retention strategies" even without shared keywords. This semantic understanding is powerful but comes with limitations.
Vocabulary mismatch
Embedding models have fixed vocabularies learned during training. Domain-specific terms, product names, acronyms, and technical jargon may not be well represented in the embedding space. A query for "HDMI 2.1 specifications" may rank the relevant documents poorly if the model lacks a strong representation for that exact term.
Exact match requirements
Some queries demand exact matches. Serial numbers, case IDs, policy numbers, and other identifiers must match precisely. Semantic search might find documents about similar topics but miss the specific document containing "Case-2024-0847" when that exact identifier is needed.
Rare terms and named entities
Less common words may not have strong embeddings. Proper nouns, unusual technical terms, and domain-specific vocabulary can produce weak or unreliable vector representations. Traditional keyword search handles these cases reliably because it does not depend on learned representations.
When keyword search wins
- Identifiers: Serial numbers, case IDs, policy numbers, SKUs
- Technical terms: Error codes, API names, protocol versions
- Proper nouns: Company names, product names, person names
- Acronyms: Domain-specific abbreviations not in training data
- Exact phrases: Legal language, regulatory citations, quoted text
Understanding sparse and dense retrieval
Before implementing hybrid search, it helps to understand what distinguishes the two retrieval paradigms.
Sparse retrieval (keyword-based)
Traditional keyword search represents documents and queries as sparse vectors where each dimension corresponds to a term in the vocabulary. Methods like BM25 weight terms based on frequency and document length. The resulting vectors are "sparse" because most dimensions are zero (documents contain only a fraction of all possible terms).
BM25 remains remarkably effective. It handles exact matches perfectly, scales to massive document collections, and requires no training data. Its main weakness is the vocabulary mismatch problem: documents about "automobiles" will not match queries about "cars" unless both terms appear.
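To make the mechanics concrete, here is a minimal pure-Python sketch of BM25 term weighting over a toy corpus. The parameters k1 and b are BM25's standard free parameters, set to common defaults; the corpus and query are invented for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in docs against the query terms with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Inverse document frequency per query term (rarer terms weigh more).
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    idf = {t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        s = 0.0
        for t in query_terms:
            # Term frequency saturated by k1, normalized by document length via b.
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            s += idf[t] * num / den
        scores.append(s)
    return scores

docs = [
    "the car broke down on the highway".split(),
    "automobile insurance rates rose".split(),
    "the cat sat on the mat".split(),
]
print(bm25_scores("car highway".split(), docs))
```

Note the vocabulary mismatch in action: the second document, which only says "automobile", scores zero for the query "car highway".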
Dense retrieval (embedding-based)
Dense retrieval represents documents and queries as dense vectors (typically 384-1536 dimensions) where every dimension has a non-zero value. These embeddings capture semantic meaning, allowing conceptually similar content to have similar vector representations regardless of the specific words used.
Dense retrieval excels at understanding intent and finding relevant content even when vocabulary differs. Its weaknesses include the limitations described above, plus computational cost and the need for appropriate embedding models.
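Dense retrieval ranks by vector similarity, most commonly cosine similarity. A minimal sketch, using invented 4-dimensional embeddings purely for illustration (real models emit hundreds to thousands of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical embeddings for a query about "reducing employee turnover".
query_vec = [0.1, 0.8, 0.2, 0.4]
doc_vecs = {
    "staff retention strategies": [0.12, 0.75, 0.25, 0.35],  # semantically close
    "quarterly tax filing":       [0.90, 0.05, 0.10, 0.02],  # unrelated topic
}
ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
print(ranked[0])
```

Despite sharing no keywords with the query, the conceptually related document ranks first.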
Hybrid search strategies
Hybrid search combines sparse and dense retrieval results. Several strategies exist for how to perform this combination.
Score fusion
The most common approach retrieves results from both systems and combines their scores. Since sparse and dense scores exist on different scales, normalization is essential before combination.
Linear combination normalizes scores from each system (often to a 0-1 range) and combines them with a weighted sum: score = α × dense_score + (1-α) × sparse_score. The weight α controls the balance. Values between 0.5 and 0.7 for the dense component work well for many applications, but optimal values depend on your data and use case.
Reciprocal Rank Fusion (RRF) combines results based on rank rather than score: RRF_score = Σ 1/(k + rank_i), where k is a constant (typically 60) and rank_i is the document's rank in each result list. RRF is simpler because it does not require score normalization, and it is more robust to outliers.
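Both fusion formulas can be sketched in a few lines. The document ids and scores below are invented; min-max normalization is assumed for the linear path.

```python
def linear_fusion(dense, sparse, alpha=0.6):
    """Weighted sum after min-max normalizing each score dict to [0, 1]."""
    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}
    nd, ns = norm(dense), norm(sparse)
    docs = set(dense) | set(sparse)
    return {d: alpha * nd.get(d, 0.0) + (1 - alpha) * ns.get(d, 0.0) for d in docs}

def rrf_fusion(ranked_lists, k=60):
    """RRF_score = sum over result lists of 1 / (k + rank), rank starting at 1."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores

# Raw scores on different scales: cosine similarities vs. BM25 scores.
dense = {"a": 0.92, "b": 0.85, "c": 0.40}
sparse = {"b": 12.1, "c": 9.7, "d": 4.2}
fused = linear_fusion(dense, sparse, alpha=0.6)
rrf = rrf_fusion([["a", "b", "c"], ["b", "c", "d"]])
print(max(fused, key=fused.get), max(rrf, key=rrf.get))
```

Here both methods surface document "b", which ranks highly in both result lists, even though it tops neither individually.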
Choosing a fusion method
Linear combination provides more control through the α parameter but requires careful tuning and normalization. RRF is more robust and often works well without tuning. Start with RRF for simplicity, then experiment with linear combination if you need finer control.
Query-adaptive weighting
Not all queries benefit equally from semantic search. A query containing an identifier like "INV-2024-00847" should weight keyword search heavily. A conceptual query like "best practices for onboarding new employees" benefits more from semantic understanding.
Query-adaptive approaches analyze the query to adjust weights dynamically. Simple heuristics include detecting identifiers (patterns like numbers, hyphens, specific formats), checking for quoted phrases, or measuring query length. More sophisticated approaches use classifiers trained to predict which retrieval mode will perform better.
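A simple heuristic of this kind might look like the following sketch. The regex, thresholds, and returned α values are illustrative assumptions, not a recommended production policy:

```python
import re

# Hypothetical heuristic: identifier-like tokens (e.g. "INV-2024-00847") or
# quoted phrases push weight toward keyword search; long natural-language
# queries push weight toward dense retrieval.
ID_PATTERN = re.compile(r"\b[A-Z]{2,}-?\d{2,}[\w-]*\b")

def dense_weight(query: str) -> float:
    """Return alpha, the dense-score weight for linear fusion."""
    if ID_PATTERN.search(query) or '"' in query:
        return 0.2   # exact-match signals: favor sparse retrieval
    if len(query.split()) >= 6:
        return 0.7   # longer conceptual queries: favor semantic retrieval
    return 0.5       # default: balanced

print(dense_weight("Find document INV-2024-00847"))
print(dense_weight("best practices for onboarding new employees"))
```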
Two-stage retrieval
Rather than fusing at the score level, two-stage retrieval uses one method for initial candidate selection and another for reranking. A common pattern uses BM25 to retrieve a larger candidate set (e.g., top 100), then uses dense embeddings or a cross-encoder model to rerank the candidates.
This approach balances efficiency with quality. Sparse retrieval is fast and ensures keyword matches are not missed. Dense reranking improves precision on the smaller candidate set where computational cost is manageable.
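The two stages can be sketched as follows. Both stages are deliberately simplified stand-ins: the first uses term overlap in place of a real BM25 index, and the reranker is a stub where production systems would score (query, document) pairs with a cross-encoder model.

```python
def keyword_stage(query, docs, top_n=100):
    """Stage 1: rank all docs by query-term overlap, keep the top_n candidates."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [d for s, d in scored[:top_n] if s > 0]

def rerank(query, candidates):
    """Stage 2 stub: reorder the small candidate set with a costlier scorer."""
    q = query.lower().split()
    def score(doc):
        d = doc.lower()
        words = d.split()
        s = sum(t in words for t in q)       # query-term coverage
        if query.lower() in d:               # bonus for the exact phrase
            s += 0.5
        return s
    return sorted(candidates, key=score, reverse=True)

docs = [
    "retention strategies reduce employee turnover",
    "turnover of inventory in retail",
    "annual report on finances",
]
candidates = keyword_stage("employee turnover", docs, top_n=2)
print(rerank("employee turnover", candidates)[0])
```

The expensive second stage only ever sees the small candidate set, which is what keeps the approach tractable at scale.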
Implementation approaches
Several vector databases and search platforms now support hybrid search natively. Understanding the options helps you choose the right approach for your infrastructure.
Native hybrid search
Modern vector databases increasingly support hybrid search out of the box:
- Weaviate supports hybrid search combining BM25 and vector search with configurable fusion methods and weighting.
- Qdrant provides sparse vectors alongside dense vectors, enabling hybrid retrieval with various fusion strategies.
- Pinecone offers sparse-dense search combining keyword and semantic retrieval in a single query.
- Elasticsearch with vector search plugins can combine traditional text search with dense retrieval.
Native support simplifies implementation and often provides optimized performance. The trade-off is less flexibility in fusion strategies compared to custom implementations.
Custom implementation
For more control, you can implement hybrid search by querying separate systems and fusing results in your application code. This approach offers maximum flexibility but requires more engineering effort and careful attention to latency.
A typical architecture queries a traditional search engine (Elasticsearch, OpenSearch, or Solr) for keyword results and a vector database for semantic results, then fuses in the application layer. Parallel queries help manage latency, and caching can reduce repeated computation.
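A skeleton of that architecture, with the two backends stubbed out (real code would call an Elasticsearch/OpenSearch client and a vector database client where the stubs return hard-coded ids):

```python
from concurrent.futures import ThreadPoolExecutor

# Stub backends standing in for a text index and a vector store;
# in practice these would be network calls to separate systems.
def keyword_search(query):
    return ["doc-7", "doc-2", "doc-9"]   # ranked ids from the text index

def vector_search(query):
    return ["doc-2", "doc-5", "doc-7"]   # ranked ids from the vector store

def hybrid_query(query, k=60):
    # Issue both queries in parallel: total latency ~= the slower of the two.
    with ThreadPoolExecutor(max_workers=2) as pool:
        kw = pool.submit(keyword_search, query)
        vec = pool.submit(vector_search, query)
        rankings = [kw.result(), vec.result()]
    # Fuse in the application layer with RRF.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

print(hybrid_query("refund policy"))
```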
Sparse vector approaches
An alternative to combining separate systems is using learned sparse representations that capture some semantic understanding while maintaining sparsity benefits.
SPLADE and learned sparse models
Models like SPLADE (Sparse Lexical and Expansion Model) learn to generate sparse representations that include term expansion. A query about "cars" might produce non-zero weights for related terms like "automobile" and "vehicle", addressing vocabulary mismatch while maintaining sparse representation benefits.
Learned sparse models can be used alongside dense embeddings or as a middle ground between pure keyword and pure semantic search. They maintain interpretability (you can see which terms contributed to the match) while gaining some semantic flexibility.
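A toy illustration of how a learned sparse representation bridges the vocabulary gap while staying interpretable. The term weights below are invented; a real model like SPLADE would produce them from text:

```python
# Learned-sparse vectors as term -> weight dicts, where the (hypothetical)
# model has expanded the query "cars" with related terms.
query_sparse = {"cars": 1.0, "automobile": 0.6, "vehicle": 0.5}

doc_sparse = {
    "automobile maintenance guide": {"automobile": 1.2, "maintenance": 0.9, "guide": 0.4},
    "bird watching handbook":       {"bird": 1.1, "watching": 0.8, "handbook": 0.5},
}

def sparse_dot(q, d):
    """Score = dot product over the (few) shared non-zero terms."""
    return sum(w * d[t] for t, w in q.items() if t in d)

best = max(doc_sparse, key=lambda name: sparse_dot(query_sparse, doc_sparse[name]))
print(best)
```

The expansion term "automobile" is what lets a query about "cars" match the first document, and you can see exactly which term contributed to the score.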
Tuning and evaluation
Hybrid search introduces parameters that affect retrieval quality. Systematic evaluation helps you find optimal settings for your specific use case.
Key parameters
- Fusion weight (α): Balance between dense and sparse scores. Test values from 0.3 to 0.8 in increments of 0.1.
- Number of candidates: How many results to retrieve from each system before fusion. More candidates improve recall but increase computation.
- RRF constant (k): Affects how quickly rank-based scores decay. Default of 60 works for most cases.
- Score normalization: Method for normalizing scores before linear combination (min-max, z-score, etc.).
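A sweep over the fusion weight might be sketched as below, scoring each α with mean reciprocal rank (MRR) on a tiny hand-labeled evaluation set. The scores and relevance labels are fabricated to show the shape of the procedure, not real measurements:

```python
eval_set = [
    # (dense scores, sparse scores, relevant doc id)
    ({"a": 0.95, "b": 0.40, "c": 0.30}, {"a": 0.5, "b": 7.0, "c": 1.0}, "b"),  # identifier-style query
    ({"a": 0.90, "b": 0.20, "c": 0.20}, {"a": 2.0, "b": 6.0, "c": 1.0}, "a"),  # conceptual query
]

def norm(scores):
    """Min-max normalize a score dict to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def mrr(alpha):
    """Mean reciprocal rank of the relevant doc under linear fusion at alpha."""
    total = 0.0
    for dense, sparse, relevant in eval_set:
        nd, ns = norm(dense), norm(sparse)
        fused = {d: alpha * nd[d] + (1 - alpha) * ns[d] for d in dense}
        ranking = sorted(fused, key=fused.get, reverse=True)
        total += 1.0 / (ranking.index(relevant) + 1)
    return total / len(eval_set)

# Grid search alpha from 0.3 to 0.8 in steps of 0.1.
grid = [round(a * 0.1, 1) for a in range(3, 9)]
best_alpha = max(grid, key=mrr)
print(best_alpha, mrr(best_alpha))
```

With a realistic evaluation set, the same loop also lets you compare per-category performance by filtering eval_set by query type.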
Evaluation strategy
Create an evaluation dataset with queries representing your actual use cases. Include queries where keyword search should win (exact matches, identifiers), queries where semantic search should win (conceptual questions, paraphrases), and mixed queries that benefit from both.
Measure retrieval metrics (recall, MRR, NDCG) for each configuration. Pay attention to performance on different query types. A good hybrid configuration should not significantly degrade performance on either pure keyword or pure semantic queries while improving overall results.
Evaluation query categories
- Identifier queries: "Find document INV-2024-00847"
- Technical term queries: "GDPR Article 17 requirements"
- Conceptual queries: "How to improve customer satisfaction"
- Paraphrase queries: Same intent expressed differently
- Multi-aspect queries: Combining specific terms with conceptual needs
Performance considerations
Hybrid search adds complexity and potentially latency. Consider these factors when designing your implementation.
Latency management
Running two retrieval systems instead of one can double query latency if done sequentially. Execute sparse and dense queries in parallel to minimize impact. The total latency becomes the maximum of the two rather than their sum.
Index synchronization
If using separate systems for sparse and dense retrieval, keep indexes synchronized. A document appearing in one system but not the other creates inconsistent results. Design your ingestion pipeline to update both systems atomically or implement reconciliation processes.
Resource requirements
Hybrid search requires resources for both retrieval methods. Consider whether native hybrid support in a single system (potentially simpler but less flexible) or separate specialized systems (more complex but potentially more powerful) better fits your operational constraints.
Recommendations
1. Start with native hybrid support. If your vector database offers hybrid search, use it. The implementation is simpler and usually well-optimized. Custom solutions are warranted only when you need specific fusion strategies.
2. Begin with RRF fusion. Reciprocal Rank Fusion works well without careful tuning. Start here and move to weighted linear combination only if you need finer control.
3. Build a diverse evaluation set. Include queries that favor keyword search, semantic search, and mixed approaches. Optimize for overall performance without sacrificing any category.
4. Consider query-adaptive weighting. If your queries vary significantly in type, adaptive weighting can improve results. Start with simple heuristics before building complex classifiers.
5. Monitor both retrieval paths. Track which retrieval method contributes to final results. If one method consistently dominates, you may be able to simplify, or you may need to investigate why.
Hybrid search addresses fundamental limitations of pure semantic retrieval without abandoning its benefits. For enterprise RAG systems handling diverse query types and document formats, hybrid approaches typically outperform either pure method. The implementation complexity is manageable, especially with native database support, and the retrieval quality improvements are often substantial.
Need help implementing hybrid search?
We design and implement retrieval systems optimized for your specific use cases. From architecture selection to parameter tuning, we help you achieve the best possible retrieval quality.