Engineering | January 10, 2024 | 8 min read

How Our AI-Powered Hotel Matching Works

By Engineering Team

Hotel name matching is a challenging problem. Different sources use different naming conventions, abbreviations, and translations for the same property. Our matching system uses a two-stage approach, fast fuzzy candidate retrieval followed by semantic reranking, to achieve over 92% accuracy.

Stage 1: Fuzzy Candidate Retrieval

The first stage uses rapidfuzz for fast fuzzy string matching. We retrieve the top candidates based on:

  • Token-based matching - Handles word order differences ("Hilton Paris" vs "Paris Hilton Hotel")
  • Geographic distance - When coordinates are available, nearby hotels are prioritized
  • City name matching - Filters candidates by location

This stage is optimized for speed, processing thousands of hotels per second using PostgreSQL's trigram indexes.
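To make the three signals concrete, here is a minimal, standard-library-only sketch of candidate retrieval. It is an illustration under stated assumptions, not the production code: the token-overlap score is a simple stand-in for a rapidfuzz scorer such as fuzz.token_set_ratio, and the record fields (name, city, lat, lon), the 0.2 distance bonus, and the 2 km radius are hypothetical choices made for the example.

```python
import math
import re


def tokens(name):
    """Lowercase a hotel name and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def token_overlap(a, b):
    """Order-insensitive similarity: shared tokens divided by the size of the
    smaller token set. A stand-in for a rapidfuzz token-based scorer."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def retrieve_candidates(query, hotels, top_k=5, max_km=2.0):
    """Return up to top_k (score, name) pairs, best first."""
    scored = []
    for hotel in hotels:
        # City name matching: drop candidates from a different city.
        if hotel["city"].lower() != query["city"].lower():
            continue
        score = token_overlap(query["name"], hotel["name"])
        # Geographic distance: only used when both records have coordinates.
        if query.get("lat") is not None and hotel.get("lat") is not None:
            dist = haversine_km(query["lat"], query["lon"], hotel["lat"], hotel["lon"])
            # Hypothetical bonus: up to +0.2 for hotels within max_km.
            score += 0.2 * max(0.0, 1.0 - dist / max_km)
        scored.append((score, hotel["name"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

With this scorer, "Hilton Paris" and "Paris Hilton Hotel" share every token of the shorter name, so word order and the extra "Hotel" do not lower the score.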

Stage 2: Semantic Reranking

The second stage uses BGE-Reranker-Large, a cross-encoder model that understands semantic similarity. Unlike traditional fuzzy matching, it can:

  • Understand that "Hilton" and "Hilton Hotels" are the same chain
  • Recognize abbreviations ("NYC" = "New York City")
  • Handle translations and transliterations
  • Distinguish between similar but different hotels
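The reranking step itself is small: score each (query, candidate) pair with the cross-encoder and sort. The sketch below shows that shape with the scorer passed in as a function, so it runs without downloading a model. The toy_scorer here is a hypothetical placeholder based on token overlap, and it has none of the semantic abilities listed above. Loading the model through the sentence-transformers CrossEncoder class, as shown in the comment, is an assumption about tooling rather than a description of our stack.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes (query, candidate) pairs and returns one score per pair.
Scorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: Sequence[str], score_pairs: Scorer) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and return candidates best first."""
    pairs = [(query, candidate) for candidate in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(candidate, float(score)) for candidate, score in ranked]


def toy_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Placeholder scorer (Jaccard overlap of lowercase tokens).
    It only exists so the example runs without a model."""
    scores = []
    for left, right in pairs:
        a, b = set(left.lower().split()), set(right.lower().split())
        scores.append(len(a & b) / len(a | b) if a | b else 0.0)
    return scores


# With a real cross-encoder (assumed setup, requires sentence-transformers):
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("BAAI/bge-reranker-large")
#   ranked = rerank("Hilton NYC", candidates, model.predict)
```

Because a cross-encoder reads both names together, it is much slower per pair than fuzzy matching, which is why it only sees the short candidate list produced by Stage 1.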

Score Calibration

Raw model scores aren't directly interpretable as probabilities. We use Platt scaling (a logistic regression fitted on the raw scores) to convert them into calibrated probabilities. This allows us to:

  • Set meaningful thresholds (0.54 for match/no-match)
  • Provide confidence levels you can trust
  • Optimize for precision/recall based on your needs
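As an illustration of the calibration step, the sketch below fits the two Platt parameters a and b so that sigmoid(a * score + b) approximates the probability of a match. It is a simplified version under assumptions: it uses plain gradient descent on log-loss with hypothetical learning-rate and epoch settings, and it skips the smoothed targets from Platt's original method. Only the 0.54 threshold comes from our system.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit a and b by gradient descent on the mean log-loss.
    scores: raw reranker scores; labels: 1 for match, 0 for no match."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log-loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw score to a calibrated match probability."""
    return sigmoid(a * score + b)


def is_match(score, a, b, threshold=0.54):
    """Apply the match/no-match threshold to the calibrated probability."""
    return calibrate(score, a, b) >= threshold
```

In practice a and b would be fitted on a held-out labelled set, separate from the data used to train the reranker, so the probabilities are not overconfident.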

Benchmarks

Our system achieves:

Metric      Score
F1 Score    0.95
Precision   0.95
Recall      0.95

Tested on a diverse dataset of 10,000 hotel pairs across multiple languages and regions.
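For readers less familiar with these metrics, the following sketch shows how precision, recall, and F1 are computed from labelled pairs and match/no-match predictions. The function name and the list-based inputs are illustrative choices for this example, not part of our evaluation tooling.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary match labels.
    y_true and y_pred are equal-length sequences of 0/1 (or False/True)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)        # correct matches
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)    # false matches
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)    # missed matches
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Since F1 is a harmonic mean, it equals precision and recall whenever those two are equal, which is why all three rows in the table show the same value.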

Alternative Models

We've benchmarked several models:

Model                            Best F1   Threshold
Mapping production model         0.95      0.54
bge_reranker_v2_m3               0.9188    0.48
gte_multilingual_reranker_base   0.9119    0.48
ms_marco_minilm_l6_v2            0.8936    0.66
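The "Best F1" and "Threshold" columns pair each model with the decision threshold at which its F1 peaks. One simple way to find such a threshold is a grid sweep over candidate values, sketched below. The 0.01 step size and the function name are assumptions made for this example, and the sweep shown is not necessarily the exact procedure behind the table.

```python
def best_f1_threshold(probs, labels, steps=100):
    """Sweep thresholds 0.00, 0.01, ..., 1.00 and return (threshold, f1)
    for the lowest threshold that reaches the highest F1.
    probs: calibrated match probabilities; labels: 1 for match, 0 otherwise."""
    best_t, best_f1 = 0.0, 0.0
    for i in range(steps + 1):
        t = i / steps
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # same as 2PR / (P + R)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

The threshold should be chosen on a validation split rather than the test set, otherwise the reported best F1 is optimistic.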

Our production model, a reranker fine-tuned from BGE-Reranker-Large, provides the best balance of accuracy and speed for hotel property matching.