Hotel Property ID Mapping: The Complete Guide for Travel Tech Teams
By Mapping Engineering
Hotel property IDs are one of the most frustrating data problems in travel tech. The same physical hotel — say, the Hilton Paris Opera — carries a different identifier on every platform it appears on. Booking.com assigns one ID, Expedia assigns another, Agoda a third, and the GDS systems (Amadeus, Sabre, Travelport) each have their own codes. None of them are the same.
This guide walks through why this happens, what it costs when you ignore it, and how to build a production-grade property matching pipeline to solve it.
Why Hotel IDs Are Fragmented
There is no universal hotel identifier. Unlike the airline industry, which has IATA codes, the hotel industry never standardized on one. Each distribution channel grew independently and assigned its own internal IDs.
OTA IDs are purely internal. Booking.com's property ID 12345 has no relationship to Expedia's property ID 12345; the two might not even be in the same city. These IDs are assigned sequentially or pseudo-randomly as properties are onboarded into each platform.
GDS codes follow a different scheme. Amadeus uses a combination of chain code + property code (e.g., HH12345 for Hilton). Sabre and Travelport have their own formats. GDS codes were designed for a pre-internet era and prioritize brevity over global uniqueness.
Wholesaler and bed bank IDs (Hotelbeds, W2M, Webbeds, etc.) are yet another namespace entirely. Each wholesaler maintains their own catalog, and when the same property appears in multiple wholesalers, the IDs differ.
Chain direct IDs add a fourth dimension. Marriott's own reservation system uses Marriott property codes that map to none of the above.
The result: a single hotel can have 10–15 different identifiers across the systems your application touches, and none of them are cross-referenceable without a mapping layer.
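As a minimal sketch of what such a mapping layer does, the Python below keeps a two-way index between supplier IDs and one canonical ID. The class name, the supplier labels (`booking_com`, `expedia`), and the in-memory storage are illustrative assumptions; a production system would more likely back this with a database table.

```python
from collections import defaultdict
from typing import Optional


class PropertyCrossReference:
    """Illustrative in-memory index: (supplier, supplier_id) <-> canonical ID."""

    def __init__(self) -> None:
        self._to_canonical: dict = {}
        self._by_canonical: dict = defaultdict(dict)

    def link(self, canonical_id: str, supplier: str, supplier_id: str) -> None:
        """Record that a supplier's ID refers to the given canonical property."""
        self._to_canonical[(supplier, supplier_id)] = canonical_id
        self._by_canonical[canonical_id][supplier] = supplier_id

    def canonical_for(self, supplier: str, supplier_id: str) -> Optional[str]:
        """Return the canonical ID for a supplier ID, or None if unmapped."""
        return self._to_canonical.get((supplier, supplier_id))

    def translate(self, supplier: str, supplier_id: str, target: str) -> Optional[str]:
        """Translate one supplier's ID into another supplier's ID, if both are known."""
        canonical_id = self.canonical_for(supplier, supplier_id)
        if canonical_id is None:
            return None
        return self._by_canonical[canonical_id].get(target)


# Sample values only; real IDs come from your suppliers and your matching results.
xref = PropertyCrossReference()
xref.link("ht_4f8a2b1c", "booking_com", "123456")
xref.link("ht_4f8a2b1c", "expedia", "78901234")
```

With the two links in place, `xref.translate("booking_com", "123456", "expedia")` resolves through the canonical ID rather than comparing the raw supplier IDs directly.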
The Cost of Not Mapping
Teams that put off building a mapping layer often underestimate the downstream consequences.
Rate shopping errors are the most immediate problem. When you pull rates from multiple suppliers and can't confirm that BK-123456 and EX-789012 are the same property, you either show duplicate listings or, worse, compare rates from two different hotels as if they're competing for the same booking.
Inventory reconciliation failures compound over time. If your warehouse treats each supplier ID as a unique property, you end up with inflated property counts, split review scores, fragmented availability data, and reporting that is structurally wrong.
Duplicate bookings are the most operationally painful outcome. Without accurate mapping, customers sometimes book the same physical room twice through different channels because your deduplication logic can't recognize they're the same property.
Manual QA costs are the silent drain. Teams without a mapping layer typically assign someone — or a small team — to manually verify property identities for high-value bookings, new supplier onboarding, and discrepancy investigations. This labor is expensive and doesn't scale.
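The rate shopping problem above can be illustrated with a short sketch: once each offer is resolved to a canonical ID, offers for the same hotel are grouped together and anything unmapped is set aside instead of being compared blindly. The `MAPPING` dictionary, the offer fields, and the rates are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical mapping rows: (supplier, supplier_id) -> canonical ID.
MAPPING = {
    ("booking_com", "123456"): "ht_4f8a2b1c",
    ("expedia", "78901234"): "ht_4f8a2b1c",
    ("booking_com", "654321"): "ht_9c3d7e2a",
}


def group_offers(offers, mapping):
    """Group offers by canonical property; return (grouped, unmapped)."""
    grouped = defaultdict(list)
    unmapped = []
    for offer in offers:
        canonical_id = mapping.get((offer["supplier"], offer["supplier_id"]))
        if canonical_id is None:
            unmapped.append(offer)  # never compare rates for unknown properties
        else:
            grouped[canonical_id].append(offer)
    return dict(grouped), unmapped


def cheapest_per_property(grouped):
    """Pick the lowest-rate offer for each canonical property."""
    return {cid: min(items, key=lambda o: o["rate"]) for cid, items in grouped.items()}


offers = [
    {"supplier": "booking_com", "supplier_id": "123456", "rate": 240.0},
    {"supplier": "expedia", "supplier_id": "78901234", "rate": 225.0},
    {"supplier": "booking_com", "supplier_id": "654321", "rate": 410.0},
    {"supplier": "agoda", "supplier_id": "999", "rate": 90.0},
]
grouped, unmapped = group_offers(offers, MAPPING)
best = cheapest_per_property(grouped)
```

Keeping unmapped offers in a separate list is a deliberate choice here: showing them as standalone listings is safer than guessing which property they belong to.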
What Hotel Property Matching Solves
A property matching API accepts a property record from your system — typically a name, city, and optionally an address or coordinates — and returns a canonical identifier with a confidence score. The API has already indexed the OTA IDs, GDS codes, and chain codes across its database, so once you identify a property in the canonical namespace, you can look up all its other identifiers.
The matching step happens once per new property. After that, you store the mapping in your own database and reuse it, re-running the match only periodically to pick up properties that were previously unmatched.
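A minimal sketch of this "match once, then store" pattern follows. The function name, the dictionary used as a store, and the `match_fn` callable (a stand-in for whatever wraps the matching API in your code) are assumptions for illustration.

```python
from typing import Callable, Optional


def resolve_canonical_id(
    supplier: str,
    supplier_id: str,
    store: dict,
    match_fn: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return the stored canonical ID, calling match_fn only on a miss."""
    key = (supplier, supplier_id)
    if key in store:
        return store[key]
    canonical_id = match_fn(supplier, supplier_id)
    # Only successful matches are stored, so an unmatched property
    # is retried the next time it is looked up.
    if canonical_id is not None:
        store[key] = canonical_id
    return canonical_id
```

Not storing failed lookups keeps the door open for a later re-match; whether to rate-limit those retries depends on your API quota.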
Building the Pipeline: Step by Step
Step 1: Collect Your Inventory
Export a CSV from your current system with at minimum a property name and city. Additional fields improve accuracy:
name,city,country,address,supplier_code
Hilton Paris Opera,Paris,France,108 Rue Saint-Lazare,BK-123456
Marriott Times Square,New York,USA,1535 Broadway,EX-789012
Grand Hyatt Tokyo,Tokyo,Japan,2-1-1 Nishi-Shinjuku,AG-345678
Required columns: name, city
Recommended: country, address, supplier_code (your internal or OTA ID for audit trail)
Clean the data before submission. Remove duplicates, standardize encoding to UTF-8, and ensure the name column contains the property's common trading name rather than a legal entity name.
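The cleaning steps above can be sketched with the standard library. This assumes the export is UTF-8 (possibly with a byte-order mark) and treats only exact repeats of a whole row as duplicates; `clean_inventory` is a hypothetical helper name, and detecting legal-entity names is left out because it needs domain rules.

```python
import csv
import io
import unicodedata


def clean_inventory(raw_bytes: bytes) -> list:
    """Decode, trim, normalize, and de-duplicate inventory rows."""
    text = raw_bytes.decode("utf-8-sig")  # also strips a leading BOM if present
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned = []
    for row in reader:
        # Trim whitespace and normalize Unicode so visually equal names compare equal.
        row = {
            key: unicodedata.normalize("NFC", (value or "").strip())
            for key, value in row.items()
            if key is not None
        }
        # name and city are the required columns; skip rows missing either.
        if not row.get("name") or not row.get("city"):
            continue
        fingerprint = tuple(value.casefold() for value in row.values())
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned
```

Rows that differ only in `supplier_code` are kept on purpose, since that column is the audit trail back to each supplier.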
Step 2: Submit via Batch API
For files under a few thousand rows, submit as a single batch job:
curl -X POST https://api.mapping.travel/v1/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@hotels.csv" \
  -F "webhook_url=https://yourapp.com/webhooks/mapping"
The response returns a job ID immediately:
{
  "job_id": "job_01HXYZ9ABC123",
  "status": "queued",
  "total_rows": 3,
  "estimated_seconds": 12,
  "webhook_url": "https://yourapp.com/webhooks/mapping"
}
For larger inventories, split the file into batches of at most 50,000 rows and submit them in parallel. The API is stateless, so jobs do not depend on each other.
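One way to do the splitting and parallel submission is sketched below. `chunk_rows` and `submit_in_parallel` are hypothetical helpers, and `submit_fn` stands in for whatever function uploads one batch in your code; no HTTP call is made here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator


def chunk_rows(rows: Iterable, size: int = 50_000) -> Iterator[list]:
    """Yield consecutive batches of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def submit_in_parallel(batches: Iterable[list], submit_fn: Callable, max_workers: int = 4) -> list:
    """Submit batches concurrently; results come back in batch order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit_fn, batches))
```

Because jobs are independent, a thread pool is enough for this I/O-bound work; the `max_workers` default is arbitrary and should respect whatever rate limits apply to your account.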
Step 3: Retrieve Results and Handle Confidence Tiers
Poll the job status endpoint or wait for the webhook callback. The result set returns one record per input row:
{
  "job_id": "job_01HXYZ9ABC123",
  "status": "completed",
  "results": [
    {
      "input_name": "Hilton Paris Opera",
      "input_city": "Paris",
      "match_id": "ht_4f8a2b1c",
      "match_name": "Hilton Paris Opera",
      "confidence": "HIGH",
      "score": 0.9541,
      "booking_com_id": "123456",
      "expedia_id": "78901234",
      "agoda_id": "12345678"
    },
    {
      "input_name": "Grand Hyatt Tokyo",
      "input_city": "Tokyo",
      "match_id": "ht_9c3d7e2a",
      "match_name": "Grand Hyatt Tokyo",
      "confidence": "MEDIUM",
      "score": 0.7218,
      "booking_com_id": "654321",
      "expedia_id": null,
      "agoda_id": "87654321"
    },
    {
      "input_name": "Some Unknown Hotel",
      "input_city": "New York",
      "match_id": null,
      "match_name": null,
      "confidence": "NO_MATCH",
      "score": 0.3102,
      "unmatch_reason": "BELOW_THRESHOLD"
    }
  ]
}
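If you poll rather than wait for the webhook, a loop along these lines may help. This guide does not give the status endpoint's URL, so `fetch_status` is a callable you supply that returns the job payload as a dict; the `"failed"` status and the interval and timeout defaults are assumptions.

```python
import time
from typing import Callable


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports "completed", then return its payload."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(job_id)
        status = payload.get("status")
        if status == "completed":
            return payload
        if status == "failed":  # assumed status name, not confirmed by this guide
            raise RuntimeError(f"job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.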
Step 4: Process Results by Confidence Tier
# Score cut-offs per tier, kept for reference. The routing below uses
# the confidence label returned by the API, not these values.
CONFIDENCE_THRESHOLDS = {
    "HIGH": 0.80,
    "MEDIUM": 0.60,
    "LOW": 0.54,
}


def process_batch_results(results: list[dict]) -> dict:
    """Split results into accepted, review_queue, and rejected lists."""
    accepted = []
    review_queue = []
    rejected = []
    for result in results:
        confidence = result.get("confidence")
        match_id = result.get("match_id")
        if confidence == "HIGH" and match_id:
            # High-confidence matches are stored without manual review.
            accepted.append({
                "input_name": result["input_name"],
                "canonical_id": match_id,
                "score": result["score"],
                "booking_com_id": result.get("booking_com_id"),
                "expedia_id": result.get("expedia_id"),
            })
        elif confidence in ("MEDIUM", "LOW") and match_id:
            # Plausible but uncertain matches go to a human reviewer.
            review_queue.append({
                "input_name": result["input_name"],
                "suggested_id": match_id,
                "suggested_name": result["match_name"],
                "score": result["score"],
                "confidence": confidence,
            })
        else:
            # NO_MATCH, or any result without a match_id.
            rejected.append({
                "input_name": result["input_name"],
                "reason": result.get("unmatch_reason", "NO_MATCH"),
            })
    return {
        "accepted": accepted,
        "review_queue": review_queue,
        "rejected": rejected,
    }
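If you want to cross-check the API's label against the raw score, or apply stricter local cut-offs, a small helper can derive a tier from the score. This sketch assumes each threshold is an inclusive lower bound for its tier; the API may compute its labels differently, and `tier_for_score` is a hypothetical name.

```python
CONFIDENCE_THRESHOLDS = {
    "HIGH": 0.80,
    "MEDIUM": 0.60,
    "LOW": 0.54,
}


def tier_for_score(score: float, thresholds: dict = CONFIDENCE_THRESHOLDS) -> str:
    """Return the highest tier whose lower bound the score meets, else NO_MATCH."""
    ordered = sorted(thresholds.items(), key=lambda item: item[1], reverse=True)
    for tier, minimum in ordered:
        if score >= minimum:
            return tier
    return "NO_MATCH"
```

Applied to the sample response above, the scores 0.9541, 0.7218, and 0.3102 fall into HIGH, MEDIUM, and NO_MATCH respectively, which agrees with the labels shown there.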
Step 5: Maintain Your Mapping Table
Store results in a dedicated table in your database. A minimal schema:
CREATE TABLE property_mapping (
    id SERIAL PRIMARY KEY,
    canonical_id VARCHAR(32) NOT NULL,
    supplier VARCHAR(64) NOT NULL,
    supplier_id VARCHAR(128) NOT NULL,
    confidence VARCHAR(16) NOT NULL,
    score FLOAT NOT NULL,
    verified_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE (supplier, supplier_id)
);
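To show how results might land in this table, here is a sketch using Python's built-in `sqlite3` so it runs without a database server. The schema is adapted to SQLite types (the version above reads as PostgreSQL), and the supplier labels and helper names are assumptions. The upsert relies on the `UNIQUE (supplier, supplier_id)` constraint so that re-matching updates rows instead of duplicating them.

```python
import sqlite3

# SQLite adaptation of the schema above, for a self-contained example.
SCHEMA = """
CREATE TABLE property_mapping (
    id INTEGER PRIMARY KEY,
    canonical_id TEXT NOT NULL,
    supplier TEXT NOT NULL,
    supplier_id TEXT NOT NULL,
    confidence TEXT NOT NULL,
    score REAL NOT NULL,
    verified_at TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (supplier, supplier_id)
)
"""

UPSERT = """
INSERT INTO property_mapping (canonical_id, supplier, supplier_id, confidence, score)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (supplier, supplier_id) DO UPDATE SET
    canonical_id = excluded.canonical_id,
    confidence = excluded.confidence,
    score = excluded.score
"""

# Result field -> supplier label stored in the table (labels are illustrative).
SUPPLIER_FIELDS = {
    "booking_com_id": "booking_com",
    "expedia_id": "expedia",
    "agoda_id": "agoda",
}


def rows_from_result(result: dict) -> list:
    """Build one row per supplier ID present in a matched result."""
    if not result.get("match_id"):
        return []
    return [
        (result["match_id"], supplier, result[field], result["confidence"], result["score"])
        for field, supplier in SUPPLIER_FIELDS.items()
        if result.get(field) is not None
    ]


def save_result(conn: sqlite3.Connection, result: dict) -> int:
    """Upsert all rows for one result; return how many rows were written."""
    rows = rows_from_result(result)
    conn.executemany(UPSERT, rows)
    conn.commit()
    return len(rows)
```

The same `ON CONFLICT ... DO UPDATE` clause is accepted by PostgreSQL, though the placeholder style (`?` here) depends on the driver you use.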
Run re-matching periodically — monthly is sufficient for most inventories. New properties are added to the canonical database daily, so a property that returned NO_MATCH today may resolve in 30 days.
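Selecting which unmatched properties are due for another attempt can be as simple as the sketch below. It assumes you record a `last_attempt` timestamp for each unmatched property; that field and the helper name are hypothetical, and the 30-day default mirrors the monthly cadence suggested above.

```python
from datetime import datetime, timedelta, timezone


def due_for_rematch(unmatched: list, now: datetime, interval: timedelta = timedelta(days=30)) -> list:
    """Return unmatched records whose last attempt is at least `interval` old."""
    return [record for record in unmatched if now - record["last_attempt"] >= interval]


# Example: timestamps are timezone-aware so the subtraction is unambiguous.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
unmatched = [
    {"input_name": "Some Unknown Hotel", "last_attempt": now - timedelta(days=45)},
    {"input_name": "Another Hotel", "last_attempt": now - timedelta(days=3)},
]
due = due_for_rematch(unmatched, now)
```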
Handling Edge Cases
Multilingual property names are common for properties in Japan, China, the Middle East, and Eastern Europe. If your inventory uses a mixture of local scripts and romanized names, include both in the name field separated by a slash: 東京グランドハイアット / Grand Hyatt Tokyo. The semantic reranker handles transliterations without additional configuration.
Chain properties and soft brands create genuine ambiguity. A "Curio Collection by Hilton" property is both a Hilton and a distinct brand. When matching these, you will see MEDIUM rather than HIGH confidence unless the full brand name is present in the input. For chain portfolios, include the chain affiliation in the name field to improve resolution.
Same-building, different-operator cases arise in mixed-use developments where a hotel shares a name and address with a serviced apartment block or coworking space. If your inventory contains these, add a type field with hotel as the value. The API uses this field to restrict candidate retrieval to hotel properties only.
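The input conventions described for multilingual names and the type field can be applied in one preparation step before export. `prepare_row` is a hypothetical helper; it assumes the row's `name` holds the romanized name and that the local-script name, if any, is passed separately.

```python
from typing import Optional


def prepare_row(row: dict, local_name: Optional[str] = None, property_type: str = "hotel") -> dict:
    """Return a copy of the row with a combined name and a type field."""
    prepared = dict(row)
    romanized = (row.get("name") or "").strip()
    local = (local_name or "").strip()
    if local and local != romanized:
        # Local script first, then the romanized name, separated by a slash.
        prepared["name"] = f"{local} / {romanized}" if romanized else local
    else:
        prepared["name"] = romanized
    prepared["type"] = property_type
    return prepared
```

For example, calling it with `{"name": "Grand Hyatt Tokyo", "city": "Tokyo"}` and `local_name="東京グランドハイアット"` produces the slash-separated name shown earlier in this section.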
Performance Benchmarks
Accuracy is measured on a held-out dataset of 10,000 property pairs spanning 42 languages and 180 countries.
| Metric | Score |
|---|---|
| F1 Score | 0.95 |
| Precision | 0.95 |
| Recall | 0.95 |
An F1 of 0.95, with precision and recall both at 0.95, means roughly 1 in 20 predicted matches is wrong and roughly 1 in 20 true matches is missed. These errors typically involve thin, ambiguous, or duplicate-prone records, which tend to surface as MEDIUM or LOW confidence, so you can route them to human review or fall back to a stricter heuristic.
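For readers who want to reproduce these metrics on their own labeled pairs, the standard definitions are sketched below. The counts in the example are hypothetical and chosen only to land on 0.95; they are not the benchmark's actual confusion matrix.

```python
def precision_recall_f1(true_positives: int, false_positives: int, false_negatives: int) -> tuple:
    """Compute precision, recall, and F1 from match counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 95 correct matches, 5 wrong matches, 5 missed matches.
precision, recall, f1 = precision_recall_f1(95, 5, 5)
```

Precision is the share of returned matches that are correct, recall is the share of true matches that were found, and F1 is their harmonic mean, so it equals both when the two are the same.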
The system is calibrated to prefer not-matching over making a wrong match. NO_MATCH is a feature, not a failure: a confident "I don't know" is more useful than a guess.
Next Steps
- Sign up for a free account to access 2 mappings per day at no cost
- Use the batch API endpoint for bulk processing
- Read the confidence levels guide to tune your automation thresholds
- For high-volume use cases, contact us about Enterprise pricing and dedicated infrastructure