Methodology
How Luxstay collects, structures, validates, and updates the data on every page. The summary below is intentionally short — auditable detail lives in the codebase and the sources registry.
Step 1
Curate destinations
A human-curated list of destinations defines the Year-1 catalog (60 Vietnam destinations across three priority tiers). Curation considers search demand, traveler intent, and geographic coverage — not editorial favouritism.
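A curated catalog like this can be held as a plain typed list. The sketch below is a minimal illustration that assumes tier 1 is the highest priority. The `CuratedDestination` shape, its `slug` field, and the `groupByTier` helper are hypothetical names, not taken from the Luxstay codebase.

```typescript
// Priority tiers; treating 1 as the highest priority is an assumption.
type Tier = 1 | 2 | 3;
const TIERS: readonly Tier[] = [1, 2, 3];

// Hypothetical catalog entry: only "destination" and "tier" come from the page.
interface CuratedDestination {
  slug: string; // illustrative URL-safe identifier
  name: string;
  tier: Tier;
}

// Reject duplicate slugs, then group entries by tier so downstream steps
// can process the catalog tier by tier in priority order.
function groupByTier(catalog: readonly CuratedDestination[]): Map<Tier, CuratedDestination[]> {
  const seen = new Set<string>();
  const groups = new Map<Tier, CuratedDestination[]>();
  for (const tier of TIERS) groups.set(tier, []);
  for (const entry of catalog) {
    if (seen.has(entry.slug)) throw new Error(`Duplicate slug in catalog: ${entry.slug}`);
    seen.add(entry.slug);
    groups.get(entry.tier)!.push(entry);
  }
  return groups;
}
```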
Step 2
Hydrate from open sources
For every destination we fetch records from GeoNames (geography, population, timezone), Wikidata (cross-locale identifiers), Wikipedia (narrative summaries), and OpenStreetMap (POIs). Each raw payload is recorded as a row in the entity_sources audit table, so every fact can be traced back to its source.
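One way to make every fact traceable is to have each extracted value carry the ID of the audit row that holds its raw payload. This is a minimal in-memory sketch: only the entity_sources table name and the four sources come from this page, while the column names (`entityId`, `sourceRef`, `fetchedAt`, `payload`) and the `SourcedFact` type are hypothetical.

```typescript
// The four open sources named on this page.
type SourceName = "geonames" | "wikidata" | "wikipedia" | "openstreetmap";

// Hypothetical shape of one entity_sources audit row.
interface EntitySourceRow {
  id: number;
  entityId: string;  // the destination this payload belongs to
  source: SourceName;
  sourceRef: string; // source-side identifier (assumed column)
  fetchedAt: string; // ISO-8601 timestamp
  payload: unknown;  // raw response, stored unmodified
}

// An extracted value that points back at the row it came from.
interface SourcedFact<T> {
  value: T;
  sourceRowId: number;
}

// In-memory stand-in for the audit table.
class EntitySourcesTable {
  private rows: EntitySourceRow[] = [];
  private nextId = 1;

  // Store a raw payload and return its row ID so facts can cite it.
  record(entityId: string, source: SourceName, sourceRef: string, payload: unknown, fetchedAt: Date = new Date()): number {
    const id = this.nextId++;
    this.rows.push({ id, entityId, source, sourceRef, fetchedAt: fetchedAt.toISOString(), payload });
    return id;
  }

  // Trace a fact back to the payload it was extracted from.
  find(id: number): EntitySourceRow | undefined {
    return this.rows.find((r) => r.id === id);
  }

  // All payloads recorded for one destination, in insertion order.
  forEntity(entityId: string): EntitySourceRow[] {
    return this.rows.filter((r) => r.entityId === entityId);
  }
}
```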
Step 3
AI structuring (extract, never invent)
Source data is fed to Anthropic's Claude with a tightly scoped extraction prompt. The model is instructed to extract, not invent, factual values. Where source data is missing, the field is omitted rather than hallucinated. Each call returns a JSON document that is validated against a Zod schema before anything is written to the database.
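The validation gate can be sketched without dependencies. The page says validation uses a Zod schema; in Zod, the equivalent would be a `z.object({...}).strict()` schema with `.optional()` fields, checked with `safeParse`. The version below swaps in a hand-written validator so the rules are visible: every field is optional (missing values are simply omitted), while nulls, unknown keys, and wrongly typed values are rejected. The fields in `DestinationFacts` are hypothetical.

```typescript
// Hypothetical extraction output. Every field is optional: a value the sources
// do not contain is left out, never filled with a guess or a null placeholder.
interface DestinationFacts {
  population?: number;
  timezone?: string;
  elevationM?: number;
  summary?: string;
}

type Result<T> = { ok: true; data: T } | { ok: false; errors: string[] };

// Expected runtime type for each allowed key.
const FIELD_TYPES: Record<keyof DestinationFacts, "number" | "string"> = {
  population: "number",
  timezone: "string",
  elevationM: "number",
  summary: "string",
};

// Validate parsed model output before any database write.
function validateFacts(input: unknown): Result<DestinationFacts> {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, errors: ["expected a JSON object"] };
  }
  const errors: string[] = [];
  for (const [key, value] of Object.entries(input)) {
    // hasOwnProperty avoids accepting inherited names such as "toString".
    if (!Object.prototype.hasOwnProperty.call(FIELD_TYPES, key)) {
      errors.push(`unknown field: ${key}`);
      continue;
    }
    const expected = FIELD_TYPES[key as keyof DestinationFacts];
    // Numbers must also be finite, so NaN and Infinity are rejected.
    const finiteOk = expected !== "number" || Number.isFinite(value as number);
    if (typeof value !== expected || !finiteOk) errors.push(`${key}: expected ${expected}`);
  }
  if (errors.length > 0) return { ok: false, errors };
  // Every key is known and well typed, so the object conforms; return a copy.
  return { ok: true, data: { ...(input as DestinationFacts) } };
}
```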
Step 4
Cost & audit logging
Every Claude call writes a row to the content_generations table recording the model, prompt version, token counts, latency, and cost in USD. This makes content economics transparent and lets us regenerate specific pages without re-running the whole catalog.
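The cost figure follows directly from the token counts: with prices quoted per million tokens, cost = (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below assumes hypothetical camelCase column names and caller-supplied prices; no real model prices are hard-coded. At a hypothetical 3 USD and 15 USD per million input and output tokens, a call using 2,000 input and 500 output tokens costs (6,000 + 7,500) / 1,000,000 = 0.0135 USD. The `pagesOnStalePrompt` helper, also hypothetical, shows how per-call rows allow targeted regeneration.

```typescript
// Per-million-token prices in USD, supplied by the caller.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Illustrative shape of a content_generations row.
interface ContentGenerationRow {
  pageId: string;
  model: string;
  promptVersion: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  costUsd: number;
}

function costUsd(inputTokens: number, outputTokens: number, pricing: ModelPricing): number {
  return (inputTokens * pricing.inputPerMTok + outputTokens * pricing.outputPerMTok) / 1e6;
}

// Append one row per model call, computing cost from the recorded token counts.
function logGeneration(
  log: ContentGenerationRow[],
  call: Omit<ContentGenerationRow, "costUsd">,
  pricing: ModelPricing,
): ContentGenerationRow {
  const row = { ...call, costUsd: costUsd(call.inputTokens, call.outputTokens, pricing) };
  log.push(row);
  return row;
}

// Pages whose latest generation used an older prompt version; only these
// need re-running after a prompt bump. Assumes the log is chronological.
function pagesOnStalePrompt(log: readonly ContentGenerationRow[], currentPromptVersion: string): string[] {
  const latest = new Map<string, ContentGenerationRow>();
  for (const row of log) latest.set(row.pageId, row);
  return Array.from(latest.values())
    .filter((r) => r.promptVersion !== currentPromptVersion)
    .map((r) => r.pageId);
}
```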
Step 5
Comparison engine
Head-to-head comparison pages are generated from already-extracted destination facts: Claude writes only the comparison narrative, never the underlying numbers. Data points displayed in the comparison table are derived from structured facts, not free-form text.
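A table built this way can be sketched as a pure function over two sets of extracted facts, so no cell can originate in model output; the finished table could then be handed to Claude as context for the narrative. The compared fields, the `No data` placeholder for one-sided gaps, and dropping rows missing on both sides are assumptions for illustration.

```typescript
// Structured facts already extracted for one destination (illustrative fields).
interface DestinationFacts {
  population?: number;
  elevationM?: number;
  timezone?: string;
}

interface ComparedField {
  key: keyof DestinationFacts;
  label: string;
}

interface ComparisonRow {
  label: string;
  left: string;
  right: string;
}

// Which facts appear in the table, and in what order (assumed).
const COMPARED_FIELDS: readonly ComparedField[] = [
  { key: "population", label: "Population" },
  { key: "elevationM", label: "Elevation (m)" },
  { key: "timezone", label: "Timezone" },
];

// Every cell is copied from structured facts. Rows missing on both sides are
// dropped rather than invented; a one-sided gap is marked explicitly.
function buildComparisonTable(left: DestinationFacts, right: DestinationFacts): ComparisonRow[] {
  const cell = (v: number | string | undefined): string => (v === undefined ? "No data" : String(v));
  return COMPARED_FIELDS
    .filter(({ key }) => left[key] !== undefined || right[key] !== undefined)
    .map(({ key, label }) => ({ label, left: cell(left[key]), right: cell(right[key]) }));
}
```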
Step 6
Update cadence
Source data is re-pulled on a scheduled cadence (Year 1: monthly). Pages are regenerated when source data changes meaningfully or when the prompt version is bumped. Every regeneration increments the page's generation_version so historical content is auditable.
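The regeneration rule can be sketched as a state transition on each page. This sketch assumes "changes meaningfully" is approximated by fingerprinting the extracted facts rather than the raw payload, so payload noise that never reaches the page does not trigger a rebuild. The fingerprint is a 32-bit FNV-1a hash over UTF-16 code units, which suits change detection but not security. `PageState` and `nextPageState` are hypothetical names.

```typescript
type FlatFacts = Record<string, string | number>;

// Serialise with sorted keys so key order in the source does not change the fingerprint.
function canonicalJson(facts: FlatFacts): string {
  const sorted: FlatFacts = {};
  for (const k of Object.keys(facts).sort()) sorted[k] = facts[k];
  return JSON.stringify(sorted);
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

function fingerprint(facts: FlatFacts): string {
  return fnv1a(canonicalJson(facts));
}

interface PageState {
  generationVersion: number;
  sourceFingerprint: string;
  promptVersion: string;
}

// Regenerate only when the facts or the prompt version changed; each
// regeneration increments generationVersion.
function nextPageState(page: PageState, facts: FlatFacts, currentPromptVersion: string): PageState {
  const fp = fingerprint(facts);
  if (fp === page.sourceFingerprint && page.promptVersion === currentPromptVersion) return page;
  return { generationVersion: page.generationVersion + 1, sourceFingerprint: fp, promptVersion: currentPromptVersion };
}
```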
Step 7
Affiliate independence
Editorial content (rankings, comparisons, recommendations) is produced before any affiliate or partner data is layered on. We never re-rank destinations or hide negatives based on commission. Affiliate links are clearly disclosed and tracked separately from page content.
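The ordering guarantee can be expressed in code: affiliate data is joined onto an already-final ranking and may only add a disclosed link. This is a minimal sketch with hypothetical types; `layerAffiliates` and `assertRankingUnchanged` are illustrative names, not Luxstay APIs.

```typescript
interface RankedDestination {
  slug: string;
  rank: number; // produced by the editorial step alone
}

interface AffiliateOffer {
  slug: string;
  url: string;
}

interface RenderedEntry extends RankedDestination {
  affiliate?: { url: string; disclosed: true };
}

// Attach offers after ranking. Order and rank are copied through untouched;
// an offer can only add a disclosed link to an existing entry.
function layerAffiliates(
  ranking: readonly RankedDestination[],
  offers: readonly AffiliateOffer[],
): RenderedEntry[] {
  const urlBySlug = new Map<string, string>();
  for (const o of offers) urlBySlug.set(o.slug, o.url);
  return ranking.map((d): RenderedEntry => {
    const url = urlBySlug.get(d.slug);
    return url === undefined ? { ...d } : { ...d, affiliate: { url, disclosed: true } };
  });
}

// Render-time guard: any reordering or re-ranking is treated as a bug.
function assertRankingUnchanged(before: readonly RankedDestination[], after: readonly RankedDestination[]): void {
  const same =
    before.length === after.length &&
    before.every((d, i) => after[i].slug === d.slug && after[i].rank === d.rank);
  if (!same) throw new Error("Affiliate layer changed the editorial ranking");
}
```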
Hard rules
Constraints we don't deviate from. Violations are bugs.
- No scraping of OTA pages (Airbnb, Booking, Vrbo, TripAdvisor) — affiliate APIs only.
- No copying of individual user reviews. Themes only, derived from our own affiliate-API data.
- No invented facts. Where a source lacks a value, the field is omitted.
- No editorial filler. Prose stays factual, hedged where source data is approximate.
- No consumer profiling. Subscriber emails are stored on their own and never joined to behavioural data.
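The first hard rule lends itself to an automated guard in the fetch layer. This is a minimal sketch that assumes a hypothetical exact-match allowlist of approved affiliate API hosts, checked first in case an approved API is served from a subdomain of a blocked domain. The blocklist holds only the four .com domains named above, so any regional domains would need adding. Callers would pass `new URL(url).hostname`.

```typescript
// OTA domains named in the hard rules; their subdomains are blocked too.
const BLOCKED_OTA_DOMAINS: readonly string[] = ["airbnb.com", "booking.com", "vrbo.com", "tripadvisor.com"];

// Returns true if a fetch to `hostname` must be refused.
// `approvedApiHosts` is a hypothetical exact-match allowlist for affiliate APIs.
function isBlockedHost(hostname: string, approvedApiHosts: ReadonlySet<string>): boolean {
  // Normalise case and a trailing root dot ("booking.com." == "booking.com").
  const host = hostname.toLowerCase().replace(/\.$/, "");
  if (approvedApiHosts.has(host)) return false;
  // Match the domain itself or any subdomain, but not look-alikes such as "notbooking.com".
  return BLOCKED_OTA_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`));
}
```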
Spotted an inaccuracy? Email [email protected] with the page URL and we'll trace it back to the source row in the audit table.