Luxstay

Methodology

How Luxstay collects, structures, validates, and updates the data on every page. The summary below is intentionally short — auditable detail lives in the codebase and the sources registry.

  1. Step 1

    Curate destinations

    A human-curated list of destinations defines the Year-1 catalog (60 Vietnam destinations across three priority tiers). Curation considers search demand, traveler intent, and geographic coverage — not editorial favouritism.

  2. Step 2

    Hydrate from open sources

    For every destination we fetch records from GeoNames (geography, population, timezone), Wikidata (cross-locale identifiers), Wikipedia (narrative summaries), and OpenStreetMap (points of interest). Each raw payload is recorded as a row in the entity_sources audit table, so every fact can be traced back to the source it came from.

  3. Step 3

    AI structuring (extract, never invent)

    Source data is fed to Anthropic's Claude with a tightly scoped extraction prompt. The model is instructed to extract factual values, never to invent them. Where source data is missing, the field is omitted rather than hallucinated. Each call returns a JSON document that is validated against a Zod schema before anything is written to the database.

  4. Step 4

    Cost & audit logging

    Every Claude call writes a row to the content_generations table recording the model, prompt version, token counts, latency, and cost in USD. This makes content economics transparent and lets us regenerate specific pages without re-running the whole catalog.

  5. Step 5

    Comparison engine

    Head-to-head comparison pages are generated from already-extracted destination facts; Claude only writes the comparison narrative, never the underlying numbers. Data points displayed in the comparison table are derived from structured facts, not free-form text.

  6. Step 6

    Update cadence

    Source data is re-pulled on a scheduled cadence (Year 1: monthly). Pages are regenerated when source data changes meaningfully or when the prompt version is bumped. Every regeneration increments the page's generation_version so historical content is auditable.

  7. Step 7

    Affiliate independence

    Editorial content (rankings, comparisons, recommendations) is produced before any affiliate or partner data is layered on. We never re-rank destinations or hide negatives based on commission. Affiliate links are clearly disclosed and tracked separately from page content.
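
The provenance trail described in Step 2 (Hydrate from open sources) can be sketched as follows. This is a minimal in-memory illustration, assuming an audit row stores the raw payload together with the source name and that source's own identifier; `EntitySourceRow`, `Fact`, `ProvenanceLog`, and all field names are hypothetical and not taken from the Luxstay codebase.

```typescript
// Hypothetical row shape for the entity_sources audit table. The real
// column names are not published; these fields are assumptions.
type SourceName = "geonames" | "wikidata" | "wikipedia" | "openstreetmap";

interface EntitySourceRow {
  id: number;
  entityId: string;                 // destination the payload belongs to
  source: SourceName;               // which open source supplied it
  sourceRef: string;                // identifier inside that source
  fetchedAt: string;                // ISO-8601 time of the fetch
  payload: Record<string, unknown>; // raw response, stored unmodified
}

// A structured fact carries the id of the audit row it was read from.
interface Fact {
  entityId: string;
  field: string;
  value: unknown;
  sourceRowId: number;
}

// In-memory stand-in for the audit table.
class ProvenanceLog {
  private rows: EntitySourceRow[] = [];

  // Store a raw payload and return the id of its new audit row.
  record(
    entityId: string,
    source: SourceName,
    sourceRef: string,
    payload: Record<string, unknown>,
    fetchedAt: string,
  ): number {
    const id = this.rows.length + 1;
    this.rows.push({ id, entityId, source, sourceRef, fetchedAt, payload });
    return id;
  }

  // Trace a fact back to its audit row; undefined means it has no provenance.
  trace(fact: Fact): EntitySourceRow | undefined {
    return this.rows.find((row) => row.id === fact.sourceRowId);
  }
}
```

Keeping the payload untouched and pointing facts at row ids, rather than copying source text into each fact, means a reported error can be checked against exactly what the source returned at fetch time.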
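
The extract-never-invent contract in Step 3 (AI structuring) can be enforced mechanically at the validation boundary. The sketch below is a dependency-free stand-in for a strict Zod object schema; it does not use Zod itself, and the three fields, `parseExtraction`, `take`, and the `Result` type are hypothetical. A missing key is accepted and stays missing, while `null`, a wrong type, or an unexpected key fails the whole document.

```typescript
// Hypothetical extraction target. Every field is optional: when the source
// lacks a value, the model must leave the key out rather than guess.
type DestinationFacts = { name?: string; population?: number; timezone?: string };

type Result<T> = { ok: true; value: T } | { ok: false; errors: string[] };

const ALLOWED_KEYS = new Set(["name", "population", "timezone"]);

const isString = (v: unknown): v is string => typeof v === "string";
const isNumber = (v: unknown): v is number => typeof v === "number" && Number.isFinite(v);

// Read one field: absent stays absent, present must pass its type check.
function take<T>(
  obj: Record<string, unknown>,
  key: string,
  check: (v: unknown) => v is T,
  errors: string[],
): T | undefined {
  const v = obj[key];
  if (v === undefined) return undefined;
  if (check(v)) return v;
  errors.push(`${key}: wrong type`);
  return undefined;
}

// Stand-in for a strict schema: malformed JSON, unknown keys, or mistyped
// values reject the document before it can reach the database.
function parseExtraction(raw: string): Result<DestinationFacts> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["response is not valid JSON"] };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, errors: ["response is not a JSON object"] };
  }
  const obj = data as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of Object.keys(obj)) {
    if (!ALLOWED_KEYS.has(key)) errors.push(`unexpected key: ${key}`);
  }
  const out: DestinationFacts = {};
  const name = take(obj, "name", isString, errors);
  if (name !== undefined) out.name = name;
  const population = take(obj, "population", isNumber, errors);
  if (population !== undefined) out.population = population;
  const timezone = take(obj, "timezone", isString, errors);
  if (timezone !== undefined) out.timezone = timezone;
  return errors.length > 0 ? { ok: false, errors } : { ok: true, value: out };
}
```

With Zod the same contract would be roughly `z.object({ name: z.string().optional(), ... }).strict()`, where `.optional()` accepts an absent key but rejects `null`.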
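
The USD figure logged in Step 4 (Cost & audit logging) is simple arithmetic over the token counts each API response reports; the Anthropic Messages API returns them in a `usage` object. The sketch assumes a hypothetical row shape for content_generations and per-million-token pricing. The rates passed into `Pricing` are placeholders, not Anthropic's actual prices.

```typescript
// Hypothetical row shape for the content_generations table.
interface ContentGenerationRow {
  model: string;
  promptVersion: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  costUsd: number;
}

// Per-million-token prices in USD. Placeholders only: real rates depend on
// the model and should come from configuration, not from this sketch.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// cost = (input tokens x input rate + output tokens x output rate) / 1,000,000
function costUsd(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1_000_000;
}

// Build the audit row once a call has finished. The caller measures latency
// around the API request and passes in the token counts the API reports.
function generationRow(
  model: string,
  promptVersion: string,
  pricing: Pricing,
  usage: { inputTokens: number; outputTokens: number },
  latencyMs: number,
): ContentGenerationRow {
  return {
    model,
    promptVersion,
    inputTokens: usage.inputTokens,
    outputTokens: usage.outputTokens,
    latencyMs,
    costUsd: costUsd(pricing, usage.inputTokens, usage.outputTokens),
  };
}
```

Because the prompt version is stored per row, summing `costUsd` grouped by `promptVersion` gives the cost of each prompt revision without any extra bookkeeping.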
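
The split in Step 5 (Comparison engine) between structured numbers and model-written narrative might look like this. `Facts`, `comparisonRows`, and `narrativePrompt` are hypothetical names, and rendering an omitted fact as "n/a" is an assumption about presentation. The table is built with no model involvement, and the model only ever sees the finished rows.

```typescript
// Hypothetical structured facts, as produced by the extraction step.
type Facts = Record<string, number | string | undefined>;

interface ComparisonRow {
  field: string;
  a: string; // display value for destination A
  b: string; // display value for destination B
}

// Assumed display rule: an omitted fact is shown as "n/a", never estimated.
const show = (v: number | string | undefined): string => (v === undefined ? "n/a" : String(v));

// Build table rows purely from stored facts; no model output is involved,
// so every displayed value comes straight from an extracted field.
function comparisonRows(a: Facts, b: Facts, fields: string[]): ComparisonRow[] {
  return fields.map((field) => ({ field, a: show(a[field]), b: show(b[field]) }));
}

// Only the narrative is delegated to the model. It receives the finished
// rows, and whatever it writes cannot change the table built above.
function narrativePrompt(nameA: string, nameB: string, rows: ComparisonRow[]): string {
  const table = rows.map((r) => `${r.field}: ${r.a} vs ${r.b}`).join("\n");
  return `Write a short comparison of ${nameA} and ${nameB} using only these facts:\n${table}`;
}
```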
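
The regeneration rule in Step 6 (Update cadence) can be expressed as a pure decision function. This sketch assumes each page stores a fingerprint of its source data and the prompt version it was generated with; `PageState`, `canonical`, and `nextState` are hypothetical. The page does not define what counts as a meaningful change, so the sketch treats any value change as meaningful and ignores only key reordering.

```typescript
// Hypothetical per-page state persisted between scheduled runs.
interface PageState {
  sourceFingerprint: string;
  promptVersion: string;
  generationVersion: number;
}

// Canonical JSON with keys sorted recursively, so a re-pulled payload whose
// keys merely arrive in a different order is not treated as changed.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Return the next state (with generation_version bumped) if the page must be
// regenerated, or null if neither the source data nor the prompt changed.
function nextState(
  prev: PageState | undefined,
  sourceData: unknown,
  promptVersion: string,
): PageState | null {
  const fingerprint = canonical(sourceData);
  if (prev && prev.sourceFingerprint === fingerprint && prev.promptVersion === promptVersion) {
    return null;
  }
  return {
    sourceFingerprint: fingerprint,
    promptVersion,
    generationVersion: (prev?.generationVersion ?? 0) + 1,
  };
}
```

Since the version only ever increments, storing each generated page alongside its `generationVersion` keeps every earlier revision addressable.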
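
The ordering guarantee in Step 7 (Affiliate independence) is easiest to see as two separate passes: rank on editorial data first, then attach links to the finished list. The types and function names below are hypothetical, and a single `editorialScore` stands in for whatever editorial inputs the real ranking uses. Commission data is not an input to the ranking pass at all, so it has no way to reorder results.

```typescript
// Editorial input only; no affiliate or commission fields exist at this stage.
interface Destination {
  id: string;
  editorialScore: number; // hypothetical stand-in for the editorial signals
}

interface Listing extends Destination {
  rank: number;
  affiliateUrl: string | undefined; // undefined when no partner offer exists
  disclosed: boolean;               // true whenever an affiliate link is shown
}

// Pass 1: rank on editorial data, highest score first, without mutating input.
function rankEditorially(items: Destination[]): Destination[] {
  return [...items].sort((x, y) => y.editorialScore - x.editorialScore);
}

// Pass 2: layer affiliate links onto the finished ranking. Mapping over the
// ranked list preserves its order and drops nothing.
function withAffiliates(ranked: Destination[], links: Map<string, string>): Listing[] {
  return ranked.map((d, i) => {
    const affiliateUrl = links.get(d.id);
    return { ...d, rank: i + 1, affiliateUrl, disclosed: affiliateUrl !== undefined };
  });
}
```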

Hard rules

Constraints we don't deviate from. Violations are bugs.

  • No scraping of OTA pages (Airbnb, Booking, Vrbo, TripAdvisor) — affiliate APIs only.
  • No copying of individual user reviews. Themes only, derived from our own affiliate-API data.
  • No invented facts. Where a source lacks a value, the field is omitted.
  • No editorial filler. Prose stays factual, hedged where source data is approximate.
  • No consumer profiling. Subscriber emails are stored in isolation and never joined to behavioural data.
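
A rule such as the OTA scraping ban is easiest to keep when a violation surfaces as an error in code rather than relying on convention. The guard below is a hypothetical sketch: the blocklist covers only the four OTAs named above (regional and country-code domains are not included), and it assumes the crawler and the affiliate API clients are separate code paths, so the guard sits in the crawler only.

```typescript
// Hypothetical crawler blocklist mirroring the OTAs named in the hard rules.
const BLOCKED_OTA_HOSTS = ["airbnb.com", "booking.com", "vrbo.com", "tripadvisor.com"];

// True if the URL's host is a blocked domain or a subdomain of one.
// Matching on "." + domain avoids false hits such as "notbooking.com".
function isBlockedHost(rawUrl: string): boolean {
  const host = new URL(rawUrl).hostname.toLowerCase();
  return BLOCKED_OTA_HOSTS.some((d) => host === d || host.endsWith(`.${d}`));
}

// Call before every crawler request; throws instead of fetching.
function assertFetchAllowed(rawUrl: string): void {
  if (isBlockedHost(rawUrl)) {
    throw new Error(`Blocked by hard rule (no OTA scraping): ${rawUrl}`);
  }
}
```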

Spotted an inaccuracy? Email [email protected] with the page URL and we'll trace it back to the source row in the audit table.