1. Address normalization
Property addresses come in many formats. We normalize addresses from both sources (current listings and Land Registry sale records) so they can be matched reliably:
- Remove extra spaces and punctuation
- Standardize flat numbers (e.g., "Flat 2", "FLAT 2" and "2" are treated as the same)
- Handle common abbreviations (Rd/Road, St/Street)
- Account for building names and postcodes
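A minimal sketch of these normalization steps in Python. It assumes a small hypothetical abbreviation table and a simplified UK postcode pattern; neither comes from the source. The hypothetical `normalize_address` function returns the cleaned address and the postcode separately, so the postcode can also be compared on its own later:

```python
import re
from typing import Optional, Tuple

# Hypothetical abbreviation table; a real system would need a much longer list.
# Note: "st" can also mean "saint", which needs context to resolve.
ABBREVIATIONS = {
    "rd": "road",
    "st": "street",
    "ave": "avenue",
    "ln": "lane",
}

# Simplified UK postcode pattern: outward code, optional space, inward code.
POSTCODE_RE = re.compile(r"\b([a-z]{1,2}\d[a-z\d]?)\s*(\d[a-z]{2})\b")


def normalize_address(raw: str) -> Tuple[str, Optional[str]]:
    """Return (normalized address without postcode, normalized postcode or None)."""
    text = raw.lower()
    # Pull the postcode out first so the later steps cannot mangle it.
    postcode = None
    match = POSTCODE_RE.search(text)
    if match:
        postcode = f"{match.group(1)} {match.group(2)}".upper()
        text = text[: match.start()] + text[match.end():]
    # Turn punctuation into spaces; split() also collapses repeated whitespace.
    tokens = re.sub(r"[^\w\s]", " ", text).split()
    # Drop "flat"/"apartment" so "Flat 2" and "2" produce the same tokens.
    tokens = [t for t in tokens if t not in {"flat", "apartment"}]
    # Expand common abbreviations token by token.
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    return " ".join(tokens), postcode
```

For example, "Flat 2, 10 High St., London SW1A 1AA" and "FLAT 2 10 High Street London sw1a1aa" both normalize to `("2 10 high street london", "SW1A 1AA")`.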
2. Fuzzy matching algorithm
We use string similarity algorithms to match addresses even when they're not identical:
- Levenshtein distance — Measures character-by-character similarity
- Token matching — Breaks addresses into components and matches individually
- Postcode validation — Ensures listings and sales are in the same postcode
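The two similarity measures can be sketched as follows. `levenshtein` is the standard dynamic-programming edit distance. Rescaling it to a 0 to 100 score by dividing by the longer string's length, and using the Jaccard index (shared tokens over all distinct tokens) for token matching, are assumptions made for illustration, not the exact method described above:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    # Keep only the previous row of the dynamic-programming table.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def char_similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to a 0-100 score (100 = identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def token_similarity(a: str, b: str) -> float:
    """Jaccard index of the two token sets, as a 0-100 score."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 100.0
    return 100.0 * len(ta & tb) / len(ta | tb)
```

A typo such as "10 high stret" against "10 high street" costs one edit and scores about 92.9 on the character measure, while the token measure tolerates reordered components but penalizes a missing or misspelled word as a whole.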
Match threshold: We only show properties with a match score of 90% or higher, so that we can be highly confident the current listing is the same property as the historical sale.
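How the individual scores combine into one match score isn't stated, so this sketch assumes a plain average of a character-level and a token-level score (both 0 to 100). Postcode validation is applied as a hard gate before the 90% threshold. `Candidate`, `match_score` and `is_match` are hypothetical names:

```python
from dataclasses import dataclass

MATCH_THRESHOLD = 90.0  # from the text: only scores of 90% or higher are shown


@dataclass
class Candidate:
    char_score: float   # character-level similarity, 0-100
    token_score: float  # token-level similarity, 0-100
    listing_postcode: str
    sale_postcode: str


def match_score(c: Candidate) -> float:
    """Hypothetical combined score: plain average of the two similarity scores."""
    return (c.char_score + c.token_score) / 2


def is_match(c: Candidate, threshold: float = MATCH_THRESHOLD) -> bool:
    # Postcode validation is a hard gate: different postcodes never match,
    # however similar the rest of the address is.
    same_postcode = (
        c.listing_postcode.replace(" ", "").upper()
        == c.sale_postcode.replace(" ", "").upper()
    )
    if not same_postcode:
        return False
    return match_score(c) >= threshold
```

Gating on the postcode first keeps near-identical street addresses in different towns from being paired, however high their text similarity.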
3. Historical sale lookup
For each matched property, we find the most recent sale in the Land Registry data:
- Search for all sales at the matched address
- Select the most recent transaction
- Extract sale price and date
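A sketch of the lookup, assuming sales have already been loaded into simple records keyed by the normalized address from step 1. The `Sale` record and `most_recent_sale` helper are hypothetical and do not reflect the Land Registry's own schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sale:
    address: str  # normalized address, as produced by the matching step
    price: int    # sale price in pounds
    sale_date: date


def most_recent_sale(sales: Iterable[Sale], matched_address: str) -> Optional[Sale]:
    """Return the latest sale recorded at matched_address, or None if there is none."""
    # Search for all sales at the matched address.
    at_address = [s for s in sales if s.address == matched_address]
    if not at_address:
        return None
    # Select the most recent transaction by date.
    return max(at_address, key=lambda s: s.sale_date)
```

The caller then reads `price` and `sale_date` from the returned record; `None` means no sale was found at that address.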
4. Price comparison
We compare the current asking price against the last sale price:
- Price drop (£) — Last sale price minus current asking price
- Price drop (%) — Percentage decrease from last sale price
- Filter threshold — Only show properties with a meaningful price drop (typically 5% or more)
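The two price-drop figures and the filter translate directly into code. This sketch assumes whole-pound prices and a non-zero last sale price; `price_drop` and `is_meaningful_drop` are illustrative names, and the 5% default mirrors the "typically" figure above rather than a fixed rule:

```python
from typing import NamedTuple


class PriceDrop(NamedTuple):
    amount: int     # pounds: last sale price minus current asking price
    percent: float  # decrease as a percentage of the last sale price


def price_drop(last_sale_price: int, asking_price: int) -> PriceDrop:
    # A negative amount means the asking price is above the last sale price.
    amount = last_sale_price - asking_price
    return PriceDrop(amount, 100.0 * amount / last_sale_price)


def is_meaningful_drop(drop: PriceDrop, min_percent: float = 5.0) -> bool:
    # The threshold is tunable; 5.0 reflects the typical value in the text.
    return drop.percent >= min_percent
```

For example, a property last sold for £450,000 and now listed at £420,000 shows a £30,000 drop, about 6.7%, which passes the 5% filter.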
5. Quality filters
We apply additional filters to ensure quality results:
- Time filter — Only compare against sales from recent years (currently 2020 onwards)
- Duplicate detection — Remove duplicate listings of the same property
- Outlier removal — Filter extreme cases that likely indicate data errors
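The three filters can be sketched as a single pass over the results. The 2020 cut-off comes from the text; the outlier rule (discard drops above a hypothetical 50%) is an assumption, since the source does not say how extreme cases are identified. `Result` and `apply_quality_filters` are illustrative names:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass(frozen=True)
class Result:
    address: str         # normalized address of the listing
    last_sale_date: date
    drop_percent: float  # price drop relative to the last sale price


EARLIEST_SALE = date(2020, 1, 1)  # "2020 onwards", per the text
MAX_PLAUSIBLE_DROP = 50.0         # hypothetical outlier cut-off, not from the source


def apply_quality_filters(results: List[Result]) -> List[Result]:
    kept: List[Result] = []
    seen: Set[str] = set()
    for r in results:
        # Time filter: ignore comparisons against sales before 2020.
        if r.last_sale_date < EARLIEST_SALE:
            continue
        # Outlier removal: a drop this large more likely points to a data error.
        if r.drop_percent > MAX_PLAUSIBLE_DROP:
            continue
        # Duplicate detection: keep only the first surviving result per address.
        if r.address in seen:
            continue
        seen.add(r.address)
        kept.append(r)
    return kept
```

Duplicates are checked last, so a listing rejected by the time or outlier filter cannot shadow a valid listing for the same address.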