Data Overview
The City Directory Report is built on a significantly expanded and refined data foundation. These improvements enhance coverage, consistency, and review quality.
Expanded coverage: a larger, more complete collection
The City Directory dataset has been digitized at scale and now includes nearly 28,000 historical directory volumes from across the United States, representing:
Millions of scanned pages
Over 5 billion occupant records
Coverage dating back to the late 1800s
Added historical vintages
Digitization efforts identified gaps in historical coverage. Where possible, additional directory volumes were sourced to close those gaps.
As a result, many locations now offer:
Additional decades of coverage compared to prior reports
More consistent year-to-year data, reducing historical gaps
Coverage varies by location, but it is common to see one to two additional decades, with some areas seeing even greater historical depth.
Address Standardization
Historically, the same street often appeared under multiple naming variations (e.g., “E. Main St,” “East Main,” “Main Street E”).
The updated dataset standardizes these variations so that:
Each street appears once
All associated years are grouped together
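The grouping described above can be illustrated with a minimal sketch. It assumes a simple token-based approach; the lookup tables, the default "ST" suffix, and the function names `standardize_street` and `group_years` are hypothetical and are not taken from the actual pipeline.

```python
import re
from collections import defaultdict

# Hypothetical lookup tables; a real system would need far fuller lists.
DIRECTIONS = {"e": "E", "east": "E", "w": "W", "west": "W",
              "n": "N", "north": "N", "s": "S", "south": "S"}
SUFFIXES = {"st": "ST", "street": "ST", "ave": "AVE", "avenue": "AVE",
            "rd": "RD", "road": "RD"}


def standardize_street(raw: str) -> str:
    """Return a canonical 'DIRECTION NAME SUFFIX' form of a street string."""
    # Replace punctuation with spaces, lowercase, and split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", raw).lower().split()
    direction, suffix, name = None, None, []
    for tok in tokens:
        if tok in DIRECTIONS and direction is None:
            direction = DIRECTIONS[tok]
        elif tok in SUFFIXES and suffix is None:
            suffix = SUFFIXES[tok]
        else:
            name.append(tok.upper())
    if not name and direction is not None:
        # A street such as "North St": the direction word is the name itself.
        name, direction = [tokens[0].upper()], None
    if suffix is None:
        # Assumption for this sketch: a missing suffix defaults to "ST".
        suffix = "ST"
    return " ".join(p for p in [direction, *name, suffix] if p)


def group_years(records):
    """Group (street, year) pairs under one canonical street name."""
    grouped = defaultdict(set)
    for street, year in records:
        grouped[standardize_street(street)].add(year)
    return {street: sorted(years) for street, years in grouped.items()}


listings = [("E. Main St", 1920), ("East Main", 1925), ("Main Street E", 1930)]
print(group_years(listings))
```

In this sketch all three spellings collapse to a single entry, so the years from each variation are listed together under one street.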
Tagged Records
To support identification of high-risk properties, occupant records are tagged for:
Property type: Commercial, Residential
Use tags: Dry Cleaner, Gas Station
Tagging combines NAICS/SIC industry codes with a Word2Vec natural language processing model. The model identifies keyword patterns within occupant listings to flag relevant uses. The confidence threshold is set high to prioritize reducing false positives.
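As a rough illustration of how a code lookup and a similarity threshold might work together, the sketch below tags a listing when its SIC code matches a known use or when a word in the listing is close enough to a tag in vector space. Everything here is an assumption made for illustration: the small hand-made vectors stand in for a trained Word2Vec model, the `THRESHOLD` value and the function `tag_listing` are hypothetical, and taking the union of the two signals is only one possible way to combine them.

```python
import math

# Illustrative SIC codes mapped to use tags (not the product's actual table).
SIC_TAGS = {"7216": "Dry Cleaner", "5541": "Gas Station"}

# Toy 3-dimensional vectors with made-up values, standing in for
# embeddings that a trained Word2Vec model would provide.
TOY_VECTORS = {
    "cleaners": (0.9, 0.1, 0.0),
    "laundry": (0.8, 0.2, 0.1),
    "gasoline": (0.1, 0.9, 0.0),
    "filling": (0.2, 0.8, 0.1),
    "grocery": (0.1, 0.1, 0.9),
}

# Hypothetical reference vector for each use tag.
TAG_ANCHORS = {"Dry Cleaner": (1.0, 0.0, 0.0), "Gas Station": (0.0, 1.0, 0.0)}

# Hypothetical high threshold: fewer false positives, at the cost of
# possibly missing some relevant listings.
THRESHOLD = 0.95


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def tag_listing(text, sic_code=None):
    """Return the set of use tags suggested by the code or the listing text."""
    tags = set()
    if sic_code in SIC_TAGS:
        tags.add(SIC_TAGS[sic_code])
    for word in text.lower().split():
        vec = TOY_VECTORS.get(word)
        if vec is None:
            continue  # word not in the toy vocabulary
        for tag, anchor in TAG_ANCHORS.items():
            if cosine(vec, anchor) >= THRESHOLD:
                tags.add(tag)
    return tags


print(tag_listing("Acme Cleaners"))
print(tag_listing("Smith Grocery"))
print(tag_listing("Joe's Place", sic_code="5541"))
```

Raising `THRESHOLD` in this sketch makes the text-based signal stricter, which mirrors the emphasis on avoiding false positives.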