Data Overview

    Article summary

    The City Directory Report is built on a significantly expanded and refined data foundation. These improvements enhance coverage, consistency, and review quality.

    Expanded coverage: a larger, more complete collection

    The City Directory dataset has been digitized at scale and now includes nearly 28,000 historical directory volumes from across the United States, representing:

    • Millions of scanned pages

    • Over 5 billion occupant records

    • Coverage dating back to the late 1800s  

    Added historical vintages

    Digitization efforts identified gaps in historical coverage. Where possible, additional directory volumes were sourced to close those gaps.  

    As a result, many locations now offer:

    • Additional decades of coverage compared to prior reports

    • More consistent year-to-year data, reducing historical gaps

    Coverage varies by location, but one to two additional decades is common, and some areas gain even greater historical depth.
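
    As a rough illustration of what a coverage gap means for one location, the sketch below scans the directory years available and reports any stretch with no volume. The function name `find_coverage_gaps` and the `max_interval` cutoff are hypothetical; the article does not say how gaps were actually detected, and directories were not necessarily published every year.

```python
def find_coverage_gaps(years, max_interval=5):
    """Return (first_missing_year, last_missing_year) spans between
    consecutive available directory years that are more than
    `max_interval` years apart.

    `max_interval` is an assumed cutoff chosen for illustration only.
    """
    ordered = sorted(set(years))
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt - prev > max_interval:
            gaps.append((prev + 1, nxt - 1))
    return gaps


# Hypothetical list of directory years on hand for one city.
available = [1920, 1925, 1940, 1945]
print(find_coverage_gaps(available))
```

    Under these assumptions, sourcing a volume from inside a reported span would shorten or close that gap.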

    Address standardization

    Historically, the same street often appeared under multiple naming variations (e.g., “E. Main St,” “East Main,” “Main Street E”).

    The updated dataset standardizes these variations so that:

    • Each street appears once

    • All associated years are grouped together
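
    The idea can be sketched in a few lines of Python. This is a minimal illustration, not the production method: the abbreviation tables, the canonical form (direction, name, suffix), and the choice to assume "ST" when a listing has no suffix are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical lookup tables; a real pipeline would need far larger ones.
DIRECTIONS = {"e": "E", "east": "E", "w": "W", "west": "W",
              "n": "N", "north": "N", "s": "S", "south": "S"}
SUFFIXES = {"st": "ST", "street": "ST", "ave": "AVE", "avenue": "AVE",
            "rd": "RD", "road": "RD"}


def standardize_street(raw, default_suffix="ST"):
    """Map naming variations of a street to one canonical string."""
    # Replace punctuation with spaces, lowercase, and split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", raw).lower().split()
    direction, suffix, name = None, None, []
    for tok in tokens:
        if tok in DIRECTIONS and direction is None:
            direction = DIRECTIONS[tok]
        elif tok in SUFFIXES and suffix is None:
            suffix = SUFFIXES[tok]
        else:
            name.append(tok.upper())
    if not name:
        # Nothing left to act as the street name; return the tokens as-is.
        return " ".join(t.upper() for t in tokens)
    parts = ([direction] if direction else []) + name
    # Assumption: a listing with no suffix is treated as a street ("ST").
    parts.append(suffix or default_suffix)
    return " ".join(parts)


def group_years(listings):
    """Group directory years under each standardized street name."""
    grouped = defaultdict(set)
    for raw, year in listings:
        grouped[standardize_street(raw)].add(year)
    return {street: sorted(years) for street, years in grouped.items()}


listings = [("E. Main St", 1925), ("East Main", 1930), ("Main Street E", 1925)]
print(group_years(listings))
```

    In this sketch all three spellings collapse to a single key, so the street appears once with its years grouped together. Real directories contain cases this sketch does not handle, such as a street whose name is itself a direction.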

    Tagged records

    To support identification of high-risk properties, occupant records are tagged for:  

    • Property type

      • Commercial

      • Residential

    • Use tags

      • Dry Cleaner

      • Gas Station

    Tagging combines NAICS/SIC codes with a Word2Vec natural language processing model. The model identifies keyword patterns within occupant listings to flag relevant uses. The confidence threshold is set high to prioritize reducing false positives.
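
    The following sketch shows one way a code lookup and an embedding-similarity check with a high threshold could fit together. It is illustrative only: the three-dimensional vectors are invented stand-ins for trained Word2Vec embeddings, the 0.95 threshold is hypothetical, and combining the two signals as a simple union is an assumption. Only SIC codes are shown; 7216 and 5541 are the SIC codes for drycleaning plants and gasoline service stations.

```python
import math

# SIC code lookup (illustrative subset).
SIC_TO_TAG = {"7216": "Dry Cleaner", "5541": "Gas Station"}

# Invented toy vectors standing in for Word2Vec embeddings.
# A real model would supply learned, much higher-dimensional vectors.
TOY_VECTORS = {
    "cleaners": (0.9, 0.1, 0.0),
    "laundry": (0.7, 0.3, 0.2),
    "gasoline": (0.1, 0.9, 0.0),
    "grocery": (0.0, 0.1, 0.9),
}
TAG_VECTORS = {"Dry Cleaner": (1.0, 0.0, 0.0), "Gas Station": (0.0, 1.0, 0.0)}

THRESHOLD = 0.95  # hypothetical; set high to favor fewer false positives


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tag_listing(text, sic_code=None, threshold=THRESHOLD):
    """Return the sorted use tags for one occupant listing."""
    tags = set()
    # Signal 1: a direct industry-code match.
    if sic_code in SIC_TO_TAG:
        tags.add(SIC_TO_TAG[sic_code])
    # Signal 2: any word whose vector is close enough to a tag vector.
    for word in text.lower().split():
        vec = TOY_VECTORS.get(word.strip(".,"))
        if vec is None:
            continue
        for tag, tag_vec in TAG_VECTORS.items():
            if cosine(vec, tag_vec) >= threshold:
                tags.add(tag)
    return sorted(tags)


print(tag_listing("Acme Cleaners"))
print(tag_listing("Sunshine Laundry"))
print(tag_listing("Smith J R", sic_code="5541"))
```

    With these toy values, "laundry" scores below the threshold and is left untagged, which shows the trade-off of a high threshold: fewer false positives at the cost of missing some borderline listings.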

