Census geography: Bridging data for census tracts across time

The Longitudinal Tract Data Base (LTDB) provides estimates within 2010 tract boundaries for tract-level census data that are available for prior years as early as 1970 and also for 2015-2019 and 2020. Researchers increasingly work with information aggregated to the tract level from non-census sources, such as criminal justice, public health, and voting records. Meeting their needs requires a tool to convert such data to the 2010 boundaries. The LTDB offers an open-source crosswalk to link data from 1970-2000 or 2020 to 2010, and it also provides user-friendly programming code to bridge data across years.
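As a rough illustration of what bridging with a crosswalk involves, the sketch below allocates tract-level counts from an earlier year to 2010 tracts using per-pair weights. It is a minimal sketch, not the LTDB's own code: the tract IDs, the weights, the counts, and the function name `bridge_counts` are all hypothetical, and it assumes each crosswalk row gives the share of a source tract's count that belongs to a 2010 tract.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (source-year tract ID, 2010 tract ID, weight).
# The weight is assumed to be the share of the source tract's count
# that is assigned to the 2010 tract.
crosswalk = [
    ("A", "X", 1.0),   # tract A falls entirely within 2010 tract X
    ("B", "X", 0.25),  # tract B is divided: 25% goes to X ...
    ("B", "Y", 0.75),  # ... and 75% goes to Y
]

# Hypothetical tract-level counts from a non-census source.
source_counts = {"A": 100, "B": 40}


def bridge_counts(counts, rows):
    """Allocate source-tract counts to 2010 tracts using crosswalk weights."""
    target = defaultdict(float)
    for src, dst, weight in rows:
        if src in counts:
            target[dst] += counts[src] * weight
    return dict(target)


bridged = bridge_counts(source_counts, crosswalk)
print(bridged)
```

Weights of this kind apply to counts. Rates, percentages, and medians would normally be recomputed from the bridged counts rather than weighted directly.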

Note that much of the U.S. was not divided into census tracts in 1970. Some areas did not have tracts even in 1980, and some tracted areas did not have any enumerated population as late as 2000. These areas are omitted from the LTDB estimates.

To access the LTDB, select one of the following links:

    • Use the LTDB: click here for information about components and revisions of the LTDB and file downloads
    • Map the LTDB data using the web-based map system for 1970-2010 only, using the original LTDB estimates

Methods

Researchers can use the LTDB estimates with a high degree of confidence. Still, these data are estimates and they are subject to error. Fortunately, many census tracts have maintained constant boundaries over time, and in many other cases the changes are simple to deal with (e.g., if a 2000 tract is divided into two or more in 2010, the earlier tract composition can be calculated by adding up the parts). Unfortunately, many changes are much more complex, and sometimes even individual blocks are divided into different tracts. On average, harmonized estimates are accurate, but estimates for a specific small area and its change over time should be interpreted cautiously.
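One way to see which of these cases a tract falls into is to classify it from a list of overlapping tract pairs. The sketch below is illustrative only: the pair list, the labels, and the function name `classify_changes` are assumptions, and the labels describe each 2000 tract's relationship to 2010 tracts rather than any official change-type code.

```python
from collections import defaultdict

# Hypothetical overlap pairs: (2000 tract, 2010 tract).
# A pair means the two tracts share some territory.
pairs = [
    ("A", "A"),                              # same footprint in both years
    ("B", "B1"), ("B", "B2"),                # B divided into two 2010 tracts
    ("C", "CD"), ("D", "CD"),                # C and D combined into one
    ("E", "E1"), ("E", "F1"), ("F", "F1"),   # E divided, part joined with F
]


def classify_changes(pairs):
    """Label each 2000 tract by how it relates to 2010 tracts."""
    targets = defaultdict(set)  # 2000 tract -> 2010 tracts it overlaps
    sources = defaultdict(set)  # 2010 tract -> 2000 tracts it overlaps
    for src, dst in pairs:
        targets[src].add(dst)
        sources[dst].add(src)

    labels = {}
    for src, dsts in targets.items():
        single_target = len(dsts) == 1
        # True when every 2010 tract touched by src draws only from src.
        exclusive = all(len(sources[d]) == 1 for d in dsts)
        if single_target and exclusive:
            labels[src] = "one-to-one"
        elif exclusive:
            labels[src] = "split"
        elif single_target:
            labels[src] = "merged"
        else:
            labels[src] = "complex"
    return labels


labels = classify_changes(pairs)
for tract, label in sorted(labels.items()):
    print(tract, label)
```

Tracts labeled "complex" are the ones where some form of interpolation is unavoidable, so their estimates deserve the most caution.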

The methodology of the original LTDB is described in detail in Logan et al (2014), listed under References below. Since then we have experimented with alternative approaches.

  • The original LTDB used a combination of area and population interpolation. We conducted an early test of its efficacy by comparing our estimates of total tract population in 2000 to a data file released by the Census Bureau where individual records from 2000 had been aggregated within 2010 tract boundaries. Logan et al (2016) compares the LTDB tract population estimates for 2000 with those provided by Geolytics (formerly the NCDB) and another source provided by NHGIS. Based on our analysis we did not recommend the Geolytics data set, but both the NHGIS and LTDB estimates were very accurate.
  • Subsequently we were given access to the confidential data in the census Federal Statistical Research Data Center (FSRDC) system.  We used this opportunity to create a full set of “true” values of 2000 characteristics in 2010 boundaries, covering all 100+ variables in the LTDB.  We compared these true values to our estimates, and learned that the level of error for many of them was much higher than we had found for total population.  We now know that much estimation error results from an assumption that we refer to as “spatial stationarity.” In the case of tracts with complex boundary changes, the LTDB uses information on block-level populations to estimate what proportion of a tract’s population at a given time should be allocated to a given 2010 tract. This is population interpolation. Then persons in all population categories are allocated in the same proportions. But evidently there can be considerable clustering of people based on class, race, or other characteristics at the block level, and allocating them on the basis of population counts can be misleading.

    We tested an alternative in which small-area data for each specific category of persons are used for this interpolation (we refer to this as a “trait-based” or TB approach). NHGIS (see https://www.nhgis.org/time-series-tables) also deals with within-tract heterogeneity in this way. Logan et al (2024) shows that using block data greatly improves estimates for full-count variables in 2000, but not for the sample (“long form”) variables. We believe that the smaller sampling proportions and larger size of areal units (block groups) for the long form data introduce too much error to be used in this way. The 2020 racial composition estimates in the current LTDB use this approach.

    To download and compare the original LTDB and TB estimates for all variables in 2000, click here. The TB interpolations were carried out by Dr. Zengwang Xu (University of Wisconsin, Milwaukee). This CSV file uses the variable names in the LTDB codebook, with **_LTDB appended for the original estimates and **_NEW appended for the TB estimates.
  • We have now focused on how to make more direct use of the “true” values that we can access within the FSRDC.  These tract data cannot be disclosed publicly.  One approach that we attempted was to introduce noise into the tract data that would protect confidentiality of individual records, using the same “differential privacy” (DP) algorithms now being applied to published census data.  We conducted a demonstration of this approach for a selected set of 12 full count and sample count variables.  The Census Bureau approved disclosure of the DP estimates for these variables, and we showed (Logan et al 2021) that these DP estimates – adding noise that was random and unbiased – resulted in very high accuracy. 

    To download the original LTDB and DP estimates for 12 variables, click here. This file includes a codebook in a separate tab. It also includes variables showing what kind of tract change occurred between 2000 and 2010, using two alternative criteria to indicate a real change: 1) more than 1% of the land area of the tract shifted, or 2) more than 5% of the land area shifted. The Census Bureau has reviewed the DP estimates to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data used to produce them. This research was performed at a Federal Statistical Research Data Center under FSRDC Project Number 2517 (CBDRB-FY20-208).
  • The Census Bureau did not approve disclosure of DP estimates for the many other variables in the LTDB, but a new noise injection algorithm has come into use (the Discrete Gaussian Mechanism, DGM). We have found that the DGM estimates are less accurate than the DP estimates but much more accurate than the original LTDB estimates. The DGM estimates have now been approved for disclosure, and we use them in the current LTDB release for 2000. The Census Bureau has reviewed the DGM estimates to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data used to produce them. This research was performed at a Federal Statistical Research Data Center under FSRDC Project Number 2517 (CBDRB-FY24-0016, CBDRD-FY24-0189).

    We have carried out an analysis of the robustness of the DGM estimates, similar to the analyses that we used to evaluate other estimates. These statistical tables were also approved for public disclosure (CBDRB‐FY24‐0301). Click here to download the RMSE tables for both the DGM and DP estimates.
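The difference between population interpolation and the trait-based (TB) approach described above can be seen in a small worked example. This is a sketch under stated assumptions, not the LTDB procedure: the block records, the group counts, and the helper name `allocation_shares` are hypothetical, and the example covers a single earlier-year tract whose blocks fall into two 2010 tracts, X and Y.

```python
# Hypothetical blocks within one earlier-year tract:
# (2010 tract the block falls in, total population, persons in one group).
blocks = [
    ("X", 300, 30),
    ("X", 200, 20),
    ("Y", 500, 450),
]


def allocation_shares(blocks, column):
    """Share of the tract total in each 2010 tract, based on one block column."""
    totals = {}
    for row in blocks:
        totals[row[0]] = totals.get(row[0], 0) + row[column]
    grand_total = sum(totals.values())
    return {tract: value / grand_total for tract, value in totals.items()}


# Tract-level count for the group that has to be allocated to X and Y.
group_total = sum(row[2] for row in blocks)

# Population interpolation: every category is split in the same
# proportions as total population (the "spatial stationarity" assumption).
pop_shares = allocation_shares(blocks, 1)
pop_based = {t: s * group_total for t, s in pop_shares.items()}

# Trait-based interpolation: the group is split in proportion to the
# block-level counts for that same group.
trait_shares = allocation_shares(blocks, 2)
trait_based = {t: s * group_total for t, s in trait_shares.items()}

print("population-based:", pop_based)
print("trait-based:     ", trait_based)
```

Because the group in this example is concentrated in the blocks that fall in Y, splitting it by total population assigns far too many of its members to X. The trait-based split avoids this only to the extent that reliable small-area counts exist for the category in question.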

The DGM approach cannot be applied to other years at this time. Records from the 2020 Census and recent American Community Survey years are not yet available in the FSRDC. Records from 1970-1990 are available, but they cannot be mapped directly into 2010 tracts. Researchers associated with NHGIS have been working to create usable locations for census respondents in 1990. We are taking a different approach that can potentially be applied to decennial censuses as early as 1930, for which all individual records are available either publicly (via IPUMS) or in the FSRDC, along with respondents’ locations at the scale of enumeration districts, which are comparable to contemporary block groups. There is much potential for an improved 1930-2020 system within comparable boundaries.
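For readers unfamiliar with noise injection, the sketch below shows the general idea of adding integer-valued Gaussian noise to counts and then summarizing the resulting error with RMSE. It is not the Census Bureau's implementation and does not reproduce the settings used for the LTDB: the noise scale `sigma`, the counts, and the function names are hypothetical, and the sampler truncates the distribution's tails for simplicity instead of using an exact sampling algorithm.

```python
import math
import random


def sample_discrete_gaussian(sigma, rng, tail=10):
    """Draw an integer k with probability proportional to exp(-k^2 / (2 sigma^2)).

    The support is truncated at +/- tail * sigma, a simplifying approximation.
    """
    bound = max(1, math.ceil(tail * sigma))
    support = list(range(-bound, bound + 1))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in support]
    return rng.choices(support, weights=weights, k=1)[0]


def rmse(estimates, truth):
    """Root mean squared error between two equal-length sequences."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, truth)]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)               # fixed seed so the sketch is repeatable
true_counts = [120, 45, 300, 8, 77]  # hypothetical tract counts
sigma = 2.0                          # hypothetical noise scale

# Add symmetric, zero-centered integer noise to each count.
# No post-processing is applied, so small counts could become negative.
noisy_counts = [c + sample_discrete_gaussian(sigma, rng) for c in true_counts]

print(noisy_counts)
print(rmse(noisy_counts, true_counts))
```

Because the noise is symmetric around zero, it does not bias the counts in either direction, and its typical size is governed by `sigma` rather than by the complexity of the boundary change.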

References:
Logan, John R., Zengwang Xu, and Brian J. Stults. 2014. "Interpolating US Decennial Census Tract Data from as Early as 1970 to 2010: A Longitudinal Tract Database." The Professional Geographer 66(3): 412–420. DOI: 10.1080/00330124.2014.905156.

Logan, John R., Brian J. Stults, and Zengwang Xu. 2016. "Validating Population Estimates for Harmonized Census Tract Data, 2000–2010." Annals of the American Association of Geographers. DOI: 10.1080/24694452.2016.1187060.

Logan, John R., Wenquan Zhang, Brian J. Stults, and Todd Gardner. 2021. "Improving Estimates of Neighborhood Change with Constant Tract Boundaries." Applied Geography 132: 1–11. DOI: 10.1016/j.apgeog.2021.102476.

Logan, John R., Wenquan Zhang, and Zengwang Xu. 2024. "Using Public Data to Improve Population Estimates Within Consistent Boundaries." The Professional Geographer 76: 398–407. DOI: 10.1080/00330124.2024.2306645.