Census geography: Bridging data from prior years to the 2010 tract boundaries>>Use the LTDB>>Input your own data>>

Input Your Own Data: Access and STATA Programs

The LTDB offers code in Access and STATA that can be used, for any given year, in conjunction with the crosswalk file and an input data file prepared by the researcher.

The Access and STATA programs allow researchers to input three types of variables.

1. One type is "count" variables. These are variables, like the population of a tract or the number of disabled persons in a tract, that should be handled by summing the counts in origin tracts (with appropriate weights provided by the crosswalk file) to counts in the 2010 tract. For such variables, the user simply needs to add/select input variable names from their data file.

2. There are other kinds of variables that cannot simply be summed across source tracts. The typical case is a median or average or rate. Consider the example of median household income. Here the best estimate is a weighted average, where one calculates the average value (not the sum) of the median incomes of source tracts but weights them according to the number of households being contributed by each source tract. The user must identify the variable name to be handled this way along with a corresponding variable to be used as the weight. Typically in cases of medians or rates, this weighting variable will be the number of cases that the median represents or the denominator that was used in calculating the rate.

  • In STATA these variable names need to be added into the initial lines of code. Both types of variables can be handled in a single run.
  • In Access the count variables and medians/rates with associated weighting variables are identified through queries. The program is set up to do only one type of variable at a time. Users with both types of variables simply need to run Access twice.

3. A special kind of variable that LTDB allows users to harmonize is a dummy variable that takes on values of 0 and 1. Consider the example of a dummy variable for whether there is a toxic waste facility in the tract. If there is such a facility in every source tract, it would be desirable for the 2010 value to be 1, and if there is none in any source tract the 2010 value should be 0. If there is a toxic waste facility in some source tracts but not others, the value should intermediate. Certainly this is not like summing a count across tracts. We adjust dummy variables to 2010 boundaries in a way similar to count variables. They will be allocated according to the same weighting procedures, but the output variable must be interpreted differently. If all source tracts are coded zero on the dummy variable, the value for the 2010 tract will be zero. If any of the source tracts is coded one, the value for the 2010 tract will be equal to the sum of the proportions of the source tracts that contribute to it. Users might often recode this output variable simply to zero and non-zero.

Users not familiar with Access should be aware of some details in the procedure. Microsoft may generate a security warning about a file (on a line above the Access windows), and the user needs to click on that warning ("Options") and enable use of the file. Using the "External Data" menu, Access can import a text file, Excel file, or DBF file to be used as the input data file. The input data file will then be added to the list of tables in the Tables window. This file should not have spaces in its name. If it does, the name can be changed after importing (right-click on the file name in the Tables window). At that point the queries in the main window can be responded to: select a census year for the interpolation, select the input data table name and identify the key field (the tract ID variable), and select any number of variables to be interpolated. (Or, alternatively, select variables for which a weighted average should be estimated and their associated weighting variables.) Provide a name for the output file (default is "interpolate_table), click on "Interpolate" and the output file will be created.

The output file from Access (a comma-delimited .txt file) or STATA (a STATA data file) lists all of the 2010 information about the tract from the crosswalk file and values of the input variables converted to 2010 boundaries.

CLICK HERE for the download page.