01-27663. Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy and Statement of the Acting Director of the U.S. Census Bureau on Adjustment for Non-Redistricting Uses
AGENCY:
Bureau of the Census.
ACTION:
Notice of report and statement of Acting Director of the Census Bureau regarding adjustment decision.
SUMMARY:
This notice provides the Executive Steering Committee for Accuracy and Coverage Evaluation Policy (ESCAP) report and the statement of the Acting Director of the Census Bureau regarding the potential application of statistically adjusted data from Census 2000 for the following uses: (1) As controls to produce estimates from the Census 2000 long form (sample) data, (2) as demographic survey controls, and (3) as the base for producing post-censal estimates. The ESCAP report and statement of the Acting Director are attached as exhibits to the SUPPLEMENTARY INFORMATION section of this notice. In addition to publication in the Federal Register, the report is posted on the Census Bureau Web site at <http://www.census.gov/dmd/www/EscapRep2.html>, and the Acting Director's statement is available electronically at <http://www.census.gov/Press-Release/www/2001/cooper.pdf>.
FOR FURTHER INFORMATION CONTACT:
John H. Thompson, Principal Associate Director for Programs, U.S. Census Bureau, FB-3, Room 2037, Washington, DC 20233. Telephone: (301) 457-3946; fax: (301) 457-3024.
SUPPLEMENTARY INFORMATION:
Background Information
The decennial census is mandated by the United States Constitution (Article I, Section 2, Clause 3) to provide the population counts needed to apportion the seats in the U.S. House of Representatives among the states. By December 28, 2000, the Census Bureau fulfilled its constitutional duty by delivering to the Secretary of Commerce the state population totals used for congressional apportionment. In accordance with the January 25, 1999, Supreme Court ruling, Department of Commerce v. House of Representatives, 119 S.Ct. 765 (1999), the Census Bureau did not use statistical sampling to produce the state population totals used for congressional apportionment.
However, the Census Bureau has examined the use of statistical sampling to produce statistically adjusted Census 2000 data for nonapportionment purposes. Pursuant to Title 15, Code of Federal Regulations, Part 101, issued by the Secretary of Commerce (66 FR 11232, February 23, 2001), the Acting Director of the Census Bureau submitted his recommendation, based upon the ESCAP's March 1, 2001 report, regarding the methodology to be used in producing the tabulations of population reported to states and localities pursuant to 13 U.S.C., Section 141(c) (data used for congressional and state and local legislative redistricting), to the Secretary of Commerce (66 FR 14003, March 8, 2001). The Secretary then made the final determination regarding statistical adjustment of the redistricting data (66 FR 14520, March 13, 2001).
After the issuance of the March 1, 2001, ESCAP report (“Recommendation Concerning the Methodology to be Used in Producing the Tabulations of Population Reported to States and Localities Pursuant to 13 U.S.C., Section 141(c)”), the Committee reconvened to examine the potential use of the statistically adjusted data for nonredistricting purposes—namely, as controls to produce estimates from the Census 2000 long form (sample) data, as demographic survey controls, and as the base for producing post-censal estimates. The ESCAP used analysis from reports on topics chosen for their usefulness in informing its recommendation regarding the suitability of using the statistically adjusted data for these nonredistricting purposes. The Committee also drew upon work from other Census Bureau staff, as appropriate. This notice provides the Committee's report and the Acting Director's statement regarding his determination of the appropriateness of statistical adjustment of the Census 2000 data for these purposes.
Dated: October 30, 2001.
William G. Barron, Jr.,
Acting Director, Bureau of the Census.
Memorandum for Kathleen B. Cooper, Under Secretary for Economic Affairs
From: William G. Barron, Jr., Acting Director
Subject: Notification of Decision
I am attaching the recommendation of the Executive Steering Committee for A.C.E. Policy (ESCAP) on whether the Accuracy and Coverage Evaluation Survey should be used to adjust Census 2000 data for non-redistricting purposes. As in March, I asked ESCAP to provide a recommendation because I rely on the knowledge, experience, and technical expertise of the Committee and Census Bureau staff who have worked extremely hard with tremendous dedication and expertise through every phase of Census 2000.
After assessing considerable new evidence, ESCAP now recommends that unadjusted Census 2000 data be used for non-redistricting purposes. The effect of this new evidence is that the A.C.E. overstated the net undercount by at least 3 million persons. The cause of this error was that the A.C.E. failed to measure a significant number of census erroneous enumerations, many of which were duplicates. This level of error in the A.C.E. measurement of net coverage is such that the A.C.E. results cannot be used in their current form. This finding of substantial error, in conjunction with remaining uncertainties, necessitates that revisions, based on additional review and analysis, be made to the A.C.E. estimates before any potential uses of these data can be considered.
As a member of ESCAP and as Acting Director, I concur with and approve the Committee's recommendation that unadjusted data be used for non-redistricting purposes and have decided that the Census Bureau will release the remaining Census 2000 data products, post-censal estimates, and survey controls using unadjusted data. It is possible that further research and analysis could yield revised A.C.E. estimates, and that these revised estimates could be used to improve estimates developed as part of the Census Bureau's annual population adjustments for survey controls and other purposes.
Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy on Adjustment for Non-Redistricting Uses
October 17, 2001
Recommendation
The Executive Steering Committee for A.C.E. Policy (ESCAP) recommended on March 1, 2001 that unadjusted census data be used for redistricting. After assessing considerable new evidence, ESCAP now recommends that unadjusted Census 2000 data also be used for non-redistricting purposes. The effect of this new evidence is that the Accuracy and Coverage Evaluation (A.C.E.) overstated the net undercount by at least 3 million persons. The cause of this error was that the A.C.E. failed to measure a significant number of census erroneous enumerations, many of which were duplicates. This level of error in the A.C.E. measurement of net coverage is such that the A.C.E. results cannot be used in their current form. This finding of substantial error, in conjunction with remaining uncertainties, necessitates that revisions, based on additional review and analysis, be made to the A.C.E. estimates before any potential uses of these data can be considered. The Census Bureau will release the remaining Census 2000 data products, post-censal estimates, and survey controls using unadjusted data. It is, however, reasonable to expect that further research and analysis may lead to revised A.C.E. estimates that can be used to improve future post-censal estimates.
The ESCAP review confirmed the finding in the first ESCAP Report that most Census 2000 and A.C.E. operations were of high quality. The evaluations continue to demonstrate that improvements were achieved over both the 1990 census and the 1990 coverage measurement survey. Important new information and methods are now available for assessing the A.C.E. and Census 2000. As will be discussed in more detail below, final analysis of this new information is still in progress. However, the Census Bureau believes that this analysis will confirm that Census 2000 made substantial gains in reducing the total net undercount, as well as reducing net differential undercount. Most of the A.C.E. operations were also seen to be well conducted, producing valuable information that, when combined with the other evaluation findings, provides important new research data. The ESCAP feels confident that its research program will enhance the evaluations of Census 2000, contribute to planning for the 2010 census, and, with further analysis, potentially improve future post-censal estimates.
The ESCAP's primary concern in its March decision was that fundamental differences between the Demographic Analysis (DA) estimates and the A.C.E. estimates could not be explained. The estimates differed widely, both for the total national population and for important population groups. The Committee investigated this inconsistency extensively but could not adequately explain it in the time available for the March decision. The Committee concluded in March that the inconsistency must have resulted from one or more of three possible scenarios. The first scenario was that all available 1990 census data, including the census results, the coverage measurement survey, and the demographic analysis estimates, significantly understated the Nation's population, but that Census 2000 found this previously un-enumerated population. The second scenario was that demographic analysis underestimated population growth between 1990 and 2000. The third scenario was that the A.C.E. overestimated the Nation's population, raising the possibility of an undiscovered problem in the A.C.E. or census methodology.
The Census Bureau's extensive research over the past eight months has been directed at examining demographic analysis, the A.C.E., and Census 2000. Demographic analysis research examined historic levels of the components of population change to address the possibility that the 1990 demographic analysis estimates understated the national population (the first scenario). This analysis did not reveal any significant problems. The Census Bureau investigated the second scenario by revising the preliminary estimates of international migration, and hence the foreign-born population, using actual Census 2000 long form data. The Census Bureau also consulted with outside experts on this work. These studies resulted in revisions to the “Base DA” that was initially examined as part of the March 2001 decision. The revisions reflected a larger growth in the foreign-born population during the last decade. The current revised demographic analysis estimates are much closer to the Alternative DA considered during the March deliberations. The A.C.E. and demographic analysis evaluations, when analyzed together, explain many of the inconsistencies.
With regard to the third scenario, the ESCAP's review of the accuracy of the A.C.E. and Census 2000 was based on a number of evaluation studies, including reinterview studies, re-processing studies, and computer searches for duplicate enumerations. This research found that the A.C.E. did not account for a large number of Census 2000 duplicates, leading to an overstatement of the Census 2000 net undercount. As described previously, this finding, in conjunction with the revisions to demographic analysis, explains to a large degree the discrepancies between the A.C.E. and demographic analysis. The significance of the error in the A.C.E. treatment of duplicates compels the recommendation that the current A.C.E. estimates cannot be used to adjust the Census 2000 data.
The ESCAP notes that its extensive evaluation program has provided information that was unavailable for previous decennial censuses. This important new information was the result of outstanding and innovative work on the part of many Census Bureau employees. Additionally, the Committee notes that some of the information resulted from new methodologies not available in prior censuses. Census 2000 was the first census to capture name information in a way that permits nationwide computer matching. The evaluation results, including the new tool of name matching, will be extremely valuable for evaluating the accuracy of Census 2000, planning for the 2010 census, and potentially for improving future post-censal estimates. Both census taking and coverage measurement are processes that evolve and improve with each census. The Census 2000 experience will help refine both census and coverage measurement processes for future censuses.
While the ESCAP has recommended against use of the adjusted data, the A.C.E.'s original objective of addressing the differential undercount must still be pursued. The totality of the evidence considered by the Committee leads it to believe that while Census 2000 successfully lowered the historical pattern of the differential undercount, it did not eliminate it. The Census Bureau believes that the net undercount remains disproportionately distributed among renters and minority populations. With further research, it is reasonable to expect that new information can be used to produce revised A.C.E. estimates. These revised estimates may then be employed to improve post-censal population estimates by reducing remaining differential coverage error. It is also expected that planning for the 2010 census will greatly benefit from these findings, with improved operations to identify and remove duplicates and refined methods to improve the accuracy of all census operations. The Census Bureau will continue research to design improved operations, including coverage measurement studies, for future censuses and surveys.
Executive Summary
After assessing considerable new evidence, the second ESCAP Committee (ESCAP II) has recommended that unadjusted Census 2000 data also be used for non-redistricting purposes. New evidence indicates that the Accuracy and Coverage Evaluation (A.C.E.) overstated the net undercount by at least 3 million persons, and that the cause of this error was the A.C.E.'s failure to measure a significant number of census erroneous enumerations, many of which were duplicates. This level of error in the A.C.E. measurement of net coverage is such that the A.C.E. results cannot be used in their current form. This finding of substantial error, in conjunction with remaining uncertainties, necessitates that revisions, based on additional review and analysis, be made to the A.C.E. estimates before any potential uses of these data can be considered. The Census Bureau will release the remaining Census 2000 data products, post-censal estimates, and survey controls using unadjusted data. It is, however, reasonable to expect that further research and analysis may lead to revised A.C.E. estimates that can be used to improve future post-censal estimates.
ESCAP II has also confirmed the finding in the first ESCAP Report that most Census 2000 and A.C.E. operations were of high quality. More recent evaluations continue to demonstrate that improvements were achieved over both the 1990 census and the 1990 coverage measurement survey. Important new information and methods are now available for assessing the A.C.E. and Census 2000. As will be discussed in more detail below, final analysis of the effects of this new information is still in progress. However, the Census Bureau believes that this analysis will confirm that Census 2000 made substantial gains in reducing the total net undercount, as well as the net differential undercount. Most of the A.C.E. operations were also seen to be well conducted, producing valuable information that, when combined with the other evaluation findings, provides important new research data. The ESCAP feels confident that the Census Bureau's continuing research program will enhance the evaluations of Census 2000, contribute to planning for the 2010 census, and, with further analysis, potentially improve the post-censal estimates.
The ESCAP's primary concern in its March decision was that demographic analysis and the A.C.E. estimates differed widely, both for the total national population and for important population groups. The Committee concluded in March that the inconsistency must have derived from one or more of three possible scenarios. The first scenario was that all available 1990 census data, including the 1990 census, the 1990 coverage measurement survey, and the 1990 demographic analysis estimates, significantly understated the Nation's population, while Census 2000 included portions of this previously un-enumerated population. The second scenario was that demographic analysis estimates underestimated population growth between 1990 and 2000. The third scenario was that the A.C.E. overestimated the Nation's population, raising the possibility of an undiscovered problem with the A.C.E. or census methodology. The ESCAP also identified additional technical concerns that are documented in the previous report.
Areas of Research
In the months since the ESCAP I Report, the Committee embarked on a second round of deliberations to address the concerns identified in the report and to enable the Census Bureau to recommend whether Census 2000 data should be adjusted for future uses. The future uses in consideration included the post-censal population estimates, demographic survey controls, and the production of Census 2000 long form data products. The ESCAP I Committee did not have current results for certain measures of A.C.E. accuracy, and was forced to use 1990 data on potential A.C.E. errors. The ESCAP therefore directed and documented that a number of evaluations be conducted to inform the deliberations. Some of the evaluations were designed to provide current measures of accuracy for the various components of error. These evaluations involved additional technical research, field work, and data processing, as well as new computer matching and simulation research. The evaluations include:
Demographic Analysis (DA) Research
The DA research program examined historical levels of the components of population change to address the possibility that the 1990 DA estimates understated the Nation's population and that demographic analysis did not capture the full population growth in the last decade. The Census Bureau consulted with outside demographic experts to plan and conduct its research program, focusing on the methodologies and underlying estimates of the components of population change. The research activities concentrated on two major areas—international migration and the robustness of the DA estimates.
Measurement of Erroneous Enumerations, Including Duplication
Erroneous enumerations refer to individuals who should not be included in the census counts because they are duplicated, fictitious, or live someplace other than where they were enumerated. While the ESCAP I Report did not identify erroneous enumerations as an area of concern, Census Bureau researchers quickly noted that Census 2000 erroneous enumerations differed substantially from 1990 measures in ways that were not readily understood. Studies included the Measurement Error Reinterview/Evaluation Followup (hereinafter called the EFU) and the Person Duplication Studies. EFU results were used to determine how well the A.C.E. identified erroneous enumerations. The EFU was based on a reinterview of a sample drawn from the A.C.E. clusters. The Person Duplication Studies used computer matching techniques to identify Census 2000 duplicate enumerations throughout the United States, and to determine whether the A.C.E. estimates had correctly accounted for these duplications. These studies used computer matching methods not available in earlier censuses.
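As a rough illustration of the kind of computer matching described above, the following Python sketch groups person records by a normalized name and date of birth and flags any group with more than one record as a possible duplicate. The record layout, the normalize_name helper, and the matching key are hypothetical; this notice does not describe the actual matching rules used in the Person Duplication Studies, which may differ substantially.

```python
from collections import defaultdict


def normalize_name(name):
    # Hypothetical normalization: lowercase, drop punctuation, collapse spaces.
    cleaned = "".join(ch for ch in name.lower() if ch.isalpha() or ch.isspace())
    return " ".join(cleaned.split())


def find_possible_duplicates(records):
    """Return lists of record ids that share a normalized name and birth date."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_name(rec["name"]), rec["dob"])
        groups[key].append(rec["id"])
    # Only groups with two or more records are candidate duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Entirely made-up records, for illustration only.
records = [
    {"id": 1, "name": "Maria  Lopez", "dob": "1970-04-02", "state": "TX"},
    {"id": 2, "name": "maria lopez.", "dob": "1970-04-02", "state": "FL"},
    {"id": 3, "name": "John Smith", "dob": "1985-11-30", "state": "OH"},
]
duplicates = find_possible_duplicates(records)
```

Because the key ignores geography, a search of this kind can surface candidate duplicates anywhere in the country rather than only near the original enumeration. Any real application would need a follow-up step to confirm that matched records refer to the same person.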
Measurement of Census Omissions
Census omissions refer to individuals who should have been counted in the census but were not. The A.C.E. methodology must accurately account for both erroneous enumerations, as described above, and census omissions. The A.C.E. identifies omissions by matching an independent sample to the census. The accuracy of this measurement of omissions thus depends on the accuracy of the matching, as well as the accuracy of the information collected by the independent sample. Census omissions were evaluated in the Matching Error Study, in which expert matchers re-matched a sample of the A.C.E. to determine the accuracy of the A.C.E. matching process. Omissions were also evaluated in the EFU described above to measure the accuracy of the A.C.E. information on Census Day residence, including whether persons had moved since Census Day.
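To make the interplay between erroneous enumerations and omissions concrete, the sketch below uses a simplified textbook form of a dual-system (capture-recapture) estimator. This notice does not state the A.C.E. estimation formula, so the formula, the function name, and every number here are assumptions supplied for illustration only.

```python
def dual_system_estimate(census_count, correct_enum_rate, match_rate):
    """Simplified dual-system estimate of the true population (an assumption).

    census_count      -- census records eligible for matching
    correct_enum_rate -- estimated share of those records that are correct
    match_rate        -- share of the independent sample found in the census
    """
    return census_count * correct_enum_rate / match_rate


census_count = 1_000_000  # hypothetical

# Hypothetical case A: erroneous enumerations are fully measured.
estimate_a = dual_system_estimate(census_count, 0.95, 0.92)

# Hypothetical case B: some duplicates go unmeasured, so the correct
# enumeration rate is overstated and the population estimate is inflated.
estimate_b = dual_system_estimate(census_count, 0.96, 0.92)

implied_undercount_a = estimate_a - census_count
implied_undercount_b = estimate_b - census_count
```

Under these assumptions, overstating the correct enumeration rate raises the population estimate and therefore the implied net undercount, while an overstated match rate pulls the estimate in the opposite direction.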
Missing Data Studies
Missing data occurs in the A.C.E. if, after all attempts, there remain persons for whom complete data are not available, including demographic characteristics such as age or race. Missing data also includes the status of whether a person matched, was a resident on Census Day, or was correctly enumerated. The latter types of missing data can seriously affect the accuracy of coverage measurement surveys such as the A.C.E. The A.C.E. used a statistical model to account for the effects of missing data. The ESCAP directed the development of alternative missing data models to assess the effect on the estimates of using different assumptions to predict the effects of missing data.
Balancing Error
The previous ESCAP report indicated concerns with balancing error. Balancing error occurs when the method used to determine the number of omissions is different from the method used to determine which records are correctly included in the census. The specific concern was that the area for matching to find omissions was different from the area used to determine erroneous enumerations. The ESCAP posited various scenarios that could explain the concerns with balancing error, ranging from small to very serious effects on the A.C.E. estimates. In order to investigate these concerns, additional field operations were conducted.
Synthetic Error Study
The A.C.E. estimation methodology produced estimated coverage correction factors which were carried down within the post-strata in a process referred to as synthetic estimation. The key assumption underlying synthetic estimation is that net census coverage is relatively uniform within the post-strata. Failure of this assumption leads to synthetic error. The Census Bureau is concerned with synthetic error since it may affect the accuracy of small area estimates and cannot be directly estimated. ESCAP I examined the effects of synthetic error by studying “artificial populations,” populations created with surrogate variables that are known for the entire population, and are developed to reflect the distribution of net coverage error. ESCAP II directed the preparation of additional artificial populations.
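Synthetic estimation as described above can be sketched in a few lines of Python. The post-stratum names, coverage correction factors, and counts below are hypothetical and chosen only to show the mechanics; they are not A.C.E. values.

```python
def synthetic_estimate(area_counts, correction_factors):
    """Apply post-stratum coverage correction factors to one small area.

    Every person in a post-stratum receives the same factor, which is the
    uniform-coverage assumption underlying synthetic estimation.
    """
    return sum(count * correction_factors[ps] for ps, count in area_counts.items())


# Hypothetical coverage correction factors by post-stratum.
correction_factors = {"owner": 1.00, "renter": 1.02}

# Hypothetical census counts for one small area, by post-stratum.
area_counts = {"owner": 600, "renter": 400}

adjusted_total = synthetic_estimate(area_counts, correction_factors)
```

If true coverage for renters in this particular area differed from the post-stratum average, the adjusted total would be off by an amount that the procedure itself cannot reveal. That gap is the synthetic error discussed above.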
Evaluation Results
Demographic analysis research examined historical levels of the components of population change to address the possibility that the 1990 demographic analysis estimates understated the national population (the first scenario). This analysis did not reveal any significant problems. The Census Bureau investigated the second scenario by revising the estimates of international migration using preliminary Census 2000 long form data, and estimates of the numbers of births, using more current assumptions about birth registration. The Census Bureau also consulted with outside experts on this work. This analysis resulted in revisions to the Base DA that was initially examined as part of the ESCAP I decision. The revisions reflected a larger growth in the foreign-born population during the last decade. The current Revised DA estimates considered by ESCAP II are much closer to the Alternative DA considered during the ESCAP I deliberations. Many of the inconsistencies previously noted are removed when the Revised DA estimates are viewed in light of the A.C.E. evaluations. The Revised DA national estimate of 281.8 million for the U.S. resident population is 2.2 million higher than the Base DA and about 0.6 million lower than the Alternative DA. The Revised DA net undercount rate of 0.12 percent compares to a net overcount of 0.65 percent implied by the Base DA, and a net undercount of 0.32 percent using the Alternative DA.
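The relationships among the three DA estimates can be checked with a short calculation. The sketch below assumes that the net undercount rate is defined as the DA estimate minus the census count, divided by the DA estimate. It also assumes a Census 2000 resident count of 281,421,906, which is not stated in this notice. Both are assumptions made for illustration.

```python
# Millions of persons. The census count is an assumption, not from this notice.
CENSUS_2000_COUNT = 281.421906

revised_da = 281.8                  # stated in the text
base_da = revised_da - 2.2          # Revised DA is 2.2 million higher than Base DA
alternative_da = revised_da + 0.6   # Revised DA is about 0.6 million lower


def net_undercount_rate(estimate, census=CENSUS_2000_COUNT):
    """Percent net undercount implied by an estimate; negative means overcount."""
    return 100.0 * (estimate - census) / estimate


rates = {
    "Base DA": net_undercount_rate(base_da),
    "Revised DA": net_undercount_rate(revised_da),
    "Alternative DA": net_undercount_rate(alternative_da),
}
```

Because the inputs are rounded to the nearest 0.1 million, the computed rates only approximate the 0.65, 0.12, and 0.32 percent figures quoted above, but the signs and the ordering agree: a net overcount under the Base DA and a net undercount under the other two.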
Erroneous Enumerations
The studies examining the accuracy of the measurement of erroneous enumerations initially found serious errors that would have resulted in a large overstatement of the population by the A.C.E. The seriousness of these findings prompted the Committee to direct further work to make sure that the findings were correct. This additional review indicated that a significant problem existed with the measurement of erroneous enumerations, but also indicated that the study findings were subject to uncertainties resulting from a large number of cases left unresolved or conflicting. The Person Duplication Studies provided additional information underscoring the seriousness of the errors in measuring erroneous enumerations. These duplication studies found that the A.C.E. had seriously understated the level of erroneous enumerations because it incompletely measured census duplications, and that the EFU had not accounted for a significant part of this understatement. They also helped to explain some of the uncertainty that arose from the rework of the EFU. The net effect of these studies was the conclusion that the A.C.E. overstated the level of undercount by at least 3 million persons. The level of this error is such that the ESCAP determined that the unadjusted data should be used.
Census Omissions
With regard to studies of census omissions, the Matching Error Study indicated that the A.C.E. overstated the net undercount due to P-sample matching error by about 385,000. The EFU indicated that a substantial number of movers were changed to nonmovers and vice versa. The net effect of these mover status changes suggests an overestimate of the match rate and therefore an understatement in the A.C.E. estimates of about 450,000. At the national level there is therefore a small net effect of about 65,000 on the accuracy of the measurement of census omissions. However, more research must be conducted to further study these effects.
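The net figure follows from treating the two effects as offsetting. A minimal sketch of the arithmetic, using a sign convention chosen here for illustration (positive means the A.C.E. overstated the net undercount, negative means it understated it):

```python
# Signed effects on the A.C.E. net undercount estimate, in persons.
# The sign convention is an assumption made for this illustration.
matching_error_effect = 385_000   # P-sample matching error: overstatement
mover_status_effect = -450_000    # mover status changes: understatement

net_effect = matching_error_effect + mover_status_effect
net_magnitude = abs(net_effect)
```

Under this convention the two effects largely cancel, leaving a net understatement whose magnitude matches the figure of about 65,000 given above.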
Missing Data
The Committee examined a variety of alternative models to account for the effects of missing data. These models gave a wide range of results, implying widely varying effects on the A.C.E. estimates. The data examined by the Committee make clear that alternative missing data models both understated and overstated the effects of missing data on the A.C.E. estimates, depending on the choice of model. The Committee ultimately viewed the choice of model as an increase in the uncertainty associated with the A.C.E. results, but did not find evidence of bias resulting from this choice of model. This uncertainty should be considered in further analysis of the A.C.E. estimates.
Balancing Error
ESCAP I's concern with balancing error has for the most part been resolved, as further research indicated that the previously observed discrepancy did not appreciably influence the A.C.E. estimates.
Total Error Model
ESCAP I used a total error model to consolidate its research and to produce an overall assessment of A.C.E. accuracy. ESCAP II directed that an updated model be prepared to account for information from the new evaluation studies. The timing of some of the new evaluations, along with the complexities of both the studies and the A.C.E. design, did not allow preparation of an updated model that would incorporate all errors that impact the A.C.E. estimates. As discussed more fully in the body of the report, the ESCAP could not develop or verify a new total error model that would take into account all of the errors discovered in the EFU, Matching Error Study, and Person Duplication Studies. Even without the information from an updated total error model, however, it was clear to the Committee that the magnitude of the discovered errors precluded a recommendation in favor of the adjusted data.
Synthetic Error
Consideration of the synthetic error studies requires the completion of the total error model and will be included in the continued research.
Other Concerns
Additional studies allayed other concerns about the A.C.E. and the census. Studies revealed no evidence of significant contamination bias. The Committee concluded that the effect of excluding reinstated census people from the A.C.E. was minimal. The Committee further concluded that the kind, level and pattern of whole person imputation in Census 2000 did not call the A.C.E. results into question.
Next Steps
While the ESCAP has recommended against use of the adjusted data, the A.C.E.'s original objective of addressing the differential undercount must still be pursued. The totality of the evidence considered by the Committee leads it to believe that while Census 2000 successfully lowered the historical pattern of the differential undercount, it did not eliminate it. The net undercount remains disproportionately distributed among renters and minority populations. With further research, it is reasonable to expect that new information can be used to produce revised A.C.E. estimates. The evaluation results, including the new measurement tool of name matching, will be extremely valuable for evaluating the accuracy of Census 2000, planning for the 2010 census, and potentially for improving the post-censal estimates. Both census taking and coverage measurement are processes that evolve and improve with each census. The Census 2000 experience will help refine both census and coverage measurement processes for future censuses.
Table of Contents
Executive Summary
Areas of Research
Evaluation Results
Next Steps
Table of Contents
ESCAP II Report
Introduction
Background
ESCAP II Proceedings
Non-redistricting uses of the data
ESCAP II Research
Demographic Analysis
International Migration
Measurement of Vital Events
Results of Revised DA
Research to Evaluate the A.C.E. and Census 2000
Matching Error Study
Evaluation Followup
Person Duplication Studies
Measurement of Erroneous Enumerations, Including Duplicates
Measurement of Census Omissions
Correlation Bias
A.C.E. Missing Data
Balancing Error
Conditioning
Reinstated Late Additions
Census 2000 Imputations
Total Error Model and Loss Function Analysis
Synthetic Estimation
Conclusion
Attachments
ESCAP II Report
Introduction
Background
On March 1, 2001, the Acting Director of the Census Bureau recommended to the Secretary of Commerce that unadjusted census data be used as the Census Bureau's official redistricting data. This recommendation was in accord with the recommendation of the Executive Steering Committee for A.C.E. Policy (ESCAP). The ESCAP [1] was unable to conclude, based on information available at the time, that adjusted Census 2000 data were more accurate for redistricting. The ESCAP I Report is available on the Census Bureau's website, along with a voluminous Administrative Record supporting this recommendation.
The primary issue that precluded ESCAP I from recommending use of the adjusted data was the unexplained difference between the A.C.E. and Demographic Analysis estimates of the population. Demographic analysis (DA) initially estimated the national total population to be below the census count, while the A.C.E. estimated the population to be above the census count. This discrepancy raised the significant possibility of an undetected problem with the A.C.E. or the census. ESCAP I also identified concerns with balancing and synthetic estimation error as potential problems in the adjusted data. The Committee directed the preparation of an extensive evaluation program to inform its deliberations relating to the proposed use of adjusted data for nonredistricting purposes.
ESCAP II Proceedings
In the months since the ESCAP I Report, the Committee has embarked on a second round of deliberations to address the concerns identified in the report and to enable the Census Bureau to recommend whether the adjusted Census 2000 data should be used for nonredistricting purposes. These evaluations, the ESCAP II report series, set forth the underlying data that support the Committee's findings. The future uses in consideration include post-censal population estimates, demographic survey controls, and the census long form data products. Some of the required evaluations involved additional research, including additional field work and matching work.
ESCAP II considered a wide variety of research and analyses, and heard presentations of the reports on the attached list (Attachment 1). Some of these presentations provided background information to help the Committee interpret the results of other studies, while others bore directly on the adjustment recommendation. While the Committee considered and deliberated on all of the listed reports, this discussion will focus on those most directly relevant to the comparative accuracy of the adjusted and unadjusted data. This research was conducted over many months and represents diligent and thorough statistical and demographic analysis.[2]
The Associate Director for Decennial Census originally chartered the ESCAP on November 26, 1999, and charged the Committee to “advise the Director in determining policy for the A.C.E. and the integration of the A.C.E. results into the census for all purposes except Congressional reapportionment.” Although there was a change in the Associate Director for the Decennial Census position in June 2001, ESCAP II continued to be chaired by John Thompson to maintain continuity. The ESCAP resumed meeting on March 7, 2001, and met a total of 32 times, sometimes with more than one meeting per day. The ESCAP represents a body of senior career Census Bureau professionals, with advanced degrees in relevant technical fields and/or decades of experience in the federal statistical system. All are highly competent to evaluate the relative merits of the A.C.E. data versus the census data and are recognized for their extensive contributions to the professional community.
As in the ESCAP I process, the early sessions were primarily educational, designed to inform Committee members of the research operations and to present general information about non-redistricting uses of the data. The second phase involved presentation by knowledgeable employees of the new data and analyses as they became available. The Committee reviewed the data and analyses, sometimes asking staff to provide additional and new information. The third phase was deliberation, where the Committee members met privately. The final and briefest stage was review, where Committee members commented on the draft report. Again, as in the ESCAP I process, minutes were prepared for all sessions, except for the final ones, which were private deliberations.
During the education and evidence presentation phases, the Chair generally arranged presentations on major issues, issues that he identified on his own initiative or on the suggestion of Committee members. During the evidence presentation stage, authors of the analysis reports presented their data and conclusions to the Committee. The deliberation and review phases were less structured with various members raising topics for discussion and asking for evidence. No formal vote was held; this Report reflects a consensus of the ESCAP.
Non-Redistricting Uses of the Data
The ESCAP's recommendation covers the three non-redistricting uses of census data: post-censal estimates, demographic survey controls, and Census 2000 long form products. Certain Census Bureau data products have already been issued using only the unadjusted data, including the Census 2000 Redistricting Data Summary File, Demographic Profiles, Congressional District Demographic Profiles, Summary File 1 Data, and reports in the Census 2000 Brief Series.[3]
Post-censal estimates are made by updating the most recent census base with estimates of population change (births, deaths, and net migration). As directed by the Census Act, the Census Bureau prepares post-censal estimates at the national, state, and county level every year, and at the functioning governmental unit level every other year.[4] These estimates have a variety of uses, most notably in funding allocations, as the basis for sample survey controls, and as denominators for many important statistics.
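The updating described above amounts to a simple component calculation. The following is a minimal sketch only; the function name and every figure in it are hypothetical and are not Census Bureau code or data.

```python
def postcensal_estimate(census_base, births, deaths, net_migration):
    """Update a census base population with components of change."""
    return census_base + births - deaths + net_migration

# Hypothetical county; all values are invented for illustration only.
estimate = postcensal_estimate(
    census_base=100_000, births=1_400, deaths=900, net_migration=250
)
print(estimate)  # 100750
```

Because the census base enters the calculation directly, any coverage error in the base carries forward into each later estimate.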
The accuracy of the post-censal estimates for funding allocations is critical, as about $200 billion is allocated based on these data each year. Medicaid (Title XIX) is the largest program to distribute federal funding based on population estimates, distributing over $100 billion each year based on the post-censal estimates. Community Development Block Grants from the Department of Housing and Urban Development, and Title I Basic, Concentration, and Targeted Grants from the Department of Education are two additional federal programs that use post-censal estimates as factors in their funding formulas to distribute federal monies. The individual states also have within-state fund allocation programs, many of which use post-censal estimates to allocate funds to sub-state areas.
Many federal agencies use post-censal estimates as denominators to produce per capita statistics. Examples are per capita income, crime statistics, incidence of certain health conditions, birth rates, and mortality rates. The numerators of these statistics can be obtained at various points in time throughout the decade. In the absence of updated information, calculating these kinds of statistics on a static 2000 denominator would be misleading; therefore, many federal agencies use post-censal estimates of population.
Demographic survey controls are used by many national sample surveys to transform the data they collect into nationally representative estimates. The most notable is the Current Population Survey, or CPS, which is used to calculate the monthly unemployment rate. Sample surveys generally have poorer coverage than a census; therefore, in order to improve the accuracy of estimates from a sample survey, the survey estimates are controlled to independent measures of the number of people in certain age, sex, race, and Hispanic origin groups, such as the post-censal estimates.
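Controlling survey estimates to independent population totals can be illustrated with a simple ratio adjustment. The sketch below is illustrative only: the function name, group labels, weights, and control totals are all hypothetical, and actual survey weighting involves further steps not shown here.

```python
def control_weights(weights_by_group, control_totals):
    """Scale survey weights so each group's weighted total matches its control.

    weights_by_group: dict mapping group -> list of respondent weights
    control_totals:   dict mapping group -> independent population total
    """
    adjusted = {}
    for group, weights in weights_by_group.items():
        factor = control_totals[group] / sum(weights)
        adjusted[group] = [w * factor for w in weights]
    return adjusted

# Hypothetical groups and totals, invented for illustration only.
survey = {"group_a": [100.0, 150.0, 250.0], "group_b": [200.0, 200.0]}
controls = {"group_a": 600.0, "group_b": 500.0}
adjusted = control_weights(survey, controls)
```

After adjustment, the weighted total for each group equals its control, so the choice of adjusted or unadjusted controls feeds directly into the survey estimates.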
The ESCAP Committee also considered whether adjusted or unadjusted Census 2000 data should be used for the controls for estimates based on data from the Census 2000 long form. The long form collects more extensive characteristic data from a sample of about seventeen percent of the population. Long form data are used to provide local communities with data on education, employment, housing, and various other social and demographic characteristics essential to efficient planning. Additionally, the long form provides the detailed local demographic and social characteristics used in some federal formula allocation programs. In order to produce estimates for the country as a whole from this sample, Census 2000 data from the short form items are used as controls.
ESCAP II Research
In the months since the ESCAP I Report, the Committee embarked on a second round of deliberations to address the concerns identified in the report and to enable the Census Bureau to recommend whether adjusted Census 2000 data should be applied for non-redistricting uses. ESCAP II, therefore, directed the preparation of a number of evaluation studies, as described in detail in Attachment 2. Research centered on four areas: demographic analysis, the A.C.E., Census 2000, and synthetic error. The results of this research are set forth below.
Demographic Analysis
ESCAP I's primary concern was that DA estimates were inconsistent with A.C.E. estimates. The Census Bureau expected, based on past experience, that demographic analysis would posit a higher estimate of the total population than the A.C.E. because of the presence of correlation bias, and that the two estimates would agree generally on the coverage of certain populations. Instead, the Base DA estimates were lower than both the Census 2000 population counts and the A.C.E. estimates. In response, the Census Bureau developed its Alternative DA estimates by doubling the unauthorized immigration assumed to have occurred during the 1990's. Doing so yielded a number of foreign born for 2000 that was roughly consistent with that reported by the March 2000 Current Population Survey.[5] The Alternative DA estimates were, however, still significantly lower than the A.C.E. estimates. The Alternative DA indicated that Census 2000 undercounted the population by 0.32 percent, while the A.C.E. produced a net undercount estimate of 1.15 percent.[6]
ESCAP I concluded that the inconsistency in the estimates of the total national population must have derived from one or more of three possible explanations. The first explanation was that all available 1990 census data, including the census results, the 1990 coverage measurement survey, and the 1990 DA estimates, significantly understated the Nation's population, but that Census 2000 found this previously un-enumerated population. The second explanation was that DA underestimated population growth between 1990 and 2000. The third explanation was that the A.C.E. overestimated the Nation's population. ESCAP II directed that further research on demographic analysis be conducted. It focused on two main topics: international migration and measurement of vital events like births and deaths.
International Migration
Assumptions regarding international migration were the most uncertain component in the demographic analysis estimates completed by March 1, 2001. Although the research agenda for the March through October period focused primarily on those components of international migration that are less well measured (e.g., emigration, temporary migration, and unauthorized migration), the work also included research into legal immigration and the demographic characteristics of migrants used in the March 2001 DA estimates.
Part of the analysis involved discussions with independent experts on demographic analysis and international migration. The purpose of a March 20, 2001, meeting was to explain how the DA estimates differed from the A.C.E. estimates, and to discuss how to prioritize short-term and long-term research activities. Attendees included experts from the statistical community, academia, state agencies, the Census Bureau's advisory committees, professional organizations, and international organizations. A nearly unanimous recommendation from these experts was to focus on assumptions and estimates of the components of international migration, as these numbers were subject to the most uncertainty. Because of scheduling conflicts, two smaller meetings with other migration experts were held at the annual meeting of the Population Association of America on March 29-30, 2001.
Expert advice was sought again, on September 24, 2001, after completion of the original research activities (validation of the 1990 estimates and updated 2000 estimates) that produced the revised DA estimates. Although these experts generally agreed with the methodology used to calculate components of international migration, they had concerns about the assumptions regarding the undercount of international migrants. Specifically, they believed the undercount assumption of 15 percent for unauthorized migrants, which was incorporated in the Revised DA, was probably too high, especially given the A.C.E. undercounts for other hard-to-enumerate groups. In addition, they urged renaming the residual migrant category as the residual foreign-born, or separating the residual foreign born into known components (“quasi-legal” migrants) and the implied unauthorized migrant population. Both of these suggestions were incorporated into a subsequent sensitivity analysis.
The sensitivity analysis of assumptions about coverage of various components of the foreign-born population showed that the total number of foreign born did not vary enough to have much effect on the DA estimate of the total population. For example, the lower bound assumption of 3.3 percent net undercount of the foreign-born equated to a population of 281.3 million, or more than 3 million people lower than the A.C.E. total population. The upper bound assumption of 6.7 percent was consistent with a population of 282.5 million—still more than 2 million lower than the A.C.E. total population. These results led the Census Bureau to conclude that the Revised DA was an appropriate benchmark for assessing Census 2000 and the A.C.E. estimates.
Measurement of Vital Events
Other research examined the remaining assumptions underlying the DA components of change, including the birth, death, and Medicare components. Although estimates of deaths and the size of the elderly population did not change much, the estimates of historical births changed because of this research. The principal outcome was a revision in the assumptions about registration completeness of births since 1968. The previous DA estimates assumed that all births in years since 1968 (the last year of testing birth registration completeness) were registered at the same percent (99.2 percent). For the Revised DA estimates, registration completeness gradually reached 100 percent by 1985 (the first year natality statistics were reported electronically from all the States), and remained at 100 percent through 2000. This revision lowered the estimated number of births for 1968-2000 by 715,000 (which lowered the Revised DA estimate of the total population in 2000 by the same amount).[7]
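The effect of the registration completeness assumption can be sketched as follows. This is an illustration under stated assumptions: the linear path from 99.2 percent in 1968 to 100 percent in 1985 is assumed here (the text says only that completeness rose "gradually"), and the flat series of registered births is hypothetical, so the result is not expected to reproduce the 715,000 figure.

```python
def completeness(year, revised):
    """Assumed share of births that were registered in a given year."""
    if not revised:
        return 0.992                      # constant rate in the previous DA
    if year >= 1985:
        return 1.0                        # complete from 1985 onward
    # Linear rise from 99.2% in 1968 to 100% in 1985 (an assumed path).
    return 0.992 + (1.0 - 0.992) * (year - 1968) / (1985 - 1968)

def estimated_births(registered_by_year, revised):
    """Inflate registered births to allow for under-registration."""
    return sum(n / completeness(y, revised) for y, n in registered_by_year.items())

# Hypothetical flat series of registered births, for illustration only.
registered = {year: 3_500_000 for year in range(1968, 2001)}
reduction = estimated_births(registered, False) - estimated_births(registered, True)
print(round(reduction))
```

Under any path that rises toward 100 percent, the revised assumption inflates registered births less than the constant 99.2 percent rate, which is why the revision lowers the estimated number of births.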
Results of Revised DA
The research undertaken between March and October allayed two fundamental concerns: first, the possibility that the Alternative DA did not capture the full growth of the population between 1990 and 2000, and second, the possibility that the 1990 DA was lower than the true population. In fact, the cumulative effect of the research on immigration, births, and deaths led to Revised DA estimates that were only slightly different from the Alternative DA. In other words, the inconsistency between the Alternative DA and the A.C.E. estimates was not the result of unexplained problems in DA. These results, in combination with other evidence, led the ESCAP to conclude that the A.C.E. overestimated the Nation's total population.
More specifically, the Revised DA lowered the net undercount rates from 1.85 to 1.65 percent in 1990, and from 0.32 to 0.12 percent in 2000, but did not alter the DA finding that the net undercount rate in 2000 was substantially lower than in 1990.[8] The Revised DA continued to measure a lower net undercount than the A.C.E., and in fact was very close to the Alternative DA estimate used by ESCAP I in March. The Revised DA estimated a net undercount of 0.3 million, or 0.12 percent, compared with the A.C.E. estimate of a net undercount of 3.3 million, or 1.15 percent. Population totals from the Base DA, Alternative DA, and Revised DA, along with the Census 2000 counts and the A.C.E. estimates, are shown in Table A. The corresponding numerical and percentage undercounts are shown in Figure 1.
Table A. Resident Population Totals from Census 2000, Demographic Analysis, and the A.C.E.: April 1, 2000

Source                        Total population
Base DA (March)                    279,598,121
Census 2000                        281,421,906
Revised DA (September)             281,759,858
Alternative DA (March)             282,335,711
A.C.E.                             284,683,782

As shown in Table B below, the Revised DA implied a greater reduction than the A.C.E. in net undercount in Census 2000 compared with the 1990 census. Under the revised DA, the net undercount rate was reduced by 1.53 percentage points, from 1.65 percent in 1990 to 0.12 percent in 2000. In contrast, the A.C.E. estimate of 1.15 percent net undercount in 2000 was 0.43 percentage points lower than the 1.58 percent in 1990. Additionally, both DA and the A.C.E. measured a reduction in the net undercount rates of Black and nonBlack children compared with 1990. Both methods also measured a reduction in the net undercount rates of adult Black men and women.
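The net undercount figures cited in this section can be recomputed from the Table A totals. The sketch below assumes that the net undercount rate is expressed as a percentage of the estimated population rather than of the census count; that definition and the function name are assumptions made for illustration.

```python
CENSUS_2000 = 281_421_906

estimates = {
    "Base DA (March)": 279_598_121,
    "Revised DA (September)": 281_759_858,
    "Alternative DA (March)": 282_335_711,
    "A.C.E.": 284_683_782,
}

def net_undercount(estimate, census=CENSUS_2000):
    """Return (persons, percent); negative values denote a net overcount."""
    persons = estimate - census
    return persons, 100.0 * persons / estimate

for source, total in estimates.items():
    persons, percent = net_undercount(total)
    print(f"{source}: {persons:,} ({percent:.2f} percent)")
```

Under this definition the Revised DA, Alternative DA, and A.C.E. rates round to 0.12, 0.32, and 1.15 percent respectively, matching the figures in the text.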
The revised DA and A.C.E. estimates continued to disagree in that DA found a reduction in the net undercount rates of nonBlack men and women in Census 2000 compared with the rates of previous censuses. The A.C.E. indicated no change or a slight increase in undercount rates for nonBlack adults as a group.
Demographic analysis also provided evidence that correlation bias was not reduced between 1990 and 2000. Comparisons of the DA and A.C.E. sex ratios (men per 100 women) showed that correlation bias in the survey estimates was not reduced for Black men between 1990 and 2000. The A.C.E. sex ratios for Black adults were much lower than the expected sex ratios based on DA, implying that the A.C.E. did not capture the high undercount rate of Black men relative to Black women. The size of this bias was about the same as in the 1990 coverage measurement survey.
Table B. Estimates of Percent Net Undercount, by Race, Sex, and Age: 1990 and 2000
[a minus sign denotes a net overcount]

                   Revised demographic analysis        PES/A.C.E.
Category                 1990        2000        PES 1990   A.C.E. 2000
Total                    1.65        0.12          1.58        1.15
Black                    5.52        2.78          4.43        2.07
  0-17                   5.27        1.30          7.05        2.92
  Male, 18+              9.57        7.67          3.76        2.10
  Female, 18+            2.05        0.75          2.64        1.28
NonBlack                 1.08       −0.29          1.18        1.01
  0-17                   1.12        0.54          2.46        1.27
  Male, 18+              1.74        0.29          1.19        1.43
  Female, 18+            0.44       −1.02          0.34        0.44

Source: U.S. Census Bureau.
Note: Estimates by race shown for 2000 are based on the “average” of Model 1 and Model 2, as described in ESCAP II Report No. 1, “Demographic Analysis Results.”

Research to Evaluate the A.C.E. and Census 2000
A number of the studies described more fully in Attachment 2 evaluate the accuracy of the A.C.E. and Census 2000. The A.C.E. is composed of two samples, the E-sample, which measures erroneous enumerations, and the P-sample, which measures census omissions. The E-sample is also used to estimate the number of census persons who do not have sufficient information to be used in A.C.E. matching and followup operations. The Dual System Estimates (DSEs) are computed by combining E-sample estimates of erroneous enumerations and insufficient information with P-sample estimates of omission. Therefore it is critical that the E-sample correctly account for erroneous enumerations and that the P-Sample correctly account for omissions. The evaluations were designed to measure the accuracy of both the P- and E-Samples.
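The text above does not state the estimator itself. As an illustration only, the sketch below uses the dual system form commonly published for coverage measurement, DSE = (C - II) x (CE / Ne) x (Np / M). The formula, the function name, and all input values are assumptions introduced here and are not taken from this report.

```python
def dual_system_estimate(census_count, insufficient_info,
                         e_correct, e_total, p_total, p_matched):
    """Assumed dual system form: DSE = (C - II) * (CE / Ne) * (Np / M)."""
    correct_rate = e_correct / e_total      # E-sample: share correctly enumerated
    match_rate = p_matched / p_total        # P-sample: share matched to the census
    return (census_count - insufficient_info) * correct_rate / match_rate

# Hypothetical poststratum, invented for illustration only.
dse = dual_system_estimate(
    census_count=1_000_000, insufficient_info=20_000,
    e_correct=950, e_total=1_000, p_total=1_000, p_matched=920,
)

# Same poststratum if some erroneous enumerations were wrongly coded as correct.
dse_overstated = dual_system_estimate(
    census_count=1_000_000, insufficient_info=20_000,
    e_correct=970, e_total=1_000, p_total=1_000, p_matched=920,
)
```

Because the correct enumeration rate enters as a multiplier, failing to detect erroneous enumerations raises the estimate and therefore the measured net undercount, while a match rate that is too low has the same direction of effect.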
Three studies in particular produced substantial new information for ESCAP II: the Matching Error Study, the Evaluation Followup (EFU), and the Person Duplication Studies.
Matching Error Study [9]
The Matching Error Study provided the P-sample matching error rate and the E-sample processing error rate. Expert matchers clerically rematched all of the people in a one-fifth subsample of the A.C.E. clusters to determine the best match code. This information was compared to match codes assigned in production of the actual A.C.E. estimates.
Evaluation Followup [10]
The EFU consisted of a reinterview of households in the same one-fifth subsample of A.C.E. clusters used in the Matching Error Study, with additional subsampling. EFU results helped determine the accuracy of the production data processed and collected in the P- and E-Samples. The EFU interview results were used to measure the accuracy of the classification of correct and erroneous census enumerations as determined by the E-Sample. The results were also used to measure the accuracy of the P-Sample data regarding mover status and Census Day residence.
Person Duplication Studies [11]
The Person Duplication Studies took advantage of the fact that Census 2000 was the first census to record name information in the data capture system in a way that permits computer matching. This new methodology permitted the Census Bureau to direct a nationwide computer matching operation to measure the level of duplication in the census. These studies also examined how well the A.C.E. accounted for these duplicates. While the A.C.E. matched respondents in the same block and surrounding blocks, this new tool permitted the Census Bureau to search for duplicates throughout the country. The Person Duplication Studies involved only computer matching, as the Census Bureau lacked the resources and time to match to the entire country using both computer and clerical matching. The computer matching thus understated the actual level of duplication. These studies also compared the results of the EFU with the Person Duplication Studies to determine whether the EFU correctly measured these duplications.
Some of the error components produced in these studies suggest that the A.C.E. overestimated the net undercount while others suggest the net undercount was underestimated. The results of these studies are discussed below, and are the basis for the recommendation that the adjusted data not be used due to a significant problem in the measurement of erroneous enumerations resulting in an overstatement of the net undercount by at least 3 million people.
Measurement of Erroneous Enumerations, Including Duplicates
The evaluations of the accuracy of the A.C.E. indicated that the A.C.E. did not measure a significant portion of the Census 2000 erroneous enumerations. The measurement of erroneous enumerations is critical to both the national net undercount and to sub-national estimates. This error resulted in the A.C.E. significantly overstating the net Census 2000 undercount by at least 3 million people, with an approximate range of 3 to 4 million. The significance of this error was such that the ESCAP recommended that the unadjusted data be used for Census 2000 non-redistricting purposes.
The EFU and the Person Duplication Studies described above provided the most significant information regarding the measurement of erroneous enumerations. The initial EFU results gave evidence of a significant understatement in the A.C.E. measurement of erroneous enumerations. Because of the significance of the understatement, the EFU was extensively reviewed. The revised EFU again indicated a significant problem with understating the level of erroneous enumerations, and resulted in a high level of cases left unresolved or conflicting. The Person Duplication Studies found that a significant number of duplicate enumerations were not measured by the A.C.E., and that the EFU did not pick up significant portions of this error. The Person Duplication Studies also resolved a portion of the cases left unresolved or conflicting by the EFU Review.
The EFU initially found a 3.5 percent change in enumeration status from that measured by A.C.E. production. A total of about 2,800,000 production “correct enumerations” (SE 223,000) were re-coded as “erroneous enumerations,” while about 900,000 production “erroneous enumerations” (SE 99,000) were re-coded as “correct enumerations.”[12] The net difference found by the EFU was 1,900,000. The EFU also included about 4,500,000 cases (SE 353,000) that could not be resolved. This study indicated that, at a minimum, the A.C.E. overstated the level of net undercount by about 2 million people.
Because of the EFU's potentially significant implications for the A.C.E. estimates, ESCAP decided that further EFU analysis was needed. Accordingly, more highly trained matching analysts from the National Processing Center (NPC) directly reviewed a subsample of the EFU and production cases. Matching analysts are employees at NPC with many years of training in matching, some with over 20 years of experience, who supervise and perform quality assurance for all the A.C.E. matching operations.
This additional review confirmed that there were errors in the A.C.E.'s identification of erroneous enumerations. A total of about 1,800,000 enumerations (SE 189,000) that were coded as correct in production were subsequently coded erroneous in the evaluation, while the number of enumerations coded as erroneous in production that were then coded as correct in the review was about 361,000 (SE 46,000).[13] Consequently, the net difference in the “correct enumeration” to “erroneous enumeration” and “erroneous enumeration” to “correct enumeration” cells was estimated at 1,450,000, rather than the initial level of 1,900,000. However, the review identified over 15 million cases which could not be resolved or for which conflicting results were observed. Depending on assumptions that could be made regarding the enumeration status of these cases, the overstatement of the net undercount could range from about 1.45 million to up to 5.9 million people.[14]
The Person Duplication Studies found that a significant number of duplicate enumerations were not correctly measured by the A.C.E. or by the EFU. Furthermore, when the Person Duplication Studies results are combined with the EFU results, some of the unresolved and conflicting cases can be explained. Based on this work, more refined ranges for the level of the A.C.E. overstatement were developed. Direct estimates were produced from the Person Duplication Studies that indicated that the level of A.C.E. error not measured was about 3 million persons. In addition, it is also expected that further refinements to the treatment of the unresolved and conflicting cases would lead to about an additional 800,000 errors. Thus, the approximate range of the potential overstatement of the net undercount was reduced to between 3 and 4 million persons.
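The arithmetic behind the successive estimates of unmeasured erroneous enumerations can be laid out as a short sketch. The figures are those reported in the text; the variable names are illustrative only, and no standard errors are combined because the text does not say how the component errors are related.

```python
# Initial EFU recodes (persons), as reported in the text.
initial_net = 2_800_000 - 900_000      # correct->erroneous minus erroneous->correct

# Matching analysts' review of a subsample.
review_net = 1_800_000 - 361_000       # reported in the text as about 1,450,000

# Refined range after the Person Duplication Studies.
duplication_error = 3_000_000          # A.C.E. error not measured
further_refinement = 800_000           # expected from unresolved/conflicting cases
refined_range = (duplication_error, duplication_error + further_refinement)

print(initial_net, review_net, refined_range)
```

The upper end of 3.8 million is consistent with the approximate range of 3 to 4 million stated in the text.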
Finally, the EFU provided information regarding whether the A.C.E. accurately measured Census 2000 discrepant enumerations.[15] This study showed that the net effect of erroneously identifying discrepant persons as correct enumerations in production and vice versa is an overstatement of about 6,000 correct enumerations in production, with a standard error of about 30,000.[16] This difference is statistically insignificant.
Measurement of Census Omissions
Measurement of census omissions is based on the P-Sample. Therefore, accurate matching of the P-sample to the census, and the correct classification of mover status and Census Day residence, are important components of the P-Sample. Information about the accuracy of the matching was produced by the Matching Error Study. Information about the accuracy of the classification of movers and Census Day residence was derived from the EFU.
The Matching Error Study indicated that the level of matching error from the P-Sample would result in about a 385,000 overstatement of the net undercount.[17]
The EFU demonstrated that misclassification of movers in the A.C.E. may have resulted in an understatement of about 450,000 in the net undercount.[18] It should be noted that this final effect was the result of significant changes in mover status. These changes involved a large number of movers becoming nonmovers and vice versa. The EFU indicated that about 4.5 million people classified as “movers” in production became “nonmovers,” and that about 2.4 million people classified as “nonmovers” in production became “movers.” Set against the overstatement of about 385,000 from matching error, this understatement leaves a small net effect of about 65,000 on the accuracy of the measurement of census omissions at the national level. However, more research must be conducted to further study these effects.
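The national net effect on the measurement of census omissions follows from the two P-Sample error components reported above. In the sketch below, the sign convention (positive for an overstatement of the net undercount) and the variable names are assumptions made for illustration.

```python
matching_error = 385_000       # overstatement of net undercount (Matching Error Study)
mover_error = -450_000         # understatement of net undercount (EFU)
net_effect = matching_error + mover_error   # small net understatement

# Gross changes in mover status behind the EFU figure.
movers_to_nonmovers = 4_500_000
nonmovers_to_movers = 2_400_000
gross_changes = movers_to_nonmovers + nonmovers_to_movers

print(net_effect, gross_changes)
```

The contrast between a net effect of about 65,000 and gross status changes of about 6.9 million is one reason a small national figure may not carry over to subnational areas.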
The ESCAP was concerned about the EFU measurement of movers who became nonmovers, specifically about whether the EFU measured too few movers, due to its questionnaire design. To be classified a nonmover, the EFU required less detailed information than needed to be classified a mover. An examination of the bias caused by mover status changes indicates that the effect of mover-to-nonmover changes was greater in absolute value than the effect of nonmover-to-mover changes. Therefore, if there was an overreporting of nonmovers in the EFU, the effect would be to lower the measured net bias described above. Additional work must clearly be conducted to clarify this information. Furthermore, even though the net effects of these errors cancel at the national level, assessment of the subnational effects also requires further research.
Correlation Bias
Correlation bias refers to the tendency for people enumerated in the census to be more likely to be included in the A.C.E. than those missed in the census. Correlation bias usually results in a downward bias in the DSE. This type of bias can result from causal dependence, that is, the tendency of some people to be more likely to be included in the A.C.E. because they had been included in the census, or vice versa, or from heterogeneity. Heterogeneity bias can arise because different people within poststrata both have different chances of being counted in the census and different chances of being included in the A.C.E. To cause a bias, these chances must be correlated; for example, those likely to be missed by the census are also most likely to be missed by the A.C.E. ESCAP I assessed possible correlation bias in the A.C.E. estimates by comparing the A.C.E. and DA results. Correlation bias estimates available for the March ESCAP recommendation used DA estimates as of February 26, 2001. ESCAP II directed that the correlation bias estimates be recomputed to use the Revised DA estimates and other newly available data. Revised correlation bias estimates were computed and discussed by the Committee.
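The downward bias from heterogeneity can be shown with expected values. The sketch below is illustrative only: it uses a simple capture-recapture form of the dual system estimate, and the group sizes and inclusion probabilities are hypothetical. Within each group, inclusion in the census and in the survey is assumed to be independent.

```python
def expected_dse(groups):
    """Expected-value dual system estimate for a mixed poststratum.

    groups: list of (size, p_census, p_survey); capture in the census and in
    the survey is assumed independent for each person within a group.
    """
    census = sum(n * pc for n, pc, ps in groups)
    survey = sum(n * ps for n, pc, ps in groups)
    matched = sum(n * pc * ps for n, pc, ps in groups)
    return census * survey / matched

# Hypothetical poststratum of 1,000 people, invented for illustration only.
homogeneous = [(1_000, 0.7, 0.7)]
heterogeneous = [(500, 0.9, 0.9), (500, 0.5, 0.5)]

print(round(expected_dse(homogeneous)))    # 1000
print(round(expected_dse(heterogeneous)))  # 925
```

Both poststrata have the same overall inclusion rates, yet the estimate falls short of the true 1,000 when the people hard to count in the census are also hard to count in the survey.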
Like ESCAP I, ESCAP II was faced with the fact that while correlation bias exists, it is difficult to quantify. Correlation bias is an important component of assessing the A.C.E.'s accuracy because assumptions regarding correlation bias have a large effect. ESCAP II considered several models of correlation bias, including whether correlation bias should be assumed only for the Black population, whether the Hispanic population should be assumed to have the same degree of correlation bias as the Black population, and whether correlation bias should be assumed to be the same for owners and renters. Correlation bias would mean that the A.C.E. estimates of total population were too low by about 750,000 to 1.3 million, depending on which model for correlation bias is assumed.[19] Currently the Census Bureau has no means of incorporating these net biases in the production DSEs.
A.C.E. Missing Data
Missing data occurs in the A.C.E. if, after all followup attempts, there remain households that were not interviewed, or households with some portions of the person data missing, such as age or race. Sometimes the missing item involves the status of whether a person matched, was a resident on Census Day, or was correctly enumerated. Statistical models are used to account for missing data. ESCAP I viewed the rates of occurrence of unresolved A.C.E. cases for match status, correct enumeration status, and mover status as low enough to preclude serious biases in the A.C.E. results. ESCAP II directed development of additional missing data models to assess the effect on the estimates of using alternative models.
The treatment of missing data can have a large effect on the A.C.E. estimates under certain assumptions. ESCAP II examined a variety of models to predict the effects of missing data. Seven basic methods for addressing the components of missing data in the A.C.E. estimates were considered in various combinations. Each resulting alternative model was used to compute a new DSE. The alternatives considered indicated that the choice of missing data model can have a significant effect on the resulting estimates of coverage error, causing the DSEs to be over- or under-stated. The Census Bureau chose to represent the effects of these alternative models in the form of increased uncertainty in the A.C.E. estimates.
The DSEs that resulted from the alternative models were used to calculate a measure of variation similar to a sampling error. This research found that non-sampling variability from the use of alternative missing data models was considerable. At the national level, the overall magnitude of the variation resulting from all combinations of the alternative missing data models (about 530,000) was higher than the DSE sampling error (about 380,000).[20] When some alternative models were excluded, the standard deviation was of approximately the same magnitude as the DSE sampling error, but there is no evidence to suggest that the measure of variation based on all methods is unreasonable. In fact, arguments could be made that this measure understates the actual levels of variation due to missing data because it assumes that the alternatives considered were randomly distributed around an average, that is, each alternative was equally likely.
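A measure of variation across alternative models can be sketched as a standard deviation that treats each alternative as equally likely. The list of alternative DSEs below is hypothetical. Combining the reported missing data variation with the sampling error as a root sum of squares is likewise an assumption (it presumes the two components are independent), not a method stated in this report.

```python
import math
import statistics

def model_variation(dses):
    """Population standard deviation across alternative-model DSEs,
    treating every alternative as equally likely."""
    return statistics.pstdev(dses)

# Hypothetical national DSEs, invented for illustration only.
alternatives = [284_200_000, 284_700_000, 285_200_000]
print(round(model_variation(alternatives)))

# National magnitudes reported in the text.
missing_data_sd = 530_000
sampling_error = 380_000
# Assumption: independent components combine as a root sum of squares.
combined = math.sqrt(missing_data_sd**2 + sampling_error**2)
print(round(combined))
```

Under that independence assumption, the combined uncertainty would be roughly 650,000, noticeably larger than the sampling error alone.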
ESCAP II also examined information describing the level and distribution of A.C.E. missing data compared to the 1990 coverage measurement survey. The purpose of this review was to put the levels of missing data in context with 1990, and to add to the understanding of the alternative missing data model analysis previously described. The 2000 unresolved rates were slightly higher than those in 1990, but were not initially viewed as high enough to cause major concern. The alternative model analysis indicated that missing data had a more significant effect than anticipated, possibly due to changes in the methods for incorporating movers into the DSE, or to a more diverse set of alternative models.
Balancing Error
The ESCAP I Report had identified balancing error as a potential problem, noting that the A.C.E. found 3 million more matches in surrounding blocks than correct enumerations, a result which could have affected the accuracy of the estimates. The A.C.E. matching is carried out in a defined search area consisting of the A.C.E. sample blocks (clusters) and a targeted area of blocks surrounding or bordering the A.C.E. blocks. Significant differences were discovered between the number of matches and correct enumerations found in the surrounding blocks. Various scenarios were identified that could explain the difference, and ESCAP II directed that evaluations be conducted to investigate the source of this difference, identify the scale of any error, and assess whether its magnitude could significantly affect the accuracy of the adjusted data. This analysis necessitated additional field work.
The evaluations indicated that the causes of the discrepancies were for the most part related to a scenario that does not significantly affect the resulting DSEs. That is, most of the 3 million difference was attributable to the A.C.E. listing housing units in the blocks surrounding the sample blocks, which had little, if any, effect on the DSE. The evaluations did, however, detect about 246,000 A.C.E. people (SE 82,000) located outside the surrounding blocks.[21] The evaluations also estimated that an additional 195,000 people (SE 56,000) were incorrectly identified as having been correctly enumerated, although they were found to have been outside the search area. The effect of these errors is an overstatement of the net undercount by about 450,000 persons. It appeared that a portion of these errors was also included in the results of the EFU and Matching Error Study. While some additional work is required to completely resolve the potential effects of balancing error, the ESCAP believes that most of the previous concerns regarding balancing error have been addressed.
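As a rough check on the figures above, the two error estimates can be added and their standard errors combined. This is a sketch only: the report does not say how the figure of about 450,000 was derived, and the combined standard error below assumes the two estimates are independent.

```python
import math

# Estimates and standard errors quoted in the text (persons).
ace_people_outside = 246_000        # A.C.E. people located outside the surrounding blocks
ace_people_outside_se = 82_000
miscoded_correct = 195_000          # people coded correctly enumerated but outside the search area
miscoded_correct_se = 56_000

# Simple sum of the two components.
total = ace_people_outside + miscoded_correct

# Assumption: the two estimates are independent, so their variances add.
total_se = math.sqrt(ace_people_outside_se**2 + miscoded_correct_se**2)

print(f"combined estimate: {total:,}")
print(f"combined standard error (independence assumed): {total_se:,.0f}")
```

The sum is consistent with the rounded figure of about 450,000 quoted in the text.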
Conditioning
Conditioning, or contamination bias, refers to the situation where the A.C.E. influenced the census. ESCAP I assumed in its deliberations that any effects of conditioning or contamination bias were minimal, and could be ignored. This assumption was based on previous experiences in the 1990 census. Evidence presented to ESCAP II confirmed that contamination bias was not a problem in Census 2000, as research did not identify any evidence of its presence.[22]
Reinstated Late Additions
While ESCAP I did not identify Census 2000 late additions as a source of error, levels of these additions were significantly higher than in the 1990 census. Late additions refer to persons included in the final census count who were excluded from A.C.E. matching and dual system estimation because of their late inclusion. For Census 2000, the late additions consisted exclusively of housing units that were temporarily removed from the census because they were suspected to duplicate other housing units, but which were later (after the A.C.E. matching process started) reinstated into the final census after further research. ESCAP I determined that if the reinstated people were a small percentage of the correct enumerations in the census, or if their A.C.E. coverage rate was similar to the A.C.E. coverage rate for census people included in A.C.E., then there would be a minimal effect on the DSEs.[23] To validate this assumption, additional research was conducted.
Based on this additional work, ESCAP II concluded that the effect of excluding reinstated census people from the A.C.E. was minimal. The A.C.E. coverage rate may have been overestimated by 0.034 to 0.082 percentage points.[24] This result confirmed the assumption, previously made in the ESCAP I Report, that the effect of the reinstated people on the DSEs would be small.
Census 2000 Imputations
Census 2000 experienced a higher rate of whole person imputations than in the 1990 census. Whole person imputations were excluded from A.C.E. matching activities, but reflected in the census coverage error as measured by the A.C.E. ESCAP I was concerned that information was not available at the time to validate that the whole person imputations were explainable by Census 2000 design features (and thus should have no discernible impact on the A.C.E.). ESCAP II concluded that the kind, level, and pattern of whole person imputations in Census 2000 raised no additional issues relative to the accuracy of the A.C.E. adjustment.
Approximately 5.77 million persons had all their characteristics (short form data items) imputed in Census 2000, compared to 1.97 million persons in the 1990 census. Approximately 1.2 million of these persons were added to the census count through a count imputation process. The remaining 4.6 million persons were counted directly through the census enumeration process, but had all their person characteristics imputed because information about them was substantially missing from the census records.[25] Research into the sources of the whole person imputations identified that changes in the census design contributed to the level of housing units requiring imputation. Furthermore, the count imputation rate was comparable to the rate experienced in the 1970 and 1980 censuses.
Characteristics of the imputed persons were also examined. The age, race, and sex characteristics of the population requiring some form of imputation were similar to those of the data-defined population, with the exception of the age category under 18. The relatively higher percentage of the population under age 18 in the imputed population was due to the high proportion of younger people in the “within household” category and reflected the fact that large households (more than 6 persons) were likely to have children who could not be accommodated by the 6-person mail-return form and thus required imputation.[26]
Total Error Model and Loss Function Analysis
The total error model is designed to incorporate the results of the evaluations to produce a composite estimate of the bias and variability (both sampling and non-sampling) in the A.C.E. These measures are used to correct the A.C.E., thus producing measures of the “true” population that can be used to assess the accuracy of the adjusted and unadjusted census data. The total error model produces measures of this “true” population in the form of target populations which are based on various assumptions because the truth is not known.[27] The total error model used by ESCAP I relied in part on 1990 data, as complete Census 2000 evaluations of the A.C.E. were not then available. This preliminary model adapted the 1990 total error model to the Census 2000 environment. For the current deliberations, ESCAP II wanted to base recommendations on current data. Therefore, development of a new total error model was undertaken to incorporate the results of the Census 2000 evaluations. The complexities of the revised EFU study and the A.C.E. design did not allow for the development and validation of a new total error model. Therefore, the ESCAP has had to rely on the individual evaluations described above. It is also apparent that a significant amount of additional research and development will be necessary before a complete total error model is available. ESCAP II believes that the information currently available is strong enough to preclude the use of adjusted data for any further Census 2000 purposes, but that future research may lead to improved A.C.E. estimates that could, in turn, be used to improve the post-censal estimates.
Synthetic Estimation
The A.C.E. estimation methodology produces estimated coverage correction factors for each post-stratum. These factors were carried down within the post-strata in a process referred to as synthetic estimation. The key assumption underlying synthetic estimation is that net census coverage is relatively uniform within the post-strata. Failure of this assumption leads to synthetic error. Synthetic error affects both the adjusted and unadjusted census results. ESCAP I analyzed the effects of synthetic error by using artificial populations, which are populations created with surrogate variables to reflect the distribution of net coverage error. Additional synthetic estimation analysis for ESCAP II focused on expanding the scope of the earlier artificial population work.
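The process of carrying coverage correction factors down within post-strata can be sketched as follows. The post-stratum labels, factors, and block counts are hypothetical, and the function name is illustrative; this is a simplified picture of synthetic estimation, not the production procedure.

```python
# Hypothetical coverage correction factors, one per post-stratum.
correction_factors = {"PS-A": 1.02, "PS-B": 0.99}

# Hypothetical census counts for two blocks, broken out by post-stratum.
block_counts = {
    "block-1": {"PS-A": 100, "PS-B": 200},
    "block-2": {"PS-A": 50, "PS-B": 0},
}

def synthetic_estimate(counts_by_post_stratum, factors):
    """Apply each post-stratum's factor to that post-stratum's count in the block."""
    return sum(count * factors[ps] for ps, count in counts_by_post_stratum.items())

# Every block receives the same factor for a given post-stratum, which is
# exactly the uniform-coverage assumption described in the text.
adjusted = {
    block: synthetic_estimate(counts, correction_factors)
    for block, counts in block_counts.items()
}

for block, value in adjusted.items():
    print(block, round(value, 1))
```

If true coverage within a post-stratum differs from block to block, applying one factor everywhere produces the synthetic error discussed above.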
ESCAP II continues to be concerned with synthetic error because it is not included directly in the total error model. However, as the synthetic error analysis must be considered in conjunction with loss function analysis based on the total error model, there is no need to consider the effects of synthetic error at this point.
Conclusion
ESCAP II recommends that unadjusted Census 2000 data be used for non-redistricting purposes. The Committee was persuaded by new evidence indicating that the A.C.E. overstated the net undercount by at least 3 million individuals as a result of the survey's failure to measure a significant number of census erroneous enumerations. However, the Committee believes that, while Census 2000 successfully lowered the differential undercount, it did not eliminate it. Therefore, the Census Bureau will conduct further research and analyses to attempt to produce revised A.C.E. estimates that can be used to improve future post-censal estimates.
The ESCAP II recommendation, if accepted, means that Census 2000 long form results will be weighted with unadjusted population counts, and that post-censal population estimates and survey controls will also rely on unadjusted data. The Census Bureau will continue research on the issues discovered with the A.C.E., particularly the issue of census duplicates and their estimation or detection. It is quite possible that this research will develop methods to improve future population estimates by combining information from the census, A.C.E., and the A.C.E. evaluations, including the Person Duplication Studies. Post-censal estimates and survey controls are updated annually, offering the opportunity to incorporate improvements. Even if the research does not lead to improved post-censal estimates, it will still further our understanding of the nature of census duplications and other erroneous enumerations, and the problems with their estimation by the A.C.E. This knowledge will be vitally important to the planning of the 2010 census and to the improvement of future coverage surveys.
Both census taking and coverage measurement are processes that evolve and improve with each census. The Census 2000 experience will help refine both census and coverage measurement processes for future censuses.
Attachments
1. List of ESCAP II Reports
2. Analysis Plan for Further ESCAP Deliberations Regarding the Adjustment of Census 2000 Data for Future Uses
3. Field Operations to Answer the Concerns about Lack of Balance
Attachment 1.—ESCAP II Reports
Report No. | Title | Author/Presenter
1 | ESCAP II: Revised Demographic Analysis Results | J. Gregory Robinson
2 | ESCAP II: Evaluation of Lack of Balance and Geographic Errors Affecting Person Estimates | Tamara Adams, Xijian Liu
3 | ESCAP II: Evaluation Results for Changes in A.C.E. Enumeration Status | David A. Raglin, Elizabeth A. Krejsa
4 | ESCAP II: A.C.E. Erroneous Enumerations errors: Analysis of Census Discrepant Persons | Elizabeth A. Krejsa
5 | ESCAP II: E-Sample Erroneous Enumerations | Roxanne Feldpausch
6 | Census Person Duplication and the Corresponding A.C.E. Enumeration Status | Roxanne Feldpausch
7 | ESCAP II: Accuracy and Coverage Evaluation Matching Error | Susanne L. Bean
8 | Accuracy of the 2000 Census and A.C.E. Estimates Based on Updated Error Components—Total Error Model | Rita J. Petroni
9 | Evidence of Additional Erroneous Enumerations from the Person Duplication Study | Robert E. Fay
10 | ESCAP II: Estimation of Correlation Bias in 2000 A.C.E. Estimates Using Revised Demographic Analysis Results | William R. Bell
11 | ESCAP II: Analysis of Unresolved Codes in Person Matching | Xijian Jim Liu, John A. Jones, Roxanne Feldpausch
12 | ESCAP II: Analysis of Missing Data Alternatives for the Accuracy and Coverage Evaluation | Don Keathley, Anne Kearney, William R. Bell
13 | ESCAP II: Effect of Excluding Reinstated Census People from the A.C.E. Person Process | David A. Raglin
14 | Conditioning of Census 2000 Data Collected in Accuracy and Coverage Evaluation Block Clusters | Katie Bench
15 | ESCAP II: Analysis of Movers | Xijian J. Liu, Rosemary L. Byrne, Lynn M. Imel
16 | ESCAP II: Evaluation Results for Changes in Mover and Residents Status in the A.C.E. | David A. Raglin, Elizabeth A. Krejsa
17 | ESCAP II: Census 2000 Housing Unit Coverage Study | Diane F. Barrett, Michael Beaghen, Damon Smith, Joseph Burcham
18 | ESCAP II: P-sample Nonmatch Analysis | Glenn Wolfgang, Tamara Adams, Peter Davis, Xijian Liu, Phawn Stallone
19 | ESCAP II: Analysis of Non-Matches and Erroneous Enumerations Using Logistic Regression | Michael Beaghen, Roxanne Feldpausch, Rosemary Byrne
20 | ESCAP II: Person Duplication in Census 2000 | Thomas Mule
21 | ESCAP II: Analysis of Census Imputations | Fay F. Nash
22 | ESCAP II: Characteristics of Census Imputations | Signe I. Wetrogan, Arthur R. Cresce
23 | ESCAP II: Sensitivity Analysis for the Assessment of the A.C.E. Synthetic Assumption | Richard Griffin, Donald Malee
24 | ESCAP II: Results of the Person Followup and Evaluation Followup Forms Review | Elizabeth A. Krejsa, Tamara Adams
July 26, 2001.
Attachment 2—Analysis Plan for Further ESCAP Deliberations Regarding the Adjustment of Census 2000 Data for Future Uses
Background
On March 1, 2001, the Census Bureau issued the Executive Steering Committee for A.C.E. Policy (ESCAP) recommendation that the Census 2000 Redistricting Data not be adjusted based on the Accuracy and Coverage Evaluation (A.C.E.) program data. The ESCAP was unable to conclude, based on information available at the time, that the adjusted Census 2000 data were more accurate for redistricting.
By mid-October, the Census Bureau will recommend whether Census 2000 data should be adjusted for future uses, such as the census long form data products, post-censal population estimates and Census Bureau demographic survey controls. In order to inform this decision, further research will be conducted generating data for ESCAP's review. The analyses will focus on resolving the concerns that ESCAP identified during its deliberations for the redistricting adjustment decision. This document describes the research agenda and is organized by the topic areas of concern.
The broad, overarching concern was that the Demographic Analysis and the A.C.E. estimates of the population were inconsistent. Even though alternative demographic estimates were produced by varying the assumptions underlying the Demographic Analysis, the highest reasonable estimate indicated that Census 2000 undercounted the population by 0.32 percent, while the A.C.E. produced a net undercount estimate of 1.15 percent.[28] In previous censuses since 1960, the Demographic Analysis estimates were used to evaluate decennial census coverage. The estimate derived through the 1990 coverage measurement survey was reasonably consistent with the 1990 Demographic Analysis estimate of the total population. The ESCAP was concerned when the corresponding estimates for Census 2000 were found to differ substantially. Four scenarios were identified that could explain this result:
- The 1990 census coverage measurement survey (Post Enumeration Survey), 1990 Demographic Analysis estimates, and the 1990 census may have understated the Nation's population, while Census 2000 included portions of this previously unidentified population.
- Demographic Analysis estimates might not have captured the full growth between 1990 and 2000, specifically due to static assumptions about critical components of international migration such as unauthorized migration, temporary migration, and emigration.
- Census 2000, as adjusted by the A.C.E., might overestimate the Nation's population. This situation raises the possibility of an undiscovered problem with the A.C.E. or Census 2000 methodology.
- A combination of these explanations.
To address these possibilities, further research is required into the quality of the three independent measures of the population—the Demographic Analysis estimate, the A.C.E. estimate and the census count itself. Specifically, research will address whether the Demographic Analysis estimate was too low and/or whether the adjusted estimate was too high. The latter situation could have occurred if either the A.C.E. did not measure the coverage error accurately or the census count had coverage error reflected by components not measured by the A.C.E.
In addition, the ESCAP was concerned about two other issues related to the A.C.E. estimates—balancing error and synthetic error. Balancing error occurs in the A.C.E. when cases are handled differently in the two independent samples (the P- and E-samples) when identifying gross omissions and erroneous enumerations. This is explained more fully under section B.1.a below. Synthetic error reflects the extent that net census coverage within a post-stratum is not relatively uniform. Uniformity of coverage is the underlying assumption of the synthetic estimation process of carrying coverage correction factors down to the block level. The concerns regarding synthetic error are described more fully in section D below.
The analysis agenda is organized around four basic areas of research: 1) recalculation of Demographic Analysis estimates using new migration assumptions as well as new birth and death data, 2) A.C.E. issues, including balancing error, 3) Census 2000 issues and 4) synthetic error.
A. Demographic Analysis (DA) Research
This area of research addresses the discrepancy between the demographic analysis data and the A.C.E. adjusted estimates of population. Specifically, this area of research will reexamine the historic levels of the components of population change to address the scenarios dealing with the possibility that the 1990 Demographic Analysis estimates understated the Nation's population and that demographic analysis did not capture the full growth between 1990 and 2000. Consultation with demographic experts inside and outside the Census Bureau has led to a research program consisting of a variety of research projects focused on the methodologies and underlying estimates of the components of population change. The research activities are concentrated in two areas:
1. International Migration
Assumptions regarding international migration are the most uncertain component of the demographic analysis estimates. The international migration component represents a combination of several components. Some of these components, e.g., legal immigration, are measured through continuous administrative data. For other components, e.g., temporary migration, emigration, and unauthorized migration, we do not have administrative data to provide continuous and current measurements. In the past, we have relied upon the most recent decennial data to develop a once-a-decade measure of these components. Thus, we would have relied upon the measurement from the 1990 census to develop an estimate for the 1990 to 2000 decade.
This work will involve examining preliminary data from the Census 2000 long form and the Census 2000 Supplementary Survey (C2SS) to provide information to update the measurement of the international migration components. Although the research will focus primarily on those components less well measured, e.g., emigration, temporary migration, and unauthorized immigration, the work will also include research into all of the current assumptions relating to the components of international migration. The first goal is to validate, for the 1990 to 2000 period, the calculation of the components of international migration used in previous estimates. Then, using the preliminary data from the Census 2000 long form and possibly the C2SS, we will develop some updated measures of the components of international migration. The second goal is to assess whether the documented calculation of the 1990 to 2000 migration components affects the DA estimate for 2000 and thus accounts for some of the discrepancy with the A.C.E. results. Research to be conducted includes the following:
- We will examine the assumptions about international migration flows, specifically for unauthorized migration, legal immigration, emigration, temporary migration, and migration from Puerto Rico. Utilizing preliminary long form data from Census 2000 and other information sources (including C2SS), we can prepare the first set of documentation for our current international migration assumptions and we can assess the accuracy of assuming a continuation of the estimates developed from the 1990 Census data. Specifically, we will estimate migration using available long-form data on place of birth, citizenship, and year of entry and compare this estimate to the estimates previously used that were developed from the 1990 Census long form data. Thus we will evaluate differences in size and characteristics of previously implied flows based on current data sets. If appropriate, we will recalculate the demographic analysis estimates for 2000 employing any revised levels of international migration.
- We will assess the quality of the foreign-born and Hispanic population data (important because these data are major inputs to the setting of assumptions noted above). We will review edit and allocation procedures for foreign-born and Hispanic populations in the 1990 and 2000 censuses and attempt to quantify the effect (or at least address the direction of the effect) of any differences. We also will review the impact of any change in the edits and allocation procedures on the size and characteristics of these population groups.
2. Robustness of Demographic Analysis
In addition to the research aimed at examining the components of international migration used in the demographic analysis estimates, we will examine the remaining assumptions underlying the Demographic Analysis components of change. These components include the birth, death, and Medicare components. This work will entail the following:
- We will examine the consistency of the components by cohort and age/sex groups across time (1935 to 2000), including the historical international migration components. We will construct DA undercount rates for the 1940 to 2000 decennial censuses and examine them for consistency. We will examine the consistency of sex ratios across cohorts and age/sex groups. Inconsistent or anomalous results will be noted, and possible reasons identified.
- We will review the assumptions about the completeness of vital statistics registration. Specifically, we will review the historic levels of births and deaths used to develop existing DA estimates and the assumptions about the underregistration of births and registration of infant deaths. We will evaluate both the procedures for adjusting births for underregistration and the level of historical deaths (both total and by age). If appropriate, we will redevelop the historical annual levels of births and deaths to 1990 and 2000.
- We will examine the assumptions about the variation and coverage of Medicare data. This work will include documenting the differences in the sources of Medicare data used in the 1990 and 2000 DA estimates, evaluating the adjustment rates used for underenrollment in the 1990 and 2000 DA estimates, and reconciling the differences in the Medicare files for 1990 and 2000.
- If appropriate, we will recalculate the demographic analysis estimates for 1990, compare them to the original 1990 Demographic Analysis estimates, and assess their impact on the DA estimates for 2000.
- We will analyze the consistency of DA estimates of the population, by race, ethnicity, and nativity, with Census 2000 and A.C.E. This work will entail (1) developing DA benchmarks of the population, by selected race, ethnicity, and nativity groups, (2) obtaining census tabulations of the native and foreign-born populations from preliminary Census 2000 and the 1990 Census long forms, and (3) comparing these tabulations to the DA benchmarks to derive coverage estimates by selected age, sex, and race groups.
B. A.C.E. Issues and Planned Research
1. Major Areas of Research
a. Balancing Error
The A.C.E. was conducted using a defined area of search: the sample blocks, plus the surrounding blocks for clusters selected for targeted extended search. There were concerns because this was a change from the 1990 procedure of expanding the search area to surrounding blocks for all sample blocks. We found 3 million more matches in surrounding blocks than correct enumerations after expanding the search area. This difference must be explained in terms of its impact on subsequent estimates of total population. There are two scenarios:
- The unit is located in the surrounding block. This has no effect on estimates of coverage, but would explain the three million difference.
- The unit is outside the search area and the corresponding people should have been coded erroneous enumerations. This would result in an overestimate of the net undercount.
This may have been compounded by the targeting used in the A.C.E. to match in an area of search around the sample blocks, i.e., the search area. This targeting to make searching effective may have introduced limitations and/or biases into our measurement of coverage. There were three specific concerns in our review of the 2000 A.C.E.
- There were a number of census people that might have been coded as correctly enumerated although the housing unit was not actually located in the sample block. If we didn't estimate the correct number of erroneously enumerated cases, the result would be an overestimate of the net undercount.
- The P-sample may have incorrectly included some housing units in a neighboring block; then, in the extended search, the people would have been recorded as matching to the census in the surrounding blocks. Hence, these cases would appear to be balancing error when, in fact, the extended search was compensating for the original listing error. If the P-sample had more geocoding error than expected, the Targeted Extended Search (TES) would have compensated for the error, with little or no impact on final coverage estimates. This would help explain some of the apparent lack of balance of 3 million.[29]
- Problems in identifying census geocoding errors may have affected the sampling used to select people for extended search outside the sample blocks. That is, the TES sample could have excluded cases it should have included and thus, not matched or followed up on them correctly. The effect of their exclusion would be an overestimate of the net undercount.
It is likely that all of these errors occurred to some extent. What is not yet known is the scale of the error and whether the magnitude of the error was such as to significantly affect the relative accuracy of the A.C.E. adjusted numbers. The additional geographic field work is described in more detail in the attachment.
b. Erroneous Enumerations
Subsequent to the March 1st decision, a new area of concern was identified. In comparing the A.C.E. measures to the comparable measures from the 1990 Census, the Census 2000 erroneous enumerations were found to differ substantially from the 1990 measures. These differences raise concerns that the level of erroneous enumerations may be understated for Census 2000. The differences must be explained because an understatement of erroneous enumerations results in an overstatement of net undercount. Research described below will quantify the accuracy of the A.C.E. measures of erroneous enumeration.
- The Analysis of Measurement Error Study will determine how well the A.C.E. identified erroneous enumerations and correct enumerations. This study is based on a reinterview of a sample of E-sample records. This is described more fully in section B.1.c below.
- Another evaluation based on results from the “E-sample Erroneous Enumeration Study” will analyze the erroneous enumerations for various characteristics. This evaluation will compare the rates of the different types of erroneous enumerations for Census 2000 with corresponding 1990 rates. This evaluation will also recategorize people with unresolved status into the appropriate erroneous enumeration categories by using data from the followup forms. The goal of this work is to identify explanations for differences between 1990 and 2000 coding of erroneous enumerations.
- The duplication study discussed in Section C1 will also provide information regarding the differences between 1990 and 2000. This study will validate whether the A.C.E. process is correctly coding Census 2000 duplicate enumerations as erroneous.
c. Total Error Model and Loss Functions
Loss function analyses, reviewed by the ESCAP during its deliberations on whether to adjust the census redistricting data, were based on a total error model that corrected the A.C.E. for biases, thus producing measures of the “true” population that could be used as standards for comparing the adjusted and unadjusted census results. The 1990 total error model was adapted to the extent possible to “fit” the 1990 coverage measurement survey error components into the 2000 survey design. This model was updated with available Census 2000 data, but retained several error component measures obtained from the 1990 coverage measurement survey and 1990 evaluations, because the 2000 A.C.E. evaluation data were not yet available. Thus, the error model assumed that the actual A.C.E. error rates for these components were similar to those reflected by the 1990 coverage measurement survey results. This was viewed as conservative because it was expected that the A.C.E. was of higher quality than the 1990 coverage measurement survey. Work is underway to validate that the assumption above is correct.
We are conducting studies to revise the 1990 total error model to reflect actual A.C.E. error components, as measured by 2000 evaluations. Because of methodological changes between 1990 and 2000, there are issues that influence the comparability of this updated analysis to the March 2001 analysis. The analysis will include a discussion of the comparability.
The A.C.E. error components that were previously based on 1990 data, and that will now be measured and input into the revised total error model, are:
- P-sample matching error
- P-sample data collection error
- P-sample discrepancy error
- E-sample processing and data collection errors
Synthetic error is not included in the total error model—this component of error is discussed later. A.C.E. error rates for these total error model components will be obtained from the following evaluation studies.
- The Matching Error Study will provide the A.C.E. P-sample matching error rate and E-sample processing error rates. The methodology consists of the clerical rematching of all of the people in a one-fifth subsample of the A.C.E. clusters by expert matchers to determine the best match code possible. We will compare that match and residence information to the production codes.
- The Analysis of Measurement Error Study uses the results of the Evaluation Followup Interview to provide the error components for E-sample and P-sample data collection error relating to person coverage, and P-sample discrepancy error. The methodology consists of revisiting some of the households in a one-fifth subsample of the A.C.E. clusters and using that information to rematch the Census and A.C.E. people in those households. The results of this study will determine the accuracy of the data going into the person matching process, such as the results from Census and A.C.E. questionnaires. This can involve reclassification of correct and erroneous enumerations. We will determine the accuracy of the residence status of A.C.E. people and how well the A.C.E. process identified Census erroneous enumerations (EEs) and correct enumerations (CEs).
Once the total error model is updated with current data, new loss function analyses will be conducted. The loss function analyses will be expanded to analyze the accuracy of governmental units, in addition to states and counties. No loss function analyses will be run for congressional districts.
d. Correlation Bias
Correlation bias in Dual System Estimates (DSEs) results from a failure of the general independence assumption underlying DSEs due either to causal dependence or heterogeneity. Causal dependence occurs when the act of being included in the census makes someone more likely or less likely to be included in the A.C.E. Heterogeneity occurs when the census and A.C.E. inclusion probabilities vary over persons within post-strata. When heterogeneity within post-strata exists it is generally suspected to be of the form where persons more likely to be missed in the Census are also more likely to be missed in the coverage survey (A.C.E.). This will lead to underestimation of true population by the DSEs. The direction of the effect of causal dependence, if it exists, is less certain.
Correlation bias in the A.C.E. estimates, whether due to heterogeneity or causal dependence, was assessed by comparing A.C.E. and DA results. Correlation bias estimates available for the March 1, 2001 ESCAP recommendation used DA estimates as of February 26, 2001. If further DA research results in revisions to the DA estimates, then the correlation bias estimates will be recomputed. The revised correlation bias estimates will then be used as inputs for revisions of the total error model and loss function analyses.
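The heterogeneity effect described above can be illustrated with a small numeric sketch. It uses a simplified dual system estimate (census count times A.C.E. count divided by matches, ignoring erroneous enumerations) and a hypothetical post-stratum made up of two groups with different inclusion probabilities. All names and numbers are illustrative assumptions, not A.C.E. data.

```python
def simple_dse(census_count, ace_count, matches):
    """Simplified dual system estimate (assumes no erroneous enumerations)."""
    return census_count * ace_count / matches


def expected_counts(groups):
    """Expected census, A.C.E., and matched counts, assuming inclusion in the
    two systems is independent within each group (no causal dependence)."""
    census = sum(g["n"] * g["p_census"] for g in groups)
    ace = sum(g["n"] * g["p_ace"] for g in groups)
    matches = sum(g["n"] * g["p_census"] * g["p_ace"] for g in groups)
    return census, ace, matches


# Hypothetical post-stratum: 1,000 people in two equal groups, where the group
# that is harder to count in the census is also harder to count in the A.C.E.
groups = [
    {"n": 500, "p_census": 0.9, "p_ace": 0.9},
    {"n": 500, "p_census": 0.6, "p_ace": 0.6},
]

true_population = sum(g["n"] for g in groups)
pooled_dse = simple_dse(*expected_counts(groups))
per_group_dse = [simple_dse(*expected_counts([g])) for g in groups]

print(f"true population:       {true_population}")
print(f"pooled DSE:            {pooled_dse:.1f}")
print(f"sum of per-group DSEs: {sum(per_group_dse):.1f}")
```

Under these assumptions the pooled estimate falls short of the true population, while estimating each homogeneous group separately recovers it. This is the direction of bias the text attributes to heterogeneity within post-strata.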
2. Auxiliary Areas of Research
This section describes other areas that did not preclude ESCAP from recommending that Census 2000 data should be adjusted for redistricting purposes, but for which ESCAP would have preferred additional data. Further research in these areas will be conducted in order to confirm the ESCAP's conclusions.
a. Missing Data
Missing data occurs in the A.C.E. if after all followup attempts there remain households that were not interviewed or households with some portions of the person data missing such as age or race. Sometimes the missing item involves the status of whether a person matched, was a resident on Census day or was correctly enumerated.
For a small number of people in the P-Sample, there was not enough information available to determine the match status (whether or not the person matched to someone in the census in the appropriate search area) or the resident status (whether or not the person was living in the block cluster on Census Day). Determining residence status was important for the P-Sample because Census Day residents of the block clusters in the sample were used to estimate the proportion of the population who were not counted in the census. Similarly, some people in the E-Sample lacked information to determine whether the person was correctly enumerated. Generally, for cases with missing status, a probability of residence, match, or correct enumeration was assigned based on information available about the specific case and about cases with similar characteristics.
The rates of occurrence of unresolved A.C.E. cases for match status, correct enumeration status, and mover status were viewed as low enough to preclude serious biases in the A.C.E. results. We are now analyzing the missing data model to determine whether its assumptions are correct.
We will develop and apply alternative models for the treatment of missing data. These alternative models will be carried through the A.C.E. estimation process so that the effect on DSEs can be assessed.
b. Late Census 2000 Additions
The levels of late Census 2000 additions were significantly higher than in the 1990 census. Late additions are those persons included in the final census counts who, due to their late inclusion, were excluded from the A.C.E. matching and dual system estimation processes. For Census 2000, the late additions consisted exclusively of housing units that were temporarily removed from the census because they were suspected to duplicate other housing units, but which were later (after the A.C.E. matching process started) reinstated into the final census after further research was conducted. This differs from the 1990 Census, in which the late additions were persons who were enumerated too late in the census cycle to be included in the matching and dual system estimation processes and were not factored into the coverage ratios. The A.C.E. design treated the late census data appropriately in measuring the census undercount. Two areas of concern require further investigation: whether calculating DSEs without these additions resulted in a bias in the estimates, and whether these additions affected the assumptions underlying the synthetic estimation model.
There is no expectation of a bias in the dual system estimate caused by excluding late additions. The dual system estimate can be expressed as a product of the (1) number of A.C.E. people and (2) the ratio of census complete and correct enumerations to the number of people in both systems. Consequently, any effect must come from one of these two sources. Excluding the late additions does not impact the estimate of the number of A.C.E. people, which come solely from the A.C.E. enumerated sample. Excluding the late additions also will not affect the dual system estimate of the true population if the number of matches is reduced proportionately to the number of census correct enumerations. Given the traditional dual system independence assumption, one would expect this result. Consequently, there is no expectation of a bias in the dual system estimate caused by excluding late additions. Data were not available at the time to validate this assumption.
We will now attempt to validate this assumption by performing a rematch of the P- and E-samples, with the late additions included in the E-sample, to attempt to measure the impact on the rates for correct enumerations and duplicates. This rematch will be conducted in a one-fifth subsample of A.C.E. clusters. This study has limitations because only computer and clerical matching can now be performed; that is, no field work will be conducted. Consequently, a high rate of unresolved cases is expected.
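The argument that excluding late additions leaves the dual system estimate unchanged can be checked with a small sketch. It uses the product form given above (number of A.C.E. people times the ratio of correct enumerations to matches). The function name and all numbers are hypothetical and chosen only to show the arithmetic.

```python
def dse(ace_people, correct_enumerations, matches):
    """DSE as the number of A.C.E. people times the ratio of census correct
    enumerations to matches (people in both systems)."""
    return ace_people * correct_enumerations / matches


# Hypothetical post-stratum totals with late additions included.
ace_people = 10_000
ce_with_late = 9_000
matches_with_late = 8_100

# Case 1: dropping late additions removes the same share (3%) of both
# correct enumerations and matches, as the independence assumption implies.
share = 0.03
dse_with = dse(ace_people, ce_with_late, matches_with_late)
dse_proportional = dse(
    ace_people, ce_with_late * (1 - share), matches_with_late * (1 - share)
)

# Case 2: late additions match at a different rate, so matches shrink by
# only 1% while correct enumerations shrink by 3%.
dse_disproportional = dse(
    ace_people, ce_with_late * (1 - share), matches_with_late * (1 - 0.01)
)

print(f"with late additions:        {dse_with:.1f}")
print(f"proportional exclusion:     {dse_proportional:.1f}")
print(f"disproportional exclusion:  {dse_disproportional:.1f}")
```

Only the second case moves the estimate, which is why the planned rematch focuses on whether correct enumeration and match rates differ for the late additions.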
The concerns regarding synthetic error are addressed in Section D. “Synthetic Error”.
c. Conditioning
Conditioning error occurs under two scenarios:
1. Census data collection affects the A.C.E. This will be measured in the correlation bias.
2. A.C.E. data collection affects the census. This will be examined in the evaluation described below.
The effect of potential conditioning of Census 2000 respondents by the A.C.E. operations was assumed to be minimal, similar to the 1990 findings. Research is necessary to confirm this assumption.
An evaluation will examine whether census and A.C.E. operations were kept operationally independent. The analysis will be based on comparing Census 2000 results in A.C.E. and non-A.C.E. blocks.
- Mover Status Analysis
The match rate portion of the DSE formula (M/P) uses persons with all types of mover status (nonmovers, outmovers, and inmovers), differentiating between the different types of mover status. Therefore, misclassification of mover status could cause the DSEs to be overstated, understated, or both, depending on the post-strata.
The Measurement Error Reinterview Analysis will measure the extent of mover misclassification by using the results from the Evaluation Followup Interview.
- Housing Unit Coverage
The coverage of housing units will be available in the late summer of 2001. These data will be examined in relation to person coverage estimates for 2000. These data from 2000 will be compared to the 1990 estimates of person and housing unit coverage.
In addition, another study will assess the impact of housing unit coverage on person coverage. This study looks at the P-sample to analyze the effect of housing unit nonmatches on the person nonmatches. The E-sample is also examined to help understand the relationship of housing unit status to person status. The correctly enumerated people in erroneously enumerated housing units are of particular interest.
- P-sample Nonmatch Analysis
The P-sample nonmatches are examined for variables such as race domain and age/sex group to see if the nonmatches are different for various types of people. This aids in the understanding of the components of A.C.E. and also helps explain the differences between A.C.E. and DA. In addition, the nonmatches from 2000 are compared to the nonmatches from 1990. In conjunction with the analysis of the E-sample, it helps explain the differences between 1990 and 2000.
C. Census 2000 Issues and Planned Research
Research will be conducted into two components of the census: duplication issues and imputation of persons. A high level of duplication not measured by the A.C.E. design could cause the adjusted census estimate to be too high. The effect of imputed person records is also not measured by the A.C.E. The number of person records that were imputed in Census 2000 was significantly higher than in the 1990 census. The assumption is that the imputed persons are no different from the persons included in the A.C.E. process and that match rates are therefore not affected.
1. Duplication Not Measured in A.C.E.
The A.C.E. methodology by design did not measure duplication between components of the population living in group quarters and in housing units because group quarters were outside the A.C.E. universe. The A.C.E. also did not measure duplication within the group quarters population. Significant duplication of these types could explain some of the differences between demographic analysis and the adjusted Census 2000 data.
The A.C.E. E-sample will be computer matched to the entire census to determine the extent of duplicate enumerations that were not in scope for the A.C.E. This analysis will potentially explain some of the differences between demographic analysis and the A.C.E.
We also plan an extended computer search within the A.C.E. E-sample for duplicate census enumerations among housing units and also between housing units and group quarters persons (which were out of scope for the A.C.E.). This will help to explain differences between the A.C.E. and the 1990 coverage measurement survey.
2. Census Person Imputations
Compared with the 1990 census, Census 2000 imputed a higher number of cases that came through the process with little or no information as to occupancy status, or with an occupied status but no definitive population count. In addition, Census 2000 imputed more whole person records in cases with known household sizes, but with all the person data missing for some or all of the household members. Although the A.C.E. handled imputed persons appropriately in the estimation process, there was concern about not having information as to what census design processes contributed to the number of imputed persons when compared to the 1990 census.
Given the potential impact that this level of imputations may have on Census 2000 data, it is essential to understand the demographic characteristics of the imputed people and how this may help explain the difference between the census and demographic analysis, as well as how the imputations affect differences between the E-sample in 1990 and the E-sample in 2000.
There were concerns expressed regarding the effect of whole household imputations on the heterogeneity assumption but these concerns are studied under the synthetic error analysis in Section D.
D. Synthetic Error
The synthetic assumption states that census net coverage does not vary within post-strata. For example, the synthetic assumption implies that census counts in Florida in a particular Hispanic post-stratum have the same net coverage as the census counts in the same Hispanic post-stratum but in New York. The synthetic assumption within post-strata will permit the Census Bureau to draw conclusions from the A.C.E. sample about the population as a whole and then apply them to individuals living in geographic areas smaller than post-strata. The synthetic assumption is necessary to permit correction for small geographic areas based on a sample. This adjustment is only correcting for systematic biases and not local census errors. The error that is introduced when the synthetic assumption does not hold is called synthetic error.
Synthetic estimation methodology is directed at correcting for a systematic under- or overcount in the census. The synthetic estimates will not result in the correction of random counting errors that occur for any entity (blocks, tracts, counties, etc.). Therefore, the synthetic estimate will not result in extreme changes in small geographic entities, nor will it correct for extreme errors. It is designed to remove the effects of systematic errors so that when small entities are aggregated, systematic and differential coverage errors are corrected.
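A minimal sketch of how a synthetic estimate could be formed under the synthetic assumption: each post-stratum gets a single coverage correction factor, and that factor is applied to the census count for that post-stratum in every small area. The post-stratum labels, factors, and block counts below are hypothetical, and the function is an illustration rather than the Census Bureau's production procedure.

```python
# Hypothetical coverage correction factors, one per post-stratum. A factor
# above 1 corrects a net undercount; a factor below 1 corrects a net overcount.
coverage_factors = {"PS-A": 1.02, "PS-B": 0.99}

# Hypothetical census counts by post-stratum for two small areas.
block_counts = {
    "block-1": {"PS-A": 100, "PS-B": 200},
    "block-2": {"PS-A": 50, "PS-B": 10},
}


def synthetic_estimate(counts_by_post_stratum, factors):
    """Apply each post-stratum's factor to the area's count in that
    post-stratum and sum. The same factor is used in every area, which is
    exactly what the synthetic assumption permits."""
    return sum(
        count * factors[post_stratum]
        for post_stratum, count in counts_by_post_stratum.items()
    )


estimates = {
    block: synthetic_estimate(counts, coverage_factors)
    for block, counts in block_counts.items()
}

for block, estimate in estimates.items():
    census_total = sum(block_counts[block].values())
    print(f"{block}: census {census_total}, synthetic estimate {estimate:.1f}")
```

Because the factors are close to 1 and shared across areas, the adjustment to any one block is small and cannot correct an extreme local counting error. That is consistent with the limits on synthetic estimation described above.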
In the assessment of accuracy, the Census Bureau is concerned with synthetic error since it is not included directly in the total error model. The analysis of the effects of synthetic error was based on the construction of “artificial populations.” These are populations that are created with surrogate variables that are known for the entire population, and are developed to reflect the distribution of net coverage error. This analysis of synthetic error and its effect on the loss functions was limited.
Our additional analysis will expand the scope of the earlier artificial population work and add an approach using direct estimates of coverage at lower geographic levels.
1. Using Artificial Populations
We will do a sensitivity analysis on the results from B-14. B-14 gave results for weighted and unweighted loss functions using one of two methods for distributing targets to post-strata and one of 8 models for correlation bias and percent of 1990 processing bias. This work will concentrate on the weighted loss functions and analyze the sensitivity of the B-14 results over both the methods for distributing targets to post-strata and all 8 models. Once again this analysis will be conducted for states and congressional districts.
2. Using Direct Estimates
We will calculate direct DSEs for census divisions and for states having sufficient sample size to produce direct estimates with reasonably low variance. Assuming the resulting direct DSE population estimates are unbiased, the mean square error of the production synthetic estimate of total population will be estimated.
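The text does not give the estimator to be used. One common way to carry out this kind of comparison is sketched below, under the added assumptions that the direct DSE is unbiased and that its sampling error is uncorrelated with the synthetic estimate. In that case the expected squared difference between the two estimates equals the mean square error of the synthetic estimate plus the variance of the direct estimate, so subtracting the variance gives an estimate of the mean square error. The function name and numbers are hypothetical.

```python
def estimated_mse_of_synthetic(synthetic, direct, direct_variance):
    """Estimate the MSE of a synthetic estimate as
    (synthetic - direct)^2 - Var(direct).

    Assumes the direct estimate is unbiased and uncorrelated with the
    synthetic estimate; the result can be negative in a single area."""
    return (synthetic - direct) ** 2 - direct_variance


# Hypothetical state: synthetic and direct DSE totals, and the sampling
# variance of the direct DSE (standard error of 10,000 persons).
synthetic_total = 1_020_000
direct_total = 1_000_000
direct_variance = 10_000 ** 2

mse_hat = estimated_mse_of_synthetic(synthetic_total, direct_total, direct_variance)
print(f"estimated MSE of synthetic estimate: {mse_hat:,}")
```

In practice both estimates come from the same A.C.E. sample, so the zero-correlation assumption is a simplification. It also explains the restriction to areas with sufficient sample size: a large direct variance would swamp the squared difference.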
E. Schedule
Some of the A.C.E. evaluation work being undertaken involves field work and/or additional computer or clerical matching work. The Evaluation Followup Interview was conducted in the field during the winter of 2001. The Matching Error Study matching work was completed in the spring. Results from these studies are being processed, with initial data being available for review in early summer. Field and clerical work for the TES2 and TES3 studies (described in the attachment) began in the winter and will continue into July. Results from these studies will not be available for ESCAP review until later in the summer. Matching for the late census adds evaluation is scheduled for late July, with data available for review in August. Other research is being conducted on a flow basis as data become available and analyses are conducted.
The ESCAP began holding weekly (or more frequent) meetings to review analyses of data related to the topics of concern beginning on June 18. It is expected that all of the research and analyses described will be completed by the end of September. The ESCAP will then discuss how the results impact their concerns and will make a recommendation by mid-October as to whether adjusted or non-adjusted census data should be used for subsequent purposes.
During the September through October time frame, analysts will document the results of their research in evaluation reports, finalizing them in time for release to the public concurrently with the ESCAP recommendation.
Attachment 3—Field Operations to Answer the Concerns About Lack of Balance
To answer these concerns and to explain the lack of balance that is present, or that may be introduced, due to Targeted Extended Search (TES), we will be examining the results of Targeted Extended Search 2 (TES2) and Targeted Extended Search 3 (TES3). TES2 followed up E-sample housing units that were coded as erroneous enumerations in the initial housing unit phase to determine if the unit was inside or outside the block cluster and surrounding rings. TES3 will follow up other types of units, both P-sample and E-sample, that may contribute to a lack of balance.
In TES2 we are evaluating the housing units coded during the housing unit matching as not existing as housing units within the cluster. The block containing the housing unit selected for additional geographic work and the surrounding blocks were identified on a map. The field representative identified the block where the housing unit existed and the housing unit was classified as:
- Existing in the surrounding blocks
- Existing outside the surrounding blocks
- Existing within the block cluster
- Not a housing unit
- Unresolved
So, a housing unit may be coded as in surrounding blocks or outside the search area when it was part of the block cluster.
In TES3 we are also sending to the field a sample of census housing units classified as correctly enumerated in the block cluster. If a housing unit was classified as correctly enumerated in the block cluster in error, the housing unit was not eligible for targeted extended search in person matching. This could explain more of the lack of balance identified in the person matching.
In addition, we are sending additional types of P-sample cases for more geographic field work and a sample of matches in the sample block as a control. These types of cases are:
- P-sample people matched in surrounding blocks
- Not matched P-sample housing units
- P-sample people matched in the sample block cluster
Footnotes
1. For clarity, the Committee that produced the March 1, 2001, ESCAP Report is sometimes referred to herein as “ESCAP I” and the March 1 report as the “ESCAP I Report.” The Committee that has been meeting since March 1, 2001, is referred to as “ESCAP II.”
2. The ESCAP II Report Series does not represent the entirety of the Census Bureau's evaluation of Census 2000. The Census Bureau's formal Census 2000 Evaluation Program provides a comprehensive evaluation of all Census operations and programs. The reports in the ESCAP II series are only those necessary to inform the ESCAP's recommendation.
3. These models can be found at http://factfinder.census.gov.
5. The March Current Population Survey was reweighted using the Census 2000 counts by age, race, sex, and Hispanic origin for this comparison.
6. This figure differs from the 1.18 percent usually quoted for the A.C.E. because the A.C.E. and DA estimate different populations. DA estimates the total population, while the A.C.E. estimates the household population, which excludes group quarters.
7. ESCAP II Report No. 1, “Demographic Analysis Results.”
8. ESCAP II Report No. 1, “Demographic Analysis Results.”
9. ESCAP II Report No. 7, “Accuracy and Coverage Evaluation Matching Error.”
10. ESCAP II Report No. 3, “Evaluation Results for Changes in A.C.E. Enumeration Status,” ESCAP II Report No. 4, “A.C.E. Erroneous Enumeration Errors: Analysis of Census Discrepant Persons,” ESCAP II Report No. 16, “Evaluation Results for Changes in Mover and Residence Status in the A.C.E.,” and ESCAP II Report No. 24, “Results of the Person Followup and Evaluation Follow-up Forms Review.”
11. ESCAP II Report No. 6, “Census Person Duplication and the Corresponding A.C.E. Enumeration Status,” ESCAP II Report No. 9, “Evidence of Additional Erroneous Enumerations from the Person Duplication Study,” and ESCAP II Report No. 20, “Person Duplication in Census 2000.”
12. ESCAP II Report No. 3, “Evaluation Results for Changes in A.C.E. Enumeration Status.”
13. ESCAP II Report No. 24, “Results of the Person Followup and Evaluation Followup Forms Review.”
14. ESCAP II Report No. 24, “Results of the Person Followup and Evaluation Followup Forms Review.”
15. Discrepant results include falsification (the amount is uncertain), but do not include honest mistakes made by the interviewers or respondents. A person is classified as discrepant during the matching operation if three knowledgeable respondents indicate not knowing him or her in either the EFU or production interview.
16. ESCAP II Report No. 4, “A.C.E. Erroneous Enumerations Errors: Analysis of Census Discrepant Persons.”
17. ESCAP II Report No. 7, “Accuracy and Coverage Evaluation Matching Error.”
18. ESCAP II Report No. 16, “Evaluation Results for changes in Mover and Residence Status in the A.C.E.”
19. ESCAP II Report No. 10, “Estimation of Correlation Bias in 2000 A.C.E. Estimates Using Revised Demographic Analysis Results.”
20. ESCAP II Report No. 12, “Analysis of Missing Data Alternatives.”
21. ESCAP II Report No. 2, “Evaluation of Lack of Balance and Geographic Errors Affecting Person Estimates.”
22. ESCAP II Report No. 14, “Conditioning of Census 2000 Data Collected in Accuracy and Coverage Evaluation Block Clusters.”
23. Howard Hogan (March 2001). “Accuracy and Coverage Evaluation Survey: Effect of Excluding ‘Late Census Adds,’ ” DSSD Census 2000 Procedures and Operations Memorandum Series No. Q-43.
24. ESCAP II Report No. 21, “Analysis of Census Imputations.”
25. ESCAP II Report No. 21, “Analysis of Census Imputations.”
26. ESCAP II Report No. 22, “Characteristics of Census Imputations.”
27. Mulry, Mary H. and Spencer, Bruce D. (March 2001), ESCAP II Report No. B-19*, “Overview of Total Error Modeling and Loss Function Analysis,” DSSD Census 2000 Procedures and Memorandum Series No. B-19*.
28. The 1.15 percent and 0.32 percent of the undercount rates are based on census counts that include both the housing unit and group quarters populations.
29. Assume 2.6 million of the P-sample are listed in the surrounding blocks. If 95% of them are in the search area (a plausible percentage), and if 90% match (about the overall match rate), then we have accounted for 2.2 million matches to the surrounding blocks. When we divide this 2.2 million by the P-sample coverage of 0.94, we have accounted for about 2.36 million of the 3 million lack of balance.
[FR Doc. 01-27663 Filed 10-31-01; 12:05 am]
BILLING CODE 3510-07-P
Document Information
- Published: 11/05/2001
- Department: Census Bureau
- Entry Type: Notice
- Action: Notice of report and statement of Acting Director of the Census Bureau regarding adjustment decision.
- Document Number: 01-27663
- Pages: 56005-56021 (17 pages)
- Docket Numbers: Docket Number 011026262-1262-01
- RINs: 0607-XX66
- EOCitation: of 2001-10-16
- PDF File: 01-27663.pdf