Understanding common errors in surveys and statistical reports
Errors are present in most survey results. Their sources include nonresponse error, measurement error and framing error, among others. When reading statistics from a survey, we need to recognize the potential errors behind them in order to avoid being misled. Better still is to learn the methods for evaluating data quality and reducing errors in statistics.
In their empirical paper, Meyer, Mok and Sullivan (2015) compared several survey results with administrative data, with the latter serving as a proxy for the true value, to measure the magnitude of errors in the surveys. For example, they compared the total dollar value of food stamp benefits reported by all respondents in a survey with the total awarded as recorded in administrative data from the US Department of Agriculture, Food and Nutrition Service.
The paper found a noticeable rise in the threat that item nonresponse, unit nonresponse and measurement error pose to survey quality in many datasets that are important for social science research and government policy in the US. For example, from 2000 to 2012, over 50% of welfare dollars and nearly half of food stamp dollars were missed in several major surveys when compared to administrative data. Biased or incorrect survey results might misguide policymakers when they adjust existing policies or formulate new ones.
Some errors are common enough in surveys that they are worth keeping in mind when interpreting survey statistics. We discuss four of them here.
4 types of common errors in surveys
I. Framing error
A framing error (often called frame error or coverage error) occurs when the sampling frame, the list from which units are drawn, does not correctly represent the population of interest. It covers erroneous inclusions, erroneous exclusions and multiple inclusions.
Erroneous inclusions mean that the sampling frame includes some units that are not in the population of interest. Erroneous exclusions mean that some eligible units are left out of the sampling frame and so have no chance of being sampled. Multiple inclusions mean that some units appear more than once in the sampling frame and so have a higher chance of being sampled.
The ideal sampling frame includes the entire population of interest, with each unit having an equal probability of being sampled. With framing error, some units might be sampled with a higher probability, or some ineligible units might be sampled, leading to results that do not reflect conditions in the population.
For example, suppose a survey aims to examine how residents of Shatin district perceive a certain policy, and a telephone list meant to cover all residents of Shatin district in 2020 is used as the sampling frame from which respondents are drawn at random. Erroneous inclusions occur if residents who have moved out of Shatin are still on the list; erroneous exclusions occur if some residents’ phone numbers are not on the list; multiple inclusions occur if some residents have several phone numbers, so that their chance of being sampled is higher.
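When a reference list of the true population is available, the three kinds of frame problem in the Shatin example can be checked mechanically. The following is a minimal sketch in Python. The resident IDs, the `frame` list and the `audit_frame` helper are all hypothetical and invented for illustration.

```python
from collections import Counter

def audit_frame(frame, population):
    """Compare a sampling frame (list of unit IDs, one entry per listing)
    with the set of units in the population of interest."""
    counts = Counter(frame)
    listed = set(counts)
    return {
        # Listed in the frame but not in the population of interest
        "erroneous_inclusions": sorted(listed - population),
        # In the population but never listed, so they can never be sampled
        "erroneous_exclusions": sorted(population - listed),
        # Listed more than once, so they are more likely to be drawn
        "multiple_inclusions": sorted(u for u, n in counts.items() if n > 1),
    }

# Hypothetical data: residents A-D live in the district; E has moved out.
population = {"A", "B", "C", "D"}
frame = ["A", "B", "B", "C", "E"]   # B has two phone numbers, D has none

report = audit_frame(frame, population)
# Chance that a single random draw from the frame picks each listed unit
draw_probability = {u: n / len(frame) for u, n in Counter(frame).items()}
```

In this made-up frame, resident B is twice as likely to be drawn as A or C, resident D cannot be drawn at all, and E can be drawn despite no longer being in the population.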
II. Unit nonresponse error
Unit nonresponse error occurs when a sampled unit does not respond to the survey at all, so that the response rate is below 100%. This type of error does not necessarily lead to seriously mistaken statistics if nonresponse is distributed randomly across the sample. However, it matters when the characteristics of the nonrespondents are systematically different from those of the respondents.
For example, in a survey about social welfare, beneficiaries may be more likely than non-beneficiaries to decline to take part at all because of the stigma. If the survey intends to estimate the number of beneficiaries from those who do respond, the number will be understated.
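To see why it matters whether nonresponse is random or systematic, the sketch below compares the share of beneficiaries among respondents under two response patterns. The group sizes and response rates are invented for illustration, and `observed_share` is a hypothetical helper.

```python
def observed_share(n_benef, n_non, resp_rate_benef, resp_rate_non):
    """Share of beneficiaries among respondents, given group sizes in the
    sample and each group's (hypothetical) probability of responding."""
    responding_benef = n_benef * resp_rate_benef
    responding_non = n_non * resp_rate_non
    return responding_benef / (responding_benef + responding_non)

# Hypothetical sample: 200 beneficiaries and 800 non-beneficiaries (true share 20%)
true_share = 200 / (200 + 800)

# Nonresponse unrelated to beneficiary status: both groups respond at 60%
random_case = observed_share(200, 800, 0.6, 0.6)

# Beneficiaries respond less often (40%) than non-beneficiaries (80%)
systematic_case = observed_share(200, 800, 0.4, 0.8)
```

With equal response rates the observed share stays at the true 20%, while in the systematic case it falls to about 11%, even though the underlying sample is the same.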
One solution to unit nonresponse error is for interviewers to make an extra effort to reach some of the nonrespondents and collect their answers, and then to assign extra weight to those answers so that they represent the whole group of nonrespondents, including those who could not be reached.
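The reweighting idea can be sketched as follows. This is a simplified illustration that assumes the followed-up cases are a random subsample of all nonrespondents. The function name and the data are hypothetical.

```python
def followup_weighted_mean(respondent_values, followup_values, n_nonrespondents):
    """Estimate the sample mean when only a subsample of nonrespondents
    was reached in a follow-up effort."""
    # Each followed-up case stands in for this many original nonrespondents
    followup_weight = n_nonrespondents / len(followup_values)
    weighted_total = sum(respondent_values) + followup_weight * sum(followup_values)
    n_total = len(respondent_values) + n_nonrespondents
    return weighted_total / n_total

# Hypothetical data: 6 initial respondents, 4 nonrespondents,
# 2 of whom were reached in the follow-up.
respondents = [10, 12, 11, 9, 10, 8]
followed_up = [4, 6]

naive = sum(respondents) / len(respondents)
adjusted = followup_weighted_mean(respondents, followed_up, n_nonrespondents=4)
```

Here the naive mean of the initial respondents is 10, while the adjusted mean is 8, because each followed-up case counts for two nonrespondents whose values are lower.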
III. Item nonresponse error
Item nonresponse error occurs when a respondent does not answer certain questions in the survey. Without the missing data, totals estimated from the survey will be understated if no remedy is applied.
Item nonresponse error appears in most surveys. One solution is to use imputation techniques to estimate the missing data, such as regression, cold-deck, hot-deck and propensity score matching. The imputed values are then used in place of the missing data.
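As an illustration of one of these techniques, the sketch below implements a simple sequential hot-deck: a missing value is filled with the most recently observed value from an earlier record in the same imputation class. This is only one variant of hot-deck imputation and is not necessarily the procedure used by any particular survey. The field names and records are hypothetical.

```python
def sequential_hot_deck(records, class_key, value_key):
    """Fill missing values (None) with the most recent observed value
    from an earlier record in the same imputation class."""
    last_seen = {}          # imputation class -> latest donor value
    completed = []
    for rec in records:
        rec = dict(rec)     # copy so the input list is not modified
        cls = rec[class_key]
        if rec[value_key] is None:
            if cls in last_seen:
                rec[value_key] = last_seen[cls]
                rec["imputed"] = True
            else:
                rec["imputed"] = False   # no donor yet; leave missing
        else:
            last_seen[cls] = rec[value_key]
            rec["imputed"] = False
        completed.append(rec)
    return completed

# Hypothetical records: benefit amount is missing for two respondents
records = [
    {"id": 1, "age_group": "young", "benefit": 100},
    {"id": 2, "age_group": "old",   "benefit": 300},
    {"id": 3, "age_group": "young", "benefit": None},
    {"id": 4, "age_group": "old",   "benefit": None},
    {"id": 5, "age_group": "young", "benefit": 120},
]
completed = sequential_hot_deck(records, "age_group", "benefit")
```

Keeping an `imputed` flag on each record makes it possible to separate imputed from reported values later, for instance when assessing how far the imputed values are from the truth.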
The imputed values may themselves contain errors, since they are not exactly equal to the true values. Given access to the true values, the following equation [(weighted value imputed to those not responding to the question – true value received by that group) / total true value] can be used to estimate the magnitude of such an error.
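A small numerical sketch of this ratio follows. It reads the expression as a difference divided by the total true value, which is an interpretation of the bracketed formula. The function name and the dollar figures are hypothetical.

```python
def imputation_error_share(imputed_total, true_total_item_nonrespondents, true_total_all):
    """(weighted imputed value - true value of the same group) / total true value."""
    return (imputed_total - true_total_item_nonrespondents) / true_total_all

# Hypothetical weighted totals, in millions of dollars
share = imputation_error_share(
    imputed_total=150.0,                     # imputed to item nonrespondents
    true_total_item_nonrespondents=200.0,    # what that group really received
    true_total_all=1000.0,                   # true total for everyone
)
```

A negative result, as in this example, means the imputed values fall short of what the item nonrespondents truly received, here by 5% of the total true value.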
IV. Measurement error
Measurement error occurs when the reported value differs from the true value because of overreporting or underreporting. Even a survey with a 100% response rate will be misleading if it suffers from large measurement error. To reduce measurement error, survey questions should be designed so that any respondent can easily understand what is being asked and provide an accurate answer. Confusing wording or poor questionnaire design might obstruct respondents’ comprehension of the questions and reduce the accuracy of their answers.
Although the true value is rarely available as a benchmark for comparison, Meyer, Mok and Sullivan (2015) compared values from surveys with administrative data sets to estimate measurement error. The following equation [(value reported by non-imputed respondents – true value of non-imputed respondents) / total true value] can be used to find the magnitude of the error.
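The ratio can be computed directly once reported and true values are lined up respondent by respondent. The sketch below uses invented amounts, and `measurement_error_share` is a hypothetical helper rather than anything taken from the paper.

```python
def measurement_error_share(reported, true, true_total_all):
    """(reported value - true value) of non-imputed respondents, summed and
    divided by the total true value for the whole population."""
    difference = sum(r - t for r, t in zip(reported, true))
    return difference / true_total_all

# Hypothetical weighted dollar amounts for three respondents who answered
reported = [80.0, 0.0, 120.0]
true     = [100.0, 50.0, 120.0]
share = measurement_error_share(reported, true, true_total_all=1000.0)
```

In this example the respondents underreport by 70 in total, so measurement error accounts for a shortfall of 7% of the total true value.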
They suggested that by matching survey data with administrative data using respondents’ IDs, any misreported value in the survey can be detected quickly by comparing it with the true value stored in the administrative database. However, this method raises privacy issues for which no easy compromise is available, especially in developed countries.
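The matching step can be sketched with two lookups keyed by respondent ID. The IDs, amounts and the `find_misreports` helper are hypothetical. Real record linkage is considerably more involved, since identifiers may be missing or inconsistent.

```python
def find_misreports(survey, admin, tolerance=0.0):
    """Return IDs whose survey-reported amount differs from the administrative
    amount by more than `tolerance`, plus IDs with no administrative match."""
    misreported, unmatched = {}, []
    for person_id, reported in survey.items():
        if person_id not in admin:
            unmatched.append(person_id)
            continue
        gap = reported - admin[person_id]
        if abs(gap) > tolerance:
            misreported[person_id] = gap
    return misreported, unmatched

# Hypothetical records keyed by respondent ID
survey = {"r1": 0, "r2": 250, "r3": 400, "r4": 90}
admin  = {"r1": 300, "r2": 250, "r3": 380}
misreported, unmatched = find_misreports(survey, admin)
```

Here `r1` underreports by 300 and `r3` overreports by 20, while `r4` has no administrative record and so cannot be checked.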
Some thoughts from Wallace Mok, a co-author of “Household Surveys in Crisis”
As mentioned above, Meyer, Mok and Sullivan (2015) compared several surveys with administrative data sets recorded by the US government, with the latter serving as a proxy for the true data, to measure the magnitude of errors in the surveys. Some might wonder what the point of conducting surveys is when the intended data already exist in the form of administrative data.
- Why are surveys needed when administrative data sets are available?
Administrative data sets are mostly summary statistics that provide no microdata, such as the gender and age of each unit, which makes further research difficult. In contrast, most survey data sets collect microdata and allow statistics to be broken down by group, which facilitates more focused analysis, such as tracking the opinions of particular groups on particular issues.
For example, to find out what the underprivileged think of certain welfare policies, a survey with microdata allows us to identify the target group (e.g., annual income < $x) and extract their responses to assess their perception of those policies.
- What limitations should we be aware of when comparing survey results to administrative data, and how can we deal with them?
Although comparing surveys to administrative data seems a convenient way to spot sources of error and measure their magnitude, the two kinds of data sets are rarely directly comparable because of several differences between them, such as differences in sample size, incomplete information in the surveys, and differences in the timing of data recording. Some adjustments are needed before making a comparison.
To deal with the sample size difference, we first need to find the population weights of certain variables in the target groups (using census data) and then scale the sample statistics according to those weights.
For example, assume the population ratio of males to females is 3:7. If the gender ratio in the survey is 5:5, the statistics from that survey should be adjusted to fit the ratio of 3:7. To illustrate, if the total annual incomes of males and females in the survey are 20,000 and 10,000, they are scaled by 0.3/0.5 = 0.6 and 0.7/0.5 = 1.4 and revised to 12,000 and 14,000.
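The same adjustment can be written as a short function that scales each group's total by the ratio of its population share to its sample share. The function name is hypothetical, and the numbers are those of the example above.

```python
def poststratify_totals(sample_totals, sample_shares, population_shares):
    """Scale each group's sample total by population share / sample share."""
    return {
        group: total * population_shares[group] / sample_shares[group]
        for group, total in sample_totals.items()
    }

sample_totals     = {"male": 20_000, "female": 10_000}
sample_shares     = {"male": 0.5, "female": 0.5}    # 5:5 in the survey
population_shares = {"male": 0.3, "female": 0.7}    # 3:7 in the population

adjusted = poststratify_totals(sample_totals, sample_shares, population_shares)
```

The adjusted totals come out at 12,000 for males and 14,000 for females, matching the figures in the text.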
To deal with incomplete information, suppose a set of administrative data contains an aggregate value of x1, x2 and x3, while the survey only provides an aggregate value of x1 and x2. We need to subtract an estimate of x3 from the administrative aggregate to make it comparable to the survey data.
As for the timing difference in data recording, administrative data are often recorded on a fiscal year basis, which in some jurisdictions runs from 1 April in year t to 31 March in year t+1. Survey data, however, are often recorded on a calendar year basis, from 1 January to 31 December. In most cases, we should convert the fiscal year administrative data to a calendar year basis before comparing them with survey data.
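The text does not say how the conversion is done. One simple approach, shown here purely as an assumption, is to treat amounts as spread evenly over the twelve months of each fiscal year. A calendar year then consists of the last three months of one fiscal year and the first nine months of the next. The function name and the totals are hypothetical.

```python
def fiscal_to_calendar(fiscal_totals, year):
    """Approximate the calendar-year total for `year` from fiscal-year totals.

    fiscal_totals maps t -> total for the fiscal year running from
    1 April of year t to 31 March of year t+1.
    Assumes (hypothetically) that amounts are spread evenly over the months.
    """
    jan_to_mar = fiscal_totals[year - 1] * 3 / 12   # last quarter of FY year-1
    apr_to_dec = fiscal_totals[year] * 9 / 12       # first nine months of FY year
    return jan_to_mar + apr_to_dec

# Hypothetical fiscal-year totals (in millions)
fiscal_totals = {2010: 1200.0, 2011: 1600.0}
calendar_2011 = fiscal_to_calendar(fiscal_totals, 2011)
```

If monthly administrative figures are available, summing the actual months of the calendar year would be preferable to this even-spread approximation.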
All in all, when interpreting any statistic, we should never take the number for granted at first glance. Instead, we should try to understand the methodology that generated it and evaluate the potential errors in the survey, so that we are less likely to be misled by mistaken statistics.
Meyer, Bruce D., Wallace K. C. Mok, and James X. Sullivan. 2015. “Household Surveys in Crisis.” Journal of Economic Perspectives, 29 (4): 199-226.