When it comes to selecting those who are most likely to vote, pollsters have long relied more on art than science. Nearly 30 years after respected polling expert Irving Crespi described the identification of likely voters as “a major measurement problem in pre-election polling,” survey researchers continue to struggle with the best way to model the likely electorate.
As explained by the Pew Research Center in their report on likely voter models earlier this year, identifying truly likely voters is a challenge because (a) the goal is to model the future electorate, “a population that does not yet exist at the time the poll is conducted,” and (b) many people who say they intend to vote do not actually cast a ballot. The net result is that pollsters have no way to predict with complete certainty which of their respondents will vote.
After reviewing the previous research and available evidence in our own data, NBC News and SurveyMonkey have concluded that the best approach for our tracking survey data is a “likely voter model” that makes only modest adjustments to our self-reported “registered voter” results.
Three findings support our conclusion:
1) Our registered voter samples already resemble “likely voters”
The NBC News|SurveyMonkey Weekly Election Tracking Poll screens for adults who say they are registered to vote, and we weight the data to reflect the demographic composition of registered voters (by age, race, sex, education and region) using the Census Bureau’s Current Population Survey. Demographically, our results closely mimic the population of registered voters.
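The weighting step described above can be sketched in code. One common technique for aligning a sample with known population margins is raking (iterative proportional fitting); the actual NBC News|SurveyMonkey procedure and CPS targets are not specified in this piece, so the categories, respondents, and target shares below are purely hypothetical:

```python
# Minimal sketch of raking (iterative proportional fitting): adjust each
# respondent's weight until the sample's margins match population targets.
# All data below are invented for illustration, not CPS figures.
respondents = [
    {"age": "18-29", "sex": "F"},
    {"age": "18-29", "sex": "M"},
    {"age": "60+", "sex": "F"},
    {"age": "60+", "sex": "M"},
    {"age": "60+", "sex": "M"},
]

# Hypothetical population shares for each weighting variable.
targets = {
    "age": {"18-29": 0.5, "60+": 0.5},
    "sex": {"F": 0.5, "M": 0.5},
}

weights = [1.0] * len(respondents)
for _ in range(50):  # cycle through the margins until they converge
    for var, shares in targets.items():
        total = sum(weights)
        for category, share in shares.items():
            members = [i for i, r in enumerate(respondents) if r[var] == category]
            current = sum(weights[i] for i in members)
            factor = (share * total) / current  # scale group to its target share
            for i in members:
                weights[i] *= factor
```

After enough passes, the weighted share of each category matches its target even though no single respondent was dropped; real polls rake over more variables (race, education, region) in the same way.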
In addition, we believe that the method SurveyMonkey uses to recruit respondents into the weekly tracking poll selects the most likely voters from among the population of people taking SurveyMonkey surveys. Our respondents are selected from the nearly 3 million people who take surveys on the SurveyMonkey platform each day. For a random sample of those taking a survey, SurveyMonkey displays a map of past election results colored in gradations of red, blue and purple and asks those individuals to “help us predict the 2016 elections.” Because individuals choose whether or not they want to help predict the election, those who choose to participate are arguably more politically interested and more likely to vote than respondents who see the same invitation and decline.
How well the self-selected sample of respondents to our weekly political survey resembles the electorate after being weighted to the population of registered voters is an empirical question on which we can get some leverage. We used these findings to help us determine the most appropriate likely voter model for our tracking poll.
From July 25 to August 21, we asked respondents whether they voted in past elections. While it is well known that self-reports of political participation run too high, nearly 79 percent of these registered voters reported voting in the 2012 general election, and 60 percent reported voting in the 2014 midterm elections. According to data obtained from NBC’s data partner TargetSmart -- a leading voter-file company in the United States -- among those currently registered to vote, 60 percent actually voted in 2012 and 40 percent actually voted in 2014.
Even accounting for the over-reporting of turnout, the fact that we find higher levels of past participation than among registered voters in general suggests that the registered voters who choose to participate are indeed more likely to vote than a randomly selected registered voter. This is consistent with our opt-in recruitment procedure selecting registered voters who are more likely to vote than the average registered voter. While our weighted sample of registered voters almost certainly includes some who are unlikely to vote in 2016, the fact that our opt-in survey tends to select the registered voters who are most likely to vote suggests that the need for “correction” is relatively small.
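The comparison underlying this argument is simple arithmetic on the two sets of turnout rates. A sketch, using only the figures quoted in the text:

```python
# Self-reported turnout among our registered-voter respondents vs. actual
# turnout among registered voters per the TargetSmart voter file
# (figures as quoted in the text).
reported = {"2012": 0.79, "2014": 0.60}
actual = {"2012": 0.60, "2014": 0.40}

# Gap between what respondents report and the registered-voter baseline.
gap = {year: reported[year] - actual[year] for year in reported}
```

The roughly 19- and 20-point gaps combine two effects the text distinguishes: ordinary over-reporting of turnout, and the genuinely higher participation of the opt-in respondents.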
2) The demographic composition of the most likely electorate is very close to the composition of all registered voters
The demographic composition of likely voters in 2016 is nearly identical to the composition of all registered voters in 2012 -- which we used to construct the weighting target we have applied to NBC News|SurveyMonkey Weekly Election Tracking Polls this year. Put differently, even though we are weighting our sample to the demographics of the population of registered voters, the demographics of actual voters are very similar to those of registered voters.
The following reports the demographic profile of voters in presidential elections going back to 2000. The pattern reflects an often overlooked characteristic of voter demographics: while the electorate evolves to match long-term changes in the population -- growing less white over time, for example -- the between-election changes in the presidential electorate show more stability than change.
In an election where voter preferences show huge differences by demographics such as gender, age, race and level of education, accurate polling depends on anticipating the demographics of the likely electorate. While no one can predict who will vote in 2016 with absolute certainty -- much less their precise demographic composition -- the relative stability of the presidential electorate over time and the similarity between that composition and the composition of registered voters confirm our belief that our current weighting targets are also the best available estimate of the likely electorate.
3) Screening too tightly for “likely voters” can create unrealistic shifts in demographic composition
Because of the close relationship between the demographics of registered voters and the national electorate, employing screens and cut-offs to identify “likely voters” can distort the demographic distribution of the sample and risks doing more harm than good. In fact, a Pew Research report on likely voter models released earlier this year demonstrates that the various “cutoff methods” used by pollsters to select likely voters are “very sensitive to the chosen turnout threshold.” Relatively small changes in the somewhat arbitrary decisions pollsters make about how to narrow their samples can distort the results. In a detailed report on its own 2012 polling misfire, for example, Gallup concluded that its likely voter model had essentially over-corrected its samples of all registered voters, shifting its results too far toward Mitt Romney.
Analyzing the more than 14,000 responses collected between September 12 and 18 reveals the impact of screening the data based on self-reported likelihood of voting. The results in the left-most column are those of all our respondents, weighted to match the demographics of registered voters as described above. The other columns indicate how much each “likely voter” screen changes the composition. If we use only the 81 percent of our respondents who say they are “absolutely certain” to vote, the gender composition is unchanged -- 53 percent in each -- but the sample becomes older (the percentage over the age of 60 increases by 4 percentage points), less Hispanic (the percentage of non-Hispanic whites increases by 2 points and the percentage of Hispanics falls by 2 points), and less supportive of independent candidates (by 5 points). Given such shifts, the difference between Clinton and Trump narrows from 5 points to 4. These predicted shifts seem unreasonably large given historical patterns -- while possible, it is hard to believe that 35 percent of the electorate will be over the age of 60 or that the racial composition of the electorate will look more like 2008 than 2012.
If we broaden the screen to the 90 percent who indicate that they are either “absolutely certain” to vote or that there is a “large chance” they will, the composition changes, but not as much. If we expand further to include those who report only a “50-50” chance of voting, the results are nearly identical to the overall sample, as 96 percent of respondents indicate at least a 50-50 chance of voting.
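The mechanics of these cutoff screens can be illustrated with a small sketch: keep only respondents at or above a chosen self-reported likelihood of voting, then recompute the weighted composition. The respondents, weights, and answer scale below are invented for illustration and are not the actual tracking-poll data:

```python
# Hypothetical cutoff-screen illustration. A tighter screen keeps fewer
# respondents and can shift the weighted demographic composition.
LIKELIHOOD = ["no chance", "small chance", "50-50", "large chance", "absolutely certain"]
RANK = {level: i for i, level in enumerate(LIKELIHOOD)}

# Invented respondents: self-reported intent, one demographic flag, a weight.
respondents = [
    {"intent": "absolutely certain", "age60plus": True, "weight": 1.1},
    {"intent": "absolutely certain", "age60plus": False, "weight": 0.9},
    {"intent": "large chance", "age60plus": False, "weight": 1.0},
    {"intent": "50-50", "age60plus": False, "weight": 1.0},
]

def screen(sample, threshold):
    """Keep respondents whose stated likelihood is at or above the threshold."""
    return [r for r in sample if RANK[r["intent"]] >= RANK[threshold]]

def share_60plus(sample):
    """Weighted share of the sample over age 60."""
    total = sum(r["weight"] for r in sample)
    return sum(r["weight"] for r in sample if r["age60plus"]) / total

loose = share_60plus(screen(respondents, "50-50"))               # broad screen
tight = share_60plus(screen(respondents, "absolutely certain"))  # narrow screen
```

In this toy example the tight screen produces an older weighted sample than the loose one, mirroring the shifts described above; the point is that the composition depends directly on where the somewhat arbitrary threshold is set.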
Identifying likely voters by asking them how likely they are to vote shifts the predicted electorate away from the demographic composition observed in recent presidential elections toward a whiter, older one. While it is always possible that the electorate will be older and whiter in 2016 than it was in 2012 and 2008 -- we will not know what the 2016 electorate looks like until Election Day -- given the demographic stability evident in past presidential elections, we are reluctant to rely on screens that shift the composition of the electorate too far away from that of recent elections. Given historical patterns and the relative stability of presidential voting, our working assumption is that the electorate in 2016 is more likely than not to resemble the 2012 electorate; we trust the stable patterns in the data more than self-reported responses.
Taking all of this evidence into account, we will therefore report vote preference estimates for the final weeks of the election based on a very light screening of a sample that we believe is already very close to the likely electorate -- dropping only those whose intent to participate is “50-50” or less. Given the tendency to over-report turnout, voters who are indifferent between voting or not are more likely to find reasons not to cast a ballot than to take the time to do so. Of course, some who say they are 50-50 will decide to vote -- just as some who say they are “absolutely certain” will not. But only a small fraction of individuals describe themselves as indifferent (6 percent) -- a fraction that will presumably shrink as Election Day approaches -- so removing them allows us to account for the ways the current electorate may differ from prior presidential electorates while producing a predicted demographic composition in line with historical trends. Finally, while the decision is certainly more art than science, the results reassuringly show that it does not greatly affect the relative support for Clinton and Trump.