Big Data's Big Misses: 2016 Was a Bad Year For Predictions
Dec. 21, 2016
Goodbye 2016. The journalists and analysts who work with numbers will not miss you.
From the United States to the United Kingdom, the last 12 months will be remembered for missed calls, surprises and upsets that didn’t just beat the odds, but that shattered them – and not just in politics. On the most basic level, 2016 was a bad year for data.
Of course, President-elect Donald Trump is a big part of the story here.
Going into Election Day, most odds makers had Trump as a long shot, with a less than 30% chance of winning the presidency. The data mavens at fivethirtyeight.com gave Trump a 28.6% chance of winning. The Upshot at the New York Times gave him a 15% chance of winning. Others had the numbers much lower, 2% or less.
In January, he’ll be redecorating the Oval Office.
Even before Trump there was Brexit, the June vote wherein the United Kingdom decided to leave the European Union. Polling showed a mixed picture, but the “smart money” was with the UK staying in the EU. The evening of the vote, Ladbrokes, the British gambling company, put the odds of a “leave” vote at only 10%.
Today, the UK is busy trying to figure out how to unwind its EU ties.
And the surprises extended far beyond politics to sports. Some of the biggest headlines in athletics were shockers.
In June, the Cleveland Cavaliers shook off their (and their city’s) losing reputation and came back from a 3-games-to-1 deficit against the Golden State Warriors to win the team’s first NBA Championship. That was a remarkable feat. In the history of the NBA, no team had ever come back after falling down 3-1 in the Finals.
The odds against the Cavaliers climbed as high as 40-1, before the team stormed back to win.
In October, the Chicago Cubs ended a 108-year World Series championship drought when they overcame their own 3-games-to-1 deficit against the Cleveland Indians. Before 2016, there had been 34 World Series where one team took a 3-1 lead, and only 5 times had the trailing team won. That’s fewer than 15% of the occurrences.
The Cubs changed those numbers to 6 out of 35, or about 17%.
And back in May, again on the other side of the pond, Leicester City came from nowhere to win the English Premier League soccer championship. Before the season, the odds of Leicester’s “Foxes” winning were an astounding 5000-to-1.
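For readers who want to check the arithmetic behind the sports figures above, here is a minimal Python sketch. It assumes that odds quoted as "N-to-1" against an outcome imply a probability of 1/(N+1), which ignores any bookmaker's margin, so treat the results as rough. The function names are illustrative and do not come from any sportsbook or polling source.

```python
def implied_probability(odds_against: float) -> float:
    """Convert 'N-to-1 against' odds into an implied probability.

    Assumption: no bookmaker margin is built into the quoted odds.
    """
    return 1.0 / (odds_against + 1.0)


def comeback_rate(comebacks: int, series: int) -> float:
    """Share of series in which the team trailing 3-1 came back to win."""
    return comebacks / series


if __name__ == "__main__":
    # World Series figures cited in the article: 5 of 34 before 2016, 6 of 35 after.
    print(f"World Series comebacks before 2016: {comeback_rate(5, 34):.1%}")
    print(f"World Series comebacks after 2016:  {comeback_rate(6, 35):.1%}")

    # Longest quoted odds against the Cavaliers (40-1) and Leicester City (5000-to-1).
    print(f"Cavaliers at 40-1:       {implied_probability(40):.2%}")
    print(f"Leicester at 5000-to-1:  {implied_probability(5000):.3%}")
```

Run as a script, this prints the two World Series comeback rates along with the rough chances that the longest quoted odds implied for the Cavaliers and Leicester City.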
All those results made the numbers, and the people who created them, look silly. In other words, many of us may be focused on how pollsters “got it wrong” in the U.S. presidential race, but those data wranglers had plenty of company in Las Vegas and London. (And remember, as many pollsters will remind you, Hillary Clinton did win the popular vote.)
But the misses on Trump and Brexit were more complicated than the sports books. They were about misreading and mis-predicting the behavior of millions of people. And in those two cases, the data tools may be part of the problem.
Most of the traditional data measures for the 2016 election were pretty consistent in showing a big Clinton win. It could be that 2016 is telling us the electorate now functions differently.
Perhaps all the Trump buzz on social media was saying something about enthusiasm. Maybe the fact that Trump masks were bigger Halloween sellers than Clinton masks was a sign that there was a movement afoot.
And there was the electoral/popular vote split, which suggested national polls didn’t mean as much as what was going on in individual states.
But the bigger 2016 data lesson, where journalism is concerned, may be to stop worrying so much about predicting the future. Journalism is supposed to report on what is actually happening, not offer odds on what may happen.
Data are still important when we use them correctly. Polls help us better grasp where the public stands on issues and personalities. Economic numbers and consumer surveys help explain what may be the core story for American politics in 2017: The divides coursing through the United States.
Using numbers that way could make 2017 a very good year for data, especially in political coverage.
What about the rest of data’s bad 2016? Is there a larger lesson in the numbers? Not necessarily. The stories behind 2016’s data-defiers are all different. The Cavs had LeBron. The Cubs had pitching. Leicester City had … whatever Leicester City had.
Then again, it could be that 2016 wasn’t an outlier. It was just the beginning of a new data trend, one where down is up, losers become winners and long shots have the advantage.
As the NFL playoffs near, one thought: New England Patriots, you’re on notice.