{"id":263,"date":"2019-01-27T00:00:30","date_gmt":"2019-01-27T00:00:30","guid":{"rendered":"http:\/\/www.electiongraphs.com\/?p=263"},"modified":"2019-02-27T01:03:31","modified_gmt":"2019-02-27T01:03:31","slug":"polling-averages-vs-reality","status":"publish","type":"post","link":"https:\/\/electiongraphs.com\/blog\/2019\/01\/27\/polling-averages-vs-reality\/","title":{"rendered":"Polling Averages vs Reality"},"content":{"rendered":"\r\n<p>2018 is over. Multiple candidates have announced they are at least investigating running for President in 2020, and a few are even past that stage. But before Election Graphs starts posting new graphs and charts for 2020, one more look back at the past.<\/p>\r\n\r\n\r\n\r\n<p>This will be the first in a series of blog posts for folks who are into the geeky mathematical details of how Election Graphs state polling averages have compared to the actual election results from 2008, 2012, and 2016. If this isn't you, feel free to skip this series. Or feel free to skim forward and just look at the graphs if you don't want or need my explanations.<\/p>\r\n\r\n\r\n\r\n<p>For those of you who just want to know about 2020&#8230; Keep checking in&#8230; actual new graphs and charts and analysis for 2020 will be here before too much longer! How much longer? I'm not sure. But after this series of posts is done, getting up the basic framework of the 2020 site is my next priority!<\/p>\r\n\r\n\r\n\r\n<p>Now, for the small group of you who may be left&#8230; any thoughts, advice, or checks on my math are welcome. While this is all interesting on its own, some of this is just me thinking aloud as I figure out what (if anything) I am going to do differently for 2020. Please email me at\u00a0<a href=\"mailto:feedback@electiongraphs.com\">feedback@electiongraphs.com<\/a>\u00a0for longer discussions, or just leave comments here. 
Raw data, Excel spreadsheets, etc., are available on request to anybody who wants them, although fair warning, they aren't all cleaned up and annotated to be scrutable to anyone other than me without some explanation or effort.<\/p>\r\n\r\n\r\n\r\n<p>Anyway, one of the key elements I called out in my <a href=\"http:\/\/www.abulsme.com\/2016\/11\/29\/electoral-college-post-mortem\/\">2016 Post Mortem<\/a>\u00a0was the need to \"trust the uncertainty\" and look at the range of possibilities, not just the prediction's centerline. Part of this is just being vigilant in avoiding the temptation to reduce things to a single point estimate rather than a range of possibilities. Another part is repeating over and over again that a 14% chance of something happening isn't the same as 0%. Although pretty much everybody doing poll analysis did explain these things to some extent in 2016, in retrospect it is clear that there was still too much emphasis on that centerline by most people, including by me.<\/p>\r\n\r\n\r\n\r\n<p>But another important element is defining what that uncertainty is. Ever since I started doing presidential election tracking in 2008, I have used \"margin less than 5%\" to define states I was going to categorize as close enough that you should take seriously the possibility they could go either way. I also used 5% as the limit of \"too close to call\" for the tipping point metric. I had a second boundary at 10% on the state polls to mark off the outer boundaries that you could even imagine being in play if a candidate made a huge surge.<\/p>\r\n\r\n\r\n\r\n<p>Those numbers were just arbitrary round numbers though. 
With three election cycles of data behind me now though, it is time to do some actual analysis of the real life differences between the final polling averages and the actual election results, to get a better idea of what kind of differences are reasonable to expect.<\/p>\r\n\r\n\r\n\r\n<p>Over the next few blog posts I will look at this in several different ways, then decide if anything about Election Graphs should change for 2020.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">One sided histogram<\/h2>\r\n\r\n\r\n\r\n<p>First, let's just look at a simple histogram showing how far off the poll averages were from the actual margins.<\/p>\r\n\r\n\r\n\r\n<p>For each of the three election cycles, we have 50 states and DC. For 2012 and 2016, we also have the five congressional districts in Maine and Nebraska. (I didn't track those separately in 2008 unfortunately.) For each one of those results, I look at the unsigned delta between the final poll average margin and the margin in the actual election results, and show a histogram for each of the three election cycles, and a combined line using all 163 data points.<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.37.34105.png\"><img loading=\"lazy\" decoding=\"async\" width=\"3130\" height=\"2240\" class=\"wp-image-200\" src=\"http:\/\/www.electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.37.34105.png\" alt=\"\" srcset=\"https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.37.34105.png 3130w, https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.37.34105-150x107.png 150w, https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.37.34105-300x215.png 300w, 
https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.37.34105-768x550.png 768w, https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.37.34105-1024x733.png 1024w, https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.37.34105-1200x859.png 1200w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/a><\/figure>\r\n\r\n\r\n\r\n<p>Somewhat improperly using the \"N\u03c3\" notion&#8230; with 1\u03c3 being about 68.27% of the time, and 2\u03c3 being about 95.45% of the time, we see that 68.27% (1\u03c3) of the poll averages were within 5.49% of the actual results. But to get to 95.45% (2\u03c3) you have to move out to 13.39%.<\/p>\r\n\r\n\r\n\r\n<table class=\"wp-block-table tg\">\r\n<tbody>\r\n<tr>\r\n<td>\u00a0<\/td>\r\n<td><strong>68.27% (1\u03c3)<\/strong><\/td>\r\n<td><strong>95.45% (2\u03c3)<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Margin<\/strong><\/td>\r\n<td>5.49%<\/td>\r\n<td>13.39%<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n\r\n\r\n\r\n<p>[Note for sticklers #1: There is no way this kind of analysis is significant to 0.01%, so showing two digits after the decimal point is false precision and the value depends on my choices for how to interpolate&#8230; among a variety of other things. But I've standardized on two digits after the decimal for everything in this series of posts anyway because&#8230; well, just because. 
Feel free to round to the nearest 0.1% or even 1% if you prefer.]<\/p>\r\n\r\n\r\n\r\n<p>[Note for sticklers #2: Each election cycle I made some modifications to the fiddly details of how I calculated the averages, including things like what I did if there were fewer than 5 polls, how I dealt with polls that included more than one version of the result (for instance registered vs likely voters or with and without third-party candidates), and whether I used the end or middle of the field dates as the date used to determine poll recency. These differences may technically make it improper to do calculations that combine data from these three cycles without recalculating everything based on the same rules. I contend that the differences in my methodology over the three cycles were minor enough that they wouldn't substantially change this analysis, but given the amount of work that would be involved, I have NOT spent the time to convert 2008 and 2012 to match my 2016 methodology in order to confirm this.]<\/p>\r\n\r\n\r\n\r\n<p>It is very tempting in the context of Election Graphs to just move my boundary between \"Weak\" and \"Strong\" states from 5% to 5.49%, and the boundary between \"Strong\" and \"Solid\" from 10% to 13.39%. Both of those numbers are kind of close to where the old boundaries are, just expanded a bit to show a bit more uncertainty than before, which seems intuitively right after the 2016 election cycle.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Two sided histogram<\/h2>\r\n\r\n\r\n\r\n<p>But wait, why just look at the magnitude of the errors? Isn't the direction of the errors important too? Are the polls systematically favoring one side or the other? Very possible. 
Time to do that histogram again, but taking into account which direction the polls were off:<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.45.13770.png\"><img loading=\"lazy\" decoding=\"async\" width=\"3136\" height=\"2256\" class=\"wp-image-202\" src=\"http:\/\/www.electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.45.13770.png\" alt=\"\" srcset=\"https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.45.13770.png 3136w, https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.45.13770-150x108.png 150w, https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.45.13770-300x216.png 300w, https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.45.13770-768x552.png 768w, https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.45.13770-1024x737.png 1024w, https:\/\/electiongraphs.com\/blog\/wp-content\/uploads\/2018\/11\/Screen-Shot-2019-01-07-at-04.45.13770-1200x863.png 1200w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/a><\/figure>\r\n\r\n\r\n\r\n<p>The pattern isn't symmetrical, although it is certainly possible (perhaps even likely) that if I had data for a few more election cycles it would become more so. At the moment though, while the peak looks to be very slightly on the side of Democrats doing better than the poll average (in other words the polls showed Republicans doing better than they actually were), when you average out all the polls, the bias is actually that the Republicans beat the poll averages by 0.69%. 
(In other words, the poll averages showed Democrats doing slightly better than they actually did.)<\/p>\r\n\r\n\r\n\r\n<p>The asymmetry is notable here. When the poll averages overestimate the Republicans, most of the time the error is 6% or less, but when the poll averages overestimate the Democrats it is often by quite a bit more.<\/p>\r\n\r\n\r\n\r\n<p>I didn't put it on the plot because it was already pretty busy, but you can also use this to get the ranges for the central 1\u03c3 and 2\u03c3:<\/p>\r\n\r\n\r\n\r\n<table class=\"wp-block-table tg\">\r\n<tbody>\r\n<tr>\r\n<td width=20%>\u00a0<\/td>\r\n<td width=40%><strong>Middle 68.27% (1\u03c3)<\/strong><\/td>\r\n<td width=40%><strong>Middle 95.45% (2\u03c3)<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Range<\/strong><\/td>\r\n<td>D+4.61% to R+7.45%<\/td>\r\n<td>D+12.21% to R+13.32%<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Avg Limit<\/strong><\/td>\r\n<td>6.03%<\/td>\r\n<td>12.77%<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n\r\n\r\n\r\n<p>Maybe these numbers could be used to define category boundaries? You would have to either explicitly have different category boundaries for the two parties or use the averages of the R and D boundaries to make it symmetric. The median is also not quite at the center&#8230; it is at a 0.01% Democratic lead. But that is probably small enough to just count as zero.<\/p>\r\n\r\n\r\n\r\n<p>But looking at how many poll averages are a certain amount off favoring one party or another doesn't really hit exactly what we want.<\/p>\r\n\r\n\r\n\r\n<p>See, for what we care about on a site like Election Graphs, if we have a poll average at a certain point, we don't actually care that much if the actual result is that the leading candidate wins by an even bigger margin. 
We only really care if the polls are wrong in the direction that leads the opposite candidate to win.<\/p>\r\n\r\n\r\n\r\n<p>I'll look into that in the next post&#8230;<\/p>\r\n\r\n\r\n\r\n<p>You can find all the posts in this series here:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><a href=\"https:\/\/www.electiongraphs.com\/2019\/01\/27\/polling-averages-vs-reality\/\">Polling Averages vs Reality<\/a><\/li>\r\n<li><a href=\"https:\/\/www.electiongraphs.com\/2019\/01\/28\/win-chances-from-poll-averages\/\">Win Chances from Poll Averages<\/a><\/li>\r\n<li><a href=\"http:\/\/www.electiongraphs.com\/2019\/01\/29\/polling-error-vs-final-margin\/\">Polling Error vs Final Margin<\/a><\/li>\r\n<li><a href=\"http:\/\/www.electiongraphs.com\/2019\/01\/30\/predicting-2016-by-cheating\/\">Predicting 2016 by Cheating<\/a><\/li>\r\n<li><a href=\"https:\/\/www.electiongraphs.com\/2019\/01\/31\/criticism-and-tipping-points\/\">Criticism and Tipping Points<\/a><\/li>\r\n<li><a href=\"http:\/\/www.electiongraphs.com\/2019\/02\/01\/so-what-to-do-for-2020\/\">So what to do for 2020?<\/a><\/li>\r\n<\/ul>\r\n","protected":false},"excerpt":{"rendered":"<p>2018 is over. Multiple candidates have announced they are at least investigating running for President in 2020, and a few are even past that stage. But before Election Graphs starts posting new graphs and charts for 2020, one more look back at the past. 
This will be the first in a series of blog posts &hellip; <a href=\"https:\/\/electiongraphs.com\/blog\/2019\/01\/27\/polling-averages-vs-reality\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> \"Polling Averages vs Reality\"<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[8,11,7,6],"class_list":["post-263","post","type-post","status-publish","format-standard","hentry","category-metaanalysis","tag-errors","tag-histogram","tag-historical","tag-polling-averages"],"_links":{"self":[{"href":"https:\/\/electiongraphs.com\/blog\/wp-json\/wp\/v2\/posts\/263","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/electiongraphs.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/electiongraphs.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/electiongraphs.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/electiongraphs.com\/blog\/wp-json\/wp\/v2\/comments?post=263"}],"version-history":[{"count":10,"href":"https:\/\/electiongraphs.com\/blog\/wp-json\/wp\/v2\/posts\/263\/revisions"}],"predecessor-version":[{"id":468,"href":"https:\/\/electiongraphs.com\/blog\/wp-json\/wp\/v2\/posts\/263\/revisions\/468"}],"wp:attachment":[{"href":"https:\/\/electiongraphs.com\/blog\/wp-json\/wp\/v2\/media?parent=263"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/electiongraphs.com\/blog\/wp-json\/wp\/v2\/categories?post=263"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/electiongraphs.com\/blog\/wp-json\/wp\/v2\/tags?post=263"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}