Biden Looking Strong in Pennsylvania

Since the last update, Emerson released a new poll in Pennsylvania.

This new poll resulted in the following changes:

Biden vs. Trump:

  • Biden vs. Trump state category change: PA has moved from Weak Biden to Strong Biden [Chart]
  • Trump best case vs. Biden has changed: Biden 230 to Trump 308 -> Biden 250 to Trump 288 [Chart]

I'm starting to feel like I keep repeating myself, but the story of the state level polls so far continues to be Biden doing better than other Democrats against Trump. The newest update is just the latest poll to move things in that direction.

Looking specifically at Biden, the "envelope" looks like this at the moment:

The upper side of that chart shows the optimistic side for Trump dropping from a 106 electoral vote win to only a 38 electoral vote win, as more and more polls show Biden with more strength than the simple average of the last five elections would have predicted.

Looking at some of the other candidates as well, we see this trend is not universal at all.

Biden and Warren have improved in this view, while Sanders and O'Rourke have gotten weaker. Booker hasn't yet moved this metric.

Finally, let's look at just Pennsylvania:

Biden is ahead of the pack as usual, and he is the only one in "strong" territory against Trump. Sanders and Warren are also stronger than the "starting point." Nobody has polled Booker in Pennsylvania so far. O'Rourke, though, is looking weaker.

585.2 days until polls start to close.

For more information:

This post is an update based on the data on the Election Graphs Electoral College 2020 page. Election Graphs tracks a poll-based estimate of the Electoral College. The charts, graphs, and maps in the post above are all as of the time of this post. Click through on any image to go to a page with the current interactive versions of that chart, along with additional details.

Follow @ElectionGraphs on Twitter or Election Graphs on Facebook to see announcements of updates. For those interested in individual poll updates, follow @ElecCollPolls on Twitter for all the polls as I add them. If you find the information in these posts informative or useful, please consider visiting the donation page.


Sanders and O'Rourke in the Midwest

Since the last update, Firehouse/0ptimus released three new polls in Pennsylvania [20 EV], Michigan [16 EV], and Wisconsin [10 EV], resulting in the following notable changes in Election Graphs status:

  • Sanders vs. Trump
    • Michigan has moved from Strong Sanders to Weak Sanders
    • Wisconsin has moved from Weak Sanders to Strong Sanders
    • Trump best case vs. Sanders has changed: Sanders 220 to Trump 318 -> Sanders 214 to Trump 324
  • O'Rourke vs. Trump
    • Michigan has moved from Strong O'Rourke to Weak O'Rourke
    • Trump best case vs. O'Rourke has changed: O'Rourke 216 to Trump 322 -> O'Rourke 200 to Trump 338

Biden vs. Trump was the only other candidate pairing looked at in these polls, but no category or tipping point changes resulted.

Before I show all the relevant graphs, I thought I'd show another comparison for these three candidate pairs. Let's look at the current spectrum of the states:

Click through on any of these to look at the specific numbers or details. The point here, though, is the overall pattern. You can see how, at the moment, as you move from Biden to Sanders to O'Rourke, Trump gains strength: more states move from Weak Dem to Weak Trump, and more slip from Strong Dem to Weak Dem. There is even one move from Strong Trump to Solid Trump (Texas).

This is just another view of the theme that has been present in all the Election Graphs update posts so far: Biden doing better than other Democrats against Trump. This situation will very likely change. The other candidates will get more exposure, and name recognition factors will fade. But that hasn't happened yet.

OK. Now to graphs.

On the national level, Trump's "best case" changed against these three candidates. Best case is perhaps a misnomer here; as 2016 showed, it is possible for a candidate to do better than the "best case" described here, which is merely the candidate winning all of the close states. But doing better than this best case would be enough of an outlier to be a surprise. So it still serves as a good reference for the upper end of likely outcomes.

Looking at the Trump best cases only, not the "expected case" or "tipping point" that we usually spend more time looking at, Trump does best against O'Rourke, followed by Sanders, then Booker, then Warren, then finally does worst against Biden. (As usual, I am looking only at the five best-polled candidate pairs.)

Finally, a quick look at the comparison charts for the three states included in this latest batch of polls. Remember only Biden, Sanders, and O'Rourke were in this batch of polls.

Pennsylvania was a mix with Biden improving, while Sanders and O'Rourke did worse.

Trump improved against all three in Michigan.

All three Democrats improved their position in Wisconsin.

And yet again Biden is doing better against Trump than the other Democrats in all three states.

587.0 days until polls start to close on Election Day 2020.


Iowa Update

Changes to the Election Graphs analysis based on a new poll from Emerson in Iowa:

  • Booker vs. Trump replaces Harris vs. Trump in the Election Graphs "five best-polled candidate matchups" group that we'll spend time talking about here on the blog.
  • Sanders vs. Trump tipping point change: Sanders by 0.9% in IA -> Sanders by 1% in IA
  • Booker vs. Trump tipping point change: Booker by 1.1% in IA -> Booker by 0.3% in IA

With these updates, the overall picture we have described here in previous posts remains the same. Biden is doing better against Trump than the other Democrats.

Looking at the tipping point since that is what changed with this update…

At the moment Biden leads Trump by 1.2% in the tipping point metric, Sanders leads by 1.0%, and Booker leads by 0.3%.

Warren and O'Rourke both trail Trump by 0.1%.

And specifically in Iowa:

With this new poll, Biden's lead over Trump in the Election Graphs Iowa average increases from 1.2%, and Sanders's improves by 0.1%.

Meanwhile, Warren falls further behind Trump by 0.4%, and Booker's lead drops by 0.8%.

Emerson didn't poll O'Rourke this time, so no change there.

Once again, as with the national picture, Biden does better than the other Democrats against Trump.

It is important to note though that in both Iowa and the country as a whole, the numbers so far continue to show an exceptionally close race. Biden's 1.2% tipping point lead is less than the 1.6% tipping point lead Clinton had on election day 2016.

And we still have incredibly sparse data. Biden vs. Trump has the best polling, and even in that matchup only 9 out of 56 jurisdictions that determine electoral votes have been polled at all, and none have had more than three polls. So the Election Graphs numbers are still mostly based on prior election history rather than actual 2020 polling.

But with the state polls so far, mirroring the primary polling we have seen, the theme, for now, is that Biden does best, followed by Sanders, with others trailing significantly. The Biden/Sanders advantage is likely still driven primarily by name recognition, but as the year progresses, that will be less and less of a factor.

589.9 days left until election day polls start to close.


Biden Strong in Wisconsin

Since the last update, there have been polls in Michigan and Wisconsin.

The only category changes that resulted were in the Biden vs. Trump matchup.

  • Michigan moved from Solid Biden to Strong Biden
  • Wisconsin moved from Weak Biden to Strong Biden.

Only the second of these changes the Election Graphs "expected range," lowering Trump's "Best Case" from a 98 electoral vote win to a 78 electoral vote win by taking Wisconsin out of the list of swing states.

That dip in the upper right of the "envelope" represents this change.

Now, keep in mind that this is the result of the very first Biden vs. Trump poll in Wisconsin. It showed Biden leading Trump by 8.6% in a two-way race, and leading by 6.8% if you include Schultz as a third-party candidate. Since the 2000 election result, where the Democrat won by 0.2%, was the one getting knocked off the average, this represents an improvement.

The average goes from a Democratic lead of 4.1% based on the last five presidential elections, to a 5.6% Biden lead based on the previous four presidential elections, plus this first poll.

But this is still only one poll. And of course people will point out that the final Election Graphs average in Wisconsin in 2016 had Clinton up by 7.1%, and Trump still won by 0.8%. So yes, polls can be wrong. But that was one of only a very few misses that large. For now though, with this minimal data, Biden is looking very good in Wisconsin.

And just checking the comparisons with how other candidates are doing against Trump, Biden is still doing better than the other four "best polled" Democrats in both Michigan and Wisconsin.

596.6 days until polls start to close. Stay tuned.


The First 2020 Polls

A few days ago I looked at where all the states ended up if you just look at the average results of the last few elections. No 2020 specific data.

But there have already been state level 2020 polls. Sixteen by my count. They are now all included on Election Graphs.

When I do updates here, unless there is a strong reason to do otherwise, I'm going to only discuss the five "best polled" candidate pairs. At the moment, the only Republican polled so far has been Trump. (Sorry Weld fans.) The five best polled Democrats against him right now are Biden, Warren, O'Rourke, Sanders, and Harris… in that order.

So with that out of the way, let's start looking at some graphs!

The chart above shows the "expected" electoral college result for each of these five candidate pairs, assuming each candidate wins all the states they lead in the Election Graphs five poll averages. This average still includes previous election results since there are no states where there are actually five polls yet.

So what do we see? From the very limited polling we have so far, we see Biden doing noticeably better than any of the other four Democrats when pitted against Trump. In the electoral college, Biden leads Trump by a 42 electoral vote margin. Sanders leads Trump by 6 electoral votes. And then Warren, O'Rourke, and Harris all lose to Trump by 6 electoral votes.

The "tipping point" is perhaps a better way to look at things. It is similar to looking at a popular vote margin, but adjusted for the structure of the electoral college. Here Biden has a 1.2% tipping point margin over Trump and Sanders has a 0.9% tipping point margin over Trump, but Warren, O'Rourke, and Harris all trail Trump by an 0.1% margin.
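For the mathematically curious, the core of the tipping point idea can be sketched in a few lines of Python. This is just an illustrative reconstruction, not the actual Election Graphs code; the function name, input format, and the choice to view things from the Democratic side are all my assumptions:

```python
# Hypothetical sketch of a tipping-point calculation, viewed from the
# Democratic side. Names and input format are illustrative only.
def tipping_point(states):
    """states: list of (name, dem_margin_pct, electoral_votes)."""
    # Rank states from most Democratic to most Republican.
    ranked = sorted(states, key=lambda s: s[1], reverse=True)
    needed = sum(ev for _, _, ev in ranked) // 2 + 1  # 270 with all 538 EVs
    running = 0
    for name, margin, ev in ranked:
        running += ev
        if running >= needed:
            # This is the state that puts the winning coalition over the top;
            # its margin is the tipping point margin.
            return name, margin
```

In other words: line the states up from most Democratic to most Republican, accumulate electoral votes, and the margin in the state that crosses the winning threshold is the tipping point.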

Frankly, ALL of these results, for all five possible opponents to Trump, are firmly within the "too close to call" zone, even if it were the day before the election rather than 607 days out. With this much time left, and with the extremely limited polling so far, any patterns we see may easily disappear as new polls come in. It might even be safe to say they probably will disappear. It is still very, very early.

Having said that, the Biden advantage is even more striking as you start looking state by state. The following charts show how the polling average in each state has moved as the polls so far have come in.

Note: Keep in mind these are all measuring how each of the Democrats would fare against Trump in the general election, NOT how they might fare against each other in the primaries.

Watch the red line. The red line is Biden. Down is better for Democrats.

OK, in California Biden hasn't broken out. He hasn't actually been polled in California yet. So his average is just the average of the last five presidential elections.

Biden doing better than the other four Democrats in Texas.

Biden doing better than the other four Democrats in Ohio.

Biden doing better than the other four Democrats in Michigan.

Biden doing better than the other four Democrats in North Carolina.

Biden doing better than the other four Democrats in Arizona.

OK, none of the top five have been polled in Minnesota yet. (Only Klobuchar has been.) So they are all still on one line.

Biden doing better than the other four Democrats in South Carolina.

Biden doing better than the other four Democrats in Iowa.

OK, we finally have a state where Biden has been polled and he isn't doing better than the other Democrats. In New Hampshire, both Sanders and Warren do better against Trump than Biden does.

But that is it. Out of 10 states where we have state polls so far, Biden does better than the other four Democrats in 7. In 2 Biden hasn't been polled yet. Only in 1 does another Democrat do better against Trump.

Now, to be clear, at this stage in a Presidential race, this may be due entirely to name recognition. Most people may still not have much of an idea who Warren, O'Rourke, and Harris even are. But surely they would know who Sanders is, right? His name recognition must be comparable to Biden's. Right?

In any case, the way Biden consistently is outperforming other Democrats against Trump in the polls so far is striking. And he hasn't even officially said he is running yet.

It will be interesting to see if this pattern continues as the field starts to gel, and the other candidates get better known.

Election Graphs will of course update as the new polls come in…


The 2020 Starting Line

Well, here we are. A few months later than I intended, but it is finally time to launch the Election Graphs coverage of Election 2020.

Before I start rolling in the state polls that have already come out, let's look at a more general Democratic vs Republican view based on the last few elections, just as I did four years ago.

Election Graphs uses a five poll average generally (with certain exceptions I won't get into here). When there have not yet been five polls, we use previous election results to "jump start" the poll averages.
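The jump-start idea can be sketched roughly like this. It's an illustrative simplification only, since the real logic has exceptions I'm skipping here, and the names are mine:

```python
# Illustrative sketch of a "jump-started" five-item average: pad the
# most recent polls with previous election margins until there are five.
# Not the site's actual code.
def jump_started_average(poll_margins, past_election_margins):
    """poll_margins: most recent polls first; margins in percentage points."""
    recent = poll_margins[:5]
    padding = past_election_margins[:5 - len(recent)]
    sample = recent + padding
    return sum(sample) / len(sample)
```

With no polls at all, this reduces to the average of the last five presidential elections; each new poll then displaces the oldest election result from the average.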

The map above represents the average percentage margins (Republican-Democrat) over the last five election cycles. That would be 2000, 2004, 2008, 2012, and 2016. This is a very long time period. Things look different in 2019 than they did in 2000.

You could argue that just using the 2016 results would be a better starting point, but averaging over a long period like this gives us some impression of where states stand independent of the specifics of any single election.

Looking at specific states and where those averages are, you get this spectrum:

Specifically, let's zoom in and look at the states where the margin in this 5 cycle average is under 10%:

These are the states (and Maine and Nebraska congressional districts) that look like they are in play based on the historical results from 2000 to 2016.

Once we get close to the election, "Strong" states mostly will slip out of reach. (Wisconsin was a "Strong Clinton" state and Trump still managed to win it, so it CAN happen, but it is exceedingly rare.) This far out, though, just under 612 days until the first election night polls close, this whole zone is potentially up for grabs.

If the right set of events happens, even some states outside of this group may end up becoming competitive. Major changes do happen during presidential campaigns. But here at the beginning, it seems unlikely that any of the states (or CD's) outside of those above will be in play.

Things will move quickly as new polls putting actual candidates against each other get added. But this is the starting line.

Where does this put us in terms of who might be ahead or behind? Election Graphs traditionally not only shows an electoral college breakdown based on who is ahead in every state but also where things would end up if you let all of the "Weak" states swing to one or the other candidate to generate "Best Case" scenarios for each candidate.

Here is what that looks like at the starting line:

          DEM   REP   Margin
REP Best  216   322   REP by 106 EV
Expected  272   266   DEM by 6 EV
DEM Best  332   206   DEM by 126 EV

The tipping point state is Iowa, where the DEM is ahead by 1.1%.
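The best-case arithmetic behind a table like this can be sketched in a few lines. This is an illustrative reconstruction, not the site's actual code; the (margin, electoral votes) input format and the 5% "Weak" threshold are assumptions carried over from the category definitions:

```python
# Illustrative sketch of the Expected / Best Case computation: each
# candidate's best case lets them sweep all the "Weak" (margin < 5%)
# states. Margins are Dem-minus-Rep in percentage points.
def scenarios(states):
    """states: list of (dem_margin_pct, electoral_votes)."""
    total = sum(ev for _, ev in states)
    expected = sum(ev for m, ev in states if m > 0)    # Dem wins states they lead
    dem_best = sum(ev for m, ev in states if m > -5)   # Dem also sweeps Weak Rep states
    dem_floor = sum(ev for m, ev in states if m >= 5)  # Dem keeps only Strong/Solid
    return {
        "Expected": (expected, total - expected),
        "DEM Best": (dem_best, total - dem_best),
        "REP Best": (dem_floor, total - dem_floor),
    }
```

The "band" on the trend chart is just the gap between the two best cases computed this way.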

So… we start at essentially a dead heat. The average of the last five elections (three where Republicans won, and two where Democrats won), is a Democratic squeaker. When you throw the "Weak" states from side to side, you get a fairly large range as well.

That's as good a place to start a presidential race as any. Dead even.

Now the specific candidates for 2020 will start differentiating themselves with their campaigns, and they will move this to one side or another… or of course, we can arrive at the election itself with things still too close to call… which is always fun.

Before I close out this post and start adding in actual 2020 polls though, it is worth also looking at how this "five election average" has shifted over the years. I currently have elections back to 1980 entered into the data Election Graphs pulls from, so we can look at these averages back through 1996.

First up, here is the electoral college trend itself:

From 1996 through 2008, each election moved the electoral college results based on these averages toward the Democrats… even when the Republicans won… because they won by less than the Republicans from the earlier elections that dropped out of the average.

That stopped in 2012. Obama won in 2012, but not as strongly as Clinton won in 1992, which dropped off the five election average that year.

Then Trump won in 2016, which was obviously a more Republican result than Clinton winning in 1996.

2016 also reversed another trend, which was that the electoral college totals of the states that were closer than 5% had been going down with each election. In other words, there were fewer close states each time. But dropping 1996 and adding 2016 expanded the range between "best case" scenarios a bit. We have a wider playing field going into 2020 than we did going into 2016.

Next, the tipping point:

This basically shows the same pattern, so I won't spend time digging deeper there.

But there is one more place to spend a little time. Time to look at how each state moved from the 1996-2012 average to the 2000-2016 average.

Let's start this historical review with an animation of the national view flipping between how things looked four years ago, and how they look today:

As expected, since we are replacing Clinton's 1996 win with Trump's 2016 win, the overall map becomes redder and blue retreats.

Now let's look at individual states that moved to different Election Graphs categories ("Solid", "Strong", "Weak" for either party) in this change.

Of these category changes, only two favored the Democrats:

Every other category change was toward the Republicans:

All of the states moved of course. The ones highlighted here are just the ones that shifted between my categories. And of course, many of those were not decisive. But the shift toward the Republicans is clear.

The stage is set.

Next up, very soon, I'll start folding in the polls on specific candidate matchups that have already been released on the state level, and we'll see how things look once we are looking at real data on the 2020 race rather than just long term generic averages.

The race is on, and Election Graphs will be tracking it for you.

Here we go.

So what to do for 2020?

This is the sixth and LAST in a series of blog posts for folks who are into the geeky mathematical details of how Election Graphs state polling averages have compared to the actual election results from 2008, 2012, and 2016. If this isn't you, feel free to skip this series. Or feel free to skim forward and just look at the graphs if you don't want or need my explanations.

If you just want 2020 analysis, stay tuned, that will be coming soon.

You can find the earlier posts here:

The Electoral College trend chart

In the last few posts, I spent a lot of time looking at various ways of determining what counts as a "close state". This is because in the past Election Graphs has defined three classifications:

  • "Weak": Margin < 5% – States that really are too close to call. A significant polling error or rapid last minute movement before election day could flip the leader easily.
  • "Strong": 5% < Margin < 10% – States where one candidate has a substantial lead, but where a big event could still move the state to "Weak" and put it into play.
  • "Solid": Margin > 10% – States where one candidate's lead is substantial enough that nobody should take seriously the idea of the leader not actually winning.
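For clarity, those three classifications amount to nothing more than this (a minimal sketch; the function name is mine):

```python
# Classify a state by the absolute value of its polling-average margin,
# using the Weak (<5%), Strong (<10%), Solid (>=10%) boundaries above.
def classify(margin_pct):
    m = abs(margin_pct)  # sign only indicates which party leads
    if m < 5:
        return "Weak"
    if m < 10:
        return "Strong"
    return "Solid"
```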

The "main" chart on Election Graphs has been the Electoral College Trend Chart. The final version on Election Day 2016 looked like this:

The "band" representing the range of possibilities goes from all the Weak states being won by the Democrat, to all the Weak states being won by the Republican.

One of the reasons for all the analysis in this series is of course that this method yielded a "best case" for Trump of a 66 EV margin over Clinton. But the actual earned margin (not counting faithless electors) was 74 EV.

So the nagging question was whether these bounds were too narrow. Would some sort of more rigorous analysis (as opposed to just choosing a round number like 5%) lead to a really obvious "oh yeah, you should use 6.7% as your boundary instead of 5%" realization or something like that?

After digging in and looking at this, the answer seems to be no.

As I said in several venues in the week prior to the 2016 election, a Trump win, while not the expected or most likely result given the polling, should not have been surprising. It was a close race. Trump had a clear path to victory.

But the fact he won by 74 EV (77 after faithless electors) actually was OK to be surprised about.

Specifically, the fact that he won in Wisconsin, where the Election Graphs poll average had Clinton up by 7.06%, is an outlier based on all the poll average vs. actual result deltas from the last three cycles. It is the only state in 2016 where the result was actually surprising. Without Wisconsin, Trump would have won by 54 EV, which was within the "band".

Advantages of simplicity

So after all of that, and this will be very anti-climactic, I've decided to keep the 5% and 10% boundaries that I've used for 2008, 2012, and 2016.

Several of the ways of defining close states that I looked at in this series are actually quite tempting. I could just use the 1σ boundaries of one of the methods to replace my 5% boundary between "weak" and "strong" states, and the 2σ numbers to replace the 10% boundary between "strong" and "solid" states.

I could even use one of the asymmetrical methods that reflect that things may be different on the two sides.

But frankly, I keep coming back to the premise of Election Graphs being that something really simple can do just as well as fancy modeling.

From the 2016 post mortem here is a list of where a bunch of the election tracking sites ended up:


  • Clinton 323 Trump 215 (108 EV Clinton margin) – Daily Kos
  • Clinton 323 Trump 215 (108 EV Clinton margin) – Huffington Post
  • Clinton 323 Trump 215 (108 EV Clinton margin) – Roth
  • Clinton 323 Trump 215 (108 EV Clinton margin) – PollyVote
  • Clinton 322 Trump 216 (106 EV Clinton margin) – New York Times
  • Clinton 322 Trump 216 (106 EV Clinton margin) – Sabato
  • Clinton 307 Trump 231 (76 EV Clinton margin) – Princeton Election Consortium
  • Clinton 306 Trump 232 (74 EV Clinton margin) – Election Betting Odds
  • Clinton 302 Trump 235 (67 EV Clinton margin) – FiveThirtyEight
  • Clinton 276 Trump 262 (14 EV Clinton margin) – HorsesAss
  • Clinton 273 Trump 265 (8 EV Clinton margin) – Election Graphs
  • Clinton 272 Trump 266 (6 EV Clinton margin) – Real Clear Politics
  • Clinton 232 Trump 306 (74 EV Trump margin) – Actual result


The only site (that I am aware of) that came closer to the actual result than I did was RCP… who like me just used a simple average, not a fancy model.

This says something about sticking with something simple.

Or maybe I was just lucky.

To be fair, there was a lot of movement just in the last day of poll updates. Before that, I had a 108 EV margin for Clinton as my expected case, and would have been one of the worst sites instead of one of the best in terms of final predicted margin. Noticing that last-minute Trump surge in the final polls in some critical states was important, and the fact that Election Graphs uses a "last 5 polls" methodology meant our numbers picked up that change quickly.

But even aside from how close we got, a regular person who doesn't follow these things that closely could come to Election Graphs and just say "oh, close states are under 5%, they could go either way". More complex models have their places, but it hasn't been Election Graphs' niche. One of the main points of this site was always doing something relatively simple, and still getting decent results.

So. I'm sticking to 5% and 10%. Even though they are just nice round numbers, without a mathematical justification.

Because they are nice round numbers that are still reasonable for these purposes, and not too far out from numbers you COULD pick with some sort of mathematical hand waving if you wanted to.

So. Less than 5% for a weak state, less than 10% for a strong state, and over 10% is solid.

Just like before.

What about the tipping point?

OK, with everything I have said about nice round boundaries and keeping it simple, I think I will actually allow myself to tighten the limits of what I show as "close" on the tipping point chart. Maybe 5% is too close to call at the state level, but a 5% tipping point lead is more substantial.

Having said that, 2016 did see 6% swings in the tipping point within two-week periods. It can move quite a bit, quite quickly. So of course, just watch: 2020 will see someone with a 6% tipping point lead on election day proceed to lose the race. But for now, I feel OK tightening these bounds.

I'll be using the 2.36% and 3.45% levels described in the last post to really emphasize that if you are in that zone, you have a super close race. Regardless of what the electoral college center line is, or the "best case" scenarios for the two candidates, if we see a 1% tipping point margin again, it would be crazy not to emphasize that you are looking at a race that is too close to call.

[Note added 2019-03-01: Once I started actually building out the 2020 site, I tried changing the limits for the tipping point as described above, but with everything else left at 5% and 10%, it looked out of place, so I actually left them at 5% and 10% as well. So alas, all this analysis of other ways to define limits that were not nice round numbers ended up with me just using the nice round numbers from before.]

What about that Monte Carlo thing?

Well, once again ignoring everything I said above about simplicity, I've never quite liked the fact that the "band" is generated by swinging ALL the close states back and forth, which is actually not very likely. The fact that a bunch of states are close and could go either way, does not imply that it would be easily possible for them to ALL flip the same direction at the same time. (Although yes, if polling assumptions are all wrong the same way, all the polling may be off in the same direction.)

Election Graphs shows that whole range of possibility, with no way of showing some outcomes within the range are more likely than others, or that some outcomes outside the range actually are still possible, just less likely. It would be nice to add some nuance to that.

And I'll be honest, I've been slowly introducing more complexity over the last three cycles, and I kind of enjoy it. For instance, the logic for how to determine which polls to include in the "5 poll average" that I used in 2016 has a lot more going on than what I did in 2008 or 2012. And for that matter, in 2016 everything was generated automatically from the raw poll data, while in previous cycles I did everything by hand. Progress!

So… while I am going to keep the main display using the 5% and 10% boundaries, I am actually kind of excited to now have a structured way to also do a Monte Carlo style model…

I would use the data from the Polling Error vs Final Margin post to do some simulations and show win odds and electoral college probability distributions as they change over time as well as the current numbers. I have a vision in my head for how I would want it all to look.
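The simplest version of that kind of simulation looks something like the sketch below. This is only a rough illustration of the idea, not a design I've committed to: the normal error model, the 3-point error width, and the assumption that state errors are independent are all placeholders (the independence assumption in particular is exactly the "all the polling may be off in the same direction" caveat above, which a real model would handle with a correlated error term):

```python
# Rough Monte Carlo sketch: perturb each state's polling average by a
# simulated polling error, tally electoral votes, and repeat to estimate
# win odds. Error model and parameters are illustrative only.
import random

def simulate(states, error_sd=3.0, trials=10_000, seed=0):
    """states: list of (dem_poll_margin_pct, electoral_votes).
    Returns the fraction of trials in which the Democrat wins outright."""
    rng = random.Random(seed)
    total = sum(ev for _, ev in states)
    dem_wins = 0
    for _ in range(trials):
        dem_ev = sum(ev for margin, ev in states
                     if margin + rng.gauss(0, error_sd) > 0)
        if dem_ev * 2 > total:  # strict majority of electoral votes
            dem_wins += 1
    return dem_wins / trials
```

Instead of a single band, a model like this would yield a full probability distribution over electoral college outcomes, which is the nuance missing from the swing-everything approach.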

But that would be an alternative view, not the main one… if I actually have time.

The plan

I had originally intended to have the 2020 site up by the day after the 2018 midterms. Then I'd hoped to be done by the end of November. Then December. Then January. But life and other priorities kept getting in the way.

I'd also intended to launch with a variety of changes and refinements over the 2016 site, including perhaps changing the 5% and 10% bounds, but also other things. Some changes to how some of the charts look. Additional changes to how the average itself was calculated. A completely different alternative view to switch to if a third party was actually strong enough to win electoral votes. Or the Monte Carlo view. Or making the site mobile friendly. Or a bunch of other things.

But frankly, I've just run out of time. I now know of seven state level general election matchup polls for 2020 that are already out, and there are probably more I have missed. And the pace is increasing rapidly now that candidates are announcing. So there are already results I could be showing.

(Yes, I am quite aware that general election match up polls this far out are not predictive of the actual election at all, but they still tell you something about where things are NOW.)

So at this point my priority is to just get the site up and running as fast as possible, which means making all the logic and visuals an exact clone of 2016, just with 2020 data. At least to start with.

After that, I'll start layering in changes or additions if and when I have time to do so. I still hope to be able to do a variety of things, but that depends on many factors, so I'm not making any promises at this point. I'll do what I can.

So that's the plan.

Conclusion

I have been dragging my feet working off and on (mostly off) on collecting the data, making the graphs, and writing my little commentary on this series of posts for literally more than six months. Maybe more than nine months. I forget exactly when I started.

If there are any of you who have actually read all of this to the end, thank you. I don't expect there are many of you, if any. That's just the way it goes.

But I felt like I needed to get all this done and out before starting to set up the 2020 site. I wanted to see what the results of looking at this old data would show, and I wanted to share it. Maybe I didn't really need to and it was just an excuse to procrastinate on doing the actual site.

But I have no more excuses left. Time to start getting the 2020 site ready to go… I'll hopefully have the basics up very soon.

Stay tuned!

You can find all the posts in this series here:

Criticism and Tipping Points

This is the fifth in a series of blog posts for folks who are into the geeky mathematical details of how Election Graphs state polling averages have compared to the actual election results from 2008, 2012, and 2016. If this isn't you, feel free to skip this series. Or feel free to skim forward and just look at the graphs if you don't want or need my explanations.

If you just want 2020 analysis, stay tuned, that will be coming soon.

You can find the earlier posts here:

Criticism

So, after the Predicting 2016 by Cheating post went up, Patrick Ruffini decided to quote tweet it, after which Nate Silver replied saying "whoever did that is incompetent".

That was exciting.

In any case, despite being incompetent, I will soldier on.

A reminder here though that I am indeed an amateur doing this sort of thing for fun in my spare time. I am not a professional statistician, data scientist, or even pundit. (Although, like everybody else on the planet, I do have a podcast.)

This is not my day job. I make no money off this. I never expect to make any money off this. I just enjoy doing it. I am always happy to take constructive criticism. I've changed things on the site based on reader feedback before, and undoubtedly will again.

Also though, in this series of blog posts specifically, I have been exploring different ideas and ways of looking at the 2008-2016 data. The Monte Carlo simulation in the last post was NEVER a valid prediction for 2016, because it used the actual results of 2016 in the model. Which I said repeatedly in that post. It was just a proof of concept that using that data in that way would provide something reasonable looking.

I'm not sure if Nate actually read the posts describing how I was modeling things and all the caveats about how running that simulation was cheating since I was using 2016 data to predict 2016. Maybe he did. Maybe he didn't.

He is right of course that the Monte Carlo graph he was reacting to does give a much narrower distribution than his model did. The Polling Error vs Final Margin post shows how I got the probabilities that led it to be that narrow. The distribution is actually narrower than I expected coming in. But that particular way of looking at the data leads there. It may or may not be a good way of looking at things. I am experimenting.

Having said that, the results gave Trump win odds near what FiveThirtyEight had, but with the median being further toward Trump than their model, and with a narrower distribution. Looking at some other folks who showed distributions for 2016 on their sites (and still have them easily findable today in 2019), it looks like this distribution would not have been out of place. It didn't match any of them of course, since the methodology is different from all of them. But it isn't wildly out of line.

Running this on 2016 data is bogus of course, as I explained in the last post, and again a few paragraphs ago. But the results are interesting enough that using the data from the analysis in the Polling Error vs Final Margin post to do some Monte Carlo simulations for 2020 would at least be fun to look at.

OK, enough of that unintended detour. Now back to the originally intended topic for this post…

Tipping Points

All of the previous posts have been looking exclusively at the state poll averages as they compared to the actual election results in 2008 through 2016. But for the last couple of cycles, Election Graphs has also looked at the "tipping point". I borrowed the idea from the "meta-margin" Sam Wang at Princeton Election Consortium uses. Basically, if you sort the states by margin, the tipping point is the margin in the state that puts the winning candidate over the electoral votes needed to win.

The tipping point essentially gives a measure of the overall margin in the national race, similar to a popular vote margin, but modified to account for the structure of the electoral college. It is a nice way of looking at who is ahead and who is behind in a way that isn't (quite) as volatile as looking directly at the center line of the electoral college estimates.
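As a concrete sketch, the tipping point calculation looks something like this. The states, electoral vote counts, and margins below are entirely made up for illustration:

```python
# Hypothetical states: (name, electoral votes, margin).
# Positive margin = Democratic lead. EVs sum to 538.
states = [("A", 60, 8.0), ("B", 55, 4.0), ("C", 60, 2.0),
          ("D", 55, -1.0), ("E", 60, -3.0), ("F", 248, -9.0)]

def tipping_point(states, total_ev=538):
    """Sort states from the leader's strongest to weakest, accumulate
    electoral votes, and report the state (and its margin) that pushes
    the leader past the winning threshold."""
    dem_ev = sum(ev for _, ev, margin in states if margin > 0)
    leader_is_dem = dem_ev > total_ev / 2
    ordered = sorted(states, key=lambda s: s[2], reverse=leader_is_dem)
    running = 0
    for name, ev, margin in ordered:
        running += ev
        if running > total_ev / 2:  # past 269 of 538
            return name, margin

print(tipping_point(states))
```

With these made-up numbers the Republican leads overall, and the third-strongest Republican state provides the deciding electoral vote.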

So how did the final Election Graphs tipping point numbers based on our state poll averages do compared to the actual tipping point as measured by the final vote?

For this, since there is only one tipping point per election, we unfortunately only have three data points:

In 2016, I used the same 5% boundary to determine what was "close" for the tipping point as I did for state poll averages. Once again just a round number, with nothing specific behind it other than a gut feel that less than 5% seemed close.

We only have three data points, but even with just that, we can produce a very VERY rough estimate of the 1σ and 2σ levels. Basically, for 1σ, you use the 2 closest of the 3 data points, and for 2σ you use all 3. This is ballpark only (at best) due to the low number of data points, but it gives an idea.
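In code, that rough estimate is almost nothing at all. The error values here are placeholders (I'm not listing the actual per-cycle tipping point errors), with the two largest chosen to match the bounds quoted below:

```python
# Placeholder absolute tipping point errors for the three cycles; the
# first value is invented, the other two chosen to match the post.
errors = [1.10, 2.36, 3.45]

ranked = sorted(abs(e) for e in errors)
sigma1 = ranked[1]  # bound covering the 2 closest of the 3 points (~68%)
sigma2 = ranked[2]  # bound covering all 3 points (~95%)
```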

So to be 68.27% sure the current leader will actually win, you want a tipping point margin greater than 2.36%.

For 95.45% confidence, you want a tipping point margin lead of more than 3.45%.

OK, OK, that is kind of pathetic. I know. But there is only so much you can do with only three data points.

Anyway…

Clinton's final tipping point margin in 2016 was only 1.59% in Pennsylvania. Even assuming you only knew the 2008 and 2012 results, it should have been clear that a 1.59% tipping point represented an incredibly close race, far closer than either 2008 or 2012, and well within the realm where it could have gone either way.

The 5% boundary Election Graphs used in 2016 also indicated a close race of course, but narrowing that boundary based on the results of the last three elections seems like it would give a better impression of how close things need to be before we should consider that things really do look like a toss-up where anything could reasonably happen.

So, what, if anything, will Election Graphs actually do differently for the 2020 cycle compared to 2016?

I'll talk about that in the next post…

You can find all the posts in this series here:

Predicting 2016 by Cheating


This is the fourth in a series of blog posts for folks who are into the geeky mathematical details of how Election Graphs state polling averages have compared to the actual election results from 2008, 2012, and 2016. If this isn't you, feel free to skip this series. Or feel free to skim forward and just look at the graphs if you don't want or need my explanations.

You can find the earlier posts here:

The 2016 states we got wrong

In the last post I used the historical deltas between the final Election Graphs polling averages in 2008-2016 to construct a model that given a value for a poll average, would produce an average and standard deviation for what we could expect the actual election results to be. So what can we do with that?

I don't have another election year with data handy to test this model on. No 2020, no 2004, no 2000, no earlier cycles either. So I'm going to look at 2016, even though I shouldn't.

Just as examples, let's look at the odds this model would have given to the states Election Graphs got wrong in 2016… This technically isn't something you should do, since we are using a model on data that was used to construct the model, which isn't cool, but this is just to get a rough idea, so…

State    Final Avg   Dem Win%   Rep Win%   Actual
WI       D+7.06%     98.76%     1.24%      R+0.77%
MI       D+2.64%     70.59%     29.41%     R+0.22%
ME-CD2   D+2.04%     67.92%     32.08%     R+10.54%
PA       D+1.59%     66.27%     33.73%     R+0.71%
NV       R+0.02%     45.85%     54.15%     D+2.42%

The only one that is really surprising is Wisconsin, just as it was on Election night in 2016. Every other state was clearly a close race, where nobody should have been shocked about it going either way.

Wisconsin though? It was OK to be surprised on that one.

OK, and maybe the margin in ME-CD2, but not that Trump won it.

Doing some Monte Carlo

Let's go a bit farther than this though. One thing Election Graphs has never done is calculate odds. The site has provided a range of likely electoral college results, but never a "Candidate has X% chance of winning". But with the model we developed in the last post, we now have a way to generate the chance each candidate has of winning a state based on the margin in the poll average, and with that, you can run a Monte Carlo simulation on the 50 states, DC, and five congressional districts.

Now, once again, it is kind of bogus to do this for 2016 since 2016 data was used to construct the model, but we're just trying to get an idea here, and we'll just recognize this isn't quite a legitimate analysis.

So, here is a one-off run of the simulation, 10,000 times, to generate some odds. I'd probably want a larger number of trials if I was doing this "for real". I might also smooth the win chances curve in the last post to get rid of some of the jaggy bits before using it as the source of probabilities for the simulation. And obviously if you ran this again, you'd get slightly different results. But here is the result of that one run with 10,000 trials…

Well, that is a fun graph. It puts the win odds for Trump at 25.38%.
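For anyone curious, the simulation itself can be sketched in a few lines. The close-state win probabilities below roughly echo the table earlier in this post, but the electoral vote counts, the "FL" entry, and the two safe-state buckets are invented simplifications so the totals add to 538, not the real model:

```python
import random

# Invented electoral map: per-state EV counts and Democratic win odds.
state_evs   = {"WI": 10, "MI": 16, "PA": 20, "NV": 6, "FL": 18,
               "safe_D": 215, "safe_R": 253}
dem_win_pct = {"WI": 0.99, "MI": 0.71, "PA": 0.66, "NV": 0.46, "FL": 0.48,
               "safe_D": 1.0, "safe_R": 0.0}

def simulate(trials=10_000, seed=2016):
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        # One trial: independently "flip" each state with its win odds.
        dem_ev = sum(ev for st, ev in state_evs.items()
                     if rng.random() < dem_win_pct[st])
        results.append(dem_ev)
    dem_odds = sum(ev >= 270 for ev in results) / trials
    return results, dem_odds

results, dem_odds = simulate()
```

The list of per-trial electoral vote totals is what gets binned into the probability distribution chart.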

Now, I emphasize again that this is cheating. Because the facts of Trump's win are baked into the model. We're testing on our training data. That's not really OK. Having said that though…

How does this compare to where other folks were at the end of 2016? I looked at this in my last regular update prior to the results coming in on election night, so here is my summary from then:

So this Monte Carlo simulation using the numbers calculated as I have described would have given Trump better odds than anybody other than FiveThirtyEight. Again though, I am cheating here. A lot.

But here is the thing. Even though I would be giving Trump pretty good odds with this model, the chance of him actually winning by as much as he did (or more) is actually still tiny at 0.29%. With these odds a Trump win should not have been a surprise, but a Trump win by as much as he actually won by… that still should have been very surprising.

Comparisons

In this series of posts, we've been looking at a whole bunch of different ways of answering the basic question "what is a close state?". One reason this matters is that the way Election Graphs has constructed its "range of possibilities" in the past is simply to define what a close state is, then let all of the close states swing to one candidate or the other and see what range of electoral college results that produces.
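As a sketch of that swing method, with made-up states and margins (positive meaning a Democratic lead):

```python
# Hypothetical states: (name, electoral votes, margin).
states = [("A", 60, 12.0), ("B", 55, 4.0), ("C", 60, -3.0), ("D", 363, -9.0)]

def ev_range(states, close=5.0):
    """Best and worst case electoral vote totals for the Democrat when
    every state inside the 'close' boundary can flip either way."""
    locked = sum(ev for _, ev, m in states if m >= close)
    swing = sum(ev for _, ev, m in states if abs(m) < close)
    return locked, locked + swing

print(ev_range(states))  # (worst case, best case) for the Democrat
```

Changing the `close` boundary is all it takes to produce the different ranges compared below.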

So let's see what electoral college ranges we would have gotten in 2016 with each of the methods I've gone over in the last few blog posts:

The two showing the ranges from the Monte Carlo simulation are dimmed out because they are determined by a completely different method, not swinging all close states back and forth.

It is interesting that both the 1-sided and 2-sided histogram 1σ boundaries would end up with the exact same boundaries as my current 5% bounds. But as you can see, there are a ton of different ways to define "too close to call," resulting in huge variation in how the range of possibilities gets described.

So what to do for 2020? How will I define close states?

You'll have to wait a little longer for that.

Before I get to that, it is also worth looking at the national race as opposed to just states. On Election Graphs I have used the "tipping point" to measure that. What tipping point values should be considered "too close to call"?

I'll look at that in the next post….

You can find all the posts in this series here:

Polling Error vs Final Margin

This is the third in a series of blog posts for folks who are into the geeky mathematical details of how Election Graphs state polling averages have compared to the actual election results from 2008, 2012, and 2016. If this isn't you, feel free to skip this series. Or feel free to skim forward and just look at the graphs if you don't want or need my explanations.

You can find the earlier posts here:

Error vs Margin scatterplot

In the last post I ended by mentioning that assuming the error on poll averages was independent of the value of the poll average might not be valid. There are at least some reasonable stories you could tell that would imply a relationship. So we should check.

I've actually looked at this before for 2012. That analysis showed the error on the polls DID vary based on the margin of the poll average. But it wasn't "close states are more accurate". But maybe that pattern was unique to that year.

So I looked at this relationship again now with all the data I have for 2008, 2012, and 2016:

That is just a blob, right? Not a scatterplot we can actually see much in? Wrong. There is a bottom-left to upper-right trend hiding in there.

Interpreting the shape of the blob

Before going further, let's talk a bit about what this chart shows, and how to interpret it. Here are some shapes this distribution could have taken:

Pattern A would indicate the errors did not favor either Republicans or Democrats, and the amount of error we should expect did not change depending on who was leading in the poll average or how much.

Pattern B would show that Republicans consistently beat the poll averages… so the poll averages showed Democrats doing better than they really were, and the error didn't change substantially based on who was ahead or by how much.

Pattern C would show the opposite, that Democrats consistently beat the poll averages, or the poll averages were biased toward the Republicans. The error once again didn't depend on who was ahead or by how much.

Pattern D shows no systematic bias in the poll averages toward either Republicans or Democrats, but the polls were better (more likely to be close to the actual result) in the close races, and more likely to be wildly off the mark in races that weren't close anyway.

Pattern E would show that when Democrats were leading in the polls, Republicans did better than expected, and when Republicans were leading in the polls, Democrats did better than expected. In other words, whoever was leading, the race was CLOSER than the polls would have you believe.

Finally, Pattern F would show that when the polls show the Democrats ahead, they are actually even further ahead than the polls indicate, and when the Republicans are ahead, they are also further ahead than the polls indicate. In other words, whoever is leading, the race is NOT AS CLOSE as the polls would indicate.

In all of these cases the WIDTH of the band the points fall in also matters. If you have a really wide band, the impact of the shape may be less, because the variance overwhelms it. But as long as the band isn't TOO wide the shape matters.

Also, like everything in this analysis, remember this is about the shape of errors on the individual states, NOT on the national picture.

Linear regressions

Glancing at the chart above, you can determine which of these is at play. But let's be systematic and drop some linear regressions on there…

2008 and 2012 were similar.

2016 had a steeper slope and is shifted to the left (indicating that Republicans started outperforming their polls not just near 0%, but anywhere the Democrats led by less than about 11%). But even 2016 has the same bottom left to top right shape.

I haven't put a line on there for a combination of the three election cycles, but it would be in between the 2008/2012 lines and the 2016 line.

Of the general classes of shapes I laid out above, Pattern F is closest.

Capturing the shape of the blob

But drawing a line through these points doesn't capture the shape here. We can do better. There are a number of techniques that could be used here to get insight into the shape of this distribution.

The one I chose is as follows:

  1. At each value for the polling average (at 0.1% intervals), collect all of the 163 data points that are within 5% of the value under consideration. For instance, if I am looking at a 3% Democratic lead, I look at all data points that were between an 8% Democratic lead and a 2% Republican lead (inclusive).
  2. If there are fewer than 5 data points, don't calculate anything. The data is too sparse to reach any useful conclusions.
  3. If there are 5 or more points, calculate the average and standard deviation, and use those to define boundaries for the shape.
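In code, that sliding-window step looks something like this. The data points here are invented; the real version runs over the 163 (poll average, actual result) pairs:

```python
import statistics

# Invented (final poll margin, actual result margin) pairs standing in
# for the real data.
data = [(-3.2, -4.1), (-1.0, 0.5), (0.4, -1.2), (1.6, 2.9),
        (2.1, 4.0), (3.3, 2.2), (4.8, 6.1), (6.0, 8.3)]

def band_at(margin, data, window=5.0, min_points=5):
    """Mean +/- one standard deviation of actual results for all data
    points whose poll average is within `window` of `margin`."""
    nearby = [actual for avg, actual in data if abs(avg - margin) <= window]
    if len(nearby) < min_points:
        return None  # too sparse to conclude anything useful
    mean = statistics.mean(nearby)
    sd = statistics.stdev(nearby)
    return mean - sd, mean + sd
```

Sweeping `margin` across the axis at 0.1% steps traces out the band boundaries shown below.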

Here is what you get:

This is a more complex shape than any of the examples I described. Because it is real life messy data. But it looks more like Pattern F than anything else.

It does flatten out a bit as you get to large polling leads, even reversing a bit, with the width increasing like Pattern D, and there are some flatter parts too. But roughly, it is Pattern F with a pretty wide band.

Fundamentally, it looks like there IS a tendency within the state level polling averages for states to look closer than they really are.

Is this just 3P and undecided voters?

All of my margins are just "Republican minus Democrat," computed out of everybody polled, including people who say they are undecided or support 3P candidates. But those undecideds eventually pick someone. And many people who support 3rd parties in polls end up voting for the major parties in the end. Could this explain the pattern?

As an example assume the poll average had D's at 40%, R's at 50%, and 10% undecided, that's a 10% R margin… then split the undecideds at the same ratio as the R/D results to simulate a final result where you can't vote "undecided", and you would end up with D's at 44.4% and R's at 55.6% which is an 11.1% margin… making the actual margin larger than the margin in the poll average, just as happens in Pattern F. 
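That arithmetic, as a tiny function: splitting the undecideds proportionally is the same as taking the margin as a share of the two-party vote.

```python
def two_party_margin(dem_pct, rep_pct):
    # Margin expressed as a percentage of the two-party vote only.
    return 100.0 * (rep_pct - dem_pct) / (rep_pct + dem_pct)

raw_margin = 50 - 40                 # R+10.0 with 10% undecided
adjusted = two_party_margin(40, 50)  # R+11.1 once undecideds are split
print(round(adjusted, 1))  # 11.1
```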

Would representing all of this based on share of the 2-party results make this pattern go away?

To check this, I repeated the entire analysis using 2-party margins.

Here, animated for comparison, is the same chart using straight margins and two party margins.

While the pattern is dampened, it does not go away.

It may still be the case that if we were looking at more than 3 election cycles, this would disappear. I guess we'll find out once 2020 is over. But it doesn't seem to be an illusion caused simply by the existence of undecided and 3P voters.

Does this mean anything?

Now why might there be a tendency that persists in three different election cycles for polls to show results closer than they really are? Maybe close races are more interesting than blowouts so pollsters subconsciously nudge things in that direction? Maybe people indicate a preference for the underdog in polls, but then vote for the person they think is winning in the end? I don't know. I don't have anything other than pure speculation at the moment. I'd love to hear some insights on this front from others.

Of course, this is all based on only 3 elections and 163 data points. It would be nice to have more data and more cycles to determine how persistent this pattern is, versus how much is just seeing patterns in noise and/or something specific to these three election cycles. After all, 2016 DID look noticeably different than 2008 and 2012, but I'm just smushing it all together.

It is quite possible that the patterns from previous cycles are not good indicators of how things will go in future cycles. After all, won't pollsters try to learn from their errors and compensate? And in the process introduce different errors? Quite possibly.

But for now, I'm willing to run with this as an interesting pattern that is worth paying some attention to.

Election Result vs Final Margin

Before determining what to do with this information, let's look at this another way. After all, while the amount and direction of the error is interesting, in terms of projecting election results, we only really care if the error gives us a good chance of getting the wrong answer.

Above are the actual vote margins vs the final Election Graphs margins, with the means and standard deviations for the deltas calculated earlier plotted as well. Essentially, the first graph is this new second graph with the y=x line (which I have added in light green) subtracted out.

The first view makes the deviation from "fair" more obvious by making an unbiased result horizontal instead of diagonal, but this view makes it easier to see when this bias may actually make a difference.

Let's zoom in on the center area, since that is the zone of interest.

Accuracy rate

Out of the 163 poll averages, there were only actually EIGHT that got the wrong result. Those are the data points in the upper left and lower right quadrants on the chart above. That's an accuracy rate of 155/163 ≈ 95.09%. Not bad for my little poll averages overall.

The polls that got the final result wrong range from a 7.06% Democratic lead in the polls (Wisconsin in 2016) to a Republican lead of 3.40% (Indiana in 2008).

For curiosity's sake, here is how those errors were distributed:

        D's lead poll avg,       R's lead poll avg,   Total
        but R's win              but D's win          wrong
2008    1 (MO)                   2 (NC, IN)           3
2012    0                        0                    0
2016    4 (PA, MI, WI, ME-CD2)   1 (NV)               5
Total   5                        3                    8

So, less than 5% wrong out of all the poll averages in three cycles, but at least in 2016, some of the states that were wrong were critical. Oops.

Win chances

Anyway, once we have averages and standard deviations for election results vs poll averages, we can assume a normal distribution with those parameters at each 0.1% increment of the poll average and produce a chart of each party's chances of winning given the poll average.

Here is what you get:

Alternately, we could recolor the graph and express this in terms of the odds the polls have picked the right winner:

You can see that the odds of "getting it wrong" get non-trivially over 50% for small Democratic leads. The crossover point is a 0.36% Democratic lead. With a Democratic lead less than that, it is more likely that the Republican will win. (If, of course, this analysis is actually predictive.)
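The curve behind these numbers is just a normal-distribution tail probability at each margin. A sketch, with an invented mean and standard deviation (using the Republican-minus-Democrat margin convention, so the Republican wins when the actual margin is positive):

```python
from math import erf, sqrt

def rep_win_chance(mean, std):
    """P(actual margin > 0) for a normal(mean, std) distribution,
    i.e. the chance the Republican wins."""
    return 1.0 - 0.5 * (1.0 + erf(-mean / (std * sqrt(2.0))))

# e.g. a poll average whose model says "expect R+0.5 on average, sigma 5"
# gives the Republican a bit better than a coin flip.
print(round(rep_win_chance(0.5, 5.0), 2))  # 0.54
```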

You can also work out how big a lead each party would need to have to be 1σ or 2σ sure they were actually ahead:

              68.27% (1σ) win chance   95.45% (2σ) win chance
Republicans   Margin > 1.11%           Margin > 4.87%
Democrats     Margin > 2.32%           Margin > 6.42%
Average       Margin > 1.72%           Margin > 5.64%

Democrats again need a larger lead than Republicans to be sure they are winning.

These bounds are the narrowest from the various methods we have looked at though.

Can we do anything to try to understand what this would mean for analyzing a new race? We obviously don't have 2020 data yet, and I don't have 2004 or earlier data lying around to look at either. So what is left?

Using the results of an analysis like this to look at a year that provided data for that analysis is not actually legitimate. You are testing on your training data. It is self-referential in a way that isn't really OK. You'll match the results better than you would looking at a new data set. I know this.

But it may still give an idea of what kind of conclusions you might be able to draw from this sort of data.

So in the next post we'll take the win odds calculated above and apply them to the 2016 race, and see what looks interesting…

You can find all the posts in this series here: