The 2008 U.S. Presidential Election
Background
The 2008 US election season is upon us, and it will result in the
selection of a new President. The President of the US is elected in a
somewhat complicated fashion: not directly by the
people, but by an Electoral College composed of
electors from each of the 50 states and the District of Columbia.
There are a number of websites that collect and track polling data,
and make projections of the expected Electoral Vote outcome on
election day: November 4, 2008. These websites include:
Pollster.com,
electoral-vote.com,
fivethirtyeight.com, and
The Princeton Election Consortium,
and many, many others.
This exercise uses polling data to support the analysis
and exploration of the electoral college map. At the moment, this
exercise is targeted toward data synthesized by The Princeton Election
Consortium, but similar sorts of analyses could be done with other
data streams. (The Princeton data are usefully organized, the
underlying code is available for those who are interested in the details,
and the Princeton site offers a different computational approach
than some of the other sites.)
This exercise is not intended to be an
introduction to polling methodologies, which is the focus of much
serious work in the field (appropriately so). Nor is it intended to
be partisan. Rather, it aims to introduce a few of the issues that
arise in the analysis of US presidential elections, and perhaps feed
interest in more rigorous and systematic endeavors.
Learning Goals
Science: You will learn some basics of the US Electoral College
system (including what happens in the event of a 269-269 tie),
perhaps a little bit about polling methodologies, and connections
to some of the other course modules (e.g.,
NumberPartitioning and
Random Text generation).
Computation:
You will learn how to download data from the web and parse it for your
purposes, how to sample from probability distributions to generate
synthetic election data, and how to use convolutions to compute exact
combinatorial distributions.
Procedure
If, at any point, you're interested in getting more detail on the
polling data, the underlying methodology, or the more detailed
approach taken by The Princeton Election Consortium, see the
FAQ or the
information
For Fellow Geeks. (Yes, we're geeks, and we vote.)
- Download the file
Vote2008Hints.py from the course website, and rename it to
Vote2008.py.
- In Vote2008.py,
first notice that information regarding states and their respective
number of electoral votes is provided. Examine the use of the
dict() and zip() functions to produce a dictionary
mapping state names (abbreviations) to numbers of electoral votes.
- NOTE: Nebraska and Maine do not award their electoral votes
on a winner-take-all basis as every other state does. For this exercise,
we will not worry about that subtlety, and instead treat the votes in
each state as a bloc. (The candidates, on the other hand, are indeed
worrying about that subtlety, as Sarah Palin's recent visit to Omaha
would seem to suggest.)
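The dict() and zip() idiom mentioned above can be sketched as follows. This is only an illustration on a three-state subset; the variable names are assumptions and need not match those in Vote2008.py, which covers all 50 states plus DC.

```python
# Parallel lists: state abbreviations and their electoral votes.
# Only three entries are shown; the values are the 2008 allocations.
states = ["CA", "NY", "TX"]
votes = [55, 31, 34]

# zip() pairs the two lists element by element;
# dict() turns those (state, votes) pairs into a mapping.
evotes = dict(zip(states, votes))

print(evotes["CA"])
```

Because zip() pairs entries by position, the two lists must be kept in the same order.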
- Write a function GetCurrentDate() that returns today's
date as an integer indicating the day's position in the year.
The polling data we will download is indexed using those integers.
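One possible sketch of the day-of-year calculation, using the standard datetime module. The helper DayOfYear is hypothetical (it is not part of the exercise), and the assumption that January 1 counts as day 1 should be checked against the dates in the downloaded file.

```python
import datetime

def DayOfYear(year, month, day):
    """Return the position of a calendar date within its year (Jan 1 is day 1)."""
    return datetime.date(year, month, day).timetuple().tm_yday

def GetCurrentDate():
    """Return today's position in the year as an integer."""
    return datetime.date.today().timetuple().tm_yday

# Election day, November 4, 2008 (2008 is a leap year).
print(DayOfYear(2008, 11, 4))
```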
- Write a function GetPrincetonPollingData()
to download and process the current polling data from
http://election.princeton.edu/code/matlab/polls.median.txt
- Use the urlretrieve function in the Python urllib
module (urllib.request.urlretrieve in Python 3) to download the file.
- Parse the file to build up a dictionary named polls.
At the highest level, polls is keyed by a date (an integer, e.g.,
that returned by GetCurrentDate()) that holds another dictionary;
polls[date] is keyed by state names (abbreviations in the list of states)
and holds a tuple for each state; the tuple contains the polling
margin (Democratic minus Republican) and the SEM (standard error of the mean)
of those polling data. NOTE: some of the polling data in polls.median.txt
quote a standard error of 0. This causes problems when one tries to compute
a probability of victory for each candidate (division by zero error).
Digging into the
relevant part of the Princeton code, we find the following MATLAB tidbit:
polls.SEM=max(polls.SEM,zeros(1,51)+2);, i.e., any reported
SEM less than 2 is bumped up to a floor of 2. I suggest we do the same
here.
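The nested dictionary and the SEM floor described above might be assembled along these lines. This sketch deliberately starts from already-parsed records: the (date, state, margin, sem) tuple layout and the helper name BuildPolls are assumptions made for illustration, and they say nothing about the actual column layout of polls.median.txt, which you will need to inspect yourself.

```python
SEM_FLOOR = 2.0  # same floor as in the MATLAB line quoted above

def BuildPolls(records):
    """Build polls[date][state] = (margin, sem) from parsed records.

    records is an iterable of (date, state, margin, sem) tuples.
    This record layout is hypothetical, not the format of the real file.
    """
    polls = {}
    for date, state, margin, sem in records:
        # Bump any reported SEM below the floor up to the floor,
        # which also avoids division by zero later on.
        sem = max(float(sem), SEM_FLOOR)
        polls.setdefault(int(date), {})[state] = (float(margin), sem)
    return polls

# Made-up numbers, for illustration only.
example_records = [(280, "OH", 3.0, 0.0), (280, "TX", -10.0, 2.5)]
polls = BuildPolls(example_records)
print(polls[280]["OH"])
```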
- Write a function GetDemWinProbabilitiesFromPolls(polls, date)
that will return from the polling dictionary, for a specified date,
a dictionary that maps state names (abbreviations) to
the probability of a Democratic win in that state, assuming
a normal probability distribution. Hint: the erf function is
the integral of a gaussian, and the scipy.special module is a
useful place to look for special functions. The appropriate scaling
of the erf function can be uncovered
here. For those of you who might
be worried that this exercise has a Democratic skew, feel free to
write instead a function called GetRepWinProbabilitiesFromPolls
that computes the probability of a Republican win in each state.
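If the true margin in a state is taken to be normally distributed with mean m (the polled margin) and standard deviation s (the SEM), the probability that the margin is positive is 0.5 * (1 + erf(m / (s * sqrt(2)))). A minimal sketch under that assumption follows. It uses math.erf from the standard library, which should agree with the erf in scipy.special; the helper name DemWinProbability is an assumption.

```python
import math

def DemWinProbability(margin, sem):
    """P(true margin > 0) for a normal distribution with the given mean and s.d."""
    return 0.5 * (1.0 + math.erf(margin / (sem * math.sqrt(2.0))))

def GetDemWinProbabilitiesFromPolls(polls, date):
    """Map each state to its Democratic win probability on the given date."""
    return {state: DemWinProbability(margin, sem)
            for state, (margin, sem) in polls[date].items()}

# Made-up polling numbers, for illustration only.
polls = {280: {"OH": (2.0, 2.0), "TX": (-10.0, 2.5)}}
probs = GetDemWinProbabilitiesFromPolls(polls, 280)
print(probs)
```

The Republican win probability in each state is simply one minus the Democratic value.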
- With a probability of a Democratic or Republican victory in each
state, one can sample those distributions to
simulate a synthetic election.
This is the approach, for example, taken by
fivethirtyeight.com.
Write a function SimulateElectionFromPolls(evotes, polls, date)
that uses the polling data from a specified date, and the dictionary
of electoral vote counts (evotes), to return a tuple of
(Democratic_wins, Republican_wins, totals) where
wins are lists of states won by each candidate, and
totals is a tuple of the total number of (Democratic, Republican) votes.
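A single synthetic election can be sketched by drawing one uniform random number per state and comparing it with the Democratic win probability. Note that, to keep the example short, this version takes a dictionary of win probabilities directly rather than (polls, date) as in the signature above; the name SimulateElection and the rng parameter are assumptions.

```python
import random

def SimulateElection(evotes, dem_probs, rng=random):
    """Sample one election; return (Democratic_wins, Republican_wins, totals)."""
    dem_wins, rep_wins = [], []
    for state, p in dem_probs.items():
        # rng.random() is uniform on [0, 1), so this test succeeds with probability p.
        if rng.random() < p:
            dem_wins.append(state)
        else:
            rep_wins.append(state)
    dem_total = sum(evotes[s] for s in dem_wins)
    rep_total = sum(evotes[s] for s in rep_wins)
    return dem_wins, rep_wins, (dem_total, rep_total)

# Made-up two-state example with certain outcomes, so the result is deterministic.
result = SimulateElection({"A": 3, "B": 4}, {"A": 1.0, "B": 0.0})
print(result)
```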
- Create an ensemble of randomly sampled elections (say, 10000),
and plot a histogram of all the Democratic EV totals, or, if you
prefer, all the Republican EV totals.
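An ensemble of sampled elections might be collected as below. This is a sketch under the assumption that only the Democratic EV total of each election is needed; the function name SampleDemTotals and the seed parameter are hypothetical.

```python
import random
from collections import Counter

def SampleDemTotals(evotes, dem_probs, n_elections=10000, seed=None):
    """Return a list of Democratic EV totals, one per simulated election."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_elections):
        # A state's votes are added with probability dem_probs[state].
        total = sum(ev for state, ev in evotes.items()
                    if rng.random() < dem_probs[state])
        totals.append(total)
    return totals

# Made-up two-state example: A is a toss-up, B is safely Democratic.
totals = SampleDemTotals({"A": 3, "B": 4}, {"A": 0.5, "B": 1.0},
                         n_elections=1000, seed=0)
counts = Counter(totals)  # maps each EV total to the number of times it occurred
print(sorted(counts.items()))
```

If matplotlib is available, passing the list of totals to pyplot.hist with one bin per electoral vote count is one way to draw the histogram.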
- The Princeton site argues that one does not actually need
to simulate many random elections.
Instead, one can calculate the
exact probability distribution of electoral college outcomes directly
from the individual state win probabilities and the number of electoral
votes in each state.
Write a function ComputeExactEVDistribution(evotes, polls, date)
to compute the exact electoral vote probability distribution from
polling data on a specified date, using the meta-analysis
convolution method described
in the FAQ
and in
the geeky MATLAB code. Computationally, this relies on the
fact that one can multiply two polynomials by doing a convolution of
their coefficients, suitably organized and padded.
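The polynomial picture can be sketched as follows. A state with v electoral votes and Democratic win probability p contributes the polynomial (1 - p) + p*x**v; in the product over all states, the coefficient of x**k is the probability of exactly k Democratic electoral votes. The version below is a plain-Python illustration, not the Princeton code, and it takes win probabilities directly instead of (polls, date). The helper names are assumptions; numpy.convolve performs the same coefficient convolution.

```python
def Convolve(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def ExactEVDistribution(evotes, dem_probs):
    """Return dist, where dist[k] is the probability of exactly k Democratic EV."""
    dist = [1.0]  # before any state is counted: 0 votes with certainty
    for state, ev in evotes.items():
        p = dem_probs[state]
        factor = [0.0] * (ev + 1)
        factor[0] = 1.0 - p  # state lost: contributes 0 votes
        factor[ev] = p       # state won: contributes ev votes
        dist = Convolve(dist, factor)
    return dist

# Made-up two-state example.
dist = ExactEVDistribution({"A": 3, "B": 4}, {"A": 0.5, "B": 0.25})
print(dist)
```

Like the polynomial product itself, this treats the state outcomes as independent.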
- Compute the exact EV probability distribution and plot that, comparing
it with the simulated, sampled probability distribution computed previously.
- The exact probability distribution computed above assumes that the
win probabilities in each state are known exactly, but of course they
are only approximately known to within some margin of error. Analytically,
one could derive the propagation of errors (uncertainties) through the
polynomial equation,
or computationally, one could explore the variation in the computed
probability distribution by sampling win probabilities from a normal
distribution. Using the downloaded probability and margin of error data,
generate ensembles of win probabilities (drawn from a normal distribution),
and compute the "exact" probability distribution for each member of the
ensemble. How much variation is apparent in the resulting EV distributions?
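One possible reading of this step, offered only as a sketch: perturb each state's polled margin by a normal draw whose standard deviation is the SEM, then recompute the win probability from the perturbed margin. Other readings are equally defensible. The function name and the polls_for_date argument (the inner dictionary polls[date]) are assumptions.

```python
import math
import random

def SamplePerturbedWinProbabilities(polls_for_date, rng=random):
    """Return one ensemble member: a dict of perturbed Democratic win probabilities."""
    probs = {}
    for state, (margin, sem) in polls_for_date.items():
        # Draw a plausible "true" margin around the polled value.
        perturbed = rng.gauss(margin, sem)
        probs[state] = 0.5 * (1.0 + math.erf(perturbed / (sem * math.sqrt(2.0))))
    return probs

# Made-up polling numbers; a seeded generator makes the ensemble reproducible.
rng = random.Random(1)
polls_for_date = {"OH": (3.0, 2.0), "TX": (-10.0, 2.5)}
ensemble = [SamplePerturbedWinProbabilities(polls_for_date, rng) for _ in range(5)]
print(ensemble[0])
```

Each member of the ensemble can then be fed to the exact-distribution calculation, and the resulting curves overlaid to see how much they spread.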
- There is a finite set of possible electoral college outcomes. Some of
those outcomes involve a tie: 269-269. (Although with the recent movement
in the polling data, that possibility has become less likely than, say,
three weeks ago.)
- Civics 101 Quiz: What happens in the event of an electoral
college tie? Look
here,
here or
here
for more information. (And when is the only time in US history that such a tie
has taken place?)
- A tie, of course, is only possible if there is an even number of
total electoral votes. Without the
23rd Amendment to the Constitution, which granted electoral
votes to the District of Columbia, the current total would be 535 rather than 538.
The 23rd Amendment grants DC no more electoral votes than the least
populous state, which means DC currently gets 3 votes (equal to two senators
plus one representative, even though DC doesn't actually have those).
If DC were granted statehood, its number of electoral votes would
increase (based on its population), and it would therefore be possible
that it would then possess an even number of electoral votes,
thereby making the total once again odd. But I digress...
The possibility of an electoral college tie is essentially the problem
posed in the
Number Partitioning course module: namely, if you have a set
of integers, can you find a partitioning of those integers into two
subsets such that the sum of each subset is equal?
We are interested in enumerating all scenarios of electoral college
outcomes, but there are 2**51=2251799813685248 total possibilities.
We can narrow down that prohibitively large number by enumerating
over all those states that are plausibly close in the polls, i.e.,
the "swing states".
- Write a function GetBaseStatesAndSwingStates that, given
the polling data and a specified threshold (percentage difference
in poll numbers), returns a tuple composed of: safe states in the
Democratic base, safe states in the Republican base, and swing states
(polling margin less than the threshold).
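The three-way split might look like the sketch below, assuming the polling data for a single date is passed in as a dictionary mapping states to (margin, SEM) tuples, with positive margins favoring the Democrat. The exact argument list is an assumption, since the exercise leaves it open.

```python
def GetBaseStatesAndSwingStates(polls_for_date, threshold):
    """Return (dem_base, rep_base, swing) lists of state abbreviations."""
    dem_base, rep_base, swing = [], [], []
    for state, (margin, sem) in polls_for_date.items():
        if abs(margin) < threshold:
            swing.append(state)      # too close to call
        elif margin > 0:
            dem_base.append(state)   # safely Democratic
        else:
            rep_base.append(state)   # safely Republican
    return dem_base, rep_base, swing

# Made-up polling numbers, for illustration only.
polls_for_date = {"NY": (20.0, 2.0), "TX": (-10.0, 2.5), "OH": (1.5, 2.0)}
print(GetBaseStatesAndSwingStates(polls_for_date, 5.0))
```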
- Use the EnumerateAllScenario(n) function to create an
array listing all possible arrays of length n containing +1 and -1,
i.e., the n-dimensional hypercube with 2**n vertices.
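For reference, the hypercube of +1/-1 vectors can be generated with itertools.product. The function below is a stand-in written for illustration under the hypothetical name AllSignVectors; it is not the EnumerateAllScenario function supplied with the course, whose output type and ordering may differ.

```python
from itertools import product

def AllSignVectors(n):
    """Return all 2**n tuples of length n whose entries are +1 or -1."""
    return list(product((1, -1), repeat=n))

vectors = AllSignVectors(3)
print(len(vectors))
print(vectors[0], vectors[-1])
```

Since the list grows as 2**n, this is only practical for a modest number of swing states.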
- Write a function to enumerate all outcomes, based on the
swing state list generated above. Enumerate over all possible
swing state scenarios and sum up the electoral votes for each scenario.
What fraction result in electoral college ties? How does this fraction
compare to that in the exactly computed distribution? What state(s)
show the greatest amount of variability among the scenarios involving ties?
Does that variability correlate at all with how close the states are in
the polls? (presumably not)
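The tie search over swing-state scenarios can be sketched as below, assuming +1 means the Democrat wins a swing state and -1 means the Republican does. The function name TieScenarios and its arguments are assumptions. Note that this counts every scenario equally, whereas the exactly computed distribution weights outcomes by their probabilities.

```python
from itertools import product

def TieScenarios(evotes, dem_base, swing):
    """Return the swing-state sign vectors that produce an exact tie."""
    total = sum(evotes.values())
    base = sum(evotes[s] for s in dem_base)
    ties = []
    for signs in product((1, -1), repeat=len(swing)):
        dem = base + sum(evotes[s] for s, sign in zip(swing, signs) if sign == 1)
        # A tie means the Democrat holds exactly half of all votes.
        if 2 * dem == total:
            ties.append(signs)
    return ties

# Made-up four-state example with 10 total votes; states absent from
# dem_base and swing are implicitly Republican.
evotes = {"A": 3, "B": 4, "C": 1, "D": 2}
swing = ["C", "D"]
ties = TieScenarios(evotes, dem_base=["A"], swing=swing)
tie_fraction = len(ties) / 2 ** len(swing)
print(ties, tie_fraction)
```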
- Play with the data, or dig around in the
Princeton code
if you're interested.
- You need not confine yourself
to the current day's polling data, but can travel back in time and
see how the expected distribution of electoral college votes has
shifted over the course of the campaign. Use the course wiki to
post snapshots of this evolving distribution, or figure out how to
make a movie showing the dynamics of the race.
- There are analyses undertaken
on the Princeton site (e.g., the Popular Meta-Margin)
and other polling/election web sites that you could implement with
these data.
- There is some
java code for displaying color-coded maps that you might be
able to use to make maps from your own data.
- Not to pick on Sarah Palin, but it is worth noting that there is a
new website
that uses Markov chains to randomly synthesize Sarah Palin quotations
from a database of her speeches. This is basically what we did
(with other source material)
in our introductory
Random Text Generation exercise.
Files
Christopher R. Myers
Last modified: October 10, 2008