State of the Union Text Analysis

After President Obama's most recent State of the Union address, Marc Thiessen, a former speechwriter for President G. W. Bush, claimed that parts of the speech may have been plagiarized from Bush's 2007 speech. By analyzing transcripts of the speeches, can we confirm or reject this claim?

Word Cloud - President Obama - State of the Union speech 2014

Word Cloud - President G.W. Bush - State of the Union speech 2007

Part 1
The word clouds above visualize the 30 most frequently occurring words in each of the two speeches. The bigger the word, the more frequently it was used. Words that would obfuscate the cloud (common English words like 'the' and variations of the word 'America') were removed. The clouds show a few common themes between the two speeches.
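For readers who want to reproduce the counting step behind a word cloud, here is a minimal sketch in Python. It is not the tool used for the images above. The stop-word list is a short hypothetical one, and the function name `top_words` and its parameters are my own assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list; the actual list used for the clouds is not known.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "that", "we", "our", "is", "for"}

def top_words(text, n=30, stopwords=STOPWORDS, drop_prefix="america"):
    """Return the n most frequent words, skipping stop words and 'America' variants."""
    # Lowercase the text and pull out words, allowing one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    kept = [w for w in words
            if w not in stopwords and not w.startswith(drop_prefix)]
    return Counter(kept).most_common(n)

# Made-up sample sentence, not a quotation from either speech.
sample = "The future of America is opportunity, and opportunity requires hope for the future."
print(top_words(sample, n=2))
```

The resulting (word, count) pairs are what a word cloud generator would scale its font sizes by.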

Marc Thiessen identified several specific phrases that he felt were plagiarized, all of which contained the word ‘opportunity.’ To examine them in depth, I used Laurence Anthony's AntConc software.

The image below shows all 22 times any variation of the word 'opportunity' was used in both presidents' speeches. For each occurrence, the AntConc concordance tool shows the surrounding words for context. We can see that no phrases from President G.W. Bush's speech were used verbatim in President Obama’s speech.

AntConc Concordance Tool - Search Term ‘opportun*’
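The idea behind a concordance (a keyword-in-context listing) can be sketched in a few lines of Python. This is only an approximation of what the AntConc tool displays, not its implementation. The function name `concordance`, the regular expression standing in for the ‘opportun*’ wildcard, and the 40-character context width are all assumptions.

```python
import re

def concordance(text, pattern=r"\bopportun\w*", width=40):
    """Return (left context, match, right context) for every match of pattern."""
    rows = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

# Made-up sample text, not a quotation from either speech.
sample = "Opportunity is who we are. We must expand opportunities for all."
for left, hit, right in concordance(sample):
    # Right-align the left context so the matches line up in one column.
    print(f"{left:>40} [{hit}] {right}")
```

Running this over both transcripts and reading the rows side by side is how one would check for verbatim reuse of a phrase.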

Since there was no word-for-word plagiarism, I wanted to look more closely at the themes surrounding the word 'opportunity' to further evaluate Thiessen's claims. To explore this, I used the collocates tool in AntConc (output below). This tool shows words that tend to occur near another word or phrase, in this case the word 'opportunity.' As the image shows, the top 3 collocates for 'opportunity' are ‘requires’, ‘hope’, and ‘future’, all of which occur only in President G.W. Bush’s speech. There do not appear to be any meaningful word associations for the search term in President Obama’s speech. From this I conclude that although both presidents used the same word repeatedly, they used it in different contexts.

AntConc Collocates Tool - Search Term ‘opportun*’
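A rough way to approximate collocates is to count the words that fall within a fixed window around each occurrence of the search term. The sketch below ranks by raw frequency only. AntConc can rank collocates by a statistical measure, which this sketch does not attempt. The function name `collocates` and the window size of 5 are assumptions.

```python
import re
from collections import Counter

def collocates(text, pattern=r"opportun\w*", window=5):
    """Count words appearing within `window` tokens of each match of pattern."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    node = re.compile(pattern)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if node.fullmatch(tok):
            lo = max(0, i - window)
            # Tokens to the left and right of the match, excluding the match itself.
            neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Made-up sample text, not a quotation from either speech.
sample = "Opportunity requires hope. The future holds opportunity."
print(collocates(sample, window=2).most_common())
```

In practice you would also filter out stop words before ranking, otherwise words like 'the' dominate the list.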

Part 2
Are all State of the Union speeches basically the same? Reid Epstein theorizes that the speeches follow a template or pattern. Let's take a look at some analysis from the last 26 speeches (President G. H. W. Bush 1989 through President Obama 2014).

Word Count - State of the Union Speeches 1989 through 2014

Starting with a simple word count, we can glean some interesting trends. President Clinton had the two wordiest speeches (1995 and 2000) and the highest average. President G. H. W. Bush used the fewest words in 1990, and President G.W. Bush used the second fewest in 2002. Like father, like son?
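A word count like the one charted above takes very little code. Here is a minimal sketch that assumes the transcripts are already loaded into a dictionary keyed by a label. The labels and texts below are placeholders, not real transcripts, and splitting on whitespace is only one of several reasonable ways to define a word.

```python
def word_counts(speeches):
    """Map each speech label to its whitespace-delimited word count."""
    return {label: len(text.split()) for label, text in speeches.items()}

# Placeholder texts standing in for full transcripts.
speeches = {
    "speech_a": "Mr. Speaker, Mr. Vice President, members of Congress",
    "speech_b": "Tonight I want to talk about opportunity",
}

counts = word_counts(speeches)
longest = max(counts, key=counts.get)    # label of the wordiest speech
shortest = min(counts, key=counts.get)   # label of the shortest speech
print(counts, longest, shortest)
```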

The image below shows a concordance plot. Each horizontal bar represents one speech (you can see which one by looking at the FILE label above each bar). The vertical lines show where a specified word occurred within the speech. Using this tool we can quickly see which words and themes different presidents used, and where. For example, below is a concordance plot for the term 'gun.' We can see a big cluster in President Clinton's 2000 speech, which came after Columbine. Interestingly, a smaller cluster appears in his 1994 speech, in which he first stressed the importance of the Brady Bill and gun control. Over half of all references to ‘gun’ from 1989 to 2014 occurred in the 2000 speech.

Concordance Plot - Search Term ‘gun*’
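The concordance plot boils down to recording where in a text each match falls and drawing a mark at that relative position. Below is a minimal text-only sketch of that idea. The function names `hit_positions` and `plot_row`, the regular expression standing in for the ‘gun*’ wildcard, and the 60-character bar width are assumptions, and the output is a crude stand-in for AntConc's graphical bars.

```python
import re

def hit_positions(text, pattern=r"\bgun\w*"):
    """Relative position (0.0 = start of text) of each match of pattern."""
    length = max(len(text), 1)
    return [m.start() / length
            for m in re.finditer(pattern, text, flags=re.IGNORECASE)]

def plot_row(text, pattern=r"\bgun\w*", width=60):
    """Draw one speech as a bar of dashes with '|' wherever the term occurs."""
    row = ["-"] * width
    for pos in hit_positions(text, pattern):
        row[min(int(pos * width), width - 1)] = "|"
    return "".join(row)

# Made-up sample text with the term at the very start and the very end.
sample = "gun " + "x " * 48 + "guns"
print(plot_row(sample, width=10))
```

Printing one row per transcript, in chronological order, gives a quick picture of which speeches cluster the term and where.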

Here's another interesting concordance plot. This one is for the word 'Iraq.' Starting in 2003, President G. W. Bush began to use that word dozens of times in each State of the Union speech. However, notice what happens from 2008 to 2009. The outgoing President made 39 references to Iraq, the highest yet, but when President Obama took over he made only 4 references. This reflects the change in foreign policy from President G.W. Bush to President Obama.

Concordance Plot - Search Term ‘iraq*’

With 197 references to 'Iraq' by 4 different presidents, it may be difficult to grasp what message was being conveyed (although I'm sure each of us has an opinion). Below I have used the AntConc collocates tool again to show which words were most strongly associated with 'Iraq.' Some of the top words are 'disarming', 'capable', 'surge', 'increasingly', 'dictator', 'council', 'abandon', and 'inspectors.'

There are many, many more diagrams we could look at. Overall, though, I found a great amount of dissimilarity among the speeches. Certain terms appear frequently in some speeches but are completely absent from others. Some words and phrases are very common across all speeches, including ‘men and women’, ‘United States of America’, ‘God bless’, and ‘Mr. President, Mr. Speaker’, but these are formalities, not content. Furthermore, the speeches varied widely in length, with some twice as long as others. Based on my findings, I did not find any patterns or templates for the speeches. That is not to say that the patterns are not there, just that I did not find them.

Data Sources: State of the Union transcripts