Natural Language Processing: Topic Modeling Trump Speeches

Marcos Dominguez
Nov 15, 2020

I recently completed a topic modeling analysis of President Donald Trump’s speeches. I scraped the transcripts of 8 speeches he gave in October 2020 across the following states:

  1. Arizona
  2. Nebraska
  3. Wisconsin
  4. Michigan
  5. Pennsylvania

Methodology

Step 1: Import data/text

Step 2: Clean and preprocess data/text

Step 3: Perform topic modeling analysis

Step 4: Recommend talking points based on extracted topics

Data

Speech transcripts (text) can be found here

Web-Scraping

I built a function in Python that extracts the transcript from any speech link on the website mentioned above:

import requests
from bs4 import BeautifulSoup


def get_transcript(url):
    """
    Takes a Trump speech URL and extracts the transcript
    as a list of paragraph strings.
    """
    transcript = []

    # Download the page and stop early on an HTTP error
    response = requests.get(url)
    response.raise_for_status()
    page = response.text
    soup = BeautifulSoup(page, "lxml")

    # Each <p> element holds one paragraph of page text
    for element in soup.find_all("p"):
        transcript.append(element.text)

    return transcript
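
Since there are 8 speeches across 5 states, some states have more than one transcript. A minimal sketch of how the scraped paragraphs could be grouped per state is shown here. The helper name `collect_transcripts` and the `urls_by_state` dictionary are my own assumptions for illustration; `fetch` is any function that behaves like `get_transcript` (URL in, list of paragraph strings out).

```python
def collect_transcripts(urls_by_state, fetch):
    """Build one text blob per state from one or more speech URLs.

    urls_by_state: dict mapping a state name to a list of URLs (hypothetical input).
    fetch: callable taking a URL and returning a list of paragraph strings,
           for example the get_transcript function above.
    """
    corpus = {}
    for state, urls in urls_by_state.items():
        paragraphs = []
        for url in urls:
            paragraphs.extend(fetch(url))
        # Drop empty paragraphs and join the rest into a single string
        corpus[state] = " ".join(p.strip() for p in paragraphs if p.strip())
    return corpus


# Usage (the URLs would come from the transcript website):
# corpus = collect_transcripts(urls_by_state, get_transcript)
```

Passing the fetch function as an argument keeps the grouping logic easy to try out with a stub instead of live requests.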

Clean Data

Here are a couple of functions: ‘clean’ prepares the text for preprocessing, and ‘preprocess’ tokenizes and lemmatizes the text and removes stop words and punctuation.
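
The two functions could look roughly like the sketch below. This is not the original code: the stop-word list is a tiny illustrative set, the bracket-stripping rule is an assumption about how transcripts are annotated, and lemmatization is left as a pluggable `lemmatize` callable (for example NLTK's `WordNetLemmatizer().lemmatize`) so that the sketch runs with the standard library alone.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real run would use a
# fuller list such as the ones shipped with NLTK or spaCy.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "we", "our", "for"}


def clean(text):
    """Lowercase the text and strip annotations, punctuation and digits."""
    text = text.lower()
    text = re.sub(r"\[.*?\]", " ", text)    # bracketed notes such as [applause]
    text = re.sub(r"[^a-z\s]", " ", text)   # keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text, stop_words=STOP_WORDS, lemmatize=lambda word: word):
    """Tokenize cleaned text, drop stop words, then lemmatize each token."""
    tokens = clean(text).split()
    return [lemmatize(tok) for tok in tokens if tok not in stop_words]


# Toy lemma lookup standing in for a real lemmatizer (illustration only)
toy_lemmas = {"jobs": "job", "borders": "border"}
sample = "We are bringing back the JOBS, and securing our borders! [applause]"
tokens = preprocess(sample, lemmatize=lambda w: toy_lemmas.get(w, w))
print(tokens)  # ['bringing', 'back', 'job', 'securing', 'border']
```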

Topic Modeling
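
To give a feel for how topics can be pulled out of preprocessed tokens, here is a minimal sketch using non-negative matrix factorization (NMF) with multiplicative updates. The choice of NMF is an assumption made for illustration, and the documents below are toy tokens rather than the real speeches. It is written in plain Python so it runs without extra packages; in practice a library such as scikit-learn or gensim would be the usual choice.

```python
import random
from collections import Counter


def build_matrix(docs):
    """Turn tokenized documents into a document-term count matrix."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok, count in Counter(doc).items():
            row[index[tok]] = float(count)
        rows.append(row)
    return rows, vocab


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_terms)]
             for i in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


def top_words(h, vocab, n_words=3):
    """Return the highest-weighted terms of each topic row in H."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n_words]]
            for row in h]


# Toy tokenized documents, invented for illustration only
docs = [
    ["trade", "china", "tariff"],
    ["trade", "china", "deal"],
    ["gun", "right", "value"],
    ["gun", "value", "life"],
]
v, vocab = build_matrix(docs)
w, h = nmf(v, n_topics=2)
for number, words in enumerate(top_words(h, vocab), start=1):
    print(f"Topic {number}: {', '.join(words)}")
```

Each row of H scores every term for one topic, so the top-weighted terms in a row serve as that topic's label; each row of W says how strongly a document draws on each topic.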

Here are the overall topics generated from Donald Trump’s 8 speeches:

Arizona Topics:

Nebraska Topics:

Wisconsin Topics:

Michigan Topics:

Pennsylvania Topics:

Conclusion:

My findings are quite interesting. Overall, it appears Trump has a few universal themes in his speeches, regardless of which state he’s in.
As he travels state by state, he tweaks his speeches to appeal to what those residents care about. For example, in Arizona, he focuses on
China/World Trade and Conservative Values, such as gun rights and anti-abortion principles.
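
One simple way to separate universal themes from state-specific ones is to compare the top topic terms per state with set operations. The sketch below assumes a dictionary of top terms per state; the function name `split_themes` and the placeholder states and terms are hypothetical and are not the topics found in this analysis.

```python
def split_themes(terms_by_state):
    """Separate terms shared by every state from terms unique to one state."""
    term_sets = {state: set(terms) for state, terms in terms_by_state.items()}
    # Terms that appear in every state's topics
    universal = set.intersection(*term_sets.values())
    specific = {}
    for state, terms in term_sets.items():
        # Union of the terms used in all the other states
        others = set().union(*(t for s, t in term_sets.items() if s != state))
        specific[state] = sorted(terms - others)
    return sorted(universal), specific


# Placeholder terms, invented for illustration only
example = {
    "State A": ["economy", "jobs", "trade"],
    "State B": ["economy", "jobs", "farming"],
    "State C": ["economy", "jobs", "manufacturing"],
}
universal, specific = split_themes(example)
print(universal)            # ['economy', 'jobs']
print(specific["State A"])  # ['trade']
```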

Please visit the GitHub repo for the full code:

Contact:

e-mail: md.ghsd@gmail.com

linkedin: https://www.linkedin.com/in/marcosdominguez2018/

Marcos Dominguez

Data Scientist with a background in banking and finance. I love statistics, programming, and machine learning.