Natural Language Processing: Topic Modeling Trump Speeches
I recently took on a project to run a topic modeling analysis of President Donald Trump’s speeches. I scraped 8 speeches that he delivered in October 2020 in the following states:
- Arizona
- Nebraska
- Wisconsin
- Michigan
- Pennsylvania
Methodology
Step 1: Import data/text
Step 2: Clean and preprocess data/text
Step 3: Perform topic modeling analysis
Step 4: Recommend talking points based on extracted topics
Data
Speech transcripts (text) can be found here
Web-Scraping
I built a function in Python to automate the scraping for any transcript link from the website mentioned above:
import requests
from bs4 import BeautifulSoup
from IPython.display import display, HTML

def get_transcript(url):
    """
    Takes a Trump speech URL and extracts the transcript
    """
    transcript = []
    # Download the page and parse its HTML
    response = requests.get(url)
    page = response.text
    soup = BeautifulSoup(page, "lxml")
    # Collect the text of every paragraph tag on the page
    for element in soup.find_all('p'):
        transcript.append(element.text)
    return transcript
Clean Data
Here are a couple of functions: ‘clean’ prepares the text for preprocessing, and ‘preprocess’ tokenizes and lemmatizes the text and removes stop words and punctuation.
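The two functions themselves are not reproduced here, so the following is only a minimal sketch of what they might look like. The specific cleaning rules (lowercasing, dropping bracketed notes and digits), the tiny stop-word list, and the optional lemmatize argument are all assumptions made for illustration. In a real run the lemmatizer would likely come from a library such as NLTK or spaCy; here it is passed in as a callable so the sketch runs with only the standard library.

```python
import re

# A tiny illustrative stop-word list (an assumption); a real run would likely
# use a fuller list such as the one shipped with NLTK or spaCy.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "are", "we", "it", "that"}


def clean(text):
    """Lowercase the text, drop bracketed notes and digits, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"\[.*?\]", " ", text)  # bracketed annotations such as "[applause]"
    text = re.sub(r"\d+", " ", text)      # digits
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text, lemmatize=None):
    """Strip punctuation, tokenize on whitespace, remove stop words, optionally lemmatize."""
    no_punct = re.sub(r"[^\w\s]", "", text)  # also removes curly quotes and apostrophes
    tokens = [token for token in no_punct.split() if token not in STOP_WORDS]
    if lemmatize is not None:
        # lemmatize is a hypothetical hook: any callable mapping a token to its lemma
        tokens = [lemmatize(token) for token in tokens]
    return tokens


sample = "We are bringing back 500 jobs! [applause] The factories are returning."
print(preprocess(clean(sample)))
# prints ['bringing', 'back', 'jobs', 'factories', 'returning']
```

Keeping 'clean' and 'preprocess' separate means the cleaned text can still be inspected as a readable string before it is reduced to tokens.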
Topic Modeling
Here are the overall topics generated from Donald Trump’s 8 speeches:
Arizona Topics:
Nebraska Topics:
Wisconsin Topics:
Michigan Topics:
Pennsylvania Topics:
Conclusion:
My findings are quite interesting. Overall, it appears Trump has a few universal themes in his speeches, regardless of which state he’s in.
As he travels from state to state, he tweaks his speeches to appeal to what those residents care about. For example, in Arizona, he focuses on China/World Trade and Conservative Values, such as gun rights and anti-abortion principles.
Please visit the GitHub repo for the full code:
Contact:
e-mail: md.ghsd@gmail.com