
Latent Semantic Analysis of U.S. Presidential Inaugural Addresses
Natural language processing project analyzing presidential speeches from 1789 to 2005 using TF-IDF and SVD
This project explores how American presidential rhetoric has evolved over time by applying latent semantic analysis to inaugural addresses delivered between 1789 and 2005.
Methodology:
Using Python, with BeautifulSoup for scraping, TF-IDF vectorization for term weighting, and Singular Value Decomposition (SVD) for dimensionality reduction, this analysis uncovers hidden themes and patterns in presidential communication:
- Data Collection - Scrapes presidential speeches from Project Gutenberg
- Text Processing - Builds term-document matrices using TF-IDF weighting
- Dimensionality Reduction - Applies SVD to identify latent semantic themes
- Visualization - Maps speeches in 2D semantic space to reveal relationships
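The text-processing, SVD, and 2D-mapping steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes NumPy is available, uses four made-up toy documents in place of the scraped speeches, and hand-rolls a smoothed TF-IDF variant. The names `docs`, `tfidf_matrix`, and `coords` are hypothetical.

```python
import math
from collections import Counter

import numpy as np

# Toy stand-ins for the scraped speeches (invented text, not real excerpts).
docs = {
    "speech_a": "constitution government union constitution states",
    "speech_b": "government union states constitution duty",
    "speech_c": "freedom world peace freedom nations",
    "speech_d": "peace world nations freedom liberty",
}


def tfidf_matrix(texts):
    """Return (matrix, vocabulary): one row per document, one column per term."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(w for toks in tokenized for w in set(toks))
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for w, count in Counter(toks).items():
            tf = count / len(toks)
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1  # smoothed IDF (assumed variant)
            X[i, index[w]] = tf * idf
    # L2-normalize each row so that long speeches do not dominate.
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X, vocab


X, vocab = tfidf_matrix(list(docs.values()))

# Thin SVD: X = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # keep the two strongest latent dimensions for a 2D map
coords = U[:, :k] * s[:k]  # one (x, y) point per document

for name, (x, y) in zip(docs, coords):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

In this toy corpus, documents that share vocabulary land near each other in the 2D space, which is the property the visualization step relies on. The rows of `Vt` hold the term loadings for each latent dimension.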
Key Findings:
The analysis identified ten major rhetorical themes that shaped presidential discourse:
- Formal governmental structure and constitutional frameworks
- Foundational principles and democratic ideals
- Economic policy and industrial development
- Foreign affairs and military matters
- Civil War era and religious appeals
- Enlightenment philosophy and early republicanism
- Democratic institutions and slavery debates
- Revolutionary concepts and national identity
- Democratic foundations and citizen participation
- Economic rhetoric and monetary policy
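One common way to arrive at labels like these is to inspect the terms with the largest absolute loadings in each SVD component and name the theme by hand. The sketch below assumes a component is available as a term-to-loading mapping. The terms and weights are placeholders for illustration, not values from this analysis, and `top_terms` is a hypothetical helper.

```python
def top_terms(loadings, n=3):
    """Return the n terms with the largest absolute loading, strongest first."""
    ranked = sorted(loadings.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [term for term, _ in ranked[:n]]


# Placeholder loadings for a single component (made-up numbers).
component = {
    "constitution": 0.52,
    "union": 0.41,
    "states": -0.38,
    "peace": 0.05,
    "world": -0.02,
}

print(top_terms(component))  # ['constitution', 'union', 'states']
```

Ranking by absolute value matters because the sign of an SVD component is arbitrary: a strongly negative loading is as informative as a strongly positive one.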
Technical Implementation:
The complete analysis pipeline runs in a single Jupyter notebook, from data acquisition through final visualization. Keeping every step in one place makes the analysis easier to reproduce and inspect.
Historical Insights:
By tracking how these themes emerge, fade, and interact across different presidential administrations, we can observe the evolution of American political discourse and the changing priorities of the nation.
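A simple way to track a theme over time is to average each speech's score on that theme within a time bucket. The sketch below assumes per-speech scores are already available as (year, score) pairs. The scores are placeholder numbers rather than results from the notebook, and `mean_by_century` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# (inauguration year, theme score) pairs; the scores are made-up placeholders.
scores = [(1801, 0.62), (1805, 0.58), (1861, 0.31), (1865, 0.27), (1961, 0.08)]


def mean_by_century(pairs):
    """Average theme scores per century, keyed by the century's first year."""
    buckets = defaultdict(list)
    for year, score in pairs:
        buckets[year // 100 * 100].append(score)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


trend = mean_by_century(scores)
for start, avg in trend.items():
    print(f"{start}s: {avg:.2f}")
```

Narrower buckets, such as decades or individual administrations, would follow the same pattern with a different grouping key.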
This analysis demonstrates how computational linguistics can reveal patterns in historical texts that might otherwise remain hidden from traditional close reading.