Latent Semantic Analysis of U.S. Presidential Inaugural Addresses

Natural language processing project analyzing presidential speeches from 1789 to 2005 using TF-IDF and SVD

This project explores how American presidential rhetoric has changed over time by applying latent semantic analysis (LSA) to inaugural addresses from 1789 to 2005.

Methodology:

The analysis is implemented in Python, using BeautifulSoup for scraping, TF-IDF vectorization for term weighting, and Singular Value Decomposition (SVD) to uncover latent themes and patterns in presidential communication:

  • Data Collection - Scrapes presidential speeches from Project Gutenberg
  • Text Processing - Builds term-document matrices using TF-IDF weighting
  • Dimensionality Reduction - Applies SVD to identify latent semantic themes
  • Visualization - Maps speeches in 2D semantic space to reveal relationships
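
The text-processing and dimensionality-reduction steps above can be sketched as follows. This is a minimal illustration and not the notebook's actual code: the four short texts are invented stand-ins for real addresses, the TF-IDF formula (relative term frequency times log of inverse document frequency) is one common variant chosen here as an assumption, and the helper names `tokenize` and `tfidf_matrix` are hypothetical. The matrix is stored with one row per document, the transpose of a term-document layout.

```python
import math
import re
from collections import Counter

import numpy as np

# Toy stand-ins for inaugural addresses (invented text, not the real corpus).
speeches = {
    "speech_a": "the constitution grants powers to the government and the states",
    "speech_b": "the government must respect the constitution and the union of states",
    "speech_c": "commerce and industry drive the prosperity of the nation",
    "speech_d": "trade and industry bring prosperity and commerce to the people",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs):
    """Return (X, vocab): X has one row per document and one column per term."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            idf = math.log(n_docs / df[term])  # a term in every document gets 0
            X[i, index[term]] = tf * idf
    return X, vocab


X, vocab = tfidf_matrix(list(speeches.values()))

# Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the two strongest latent dimensions as 2D coordinates for each speech.
k = 2
coords = U[:, :k] * s[:k]
for name, (x, y) in zip(speeches, coords):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

The rows of `coords` are the points one would plot to map speeches in a 2D semantic space. Words that occur in every document, such as "the" and "and" in this toy corpus, receive zero weight and so do not influence the decomposition.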

Key Findings:

The analysis identified ten major rhetorical themes that shaped presidential discourse:

  1. Formal governmental structure and constitutional frameworks
  2. Foundational principles and democratic ideals
  3. Economic policy and industrial development
  4. Foreign affairs and military matters
  5. Civil War era and religious appeals
  6. Enlightenment philosophy and early republicanism
  7. Democratic institutions and slavery debates
  8. Revolutionary concepts and national identity
  9. Democratic foundations and citizen participation
  10. Economic rhetoric and monetary policy
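
One common way to attach labels like these to SVD components is to inspect the terms with the largest weights in each right singular vector. Whether the notebook follows exactly this procedure is not stated, so the snippet below is only a sketch: the vocabulary and weight matrix are invented for illustration, and `top_terms` is a hypothetical helper.

```python
import numpy as np

# Invented vocabulary and document-term weights (rows are documents).
vocab = ["constitution", "government", "union", "commerce", "industry", "prosperity"]
X = np.array([
    [0.9, 0.8, 0.5, 0.0, 0.0, 0.1],
    [0.7, 0.9, 0.6, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.8, 0.9, 0.7],
    [0.1, 0.0, 0.0, 0.9, 0.7, 0.8],
])


def top_terms(Vt, vocab, component, n=3):
    """Return the n terms with the largest absolute weight in one component."""
    weights = Vt[component]
    order = np.argsort(-np.abs(weights))[:n]
    return [(vocab[j], float(weights[j])) for j in order]


# Each row of Vt is one latent component expressed as weights over the vocabulary.
_, _, Vt = np.linalg.svd(X, full_matrices=False)

for c in range(2):
    print(f"component {c}:", [term for term, _ in top_terms(Vt, vocab, c)])
```

The sign of a singular vector is arbitrary, which is why the terms are ranked by absolute weight. Turning a list of top terms into a theme name remains a manual, interpretive step.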

Technical Implementation:

The complete analysis pipeline runs in a single Jupyter notebook, covering every step from data acquisition to final visualization. Keeping all steps in one place makes the analysis easier to reproduce and inspect.

Historical Insights:

By tracking how these themes emerge, fade, and interact across different presidential administrations, we can observe the evolution of American political discourse and the changing priorities of the nation.

This analysis shows how computational linguistics can reveal patterns in historical texts that might otherwise remain hidden from traditional close reading.