The FAHA institute provides online and in-person education aimed at a broad range of humanities scholars. Participants will gain a theoretical and practical understanding of text analysis methods, and will learn how to extract content and derive meaning from digital sources, enabling new humanities scholarship.
For this course we will be using the latest version of Anaconda, a distribution of the Python programming language with pre-configured settings designed for text mining, data analytics, and more. You can download Anaconda here.
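Once Anaconda is installed, a quick check along the following lines can confirm that your environment is ready. This is only a minimal sketch: the package list is an assumption about what the notebooks might import (it is not taken from the course materials), and `missing_packages` is a hypothetical helper name.

```python
import importlib.util
import sys

# Assumed package list for illustration only; adjust it to match
# whatever the notebooks you open actually import.
ASSUMED_PACKAGES = ["pandas", "numpy", "nltk", "matplotlib"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_packages(ASSUMED_PACKAGES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed packages are available.")
```

If a package is reported as not installed, it can usually be added from a terminal with `conda install` followed by the package name.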
To access the notebooks for this course, click the green "Code" button (top right corner) and select "Download ZIP."
Alternatively, you can "clone" the repository onto your computer from a terminal with the following command:
git clone https://github.com/stephbuon/faha.git
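After cloning, it can be handy to list the notebooks that are now on your computer. The sketch below assumes the repository sits in a folder named `faha` in the current working directory, which is where `git clone` places it by default. `list_notebooks` is a hypothetical helper, not part of the course materials.

```python
from pathlib import Path


def list_notebooks(repo_dir="faha"):
    """Return the sorted paths of all Jupyter notebooks under `repo_dir`."""
    root = Path(repo_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"No folder named {root} here; clone or unzip first.")
    # rglob searches subfolders too; skip Jupyter's autosave copies.
    return sorted(
        path for path in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


if __name__ == "__main__":
    try:
        for notebook in list_notebooks():
            print(notebook)
    except FileNotFoundError as error:
        print(error)
```

If you downloaded the ZIP instead, the unpacked folder typically has a different name, so pass its path as `repo_dir`.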
While students are encouraged to discover archives that match their interests, we provide access to four pre-curated data sets that can be downloaded onto your computer. These data sets have been transformed from their original sources (see data citations).
The U.S. Congressional Record: The Congressional Record is the official record of the proceedings and debates of the United States Congress. It is published online daily when Congress is in session.
The Hansard Debates: Hansard is the name of the transcripts of Parliamentary debates in Britain. We provide access to the 19th-century corpus, which contains debates from the House of Commons and the House of Lords.
The Minutes from the State Education Boards: We specifically provide data for the State Education Board for Loudoun County, Virginia for the year 2021.
- The U.S. Congress (1873–2017)
- The Hansard British Parliamentary Debates
- Minutes from the State Education Boards
Buongiorno, Steph, Robert Kalescky, Omar Alexander Cerpa, and Jo Guldi. "The Hansard 19th-Century British Parliamentary Debates with Improved Speaker Names: Parsed Debates, N-Gram Counts, Special Vocabulary, Collocates, and Topics", https://doi.org/10.7910/DVN/ZCYJH8, Harvard Dataverse, V1, 2022, UNF:6:wFlN6+URD9Q9BWYxgZgu1A== [fileUNF]
Gentzkow, Matthew, Jesse M. Shapiro, and Matt Taddy. "Congressional Record for the 43rd-114th Congresses: Parsed Speeches and Phrase Counts." Palo Alto, CA: Stanford Libraries [distributor], 2018-01-16. https://data.stanford.edu/congress_text
Odell, Evan. "Hansard Speeches 1979-2020 Version 3.0.1," https://evanodell.com/projects/datasets/hansard-data/, Evan Odell, V3, N.D.