Code for "Foundations and Applications of Humanities Analytics" (2022) at the Santa Fe Institute

stephbuon/faha-2022

Foundations and Applications of Humanities Analytics

The FAHA institute provides online and in-person education aimed at a broad range of humanities scholars. Participants will gain a theoretical and practical understanding of text analysis methods, and will learn how to extract content and derive meaning from digital sources, enabling new humanities scholarship.

Old Bailey collage

First Time Set Up

For this course we will be using the latest version of Anaconda, a distribution of the Python programming language that comes preconfigured for text mining, data analytics, and more. You can download Anaconda here.
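Once Anaconda is installed, a quick check like the one below can confirm that Python runs and that some common analysis packages are importable. This is a minimal sketch: the package list (`pandas`, `numpy`, `matplotlib`) is an assumption about what the notebooks may need, not something the course materials specify, and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import sys

# Assumed package list for illustration; the course notebooks may need others.
ASSUMED_PACKAGES = ["pandas", "numpy", "matplotlib"]


def check_environment(packages=ASSUMED_PACKAGES):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Report the interpreter version and which of the assumed packages were found.
print("Python version:", sys.version.split()[0])
for name, available in check_environment().items():
    print(f"{name}: {'found' if available else 'missing'}")
```

If a package is reported as missing, it can usually be installed from the Anaconda prompt or terminal before opening the notebooks.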

Code

To access the notebooks for this course, click the green "Code" button (top right corner) and select "Download ZIP."

Alternatively, you can "clone" the repository onto your computer by running the following command in a terminal:

git clone https://github.com/stephbuon/faha.git

Data

While students are encouraged to discover archives that match their interests, we provide access to four pre-curated data sets that can be downloaded onto your computer. These data sets have been transformed from their original sources (see data citations).

The U.S. Congress Congressional Records: The Congressional Record is the official record of the proceedings and debates of the United States Congress. It is published online daily when Congress is in session.

The Hansard Debates: Hansard is the name of the transcripts of Parliamentary debates in Britain. We provide access to the 19th-century corpus, which contains debates from the House of Commons and the House of Lords.

The Minutes from the State Education Boards: We specifically provide data for the State Education Board for Loudoun County, Virginia for the year 2021.
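After downloading one of these data sets, it can help to peek at the first few rows before doing any analysis. The sketch below uses only the Python standard library and assumes the file is delimited text with a header row. The file name in the commented usage line and the function name `preview_rows` are hypothetical, and the actual file formats and column names in the course data may differ.

```python
import csv
from itertools import islice


def preview_rows(path, n=5, delimiter=","):
    """Return the first n rows of a delimited text file as dictionaries.

    Assumes the first line of the file is a header row naming the columns.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return list(islice(reader, n))


# Hypothetical usage; replace the path with a file you have downloaded:
# for row in preview_rows("hansard_sample.csv", n=3):
#     print(row)
```

Reading only the first rows keeps the check fast even for large corpora, since the rest of the file is never loaded into memory.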

Data Citations

Buongiorno, Steph, Robert Kalescky, Omar Alexander Cerpa, and Jo Guldi. "The Hansard 19th-Century British Parliamentary Debates with Improved Speaker Names: Parsed Debates, N-Gram Counts, Special Vocabulary, Collocates, and Topics", https://doi.org/10.7910/DVN/ZCYJH8, Harvard Dataverse, V1, 2022, UNF:6:wFlN6+URD9Q9BWYxgZgu1A== [fileUNF]

Gentzkow, Matthew, Jesse M. Shapiro, and Matt Taddy. "Congressional Record for the 43rd-114th Congresses: Parsed Speeches and Phrase Counts." Palo Alto, CA: Stanford Libraries [distributor], 2018-01-16. https://data.stanford.edu/congress_text

Odell, Evan. "Hansard Speeches 1979-2020 Version 3.0.1," https://evanodell.com/projects/datasets/hansard-data/, Evan Odell, V3, N.D.
