Foundations and Applications of Humanities Analytics

The FAHA institute provides online and in-person education aimed at a broad range of humanities scholars. Participants will gain a theoretical and practical understanding of text analysis methods, and will learn how to extract content and derive meaning from digital sources, enabling new humanities scholarship.


First Time Set Up

For this course we will be using the latest version of Anaconda, a distribution of the Python programming language with pre-configured settings designed for text mining, data analytics, and more. You can download Anaconda here.
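Once Anaconda is installed, a quick check like the one below can confirm that Python runs and that common text-analysis packages are importable. This is a minimal sketch: the package list is an assumption for illustration, not a list taken from the course notebooks, so adjust it to match whatever the notebooks actually import.

```python
import importlib.util
import sys

# Hypothetical package list: replace with the packages the notebooks import.
PACKAGES = ["pandas", "numpy", "matplotlib", "nltk"]


def check_environment(packages):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Report the interpreter version, then the status of each package.
    print("Python", sys.version.split()[0])
    for name, available in check_environment(PACKAGES).items():
        print(f"{name}: {'found' if available else 'MISSING'}")
```

If a package is reported as missing, it can usually be installed from the Anaconda prompt with `conda install <package-name>`.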

Code

To access the notebooks for this course, click the green "Code" button (top right corner) and select "Download ZIP."

Alternatively, you can "clone" the repository onto your computer by running the following command in a terminal:

git clone https://github.com/stephbuon/faha.git

Data

While students are encouraged to discover archives that match their interests, we provide access to four pre-curated data sets that can be downloaded onto your computer. These data sets have been transformed from their original sources (see Data Citations).

The U.S. Congress Congressional Records: The Congressional Record is the official record of the proceedings and debates of the United States Congress. It is published online daily when Congress is in session.

The Hansard Debates: Hansard is the name of the transcripts of Parliamentary debates in Britain. We provide access to the 19th-century corpus, which includes debates from the House of Commons and the House of Lords.

The Minutes from the State Education Boards: We specifically provide data for the State Education Board for Loudoun County, Virginia for the year 2021.
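After downloading one of these data sets, a first exploratory step might be a simple word count. The sketch below assumes the data is a CSV file with one speech or record per row and a column named `text`; both the column name and the file layout are hypothetical, so check the actual files before relying on them. It uses only the Python standard library and runs on a small in-memory sample.

```python
import csv
import re
from collections import Counter


def count_words(rows, text_field="text"):
    """Count lowercase word tokens across an iterable of dict-like rows."""
    counts = Counter()
    for row in rows:
        # Keep runs of letters and apostrophes; drop punctuation and digits.
        counts.update(re.findall(r"[a-z']+", row[text_field].lower()))
    return counts


def count_words_in_csv(path, text_field="text"):
    """Apply count_words to a CSV file (path and column name are assumptions)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return count_words(csv.DictReader(handle), text_field)


# Tiny in-memory sample so the sketch runs without any download.
sample = [
    {"text": "The House will now debate the bill."},
    {"text": "The bill was read a second time."},
]
print(count_words(sample).most_common(3))
```

For a downloaded file, `count_words_in_csv("path/to/file.csv")` would return the same kind of counter, provided the file has a matching text column.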

Data Citations

Buongiorno, Steph, Robert Kalescky, Omar Alexander Cerpa, and Jo Guldi. "The Hansard 19th-Century British Parliamentary Debates with Improved Speaker Names: Parsed Debates, N-Gram Counts, Special Vocabulary, Collocates, and Topics", https://doi.org/10.7910/DVN/ZCYJH8, Harvard Dataverse, V1, 2022, UNF:6:wFlN6+URD9Q9BWYxgZgu1A== [fileUNF]

Gentzkow, Matthew, Jesse M. Shapiro, and Matt Taddy. "Congressional Record for the 43rd-114th Congresses: Parsed Speeches and Phrase Counts." Palo Alto, CA: Stanford Libraries [distributor], 2018-01-16. https://data.stanford.edu/congress_text

Odell, Evan. "Hansard Speeches 1979-2020 Version 3.0.1," https://evanodell.com/projects/datasets/hansard-data/, Evan Odell, V3, N.D.

