The Making and Knowing Project is a research and pedagogical initiative in the Center for Science and Society at Columbia University that explores the intersections between artistic making and scientific knowing. From 2014 through 2019, the Project's focus was the creation of a digital critical edition of an intriguing anonymous sixteenth-century French artisanal and technical manuscript, BnF Ms. Fr. 640.
This repository contains the website code and scripts to process the XML of the documentary edition, as well as the associated research essays. Once the site is compiled using the provided tools, it can be deployed as a static website with no special hosting requirements.
- [Getting started (first install)](#getting-started-first-install)
- [Deployment guide (for running existing installation)](#deployment-guide-for-running-existing-installation)
- [Publishing Google Drive content to GitHub](#publishing-google-drive-content-to-github)
### Prerequisites

- Node v14.x with npm v7.x
  - e.g., Node 14.21.3 with npm 7.24.2 (the latest releases within those major versions)
- Yarn v1.22.x (recommended: 1.22.18)
  - It is recommended to install this with npm:

    ```
    npm install --global yarn@1.22.18
    ```
- Rclone v1.56+ (recommended: 1.58.0)
- Pandoc 2.14+ (recommended: 2.17.1)
### A note on versions
The specified versions are minimum tested versions, while the "recommended" versions are the newest tested. There may be newer versions that will still work, but not necessarily: in the case of Yarn, you must use Yarn 1.x rather than 2.x.
If installing with package managers, ensure the correct versions are pulled. For example, using Homebrew on macOS:

```
brew install pandoc@2.17.1
```

or using apt-get on Linux:

```
sudo apt-get install pandoc=2.17.1
```
If the specified version is not available from your preferred package manager, you may follow the manual installation instructions from the linked project websites. To check an installed version, the `--version` flag will typically work, for example:

```
pandoc --version
```
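Likewise, a quick way to sanity-check all of the prerequisites at once (assuming each tool is already on your `PATH`):

```
node --version    # expect v14.x
npm --version     # expect 7.x
yarn --version    # expect 1.22.x
rclone --version  # expect v1.56 or newer
pandoc --version  # expect 2.14 or newer
```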
## Getting started (first install)

In order to get the project running on your local machine, follow the steps below. Once you have a local environment, you can deploy a version of the edition to a server. (If you want the server to automatically stay up to date, see the MK Asset Server project; note, though, that it has not been tested recently.)
1. Run `yarn install` in the project root directory and the `scripts` subdirectory:

   ```
   yarn install && yarn --cwd scripts install
   ```
2. You will need to set up and configure rclone, which provides rsync-like functionality. Set up rclone to have a service called "mk-annotations" that is authorized to access the shared "Annotations" directory by following the directions below. To check whether you have previously done this (and thus do not need to set up and configure again), run

   ```
   rclone lsd mk-annotations:
   ```

   to make sure rclone is able to connect to the Google Drive. This will give a listing of folders if successful. Otherwise, set up and configure the service:

   - Follow the instructions to make a Google Drive client ID (ensure that the user account performing these actions has access to the "Annotations" folder in Google Drive)
   - In a terminal, run the config wizard with the command `rclone config`
   - Enter `n` for "New config"
   - Enter `mk-annotations` for the name
   - Enter `drive` for Google Drive
   - Enter your client ID from Google
   - Enter your client secret from Google
   - Enter `drive` for "drive" scope
   - Keep pressing enter to leave the rest as defaults
   - You should get to a step that opens a browser window with Google authorization. Authorize rclone for the requested permissions. Then, back in the config wizard, continue pressing enter to leave the rest as defaults.
   - You should see a list with one remote of "drive" type, named "mk-annotations"
   - Enter `q` to quit the config wizard
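   Once the wizard is finished, you can confirm the remote was saved and is reachable (a quick check using standard rclone commands):

   ```
   rclone listremotes          # "mk-annotations:" should appear in the list
   rclone lsd mk-annotations:  # should list the top-level folders of the drive
   ```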
3. In the project root directory, copy the `edition_data_example` directory to `edition_data`:

   ```
   cp -R edition_data_example edition_data
   ```
4. Open `edition_data/config.json` in your preferred text editor, for example:

   ```
   nano edition_data/config.json
   ```

   Edit the configuration for each environment, for example, the local environment:

   ```json
   "local": {
       "buildID": "stagingMMDDYY-N",
       "editionDataURL": "http://localhost:4000/bnf-ms-fr-640",
       "assetServerURL": "https://edition-assets.makingandknowing.org",
       "targetDir": "public/bnf-ms-fr-640",
       "sourceDir": "edition_data/m-k-manuscript-data",
       "contentDir": "edition_data/edition-webpages",
       "workingDir": "edition_data/working/MMDDYY",
       "rclone": {
           "serviceName": "mk-annotations",
           "folderName": "Annotations",
           "sharedDrive": true
       },
       "releaseMode": "staging"
   }
   ```

   - You will need to specify a build ID (`buildID`) and a working directory (`workingDir`). We recommend using a formatted date for both: `MMDDYY-N`, where MM = month, DD = day, YY = year, and N = the number of the build on that date.
   - Explanations of other `config.json` content:
     - The `editionDataURL` setting is used to insert the root URL of the hosted edition. The defaults for staging and production assume deployment to S3 with existing CloudFront distributions. These may be changed if deploying elsewhere.
     - Similarly, the `assetServerURL` setting is used to build image asset URLs that refer to their locations on the asset server. The default setting refers to the Making and Knowing S3 asset CloudFront distribution, but this may be changed if hosting images elsewhere.
       - Note that this will only affect new migrations from Google Drive. Essays already in GitHub will retain the `assetServerURL` that was in place when they were first migrated. See the note on asset URLs below for more info.
   - If your Google Drive account is the owner of the "Annotations" folder (i.e. General Editor), set `sharedDrive` to `false`. Otherwise, leave it as `true`.
   - If setting up a production build, ensure the `googleTrackingID` is set to a working Google Analytics ID.
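   After editing, it is worth confirming the file is still valid JSON. One way to do this (assuming `jq` is installed; any JSON linter works):

   ```
   jq . edition_data/config.json > /dev/null && echo "config.json is valid JSON"
   ```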
5. From the project root directory, set up the necessary directory structure:

   ```
   mkdir public/bnf-ms-fr-640
   mkdir edition_data/working
   ```
6. In the `edition_data` directory, clone the needed repositories. The third is a private repo, so it will require an SSH key. Note: if you have previously cloned these repos, you can do a `git pull` inside each rather than cloning.

   ```
   cd edition_data
   git clone https://github.com/cu-mkp/m-k-manuscript-data.git
   git clone https://github.com/cu-mkp/edition-webpages.git
   git clone git@github.com:cu-mkp/m-k-annotation-data.git
   cd .. # return to the project root
   ```
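   At this point, `edition_data` should contain at least the copied `config.json`, the `working` directory, and the three cloned repos:

   ```
   ls edition_data
   # expect to see (at least): config.json  edition-webpages  m-k-annotation-data  m-k-manuscript-data  working
   ```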
Now, you are ready to process some data! Run

```
scripts/lizard.js sync
```

to download and prepare the edition data for your local machine.
Once you have generated some data for the edition, you can start it locally:

```
yarn start
```
Run the following commands to prepare a build for your first deployment. You may replace "staging" with "production" for a production server.

```
scripts/lizard.js run staging
scripts/lizard.js migrate staging
yarn build
```

This should create a directory called `build` in the project root, which is the bundled, built project that you can deploy to your server (i.e. `build/index.html` is the site root).
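Since the output is a fully static site, you can preview the finished build with any static file server before deploying, for example (one option, using Python's built-in server):

```
cd build
python3 -m http.server 8080
# then browse to http://localhost:8080/
```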
### Note on asset URLs and internal links

If you need to alter the `assetServerURL`, note that all essays already in the GitHub `m-k-annotation-data` repo will still retain the original S3/CloudFront asset server URL (`https://edition-assets.makingandknowing.org`) from their initial migration. Thus, after running the `yarn build` script, you will need to search for all instances of that URL in the `build` directory and replace them with your new URL. Otherwise, you may run into CORS issues.
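A minimal sketch of that search-and-replace (assuming GNU `sed` and `xargs`; on macOS/BSD use `sed -i ''`, and substitute your actual asset server URL for the placeholder):

```
OLD_ASSET_URL="https://edition-assets.makingandknowing.org"
NEW_ASSET_URL="https://assets.example.org"  # placeholder: your new asset server URL
# rewrite every file in build/ that references the old URL
grep -rl "$OLD_ASSET_URL" build/ | xargs -r sed -i "s|$OLD_ASSET_URL|$NEW_ASSET_URL|g"
```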
To permanently migrate essays to a new `assetServerURL`, you will need to make the same change across all essays in the `m-k-annotation-data` repo, as those are not overwritten or modified by any code here after initial migration.

The same goes for many internal links across the static site, which refer to the production site (essays) or to the `editionDataURL` (other static content). These may need to be manually altered when deploying to a different server.
Note also that values are pulled from `edition_data/config.json` to populate the `.env.*` files, so do not alter any `.env` files directly.
## Deployment guide (for running existing installation)

This guide assumes you've followed the instructions above and successfully set up your local development environment. It should be used for all future deployments after initial setup.
1. Navigate to the `edition_data` directory

2. Update `config.json`

   - Specify a build ID (`buildID`) and a working directory (`workingDir`). We recommend using a formatted date for both: `MMDDYY-N`, where MM = month, DD = day, YY = year, and N = the number of the build on that date.
3. Pull the latest data from each repo:

   ```
   cd edition_data
   cd m-k-annotation-data && git pull && cd ..
   cd m-k-manuscript-data && git pull && cd ..
   cd edition-webpages && git pull && cd ..
   cd .. # return to the project root
   ```
4. For each repo, check out the tagged release you want to use for the deployment; otherwise, you will be using code and data from the `master` branch. For example, to use a tag named `v1.2.3` on the `m-k-annotation-data` repo:

   ```
   cd edition_data
   cd m-k-annotation-data
   git fetch --all --tags
   git checkout v1.2.3
   cd ../.. # return to the project root
   ```

   Note: This will put you in "detached HEAD" state for this repo, meaning you are not on a branch, so any commits made here will be lost. Be sure to switch to a branch before making any commits. If you want to immediately start working on a new branch, you can use `git checkout -b new-branch-name v1.2.3` instead. This might be useful if you are, for example, migrating new annotations to GitHub from Google Drive.
5. Run the sync script from the project root:

   ```
   scripts/lizard.js sync
   ```

   This uses the `./edition_data/working/MMDDYY-N` directory to generate all annotation html, process images from Google Drive, and create a local build in `./public/bnf-ms-fr-640`.

   Details on how this works:

   - If the annotation's `data_source` column is marked as `"gh"` in `annotation-metadata.csv` and the annotation's html file does not already exist in `./edition_data/m-k-annotation-data/html`, then the script prepares all those annotations for migration away from the Google Drive workflow and to the GitHub workflow by:
     - Processing the generated html (changing all `img` `src` attributes to an asset URL (via `assetServerURL`), injecting the annotation's "Abstract" and "Cite As" elements, removing unnecessary elements, and making the html human-readable)
     - Saving the newly processed html to `./edition_data/m-k-annotation-data/html` and replacing the annotation's html file in `./public`
     - Moving images from `./public` to a holding directory (`./edition_data/s3-images`) for later upload to the S3 bucket, and deleting them from `./public`
   - If the annotation is marked as `"gh"` and already exists in `./edition_data/m-k-annotation-data/html`, then the annotation is simply copied from `./edition_data/m-k-annotation-data/html` to `./public`
6. Run the start script to test locally:

   ```
   yarn start
   ```
7. Run the `run` and `migrate` scripts for the environment you are deploying to, i.e. for staging:

   ```
   scripts/lizard.js run staging
   scripts/lizard.js migrate staging
   ```

   or for production:

   ```
   scripts/lizard.js run production
   scripts/lizard.js migrate production
   ```
8. Run the build script to produce a bundle built from your files in `./public`, output to the `./build` directory:

   ```
   yarn build
   ```
9. Remove any unwanted builds from `./build/bnf-ms-fr-640` (i.e. any that aren't for the date and the environment you are currently building for).
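   For example (a sketch; the build name below is hypothetical, so list the directory first to see what is actually there):

   ```
   ls build/bnf-ms-fr-640
   rm -rf build/bnf-ms-fr-640/staging010124-1  # hypothetical stale build
   ```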
The following steps may differ if you are deploying on your own server, but these are the exact steps needed for S3 and CloudFront. To use the AWS CLI, you must install and configure it; it is also possible to do the following through the S3 user interface.
1. From the project root, upload all images in `./edition_data/s3-images` to the `mk-annotation-images` AWS S3 bucket, either through the S3 UI or by running the following in the AWS CLI:

   ```
   aws s3 cp ./edition_data/s3-images/ s3://mk-annotation-images --recursive --grants read=uri=http://acs.amazonaws.com/groups/global/AllUsers
   ```

   Then clear out the contents of `s3-images`:

   ```
   rm -rf ./edition_data/s3-images/*
   ```
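   To spot-check that the images arrived before clearing the local copies (optional; standard AWS CLI):

   ```
   aws s3 ls s3://mk-annotation-images --recursive --summarize | tail -3
   ```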
2. Upload the contents of the `./build` directory to the `edition640-dist` AWS S3 bucket with a key identical to your build ID, either through the UI, or by running the following. Note: `[staging/production]MMDDYY-N` is your build ID, and should be identical to the folder name in `./build`.

   ```
   cd build
   ```

   Then, for staging (before running the command, replace "MMDDYY-N" with the date and number so that the key matches your build ID):

   ```
   aws s3 cp . s3://edition640-dist/stagingMMDDYY-N --recursive --grants read=uri=http://acs.amazonaws.com/groups/global/AllUsers
   ```

   or, for production:

   ```
   aws s3 cp . s3://edition640-dist/productionMMDDYY-N --recursive --grants read=uri=http://acs.amazonaws.com/groups/global/AllUsers
   ```

   Then:

   ```
   cd .. # return to the project root
   ```
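   To confirm the upload landed under the right key:

   ```
   aws s3 ls s3://edition640-dist/
   ```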
3. Update the appropriate Amazon CloudFront (https://console.aws.amazon.com/cloudfront/home) distribution with the new build:

   - Find the desired environment by looking at the CNAMEs column and click its distribution ID
     - edition640... = production
     - edition-staging... = staging
     - edition-dev... = development
   - Click the `Origins and Origin Groups` tab
   - Check the only listed origin and then click the `Edit` button
   - Change the `Origin Path` to the new build ID (i.e., the name of the directory in `./build` that you uploaded to S3)
   - Save that change by clicking the `Yes, Edit` button
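   If the old build still shows up after the origin change, you may also need to invalidate the distribution's cache (a sketch using the standard AWS CLI command; the distribution ID below is a placeholder for the one you just edited):

   ```
   aws cloudfront create-invalidation --distribution-id E1234EXAMPLE --paths "/*"
   ```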
After a few minutes, the site should be deployed!
Generated HTML needs to be added to the Annotation Data repo's `html/` directory -- you must do this manually (e.g., by upload).
### 2024-05-07 UPDATE

When you are ready to switch the source of the annotation essays from Google Drive to GitHub:

- Edit annotation-metadata so that, for all essays you would like to add to GitHub, the `status-DCE` column (currently column AJ) is set to "published" and the `data_source` column (currently column AK) is set to "gh"
- Do a build to `production` as described above in Preparing a build -- this will create properly-formatted html versions of these essays, which you will need to add to the annotations repo
  - NOTE: html versions of these essays are always generated whenever a build is done, even when they are marked as "gdrive" and for builds to `staging`. HOWEVER, these are not formatted fully for subsequent maintenance in the Annotation Data GitHub repo (i.e., do NOT add them to that repo)
  - If you add them after `staging`, for example, they will not have their frontmatter, and their image `src` paths will point to the local `build/` directory rather than to the asset server URLs
  - See #508, #509, and #510
- Download the new `html` files (for the essays you would like to now maintain in GitHub) from the `production` directory in S3 and upload them to the `html/` directory of the Annotation Data repo
- These essays will now be maintained in the GitHub repo, so any edits, revisions, etc. should be made to the `html` files there, and their `data_source` value in annotation-metadata should remain "gh"
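A sketch of pulling just those html files down from S3 with the AWS CLI (the build ID and local destination are placeholders; adjust the prefix to wherever the html files sit within your production build):

```
aws s3 cp s3://edition640-dist/productionMMDDYY-N/ ./downloaded-html --recursive --exclude "*" --include "*.html"
```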
Note (NEEDS TO BE FIXED): this does not add generated HTML to the cloned `edition_data/m-k-annotation-data` repo. As of 2024-03-08, the only way to add newly converted essays (i.e., those generated from Google Drive for the first time) is to download the html files from S3 in the new build directory and add the files to the m-k-annotation-data GitHub repo.
## Publishing Google Drive content to GitHub

If you have followed the above guides, and there were any essays marked as "gh" in `annotation-metadata.csv` that had not yet been migrated from Google Drive, you will have populated `./edition_data/m-k-annotation-data` with new content.
It is recommended to publish this content to GitHub so that it will not have to be pulled from Google Drive again in the future. First, to check whether there are indeed changes, run the following from the project root:

```
cd ./edition_data/m-k-annotation-data
git status
```
This should show any files that have changed. If everything is working properly and migrations were required, it should list new HTML files corresponding to the newly migrated Google Drive essays. You may want to open these files to make sure they look correct at a glance.
Note: recall the note above about `assetServerURL`. If you changed this setting in your configuration, ensure that all image URLs in the essays you are about to publish use the same asset server URL as those already in the `m-k-annotation-data` repo. If they do not match, update the new essays' HTML to use the old URLs before continuing, or vice versa. These must at least be consistent across the repo, even if not all of them use the `https://edition-assets.makingandknowing.org/` URL.
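A quick consistency check before committing (counts how many essay files reference the default asset server; if some essays use a different host, compare the counts against the total number of html files):

```
cd ./edition_data/m-k-annotation-data
grep -rl "edition-assets.makingandknowing.org" html/ | wc -l
```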
If all looks good, and you are ready to publish, you can do the following:

```
git checkout master && git pull # just to make sure we are on the latest master branch
git add .
git commit -m "Add new migrated essays from Google Drive"
git push origin master
```

This will publish them to the master branch on GitHub.