This repository contains the corresponding code for our paper
Henkel F., Kelz R., and Widmer G.
"Learning to Read and Follow Music in Complete Score Sheet Images".
In Proceedings of the 21st International Society for Music Information Retrieval Conference, 2020
which is an extension of our previous work (which you can find by switching to the worms-2019 branch):
Henkel F., Kelz R., and Widmer G.
"Audio-Conditioned U-Net for Position Estimation in Full Sheet Images".
In Proceedings of the 2nd International Workshop on Reading Music Systems, 2019
The data used in this paper can be found here and should be placed in audio_conditioned_unet/data. If you install the package
as described in the instructions below, this is done automatically for you.
In the folder videos you will find several pieces from the test set in which our best-performing model follows an incoming musical performance.
If you want to try our code, please follow the instructions below.
First, clone the project from GitHub:
git clone https://github.com/CPJKU/audio_conditioned_unet.git
In the cloned folder you will find an Anaconda environment file, from which you should create the environment using the following command:
conda env create -f environment.yml
Activate the environment:
conda activate audio_conditioned_unet
Finally, install the project in the activated environment:
python setup.py develop
This last command will extract the sound font file and download the data.
Make sure FluidSynth is installed, as it is used to synthesize audio from MIDI. To this end, we also provide a piano sound font, which you can find in the folder audio_conditioned_unet/sound_fonts.
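As a quick check before running anything, the following minimal sketch looks for FluidSynth on your PATH. It is not part of this repository; it assumes the executable is called fluidsynth, which is the usual name of the command-line tool, and the helper name find_fluidsynth is made up for illustration.

```python
import shutil


def find_fluidsynth():
    """Return the full path of the fluidsynth executable, or None if it is not on PATH.

    Assumption: the executable is named "fluidsynth".
    """
    return shutil.which("fluidsynth")


if __name__ == "__main__":
    path = find_fluidsynth()
    if path is None:
        print("FluidSynth not found on PATH; install it before synthesizing audio.")
    else:
        print(f"FluidSynth found at {path}")
```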
To verify that everything is correctly set up, run the following command:
python test_model.py --param_path ../models/CB_TA/best_model.pt --test_dir ../data/msmd/msmd_test --test_piece Anonymous__lanative__lanative_page_0 --config configs/msmd.yaml --plot
This will run a pre-trained model on a test piece and display its predictions plotted on top of the score. (Note: The '--plot' mode does not support audio playback. If you want audio, you will need to create a video, as explained below.)
If you want to train your own models, you will need to run train_model.py. This script takes several parameters that change the network architecture and the training procedure. The most important ones are the paths to the training and validation sets, the blocks in which the FiLM layers should be activated, and the audio encoder to use. You can also provide a log directory and a dump directory, where the training and validation statistics and the model parameters will be stored. The logs can be visualized using TensorBoard.
For example, to train a model with FiLM layers activated in blocks B-H, a context-based encoder, and an LSTM, run the following command:
python train_model.py --train_set ../data/msmd/msmd_train --val_set ../data/msmd/msmd_valid --config configs/msmd.yaml --film_layers 2 3 4 5 6 7 8 --audio_encoder CBEncoder --use_lstm
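In the example above, blocks B-H correspond to the values 2 to 8 of --film_layers, which suggests that blocks are numbered from A = 1. The sketch below builds the same command from a range of block letters under that assumption. The helper names film_layer_indices and build_train_command are hypothetical and not part of this repository; only the flags shown in the command above are used.

```python
import string


def film_layer_indices(first_block, last_block):
    """Map an inclusive range of block letters to 1-based indices.

    Assumption: block A corresponds to index 1, inferred from the
    example in which blocks B-H are passed as 2 3 4 5 6 7 8.
    """
    letters = string.ascii_uppercase
    start = letters.index(first_block.upper()) + 1
    end = letters.index(last_block.upper()) + 1
    if start > end:
        raise ValueError("first_block must not come after last_block")
    return list(range(start, end + 1))


def build_train_command(first_block, last_block):
    """Assemble the training command as an argument list."""
    indices = film_layer_indices(first_block, last_block)
    return [
        "python", "train_model.py",
        "--train_set", "../data/msmd/msmd_train",
        "--val_set", "../data/msmd/msmd_valid",
        "--config", "configs/msmd.yaml",
        "--film_layers", *[str(i) for i in indices],
        "--audio_encoder", "CBEncoder",
        "--use_lstm",
    ]


if __name__ == "__main__":
    # Print the command for blocks B-H so it can be copied into a shell.
    print(" ".join(build_train_command("B", "H")))
```

The argument list can also be passed directly to subprocess.run if you prefer to launch training from Python.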
To train the same models as in the paper you can check out the run_experiments.sh script.
To reproduce the results shown in Table 2, we provide you with our trained models in the folder models. To evaluate a single model on the test set you need to run the following command:
python eval_model.py --param_path ../models/<MODEL-FOLDER>/best_model.pt --test_dir ../data/msmd/msmd_test --config configs/msmd.yaml
To get the results shown in Table 3 you have to add the --eval_onsets flag.
For example, to evaluate the context-based model with tempo augmentation, you need to execute:
python eval_model.py --param_path ../models/CB_TA/best_model.pt --test_dir ../data/msmd/msmd_test --config configs/msmd.yaml
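If you want to evaluate every provided model rather than a single one, a small helper can assemble one command per model folder. This is only a sketch: it assumes each folder inside models contains a best_model.pt file, as in the commands above, and the function name build_eval_commands is hypothetical.

```python
from pathlib import Path


def build_eval_commands(models_dir, eval_onsets=False):
    """Return one eval_model.py command per model folder holding a best_model.pt.

    Assumption: every trained model lives in <models_dir>/<MODEL-FOLDER>/best_model.pt.
    """
    commands = []
    for checkpoint in sorted(Path(models_dir).glob("*/best_model.pt")):
        cmd = [
            "python", "eval_model.py",
            "--param_path", checkpoint.as_posix(),
            "--test_dir", "../data/msmd/msmd_test",
            "--config", "configs/msmd.yaml",
        ]
        if eval_onsets:
            # Flag described in the text for the onset-based results.
            cmd.append("--eval_onsets")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands; run them yourself or pass each list to subprocess.run.
    for cmd in build_eval_commands("../models"):
        print(" ".join(cmd))
```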
To see what our network actually does, you can create a video of its performance on a given piece:
python test_model.py --param_path ../models/<MODEL-FOLDER>/best_model.pt --test_dir ../data/msmd/<TEST-DIR> --test_piece <PIECE> --config configs/msmd.yaml
For example, to create a video for the test piece Anonymous__lanative__lanative using our best-performing model, you need to execute:
python test_model.py --param_path ../models/CB_TA/best_model.pt --test_dir ../data/msmd/msmd_test --test_piece Anonymous__lanative__lanative_page_0 --config configs/msmd.yaml
This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement number 670035, project "Con Espressione").
