Knowledge Tracing Machines

Presented at the AAAI 2019 conference in Honolulu, Hawaii on January 27, 2019 [arXiv] [slides].
Used in the paper that received the Best Paper Award at the EDM 2019 conference in Montreal, Canada, on July 2, 2019.

@inproceedings{Vie2019,
  Author = {{Vie}, Jill-J{\^e}nn and {Kashima}, Hisashi},
  Booktitle = {Proceedings of the 33rd {AAAI} Conference on Artificial Intelligence},
  Title = {{Knowledge Tracing Machines: Factorization Machines for Knowledge Tracing}},
  Pages = {750--757},
  Url = {https://arxiv.org/abs/1811.03388},
  Year = 2019
}

Authors: Jill-Jênn Vie, Hisashi Kashima

Follow our tutorial

Presented at the Optimizing Human Learning workshop in Kingston, Jamaica on June 4, 2019.

Slides from the tutorial are available here. A Jupyter notebook will be available "soon" on Binder.

The tutorial lets you play with the models to assess weak generalization. To assess strong generalization and reproduce the experiments of the paper, you may want to use scikit-learn's GroupShuffleSplit; see sktm.py.
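The difference between the two settings is easiest to see on a split. Below is a minimal sketch on a hypothetical toy log with user, item and correct columns (not the tutorial's data): ShuffleSplit shuffles interactions, so the same user can appear in train and test (weak generalization), whereas GroupShuffleSplit with groups set to the user column holds out whole users (strong generalization).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit, ShuffleSplit

# Toy interaction log (hypothetical data): 20 users, 5 items
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'user': rng.integers(0, 20, size=200),
    'item': rng.integers(0, 5, size=200),
    'correct': rng.integers(0, 2, size=200),
})

# Weak generalization: rows are shuffled, users may be shared by train and test
weak = ShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
weak_train, weak_test = next(weak.split(df))

# Strong generalization: whole users are held out of the training set
strong = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(strong.split(df, groups=df['user']))

train_users = set(df.iloc[train_idx]['user'])
test_users = set(df.iloc[test_idx]['user'])
print(len(train_users & test_users))  # no user is shared between the two sets
```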

Install

python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt  # Will install numpy, scipy, pandas, scikit-learn, pywFM

To get the factorization machines running (KTM for d > 0), you also need to run:

make libfm

Prepare data

Select a dataset and the features you want to include.

Case 1: There is only one skill per item.

data/<dataset>/data.csv should contain the following columns:

user, item, skill, correct, wins, fails

where wins and fails are the user's numbers of previous successful and unsuccessful attempts at the corresponding skill.
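If your log only has user, item, skill and correct, the two count columns can be derived with pandas. This is a sketch, not the repository's encoder: it assumes rows are already in chronological order and that the current attempt is excluded from its own counts, as in PFA; add_wins_fails is a hypothetical helper name.

```python
import pandas as pd

def add_wins_fails(df: pd.DataFrame) -> pd.DataFrame:
    """Add prior success/failure counts per (user, skill).

    Assumes rows are sorted chronologically.
    """
    df = df.copy()
    grouped = df.groupby(['user', 'skill'])['correct']
    # Successes up to and including this row, minus the current answer
    df['wins'] = grouped.cumsum() - df['correct']
    # Earlier attempts at this skill that were not successes
    df['fails'] = grouped.cumcount() - df['wins']
    return df

log = pd.DataFrame({
    'user':    [0, 0, 0, 0, 1],
    'item':    [3, 4, 3, 5, 3],
    'skill':   [1, 1, 1, 2, 1],
    'correct': [1, 0, 1, 1, 0],
})
print(add_wins_fails(log))
```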

Case 2: There may be several skills associated with an item.

data/<dataset>/needed.csv needs to contain:

user_id, item_id, correct

(Note the difference: the columns are named user_id and item_id here, instead of user and item.)

data/<dataset>/q_mat.npz should also contain the q-matrix in scipy.sparse format.
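A minimal sketch of producing such a file, assuming the usual q-matrix convention of one row per item and one column per skill (1 if the item involves the skill). The item-skill pairs are hypothetical, and the file is written to a temporary directory instead of data/<dataset>/ to keep the example self-contained.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Hypothetical pairs: item 0 uses skills 0 and 2, item 1 uses skill 1, item 2 uses skills 0 and 1
pairs = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 1)]
n_items, n_skills = 3, 3

rows, cols = zip(*pairs)
q_mat = sparse.csr_matrix(
    (np.ones(len(pairs)), (rows, cols)), shape=(n_items, n_skills)
)

# In practice the target would be data/<dataset>/q_mat.npz
path = os.path.join(tempfile.mkdtemp(), 'q_mat.npz')
sparse.save_npz(path, q_mat)

loaded = sparse.load_npz(path)
print(loaded.toarray()[0])  # skill indicators of item 0
```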

If you want to compute wins and fails like in PFA or DAS3H, run encode_tw.py instead of encode.py, with the --pfa option for PFA or --tw for DAS3H.

Running

Available datasets

Assistments 2009
Our reformatted version of the Assistments 2009 dataset.
Berkeley and Castor datasets are private.
The Fraction, ECPE and TIMSS 2003 datasets come from the R package CDM:

> install.packages('CDM')
> library('CDM')
> dim(fraction.subtraction.data)
[1] 536  20
> dim(data.ecpe$data)
[1] 2922   29
> dim(data.timss03.G8.su$data)
[1] 757  25
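These CDM datasets are wide users × items response matrices (e.g. 536 × 20 for fraction.subtraction.data), while needed.csv expects one row per answer. A sketch of the conversion with pandas, assuming the matrix has been exported from R to a table of 0/1 values (a small in-memory frame stands in for it here) and that row and column positions are acceptable as user_id and item_id; wide_to_long is a hypothetical helper.

```python
import pandas as pd

def wide_to_long(responses: pd.DataFrame) -> pd.DataFrame:
    """Turn a users x items 0/1 matrix into (user_id, item_id, correct) rows."""
    wide = responses.copy()
    wide.columns = range(wide.shape[1])  # item_id = column position
    wide.index = range(wide.shape[0])    # user_id = row position
    long_df = (
        wide.rename_axis('user_id')
        .reset_index()
        .melt(id_vars='user_id', var_name='item_id', value_name='correct')
    )
    return long_df.sort_values(['user_id', 'item_id']).reset_index(drop=True)

# Toy 2-user, 3-item matrix standing in for e.g. fraction.subtraction.data
responses = pd.DataFrame([[1, 0, 1], [0, 0, 1]], columns=['q1', 'q2', 'q3'])
needed = wide_to_long(responses)
print(needed)
# needed.to_csv('data/fraction/needed.csv', index=False)
```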

Encoding data into sparse features (quick start)

python encode.py --users --items  # To get the encodings (npz)
python lr.py data/dummy/X-ui.npz  # To get results (txt)
python sktm.py data/dummy/data.csv --feat ui  # Will be faster
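To give an idea of what "sparse features" means for --users --items: each interaction becomes a one-hot user vector concatenated with a one-hot item vector, stored as a scipy.sparse matrix. The sketch below assumes that layout (users first, then items); the exact column order of encode.py and X-ui.npz may differ, and encode_users_items is a hypothetical name.

```python
import numpy as np
import pandas as pd
from scipy import sparse

def encode_users_items(df: pd.DataFrame) -> sparse.csr_matrix:
    """One-hot encode users and items side by side (assumed layout)."""
    n_rows = len(df)
    user_codes, users = pd.factorize(df['user'])
    item_codes, items = pd.factorize(df['item'])
    rows = np.arange(n_rows)
    ones = np.ones(n_rows)
    X_users = sparse.csr_matrix((ones, (rows, user_codes)), shape=(n_rows, len(users)))
    X_items = sparse.csr_matrix((ones, (rows, item_codes)), shape=(n_rows, len(items)))
    return sparse.hstack([X_users, X_items]).tocsr()

df = pd.DataFrame({'user': [10, 10, 11], 'item': ['a', 'b', 'a'], 'correct': [1, 0, 1]})
X = encode_users_items(df)
print(X.toarray())
# sparse.save_npz('X-ui.npz', X) would store it as an npz file
```

Logistic regression on these columns is IRT: one weight per user (ability) and one per item (easiness).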

You can also download the Assistments 2009 dataset into data/assistments09 and select it with --dataset:

python encode.py --dataset assistments09 --skills --wins --fails  # Will encode PFA sparse features into X-swf.npz
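A sketch of the PFA feature layout, assuming one skill per row (Case 1), integer skill ids in 0..n_skills-1, and three blocks of columns: skill indicator, wins on that skill, fails on that skill. The real column order of X-swf.npz is not documented here, so treat encode_pfa as a hypothetical illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

def encode_pfa(df: pd.DataFrame, n_skills: int) -> sparse.csr_matrix:
    """Blocks: [skill one-hot | wins on that skill | fails on that skill]."""
    n_rows = len(df)
    rows = np.arange(n_rows)
    skills = df['skill'].to_numpy()
    blocks = []
    for values in (np.ones(n_rows), df['wins'].to_numpy(), df['fails'].to_numpy()):
        # Each block puts the value in the column of the row's skill
        blocks.append(sparse.csr_matrix(
            (values.astype(float), (rows, skills)), shape=(n_rows, n_skills)))
    return sparse.hstack(blocks).tocsr()

df = pd.DataFrame({'skill': [0, 0, 1], 'wins': [0, 1, 0], 'fails': [0, 0, 2]})
X = encode_pfa(df, n_skills=2)
print(X.toarray())
```

Fitting a logistic regression on these columns gives the PFA model: logit = β_s + γ_s · wins + ρ_s · fails for the skill s of the row.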

If you are lazy, you can also just run make and read the Makefile to see what is going on.

Encoding time windows

Choffin et al. proposed the DAS3H model, and we implemented it using queues. This code is faster than the original KTM encoding.
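The queue idea can be sketched as follows: DAS3H needs, for each answer, how many times the learner attempted (and succeeded at) the same skill within several time windows. Keeping one deque of timestamps per (user, skill, window) and popping expired entries from the left makes each update cheap. This is not the code of encode_tw.py; the window lengths below are assumed example values.

```python
from collections import defaultdict, deque

# Assumed example windows, in seconds: 1 hour, 1 day, 1 week, 30 days
WINDOWS = [3600, 86400, 7 * 86400, 30 * 86400]

def window_counts(events):
    """For (user, skill, timestamp, correct) events sorted by timestamp,
    return, per window, (previous attempts, previous wins) on that skill."""
    attempts = defaultdict(deque)  # (user, skill, window) -> attempt timestamps
    wins = defaultdict(deque)      # (user, skill, window) -> success timestamps
    features = []
    for user, skill, t, correct in events:
        row = []
        for w in WINDOWS:
            key = (user, skill, w)
            for queue in (attempts[key], wins[key]):
                # Drop events older than t - w: they left the window
                while queue and queue[0] < t - w:
                    queue.popleft()
            row.append((len(attempts[key]), len(wins[key])))
        features.append(row)
        # Record the current event only now, so it is not counted for itself
        for w in WINDOWS:
            attempts[(user, skill, w)].append(t)
            if correct:
                wins[(user, skill, w)].append(t)
    return features

events = [
    ('u1', 's1', 0, 1),
    ('u1', 's1', 1800, 0),       # 30 minutes later
    ('u1', 's1', 2 * 86400, 1),  # two days later
]
for row in window_counts(events):
    print(row)
```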

To prepare a dataset like Assistments, see examples in the data folder.
Skill information should be available either as skill_id, or skill_ids separated with ~~, or in a q-matrix q_mat.npz.
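For the skill_ids variant, a sketch of turning "~~"-separated strings into multi-hot skill features (one column per distinct skill) with scikit-learn's MultiLabelBinarizer. The rows and skill ids are hypothetical, and this is only one way to do it, not necessarily what encode_tw.py does internally.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({
    'user_id': [0, 0, 1],
    'item_id': [7, 8, 7],
    'skill_ids': ['12~~45', '45', '12~~45'],  # hypothetical skill ids
    'correct': [1, 0, 1],
})

# Split "12~~45" into ['12', '45'], then build one column per distinct skill
skill_lists = df['skill_ids'].astype(str).str.split('~~')
mlb = MultiLabelBinarizer(sparse_output=True)
X_skills = mlb.fit_transform(skill_lists)  # scipy.sparse matrix, rows x skills

print(mlb.classes_)
print(X_skills.toarray())
```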

python encode_tw.py --dataset dummy_tw --tw  # Will encode DAS3H sparse features into X.npz

Then you can run lr.py or fm.py, see below.

Running an ML model

If you want to encode PFA features:

python encode.py --skills --wins --fails  # Will create X-swf.npz

For logistic regression:

python lr.py data/dummy/X-swf.npz
# Will save weights in coef0.npy

For factorization machines of size d = 5:

python fm.py --d 5 data/dummy/X-swf.npz
# Will save weights in w.npy and V.npy
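For reference, a factorization machine of size d scores a feature vector x as w0 + Σ_i w_i x_i + Σ_{i<j} ⟨V_i, V_j⟩ x_i x_j, and the pairwise sum can be computed in O(nd). The NumPy sketch below assumes a separate scalar bias w0, a weight vector w and an (n_features, d) matrix V; the actual contents of w.npy and V.npy may be laid out differently. With d = 0 the pairwise term vanishes and the model reduces to logistic regression.

```python
import numpy as np

def fm_predict_proba(X, w0, w, V):
    """P(correct) under a degree-2 factorization machine.

    X: dense (n_samples, n_features) array, w0: scalar bias,
    w: (n_features,) linear weights, V: (n_features, d) embeddings.
    """
    linear = w0 + X @ w
    # sum_{i<j} <V_i, V_j> x_i x_j = 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2]
    pairwise = 0.5 * (((X @ V) ** 2) - ((X ** 2) @ (V ** 2))).sum(axis=1)
    return 1 / (1 + np.exp(-(linear + pairwise)))

rng = np.random.default_rng(0)
n_features, d = 6, 5
X = rng.integers(0, 2, size=(4, n_features)).astype(float)
w0, w, V = 0.1, rng.normal(size=n_features), rng.normal(size=(n_features, d))
print(fm_predict_proba(X, w0, w, V))
```

For a scipy.sparse X, replace X ** 2 with X.multiply(X), since ** is a matrix power on sparse matrices.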

The following code requires a user_id column in the CSV file.

Results

Strong generalization

On the Assistments 2009 dataset:

Model	Dimension	AUC	Note
KTM: items, skills, wins, fails, extra	5	0.819
KTM: items, skills, wins, fails, extra	5	0.815	+0.05
KTM: items, skills, wins, fails	10	0.767
KTM: items, skills, wins, fails	0	0.759	+0.02; 0.747 in 2025, random seed 42
(DKT (Wilson et al., 2016))	100	0.743	+0.05
IRT: users, items	0	0.691	0.678 in 2025, random seed 42
PFA: skills, wins, fails	0	0.685	+0.07; 0.703 in 2025, random seed 42
AFM: skills, attempts	0	0.616

On the Duolingo French dataset:

Model	Dimension	AUC	Improvement
KTM	20	0.822	+0.01
DeepFM	20	0.814	+0.04
Logistic regression + L2 reg	0	0.771

We also showed that Knowledge Tracing Machines (Bayesian FMs) got better results than Deep Factorization Machines on the Duolingo dataset. See our article, Deep Factorization Machines for Knowledge Tracing, and our poster at the BEA workshop in New Orleans, LA, on June 5, 2018.

@inproceedings{Vie2018,
  Author = {{Vie}, Jill-J{\^e}nn},
  Booktitle = {{Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications}},
  Pages = {370--373},
  Title = {{Deep Factorization Machines for Knowledge Tracing}},
  Url = {http://arxiv.org/abs/1805.00356},
  Year = 2018
}

Weak generalization

These numbers may vary with the random seed. For the numbers below, we used random_state=42 for the (multicore) 5-fold cross-validation and seed=42 for libFM.

On the Assistments 2009 dataset:

Model (AUC, time)	users + items	skills + wins + fails	items + skills + wins + fails
LR	0.769 (IRT) 4.2s	0.704 (PFA) 15s	0.747 48s
(last) FM d = 5	0.742 1min14s	0.703 54s	0.730 1min48s
(MCMC) FM d = 5	0.767 2min23s	0.705 1min2s	0.749 2min50s

Computation times are given for an i5 processor at 2.6 GHz in performance mode, with 200 epochs of FM training.

Improvement attempts

Efficient scikit-learn implementation for IRT

Check sktm.py.

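import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression

df = pd.read_csv('data/dummy/data.csv')  # needs user, item, correct columns
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)

pipe = Pipeline([
    ('onehot', OneHotEncoder(handle_unknown='ignore')),  # one-hot users and items
    ('lr', LogisticRegression(solver='liblinear'))
])

# IRT: logistic regression on one-hot encoded users and items
pipe.fit(df_train[['user', 'item']], df_train['correct'])
print(pipe.predict_proba(df_test[['user', 'item']])[:, 1])  # P(correct)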

sktm.py contains efficient parallel cross-validation over 5 folds, split by group (i.e., strong generalization).

Usage:

mkdir data/assistments09
wget https://jiji.cat/weasel2018/data.csv -P data/assistments09  # Download data.csv into data/assistments09
python sktm.py data/assistments09/data.csv --feat swf  # Choose which model, swf is PFA
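The grouped, parallel cross-validation described above can be approximated with plain scikit-learn: GroupKFold keeps each user in a single fold, and cross_val_score with n_jobs=-1 runs the folds in parallel. This is a sketch on hypothetical toy data with the IRT pipeline, not the code of sktm.py.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): 100 users, 20 items of increasing difficulty
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({'user': rng.integers(0, 100, n), 'item': rng.integers(0, 20, n)})
difficulty = np.linspace(-2, 2, 20)
p_correct = 1 / (1 + np.exp(difficulty[df['item']]))
df['correct'] = (rng.random(n) < p_correct).astype(int)

pipe = Pipeline([
    ('onehot', OneHotEncoder(handle_unknown='ignore')),  # test users are unseen
    ('lr', LogisticRegression(solver='liblinear')),
])

# Each fold holds out whole users (strong generalization); folds run in parallel
scores = cross_val_score(
    pipe, df[['user', 'item']], df['correct'],
    groups=df['user'], cv=GroupKFold(n_splits=5),
    scoring='roc_auc', n_jobs=-1,
)
print(scores.mean())
```

Because test users never appear in training, their one-hot user columns are ignored and predictions rely on the item parameters only.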

For factorization machines, replace LogisticRegression with FMClassifier (from fm import FMClassifier). There is a subtlety; please contact me to know more.

For an online MIRT model:

python omirt.py --d 0 data/assist09/needed.csv  # Will load LR: coef0.npy
python omirt.py --d 5 data/assist09/needed.csv  # Will load FM: w.npy and V.npy

# Will train an IRT model on the Fraction dataset with learning rate 0.01
python omirt.py --d 0 data/fraction/needed.csv --lr 0.01 --lr2 0.
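To illustrate what "online" means for the d = 0 case, here is a minimal sketch of a logistic IRT model that predicts each answer before seeing it and then takes one SGD step on the user ability and item easiness. It is not omirt.py: the class name, zero initialization and absence of regularization are assumptions; only the learning rate 0.01 comes from the command above.

```python
import math
from collections import defaultdict

def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))

class OnlineIRT:
    """Logistic IRT (ability + easiness) updated with one SGD step per answer."""

    def __init__(self, lr: float = 0.01):
        self.lr = lr
        self.ability = defaultdict(float)   # theta_u, starts at 0
        self.easiness = defaultdict(float)  # beta_i, starts at 0

    def predict(self, user, item) -> float:
        return sigmoid(self.ability[user] + self.easiness[item])

    def update(self, user, item, correct: int) -> float:
        p = self.predict(user, item)
        grad = correct - p  # gradient of the log-likelihood w.r.t. the logit
        self.ability[user] += self.lr * grad
        self.easiness[item] += self.lr * grad
        return p  # prediction made before seeing the answer

model = OnlineIRT(lr=0.01)
stream = [(0, 'a', 1), (0, 'b', 0), (1, 'a', 1), (0, 'a', 1)]
for user, item, correct in stream:
    p = model.update(user, item, correct)
    print(user, item, round(p, 3))
```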

To train an IRT or deeper model with Keras, which handles batching and early stopping:

python dmirt.py data/assist09/needed.csv

It will also create a model.png file showing the architecture (here just IRT with L2 regularization).

