Rank-lips is a high-performance, multi-threaded, list-wise learning-to-rank implementation that supports mini-batched learning. It focuses on coordinate ascent optimizing for mean average precision (MAP), which is one of the strongest approaches to model combination in information retrieval.
Rank-lips is designed to work with trec_eval file formats for defining runs (run format) and relevance data (qrel format). The features are taken from the score and/or reciprocal rank of each input file. The filename of an input run (in the directory) is used as the feature name. If you want to train a model and predict on a different test set, make sure that the input runs for the test features use exactly the same filenames. We recommend creating separate directories for the training and test sets.
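To make the relationship between run files and features concrete, here is a minimal sketch in Python. It assumes the usual six-column trec_eval run line (`qid Q0 docid rank score run_name`) with ranks starting at 1. The function names and the sample file names `bm25.run` and `ql.run` are hypothetical, and the sketch is not rank-lips' own feature extraction code.

```python
from collections import defaultdict


def parse_run_line(line):
    """Parse one trec_eval run line: 'qid Q0 docid rank score run_name'."""
    qid, _q0, docid, rank, score, _name = line.split()
    return qid, docid, int(rank), float(score)


def features_from_runs(runs):
    """Collect per-document features from several runs.

    `runs` maps a feature name (the run's filename) to that file's lines.
    Returns {(qid, docid): {feature_name: (score, reciprocal_rank)}}.
    """
    features = defaultdict(dict)
    for feature_name, lines in runs.items():
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            qid, docid, rank, score = parse_run_line(line)
            # Assumption: ranks start at 1, so 1/rank is well defined.
            features[(qid, docid)][feature_name] = (score, 1.0 / rank)
    return dict(features)


# Hypothetical input: two runs over the same query, one feature each.
runs = {
    "bm25.run": ["q1 Q0 d1 1 12.5 bm25", "q1 Q0 d2 2 9.0 bm25"],
    "ql.run": ["q1 Q0 d2 1 -3.2 ql", "q1 Q0 d1 2 -4.0 ql"],
}
features = features_from_runs(runs)
print(features[("q1", "d2")])
```

Each (query, document) pair ends up with one entry per run file, which is why the filenames have to match between training and test directories.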
Rank-lips is implemented in GHC Haskell, but we also provide Linux binaries. Rank-lips is released AS-IS under the BSD-3-Clause open source license.
Authors: Laura Dietz and Ben Gamari.
The latest release is v1.1.
Training with 5-fold cross-validation
The features are given as a set of run files (one run = one feature). The filename of the run file in the $TRAIN_FEATURE_DIR is used as the feature name. When the trained model is used to predict on a different dataset, the directory of test features needs to use exactly the same filenames. (Hint: you can use a directory of softlinks to define the names.) The ground truth is given in qrels format.
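The softlink hint can be sketched as follows. This is an illustration only: the helper names `link_features` and `missing_features` and all file names are hypothetical, and it assumes a platform where creating symbolic links is permitted.

```python
import os
import tempfile


def link_features(name_to_source, feature_dir):
    """Create feature_dir containing one symlink per feature name."""
    os.makedirs(feature_dir, exist_ok=True)
    for feature_name, source in name_to_source.items():
        os.symlink(os.path.abspath(source), os.path.join(feature_dir, feature_name))


def missing_features(train_dir, test_dir):
    """Return feature names present in train_dir but absent from test_dir."""
    return sorted(set(os.listdir(train_dir)) - set(os.listdir(test_dir)))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical training features: two run files.
    train_dir = os.path.join(tmp, "train-features")
    os.makedirs(train_dir)
    for name in ("bm25.run", "ql.run"):
        open(os.path.join(train_dir, name), "w").close()

    # A test run stored under a different filename.
    test_source = os.path.join(tmp, "bm25-on-test-queries.run")
    open(test_source, "w").close()

    # Expose it under the feature name used during training.
    test_dir = os.path.join(tmp, "test-features")
    link_features({"bm25.run": test_source}, test_dir)

    # Check which training features still lack a test counterpart.
    missing = missing_features(train_dir, test_dir)

print(missing)  # ['ql.run']
```

Running a check like `missing_features` before prediction catches name mismatches early, since the model refers to features only by filename.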
rank-lips train -d "$TRAIN_FEATURE_DIR" -q "$QREL" -e "$FRIENDLY_NAME" -O "$OUT_DIR" -o "$OUT_PREFIX" --z-score --feature-variant FeatScore --mini-batch-size 1000 --convergence-threshold -0.001 --folds 5 --restarts 5 --threads 5 --train-cv
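The idea behind query-level k-fold cross-validation can be sketched in a few lines of Python. How rank-lips actually assigns queries to folds is not stated here, so the round-robin assignment below is an assumption, and the function names are hypothetical.

```python
def assign_folds(query_ids, num_folds):
    """Assign sorted query ids to folds round-robin (an assumed scheme)."""
    folds = [[] for _ in range(num_folds)]
    for i, qid in enumerate(sorted(query_ids)):
        folds[i % num_folds].append(qid)
    return folds


def cv_splits(query_ids, num_folds):
    """Yield (fold index, training queries, held-out queries) per fold."""
    folds = assign_folds(query_ids, num_folds)
    for k, heldout in enumerate(folds):
        # Train on every fold except fold k.
        train = [q for j, fold in enumerate(folds) if j != k for q in fold]
        yield k, train, heldout


queries = [f"q{i}" for i in range(10)]
splits = list(cv_splits(queries, 5))
for k, train, heldout in splits:
    print(k, heldout)
```

Every query is held out exactly once, so a prediction assembled from the held-out parts never comes from a model that saw that query during training.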
The $OUT_DIR will contain the following files:
Cross-validation:
train-run-test.run: prediction with cross-validation (no leakage of test data into the training process! Use this in your research paper!)
train-model-fold-$k-best.json: the model trained for fold $k (best model of 5 restarts)
train-run-fold-$k-best.run: run predicted by the fold's model on the fold's training data (used to compute the training MAP)

Trained on whole data set:
train-model-train.json: model trained on all data (to be used with rank-lips predict)
train-run-train.run: run predicted with the overall train model on the train data (used to produce the training MAP score)

To predict with models trained with previous versions of rank-lips, please include --is-v10-model.
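As an illustration of how a MAP score is obtained from an output run such as train-run-test.run and the qrels, here is a minimal Python sketch. It assumes the standard trec_eval line formats (`qid Q0 docid rank score run_name` and `qid 0 docid relevance`), ranks documents by descending score, and averages over all queries that have at least one relevant document. Tie-breaking and query selection may differ from trec_eval, so use trec_eval for reported numbers. All function names and sample data are hypothetical.

```python
from collections import defaultdict


def read_qrels(lines):
    """Parse qrel lines into {qid: set of relevant docids}."""
    relevant = defaultdict(set)
    for line in lines:
        qid, _iteration, docid, rel = line.split()
        if int(rel) > 0:
            relevant[qid].add(docid)
    return relevant


def read_run(lines):
    """Parse run lines into {qid: [docid, ...]} ordered by descending score."""
    scored = defaultdict(list)
    for line in lines:
        qid, _q0, docid, _rank, score, _name = line.split()
        scored[qid].append((float(score), docid))
    return {
        qid: [docid for _, docid in sorted(docs, key=lambda pair: -pair[0])]
        for qid, docs in scored.items()
    }


def average_precision(ranking, relevant):
    """Average of precision@i over the positions i of relevant documents."""
    hits, total = 0, 0.0
    for i, docid in enumerate(ranking, start=1):
        if docid in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(run, qrels):
    """Mean AP over queries with at least one relevant document."""
    qids = [qid for qid in qrels if qrels[qid]]
    if not qids:
        return 0.0
    return sum(average_precision(run.get(q, []), qrels[q]) for q in qids) / len(qids)


# Hypothetical data: two queries, three judged documents for q1, one for q2.
qrels = read_qrels(["q1 0 d1 1", "q1 0 d2 0", "q1 0 d3 1", "q2 0 d5 1"])
run = read_run([
    "q1 Q0 d2 1 3.0 demo",
    "q1 Q0 d1 2 2.0 demo",
    "q1 Q0 d3 3 1.0 demo",
    "q2 Q0 d5 1 1.0 demo",
])
map_score = mean_average_precision(run, qrels)
print(round(map_score, 4))  # 0.7917
```

For q1 the relevant documents sit at ranks 2 and 3, giving an average precision of (1/2 + 2/3) / 2 = 7/12; q2 scores 1.0, so the mean is 19/24.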