Rank-Lips: Learning-to-Rank on Trec_Eval

High-performance multi-threaded list-wise learning-to-rank implementation that supports mini-batched learning. The implementation focuses on coordinate ascent to optimize for mean-average precision, which is one of the strongest models for model combination in information retrieval.

Rank-lips is designed to work with trec_eval file formats for defining runs (run format) and relevance data (qrel format). The features will be taken from the score and/or reciprocal rank of each input file. The filename of an input run (in the directory) will be used as a feature name. If you want to train a model and predict on a different test set, make sure that the input runs for test features are using exactly the sane filename. We recommend to create different directories for training and test sets.

Rank-lips is implemented in GHC Haskell, but we also provide Linux binaries. Rank-lips is released AS-IS under the BSD-3-Clause open source license.

Authors: Laura Dietz and Ben Gamari.

Installation

The latest release is v1.1.

Download Linux binary (statically compiled).
Build from source using ghcup to get GHC 8.8.3 and Cabal (or GHC for Windows)

Quick Usage

Training with 5-fold cross-validation

The features are given as a set of run-files (one run = one feature). The filename of the runfile in the $TRAIN_FEATURE_DIR is used as a feature name. When the trained model is used to predict on a different dataset, the directory of test features need to use exactly the same file names. (Hint: you can use a directory of softlinks to define the names.) The ground truth is given as a qrels format.

rank-lips train -d "$TRAIN_FEATURE_DIR" -q "$QREL" -e "$FRIENDLY_NAME" -O "$OUT_DIR" -o "$OUT_PREFIX" -z-score --feature-variant FeatScore --mini-batch-size 1000 --convergence-threshold -0.001 --folds 5 --restarts 5 --threads 5 --train-cv

The $OUT_DIR$ will contain the following files:

Cross-validation:

train-run-test.run prediction with cross-validation (no leakage of test data into the training process! Use this in your research paper!)
train-model-fold-$k-best.json the model trained for fold $k (best model of 5 restarts)
train-run-fold-$k-best.run run predicted of fold’s model on the folds’ training data (used to compute the training MAP)

Trained on whole data set:

train-model-train.json model trained on all data (to be used with rank-lips predict)
train-run-train.run run predicted with the overall train model on the train data (used to produce the training MAP score)

To predict with rank-lips trained with previous versions, please include --is-v10-model.

Laura Dietz

Rank-Lips: Learning-to-Rank on Trec_Eval

Installation

Quick Usage

See Also