Rank-Lips: Usage

Rank-lips is designed to combine different retrieval models in a supervised way, also known as "learning to rank". We interface with the most commonly used trec_eval run-file format, which represents rankings for a set of benchmark queries in one large file.

Our implementation of the MAP (mean-average precision) evaluation metric is identical to the one in trec_eval.
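To illustrate the metric (this is a sketch of the standard trec_eval definition, not rank-lips' actual Haskell implementation): average precision for one query is the mean of the precision values at each rank where a relevant document appears, divided by the total number of relevant documents; MAP averages this over all queries.

```python
def average_precision(ranking, relevant):
    """AP for one query.

    `ranking` is a list of document ids in rank order; `relevant` is the
    set of relevant document ids from the qrels. As in trec_eval, the
    sum of precisions at relevant ranks is divided by the total number
    of relevant documents.
    """
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, qrels):
    """MAP: mean of per-query AP over all queries in the qrels."""
    return sum(average_precision(rankings.get(q, []), rel)
               for q, rel in qrels.items()) / len(qrels)
```

For example, a ranking [d1, d2, d3] with relevant set {d1, d3} scores AP = (1/1 + 2/3) / 2 = 5/6.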

Training

A model is trained from features (as input run files) and a ground truth (as qrels). The filename of each input run will be used as a feature name. Up to two features are derived from one line in the run file: the rank score (feature variant FeatScore) and/or the reciprocal rank (FeatRecipRank).
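The derivation of the two feature variants can be sketched as follows (an illustration only, assuming the standard six-field trec_eval run-file layout of query, literal "Q0", document, rank, score, and run name):

```python
def features_from_run_line(line):
    """Derive both feature variants from one trec_eval run-file line.

    Assumed field layout: query Q0 document rank score run-name.
    """
    query, _q0, doc, rank, score, _run = line.split()
    return query, doc, {
        "FeatScore": float(score),         # the retrieval score, as-is
        "FeatRecipRank": 1.0 / int(rank),  # reciprocal of the rank
    }
```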

Command: rank-lips train -d $TRAIN_FEATURE_DIR -f RUN_A -f RUN_B -f RUN_C -q QRELS -O $OUT_DIR -o OUT_PREFIX -e "experiment 1"

Missing features will be set to 0; please see below for fine-grained control over default feature values.

By default, both feature variants are included. You can restrict the run to a single feature variant via a command-line option.

You can enable z-score normalization with --z-score. If z-score normalization is activated during training, it will automatically be applied during prediction, using the mean and standard deviation of the training set.
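A minimal sketch of this normalization (assuming population statistics; rank-lips may use the sample standard deviation instead): statistics are fit on the training set and then reused unchanged at prediction time.

```python
import statistics


def zscore_params(train_values):
    """Fit mean and (population) standard deviation on the training set."""
    return statistics.mean(train_values), statistics.pstdev(train_values)


def apply_zscore(values, mean, stdev):
    """Normalize a feature column with the *training* statistics.

    A zero stdev (constant feature) maps everything to 0.0 to avoid
    division by zero.
    """
    if stdev == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]
```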

In general, whichever settings were enabled during training will automatically be applied during prediction (they are stored in the model file).

Training with Cross Validation

If cross-validation is enabled, the data will be automatically split into five folds, training a model per fold and predicting on that fold's held-out test queries.

Command: rank-lips train [...] --train-cv

You can specify the number of folds used with --folds K (default is 5).

If you intend to re-use the trained per-fold models, you will have to take precautions to only apply them to held-out queries (i.e., those not used for training). For verification, rank-lips can export the held-out queries along with the model using the option --save-heldout-queries-in-model.

Prediction

Once a rank-lips model is trained, you can use it to predict rankings on a separate test set.

The model file format has changed compared to version 1.0. To load v1.0 models, include the command line flag --is-v10-model.

Optimization Parameters for Training

Rank-lips’ training procedure can be adjusted to your needs through command-line parameters. The parameter settings are archived in the JSON of the rank-lips model file.

By default, rank-lips will use 5 restarts and 5 folds. You can change these with command-line options.

Rank-lips uses coordinate ascent to find the parameters of a linear model that yield the best MAP score on the training set. To detect convergence, we use the relative change in MAP score from epoch to epoch, and stop when the relative change is less than FACTOR. An initial number of epochs can be dropped, and an upper limit on the number of iterations can be set.
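The convergence test can be sketched as follows (an illustration of the criterion described above; the variable names are assumptions, with `factor` standing in for FACTOR):

```python
def has_converged(prev_map, curr_map, factor=1e-4):
    """Stop when the relative epoch-to-epoch MAP change drops below
    `factor` (FACTOR in the text)."""
    if prev_map == 0.0:
        # No baseline yet; keep optimizing.
        return False
    return abs(curr_map - prev_map) / prev_map < factor
```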

Whenever the training data is large (e.g., beyond 100 queries with 1000 documents each), one training epoch may take a very long time without offering much utility to the parameter optimization. By default, rank-lips will use mini-batches of SIZE training queries to determine the gradient; every STEPS epochs, a new batch of queries will be chosen. (This technique is a form of stochastic gradient descent.) To diagnose convergence, the MAP score on the full training set will be used. Since this evaluation can be expensive, it is skipped for EVAL many mini-batch iterations.
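The mini-batch schedule can be sketched like this (illustration only; `size` and `steps` stand in for SIZE and STEPS, and the sampling strategy is an assumption):

```python
import random


def minibatch_schedule(queries, size, steps, epochs, seed=0):
    """Yield (epoch, batch) pairs: every `steps` epochs, sample a fresh
    mini-batch of `size` training queries to determine the gradient."""
    rng = random.Random(seed)
    batch = None
    for epoch in range(epochs):
        if epoch % steps == 0:
            batch = rng.sample(queries, min(size, len(queries)))
        yield epoch, batch
```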

We ask you to set a friendly name for the experiment being conducted. This experiment name will be archived in the model file.

Multi-threading can be enabled with a command-line option.

If you train a model on a feature set and then use it to predict on the identical feature set, you will obtain the training MAP score; please do not use such numbers in your paper. Only report numbers on a held-out test set.

Command: rank-lips predict -m MODEL -d FEATURE_DIR -f RUN_A -f RUN_B -f RUN_C [-q QRELS] -O OUT_DIR -o OUT_PREFIX

Default Feature Values

When not all features are defined for all query/document combinations, it is strongly recommended to set a default feature value, which will be used to fill in the missing feature values.

We offer three mutually exclusive default feature modes:

To set default values for multiple features / feature-variants, please repeat the parameter, e.g. --default-feature-value FeatureA-FeatScore=-10.0 --default-feature-value FeatureB-FeatScore=0.0

If no default feature option is passed on the command line, rank-lips will set any missing feature to 0.0.
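The fill-in behavior can be sketched as follows (an illustration of the semantics described above, not rank-lips' actual code; `defaults` mirrors the per-feature values given via --default-feature-value):

```python
def fill_defaults(feature_vector, all_features, defaults=None):
    """Complete a query/document feature vector.

    Any feature missing from `feature_vector` takes its value from
    `defaults` (keyed by names like "FeatureA-FeatScore"); features
    without an entry fall back to 0.0, matching rank-lips' behavior
    when no default-feature option is passed.
    """
    defaults = defaults or {}
    return {f: feature_vector.get(f, defaults.get(f, 0.0))
            for f in all_features}
```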