Rank-Lips: Example

A small example to illustrate the use of [rank-lips](index.html).

Download and unpack the archive rank-lips-example.tar.gz, or set up the example manually:

  1. Create a data directory and download the training and test qrels.

  2. Create a subdirectory for train-features (FeatureA, FeatureB, FeatureC).

  3. Create a subdirectory for test-features (FeatureA, FeatureB, FeatureC).

Feature filenames must be the same for training and test, so the train and test features have to be placed in different directories.

This example has three training queries. However, only queries with both positive and negative training data can be used for determining a gradient. Q3 has one positive example but no negative training data, so rank-lips removes it from training. (During prediction, such queries are included.)

The qrel file does not need to be complete: missing entries are treated as non-relevant (negative), just as in trec_eval.
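The filtering rule described above can be sketched in a few lines of Python. This is only an illustration, not rank-lips code: the helper names `parse_qrels` and `trainable_queries`, the qrel lines, and the candidate document lists are invented for this sketch. A document that is missing from the qrels counts as non-relevant, as stated above.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC-style qrel lines of the form 'query 0 doc relevance'."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query, _, doc, rel = parts
        qrels[query][doc] = int(rel)
    return qrels


def trainable_queries(qrels, candidates):
    """Return queries that have at least one positive and one negative document."""
    usable = []
    for query, docs in candidates.items():
        # Documents without a qrel entry default to relevance 0 (negative).
        labels = [qrels.get(query, {}).get(doc, 0) > 0 for doc in docs]
        if any(labels) and not all(labels):
            usable.append(query)
    return sorted(usable)


# Hypothetical data: Q3 has a single positive document and no negatives.
qrel_lines = ["Q1 0 doc1 1", "Q2 0 doc5 1", "Q3 0 doc11 1"]
candidates = {
    "Q1": ["doc1", "doc2", "doc3"],
    "Q2": ["doc4", "doc5"],
    "Q3": ["doc11"],
}

usable = trainable_queries(parse_qrels(qrel_lines), candidates)
print(usable)  # ['Q1', 'Q2']
```

In this toy input, Q3 is dropped for the same reason as in the example data: there is no negative document to contrast the positive one with.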

The following commands are provided in run.sh.


  1. Train a model with the command rank-lips train. For a full explanation of the command-line parameters, see rank-lips train --help.

    Command: rank-lips train -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-try1 --z-score --default-any-feature-value 0.0 --convergence-threshold 0.001 --mini-batch-size 1000 --folds 2 --restarts 10 --save-heldout-queries-in-model --feature-variant FeatScore

  2. You will see information about training progress and restarts as well as the final training MAP score in the output.

    full restart 3 iteration 3, score 0.9166666666666666 -> 0.9166666666666666 rel 0.0
    full restart 4 iteration 1, score 0.725 -> 0.85 rel 0.14705882352941177
    full restart 4 iteration 2, score 0.85 -> 0.9166666666666666 rel 7.272727272727271e-2
    full restart 4 iteration 3, score 0.9166666666666666 -> 0.9166666666666666 rel 0.0
    Model train train metric 0.9166666666666666 MAP.
    Written model train to file "./out/train-try1-model-train.json" .
    Model train test metric 0.9166666666666666 MAP.
    dumped all models and rankings
  3. The $OUT directory (./out in the command above) will contain new files: the model train-try1-model-train.json and the predicted run on the training set, train-try1-run-train.run.

  4. Explore trained model weights by inspecting the model JSON.

      "rankLipsTrainedModel": {
        "FeatureC-FeatScore": -9.533877827456385,
        "FeatureB-FeatScore": 2.102726478972383,
        "FeatureA-FeatScore": -0.14793830970606084
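To get a feel for what these weights mean, here is a minimal Python sketch. It rests on two assumptions that are not confirmed by this page: that the model scores a document as a weighted sum of its feature values, and that --z-score normalizes each feature to zero mean and unit variance. The weights are copied from the excerpt above; the raw feature values and the helper names `zscore` and `linear_score` are hypothetical.

```python
from statistics import mean, pstdev

# Weights copied from the model excerpt above.
weights = {
    "FeatureA-FeatScore": -0.14793830970606084,
    "FeatureB-FeatScore": 2.102726478972383,
    "FeatureC-FeatScore": -9.533877827456385,
}

# Hypothetical raw feature values for three documents of one query.
raw = {
    "doc1": {"FeatureA-FeatScore": 1.0, "FeatureB-FeatScore": 0.2, "FeatureC-FeatScore": 0.9},
    "doc2": {"FeatureA-FeatScore": 2.0, "FeatureB-FeatScore": 0.8, "FeatureC-FeatScore": 0.1},
    "doc3": {"FeatureA-FeatScore": 3.0, "FeatureB-FeatScore": 0.5, "FeatureC-FeatScore": 0.5},
}


def zscore(rows):
    """Normalize every feature to zero mean and unit variance across all rows."""
    names = sorted({name for feats in rows.values() for name in feats})
    stats = {}
    for name in names:
        values = [feats[name] for feats in rows.values()]
        sigma = pstdev(values)
        # Guard against constant features to avoid division by zero.
        stats[name] = (mean(values), sigma if sigma > 0 else 1.0)
    return {
        doc: {n: (v - stats[n][0]) / stats[n][1] for n, v in feats.items()}
        for doc, feats in rows.items()
    }


def linear_score(model_weights, feats):
    """Weighted sum of feature values (assumed scoring function)."""
    return sum(w * feats[name] for name, w in model_weights.items())


normalized = zscore(raw)
scores = {doc: linear_score(weights, feats) for doc, feats in normalized.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['doc2', 'doc3', 'doc1']
```

Under these assumptions, the large negative weight on FeatureC-FeatScore dominates the toy ranking: the document with the lowest FeatureC value ends up on top.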


  1. Test with the command rank-lips predict. Note that a slightly different set of parameters applies (see rank-lips predict --help).

    Command: rank-lips predict -d ./test-features -q test.qrel -O ./out -o predict-try1 -m ./out/train-try1-model-train.json --feature-variant FeatScore

  2. In the output you will see information about the test MAP score.

    Model predict test metric 0.41666666666666663 MAP.

  3. The $OUT directory will contain a predicted test run file, test-run-predict.run:

    Q10 Q0 doc2 1 1.731316541796714 l2r predict
    Q10 Q0 doc3 2 1.7202359601271957 l2r predict
    Q10 Q0 doc1 3 0.6309246349602737 l2r predict
    Q11 Q0 doc6 1 1.8836096930688682 l2r predict
    Q11 Q0 doc7 2 1.8808550789709522 l2r predict
    Q11 Q0 doc8 3 1.8359348755751346 l2r predict
    Q11 Q0 doc9 4 1.7806539721177292 l2r predict
    Q11 Q0 doc10 5 1.7072597968074712 l2r predict
    Q11 Q0 doc5 6 1.4889951940566715 l2r predict
    Q11 Q0 doc4 7 1.4024090651371584 l2r predict
  4. Verify the MAP score with trec_eval test.qrel out/test-run-predict.run -m map:

    map                     all     0.4167
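For readers who want to see what the MAP number measures, the following Python sketch computes mean average precision from a run file in the format shown above. The run lines are copied from the predicted ranking; the relevance judgments are invented for illustration and are not the contents of test.qrel, so the result differs from the 0.4167 reported above. The helper names `parse_run` and `average_precision` are hypothetical.

```python
def parse_run(lines):
    """Parse run lines 'query Q0 doc rank score tag' into ranked doc lists."""
    entries = {}
    for line in lines:
        query, _, doc, rank, _score, _tag = line.split(maxsplit=5)
        entries.setdefault(query, []).append((int(rank), doc))
    return {q: [doc for _, doc in sorted(items)] for q, items in entries.items()}


def average_precision(ranked_docs, relevant):
    """Average of precision values at the ranks of the relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for position, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


run_lines = [
    "Q10 Q0 doc2 1 1.731316541796714 l2r predict",
    "Q10 Q0 doc3 2 1.7202359601271957 l2r predict",
    "Q10 Q0 doc1 3 0.6309246349602737 l2r predict",
    "Q11 Q0 doc6 1 1.8836096930688682 l2r predict",
    "Q11 Q0 doc7 2 1.8808550789709522 l2r predict",
    "Q11 Q0 doc8 3 1.8359348755751346 l2r predict",
    "Q11 Q0 doc9 4 1.7806539721177292 l2r predict",
    "Q11 Q0 doc10 5 1.7072597968074712 l2r predict",
    "Q11 Q0 doc5 6 1.4889951940566715 l2r predict",
    "Q11 Q0 doc4 7 1.4024090651371584 l2r predict",
]

# Hypothetical judgments, NOT taken from test.qrel.
relevant = {"Q10": {"doc2"}, "Q11": {"doc7", "doc5"}}

ranking = parse_run(run_lines)
ap = {q: average_precision(ranking.get(q, []), docs) for q, docs in relevant.items()}
mean_ap = sum(ap.values()) / len(ap)
print(round(mean_ap, 4))  # 0.7083
```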


Because Q3 is removed, the example has only two usable training queries, so only two folds are used (one query each).
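As a rough picture of what "one query per fold" means, here is a small Python sketch of a query-level split. How rank-lips actually assigns queries to folds is not described on this page; the round-robin assignment and the helper name `make_folds` are assumptions made for illustration.

```python
def make_folds(query_ids, k):
    """Split queries into at most k folds; each fold is held out once."""
    queries = sorted(query_ids)
    k = min(k, len(queries))  # never more folds than queries
    held_out = [[] for _ in range(k)]
    for index, query in enumerate(queries):
        held_out[index % k].append(query)  # round-robin assignment (assumed)
    return [
        {"test": fold, "train": [q for q in queries if q not in fold]}
        for fold in held_out
    ]


folds = make_folds(["Q1", "Q2"], k=2)
for fold in folds:
    print(fold)
# {'test': ['Q1'], 'train': ['Q2']}
# {'test': ['Q2'], 'train': ['Q1']}
```

With only two usable queries, each fold trains on one query and is tested on the other, which is why the per-fold scores in such a tiny example are noisy.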

  1. You can also train with cross-validation enabled by adding --train-cv, as in rank-lips train -O $DTA_PATH -o "cv-example" -d $TRAIN_FEATURES -q $QREL --train-cv. This performs both normal training and cross-validated training.

    Command: rank-lips train --train-cv -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o cv-try1 --z-score --default-any-feature-value 0.0 --convergence-threshold 0.001 --mini-batch-size 1000 --folds 2 --restarts 10 --save-heldout-queries-in-model --feature-variant FeatScore

  1. The output contains information about both the regular training (training MAP) and the cross-validated MAP (reported as "Model test test metric"):

    full restart 4 iteration 1, score 0.875 -> 0.9166666666666666 rel 4.5454545454545414e-2
    full restart 4 iteration 2, score 0.9166666666666666 -> 0.9166666666666666 rel 0.0
    FoldIdx 1 restart 4 iteration 2, score 1.0 -> 1.0 rel 0.0
    Model fold-1-best test metric 0.45 MAP.
    Model fold-1-best train metric 1.0 MAP.
    Written model fold-1-best to file "./out/cv-try1-model-fold-1-best.json" .
    full restart 4 iteration 3, score 0.9166666666666666 -> 0.9166666666666666 rel 0.0
    Model train test metric 0.9166666666666666 MAP.
    Model test test metric 0.475 MAP.
    Model train train metric 0.9166666666666666 MAP.
    Written model train to file "./out/cv-try1-model-train.json" .
    dumped all models and rankings

  1. The predicted ranking:

    Q1 Q0 doc3 1 0.8124722217132947 l2r test
    Q1 Q0 doc2 2 0.3361474803069571 l2r test
    Q1 Q0 doc1 3 -2.3958635914467803 l2r test
    Q2 Q0 doc9 1 0.6736768036795666 l2r test
    Q2 Q0 doc8 2 0.5940152842882087 l2r test
    Q2 Q0 doc7 3 0.5721800884716158 l2r test
    Q2 Q0 doc6 4 0.34189284390689123 l2r test
    Q2 Q0 doc5 5 0.22965047894959037 l2r test
    Q2 Q0 doc4 6 -0.23827775903504964 l2r test

  1. Training MAP and cross-validated test MAP can be seen in the output:

    Model train train metric 0.9166666666666666 MAP.
    Model test test metric 0.39166666666666666 MAP.

  1. A comparison with trec_eval yields the same result:

    trec_eval -m map train.qrel ./out/cv-try1-run-test.run
    map                     all     0.3917

  1. Please note: In cross-validation mode, only queries with both positive and negative instances are used (in our example, Q3 is dropped). This also affects test performance, so trec_eval -c may report a different value:

    trec_eval -c -m map train.qrel ./out/cv-try1-run-test.run
    map                     all     0.2611

  1. You will find more models in $OUT: one trained on the whole set (*-model-train.json) and one for each fold (*-model-fold-$k-best.json; two folds in this example, given --folds 2). You will also find more run files, one per model (*-run-fold-$k-best.run), and a cross-validated run file for testing (*-run-test.run).
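The gap between the two trec_eval numbers above follows from simple arithmetic. Without -c, trec_eval averages over the queries that appear in the run (Q1 and Q2). With -c, it averages over all queries in the qrels, and a query missing from the run (Q3) contributes 0. The short Python check below reproduces the reported value; the query counts are taken from this example, and the variable names are only illustrative.

```python
# MAP reported for the cross-validated run, averaged over the queries in the run.
map_over_run_queries = 0.39166666666666666
queries_in_run = 2    # Q1 and Q2
queries_in_qrels = 3  # Q1, Q2 and Q3 (Q3 is missing from the run)

# Sum of the per-query average precision values of Q1 and Q2.
ap_sum = map_over_run_queries * queries_in_run

# With -c, Q3 adds 0 to the sum but still counts in the denominator.
map_with_c = ap_sum / queries_in_qrels
print(f"{map_with_c:.4f}")  # 0.2611
```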

More Options

  1. Enabling only features A and B with -f FeatureA -f FeatureB (these are filenames in the feature directory).

    Command: rank-lips train -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-try-feature-subset --z-score --convergence-threshold 0.001 --mini-batch-size 1000 -f FeatureA -f FeatureB --feature-variant FeatScore

    loadRunFiles FeatureA FeatureB
    Feature dimension: 2

  1. Disabling z-score normalization during training may make model learning less resilient. (To disable it, delete the --z-score option from the command above.)

    Command: rank-lips train -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-try-feature-subset --convergence-threshold 0.001 --mini-batch-size 1000 -f FeatureA -f FeatureB --feature-variant FeatScore

  1. Enabling different feature variants with --feature-variant FeatScore --feature-variant FeatRecipRank.

    Command: rank-lips train -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-try-feature-variants --z-score --convergence-threshold 0.001 --mini-batch-size 1000 --feature-variant FeatScore --feature-variant FeatRecipRank

    The trained model then contains one weight per combination of feature and variant:

      "rankLipsTrainedModel": {
        "FeatureA-FeatRecipRank": 2.0027944151298925e-05,
        "FeatureB-FeatRecipRank": 2.0027944151298925e-05,
        "FeatureC-FeatScore": -2.309760786659935,
        "FeatureC-FeatRecipRank": -0.7356527425044204,
        "FeatureB-FeatScore": 5.3752568906776315,
        "FeatureA-FeatScore": -0.009620874528474738
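The sketch below illustrates how one feature file could give rise to two feature variants. It assumes that FeatScore is the score stored in the feature file and that FeatRecipRank is 1/rank of the document within that file; the page does not spell out these definitions, so treat them as a plausible reading of the names. The naming scheme `<file>-<variant>` follows the model JSON in this example; the helper `feature_variants` and the input ranking are hypothetical.

```python
def feature_variants(run_name, ranked_entries):
    """Derive score and reciprocal-rank features from one ranked feature file.

    ranked_entries: list of (doc, score) pairs, best document first.
    """
    features = {}
    for rank, (doc, score) in enumerate(ranked_entries, start=1):
        features[doc] = {
            f"{run_name}-FeatScore": score,
            f"{run_name}-FeatRecipRank": 1.0 / rank,  # assumed definition
        }
    return features


# Hypothetical ranking stored in the feature file "FeatureA" for one query.
feature_a = [("doc3", 12.5), ("doc2", 7.0), ("doc1", 0.5)]

features = feature_variants("FeatureA", feature_a)
print(features["doc2"])
# {'FeatureA-FeatScore': 7.0, 'FeatureA-FeatRecipRank': 0.5}
```

With three feature files and two variants, this yields six features per document, matching the six weights in the model excerpt of this example.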

  1. Explore changing the training parameters, such as the convergence threshold and the mini-batch size: --convergence-threshold 0.0001 --mini-batch-size 1.

    We recommend using a validation set to choose these parameters and to trade off speed against performance. In this example we chose a mini-batch of a single query (1) because we only have 2 training queries in total. Typical mini-batch sizes are 10, 100, or 1000.

    Command: rank-lips train -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-try-convergence-mini-batch --z-score --convergence-threshold 0.0001 --mini-batch-size 1 --feature-variant FeatScore --feature-variant FeatRecipRank
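The "rel" column in the training logs earlier in this example is numerically consistent with a relative improvement of the form (new - old) / new. The Python sketch below checks that reading against three logged iterations and shows how a convergence threshold could act on it. Both the formula and the stopping rule are inferred from the printed numbers, not taken from the rank-lips source; the function names are hypothetical.

```python
def relative_change(old_score, new_score):
    """Relative improvement between two iterations (inferred formula)."""
    if new_score == 0:
        return 0.0
    return (new_score - old_score) / new_score


def has_converged(old_score, new_score, threshold):
    """Stop once the relative improvement falls below the threshold (assumed rule)."""
    return abs(relative_change(old_score, new_score)) < threshold


# (old score, new score, logged 'rel' value) taken from the logs in this example.
logged = [
    (0.725, 0.85, 0.14705882352941177),
    (0.85, 0.9166666666666666, 7.272727272727271e-2),
    (0.875, 0.9166666666666666, 4.5454545454545414e-2),
]

for old, new, rel in logged:
    print(f"{old} -> {new}: computed {relative_change(old, new):.6f}, logged {rel:.6f}")

# An iteration with no improvement stops training at either threshold.
print(has_converged(0.9166666666666666, 0.9166666666666666, 0.001))  # True
```

Under this reading, lowering the threshold from 0.001 to 0.0001 only matters for iterations whose improvement lies between those two values; it buys extra iterations at the cost of training time.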

  1. When not all features are defined for all query/document combinations, we strongly recommend setting at least a default value that is used for any feature. Finer control is available by setting defaults per feature variant…

    Command: rank-lips train -d ./train-features-with-missing -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-missing-features-variants --feature-variant FeatScore --feature-variant FeatRecipRank --default-feature-variant-value FeatRecipRank=100.0 --default-feature-variant-value FeatScore=-9999.99

    … or per feature (which is a combination of run filename and feature variant).

    Command: rank-lips train -d ./train-features-with-missing -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-missing-features --feature-variant FeatScore -f FeatureA -f FeatureB --default-feature-value FeatureA-FeatScore=-10.0 --default-feature-value FeatureB-FeatScore=0.0
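The three levels of defaults (any feature, per feature variant, per feature) can be pictured as a lookup with fallbacks. The Python sketch below assumes that the most specific setting wins: per-feature first, then per-variant, then the any-feature default. This precedence is an assumption and is not stated on this page; the helper `resolve_default` is hypothetical, while the numeric values are taken from the commands above.

```python
def resolve_default(feature, variant, per_feature, per_variant, any_default=None):
    """Pick the default value for a missing feature, most specific setting first."""
    name = f"{feature}-{variant}"
    if name in per_feature:        # like --default-feature-value NAME=VALUE
        return per_feature[name]
    if variant in per_variant:     # like --default-feature-variant-value VARIANT=VALUE
        return per_variant[variant]
    if any_default is not None:    # like --default-any-feature-value VALUE
        return any_default
    raise KeyError(f"no default value configured for {name}")


per_feature = {"FeatureA-FeatScore": -10.0, "FeatureB-FeatScore": 0.0}
per_variant = {"FeatScore": -9999.99}
any_default = 0.0

print(resolve_default("FeatureA", "FeatScore", per_feature, per_variant, any_default))      # -10.0
print(resolve_default("FeatureC", "FeatScore", per_feature, per_variant, any_default))      # -9999.99
print(resolve_default("FeatureC", "FeatRecipRank", per_feature, per_variant, any_default))  # 0.0
```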