A small example to illustrate the use of [rank-lips](index.html).
Download and unpack the archive rank-lips-example.tar.gz, or:
Create a data directory and download the training/test qrels here.
Create a subdirectory for train-features (FeatureA, FeatureB, FeatureC).
Create a subdirectory for test-features (FeatureA, FeatureB, FeatureC).
Since the feature filenames need to be consistent, you have to place train and test features in different directories.
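The setup steps above can be sketched as a shell snippet. The files created here are empty placeholders for illustration only; the real feature files are run files that ship with the example archive:

```shell
# Illustrative layout only: one subdirectory per split, with identically
# named feature files in each (placeholder files stand in for real run files).
mkdir -p data/train-features data/test-features
touch data/train-features/FeatureA data/train-features/FeatureB data/train-features/FeatureC
touch data/test-features/FeatureA  data/test-features/FeatureB  data/test-features/FeatureC
```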
This example has three training queries; however, only queries with both positive and negative training data can be used to determine a gradient. Here, Q3 has one positive example but no negative training data, so rank-lips removes it during training. (During prediction, such queries are included.)
The qrel file does not need to be complete; missing entries are treated as non-relevant (negative), just as in trec_eval.
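A hypothetical train.qrel in the standard TREC qrel format (query id, iteration, document id, relevance); these judgments are invented for illustration and are not the example's actual data:

```shell
# Hypothetical qrel contents; format: <query-id> <iteration> <doc-id> <relevance>
cat > train.qrel <<'EOF'
Q1 0 doc1 1
Q1 0 doc2 0
Q2 0 doc4 1
Q2 0 doc5 0
Q3 0 doc10 1
EOF
```

Any query/document pair not listed counts as non-relevant, so a query like Q3 above carries positive judgments only.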
The following commands are provided in run.sh.
Train with the command rank-lips train. For a full explanation of command line parameters, see rank-lips train --help.
Command: rank-lips train -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-try1 --z-score --default-any-feature-value 0.0 --convergence-threshold 0.001 --mini-batch-size 1000 --folds 2 --restarts 10 --save-heldout-queries-in-model --feature-variant FeatScore
You will see information about training progress and restarts as well as the final training MAP score in the output.
Output:
[...]
full restart 3 iteration 3, score 0.9166666666666666 -> 0.9166666666666666 rel 0.0
full restart 4 iteration 1, score 0.725 -> 0.85 rel 0.14705882352941177
full restart 4 iteration 2, score 0.85 -> 0.9166666666666666 rel 7.272727272727271e-2
full restart 4 iteration 3, score 0.9166666666666666 -> 0.9166666666666666 rel 0.0
Model train train metric 0.9166666666666666 MAP.
Written model train to file "./out/train-try1-model-train.json" .
Model train test metric 0.9166666666666666 MAP.
dumped all models and rankings
The $OUT directory will contain new files: the model train-try1-model-train.json and the predicted run on the train set train-try1-run-train.run.
Explore trained model weights by inspecting the model JSON.
"rankLipsTrainedModel": {
"FeatureC-FeatScore": -9.533877827456385,
"FeatureB-FeatScore": 2.102726478972383,
"FeatureA-FeatScore": -0.14793830970606084
}
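The model JSON stores one weight per feature, so a document's score is (up to the z-score normalization) the weighted sum of its feature values. A sketch with the weights above and hypothetical, already-normalized feature values:

```shell
# fa, fb, fc are hypothetical normalized feature values for one document;
# the weights are taken from the model JSON above.
awk 'BEGIN {
  fa = 1.0; fb = 1.0; fc = 1.0
  score = (-0.14793830970606084)*fa + (2.102726478972383)*fb + (-9.533877827456385)*fc
  printf "%.4f\n", score   # prints -7.5791
}'
```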
Test with the command rank-lips predict. Note that a slightly different set of parameters applies (see rank-lips predict --help).
Command: rank-lips predict -d ./test-features -q test.qrel -O ./out -o predict-try1 -m ./out/train-try1-model-train.json --feature-variant FeatScore
In the output you will see information about the test MAP score.
Model predict test metric 0.41666666666666663 MAP.
The $OUT directory will contain a predicted test run file test-run-predict.run:
Q10 Q0 doc2 1 1.731316541796714 l2r predict
Q10 Q0 doc3 2 1.7202359601271957 l2r predict
Q10 Q0 doc1 3 0.6309246349602737 l2r predict
Q11 Q0 doc6 1 1.8836096930688682 l2r predict
Q11 Q0 doc7 2 1.8808550789709522 l2r predict
Q11 Q0 doc8 3 1.8359348755751346 l2r predict
Q11 Q0 doc9 4 1.7806539721177292 l2r predict
Q11 Q0 doc10 5 1.7072597968074712 l2r predict
Q11 Q0 doc5 6 1.4889951940566715 l2r predict
Q11 Q0 doc4 7 1.4024090651371584 l2r predict
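Run files follow the usual TREC run format (query id, iteration, document id, rank, score, run name), and within each query the ranks follow descending score. A small sanity check, here over a copy of the first three lines above:

```shell
cat > predict.run <<'EOF'
Q10 Q0 doc2 1 1.731316541796714 l2r predict
Q10 Q0 doc3 2 1.7202359601271957 l2r predict
Q10 Q0 doc1 3 0.6309246349602737 l2r predict
EOF
# Within a query, the score (field 5) must not increase as rank increases.
awk '$1 == prev1 && $5+0 > prev5+0 { bad = 1 }
     { prev1 = $1; prev5 = $5 }
     END { print (bad ? "out of order" : "ok") }' predict.run   # prints ok
```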
Verify the MAP score with trec_eval -m map test.qrel out/test-run-predict.run:
map all 0.4167
The example has only two queries, therefore only two folds are being used (one query each).
Running rank-lips train with --train-cv (e.g. rank-lips train -O $DATA_PATH -o "cv-example" -d $TRAIN_FEATURES -q $QREL --train-cv) will perform normal training AND cross-validated training.
Command: rank-lips train --train-cv -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o cv-try1 --z-score --default-any-feature-value 0.0 --convergence-threshold 0.001 --mini-batch-size 1000 --folds 2 --restarts 10 --save-heldout-queries-in-model --feature-variant FeatScore
[...]
full restart 4 iteration 1, score 0.875 -> 0.9166666666666666 rel 4.5454545454545414e-2
full restart 4 iteration 2, score 0.9166666666666666 -> 0.9166666666666666 rel 0.0
FoldIdx 1 restart 4 iteration 2, score 1.0 -> 1.0 rel 0.0
Model fold-1-best test metric 0.45 MAP.
Model fold-1-best train metric 1.0 MAP.
Written model fold-1-best to file "./out/cv-try1-model-fold-1-best.json" .
full restart 4 iteration 3, score 0.9166666666666666 -> 0.9166666666666666 rel 0.0
Model train test metric 0.9166666666666666 MAP.
Model test test metric 0.475 MAP.
Model train train metric 0.9166666666666666 MAP.
Written model train to file "./out/cv-try1-model-train.json" .
dumped all models and rankings
Q1 Q0 doc3 1 0.8124722217132947 l2r test
Q1 Q0 doc2 2 0.3361474803069571 l2r test
Q1 Q0 doc1 3 -2.3958635914467803 l2r test
Q2 Q0 doc9 1 0.6736768036795666 l2r test
Q2 Q0 doc8 2 0.5940152842882087 l2r test
Q2 Q0 doc7 3 0.5721800884716158 l2r test
Q2 Q0 doc6 4 0.34189284390689123 l2r test
Q2 Q0 doc5 5 0.22965047894959037 l2r test
Q2 Q0 doc4 6 -0.23827775903504964 l2r test
Model train train metric 0.9166666666666666 MAP.
Model test test metric 0.39166666666666666 MAP.
trec_eval yields the same result: trec_eval -m map train.qrel ./out/cv-try1-run-test.run
map all 0.3917
trec_eval -c may yield a different value: trec_eval -c -m map train.qrel ./out/cv-try1-run-test.run
map all 0.2611
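The two values are consistent: with -c, trec_eval averages over every query in the qrel file (all three, including Q3, which has no results in the run) instead of only the two queries present in the run, so the score shrinks by a factor of 2/3:

```shell
# MAP averaged over 2 run queries, re-averaged over all 3 qrel queries.
awk 'BEGIN { printf "%.4f\n", 0.39166666666666666 * 2 / 3 }'   # prints 0.2611
```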
The $OUT directory will contain several models: one trained on the whole set (*-model-train.json) and one for each fold (*-model-fold-$k-best.json). You will also find more run files (one per model, *-run-fold-$k-best.run) and a cross-validated run file for testing, *-run-test.run.
To train on a subset of features, pass -f FeatureA -f FeatureB (these are filenames in the feature directory).
Command: rank-lips train -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-try-feature-subset --z-score --convergence-threshold 0.001 --mini-batch-size 1000 -f FeatureA -f FeatureB --feature-variant FeatScore
loadRunFiles FeatureA FeatureB
Feature dimension: 2
To disable z-score normalization of feature values, drop the --z-score option from the command above.
Command: rank-lips train -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-try-feature-subset --convergence-threshold 0.001 --mini-batch-size 1000 -f FeatureA -f FeatureB --feature-variant FeatScore
"rankLipsTrainedModel": {
"FeatureA-FeatRecipRank": 2.0027944151298925e-05,
"FeatureB-FeatRecipRank": 2.0027944151298925e-05,
"FeatureC-FeatScore": -2.309760786659935,
"FeatureC-FeatRecipRank": -0.7356527425044204,
"FeatureB-FeatScore": 5.3752568906776315,
"FeatureA-FeatScore": -0.009620874528474738
}
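With two feature variants, the feature dimension doubles: each feature file contributes one weight per variant, so the model above has 3 features × 2 variants = 6 weights. A quick count over an abbreviated copy of the JSON (weight values shortened here for readability):

```shell
cat > model.json <<'EOF'
{ "rankLipsTrainedModel": {
  "FeatureA-FeatScore": -0.0096, "FeatureA-FeatRecipRank": 2.0e-05,
  "FeatureB-FeatScore": 5.3753,  "FeatureB-FeatRecipRank": 2.0e-05,
  "FeatureC-FeatScore": -2.3098, "FeatureC-FeatRecipRank": -0.7357 } }
EOF
grep -o '"Feature[A-C]-Feat[A-Za-z]*"' model.json | wc -l   # counts the 6 weight keys
```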
To train with multiple feature variants, pass --feature-variant FeatScore --feature-variant FeatRecipRank.
Command: rank-lips train -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-try-feature-variants --z-score --convergence-threshold 0.001 --mini-batch-size 1000 --feature-variant FeatScore --feature-variant FeatRecipRank
Convergence behavior is controlled with --convergence-threshold 0.0001 --mini-batch-size 1. We recommend using a validation set to choose these parameters, trading off speed against performance. In this example we choose a mini-batch of a single query (1) because we only have two training queries in total; typical mini-batch sizes are 10, 100, or 1000.
Command: rank-lips train -d ./train-features -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-try-convergence-mini-batch --z-score --convergence-threshold 0.0001 --mini-batch-size 1 --feature-variant FeatScore --feature-variant FeatRecipRank
Default values for missing features can be set per feature variant:
Command: rank-lips train -d ./train-features-with-missing -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-missing-features-variants --feature-variant FeatScore --feature-variant FeatRecipRank --default-feature-variant-value FeatRecipRank=100.0 --default-feature-variant-value FeatScore=-9999.99
… or per-feature (which is a combination of run filename and feature variant).
Command: rank-lips train -d ./train-features-with-missing -q train.qrel -e 'my first rank-lips experiment' -O ./out -o train-missing-features --feature-variant FeatScore -f FeatureA -f FeatureB --default-feature-value FeatureA-FeatScore=-10.0 --default-feature-value FeatureB-FeatScore=0.0
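For illustration, a hypothetical FeatureA run file in train-features-with-missing where doc3 has no entry; with the command above, doc3's FeatureA-FeatScore would be filled in with the default of -10.0 (the file contents below are invented, not the example's actual data):

```shell
# Hypothetical feature run file with a gap: doc3 is absent, so
# --default-feature-value FeatureA-FeatScore=-10.0 supplies its score.
mkdir -p train-features-with-missing
cat > train-features-with-missing/FeatureA <<'EOF'
Q1 Q0 doc1 1 2.5 featureA
Q1 Q0 doc2 2 1.1 featureA
EOF
```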