Instructions for Reproducing/Running ENT Rank
Installation
Get the ENT-rank code and follow the installation instructions provided there.
Check out the tag sigir19-ent-rank to get the code version used in the SIGIR paper, or use the latest version from master for potential bug fixes.
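For example, assuming the repository has already been cloned (the directory name below is hypothetical), the tag can be checked out like this:

```bash
# hypothetical clone directory; adjust to wherever you checked out the ENT-rank code
cd ent-rank
git checkout sigir19-ent-rank   # code version used in the SIGIR paper
# or stay on master for the latest version with potential bug fixes
```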
Download dataset
From the TREC CAR dataset at http://trec-car.cs.unh.edu/datareleases/v2.1, download and unpack the following files:
- paragraphCorpus (corpus, $paragraphCbor)
- allButBenchmark (Knowledge graph, $allButBenchmark)
- benchmarkY1train, benchmarkY1test, benchmarkY2test (outlines, $queryCbor)
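As a rough sketch (the archive name below is a placeholder; check the release page for the exact file names before downloading):

```bash
# hypothetical archive name -- verify the exact file name on the release page
wget http://trec-car.cs.unh.edu/datareleases/v2.1/paragraphCorpus.tar.xz
tar -xf paragraphCorpus.tar.xz    # unpacks the corpus referred to as $paragraphCbor
# repeat for allButBenchmark, benchmarkY1train, benchmarkY1test, and benchmarkY2test
```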
Get the DBpedia V2 extension.
Prerequisites for Running ENT Rank
The ENT Rank code creates features from (unsupervised) input rankings produced with a wide range of retrieval models.
The ENT Rank code reads any input runs in TREC run file format – feel free to use your own!
Download the TREC Run files used to produce the experiments of the SIGIR paper here.
Alternatively, you can reproduce the input runs yourself; see the instructions under Alternative: Reproduce input runs.
Extract paragraph/page/section contexts for edges from the $paragraphCbor and $allButBenchmarkCbor.
Alternatively, download the edgeContexts archive from the main page.
Running ENT Rank
The ENT Rank code proceeds in several phases: training, testing, and visualization (graphviz), each described in detail below.
Each of the ENT Rank commands creates features from input ranking files. These are passed on the command line through multiple "--grid-run" arguments, each of which specifies the semantics of a run together with the $runFileName containing the corresponding ranking.
--grid-run "${queryModel} ${retrievalModel} ${expansionModel} ${indexType} ${contextType} ${runFileName}"
Options for the individual fields:
- $queryModel: SectionPath for section-level runs, Title for page-level runs.
- $retrievalModel: BM25, Ql
- $expansionModel: NoneX, Rm, EcmX, EcmPsg
- $indexType: ParagraphIdx, EntityIdx, PageIdx, AspectIdx
- $contextType: for paragraph-based contexts use Edge, for page-based contexts use Entity (only bidirectional), for aspect-based contexts use Aspect
With ${runFileList} we refer to the full list of --grid-run options.
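For illustration, two hypothetical --grid-run arguments (the run file names are made up) could look like this:

```bash
--grid-run "Title BM25 NoneX ParagraphIdx Edge runs/title-bm25-paragraph.run" \
--grid-run "SectionPath Ql Rm EntityIdx Entity runs/sectionpath-ql-rm-entity.run"
```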
Training ENT Rank
To train an ENT Rank model use this command:
- $bin/graph-expansion-feature-generation $allButBenchmark -o $outputFilePattern -q $queryCbor -k ${numResults} --edge-doc-cbor ${edgeContext} --page-doc-cbor ${pageDoc} --aspect-doc-cbor ${aspectDoc} ${runFileList} train --train-model ${trainedModelFile} --qrel ${qrels} --exp ${expSetting} --exp AllExp --exp JustTitleAndSectionPath --exp JustSimpleRm --exp JustScore --exp JustSourceNeighbors ${pageParams} ${sectionParams} ${miniBatchParams} ${enableCv} ${cachedTrainData} ${confExtraParams} +RTS -s -N20 -A64M -qn6 -RTS
Description of different parts of the command line:
- $pageOrSection for article-level runs set to “page”, for hierarchical/tree runs set to “section”.
- $numResults: set to 1000
- $exp is a friendly name referring to an experiment run.
- $qrels set to the ground truth appropriate for this run (i.e., matching $queryCbor, $pageOrSection, etc.)
- $miniBatchParams used in the learning-to-rank optimization loop. We recommend "--mini-batch-steps 1 --mini-batch-size 150 --mini-batch-eval 0", i.e., a minibatch size of 150 queries with 1 iteration.
- $runFileList as described above
- $edgeContext, $pageDoc, $aspectDoc refer to the file paths of the entity context files (cbor) created under Prerequisites.
- $expSetting only needed if a feature subset is to be used; choices: NoEdgeFeats, JustAggr, ExpEcmTestFeature, OnlyNoneXFeature, NoRawEdgeFeats, NoNeighborFeats, NoEntityFeats, AllExp
- $pageParams for page-level runs, set to "--exp ExpPage" (not needed for section-level runs)
- $sectionParams for section-level runs, include "--query-from-sections"
- $outputFilePattern prefix for the output files that are created. In our experiments we use "${exp}-train--entity-${pageOrSection}--"
- $trainedModelFile file prefix under which models are written
- $enableCv to enable 5-fold cross-validation set to "--include-cv True", otherwise set to "--include-cv False"
- $cachedTrainData: as most of the time and memory is spent processing input runs and producing features, we include an option to cache the train data. Enable writing of train data with "--do-write-train-data True". After the program completes, you will find the training data in a file matching the pattern "run-${benchmark}-.*-AllExp-train--entity-${pageOrSection}--.alldata.cbor". To produce runs with different feature subsets, it is sufficient to run with "AllExp". However, if any other parameters are changed, a new train data file needs to be created. In some cases it is faster to first create the training data without training the model, then train the model in a separate process (see the sketch after this section); to switch off training, set the flag "--do-train-model False".
- To use previously cached train data use "--train-data $trainDataFile --do-write-train-data False"; don't forget to disable train data writing, or you may overwrite your files.
$confExtraParams configures individual experiments as follows:
- only paragraph contexts: --exp NoEdgesFromAspects --exp NoEdgesFromPages
- only page contexts: --exp NoEdgesFromAspects --exp NoEdgesFromParas
- only aspect contexts: --exp NoEdgesFromParas --exp NoEdgesFromPages
- to run an experiment with all three contexts, just omit these options
The options "-N20 -A64M -qn6" instruct the Haskell runtime system to use 20 threads, a 64 MB allocation area per thread, and 6 garbage collection threads. We found this configuration to work well on a machine with 50 CPUs and 1 TB of memory. Consult the documentation of the Haskell runtime system for details.
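For illustration, a hypothetical two-step page-level training run (all file names and paths are placeholders; the flags are the ones documented above) could first cache the training data and then train with cross-validation:

```bash
# Step 1: build and cache the training data without training a model
$bin/graph-expansion-feature-generation $allButBenchmark \
    -o "AllExp-train--entity-page--" -q $queryCbor -k 1000 \
    --edge-doc-cbor $edgeContext --page-doc-cbor $pageDoc --aspect-doc-cbor $aspectDoc \
    ${runFileList} train --train-model $trainedModelFile --qrel $qrels \
    --exp AllExp --exp ExpPage \
    --mini-batch-steps 1 --mini-batch-size 150 --mini-batch-eval 0 \
    --do-write-train-data True --do-train-model False \
    +RTS -s -N20 -A64M -qn6 -RTS

# Step 2: train a 5-fold cross-validation model from the cached training data
$bin/graph-expansion-feature-generation $allButBenchmark \
    -o "AllExp-train--entity-page--" -q $queryCbor -k 1000 \
    --edge-doc-cbor $edgeContext --page-doc-cbor $pageDoc --aspect-doc-cbor $aspectDoc \
    ${runFileList} train --train-model $trainedModelFile --qrel $qrels \
    --exp AllExp --exp ExpPage \
    --mini-batch-steps 1 --mini-batch-size 150 --mini-batch-eval 0 \
    --train-data $trainDataFile --do-write-train-data False --include-cv True \
    +RTS -s -N20 -A64M -qn6 -RTS
```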
Testing ENT Rank (predicting a ranking)
To predict a ranking using a pre-trained model use this command:
- $bin/graph-expansion-feature-generation ${allButBenchmark} -o ${outputFilePattern} -q ${queryCbor} -k ${numResults} --edge-doc-cbor ${edgeContext} --page-doc-cbor ${pageDoc} --aspect-doc-cbor ${aspectDoc} ${runFileList} test --test-model ${trainedModelFile} --exp ${expSetting} --exp AllExp --exp JustTitleAndSectionPath --exp JustSimpleRm --exp JustScore --exp JustSourceNeighbors ${pageParams} ${sectionParams} ${cachedTrainData} ${miniBatchParams} +RTS -s -N20 -A64M -qn6 -RTS
The valid command line options for the test command are a subset of those for the train command; see the description above.
- $trainedModelFile: the file name of a model produced with the train command. (If you train on one benchmark and test on another, you want the model file matching ".*model-train.json"; to reproduce one of the folds, number ${fold}, use ".*model-fold-${fold}-best.json".)
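For illustration, a hypothetical test invocation (file names are placeholders; here $queryCbor points to the outlines of the test benchmark) could look like this:

```bash
# predict a ranking on a test benchmark with a previously trained model
$bin/graph-expansion-feature-generation $allButBenchmark \
    -o "AllExp-test--entity-page--" -q $queryCbor -k 1000 \
    --edge-doc-cbor $edgeContext --page-doc-cbor $pageDoc --aspect-doc-cbor $aspectDoc \
    ${runFileList} test --test-model $trainedModelFile \
    --exp AllExp --exp ExpPage \
    --mini-batch-steps 1 --mini-batch-size 150 --mini-batch-eval 0 \
    +RTS -s -N20 -A64M -qn6 -RTS
```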
Output Files from training and testing
Training without cross-validation will produce these files:
- ${outputFilePattern}model-run-benchmarkY1train-10--AllExp.json-model-train.json – the model
- ${outputFilePattern}-model-train.run – a ranking on the training data (to measure the training error)

Testing with a pre-trained model will produce this file:
- ${outputFilePattern}-model-predict.run – the test ranking (we chose a different name so it is not confused with k-fold CV experiments)

Training with cross-validation will produce a set of files like these (one for each $fold in 0-4):
- ${outputFilePattern}-${modelName}-model-fold-${fold}-best.json – the model trained on the fold
- ${outputFilePattern}--model-fold-${fold}-best.run – a ranking on the fold's training data
- ${outputFilePattern}-model-test.run – a test ranking (predicted on the test fold for each model, then concatenated)
All experiments in the paper conducted on benchmarkY1train and dbpediaV2 are produced with the cross-validation's test ranking.
All other experiments in the paper were conducted by training on benchmarkY1train (without cross-validation) and predicting on the test benchmark.
Graphviz visualization
To produce the Graphviz visualization, use this command:
- $bin/graph-expansion-feature-generation ${allButBenchmark} -o ${outputFilePattern} -q ${queryCbor} -k ${numResults} --edge-doc-cbor ${edgeContext} --page-doc-cbor ${pageDoc} --aspect-doc-cbor ${aspectDoc} ${runFileList} --graphviz-model ${modelFile} --posify ExpDenormWeight --qrel ${qrels} --exp ${expSetting} --exp AllExp --exp JustTitleAndSectionPath --exp JustSimpleRm --exp JustScore --exp JustSourceNeighbors ${pageParams} ${sectionParams} ${cachedTrainData} ${miniBatchParams} ${graphvisConf} +RTS -s -N20 -A64M -qn6 -RTS
The valid command line options for this command are a subset of those for the train command; see the description above.
$graphvisConf is required to select the one example for which the candidate graph between two entities is to be visualized:
--query $queryId --graphviz-target-entity $targetEntityId --graphviz-source-entity $sourceEntityId --graphviz-path-restriction $numEdges
where
- $queryId: title query id
- $targetEntityId: (CAR) entity id of one entity
- $sourceEntityId: (CAR) entity id of the other entity
- $numEdges: the maximum number of edges between these two entities for a path to be included in the visualization
In the paper we used
--query enwiki:Zika%20fever --graphviz-target-entity enwiki:South%20America --graphviz-source-entity enwiki:Zika%20fever --graphviz-path-restriction 2
Evaluation
We evaluate the runs with trec_eval, using the qrels provided in the data releases.
We use minir-plots to compute mean results with standard errors, produce paired-t-test results, and create plots.
Recommended Workflow
- Extract Edge contexts (or download “edgeContexts” archive)
- Create or download input runs
- Create train data
- Run train command with
--exp AllExp --do-write-train-data True --do-train-model False
- For each feature subset, train models with 5-fold cv
- Run train command with
--exp $featureSet --exp AllExp --do-write-train-data False --include-cv True
- To train on one benchmark and test on another benchmark (or to evaluate a pre-trained model)
- Run train command with
--exp $featureSet --exp AllExp --do-write-train-data False --include-cv False
(or with --include-cv True)
- Identify the trained model file which is indicated by the file suffix
model-train.json
- Run test command with
--test-model ${trainedModelFile}
- Evaluate rankings
- Identify the run files which end with
.*-test.run
for cross-validation results and when testing on a different benchmark.
- Identify the right qrels file for this run (matching benchmark, page-level vs section-level, passage vs entity, auto vs manual)
- Run
trec_eval -q -c -m map -m Rprec -m ndcg -m ndcg_cut.10 ${qrels} ${run}
and store the result in a file (below we call it ${evalFile})
- Important: use the -c flag, or your evaluation results will be wrong!
- Plot and analyze the results with minir-plots
- for each metric in map Rprec ndcg ndcg_cut_10 (see the sketch after this list):
- Plot and create table with standard errors:
nix run -f $minirPlotsDirectory -c minir-column ${evalFile} --metric ${metric} --format trec_eval --out ${pdfFileName} >| ${tableFileName}
- Paired-t-test:
nix run -f $minirPlotsDirectory -c minir-pairttest ${baselinerun} ${evalFile} --metric ${metric} --format trec_eval >| ${pairedTtestFileName}
- Hurts/helps analysis:
nix run -f $minirPlotsDirectory -c minir-hurtshelps ${baselinerun} ${evalFile} --metric ${metric} --format trec_eval --delta 0.1 >| ${hurtshelpsFileName}
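As a rough end-to-end sketch of the evaluation and plotting steps (the run, qrels, and output file names are hypothetical, while $minirPlotsDirectory and ${baselinerun} are assumed to be set as above; the trec_eval and minir-plots invocations are the ones documented in this section):

```bash
#!/bin/bash
# hypothetical file names -- adjust to your own runs and qrels
run=AllExp-train--entity-page---model-test.run
qrels=benchmarkY1train-entity.qrels
evalFile=AllExp-entity-page.eval

# evaluate the run (do not forget -c, or the results are wrong)
trec_eval -q -c -m map -m Rprec -m ndcg -m ndcg_cut.10 ${qrels} ${run} > ${evalFile}

# tables/plots, paired t-tests, and hurts/helps analysis for each metric
for metric in map Rprec ndcg ndcg_cut_10; do
    nix run -f $minirPlotsDirectory -c minir-column ${evalFile} --metric ${metric} \
        --format trec_eval --out ${metric}.pdf >| ${metric}-table.txt
    nix run -f $minirPlotsDirectory -c minir-pairttest ${baselinerun} ${evalFile} \
        --metric ${metric} --format trec_eval >| ${metric}-pairttest.txt
    nix run -f $minirPlotsDirectory -c minir-hurtshelps ${baselinerun} ${evalFile} \
        --metric ${metric} --format trec_eval --delta 0.1 >| ${metric}-hurtshelps.txt
done
```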