See Main Page for details and downloads.
- Train on Train-Small and predict on Test.
- Train on Train-Small and predict on Nanni-Test.
- Train on Train-Small and predict on Nanni's 201.
- 5-fold cross-validation on Nanni's 201 (the original evaluation protocol of Nanni et al. (2018)).
- Train on Train-Remaining and predict on Test.
- Train on Train-Remaining and predict on Nanni-Test.
- Train on Train-Remaining and predict on Nanni's 201.
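For concreteness, the seven protocols as a hedged Python sketch; `load`, `train`, and `evaluate` are hypothetical stubs, since the release ships data files rather than a Python API:

```python
"""Sketch of the seven evaluation protocols. The loaders and the model
are hypothetical stubs; the dataset release does not ship a Python API."""
import numpy as np
from sklearn.model_selection import KFold

def load(split):           # stub: would read the released split files
    return np.arange(10)   # placeholder "queries"

def train(queries):        # stub: would fit the learning-to-rank baseline
    return {"trained_on": len(queries)}

def evaluate(model, queries):  # stub: would compute P@1, MAP, NDCG@20
    pass

# Protocols 1-3 and 5-7: train once, predict on each test collection.
for train_split in ("train-small", "train-remaining"):
    model = train(load(train_split))
    for test_split in ("test", "nanni-test", "nannis-201"):
        evaluate(model, load(test_split))

# Protocol 4: 5-fold cross-validation on Nanni's 201 queries,
# the original evaluation protocol of Nanni et al. (2018).
queries = load("nannis-201")
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(queries):
    evaluate(train(queries[tr]), queries[te])
```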
We report results separately for features derived from sentence and paragraph contexts.
Evaluation results for the evaluation protocols listed above. Significance is analyzed with a standard-error overlap test: ▾ significantly below, ▴ significantly above (by more than the standard error).
| | P@1 (paragraph) | MAP (paragraph) | NDCG@20 (paragraph) | P@1 (sentence) | MAP (sentence) | NDCG@20 (sentence) |
|---|---|---|---|---|---|---|
| **Small/Test** | | | | | | |
| Rank-lips | 0.582±0.007 | 0.746±0.004 | 0.810±0.003 | 0.623±0.007 | 0.771±0.004 | 0.828±0.003 |
| RankLib | 0.576±0.007 | 0.740±0.004 | 0.804±0.003 | 0.614±0.007 | 0.765±0.004 | 0.824±0.003 |
| **Small/Nanni-Test** | | | | | | |
| Rank-lips | 0.601±0.004▾ | 0.755±0.002▾ | 0.816±0.002▾ | 0.664±0.003▴ | 0.802±0.002▴ | 0.851±0.002▴ |
| RankLib | 0.594±0.004▴ | 0.751±0.002▴ | 0.813±0.002▴ | 0.668±0.003▾ | 0.806±0.002▾ | 0.855±0.002▾ |
| **Small/Nanni's 201** | | | | | | |
| Rank-lips | 0.617±0.034▴ | 0.762±0.022▴ | 0.821±0.017▴ | 0.657±0.033 | 0.784±0.022 | 0.836±0.017 |
| RankLib | 0.632±0.034▾ | 0.779±0.021▾ | 0.835±0.015▾ | 0.677±0.033 | 0.796±0.021 | 0.845±0.016 |
| **Nanni's 201-CV** | | | | | | |
| Rank-lips | 0.647±0.034 | 0.780±0.022 | 0.835±0.017 | 0.667±0.033 | 0.785±0.022 | 0.837±0.017 |
| RankLib | 0.602±0.034 | 0.747±0.022 | 0.817±0.017 | 0.612±0.034 | 0.765±0.022 | 0.824±0.016 |
| Nanni et al. | 0.637±0.034 | 0.777±0.021 | 0.833±0.016 | 0.667±0.034 | 0.790±0.022 | 0.842±0.016 |
| **Remaining/Test** | | | | | | |
| Rank-lips | 0.587±0.006 | 0.751±0.004 | 0.813±0.003 | 0.628±0.006 | 0.774±0.004 | 0.831±0.003 |
| **Remaining/Nanni-Test** | | | | | | |
| Rank-lips | 0.604±0.004 | 0.758±0.002 | 0.818±0.002 | 0.697±0.003 | 0.822±0.002 | 0.867±0.002 |
| **Remaining/Nanni's 201** | | | | | | |
| Rank-lips | 0.626±0.034 | 0.771±0.022 | 0.828±0.016 | 0.682±0.033 | 0.797±0.022 | 0.846±0.017 |
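The significance markers can be reproduced with a short overlap check. A minimal sketch, assuming the test compares two runs' per-query metric scores and flags a difference when their ±1 standard-error intervals do not overlap:

```python
import numpy as np

def se_overlap(scores_a, scores_b):
    """Standard-error overlap test on per-query metric scores.

    Returns '▴' if run A's ±1 SE interval lies entirely above run B's,
    '▾' if entirely below, and '' when the intervals overlap.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    se_a = a.std(ddof=1) / np.sqrt(len(a))
    se_b = b.std(ddof=1) / np.sqrt(len(b))
    if a.mean() - se_a > b.mean() + se_b:   # A's band entirely above B's
        return "▴"
    if a.mean() + se_a < b.mean() - se_b:   # A's band entirely below B's
        return "▾"
    return ""
```

Whether the released analysis uses exactly ±1 SE intervals is an assumption; the marker direction follows the caption's convention.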
If you obtain new results on this dataset, we want to hear about them and would be honored to include them in the table above.
The baseline uses listwise learning-to-rank to combine the following features.
All features are based on word/entity similarities between the context and (parts of) an aspect.
The following similarities are used; we exclude Nanni's RDF2Vec feature since it is difficult to produce and does not perform well. A minimal sketch of these similarity computations follows the list.
- **BM25:** Using the context as the query and the aspect part as the document, we use BM25 with default parameters as the ranking model.¹
- **TF-IDF:** Cosine TF-IDF score between the context and the aspect part. We use a TF-IDF variant with log tf normalization and smoothed inverse document frequency.
- **Overlap:** The number of unique words/entities shared between the context and the aspect part (no normalization).
- **W2Vec:** Word embedding similarity between the context and the aspect part, with word vectors weighted by their TF-IDF weight. The pretrained word embeddings are taken from word2vec-slim, a reduced version of the Google News word2vec model.²
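A minimal Python sketch of these four similarities, under stated assumptions: the exact smoothing and normalization of the released baseline are not specified here, so the variants below (log tf, smoothed idf, Okapi BM25 with k1=1.2, b=0.75) are illustrative choices, and `emb` stands for a hypothetical word2vec-slim lookup table:

```python
"""Sketch of the four similarity features described above. `df` maps a
term to its document frequency, `n_docs` and `avgdl` are corpus
statistics, and `emb` maps a token to an embedding vector (assumptions)."""
import math
from collections import Counter
import numpy as np

def tfidf_vec(tokens, df, n_docs):
    """TF-IDF with log tf normalization and smoothed idf (one common variant)."""
    tf = Counter(tokens)
    return {t: (1 + math.log(c)) * math.log(1 + n_docs / (1 + df.get(t, 0)))
            for t, c in tf.items()}

def cosine_sparse(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def tfidf_feature(context, aspect, df, n_docs):
    return cosine_sparse(tfidf_vec(context, df, n_docs),
                         tfidf_vec(aspect, df, n_docs))

def overlap_feature(context, aspect):
    """Unnormalized count of unique shared words (or entities)."""
    return len(set(context) & set(aspect))

def bm25_feature(context, aspect, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 with default parameters: context terms as the query,
    the aspect part as the document."""
    tf = Counter(aspect)
    score = 0.0
    for t in set(context) & set(tf):
        idf = math.log(1 + (n_docs - df.get(t, 0) + 0.5) / (df.get(t, 0) + 0.5))
        norm = tf[t] + k1 * (1 - b + b * len(aspect) / avgdl)
        score += idf * tf[t] * (k1 + 1) / norm
    return score

def w2vec_feature(context, aspect, emb, df, n_docs):
    """Cosine between TF-IDF-weighted embedding centroids."""
    def centroid(tokens):
        weights = tfidf_vec(tokens, df, n_docs)
        vecs = [w * emb[t] for t, w in weights.items() if t in emb]
        return np.sum(vecs, axis=0) if vecs else None
    u, v = centroid(context), centroid(aspect)
    if u is None or v is None:
        return 0.0
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```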
Feature combinations:
| context | aspect part | BM25 | TF-IDF | Overlap | W2Vec |
|---|---|---|---|---|---|
| sentence words | name words | X | X | X | |
| paragraph words | name words | X | X | X | |
| sentence words | content words | X | X | X | X |
| paragraph words | content words | X | X | X | X |
| sentence entities | content entities | X | X | X | |
| paragraph entities | content entities | X | X | X | |
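To make the combination step concrete: each (context, aspect) candidate receives one row of the feature values above, and a linear scorer with weights learned by the listwise learning-to-rank tool (e.g., coordinate ascent in Rank-lips or RankLib) orders the candidate aspects per context. A hedged sketch with made-up numbers:

```python
import numpy as np

# Hypothetical feature matrix for one context: one row per candidate
# aspect, columns as in the table above (BM25, TF-IDF, Overlap, W2Vec).
features = np.array([
    [12.3, 0.41, 5.0, 0.62],
    [ 8.7, 0.28, 2.0, 0.55],
    [15.1, 0.49, 7.0, 0.71],
])
weights = np.array([0.10, 1.50, 0.05, 2.00])  # learned by the LTR tool

scores = features @ weights        # linear combination of the features
ranking = np.argsort(-scores)      # candidate aspects, best first
```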
Entity-aspect-linking-2020 by Jordan Ramsdell and Laura Dietz is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on works at http://trec-car.cs.unh.edu/datareleases/v2.4-release.html, www.wikipedia.org, and https://federiconanni.com/entity-aspect-linking/.
1. We provide corpus statistics in our dataset.
2. Available at https://github.com/eyaler/word2vec-slim.