Authors: Jordan Ramsdell and Laura Dietz.
The test collection and all associated data are released under a Creative Commons Attribution-ShareAlike 4.0 International License.
We provide a large-scale dataset for training and evaluating methods for entity aspect linking (EAL), a fine-grained variation on entity linking that discerns which particular aspect of an entity is mentioned in the context.
Building on the definition of Nanni et al. (2018), we formalize the task as a refinement of entity linking as follows:
Given: a paragraph-sized text passage t with entity links to entities e_1, e_2, ..., e_n.
For each entity e_i, a catalog of candidate aspects a_i1, a_i2, ..., a_im is available, each with a name, content, and entity links.
The task is to predict, for each entity e_i, the correct aspect a_ij that is mentioned in the context t.
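The task definition above can be sketched in code. The following is a minimal illustration only: the class and field names (`Aspect`, `EntityMention`, `predict_aspect`) are hypothetical and do not reflect the dataset's JSON-L schema, the aspect texts are invented toy data, and the word-overlap scorer is a naive stand-in baseline, not a method from this work.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Aspect:
    """One candidate aspect of an entity (hypothetical field names)."""
    name: str
    content: str
    entity_links: List[str] = field(default_factory=list)


@dataclass
class EntityMention:
    """An entity linked in the passage, with its aspect catalog."""
    entity: str
    candidate_aspects: List[Aspect]


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def predict_aspect(context: str, mention: EntityMention) -> Aspect:
    """Pick the candidate aspect sharing the most words with the context.

    This is a toy word-overlap baseline used only to show the
    input/output shape of the task.
    """
    ctx = tokens(context)
    return max(
        mention.candidate_aspects,
        key=lambda a: len(ctx & tokens(a.name + " " + a.content)),
    )


# Toy example with invented aspect texts for the entity "Oyster".
context = "Oysters are often served raw on the half shell with lemon."
mention = EntityMention(
    entity="Oyster",
    candidate_aspects=[
        Aspect("Anatomy", "The oyster has a shell, gills and a mantle."),
        Aspect("As food", "Oysters are eaten raw, smoked, or fried and served in restaurants."),
    ],
)
predicted = predict_aspect(context, mention)
print(predicted.name)
```

In this toy case the context shares more words with the "As food" text than with the "Anatomy" text, so that aspect is selected. A real system would replace the overlap score with a learned ranking model.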
We require a catalog of candidate aspects to be available for each entity. Like Nanni et al., we construct the aspect catalog from the top-level sections of an entity’s Wikipedia page, where each section represents one aspect. Administrative sections without topical content, such as "References" or "See Also", are excluded from the aspect catalog.
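As a rough sketch of this construction, the snippet below filters a page's top-level sections by heading. The function name, the `(heading, text)` input shape, and the exclusion set are assumptions for illustration; only "References" and "See Also" are named in the description above, and the actual exclusion list used for the dataset may be longer.

```python
from typing import List, Tuple

# Only these two headings are named in the description; the real list
# used to build the dataset is an unknown and may contain more entries.
ADMINISTRATIVE_SECTIONS = {"references", "see also"}


def build_aspect_catalog(
    top_level_sections: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep each topical top-level section as one aspect.

    `top_level_sections` is a list of (heading, text) pairs; headings are
    compared case-insensitively against the administrative set.
    """
    return [
        (heading, text)
        for heading, text in top_level_sections
        if heading.strip().lower() not in ADMINISTRATIVE_SECTIONS
    ]


# Toy page with invented section texts.
sections = [
    ("Anatomy", "Text about anatomy."),
    ("As food", "Text about culinary use."),
    ("See Also", "Related pages."),
    ("References", "Citations."),
]
catalog = build_aspect_catalog(sections)
print([heading for heading, _ in catalog])
```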
See EAL Instances for target entity Oyster.
See Results for reference results. (Please send us your results too!)
See README.mkd for detailed dataset description.
See data model of test collection for questions about the JSON-L format.
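For readers unfamiliar with JSON-L (one JSON object per line), a generic reader can be sketched as follows. It makes no assumption about the dataset's actual field names; the `id` key in the sample input is a placeholder, and the data model documentation remains the reference for the real schema.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# In practice `lines` would be an open file handle; an in-memory
# sample with a placeholder key is used here instead.
sample = io.StringIO('{"id": "a"}\n\n{"id": "b"}\n')
records = list(read_jsonl(sample))
print(len(records))
```

Because the reader is a generator, a large file can be processed one record at a time without loading it fully into memory.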
Our dataset is derived from an English Wikipedia dump from 01/01/2020 offered by the TREC Complex Answer Retrieval track (v2.4 release), which exposes section and hyperlink information in a machine-readable format.
We also provide a converted re-release of the original dataset of Nanni et al. (2018) in our data schema, as nanni-201.
This material is based upon work supported by the National Science Foundation under Grant No. 1846017. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Entity-aspect-linking-2020 by Jordan Ramsdell, Laura Dietz is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://trec-car.cs.unh.edu/datareleases/v2.4-release.html, on a work at www.wikipedia.org, and on a work at https://federiconanni.com/entity-aspect-linking/.