UMass at Knowledge Base Acceleration
Laura Dietz (dietz@cs.umass.edu) and Jeffrey Dalton (jdalton@cs.umass.edu)
University of Massachusetts
Motivation: Exploratory Info Needs
Given user query X?
Find relevant
- documents
- entities
- relations
that are consistent.
One Approach
Entity Linking
Entity-based Retrieval
ad hoc IR
Object IR
Given a query entity, find documents in which the entity plays a central role.
Disclaimer:
- Do not address novelty (vital)
- Do not address Twitter
- Memory-less methods only
- Completely unsupervised
Query
Entity
Problem: Entity-based Retrieval
Document
Structured Entity Information
Canonical Name
Various Names
Neighboring Entities
TextWiki Article
Entity-based Retrieval
Equivalent to Entity Linking IR with swapped roles
Actually:
IR Query with Canonical Name
Sequential Dependence Model
#sdm( canonical name )
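A minimal sketch of what an `#sdm( canonical name )` query stands for, written out in the common Indri-style formulation of the Sequential Dependence Model: unigrams, exact adjacent pairs, and unordered windows, mixed with fixed weights. The function name `sdm_query`, the lowercasing, and the weights 0.8 / 0.15 / 0.05 are assumptions for illustration; the slides do not state which settings were used.

```python
def sdm_query(name, w_unigram=0.8, w_ordered=0.15, w_unordered=0.05):
    """Expand an entity name into an Indri-style SDM query string.

    The three weights are illustrative defaults (an assumption),
    not values taken from the slides.
    """
    terms = name.lower().split()
    if len(terms) < 2:
        # With a single term there are no term pairs, so only unigrams remain.
        return "#combine( " + " ".join(terms) + " )"
    pairs = list(zip(terms, terms[1:]))
    unigrams = " ".join(terms)
    # #1( a b ): the two terms must appear adjacent and in order.
    ordered = " ".join(f"#1( {a} {b} )" for a, b in pairs)
    # #uw8( a b ): the two terms must co-occur within a window of 8 tokens.
    unordered = " ".join(f"#uw8( {a} {b} )" for a, b in pairs)
    return (
        f"#weight( {w_unigram} #combine( {unigrams} ) "
        f"{w_ordered} #combine( {ordered} ) "
        f"{w_unordered} #combine( {unordered} ) )"
    )


print(sdm_query("Barack Obama"))
```

For a two-word name this yields one ordered and one unordered pair feature next to the two unigrams.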
Mixed with Entity Names
Names from anchor text, Freebase names, redirects
Different names weighted by disambiguation probability:
P(entity | name) = #(name refers to this entity) / #(name refers to any entity)
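The name weighting can be sketched as a simple ratio of counts. The input layout (a dictionary from name to per-entity reference counts) and the function name `name_weights` are hypothetical; the counts below are invented toy numbers, not statistics from the talk.

```python
def name_weights(reference_counts, entity):
    """Weight each name by the fraction of its references that point to `entity`.

    reference_counts: dict mapping a name to a dict {entity_id: count}
    (hypothetical layout; e.g. counts collected from anchor text).
    """
    weights = {}
    for name, targets in reference_counts.items():
        total = sum(targets.values())          # name refers to any entity
        hits = targets.get(entity, 0)          # name refers to this entity
        if total > 0 and hits > 0:
            weights[name] = hits / total
    return weights


# Toy counts, invented for illustration only.
counts = {
    "Obama": {"Barack_Obama": 90, "Obama,_Fukui": 10},
    "Barack Obama": {"Barack_Obama": 50},
    "Fukui": {"Fukui_Prefecture": 40},
}
print(name_weights(counts, "Barack_Obama"))
```

Names that never refer to the query entity are dropped, so ambiguous names contribute less to the query than unambiguous ones.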
Entity Text
Most frequent terms from the Wikipedia article
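One way to pick the most frequent terms from an article is a plain term count after stopword removal. This is a sketch under assumptions: the tokenizer, the tiny stopword list, and the name `top_terms` are illustrative, and the slides do not say how many terms were kept.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (assumption; a real run would use a fuller one).
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "was"}


def top_terms(article_text, k=10):
    """Return the k most frequent non-stopword terms of an article."""
    tokens = re.findall(r"[a-z0-9]+", article_text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]


print(top_terms("The senator met the senator of the state senate.", k=3))
```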
Entity Text + Names
Disambiguation Problem
We might retrieve documents that mention a different entity with the same name!
Solution: Link entities in retrieved documents.
Do entity links point back to the query entity?
Entity-based IR
Entity Link
Disambiguation Problem
Improve precision of IR with Entity Linking
- Use maximum entity-linking score (of any mention)
- Restrict to top-1 links / non-NILs
- Build language model of entity links; use probability of query entity under language model
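The two document-level signals in the list above can be sketched as follows: the maximum linker score over mentions that link to the query entity, and the probability of the query entity under a maximum-likelihood language model over the document's entity links. Representing NIL links as `None`, excluding them from the language model, and both function names are assumptions; the slides do not specify smoothing or NIL handling.

```python
from collections import Counter

NIL = None  # assumption: a mention without a knowledge-base entry links to None


def max_link_score(scored_links, query_entity):
    """Highest linker score among mentions linked to the query entity.

    scored_links: list of (entity_id, score) pairs, one per mention.
    Returns 0.0 if no mention links to the query entity.
    """
    scores = [score for entity, score in scored_links if entity == query_entity]
    return max(scores, default=0.0)


def link_lm_probability(links, query_entity):
    """Maximum-likelihood P(query_entity) over the document's non-NIL entity links."""
    linked = [entity for entity in links if entity is not NIL]
    if not linked:
        return 0.0
    return Counter(linked)[query_entity] / len(linked)


doc_links = ["Barack_Obama", "Chicago", "Barack_Obama", NIL]
print(link_lm_probability(doc_links, "Barack_Obama"))
print(max_link_score([("Barack_Obama", 0.4), ("Chicago", 0.9), ("Barack_Obama", 0.7)], "Barack_Obama"))
```

A document that mentions a different entity with the same name would receive a low value on both signals, which is the intended precision filter.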
Entity-Link Language Model
Entity-based IR
Entity Link
Only on 2 documents per week
Entity Link per mention
Mentions per doc
Experimental Entity Retrieval Results: TREC Knowledge Base Acceleration 2013
Comparison to Other Teams
"official" P/R/F over Confidence
Our best run: Sequential Dependence Model on wiki title!
[Figure: precision (P, 0.0 to 0.6) over confidence cutoff (0 to 1000). Runs: rm (twice), rn, rt, rtn, sdm, skq, link top NIL, link top, link NIL, link, link LM]
"official" Precision over Confidence
Confidence versus Ranking
Issue:
- We have a retrieval score
- How to turn it into a confidence cutoff?
Intuition: rank 1 = 1000, rank 1000 = 1 ... but without looking into the future!
- Use rankings of two training weeks to determine the score at rank 1 and rank 1000
- Use linear projection of the rank score to obtain the confidence
Probably our weakest part ...
To analyze our results: resurrect the ranking from confidence scores, then evaluate the ranking.
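The score-to-confidence mapping can be sketched as a linear projection fitted on training rankings only. Averaging the anchor scores over the training weeks, clipping to the range 1 to 1000, rounding to integers, and the function names are assumptions made for this sketch; the toy scores are invented.

```python
def fit_projection(training_rankings):
    """Estimate the scores at rank 1 and rank 1000 from training weeks.

    training_rankings: list of score lists, each sorted in descending order.
    Shorter lists fall back to their last score (assumption).
    """
    top = [scores[0] for scores in training_rankings]
    bottom = [scores[min(len(scores), 1000) - 1] for scores in training_rankings]
    return sum(top) / len(top), sum(bottom) / len(bottom)


def score_to_confidence(score, score_rank1, score_rank1000):
    """Linearly map a retrieval score so rank 1 -> 1000 and rank 1000 -> 1."""
    if score_rank1 <= score_rank1000:
        raise ValueError("score at rank 1 must exceed score at rank 1000")
    fraction = (score - score_rank1000) / (score_rank1 - score_rank1000)
    confidence = 1 + 999 * fraction
    # Scores outside the training range are clipped to the valid interval.
    return int(round(min(1000.0, max(1.0, confidence))))


def ranking_from_confidence(doc_confidences):
    """Recover a ranking for analysis by sorting documents by confidence."""
    return [doc for doc, _ in sorted(doc_confidences.items(), key=lambda kv: -kv[1])]


# Two toy training weeks with three scores each (invented numbers).
s1, s1000 = fit_projection([[-4.0, -9.0, -14.0], [-6.0, -11.0, -16.0]])
print(score_to_confidence(-7.5, s1, s1000))
```

Because the two anchor scores come from earlier weeks, scores in later weeks can drift outside the fitted range, which is one reason a fixed confidence cutoff may behave differently from a rank cutoff.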
P/R/F over Rank Cutoffs
[Figure: recall (R, 0.0 to 0.8) and precision (P, 0.40 to 0.70) over rank cutoffs 0 to 1000. Runs: rm (twice), rn, rt, rtn, sdm, skq, link top NIL, link top, link NIL, link, link LM]
Run IDs in legend: CIIR-fix.all.wrm_trm_s, CIIR-fix.all.wrm_tsdm_s, CIIR-fix.all.wrn_tsdm_s, CIIR-fix.all.wrt_tsdm_s, CIIR-fix.all.wrtn_tsdm_s, CIIR-fix.all.wsdm_tsdm_s, CIIR-fix.all.wskq_tsdm_s, CIIR-top2LinkedMaxScoreOnlyTop__1.0_twitter-ccr-wi, CIIR-top2LinkedMaxScoreOnlyTop_twitter-ccr-wikiped, CIIR-top2LinkedMaxScore__1.0_twitter-ccr-wikipedia, CIIR-top2LinkedMaxScore_twitter-ccr-wikipedia-only, CIIR-top2LinkedProb_twitter-ccr-wikipedia-only-vit
P and R over Rank Cutoffs
[Figure: panels "Recall" and "Precision" over rank cutoffs 0 to 1000. Runs: rm (twice), rn, rt, rtn, sdm, skq, link top NIL, link top, link NIL, link, link LM]
Conclusions 1
Our champion: Sequential Dependence Model on canonical entity name
... which was our baseline ...
- Using SDM in combination with names or text is worse
- Refining SDM with entity linking is worse and did not achieve any improvement in precision
Entity Linking: Error Analysis
Wanted: High-precision method using Entity Linking
Issue: We predicted very few documents: approx. 20 docs per entity :(
Micro-average precision: 0.6
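For reference, micro-averaged precision pools all predictions across entities before dividing, whereas macro-averaging would average per-entity precision. A small sketch with invented counts (the per-entity numbers below are not from the evaluation; the function names are hypothetical):

```python
def micro_precision(per_entity):
    """per_entity: dict entity -> (true_positives, predicted_documents)."""
    true_positives = sum(tp for tp, _ in per_entity.values())
    predicted = sum(n for _, n in per_entity.values())
    return true_positives / predicted if predicted else 0.0


def macro_precision(per_entity):
    """Average of per-entity precision, skipping entities without predictions."""
    values = [tp / n for tp, n in per_entity.values() if n]
    return sum(values) / len(values) if values else 0.0


# Toy counts for illustration only.
toy = {"entity_a": (9, 10), "entity_b": (9, 30)}
print(micro_precision(toy), macro_precision(toy))
```

With few predicted documents per entity, the micro average is dominated by the entities for which the method returns the most documents.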
[Figure: P@10 per week (0.0 to 0.5) over ETR days (0 to 400), one curve per UMass_CIIR run: wrm_trm_s, wrm_tsdm_s, wrn_tsdm_s, wrt_tsdm_s, wrtn_tsdm_s, wsdm_tsdm_s, wskq_tsdm_s, top2LinkedMaxScoreOnlyTop_-1.0-twitter, top2LinkedMaxScoreOnlyTop-twitter, top2LinkedMaxScore_-1.0-twitter, top2LinkedMaxScore-twitter, top2LinkedProb-twitter]
Performance over Time (P@10)
Run      ETR   week  day
sdm      0.52  0.23  0.16
rtn      0.46  0.21  0.14
link     0.36  0.05  0.03
link LM  0.34  0.04  0.03
How to treat unjudged documents?
[Figure: P@10 per week over ETR days (0 to 400) for UMass_CIIR wsdm_tsdm_s and UMass_CIIR top2LinkedProb-twitter. Panels: "Unjudged as Negative", "Unjudged Ignored"]
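The two treatments of unjudged documents can be made concrete with a small P@k sketch. Treating "ignored" as removing unjudged documents from the ranking before cutting at k is one possible reading and an assumption here, as are the function name and the toy judgments.

```python
def precision_at_k(ranking, judgments, k=10, unjudged="negative"):
    """Precision at rank k under two treatments of unjudged documents.

    ranking:   list of document ids, best first
    judgments: dict doc_id -> bool (True = relevant); missing ids are unjudged
    unjudged:  "negative" counts unjudged documents as non-relevant,
               "ignore" removes them from the ranking before the cutoff
    """
    if unjudged == "ignore":
        ranking = [doc for doc in ranking if doc in judgments]
    elif unjudged != "negative":
        raise ValueError("unjudged must be 'negative' or 'ignore'")
    relevant = sum(1 for doc in ranking[:k] if judgments.get(doc, False))
    # Standard P@k: divide by k even if fewer than k documents remain.
    return relevant / k


ranking = ["d1", "d2", "d3", "d4"]
judgments = {"d1": True, "d3": True}   # d2 and d4 are unjudged (toy data)
print(precision_at_k(ranking, judgments, k=2, unjudged="negative"))
print(precision_at_k(ranking, judgments, k=2, unjudged="ignore"))
```

A run that retrieves many documents outside the judged pool looks much worse under the first treatment than under the second, which is the gap the two panels illustrate.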
Conclusion
'Best': SDM on wiki title
Still hope in entity linking
Questions for the planning session:
- Evaluation over confidence cutoffs or as ranking?
- How to deal with missing judgments?
- Time-aware evaluation?
Time-aware evaluation code available: http://ciir.cs.umass.edu/~dietz/streameval/
or fork my 'year2' branch: github.com/laura-dietz/taia-stream-eval/releases/tag/kba-y2.1
dietz@cs.umass.edu