Research

IR+NLP:

My main line of research lies at the intersection of information retrieval, semantic annotations, knowledge graphs, and machine learning. Much of my research, and that of the Ph.D. students in the “TREMA” lab, revolves around the vision of generating comprehensive articles for user-provided topics. This vision requires solving open research questions such as: How to identify which concepts (called entities) to mention? How to find supporting text in a large collection? How to identify whether a text that mentions a relevant concept is noteworthy for the user-provided topic? How to predict whether relations between concepts (as extracted from text or obtained from a knowledge graph) are relevant in the context? How to arrange relevant text passages into meaningful subtopics?

This line of work was honored with an NSF CAREER Award on “Utilizing Fine-grained Knowledge Annotations in Text Understanding and Retrieval” (January 2019 – December 2023).

Ongoing research efforts have led to a full paper and three short papers at the IR flagship conference ACM SIGIR (Dietz 2019; Chatterjee and Dietz 2021; Litschko et al. 2019; Kadry and Dietz 2017), papers at prime venues such as CIKM (Ramsdell and Dietz 2020), ICTIR (Chatterjee and Dietz 2019; Weiland et al. 2016), and ECIR (Dalton et al. 2019), and journal articles (Dietz and Dalton 2020; Nanni, Ponzetto, and Dietz 2020; Weiland et al. 2018, 2017). Our work won a best-paper award at JCDL (an A* conference) in 2018 (Nanni, Ponzetto, and Dietz 2018). Our ideas were discussed at several conference workshops (Oza and Dietz 2021; Kashyapi and Dietz 2021; Magnusson and Dietz 2019; Basu, Dietz, and Fellbaum 2018).

I presented this work during several conference keynotes (ECIR, AKBC, SPIRE) and invited talks at renowned universities. I presented conference tutorials at ICTIR 2016, WSDM 2017, and SIGIR 2018. I organized the workshop series “KG4IR”, which was held twice at the IR flagship conference ACM SIGIR, as well as the NAACL workshop on Extracting Structured Knowledge from Scientific Publications (ESSP), and I edited a special issue of the Information Retrieval Journal (IRJ).

The testbed for this vision, with benchmarks, evaluation protocols, and strong reference methods, has been developed within the TREC Complex Answer Retrieval challenge, which I coordinated from 2017 to 2019 with advice from members of the National Institute of Standards and Technology (NIST). It is a great honor that my track was selected by the TREC evaluation venue, since empirical system evaluation is central to my research field and TREC is a highly selective venue.

Watershed Data Science:

I have been developing a parallel research initiative on data science for studying storm events in watersheds with Adam Wymore and other faculty from the Department of Natural Resources and the Environment (NRESS). The work of MS/PhD student Sepideh Koohfar and two undergraduate capstone projects has led to a rigorous data processing pipeline with automatic storm event detection and a method for forecasting the solute concentration response to expected storm events.

The work has been awarded a seed grant from the NSF-funded Northeast Big Data Innovation Hub. Joint work is under submission to the AGU Fall Meeting.

Other Interests:

In general, I am interested in developing machine learning methods for analyzing, classifying, predicting, and tagging sequential data. The underlying technology impacts both my work on IR+NLP and my work on Watershed Data Science. It is also why I like to work with students who are interested in various data domains such as music or social media.

Because of my expertise in algorithms and empirical system evaluation, I am consulted by members of the open-source community that supports GHC, the compiler for the functional programming language Haskell. This has led to some research publications on non-moving garbage collectors for the Haskell runtime system.

Research Projects

Automatic Wikipedia Construction

Together with my students, I am working on methods to automatically, and in a query-driven manner, retrieve materials from the Web and compose Wikipedia-like articles. Especially for information needs about which the user has very little prior expert knowledge, the web search paradigm of 10 blue hyperlinks is not sufficient. Instead, we envision providing a synthesis of the Web materials that strives to mimic the comprehensiveness of Wikipedia articles. We limit ourselves to a content-only setting where query-log, click, or session information is not available. Consequently, we aim to maximize the utility of information retrieval models in combination with methods from natural language processing. A particular emphasis is on utilizing information from structured knowledge resources such as Wikipedia, Freebase, or DBpedia together with text-based reasoning on general document and Web corpora.

An early feasibility study was presented at AKBC 2014, and a later demo was presented at the ESAIR workshop at CIKM 2015 (demo). The method paper for the demo is under submission (information available on request).

Closely related work on reranking entities for web queries was presented at CIKM 2015 (appendix), and work on using relation extraction in information retrieval was presented at ECIR 2016 (supervised relations) and SIGIR 2017 (OpenIE).

The project was awarded an Amazon AWS in Education research grant and a stipend by the Eliteprogramm for Postdoktorandinnen und Postdoktoranden of the Baden-Württemberg Stiftung.

Entity-Aspect Linking

With Federico Nanni, I am working on building document collections for events. We found that entity links are too unspecific, as the same entity can be mentioned in different contexts (we call them entity aspects). In our JCDL 2018 paper on entity aspect linking, we demonstrated that such aspects can be harvested from the section headings of the entity’s Wikipedia article. To post-process entity links, we propose a method for entity-aspect linking that refines the entity link with aspect information. When applied to retrieval problems, aspect linking improved the accuracy of rankings and classifications. This work received a best paper award at JCDL 2018.

We provide a large benchmark for training and evaluation of entity aspect linking (Ramsdell and Dietz 2020). In our latest SIGIR paper, we demonstrate the added benefits of using entity aspects for entity-oriented search tasks (Chatterjee and Dietz 2021).
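
To make the idea of entity-aspect linking more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method from the papers above: each aspect is assumed to be represented by a section heading plus the section text, and plain term-overlap cosine similarity stands in for whatever features a real system would use. The function names and the toy aspects are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_aspect(context, aspects):
    """Return the heading of the aspect that best matches the mention context.

    `aspects` maps a section heading to the text of that section
    (an assumed representation, chosen only for this sketch).
    """
    ctx = Counter(tokenize(context))
    return max(
        aspects,
        key=lambda h: cosine(ctx, Counter(tokenize(h + " " + aspects[h]))),
    )


# Hypothetical aspects of one entity, keyed by section heading.
aspects = {
    "Early life": "born raised childhood family school education",
    "Political career": "elected senate campaign election votes office",
    "Awards": "prize medal honor award received",
}
context = "She won the election after a long campaign for the senate seat."
best = link_aspect(context, aspects)
print(best)
```

In this toy example only the "Political career" aspect shares terms with the context, so the entity link would be refined with that aspect.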

Learning Relevance-weighted Graphs

For many years, I have been interested in unsupervised algorithms for identifying shared aspects and quantifying influence in social networks. Work on symmetric networks was published at ICWSM 2012 (Code & Supplement), and work on asymmetric networks at ICML 2007 (talk – Supplement).

In my work at SIGIR 2019 (Dietz 2019), I propose a method for incorporating entity, neighbor, and text information into an entity ranking task. The underlying framework represents neighbor and text information to predict edge weights in an entity-relation graph, optimizing for a list-wise learning-to-rank criterion. Paper – appendix – video
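
The general pattern of learning edge weights under a list-wise ranking objective can be sketched as follows. This is a simplified illustration, not the model from the paper: it assumes a linear edge-weight function, an entity score that sums the weights of incident edges, and a ListNet-style top-one cross-entropy loss as a stand-in for the list-wise criterion. All names, features, and numbers are hypothetical.

```python
import math


def edge_weight(params, features):
    """Linear edge weight: dot product of parameters and edge features."""
    return sum(p * f for p, f in zip(params, features))


def entity_score(params, edges):
    """Score an entity by summing the weights of its incident edges."""
    return sum(edge_weight(params, f) for f in edges)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(params, candidates, relevance):
    """ListNet-style top-one cross entropy over one ranked list.

    `candidates` holds, per entity, a list of edge feature vectors;
    `relevance` holds one graded label per entity.
    """
    pred = softmax([entity_score(params, edges) for edges in candidates])
    target = softmax([float(r) for r in relevance])
    return -sum(t * math.log(p) for t, p in zip(target, pred))


# Two hypothetical candidate entities for one query. Each edge carries
# two assumed features: [neighbor evidence, text evidence].
candidates = [
    [[0.9, 0.8], [0.7, 0.6]],  # relevant entity with strong evidence
    [[0.1, 0.2]],              # non-relevant entity with weak evidence
]
relevance = [1, 0]

good = listwise_loss([1.0, 1.0], candidates, relevance)
bad = listwise_loss([-1.0, -1.0], candidates, relevance)
print(good < bad)
```

Parameters that rank the relevant entity first yield the lower loss, which is the signal an optimizer would follow when fitting the edge-weight parameters.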

My PhD thesis was focused on topic models and other generative models for data with link structure.

Complex Answer Retrieval

From 2017 to 2019, I coordinated the Complex Answer Retrieval track at the Text Retrieval Conference (TREC). It is an international evaluation track on how to retrieve the best passages and entities for topics in popular science and society. For more information about the data, task, and evaluation, please see the official TREC Complex Answer Retrieval site.

Track overview papers:

L. Dietz, M. Verma, F. Radlinski, N. Craswell (2017). TREC Complex Answer Retrieval Overview. In TREC. (year 1)
L. Dietz, B. Gamari, J. Dalton, N. Craswell (2018). TREC Complex Answer Retrieval Overview. In TREC. (year 2)
L. Dietz, B. Gamari, J. Foley (2019). TREC CAR Y3: Complex Answer Retrieval Overview. In TREC. (year 3)

References

Basu, Chumki, Laura Dietz, and Christiane Fellbaum. 2018. “WordNetContext: Information Retrieval-Friendly Access to WordNet Senses.” In Joint Proceedings of the First International Workshop on Professional Search (ProfS 2018); the Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR); and the International Workshop on Data Search (DATA:SEARCH’18) Co-located with (ACM SIGIR 2018), 63–64. https://www.cs.unh.edu/~dietz/papers/basu2018wordnetcontext.pdf.

Chatterjee, Shubham, and Laura Dietz. 2019. “Why Does This Entity Matter? Support Passage Retrieval for Entity Retrieval.” In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR), 221–24. https://www.cs.unh.edu/~dietz/papers/chatterjee2019does.pdf.

Chatterjee, Shubham, and Laura Dietz. 2021. “Entity Retrieval Using Fine-Grained Entity Aspects.” In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 1662–66. https://www.cs.unh.edu/~dietz/papers/chatterjee2021entity.pdf.

Dalton, Jeff, Shahrzad Naseri, Laura Dietz, and James Allan. 2019. “Local and Global Query Expansion for Hierarchical Complex Topics.” In Proceedings of the European Conference on Information Retrieval (ECIR). Springer. https://www.cs.unh.edu/~dietz/papers/dalton2019local.pdf.

Dietz, Laura. 2019. “ENT Rank: Retrieving Entities for Topical Information Needs Through Entity-Neighbor-Text Relations.” In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 215–24. https://www.cs.unh.edu/~dietz/papers/dietz2019ent.pdf.

Dietz, Laura, and Jeff Dalton. 2020. “Humans Optional? Automatic Large-Scale Test Collections for Entity, Passage, and Entity-Passage Retrieval [Special Issue on Trends in Information Retrieval Evaluation].” Datenbank-Spektrum, 1–12. https://www.cs.unh.edu/~dietz/papers/dietz2020humans.pdf.

Kadry, Amina, and Laura Dietz. 2017. “Open Relation Extraction for Support Passage Retrieval: Merit and Open Issues.” In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 1149–52. ACM. https://www.cs.unh.edu/~dietz/papers/kadry2017open.pdf.

Kashyapi, Sumanta, and Laura Dietz. 2021. “Learn the Big Picture: Representation Learning for Clustering.” In Workshop on Representation Learning for NLP (RepL4NLP) at ACL 2021. ACL. https://www.cs.unh.edu/~dietz/papers/kashyapi2021learn.pdf.

Litschko, Robert, Goran Glavaš, Ivan Vulic, and Laura Dietz. 2019. “Evaluating Resource-Lean Cross-Lingual Embedding Models in Unsupervised Retrieval.” In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1109–12. https://www.cs.unh.edu/~dietz/papers/litschko2019evaluating.pdf.

Magnusson, Matthew, and Laura Dietz. 2019. “An Analysis of Deep Contextual Word Embeddings and Neural Architectures for Toponym Mention Detection in Scientific Publications.” In NAACL Workshop on Extracting Structured Knowledge from Scientific Publications (ESSP). https://www.cs.unh.edu/~dietz/papers/magnusson2019an.pdf.

Nanni, Federico, Simone Paolo Ponzetto, and Laura Dietz. 2018. “Entity-Aspect Linking: Providing Fine-Grained Semantics of Entities in Context.” In Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL), 49–58. ACM. https://www.cs.unh.edu/~dietz/papers/nanni2018entity.pdf.

Nanni, Federico, Simone Paolo Ponzetto, and Laura Dietz. 2020. “Toward Comprehensive Event Collections.” International Journal on Digital Libraries 21 (2): 215–29. https://www.cs.unh.edu/~dietz/papers/nanni2020toward.pdf.

Oza, Pooja, and Laura Dietz. 2021. “Which Entities Are Relevant for the Story?” In Proceedings of the Fourth International Workshop on Narrative Extraction from Texts Held in Conjunction with the 43rd European Conference on Information Retrieval (Text2Story). https://www.cs.unh.edu/~dietz/papers/pooja2021entity.pdf.

Ramsdell, Jordan, and Laura Dietz. 2020. “A Large Test Collection for Entity Aspect Linking.” In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM), 3109–16. https://www.cs.unh.edu/~dietz/papers/ramsdell2020large.pdf.

Weiland, Lydia, Ioana Hulpuş, Simone Paolo Ponzetto, and Laura Dietz. 2016. “Understanding the Message of Images with Knowledge Base Traversals.” In Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval (ICTIR), 199–208. ACM. https://www.cs.unh.edu/~dietz/papers/weiland2016understanding.pdf.

Weiland, Lydia, Ioana Hulpuş, Simone Paolo Ponzetto, and Laura Dietz. 2017. “Using Object Detection, NLP, and Knowledge Bases to Understand the Message of Images.” In International Conference on Multimedia Modeling (MMM), 405–18. Springer, Cham. https://www.cs.unh.edu/~dietz/papers/weiland2017using.pdf.

Weiland, Lydia, Ioana Hulpuş, Simone Paolo Ponzetto, Wolfgang Effelsberg, and Laura Dietz. 2018. “Knowledge-Rich Image Gist Understanding Beyond Literal Meaning.” Data & Knowledge Engineering 117: 114–32. https://www.cs.unh.edu/~dietz/papers/weiland2018knowledge.pdf.