Decomp

Traditional semantic annotation frameworks generally define complex, often exclusive category systems that require highly trained annotators to build. And in spite of their high quality for the cases they are designed to handle, these frameworks can be brittle to deviations from prototypical instances of a category.

The Decompositional Semantics Initiative (Decomp) is founded on the idea that semantic annotation should rather take the form of many simple questions about words or phrases in context that are (i) easy for naive native speakers to answer, thus allowing annotations to be crowd-sourced; and (ii) more robust than traditional category systems, thus allowing coverage of non-prototypical instances.

Decomp is supported by DARPA AIDA, DARPA KAIROS, and IARPA BETTER. It is a large project and so only research products authored by core FACTS.lab members are listed below. A full list of publications, presentations, and available data can be found at decomp.io.

Papers (show)

Gantt, William, Lelia Glass, and Aaron Steven White. 2022. Decomposing and Recomposing Event Structure. Transactions of the Association for Computational Linguistics 10: 17–34 [pdf, data, code, doi]

Stengel-Eskin, Elias, Sheng Zhang, Kenton Murray, Aaron Steven White, and Benjamin Van Durme. 2021. Joint Universal Syntactic and Semantic Parsing. Transactions of the Association for Computational Linguistics 9: 756—773. [pdf, code, doi]

Xia, Patrick, Guanghui Qin, Siddharth Vashishtha, Yunmo Chen, Tongfei Chen, Chandler May, Craig Harman, Kyle Rawlins, Aaron Steven White, and Benjamin Van Durme. 2021. LOME: Large Ontology Multilingual Extraction. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations. Kyiv, Ukraine: Association for Computational Linguistics. [pdf, demo, docker]

Xu, Haoran, Seth Ebner, Mahsa Yarmohammadi, Aaron Steven White, Benjamin Van Durme, and Kenton Murray. 2021. Gradual Fine-Tuning for Low-Resource Domain Adaptation. Proceedings of the Second Workshop on Domain Adaptation for NLP. Kyiv, Ukraine: Association for Computational Linguistics. [pdf]

Vashishtha, Siddharth, Adam Poliak, Yash Kumar Lal, Benjamin Van Durme, and Aaron Steven White. 2020. Temporal Reasoning in Natural Language Inference. In Findings of EMNLP, 4070–4078. Online: Association for Computational Linguistics. [pdf, code]

Chen, Yunmo, Tongfei Chen, Seth Ebner, Aaron Steven White, and Benjamin Van Durme. 2020. Reading the Manual: Event Extraction as Definition Comprehension. In Proceedings of the EMNLP 2020 Workshop on Structured Prediction for NLP, 74–83. Online: Association for Computational Linguistics. [pdf]

White, Aaron Steven, Elias Stengel-Eskin, Siddharth Vashishtha, Venkata Subrahmanyan Govindarajan, Dee Ann Reisinger, Tim Vieira, Keisuke Sakaguchi, Sheng Zhang, Francis Ferraro, Rachel Rudinger, Kyle Rawlins, and Benjamin Van Durme. 2020. The Universal Decompositional Semantics Dataset and Decomp Toolkit. In Proceedings of The 12th Language Resources and Evaluation Conference, 5698–5707. Marseille, France: European Language Resources Association. [pdf, code + data]

Stengel-Eskin, Elias, Aaron Steven White, Sheng Zhang, and Benjamin Van Durme. 2020. Universal Decompositional Semantic Parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 8427–8439. Online: Association for Computational Linguistics. [pdf, doi]

Govindarajan, Venkata, Benjamin Van Durme, and Aaron Steven White. 2019. Decomposing Generalization: Models of Generic, Habitual, and Episodic Statements. Transactions of the Association for Computational Linguistics 7: 501–517. [pdf, data, doi]

Vashishtha, Siddharth, Benjamin Van Durme, and Aaron Steven White. 2019. Fine-Grained Temporal Relation Extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2906–2919. Florence, Italy: Association for Computational Linguistics. [pdf, data, doi]

Poliak, Adam, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, and Benjamin Van Durme. 2018. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 67–81. Brussels, Belgium: Association for Computational Linguistics. [pdf, data, doi]

Rudinger, Rachel, Aaron Steven White, and Benjamin Van Durme. 2018. Neural Models of Factuality. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 731–744. New Orleans, Louisiana: Association for Computational Linguistics. [pdf, data, doi]

White, Aaron Steven, Pushpendre Rastogi, Kevin Duh, and Benjamin Van Durme. 2017. Inference Is Everything: Recasting Semantic Resources into a Unified Evaluation Framework. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 996–1005. Taipei, Taiwan: Asian Federation of Natural Language Processing. [pdf, data]

White, Aaron Steven, Kyle Rawlins, and Benjamin Van Durme. 2017. The Semantic Proto-Role Linking Model. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 92–98. Valencia, Spain: Association for Computational Linguistics. [pdf, code]

White, Aaron Steven, Dee Ann Reisinger, Keisuke Sakaguchi, Tim Vieira, Sheng Zhang, Rachel Rudinger, Kyle Rawlins, and Benjamin Van Durme. 2016. Universal Decompositional Semantics on Universal Dependencies. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1713–1723. Austin, Texas: Association for Computational Linguistics. [pdf, data, doi]

MegaAttitude

The MegaAttitude Project addresses how humans draw complex inferences from the thousands of English predicates that combine with subordinate clauses – “think”, “know”, “say”, “tell”, “remember”, “forget”, etc. – when the structural characteristics of the clauses they combine with vary. For example, the sentence “John forgot that he bought milk” is similar to the sentence “John forgot to buy milk”; but from the first sentence, a listener infers that John bought milk, while from the second, a listener infers that he didn’t. This inference pattern is only one among many such patterns in English; yet, in spite of this variety, there appears to be substantial regularities across predicates and subordinate clause structures that prior work has only scratched the surface of. Investigating the systematicities in how humans compute these inference patterns sheds light on how the human cognitive system constructs complex meanings from simpler parts and supports the development of intelligent computational systems for comprehending and reasoning about natural language in human-like ways.

The current project approaches this investigation in two parts. First, it develops and deploys multiple scalable, crowd-sourced annotation protocols, based on experimental methodologies from psycholinguistics, in order to collect data about a wide variety of inference patterns triggered by all of the thousands of English predicates that combine with subordinate clauses. Second, it leverages recent advances in multi-task machine learning to build a unified computational model of the relationship between such predicates, the structure of their subordinate clauses, and the inferences that they trigger, which is trained on these data. This model not only helps to reveal systematicities in how humans compute the inference patterns of interest; it can also be straightforwardly incorporated into applied technologies for natural language understanding.

MegaAttitude is supported by a National Science Foundation collaborative grant (BCS-1748969/BCS-1749025).

Papers (show)

Kane, Benjamin, William Gantt, and Aaron Steven White. 2022. Intensional Gaps: Relating veridicality, doxasticity, bouleticity, and neg-raising. Edited by Nicole Dreier and Chloe Kwon. Semantics and Linguistic Theory 31: 570-605. [pdf, data, code, doi]

White, Aaron Steven. 2021. On Believing and Hoping Whether. Semantics and Pragmatics 14 (6): 1–18. [pdf, code, doi]

Kim, Gene Louis and Aaron Steven White. 2021. Montague Grammar Induction. Edited by Joseph Rhyne, Kaelyn Lamp, Nicole Dreier, and Chloe Kwon. Semantics and Linguistic Theory 30: 227–251. [pdf, doi]

White, Aaron Steven, and Kyle Rawlins. 2020. Frequency, Acceptability, and Selection: A Case Study of Clause-Embedding. Glossa 5(1): 105. 1–41. [pdf, code, doi]

Gantt, William, Benjamin Kane, and Aaron Steven White. 2020. Natural Language Inference with Mixed Effects. In Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics (*SEM 2020), 81–87. Online: Association for Computational Linguistics. [pdf, code]

Chen, Yunmo, Tongfei Chen, Seth Ebner, Aaron Steven White, and Benjamin Van Durme. accepted. Reading the Manual: Event Extraction as Definition Comprehension. In Proceedings of the EMNLP 2020 Workshop on Structured Prediction for NLP. Online: Association for Computational Linguistics. [pdf]

An, Hannah Youngeun, and Aaron Steven White. 2020. The Lexical and Grammatical Sources of Neg-Raising Inferences. In Proceedings of the Society for Computation in Linguistics 3: 220–233. [pdf, data, doi]

Moon, Ellise, and Aaron Steven White. 2020. The Source of Nonfinite Temporal Interpretation. In Proceedings of the 50th Annual Meeting of the North East Linguistic Society, 11-24. Amherst, MA: GLSA Publications. [pdf (preprint), data]

White, Aaron Steven. 2019. Lexically Triggered Veridicality Inferences. In Handbook of Pragmatics, 22:115–148. John Benjamins Publishing Company. [pdf (preprint), doi]

White, Aaron Steven, Rachel Rudinger, Kyle Rawlins, and Benjamin Van Durme. 2018. Lexicosyntactic Inference in Neural Models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 4717–4724. Brussels, Belgium: Association for Computational Linguistics. [pdf, data, doi]

White, Aaron Steven, and Kyle Rawlins. 2018. The Role of Veridicality and Factivity in Clause Selection. In Proceedings of the 48th Annual Meeting of the North East Linguistic Society, edited by Sherry Hucklebridge and Max Nelson, 221–234. Amherst, MA: GLSA Publications. [pdf (preprint), data]

White, Aaron Steven, and Kyle Rawlins. 2018. Question Agnosticism and Change of State. In Proceedings of Sinn Und Bedeutung 21, edited by Robert Truswell, Chris Cummins, Caroline Heycock, Brian Rabern, and Hannah Rohde, 1325–1342. University of Edinburgh. [pdf]

White, Aaron Steven, and Kyle Rawlins. 2016. A Computational Model of S-Selection. Edited by Mary Moroney, Carol-Rose Little, Jacob Collard, and Dan Burgdorf. Semantics and Linguistic Theory 26: 641–663. [pdf, data, doi]

SuperMereo

Human language is a powerful tool for conveying information about complex, multi-faceted events at different levels of specificity: in the space of a breath, we can move from talking about a complex event as a whole to a targeted discussion of its many parts and their inter-relationships. Understanding how we convey such complex information using language is critical to improving not only our scientific understanding of human linguistic capacities, but also the ability of artificial intelligence systems to extract knowledge about the world from the massive bodies of text humans generate every day, and ultimately to improve their ability to serve humanity’s needs. With the goal of advancing both aims, this project develops foundational resources and cutting-edge deep learning-based artificial intelligence systems for extracting knowledge from those resources.

To achieve that goal, SuperMereo develops a broad-coverage, automatic method for mapping a description of an event to a rich representation of the relationships among that event’s parts: its event structure. It has two main components: (i) it collects behavioral data and text corpus annotations for key aspects of the event structure of verbal, adjectival, and nominal predicates in English; and (ii) it develops and implements a general deep learning-based computational model of event structure, trained using those data.

The lexicon and corpus produced under this proposal will be annotated for properties of events that are central in current linguistic theories of tense, grammatical aspect, and lexical aspect: (i) does the event have a natural endpoint (running a race) or not (simply running around)?; (ii) does the event happen at an instant (hitting a ball) or over time (building a house); (iii) what are the event’s preconditions and results?; (iv) are those results permanent (killing a mosquito) or transient (opening a door, which can be closed again)?; (v) do they come about gradually (cleaning a table) or abruptly (scuffing a table)?; (vi) does the event consist of indivisible parts (individual claps in applause) or not (being red)?; (vii) are those parts similar (tapping on glass) or dissimilar (shopping for clothes); and (viii) do event parts correspond to participant parts (writing a book) or not (combining ingredients)?

On the basis of these annotations, a computational model will be developed and implemented that jointly induces (a) distinct senses of a predicate (running a race v. running a company); (b) the event structure class(es) associated with those senses; (c) the event structure properties associated with those classes; and (d) a mapping from the event’s parts to its participants and temporal/causal structure. This model will integrate Bayesian hierarchical models with recent advances in deep learning and will enable explicit quantitative comparison of alternative theoretical assumptions, such as the number of event structure classes and properties that must be posited to best explain the data.

SuperMereo is supported by a National Science Foundation collaborative grant (BCS-2040831/BCS-2040820).

Papers (show)

Logical Form Induction (LoFI)

Artificial intelligence (AI) systems’ natural language processing capabilities have made remarkable strides in recent years. Beyond their numerous commercial applications, these advances suggest that AI systems might be powerful tools for deepening our understanding of how humans comprehend natural language. A major obstacle to using them for this purpose is that, while they seem to simulate certain aspects of reasoning by analogy quite well, their capacity to simulate complex logical reasoning shows much room for improvement. With the aim of addressing these shortcomings and thereby deepening our scientific understanding of human language, this project develops a framework for integrating complex logical reasoning capabilities into the components of AI systems that make their ability to reason by analogy possible. To support the development of this framework, the project develops a large dataset capturing the logical relationships among sentences in three languages.

In addition to producing foundational software and data artifacts for both artificial intelligence and language science researchers, both components of this project will be tightly integrated with the graduate and undergraduate curricula in computational linguistics at the University of Rochester. Through this integration, the project will serve as a vehicle to enhance programming and statistical literacy as well as data collection and data management skills through training with hands-on applications at both the undergraduate and graduate levels. It will additionally support the development and implementation of the curriculum for a new BS in Computational Linguistics at the University of Rochester as well as open courseware based on this curriculum, which can be used for the deployment of similar programs at other universities.

LoFI is supported by a National Science Foundation CAREER award (BCS-2237175).