PRINCETON, NJ

EAGER: Building Idiomaticity into Natural Language Processing

National Science Foundation

Description

Idiomatic expressions are an essential component of everyday language use and a hallmark of native language ability. Consider the phrase "throw away": proficient speakers effortlessly understand that it takes a figurative meaning in "Britain threw away all the achievements of the last decade" and a literal sense in "He threw away his cigarette and buried his head in his arms." This EArly-concept Grant for Exploratory Research (EAGER) will build a high-quality dataset that enables computers to distinguish the figurative and literal senses of such expressions in general English text. The main novelty of the project lies in collecting a large class of idiomatic expressions, together with sentences containing them, so that computers can learn the inherent variability across many idiomatic phrases. Collecting many sentences with phrases that have both figurative and literal meanings will help computers better understand the nuances with which these expressions are used in everyday conversation and writing. Beyond understanding them, the collected examples will help computers use these expressions as native speakers do when automatically generating text, and even suggest appropriate expressions in specific contexts. This EAGER project is inherently interdisciplinary, spanning linguistics and computation, and will investigate novel idiomaticity-aware paradigms for natural language processing. It has two research aims: (1) creating a high-quality dataset of phrasal verbs annotated with their context-specific senses and their literal/figurative equivalent forms, and (2) testing the performance of state-of-the-art idiomaticity-aware algorithms.
Because idiomatic expressions vary widely in form and structure, this exploratory project focuses on phrasal verbs (also known as verb-particle constructions), a very frequent class of idiomatic expressions that are syntactically different from those in currently available datasets. The primary risk of the project stems from its exploratory nature: creating large corpora with sufficient coverage for language model training. Given the prevalence of phrasal verbs in natural language, an English phrasal-verb dataset will add variety to the existing datasets of idiomatic expressions. Moreover, the figurative/literal ambiguity of phrasal verbs in context (apart from their polysemy) will permit a diverse look at the non-compositionality that characterizes idiomatic expressions. The dataset will thus serve as a training and test bed for algorithms that detect, interpret, and generate a broad class of idiomatic expressions. This effort will lead to new natural language processing algorithms for the accurate interpretation and generation of idiomatic expressions, moving machines toward more human-like language processing.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2611728
Program: 01002223DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Suma Bhat
Institution: Princeton University, PRINCETON, NJ
Award Amount: $20,360
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2611728
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2611728.html

Grant Details

Funding Range

$20,360

Deadline

July 31, 2026

Geographic Scope

PRINCETON, NJ

Status

Open