Simulation-based Natural Language Understanding for Scientific Procedural Texts


We are a group of researchers at the Hebrew University of Jerusalem, Georgia Tech, the Allen Institute for Artificial Intelligence, and Ohio State University.

The TextLabs project targets natural language understanding (NLU) of complex scientific procedural texts ("recipes" for performing experiments). This page hosts project resources, such as links to publications, models, code and data.

On the NLU side, experimental procedures typically feature dense, technical and under-specified language, making them a particularly tough nut to crack for current state-of-the-art systems.

On the application side, there is rapidly growing interest in automated execution of experimental protocols, to address serious reproducibility issues across many fields, such as chemistry, biology and materials science.

Current procedural datasets serve primarily for text-mining applications, and do not provide enough detail to support execution. To begin bridging the gap towards execution, we reframed procedural text understanding as a text-to-code task, and developed a novel process-level representation called a Process Execution Graph (PEG). Natural language protocols can be parsed into PEGs, which can later be converted into actual instructions for automated, executable workflows.
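As a rough illustration of the text-to-code idea, the sketch below represents one protocol sentence ("Add the buffer to the tube, then vortex.") as a small graph and reads off an execution order. It is only a sketch under stated assumptions: the class name `ToyPEG`, the node kinds (`operation`, `reagent`, `container`) and the relation names (`input`, `site`, `next`) are hypothetical and are not the project's actual PEG schema.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class ToyPEG:
    """Toy process graph. Hypothetical structure, not the real PEG format."""
    nodes: dict = field(default_factory=dict)   # node id -> kind
    edges: list = field(default_factory=list)   # (source, relation, target)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind

    def add_edge(self, source, relation, target):
        self.edges.append((source, relation, target))

    def execution_order(self):
        """Return operation nodes ordered so that every source precedes its target."""
        sorter = TopologicalSorter()
        for node_id in self.nodes:
            sorter.add(node_id)
        for source, _relation, target in self.edges:
            sorter.add(target, source)  # source must come before target
        ordered = sorter.static_order()
        return [n for n in ordered if self.nodes[n] == "operation"]


# "Add the buffer to the tube, then vortex."
peg = ToyPEG()
peg.add_node("buffer", "reagent")
peg.add_node("tube", "container")
peg.add_node("add", "operation")
peg.add_node("vortex", "operation")
peg.add_edge("buffer", "input", "add")   # what is added
peg.add_edge("tube", "site", "add")      # where it is added
peg.add_edge("add", "next", "vortex")    # temporal ordering

steps = peg.execution_order()
print(steps)
```

In a graph like this, the arguments that a sentence leaves implicit (for example, what exactly is being vortexed) become explicit edges, which is the kind of detail an executable workflow would need.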

In practice, we enriched a subset of the existing Wet Labs Protocols dataset, converting it to our PEG format. The current version features:

In the future, we look forward to making use of the underlying text-based game environment, by leveraging recent advances in text-based reinforcement learning (RL) agents and adapting them to our data.

🔎 Explore the data