🧪TextLabs
Simulation-based Natural Language Understanding for Scientific Procedural Texts
About
We are a group of researchers at the Hebrew University of Jerusalem, Georgia Tech, the Allen Institute for Artificial Intelligence and Ohio State University.
The TextLabs project targets natural language understanding (NLU) of complex scientific procedural texts ("recipes" for performing experiments). This page hosts project resources, such as links to publications, models, code and data.
On the NLU side, experimental procedures typically feature dense, technical and under-specified language, which makes them a particularly tough nut to crack for current state-of-the-art systems.
On the application side, there is rapidly growing interest in the automated execution of experimental protocols as a way to address serious reproducibility issues across many fields, such as chemistry, biology and materials science.
Current procedural datasets primarily serve text-mining applications and do not provide enough detail to support execution. To begin bridging the gap towards execution, we reframed procedural text understanding as a text-to-code task and developed a novel process-level representation called a Process Execution Graph (PEG). Natural language protocols can be parsed into PEGs, which can later be converted into concrete instructions for automated, executable workflows.
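The actual PEG format is defined in the paper and the annotation guidelines. The sketch below is only a hypothetical, simplified graph that illustrates the last step described above: turning a parsed graph into an ordered list of instructions. The `ProcessGraph` class, the `operation`/`entity` node kinds and the `input`/`product` relation names are illustrative assumptions, not the X-WLP schema. An operation is scheduled after any operation whose product it consumes, using a standard topological sort (Kahn's algorithm).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ProcessGraph:
    """Hypothetical, simplified stand-in for a PEG (not the X-WLP schema)."""
    kinds: dict = field(default_factory=dict)  # node id -> "operation" or "entity"
    texts: dict = field(default_factory=dict)  # node id -> text span from the protocol
    edges: list = field(default_factory=list)  # (source, relation, target) triples

    def add_node(self, node_id: str, kind: str, text: str) -> None:
        self.kinds[node_id] = kind
        self.texts[node_id] = text

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if source not in self.kinds or target not in self.kinds:
            raise KeyError(f"unknown node in edge {source!r} -> {target!r}")
        self.edges.append((source, relation, target))

    def execution_order(self) -> list:
        """Order operations so each runs after the operations whose products it consumes."""
        ops = [n for n, k in self.kinds.items() if k == "operation"]
        # Hypothetical convention: operation --product--> entity, entity --input--> operation.
        producer = {t: s for s, r, t in self.edges if r == "product"}
        deps = {op: set() for op in ops}
        for s, r, t in self.edges:
            if r == "input" and t in deps and s in producer:
                deps[t].add(producer[s])

        # Kahn's algorithm; ties are broken by node insertion order.
        indegree = {op: len(d) for op, d in deps.items()}
        dependents = {op: [] for op in ops}
        for op, d in deps.items():
            for pre in d:
                dependents[pre].append(op)
        ready = deque(op for op in ops if indegree[op] == 0)
        order = []
        while ready:
            op = ready.popleft()
            order.append(op)
            for nxt in dependents[op]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(ops):
            raise ValueError("cycle detected among operations")
        return [self.texts[op] for op in order]


# Toy protocol: the steps are added out of order on purpose.
peg = ProcessGraph()
peg.add_node("resuspend", "operation", "Resuspend the pellet in PBS")
peg.add_node("spin", "operation", "Centrifuge the cells")
peg.add_node("cells", "entity", "cells")
peg.add_node("pellet", "entity", "pellet")
peg.add_node("pbs", "entity", "PBS")
peg.add_edge("cells", "input", "spin")
peg.add_edge("spin", "product", "pellet")
peg.add_edge("pellet", "input", "resuspend")
peg.add_edge("pbs", "input", "resuspend")

order = peg.execution_order()
print(order)  # ['Centrifuge the cells', 'Resuspend the pellet in PBS']
```

Because the graph makes material flow explicit, the correct step order can be recovered even when it is only implied by the text; a real converter would also need to map each operation and its arguments onto device-level commands.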
In practice, we enriched a subset of the existing Wet Labs Protocols dataset with annotations in our PEG format. The current version features:
- eXecutable Wet Labs Protocols (X-WLP): a new dataset containing process-level PEG annotations for 279 protocols.
- An interactive text-based game annotation interface which handles visualization, input validation and state tracking (demo video).
- Modelling experiments on X-WLP using a SciBERT pipeline and a multi-task approach based on the DyGIE++ method.
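To make "input validation and state tracking" more concrete, here is a minimal, hypothetical sketch of how a text-game-style interface might check an annotator's command against the current lab state before applying it. The `LabState` class and the `add <material> to <container>` command syntax are assumptions for illustration only and do not reflect the actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class LabState:
    """Hypothetical world state for a text-game-style annotation session."""
    containers: dict = field(default_factory=dict)  # container -> list of contents
    log: list = field(default_factory=list)         # accepted commands, in order

    def apply(self, command: str) -> str:
        """Validate a hypothetical 'add <material> to <container>' command, then update state."""
        words = command.strip().split()
        # Input validation: reject anything that does not match the expected pattern.
        if len(words) != 4 or words[0] != "add" or words[2] != "to":
            return f"Rejected: expected 'add <material> to <container>', got {command!r}"
        material, container = words[1], words[3]
        # State-aware validation: the target container must already exist.
        if container not in self.containers:
            return f"Rejected: unknown container {container!r}"
        # State tracking: record the new contents and the accepted command.
        self.containers[container].append(material)
        self.log.append(command)
        return f"{container} now contains: {', '.join(self.containers[container])}"


state = LabState(containers={"tube": []})
r1 = state.apply("add buffer to tube")    # accepted
r2 = state.apply("add enzyme to flask")   # rejected: no such container
r3 = state.apply("mix the tube")          # rejected: unsupported command
print(r1)  # tube now contains: buffer
print(r2)  # Rejected: unknown container 'flask'
```

Rejecting commands that refer to entities absent from the tracked state is what lets such an interface catch annotation errors early; this toy version only handles single-word names and one command type.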
In the future, we plan to make use of the underlying text-based game environment by leveraging recent advances in text-based reinforcement learning (RL) agents and adapting them to our data.
🔎Explore the data
Publications
- Process-Level Representation of Scientific Protocols with Interactive Annotation
Ronen Tamari, Fan Bai, Alan Ritter and Gabriel Stanovsky
EACL 2021
[preprint][paper][data][code][guidelines for annotators]
Team