Latest NLP Research Derives Insight from Growing Volume of Digital Text
Jul 13, 2022 — Atlanta, GA
New natural language processing (NLP) research from Georgia Tech is uncovering patterns in this growing body of digital text and broadening the understanding of how to build better computer applications that derive value from written language.
Georgia Tech researchers are presenting their latest work at the annual conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022), taking place this week, July 10-15. NAACL provides a regional focus for members of the Association for Computational Linguistics (ACL) in North, Central, and South America, and promotes cooperation and information exchange among related scientific and professional societies.
“Recent advances in natural language processing – especially around big models – have enabled successful applications,” said Diyi Yang, assistant professor in the School of Interactive Computing and researcher in NLP. “At the same time, we see a growing amount of evidence and concern toward the negative aspects of NLP systems, such as the bias and fragility exhibited by these models, as well as the lack of input from users.”
Yang’s work in computational social science and NLP focuses on understanding human communication in its social context and on building socially aware language technologies to support human-to-human and human-computer interaction.
Her SALT Lab has produced a notable run of contributions to the field over the past eight months, starting with research presented at last November’s EMNLP conference, where SALTers, as lab members are known, helped make Georgia Tech the top global contributor in computational social science and cultural analytics. The 60th Annual Meeting of the ACL in Dublin followed in May with multiple SALT studies, including a best paper award. Yang’s group has six papers at this week’s NAACL.
“We hope to build NLP systems that are more user centric, more robust, and more aware of human factors,” said Yang. “Our NAACL works are in this direction, covering robustness, toxicity detection, and generalization to new settings.”
Yang’s aspirations for the field are shared by her Tech peers, who have work in the following tracks at NAACL:
- Ethics, Bias, Fairness
- Information Extraction
- Information Retrieval
- Interpretability and Analysis of Models for NLP
- Machine Learning
- Machine Learning for NLP
- NLP Applications
- Semantics: Sentence-level Semantics and Textual Inference
Georgia Tech’s accepted research papers in the NAACL main program are listed below. To learn more about NLP and machine learning research at Georgia Tech, visit https://ml.gatech.edu.
GEORGIA TECH RESEARCH AT NAACL 2022 (main papers program)
Ethics, Bias, Fairness
Explaining Toxic Text via Knowledge Enhanced Text Generation
Rohit Sridhar, Diyi Yang
Information Extraction
Self-Training with Differentiable Teacher
Simiao Zuo, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao, Hongyuan Zha
Information Retrieval
CERES: Pretraining of Graph-Conditioned Transformer for Semi-Structured Session Data
Rui Feng, Chen Luo, Qingyu Yin, Bing Yin, Tuo Zhao, Chao Zhang
Interpretability and Analysis of Models for NLP
Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models
Tianlu Wang, Rohit Sridhar, Diyi Yang, Xuezhi Wang
Measure and Improve Robustness in NLP Models: A Survey
Xuezhi Wang, Haohan Wang, Diyi Yang
Reframing Human-AI Collaboration for Generating Free-Text Explanations
Sarah Wiegreffe, Jack Hessel, Swabha Swayamdipta, Mark Riedl, Yejin Choi
Machine Learning
AcTune: Uncertainty-Aware Active Self-Training for Active Fine-Tuning of Pretrained Language Models
Yue Yu, Lingkai Kong, Jieyu Zhang, Rongzhi Zhang, Chao Zhang
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen
Machine Learning for NLP
TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding
Le Zhang, Zichao Yang, Diyi Yang
NLP Applications
Cryptocoin Bubble Detection: A New Dataset, Task & Hyperbolic Models
Ramit Sawhney, Shivam Agarwal, Vivek Mittal, Paolo Rosso, Vikram Nanda, Sudheer Chava
Semantics: Sentence-level Semantics and Textual Inference
SEQZERO: Few-shot Compositional Semantic Parsing with Sequential Prompts and Zero-shot Models
Jingfeng Yang, Haoming Jiang, Qingyu Yin, Danqing Zhang, Bing Yin, Diyi Yang
SUBS: Subtree Substitution for Compositional Semantic Parsing
Jingfeng Yang, Le Zhang, Diyi Yang