Multimodal Tuning with Human Cognition Data as Prompts


Human cognitive signals reflect how people distribute attention and neural activation over different parts of the input, which is crucial to understanding the mechanisms behind human language processing. Computational linguistics research aims to bring language models to a human-like level of performance on natural language processing (NLP) tasks, ideally in an accountable fashion. This makes integrating human cognitive signals into language models an intriguing way to improve their downstream task performance. Previous work on using cognition data to enhance NLP tasks has been limited by small accuracy gains, heavy engineering bias, and conclusions of limited generalizability drawn from experiments on outdated models. This thesis addresses these issues by introducing a novel approach that leverages prompt-based fine-tuning. Two methods are proposed. (1) Inspired by ‘hard prompting’, Method 1 uses gaze and electroencephalography (EEG) features as discrete prompt tokens to modify model behaviour during training. (2) Drawing on ‘soft prompting’, Method 2 is a multimodal prompting framework called ‘CogMAP’ (Cognition Mapping And Prompting), which employs these cognition features as multidimensional prompting vectors projected into the continuous embedding space of language models. Results on ternary sentiment classification were consistently superior when either gaze or EEG data were incorporated as prompts, in both methods (p < 0.001) and across encoder-only BERT-based models and decoder-only GPT-2-based models. This study advances cognition-inspired NLP research by addressing existing limitations and by providing a robust and effective paradigm for future work on bridging the gap between human cognition and artificial language processing, with the aim of improving both the performance and the understanding of language models.
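The two prompting styles described above can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation. The binning scheme, the `[GAZE_k]` token names, the 600 ms range, the placement of each gaze token before its word, and the randomly initialised linear projection are all assumptions made here for illustration. In a real setup the projection would be a trainable layer and the resulting vectors would be passed to the language model as input embeddings.

```python
import random


def bin_feature(value, low, high, n_bins=3):
    """Map a scalar cognition feature to a bin index (hypothetical scheme)."""
    if high <= low:
        raise ValueError("high must exceed low")
    ratio = (value - low) / (high - low)
    # Clamp so out-of-range values fall into the first or last bin.
    return min(n_bins - 1, max(0, int(ratio * n_bins)))


def hard_prompt(words, fixation_ms, low=0.0, high=600.0):
    """Hard-prompting sketch: pair each word with a discrete gaze token.

    Token names and their placement before the word are assumptions.
    """
    tokens = []
    for word, duration in zip(words, fixation_ms):
        tokens.append(f"[GAZE_{bin_feature(duration, low, high)}]")
        tokens.append(word)
    return tokens


def make_projection(in_dim, out_dim, seed=0):
    """Random stand-in for a trainable linear map (out_dim x in_dim)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, features):
    """Project one cognition feature vector into the embedding space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def soft_prompt(token_embeddings, cognition_features, weights):
    """Soft-prompting sketch: prepend projected cognition vectors
    to the sequence of token embeddings."""
    prompts = [project(weights, f) for f in cognition_features]
    return prompts + token_embeddings


words = ["the", "film", "was", "wonderful"]
fixation_ms = [90.0, 210.0, 120.0, 480.0]  # made-up fixation durations
hard_tokens = hard_prompt(words, fixation_ms)

embed_dim = 8
token_embeddings = [[0.0] * embed_dim for _ in words]  # placeholder embeddings
cognition_features = [[d / 600.0, 1.0, 0.0] for d in fixation_ms]  # toy 3-d features
weights = make_projection(in_dim=3, out_dim=embed_dim)
soft_inputs = soft_prompt(token_embeddings, cognition_features, weights)
```

In this sketch the hard prompt changes the token sequence itself, while the soft prompt leaves the text untouched and only lengthens the embedded input, which is why the second variant needs a projection into the model's embedding dimension.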

(Master’s Dissertation)
Yingjia Wan
Master’s student in Natural Language Processing (graduated)

My research interests lie in multimodality, debiasing language models, prompting, and aligning language models with cognitive science for interpretability.