Thesis Examination Committee
Prof Bing-Yi JING, MATH/HKUST (Chairperson)
Prof Pascale FUNG, ECE/HKUST (Thesis Supervisor)
Prof Preslav NAKOV, Arabic Language Technologies, Qatar Computing Research Institute, HBKU (External Examiner)
Prof Matthew MCKAY, ECE/HKUST
Prof Bertram SHI, ECE/HKUST
Prof Andrew HORNER, CSE/HKUST
Humor is a pervasive emotion and an important driver of human relations. Research on computational humor, especially in conversation, is scarce, and it remains hard for machines to understand humorous utterances and to generate jokes appropriate to the situation.
We first propose a novel framework to recognize humor in dialogues. As an initial attempt, we train a Conditional Random Field (CRF) on a combination of acoustic and language features, to exploit both the discourse context and the joint information in text and audio. For our experiments, we collected data from popular sitcoms and annotated the punchlines using the laugh track. We show that the CRF effectively exploits both speech and language, but that bag-of-n-gram features are a poor semantic representation of each utterance. We therefore replace the CRF with a deep learning framework. To obtain a fine-grained representation of meaning and prosody, two Convolutional Neural Network (CNN) encoders model each sentence, one over pretrained word embeddings and the other over frame-based low-level acoustic features. A Long Short-Term Memory (LSTM) network then captures the discourse context. This model achieves significant improvements over the CRF-based model and other baselines.
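The hierarchical recognizer described above (per-utterance CNN encoders for text and audio, followed by an LSTM across utterances) can be sketched as a forward pass in plain Python. This is a minimal sketch under stated assumptions, not the thesis implementation: the filter width, feature sizes, max-over-time pooling, single-layer unidirectional LSTM and per-utterance sigmoid output are hypothetical choices, the weights are random and untrained, and the random vectors stand in for pretrained word embeddings and low-level acoustic frames. All function and class names (`cnn_encode`, `LSTM`, `punchline_probs`) are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cnn_encode(frames, filters, width):
    """1-D convolution over a sequence of vectors, then ReLU and max-pooling over time.

    frames: list of equal-length vectors (word embeddings or acoustic frames).
    filters: list of (weights, bias); each weight vector spans `width` frames.
    Returns one pooled feature per filter, so the output size is fixed.
    """
    dim = len(frames[0])
    if len(frames) < width:  # zero-pad sequences shorter than one window
        frames = frames + [[0.0] * dim] * (width - len(frames))
    features = []
    for weights, bias in filters:
        best = 0.0  # max over time of ReLU(z) equals max(0, max z)
        for t in range(len(frames) - width + 1):
            window = [x for frame in frames[t:t + width] for x in frame]
            best = max(best, dot(weights, window) + bias)
        features.append(best)
    return features

class LSTM:
    """Single-layer unidirectional LSTM over a sequence of utterance vectors."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One matrix per gate (input, forget, output, candidate) over [x; h].
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
                      for _ in range(hid_dim)] for g in "ifoc"}
        self.b = {g: [0.0] * hid_dim for g in "ifoc"}

    def run(self, inputs):
        h, c = [0.0] * self.hid_dim, [0.0] * self.hid_dim
        outputs = []
        for x in inputs:
            z = x + h  # list concatenation [x; h_{t-1}]
            pre = {g: [dot(row, z) + bias for row, bias in zip(self.W[g], self.b[g])]
                   for g in "ifoc"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs

def punchline_probs(dialogue, word_filters, audio_filters, lstm, out_w, out_b, width):
    """dialogue: list of (word_vectors, acoustic_frames), one pair per utterance.
    Returns a punchline probability for every utterance, given its preceding context."""
    utterance_vecs = [cnn_encode(words, word_filters, width) +
                      cnn_encode(frames, audio_filters, width)
                      for words, frames in dialogue]
    return [sigmoid(dot(out_w, h) + out_b) for h in lstm.run(utterance_vecs)]

# Toy sizes and random, untrained weights (all hypothetical).
rng = random.Random(0)
EMB, ACO, N_FILTERS, HID, WIDTH = 4, 3, 5, 6, 2

def make_filters(dim):
    return [([rng.uniform(-0.1, 0.1) for _ in range(WIDTH * dim)], 0.0)
            for _ in range(N_FILTERS)]

def random_utterance(n_words, n_frames):
    words = [[rng.gauss(0, 1) for _ in range(EMB)] for _ in range(n_words)]
    frames = [[rng.gauss(0, 1) for _ in range(ACO)] for _ in range(n_frames)]
    return words, frames

word_filters, audio_filters = make_filters(EMB), make_filters(ACO)
lstm = LSTM(2 * N_FILTERS, HID, rng)
out_w = [rng.uniform(-0.1, 0.1) for _ in range(HID)]
dialogue = [random_utterance(5, 20), random_utterance(1, 12), random_utterance(7, 30)]
probs = punchline_probs(dialogue, word_filters, audio_filters, lstm, out_w, 0.0, WIDTH)
```

Max-pooling over time gives each utterance a fixed-size vector regardless of how many words or acoustic frames it has, which is what lets a single LSTM consume the concatenated text and audio encodings utterance by utterance.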
We then propose the first conversational humor generation model, which produces punchlines in response to short input dialogues. Our aim is to overcome the limitations of state-of-the-art template-based models, which only generate canned jokes, and of end-to-end dialogue generation models, whose outputs are not necessarily funny. We enhance a sequence-to-sequence model with reinforcement learning by defining and maximizing two reward functions: a funniness reward, obtained from our humor recognition model, and a relevance reward, obtained through adversarial learning. We trained the model on sitcom data, and in a human evaluation its outputs were judged funnier than those produced by standard end-to-end and transfer learning baselines.
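The idea of maximizing a funniness reward and a relevance reward together can be illustrated with a toy REINFORCE (policy-gradient) update. This is a sketch under loudly stated assumptions: the way the two rewards are combined is not given here, so `combined_reward` uses a hypothetical linear interpolation; the policy is a one-step softmax over three fixed candidate responses rather than a sequence-to-sequence decoder (for a decoder, the same update would apply to the summed log-probabilities of the sampled tokens); the scores are made-up stand-ins for the humor recognition model and the adversarial discriminator; and the learning rate, baseline decay and all function names are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combined_reward(funniness, relevance, weight=0.5):
    # Hypothetical linear interpolation of the two reward signals.
    return weight * funniness + (1.0 - weight) * relevance

def reinforce_step(logits, reward_fn, baseline, rng, lr=0.5, decay=0.9):
    """Sample one response, score it, and update the logits in place (REINFORCE).

    For a softmax policy, d log p(a) / d logit_k = [k == a] - p_k.
    Returns the updated moving-average baseline.
    """
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    reward = reward_fn(action)
    advantage = reward - baseline  # subtracting a baseline reduces variance
    for k in range(len(logits)):
        grad_log_p = (1.0 if k == action else 0.0) - probs[k]
        logits[k] += lr * advantage * grad_log_p  # gradient ascent on expected reward
    return decay * baseline + (1.0 - decay) * reward

# Hypothetical (funniness, relevance) scores for three candidate responses,
# standing in for the humor classifier and the adversarial discriminator.
SCORES = [(0.9, 0.8), (0.9, 0.1), (0.2, 0.9)]

rng = random.Random(0)
logits, baseline = [0.0, 0.0, 0.0], 0.0
for _ in range(1000):
    baseline = reinforce_step(logits, lambda a: combined_reward(*SCORES[a]), baseline, rng)
probs = softmax(logits)
```

Candidate 1 is funny but irrelevant and candidate 2 is relevant but not funny; under the interpolated reward (0.85, 0.5 and 0.55) only candidate 0, which scores well on both, is favored, so the policy should shift its probability mass toward it. Optimizing either reward alone would instead reward one of the one-sided candidates.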