Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang. Entity-aware Image Caption Generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4013-4023, 2018.

Current image captioning approaches generate descriptions that lack specific information, such as the named entities involved in the images. In this paper, we propose a new task that aims to generate informative image captions, given images and hashtags as input. We propose a simple but effective approach to tackle this problem. We first train a convolutional neural network and long short-term memory network (CNN-LSTM) model to generate a template caption based on the input image. Then we use a knowledge-graph-based collective inference algorithm to fill in the template with specific named entities retrieved via the hashtags. Experiments on a new benchmark dataset collected from Flickr show that our model generates news-style image descriptions with much richer information. Our model significantly outperforms unimodal baselines on various evaluation metrics.


Shih-Fu Chang

BibTeX Reference

   Author = {Lu, Di and Whitehead, Spencer and Huang, Lifu and Ji, Heng and Chang, Shih-Fu},
   Title = {Entity-aware Image Caption Generation},
   BookTitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
   Pages = {4013--4023},
   Publisher = {Association for Computational Linguistics},
   Year = {2018}

