whitehead2018incorporating

Spencer Whitehead, Heng Ji, Mohit Bansal, Shih-Fu Chang, Clare Voss. Incorporating Background Knowledge into Video Description Generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3992-4001, 2018.

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

Most previous efforts toward video captioning focus on generating generic descriptions, such as, “A man is talking.” We collect a news video dataset to generate enriched descriptions that include important background knowledge, such as named entities and related events, which allows the user to fully understand the video content. We develop an approach that uses video meta-data to retrieve topically related news documents for a video and extracts the events and named entities from these documents. Then, given the video as well as the extracted events and entities, we generate a description using a Knowledge-aware Video Description network. The model learns to incorporate entities found in the topically related documents into the description via an entity pointer network and the generation procedure is guided by the event and entity types from the topically related documents through a knowledge gate, which is a gating mechanism added to the model’s decoder that takes a one-hot vector of these types. We evaluate our approach on the new dataset of news videos we have collected, establishing the first benchmark for this dataset as well as proposing a new metric to evaluate these descriptions.

Contact

Shih-Fu Chang

BibTeX Reference

@InProceedings{whitehead2018incorporating,
   Author = {Whitehead, Spencer and Ji, Heng and Bansal, Mohit and Chang, Shih-Fu and Voss, Clare},
   Title = {Incorporating Background Knowledge into Video Description Generation},
   BookTitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
   Pages = {3992--4001},
   Publisher = {Association for Computational Linguistics},
   Year = {2018}
}
