Generating Personalized Summaries of Day Long Egocentric Videos.
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 16.389), Pub Date: 2021-10-06, DOI: 10.1109/tpami.2021.3118077
Pravin Nagar, Anuj Rathore, C. V. Jawahar, Chetan Arora

The popularity of egocentric cameras and their always-on nature have led to an abundance of day-long first-person videos. The highly redundant nature of these videos and extreme camera shake make them difficult to watch from beginning to end. These videos require efficient summarization tools for consumption. However, traditional summarization techniques developed for static surveillance videos or highly curated sports videos and movies are either not suitable or simply do not scale to such hours-long videos in the wild. On the other hand, specialized summarization techniques developed for egocentric videos limit their focus to important objects and people. This paper presents a novel unsupervised reinforcement learning framework to summarize egocentric videos in terms of both length and content. The proposed framework facilitates incorporating various prior preferences, such as faces, places, or scene diversity, as well as interactive user choices to include or exclude particular types of content. This approach can also be adapted to generate summaries of various lengths, making it possible to view even 1-minute summaries of one's entire day. When using the facial saliency-based reward, we show that our approach generates summaries focusing on social interactions, similar to the current state-of-the-art (SOTA). Quantitative comparisons on the benchmark Disney dataset show that our method achieves significant improvements in Relaxed F-Score (RFS) (29.60 compared to 19.21 from SOTA), BLEU score (0.68 compared to 0.67 from SOTA), Average Human Ranking (AHR), and unique events covered. Finally, we show that our technique can be applied to summarize traditional, short, hand-held videos as well, where we improve the SOTA F-score on the benchmark SumMe and TVSum datasets from 41.4 to 46.40 and from 57.6 to 58.3, respectively. We also provide a PyTorch implementation and a web demo at
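The core idea of reward-driven summary selection described above can be sketched as follows. This is a purely illustrative greedy stand-in, not the paper's reinforcement learning policy: it combines a per-frame prior-preference score (e.g. facial saliency) with a pairwise-diversity term under a length budget. All function names, weights, and feature shapes here are assumptions for the sketch.

```python
import numpy as np

def summary_reward(features, saliency, selected, w_div=0.5):
    """Score a candidate summary: mean prior saliency of the selected
    frames plus a diversity term (mean pairwise feature distance).
    `saliency` stands in for any prior preference, e.g. face scores."""
    sel = features[selected]
    prior = saliency[selected].mean()
    if len(selected) < 2:
        return prior
    # Mean pairwise Euclidean distance rewards visually diverse picks.
    d = np.linalg.norm(sel[:, None] - sel[None, :], axis=-1)
    diversity = d.sum() / (len(selected) * (len(selected) - 1))
    return prior + w_div * diversity

def greedy_summarize(features, saliency, budget):
    """Greedily add the frame that most increases the reward until the
    length budget is reached; the budget controls summary length."""
    selected = []
    candidates = list(range(len(features)))
    while len(selected) < budget and candidates:
        best = max(candidates,
                   key=lambda i: summary_reward(features, saliency,
                                                selected + [i]))
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)
```

Swapping `saliency` for a different prior (places, scene categories) or changing `budget` mirrors, in miniature, how the framework adapts the summary's content and length.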