Cloud-assisted multiview video summarization using CNN and bidirectional LSTM

View Researcher's Other Codes

Disclaimer: The provided code links for this paper are external links. Science Nest has no responsibility for the accuracy, legality, or content of these links. Also, by downloading this code, you agree to comply with the terms of use set out by the author(s) of the code.

Please contact us from here in case of a broken link.

Authors Tanveer Hussain, Khan Muhammad, Amin Ullah, Zehong Cao, Sung Wook Baik, Victor Hugo C. de Albuquerque
Journal/Conference Name IEEE Transactions on Industrial Informatics
Paper Category
Paper Abstract The massive amount of video data produced by surveillance networks in industries instigates various challenges in exploring these videos for many applications, such as video summarization (VS), analysis, indexing, and retrieval. The task of multiview video summarization (MVS) is very challenging due to the gigantic size of the data, redundancy, overlapping views, light variations, and inter-view correlations. To address these challenges, various low-level features and clustering-based soft computing techniques have been proposed, but they cannot fully exploit MVS. In this article, we achieve MVS by integrating deep-neural-network-based soft computing techniques in a two-tier framework. The first, online tier performs target-appearance-based shot segmentation and stores the shots in a lookup table that is transmitted to the cloud for further processing. The second tier extracts deep features from each frame of a sequence in the lookup table and passes them to a deep bidirectional long short-term memory (DB-LSTM) network to acquire probabilities of informativeness and generate a summary. Experimental evaluation on a benchmark dataset and industrial surveillance data from YouTube confirms the better performance of our system compared to state-of-the-art MVS methods.

Published in IEEE Transactions on Industrial Informatics (Volume 16, Issue 1, Jan. 2020), pp. 77-86. Date of Publication: 17 July 2019. DOI: 10.1109/TII.2019.2929228. INSPEC Accession Number: 19348323. Publisher: IEEE.

I. Introduction The tremendous amount of video data generated by camera networks installed at industries, offices, and public places meets the requirements of Big Data. For instance, a simple multiview network with two cameras, each acquiring video from a different view at 25 frames per second (fps), generates 180 000 frames (90 000 per camera) in an hour. Surveillance networks acquire video data 24 hours a day from multiview cameras, making it challenging to extract useful information from this Big Data; searching for salient information in such huge-sized video data requires significant effort. Thus, automatic techniques are required to extract the prominent information present in videos without involving any human effort. In the video analytics literature, there exist several key-information-extraction techniques such as video abstraction [1], video skimming [2], and video summarization (VS) [3]. VS techniques investigate the input video for salient information and create a summary in the form of keyframes or short video clips that represent lengthy videos. The extracted keyframes assist in many applications such as action and activity recognition [4], [5], anomaly detection [6], and video retrieval [7].
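The second tier described in the abstract, per-frame deep features scored for informativeness by a bidirectional recurrent network, can be sketched in a few lines. The sketch below is purely illustrative: it uses randomly initialized weights, a single minimal LSTM cell written in NumPy, toy dimensions (`D`, `H`, `T`), and a top-k selection rule, none of which come from the paper's trained DB-LSTM model.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_pass(x, Wx, Wh, b):
    """Run a minimal single-layer LSTM over x of shape (T, D); return hidden states (T, H)."""
    T, _ = x.shape
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((T, H))
    for t in range(T):
        z = x[t] @ Wx + h @ Wh + b                     # pre-activations, shape (4H,)
        i, f, g, o = np.split(z, 4)                    # input, forget, cell, output gates
        i, f, o = (1 / (1 + np.exp(-v)) for v in (i, f, o))
        c = f * c + i * np.tanh(g)                     # cell-state update
        h = o * np.tanh(c)                             # hidden-state update
        out[t] = h
    return out

D, H, T = 8, 4, 6                          # toy feature dim, hidden size, frames per shot
feats = rng.standard_normal((T, D))        # stand-in for CNN features of one shot

def make_params():
    return (rng.standard_normal((D, 4 * H)) * 0.1,
            rng.standard_normal((H, 4 * H)) * 0.1,
            np.zeros(4 * H))

fwd = lstm_pass(feats, *make_params())               # forward direction
bwd = lstm_pass(feats[::-1], *make_params())[::-1]   # backward direction, re-aligned
states = np.concatenate([fwd, bwd], axis=1)          # bidirectional states, (T, 2H)

w_out = rng.standard_normal(2 * H) * 0.1
probs = 1 / (1 + np.exp(-(states @ w_out)))          # per-frame informativeness
keyframes = sorted(np.argsort(probs)[-2:].tolist())  # keep the 2 highest-scoring frames
print(keyframes)
```

In the actual framework the features would come from a CNN and the two directions would be stacked into a deep bidirectional LSTM with trained weights; the shape of the computation, however, is the same.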
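The frame counts quoted in the introduction follow from simple arithmetic; a quick check, assuming the stated 25 fps and two views:

```python
# Frames produced per hour by a two-camera multiview network at 25 fps.
FPS = 25
CAMERAS = 2
SECONDS_PER_HOUR = 3600

frames_per_camera = FPS * SECONDS_PER_HOUR      # 25 * 3600 = 90 000 frames/hour
total_frames = frames_per_camera * CAMERAS      # 2 cameras -> 180 000 frames/hour
print(frames_per_camera, total_frames)
```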
Date of publication 2020
Code Programming Language Python
Comment

Copyright Researcher 2022