{Semi-supervised Deep Reinforcement Learning in Support of IoT and Smart City Services
View Researcher's Other CodesDisclaimer: The provided code links for this paper are external links. Science Nest has no responsibility for the accuracy, legality or content of these links. Also, by downloading this code(s), you agree to comply with the terms of use as set out by the author(s) of the code(s).
Please contact us in case of a broken link from here
Authors | Mehdi Mohammadi, Ala Al-Fuqaha, Mohsen Guizani, Jun-Seok Oh |
Journal/Conference Name | IEEE Internet of Things Journal |
Paper Category | Computer Networks and Communications |
Paper Abstract | Smart services are an important element of the smart cities and the Internet of Things (IoT) ecosystems where the intelligence behind the services is obtained and improved through the sensory data. Providing a large amount of training data is not always feasible; therefore, we need to consider alternative ways that incorporate unlabeled data as well. In recent years, deep reinforcement learning (DRL) has gained great success in several application domains. It is an applicable method for IoT and smart city scenarios where auto-generated data can be partially labeled by users' feedback for training purposes. In this paper, we propose a semisupervised DRL model that fits smart city applications as it consumes both labeled and unlabeled data to improve the performance and accuracy of the learning agent. The model utilizes variational autoencoders as the inference engine for generalizing optimal policies. To the best of our knowledge, the proposed model is the first investigation that extends DRL to the semisupervised paradigm. As a case study of smart city applications, we focus on smart buildings and apply the proposed model to the problem of indoor localization based on Bluetooth low energy signal strength. Indoor localization is the main component of smart city services since people spend significant time in indoor environments. Our model learns the best action policies that lead to a close estimation of the target locations with an improvement of 23% in terms of distance to the target and at least 67% more received rewards compared to the supervised DRL model. |
Date of publication | 2017 |
Code Programming Language | Python |
Comment |