Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance
Disclaimer: The code links provided for this paper are external. Science Nest has no responsibility for the accuracy, legality, or content of these links. By downloading the code, you agree to comply with the terms of use set out by its authors. Please contact us if a link is broken.
Authors | Cong Xie, Oluwasanmi Koyejo, Indranil Gupta |
Journal/Conference Name | 36th International Conference on Machine Learning, ICML 2019 |
Paper Category | Artificial Intelligence |
Paper Abstract | We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. Zeno generalizes previous results that assumed a majority of non-faulty nodes; we need assume only one non-faulty worker. Our key idea is to suspect workers that are potentially defective. Since this is likely to lead to false positives, we use a ranking-based preference mechanism. We prove the convergence of SGD for non-convex problems under these scenarios. Experimental results show that Zeno outperforms existing approaches. |
Date of publication | 2019 |
Code Programming Language | Python |
Comment | A hedged sketch of Zeno's suspicion-based scoring, based on the abstract, is given below. |
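The abstract's key idea, suspecting potentially faulty workers and keeping only the highest-ranked gradients, can be illustrated with a small sketch. The snippet below is not the authors' released code; the function name `zeno_aggregate`, the `loss_fn` callback, and the default values of `gamma`, `rho`, and `b` are assumptions made here purely for readability.

```python
import numpy as np

def zeno_aggregate(grads, loss_fn, x, gamma=0.1, rho=5e-4, b=0):
    """Illustrative (assumed) version of Zeno's suspicion-based aggregation.

    grads   : list of candidate gradients, one per worker (np.ndarray)
    loss_fn : evaluates the loss on a small batch drawn by the master
    x       : current parameter vector
    gamma   : step size used inside the score (assumed value)
    rho     : weight of the gradient-norm penalty (assumed value)
    b       : number of workers to suspect and drop
    """
    base = loss_fn(x)
    # Score each candidate by the estimated loss decrease it would produce,
    # minus a penalty on its norm; low scores mark suspicious gradients.
    scores = [base - loss_fn(x - gamma * g) - rho * np.dot(g, g) for g in grads]
    # Rank by score and average only the (n - b) highest-ranked gradients.
    keep = np.argsort(scores)[::-1][: len(grads) - b]
    return np.mean([grads[i] for i in keep], axis=0)
```

Because workers are ranked rather than filtered by a majority vote, this style of aggregation can tolerate an arbitrary number of faulty workers as long as at least one non-faulty worker remains, which is the relaxation the abstract highlights.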