Zhao and Cheng studying model parallelism for large-scale deep learning
Liang Zhao, Assistant Professor, Information Sciences and Technology, and Yue Cheng, Associate Professor, Computer Science, Volgenau School of Engineering, are set to receive funding from the National Science Foundation (NSF) for a project with several goals.
Specifically, they are working to: 1) develop new gradient-free methods for training various types of deep neural networks (DNNs); 2) design an algorithmic and theoretical framework for model parallelization based on gradient-free optimization; 3) develop novel scheduling and load-balancing techniques that help users achieve different objectives within a multi-dimensional tradeoff space; and 4) build an efficient, easy-to-use distributed workflow system suited to a broad range of model-parallelism-based DNN training applications, such as deep learning on large graphs and very deep convolutional neural networks for image processing.
The researchers will develop a generic gradient-free algorithmic framework, backed by theoretical guarantees, that covers a wide range of DNN architectures and operations. They are also investigating a novel model-parallelism framework for DNNs, which they will develop under different parallelization strategies that navigate the tradeoffs among convergence properties, data independencies, and communication overhead.
Additionally, they will design and prototype a new DAG (directed acyclic graph) scheduling and management framework built specifically to exploit the unique strengths of their alternating optimization methods.
They will also design and implement a suite of systems-level optimization techniques, including: 1) a novel load-balancing technique that can achieve tunable, dynamic parallelism (both intra- and inter-layer parallelism), and 2) a new dynamic resource scheduling heuristic that cohesively combines serverful and serverless computing infrastructure to maximize runtime resource efficiency.
The resulting system will be evaluated on crucial open problems, including: 1) large-scale graph deep learning (e.g., graphs with billions of nodes), and 2) very deep convolutional neural networks for image processing.
The techniques devised in this project will open new avenues for the development, analysis, and application of gradient-free optimization for DNNs.
The proposed algorithmic framework will be open-sourced on GitHub and, drawing on the principal investigators’ collaboration with industry practitioners and developers, integrated into existing deep learning libraries such as TensorFlow and Keras.
The proposed algorithms and systems are expected to foster a deeper understanding of deep learning and of distributed and parallel computing, and to provide graduate and undergraduate students with new courses as well as research and internship opportunities.
The researchers will receive $498,609 from NSF for this work. Funding will begin in October 2020 and will end in late September 2023.