08-10-2024
The Center for Mathematics and Applications (NOVA Math) promotes the Operations Research Seminar entitled "Enhancing the Efficiency and Stability of Deep Neural Network Training through Controlled Mini-batch Algorithms". The speaker is Corrado Coppola (Sapienza University of Rome, Italy).
Abstract: The exponential growth of trainable parameters in state-of-the-art deep neural networks (DNNs), driven by innovations such as self-attention layers and over-parameterization, has led to the development of models containing billions or even trillions of parameters. As training datasets grow larger and tasks become more complex, the current challenge lies in balancing convergence guarantees with the increasing need for efficient training. In this work, we focus on supervised deep learning, where the training problem is formulated as the unconstrained minimization of a smooth, potentially non-convex objective function with respect to network weights.
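For reference, and with notation assumed here for illustration only, the supervised training problem described above can be written as the finite-sum, unconstrained minimization

    \min_{w \in \mathbb{R}^n} f(w) := \sum_{p=1}^{P} f_p(w),

where w collects the network weights and each f_p is the (smooth, possibly non-convex) loss associated with the p-th training sample or mini-batch.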
We propose an approach based on Incremental Gradient (IG) and Random Reshuffling (RR) algorithms, enhanced with derivative-free extrapolation line-search procedures. Specifically, we present the Controlled Mini-batch Algorithm (CMA), proposed in [1], which incorporates sufficient decrease conditions on the objective function and employs line-search procedures to ensure convergence, without requiring any further hypotheses on the search direction. We also present computational results on large-scale regression problems. We further introduce CMA Light, proposed in [2], an enhanced variant of CMA with convergence guarantees within the IG framework. By using an approximation of the true objective function to verify sufficient decrease, CMA Light drastically reduces the number of function evaluations needed and achieves notable performance gains. We discuss computational results against both CMA and state-of-the-art optimizers for neural networks, showing a significant advantage of CMA Light on large-scale classification tasks with residual convolutional networks.
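As a rough illustration of the controlled mini-batch idea, the sketch below performs one incremental-gradient pass over the mini-batches with a tentative step size and then checks a sufficient decrease condition on the true objective, falling back on a line search along the aggregated direction when the test fails. The function name, the constants zeta, gamma and delta, and the simple backtracking rule are illustrative assumptions, not the exact procedure of [1] (which relies on a derivative-free extrapolation line search).

    import numpy as np

    def controlled_minibatch_epoch(w, batches, loss_full, grad_batch,
                                   zeta=1e-3, gamma=1e-6, delta=0.5):
        """One CMA-style controlled epoch (illustrative sketch, not the algorithm of [1]).

        w           current weights (NumPy array)
        batches     iterable of mini-batches (e.g., arrays of sample indices)
        loss_full   callable returning the true objective f(w)
        grad_batch  callable returning the gradient of the loss on one mini-batch
        zeta        tentative inner step size; gamma, delta: decrease/backtracking constants
        """
        f_ref = loss_full(w)                       # reference value of the true objective
        w_trial = w.copy()
        for b in batches:                          # incremental-gradient pass over the mini-batches
            w_trial = w_trial - zeta * grad_batch(w_trial, b)
        d = w_trial - w                            # aggregated direction produced by the epoch
        # Sufficient decrease test on the true objective: this is the "control" step.
        if loss_full(w_trial) <= f_ref - gamma * np.linalg.norm(d) ** 2:
            return w_trial, zeta
        # Otherwise, search along d with a simple derivative-free backtracking
        # (the talk uses an extrapolation variant instead).
        alpha = 1.0
        while alpha > 1e-8 and loss_full(w + alpha * d) > f_ref - gamma * alpha * np.linalg.norm(d) ** 2:
            alpha *= delta
        return w + alpha * d, zeta * delta         # shrink the tentative step for the next epoch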
Finally, we present the Fast-Controlled Mini-batch Algorithm (F-CMA), which extends the convergence theory of CMA Light to the case where samples are reshuffled at each epoch. We develop a new line-search procedure and demonstrate F-CMA's superior performance when training ultra-deep architectures, such as the SwinB and SwinT transformers with up to 130 million trainable parameters. Our results show significant advantages in both stability and generalization compared to state-of-the-art deep learning optimizers.
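To make the reshuffling setting concrete, the toy example below reuses the controlled_minibatch_epoch sketch above on a small least-squares problem and simply redraws a new permutation of the sample indices at every epoch. It only illustrates the Random Reshuffling setting in which F-CMA is analysed; the data, dimensions and step sizes are made up for the example and the actual F-CMA line search is not reproduced here.

    import numpy as np

    # Toy setup (assumed for illustration): linear least squares with synthetic data.
    rng = np.random.default_rng(0)
    n_samples, n_features, batch_size, num_epochs = 2048, 20, 128, 10
    X = rng.standard_normal((n_samples, n_features))
    y = X @ rng.standard_normal(n_features) + 0.01 * rng.standard_normal(n_samples)

    def loss_full(w):
        # True objective: mean squared error over the whole dataset.
        return 0.5 * np.mean((X @ w - y) ** 2)

    def grad_batch(w, idx):
        # Gradient of the loss restricted to the mini-batch of indices idx.
        Xb, yb = X[idx], y[idx]
        return Xb.T @ (Xb @ w - yb) / len(idx)

    w, zeta = np.zeros(n_features), 1e-1
    for epoch in range(num_epochs):
        perm = rng.permutation(n_samples)          # reshuffle the samples at every epoch (RR)
        batches = [perm[i:i + batch_size]          # partition the permutation into mini-batches
                   for i in range(0, n_samples, batch_size)]
        w, zeta = controlled_minibatch_epoch(w, batches, loss_full, grad_batch, zeta=zeta)
        print(epoch, loss_full(w))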
Wednesday, 16 October 2024, from 16:20 to 17:20.