Stochastic Zeroth Order Descent with Structured Directions

Rando, M.; Molinari, C.; Villa, S.; Rosasco, L.
2024-01-01

Abstract

We introduce and analyze Structured Stochastic Zeroth order Descent (S-SZD), a finite-difference approach that approximates a stochastic gradient on a set of l ≤ d orthogonal directions, where d is the dimension of the ambient space. These directions are randomly chosen and may change at each step. For smooth convex functions we prove almost sure convergence of the iterates and a convergence rate on the function values of the form O((d/l) k^{-c}) for every c < 1/2, which is arbitrarily close to that of Stochastic Gradient Descent (SGD) in terms of the number of iterations [29]. Our bound shows the benefit of using l > 1 directions instead of a single one. For non-convex functions satisfying the Polyak-Łojasiewicz condition, we establish the first convergence rates for stochastic structured zeroth-order algorithms under such an assumption. We corroborate our theoretical findings with numerical simulations in which the assumptions are satisfied, and on the real-world problem of hyper-parameter optimization in machine learning, achieving competitive practical performance.
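The abstract gives no pseudocode, so the following is only a minimal sketch of one S-SZD-style iteration under stated assumptions: forward finite differences along l orthonormal directions obtained from the QR factorization of a Gaussian matrix (one common way to draw random orthogonal directions; the paper may use other structured choices), and a d/l scaling that makes the surrogate an approximately unbiased gradient estimate when the directions are uniformly random. The names s_szd_step, gamma (step size), and h (discretization) are illustrative, not the paper's notation.

    import numpy as np

    def s_szd_step(f, x, l, h, gamma, rng):
        # One illustrative S-SZD-style iteration (a sketch, not the paper's
        # exact algorithm). f: objective, x: current iterate of shape (d,),
        # l: number of directions, h: finite-difference step, gamma: step size.
        d = x.shape[0]
        # Draw l orthonormal directions as the Q factor of a d-by-l Gaussian
        # matrix; the directions are redrawn independently at every iteration.
        Q, _ = np.linalg.qr(rng.standard_normal((d, l)))
        fx = f(x)
        # Structured finite-difference surrogate of the gradient:
        #   g = (d/l) * sum_i ((f(x + h q_i) - f(x)) / h) * q_i.
        # The d/l factor is an assumption here: it makes the estimator
        # unbiased up to O(h) terms for uniformly random orthonormal q_i.
        g = np.zeros(d)
        for i in range(l):
            q = Q[:, i]
            g += (f(x + h * q) - fx) / h * q
        return x - gamma * (d / l) * g

For instance, iterating x = s_szd_step(lambda v: v @ v, x, l=2, h=1e-6, gamma=0.1, rng=np.random.default_rng(0)) on a quadratic in R^10 drives f(x) toward its minimum; a larger l costs more function evaluations per step but, per the bound above, improves the dimension-dependent factor d/l.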

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1254221

Citations
  • PubMed Central: n/a
  • Scopus: 1
  • Web of Science: 0