This is a past event. Its details are archived for historical purposes.
The contact information may no longer be valid.
Please visit our current events listings to look for similar events by title, location, or venue.
Thursday, April 4, 2019 at 12:15pm to 1:15pm
Phillips Hall, 233 229 Phillips Hall, 229
ABSTRACT: In this talk, I present an information-theoretic view of data science based on the study of the following fundamental questions: What is the minimal amount of information necessary to solve an assigned inference problem? Given this minimal amount of information, is it possible to design a low-complexity algorithm? What are the trade-offs between the parameters at play (e.g., dimensionality of the problem, size of the data sample, complexity)?
As a case study, I consider the inference problem of fitting data with a linear combination of a large number of simple components. This idea lies at the core of a variety of methods, most notably two-layer neural networks, and also kernel regression, sparse deconvolution and boosting. More specifically, we want to learn a concave function by using ‘bump-like’ neurons and fitting the centers of the bumps. The resulting optimization problem is highly non-convex, and little is known about convergence properties of gradient descent methods. Our main result is to show that stochastic gradient descent (SGD) converges to the global optimum at exponential rates. As a by-product of our analysis, we accurately track the dynamics of the SGD algorithm for a wide class of two-layer neural networks. To do so, we prove that, as the number of neurons goes large, the evolution of SGD converges to a gradient flow in the space of probability distributions. Furthermore, as the bump width vanishes, this gradient flow solves a viscous porous medium equation. The associated risk function exhibits a special property, known as displacement convexity, and recent breakthroughs in optimal transport theory guarantee exponential convergence for this limit dynamics. Numerical simulations demonstrate that the asymptotic theory captures well the behavior of stochastic gradient descent for moderate values of the parameters.
BIO: Marco Mondelli received the B.S. and M.S. degree in Telecommunications Engineering from the University of Pisa in 2010 and 2012, respectively. In 2016, he obtained his Ph.D. in Computer and Communication Sciences at EPFL, under the supervision of Prof. Urbanke. Since February 2017, he has been a Postdoctoral Scholar in the Department of Electrical Engineering at Stanford University, hosted by Prof. Montanari. He has also been a Research Fellow at the Simons Institute for the Theory of Computing for the program “Foundations of Data Science” held in Fall 2018. His research interests include data science, machine learning, information theory and communications. He was the recipient of a Dan David Prize Scholarship in 2015, of an Early Postdoc Mobility Fellowship from the Swiss NSF in 2016, and of the Simons-Berkeley Research Fellowship in 2018. He received the ISIT Student Paper Award in 2015, the STOC Best Paper Award in 2016, the Patrick Denantes Memorial Prize in 2017, and the EPFL Doctorate Award in 2018.