Convergence of BFGS for convex over-parameterized problems_问答_开发者

Convergence of BFGS for convex over-parameterized problems

开发者 https://www.devze.com 2022-12-20 13:36 出处：网络

It is \"well开发者_Python百科-known\" that the BFGS optimization algorithm is superlinearly convergent for strictly convex problems, but is there any analysis for problems that are non-strictly convex

It is "well开发者_Python百科-known" that the BFGS optimization algorithm is superlinearly convergent for strictly convex problems, but is there any analysis for problems that are non-strictly convex. For example, suppose that f(x) is convex for some scalar x. Then, suppose we optimize over g(x1,x2)=f(x1+x2). Will this still always be superlinearly convergent?

Whether BFGS converges at all on non-convex problems is still an open problem. In fact, in 1984 Powell gave a counter-example that shows that BFGS with an inexact line-search search may fail to converge. What can be made are local statements, such as: Given a local minima x*, if you eventually enter a region of space near x* BFGS will converge super-linearly. The reason for this is that near x*, the objective function can be accurately modelled by a convex quadratic.

As for what is known for the composition function you gave, I am not sure. For a detailed explanation of the properties of BFGS, see either Dennis and Schnabel or Nocedal and Wright.

Best of luck.

Correct me if I'm wrong, but won't the "solution" in this case actually be a line, not a single point? If x' is a minimizer for f(x), then the best you can hope for when applying any method to g(x1, x2) is for it to converge to the line x2 = x' - x1.

In practice I have found that a carefully written algorithm will converge, but not necessarily super-linearly. Roundoff error is the culprit. Convergence criteria come into play. It is the same for functions that are "almost" not convex, i.e. "stiff."

One must be careful with the BFGS updates to assure that the resulting approximate Hessian remains positive-definite "enough" even though the theoretical Hessian is not. What I do is to keep and update a Cholesky decomposition of the Hessian, rather than the Hessian per se or its inverse.