I am attempting to implement automatic differentiation for a Python statistics package (the problem formulation is similar to that of an optimization problem).
The computational graph is generated using operator overloading and factory functions for operations like sum(), exp(), etc. I have implemented automatic differentiation for the gradient using reverse accumulation. However, I have found implementing automatic differentiation for the second derivative (the Hessian) much more difficult. I know how to do the individual second-order partial derivative calculations, but I have had trouble coming up with an intelligent way to traverse the graph and do the accumulations. Does anyone know of good articles that give algorithms for automatic differentiation of the second derivative, or open source libraries that implement it, that I could learn from?
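For context, reverse accumulation over an operator-overloaded graph can be sketched as below. This is a minimal, self-contained illustration (the class and function names are hypothetical, not from any particular package): each node records its parents and the local partial derivatives, and one backward sweep in reverse topological order accumulates the adjoints.

```python
import math

class Var:
    """A node in the computational graph (hypothetical minimal sketch)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # sequence of (parent_node, local_partial)
        self.adjoint = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def exp(x):
    """Factory function for exp, recording d(exp(x))/dx = exp(x)."""
    v = math.exp(x.value)
    return Var(v, [(x, v)])

def grad(output, inputs):
    """Reverse accumulation: one backward sweep over the graph."""
    order, seen = [], set()
    def topo(node):                      # topological order of the graph
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            topo(parent)
        order.append(node)
    topo(output)
    output.adjoint = 1.0                 # seed the output
    for node in reversed(order):         # push adjoints back to the leaves
        for parent, local in node.parents:
            parent.adjoint += node.adjoint * local
    return [x.adjoint for x in inputs]
```

For example, with f = x*y + exp(x) at (x, y) = (1, 2), grad(f, [x, y]) returns [2 + e, 1], matching ∂f/∂x = y + eˣ and ∂f/∂y = x.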
First you must decide whether you want to calculate a sparse Hessian or something closer to a fully dense Hessian.
If sparse is what you want, there are currently two competitive approaches. Using only the computational graph in a clever way, you can calculate the Hessian matrix in a single reverse sweep with the edge_pushing algorithm:
http://www.tandfonline.com/doi/full/10.1080/10556788.2011.580098
Or you can use graph colouring techniques to compress your Hessian into a matrix with fewer columns, then use reverse accumulation to calculate each column:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.66.2603
If what you want is a dense Hessian (unusual in practice), then you're probably better off calculating one column of the Hessian at a time using reverse accumulation (search for Bruce Christianson and reverse accumulation).
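The column-at-a-time idea can be sketched as follows: column j of the Hessian is the Hessian-vector product H·eⱼ, i.e. the directional derivative of the gradient along eⱼ. In a real AD tool you would take that directional derivative exactly (forward mode applied over your reverse-accumulation gradient); in this standalone sketch, a central difference on the gradient stands in for that forward pass. The function names here are illustrative, not from any library.

```python
import numpy as np

def hessian_by_columns(grad_f, x, eps=1e-6):
    """Assemble a dense Hessian one column at a time.

    grad_f: callable returning the exact gradient (e.g. from reverse
    accumulation). Each column H[:, j] = H @ e_j is approximated here by
    a central difference of the gradient; exact AD would use
    forward-over-reverse instead.
    """
    n = x.size
    H = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        H[:, j] = (grad_f(x + eps * e) - grad_f(x - eps * e)) / (2 * eps)
    return 0.5 * (H + H.T)   # symmetrize to damp round-off noise

# Illustrative test function: f(x) = x0**2 * x1 + exp(x1)
def grad_f(x):
    return np.array([2.0 * x[0] * x[1], x[0] ** 2 + np.exp(x[1])])
```

At x = (1, 0) the exact Hessian of this test function is [[0, 2], [2, 1]], which the sketch recovers to finite-difference accuracy.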
The usual method for approximating the Hessian in optimization is BFGS.
The limited-memory variant, L-BFGS, is similar.
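To see what BFGS gives you in practice, here is a short SciPy example (my own, not from the source code mentioned below): BFGS builds up an approximation to the inverse Hessian from successive gradient differences, and SciPy exposes it on the result object as res.hess_inv.

```python
import numpy as np
from scipy.optimize import minimize

# A simple quadratic whose true Hessian is diag(2, 20).
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def grad(x):
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

res = minimize(f, x0=np.zeros(2), jac=grad, method="BFGS")
# res.x converges to (1, -2); res.hess_inv is BFGS's running
# approximation to the inverse Hessian, refined at each iteration.
```

Note that res.hess_inv is only an approximation accumulated along the search path, not an exact second derivative.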
Here you can find the source code for L-BFGS (which maintains an approximation to the inverse Hessian as an intermediate result) in several languages (C#, C++, VBA, etc.), although not in Python. I think it would not be easy to translate.
If you are going to translate the algorithm from another language, pay particular attention to numerical errors and do a sensitivity analysis (you'll need to compute the inverse of the Hessian matrix).
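One cheap sensitivity check before inverting is the condition number: if it is large, the inverse (and anything downstream, such as standard errors in a statistics package) is numerically unreliable. A minimal sketch, with an assumed threshold of 1e12:

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 2.0]])       # example Hessian

cond = np.linalg.cond(H)         # ratio of largest to smallest singular value
if cond < 1e12:                  # heuristic cutoff, an assumption
    H_inv = np.linalg.inv(H)
else:
    # near-singular: fall back to the pseudo-inverse (or regularize)
    H_inv = np.linalg.pinv(H)
```

For this example matrix the eigenvalues are 3 and 1, so the condition number is 3 and the plain inverse is safe.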