A parallel solution for the Hessian matrix in neural network optimisation: the case of the mixture of expert model
Second-order optimisation methods such as the Newton method are known for their fast convergence. However, the high computational cost of calculating the Hessian matrix and its inverse has hindered the use of the Newton method in neural network optimisation. Recent literature has shown that, with a simple reordering of variables and equations, the Jacobian matrix of computable general equilibrium (CGE) models can be transformed into special forms that allow large-scale CGE models to be solved efficiently in parallel. This paper shows that the Hessian matrix of certain ‘mixture of expert’ models can likewise be reordered into doubly-bordered block diagonal form, so that the associated Newton system can be solved efficiently in both distributed and shared memory environments. The seminar will provide illustrative numerical applications, including for the S&P/ASX 200 stock index.
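To illustrate why a doubly-bordered block diagonal structure lends itself to parallel solution, the sketch below solves a symmetric linear system of that shape by standard Schur-complement elimination. This is an assumption for illustration only: the abstract does not state which algorithm the paper uses, and the function names, block sizes, and test data here are hypothetical. Each diagonal block is handled independently (the part that could be distributed across workers), and only a small system for the border variables couples them.

```python
import numpy as np


def solve_bordered_block_diagonal(A_blocks, B_blocks, D, f_blocks, g):
    """Solve H [x_1..x_K, y] = [f_1..f_K, g] where H has diagonal blocks A_i,
    right border blocks B_i, bottom border blocks B_i^T, and corner block D.

    Illustrative Schur-complement sketch, not the method from the seminar.
    """
    S = D.astype(float)      # astype returns a copy; becomes D - sum_i B_i^T A_i^{-1} B_i
    rhs = g.astype(float)    # becomes g - sum_i B_i^T A_i^{-1} f_i
    cache = []
    # Each loop iteration touches only one block, so the iterations are
    # independent and could run in parallel; here they run serially.
    for A, B, f in zip(A_blocks, B_blocks, f_blocks):
        AinvB = np.linalg.solve(A, B)
        Ainvf = np.linalg.solve(A, f)
        S -= B.T @ AinvB
        rhs -= B.T @ Ainvf
        cache.append((AinvB, Ainvf))
    # Small coupled system for the border variables.
    y = np.linalg.solve(S, rhs)
    # Back-substitution, again independent per block.
    xs = [Ainvf - AinvB @ y for AinvB, Ainvf in cache]
    return xs, y


def assemble_dense(A_blocks, B_blocks, D):
    """Build the full dense matrix, used only to check the block solver."""
    n = sum(A.shape[0] for A in A_blocks)
    m = D.shape[0]
    H = np.zeros((n + m, n + m))
    i = 0
    for A, B in zip(A_blocks, B_blocks):
        k = A.shape[0]
        H[i:i + k, i:i + k] = A
        H[i:i + k, n:] = B
        H[n:, i:i + k] = B.T
        i += k
    H[n:, n:] = D
    return H
```

In a Newton step the right-hand side would be the negative gradient and the solution the update direction. The design point is that the cost is dominated by the per-block solves, which share no data, while the coupled border system has only as many unknowns as there are border variables.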
Updated: 29 March 2023 / Responsible Officer: Crawford Engagement / Page Contact: CAP Web Team