Add Wolfe line search to Laplace approximation #3250

SteveBronder · 2025-10-24T21:04:55Z

Summary

This PR makes the following changes for the laplace approximation:

Adds a wolfe line search to the Newton solver used in the laplace approximation to improve convergence.

The example code provided from Laplace Bug when passing Eigen::Map in tuple of functor arguments #3205 fails on develop. The issue arose that the initial value of 0 for theta started the model in the tail of the distribution. The quick line search we did which only tested half of a newton step was not robust enough for this model to reach convergance. This PR adds a full wolfe line search to the Newton solver used in the laplace approximation to improve convergence in such cases.
The graphic below shows the difference in estimates of the log likelihood for laplace relative to integrate_1d on the roach test data plotted along the mu and sigma estimates. There is still a bias relative to integrate_1d as mu becomes negative and sigma becomes larger, but it is much nicer than before.

The main loop for laplace_marginal_density_est is expensive as it requires calculating either a diagonal hessian or block diagonal hessian with 2nd order autodiff. The wolfe line search only requires the gradients of the likelihood with respect to theta. So with that in mind the wolfe line search tries pretty aggressively get the best step size. If our initial step size is successful, we try to keep doubling until we hit a step size where the strong wolfe conditions fail and then return the information for the step right before that failure. If our initial step size does not satisfy strong wolfe then we do a bracketed zoom with cubic interpolation until till we find a step size that satisfies the strong wolfe conditions.
Tests for the wolfe line search are added to test/unit/math/laplace/wolfe_line_search.hpp.

Fixes bugs in the laplace approximation

Fix iteration mismatch between values when line search succeeds
In the last iteration of the laplace approximation we were returning the negative block diagonal hessian and derived matrices from the previous search. This is fine if the line search in that last step failed. But if the line search succeeds then we need to go back and recalculate the negative block diagonal hessian and it's derived quantities.
Breakup diagonal and block hessian functions
Previously we had one block_hessian function that calculated both the block hessian or the diagonal hessian at runtime. But this function is only used in places where we know at compile time whether we want a block or diagonal hessian. So I split out the two functions to avoid unnecessary runtime branching.
barzilai_borwein_step_size
For an initial step size estimate before each line search we use the Barzilai-Borwein method to get an estimate.
Adjoints of ll args only calculated once
Previously we calculated them eargerly in each laplace iteration. But they are not needed within the inner loop so we wait till we finish the inner search then calculate their adjoints once afterwards.
Calculate covariance once at the start and reuse throughout.
We were calculating the covariance matrix from inside of laplace_density_est, but this required us to then return it from that function and imo looked weird. So I pulled it out and now laplace_marginal_density_est is passed the covariance matrix.

Fixes numerical stability in laplace distributions
There were a few places where we could use log_sum_exp etc. so I made those changes.
Fixes "bug" in finite difference step size calculation

Changed from cube root of epsilon to epsilon^(1/7) for 6th order
The finite difference method in Stan was previously using stepsize optimzied a 2nd order method. But the code is a 6th order method. I modified finite_diff_stepsize to use epsilon^(1/7) instead of cbrt(epsilon). With this change all of the laplace tests pass with a much higher tolerance for precision.

Tests

All the AD tests now have a tighter tolerance for the laplace approximation.
There are also tests for the wolfe line search in test/unit/math/laplace/wolfe_line_search.hpp.

./runTests.py test/unit/math/laplace

Release notes

Improve laplace approximation with wolfe line search and bug fixes.

Checklist

Copyright holder: Steve Bronder

The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
- unit tests pass (to run, use: ./runTests.py test/unit)
- header checks pass, (make test-headers)
- dependencies checks pass, (make test-math-dependencies)
- docs build, (make doxygen)
- code passes the built in C++ standards checks (make cpplint)
the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested

…ch method

…line-search

… the lambdas

…lues for W, B, etc. are used

…order stepsize

stan-buildbot · 2025-10-29T22:20:08Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
arma/arma.stan	0.29	0.28	1.05	5.1% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.01	0.01	1.06	5.26% faster
gp_regr/gen_gp_data.stan	0.02	0.02	1.06	5.53% faster
gp_regr/gp_regr.stan	0.09	0.09	1.03	3.2% faster
sir/sir.stan	69.4	65.99	1.05	4.91% faster
irt_2pl/irt_2pl.stan	4.16	3.96	1.05	4.97% faster
eight_schools/eight_schools.stan	0.06	0.05	1.06	5.47% faster
pkpd/sim_one_comp_mm_elim_abs.stan	0.25	0.24	1.05	4.65% faster
pkpd/one_comp_mm_elim_abs.stan	19.57	18.65	1.05	4.7% faster
garch/garch.stan	0.41	0.4	1.02	1.75% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.66	2.62	1.01	1.39% faster
arK/arK.stan	1.74	1.72	1.01	1.44% faster
gp_pois_regr/gp_pois_regr.stan	2.94	2.74	1.07	6.76% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	8.82	8.36	1.06	5.29% faster
performance.compilation	178.94	178.61	1.0	0.19% faster
Mean result: 1.0424922541231578

Jenkins Console Log
Blue Ocean
Commit hash: 6c17a9d27ed815deba4bbbed2bb3f6f637fac2b0

Machine information

No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focal

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 2400.000
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 1.3 MiB
L1i cache: 1.3 MiB
L2 cache: 40 MiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS; IBPB conditional; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities

G++:
g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Clang:
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

SteveBronder · 2025-10-30T14:51:58Z

I guess that speaks to my last point about having performance tests.

Yes I'll throw in a little json logger on a side branch so we can get some performance numbers over all the tests. Aki's example will fail if we turn off the line search. So I think we should leave it on and allow it to try a few iterations. But yeah we can hop on a call and discuss the line search strategy.

SteveBronder · 2025-10-30T14:54:15Z

If we fail to converge after N iterations should we throw a hard error or return back what we have so far with a warning?

We should reject the current metropolis proposal.

What is the intuition with that? My thought process is that if someone asks for 1e-12 and the optimizer gets down to 1e-10, should we really throw out the whole result? I feel like just telling users "Hey we were only able to get down to x tolerance when you asked for y tolerance, the algorithm may not have converged." would suffice. That would also let users do more debugging imo

charlesm93 · 2025-10-31T18:09:47Z

Aki's example will fail if we turn off the line search. So I think we should leave it on and allow it to try a few iterations.

I think it's fine to have unit tests which fail with the default control parameters, as long as we get useful error messages, and we can get the unit tests to pass with non-default control parameters. The whole point of giving users control over tuning parameters is that sometimes the defaults don't cut it.

But yeah we can hop on a call and discuss the line search strategy.

Sure thing!

What is the intuition with that?

You raise a valid point and I'm open to the idea of issuing a warning message.

The argument for rejecting the proposal is that the user decides what an acceptable tolerance is for the solver---if the solver doesn't achieve that tolerance, then we might be concerned that the marginal likelihood is poorly approximated, say because the chain wondered into a pathological region of the parameter space and then it is better to backtrack.

Still, I like the idea of a warning message. It's then up to the user to check the quality of the inference directly, rather than relying on the quality of the numerical methods.

(Note that issuing a warning message would be inconsistent with what we've done with other numerical methods, like the newton_solver.)

SteveBronder · 2025-10-31T22:06:06Z

I wrote this branch that has a full json logger inside of it. There is a gist here with R code for making some graphs out of the json data (it's below the failure logs). NOTE: the test TEST(WriteArrayBodySimple, ExecutesBodyWithHardcodedData) can generate a json file that is 10+ GB in size. You can comment out the laplace stuff in that section if you want to.

Aki's example will fail if we turn off the line search. So I think we should leave it on and allow it to try a few iterations.

I think it's fine to have unit tests which fail with the default control parameters, as long as we get useful error messages, and we can get the unit tests to pass with non-default control parameters. The whole point of giving users control over tuning parameters is that sometimes the defaults don't cut it.

Sorry, by fail I mean the gradients and value that we return are completely wrong for the roach data without the wolfe line search. For example here is the output below that shows this when line search is set to zero.

./test/unit/math/expect_near_rel.hpp:36: Failure
The difference between x1 and x2 is 731.29171829436268, which exceeds tol_val, where
x1 evaluates to -741.32265363802378,
x2 evaluates to -10.03093534366114, and
tol_val evaluates to 187.83839724542122.
expect_near_rel_finite in: for (i) = (261), laplace and integrated results should be close for laplace_val vs integrated_val

./test/unit/math/expect_near_rel.hpp:36: Failure
The difference between x1 and x2 is 9.2560165465474817, which exceeds tol_val, where
x1 evaluates to -13.534972738479031,
x2 evaluates to -4.2789561919315489, and
tol_val evaluates to 4.4534822326026449.
expect_near_rel_finite in: for (i) = (262), laplace and integrated results should be close for laplace_val vs integrated_val

./test/unit/math/expect_near_rel.hpp:36: Failure
The difference between x1 and x2 is 6329.2815766619124, which exceeds tol_val, where
x1 evaluates to -7204.8762038114164,
x2 evaluates to -875.59462714950428, and
tol_val evaluates to 121.20706246441381.
expect_near_rel_finite in: sum laplace vs integrated sum for laplace_sum vs integrated_sum

I'm going to think for a little bit to see if we can't find a happy medium that does a little line search.

This graph shows the number of evaluations inside of the wolfe line search for the roach data over multiple runs of the data. Each line is a separate run.

So we need a bunch of evaluations in the beginning, middle, and end. Maybe I can just revert back to trying a full newton step, then if we pass the Wolfe conditions with one full step we keep going and otherwise we fallback to the wolfe line search.

I tried being too cutesy with this graph, but this is the initial stepsize and final stepsize for each of the laplace iterations over all of the runs of the roach data.

While there is too much going on here, you can see that we need to take a teeny tiny first step, but then after that we can get away with step sizes that are pretty big!

A few other graphs and notes

This shows the amount of time spent doing the wolfe line search for each of the tests in laplace_marginal_lpdf_test.cpp

This graph shows the runtime for each run of the tests given we either do a full newton step or use the line search.

So sometimes Wolfe is not that much worse. However, in the same graph below for the motorcycle data wolfe is way way slower!

But a full newton step here also fails the AD test suite by about 3e-5. So I think there is something we can do where, if a full newton step seems hairy we can fallback to wolfe, but otherwise just accept the full newton step and keep going. My intuition was that I thought the gradient for theta would be way cheaper than the diagonal hessian calculation. So in my mind it made sense to do more wolfe steps and get a better step relative to running the outside loop. But that seems to not always be the case so I need to rethink the search strategy.

This is what the arrow graph above is supposed to look like and I think it is more clear what is going on relative to the one above. At iteration 0 we start with a step size of 1 and end up at a step size of 2. Then at iteration 1 we jump back down from a stepsize of 2 to a stepsize of 1.

I need to go into the AD testing framework and add logging for if a test passes or fails with newton or wolfe (and by how much). It requires diving into the code for the general AD test suite we use so I didn't get around to it yet.

What is the intuition with that?

You raise a valid point and I'm open to the idea of issuing a warning message.

The argument for rejecting the proposal is that the user decides what an acceptable tolerance is for the solver---if the solver doesn't achieve that tolerance, then we might be concerned that the marginal likelihood is poorly approximated, say because the chain wondered into a pathological region of the parameter space and then it is better to backtrack.

Still, I like the idea of a warning message. It's then up to the user to check the quality of the inference directly, rather than relying on the quality of the numerical methods.

(Note that issuing a warning message would be inconsistent with what we've done with other numerical methods, like the newton_solver.)

Agree it would be nonstandard relative to our other solver. Though for LBFGS if we go through line search without hitting the tolerances we still report back values. If we can craft a nice and clear message for this then I think it would be nice to issue a warning.

SteveBronder · 2025-11-04T17:01:14Z

Two other things I was thinking about yesterday.

If solver 1 with a negative diagonal hessian fails because of an unrecoverable error like values along the diagonal being less than 0, could we have a backup where we try with a block diagonal Hessian of 2 or 3? Or switch to solver 2 or 3? We would have to have new parameters starting_block_size, max_block_size, starting_solver, and max_solver. But that would let us say, "If solve starting_solver with block diagonal hessian of size starting_block_size fails, then fallback to a Y+i:max_block_sizesized block diagonal hessian. And if that fails then fall back to solverX+j:max_solver`

SteveBronder · 2025-11-12T22:11:47Z

@charlesm93 just to update. Here is a plot of the time taken for the gp_motorcyle_2 test when running the wolfe line search vs taking a full newton step. Because newton can overshoot it can sometimes end up taking longer.

Here is the same graph but for the disease map test using the poisson_log

Newton is only shown on the graph for solver 2 and block size of 1 because it failed the tests using the other settings.

In the bernoulli_logit_dim_500 test we can see newton going much must faster than wolfe though

Napkin math, some of those newton runs are between 30%-50% faster than the wolfe version

The other issue is that taking no steps often leads to the tests failing. By failing I mean either failing the finite difference tests for gradient checks or catastrophically getting the wrong values and gradients.

So imo the safer thing for users here is to keep wolfe on as the default with a max line search of 250 or so.

I'm going to update this branch with a two things that should be ready for review by Friday.

Fixing NaN checks in the objective or gradient direction for step sizes. This is so that large step sizes that cause NaNs do not cause the program to fail
Sending a warning message when we go past the maximum number of iterations

…for fall through if solver throws an error

stan-buildbot · 2025-11-18T08:16:42Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
gp_regr/gp_regr.stan	0.09	0.09	1.01	1.0% faster
gp_regr/gen_gp_data.stan	0.02	0.02	1.0	0.04% faster
arK/arK.stan	1.73	1.71	1.02	1.5% faster
eight_schools/eight_schools.stan	0.05	0.05	1.03	3.08% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	8.44	8.39	1.01	0.65% faster
pkpd/one_comp_mm_elim_abs.stan	19.04	18.66	1.02	1.99% faster
pkpd/sim_one_comp_mm_elim_abs.stan	0.25	0.24	1.03	2.98% faster
sir/sir.stan	68.47	66.21	1.03	3.29% faster
gp_pois_regr/gp_pois_regr.stan	2.77	2.74	1.01	1.03% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.6	2.57	1.01	1.12% faster
irt_2pl/irt_2pl.stan	3.92	3.86	1.02	1.71% faster
arma/arma.stan	0.28	0.27	1.04	3.99% faster
garch/garch.stan	0.4	0.39	1.02	1.55% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.01	0.01	1.0	0.02% faster
performance.compilation	215.68	221.77	0.97	-2.83% slower
Mean result: 1.0145429184887218

Jenkins Console Log
Blue Ocean
Commit hash: 6c17a9d27ed815deba4bbbed2bb3f6f637fac2b0

Machine information

No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focal

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 2400.000
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 1.3 MiB
L1i cache: 1.3 MiB
L2 cache: 40 MiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS; IBPB conditional; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities

G++:
g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Clang:
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

stan-buildbot · 2025-12-04T00:23:14Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
gp_regr/gp_regr.stan	0.1	0.09	1.05	4.51% faster
gp_regr/gen_gp_data.stan	0.02	0.02	1.04	3.88% faster
arK/arK.stan	1.82	1.71	1.06	5.83% faster
eight_schools/eight_schools.stan	0.06	0.05	1.06	5.67% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	8.83	8.29	1.07	6.17% faster
pkpd/one_comp_mm_elim_abs.stan	19.91	18.89	1.05	5.11% faster
pkpd/sim_one_comp_mm_elim_abs.stan	0.25	0.24	1.07	6.19% faster
sir/sir.stan	71.46	66.5	1.07	6.94% faster
gp_pois_regr/gp_pois_regr.stan	2.88	2.74	1.05	4.81% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.75	2.58	1.07	6.17% faster
irt_2pl/irt_2pl.stan	4.17	3.9	1.07	6.32% faster
arma/arma.stan	0.29	0.27	1.08	7.26% faster
garch/garch.stan	0.42	0.4	1.06	5.72% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.01	0.01	1.09	8.05% faster
performance.compilation	230.75	218.44	1.06	5.33% faster
Mean result: 1.0624338159050637

Jenkins Console Log
Blue Ocean
Commit hash: 48da668c3e597af398bdcb6c140bf8004adb10ad

Machine information

No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focal

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 2400.000
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 1.3 MiB
L1i cache: 1.3 MiB
L2 cache: 40 MiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS; IBPB conditional; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities

G++:
g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Clang:
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

charlesm93 · 2025-12-09T16:35:56Z

@SteveBronder you mentioned I need to check the math on this. Is there any particular file or document you want me to look at?

SteveBronder · 2025-12-11T16:54:31Z

@charlesm93 the main thing I'd like you to look at is my wolfe impl. I would just like a spot check that how I'm doing that is reasonable.

Also if you lookover the laplace_marginal_density_est code. There I think I found a bug where, on the last iteration, if we accept our newest position we need to update the W etc matrices we return with the newest point. I think that is correct but want to double check.

@avehtari below is a script for pulling down a fresh command stan and building it with this branch. If you can try this out it would be appreciated!

git clone --recursive [email protected]:stan-dev/cmdstan.git
cd ./cmdstan/stan/lib/stan_math
git checkout fix/wolfe-zoom1
cd ../../../
make -j4 build

WardBrian · 2025-12-16T16:13:50Z

stan/math/prim/fun/finite_diff_stepsize.hpp

Two questions:

is this change actually relevant to the PR? It doesn't look like it's called anywhere in laplace directly

the stencil in the couple places that the code does call it is always 2, so maybe it could be simpler?

Yeah this can be reverted. Technically I think we are using the wrong finite diff stepsize for the order of finite differences we use, but now I don't need to fix this in this PR and we can do this after some discussion in a separate PR

WardBrian · 2025-12-16T16:20:13Z

stan/math/mix/functor/laplace_marginal_density.hpp

+    return -covariance * step.a() + covariance * step.theta_grad();
+  };
+  auto update_step = [&covariance, &obj_fun, &theta_grad_f, &grad_fun](
+                         auto& step_info, auto&& /* curr */, auto&& prev,


Should I find it odd that curr is unused? Why have the argument at all?

WardBrian · 2025-12-16T16:20:59Z

stan/math/mix/functor/laplace_marginal_density.hpp

+  try {
+    if (options.solver == 1) {
+      if (options.hessian_block_size == 1) {
+        //   std::cout << "Solver: 1Diag" << std::endl;


Suggested change

// std::cout << "Solver: 1Diag" << std::endl;

There are a few other commented cout calls as well

WardBrian · 2025-12-16T16:27:51Z

test/unit/math/prim/fun/inv_Phi_test.cpp

There are a few test files that changed and I can't immediately tell if the actual values are different or just the code layout -- are these relevant to the PR? (this one and mdivide_left/mdivide_right)

Thanks. Let me look this over. This is from a previous change I need to remove in this PR.

SteveBronder · 2025-12-17T21:12:37Z

test/unit/math/prim/fun/mdivide_left_test.cpp


  stan::math::matrix_d I = Eigen::MatrixXd::Identity(2, 2);
-  EXPECT_MATRIX_FLOAT_EQ(I, stan::math::mdivide_left(Ad, Ad));
+  EXPECT_MATRIX_NEAR(I, stan::math::mdivide_left(Ad, Ad), 1e-15);


This one is because of local failures on my desktop

stan-buildbot · 2025-12-18T07:29:25Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
gp_regr/gp_regr.stan	0.09	0.09	1.01	0.76% faster
gp_regr/gen_gp_data.stan	0.02	0.02	1.0	-0.47% slower
arK/arK.stan	1.74	1.74	1.0	0.14% faster
eight_schools/eight_schools.stan	0.05	0.05	1.02	2.18% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	8.47	8.49	1.0	-0.25% slower
pkpd/one_comp_mm_elim_abs.stan	18.72	18.7	1.0	0.13% faster
pkpd/sim_one_comp_mm_elim_abs.stan	0.24	0.24	1.0	0.06% faster
sir/sir.stan	68.43	67.59	1.01	1.23% faster
gp_pois_regr/gp_pois_regr.stan	2.78	2.69	1.03	3.32% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.61	2.58	1.01	1.07% faster
irt_2pl/irt_2pl.stan	4.07	3.98	1.02	2.11% faster
arma/arma.stan	0.28	0.28	1.01	0.72% faster
garch/garch.stan	0.4	0.4	1.0	-0.2% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.01	0.01	1.0	-0.08% slower
performance.compilation	223.93	223.79	1.0	0.06% faster
Mean result: 1.0073405040148515

Jenkins Console Log
Blue Ocean
Commit hash: bfe460350fe8aacd6b972c41b0ab127a2b25e646

Machine information

No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focal

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 3441.484
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 1.3 MiB
L1i cache: 1.3 MiB
L2 cache: 40 MiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS; IBPB conditional; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Vmscape: Mitigation; IBPB before exit to userspace
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities

G++:
g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Clang:
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

SteveBronder added 30 commits September 29, 2025 18:03

update the laplace line search to use a more advanced wolfe line sear…

43c3eef

…ch method

Merge remote-tracking branch 'origin/develop' into fix/laplace-wolfe-…

74a92bc

…line-search

add data for roach data test

2de7a4f

update tests

547288e

move wolfe to its own file

cb5e282

Merge remote-tracking branch 'origin' into fix/laplace-wolfe-line-search

cdaf700

update wolfe

6b22c85

Merge remote-tracking branch 'origin' into fix/laplace-wolfe-line-search

a1d0906

update to use barzilai borwein step size as initial step size estimate

5b6ffff

seperate moto from other lpdf tests

8eff766

update

f542cc5

add WolfeInfo

c845944

use WolfeInfo for extra data

40f1243

put everything for iterations in laplace into structs

6e528d2

update poisson test

d89eeb5

add swap functions

40d889f

cleanup laplace_density_est to reduce repeated code

59b7a2f

update to search for a good initial alpha on a space

c73f5aa

fix code for wolfe line search

773d417

update tests for zoom

b557dad

all tests pass for laplace with new wolfe

b18bf87

use log sum of diagonal of U matrix for solver 3 determinant

98df588

move update_step to be a user passed function

929dd47

cleanup the laplace code to remove some passed by reference values to…

2ebb01a

… the lambdas

cleanup the laplace code to remove some passed by reference values to…

3bbcef3

… the lambdas

update WolfeData with member accessors and use Eval within WolfeData

66ffec9

update docs for wolfe

ff5bee4

update logic in laplace_marginal_desntiy_est so that final updated va…

7a7415a

…lues for W, B, etc. are used

clang format

973144a

change stepsize of finite difference to use 6th order instead of 2nd …

cc5d49a

…order stepsize

SteveBronder and others added 2 commits November 17, 2025 17:56

handle NA values for obj and grad. Allow for zero line search. allow …

04b5b2e

…for fall through if solver throws an error

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

4117a31

SteveBronder and others added 10 commits December 1, 2025 11:26

update step iter

affabfa

cleanup zoom in wolfe line search

a143355

Merge commit '85c147ee6adbe58eb9ae1578c0478fcf3da9bf76' into HEAD

95c21d5

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

5ba7426

update poisson_log_lpmf test to use google test parameterization

475c632

add throw testing in neg_binomial_log_summary

307fb0c

breakout the laplace tests so they print nicely

5038198

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

7e9af37

fix cpplint

863223e

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

04197f2

WardBrian linked an issue Dec 4, 2025 that may be closed by this pull request

Laplace Bug when passing Eigen::Map in tuple of functor arguments #3205

Open

WardBrian reviewed Dec 16, 2025

View reviewed changes

SteveBronder commented Dec 17, 2025

View reviewed changes

SteveBronder and others added 3 commits December 17, 2025 16:12

address partial review comments

c8a1613

Merge commit 'a5f80224b857e06dd7ca753d826e5b292ee8e73c' into HEAD

aeb1662

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

ea9ffe0

Uh oh!

Add Wolfe line search to Laplace approximation #3250

Are you sure you want to change the base?

Add Wolfe line search to Laplace approximation #3250

Conversation

SteveBronder commented Oct 24, 2025

Summary

Tests

Release notes

Checklist

Uh oh!

stan-buildbot commented Oct 29, 2025

Uh oh!

SteveBronder commented Oct 30, 2025

Uh oh!

SteveBronder commented Oct 30, 2025

Uh oh!

charlesm93 commented Oct 31, 2025

Uh oh!

SteveBronder commented Oct 31, 2025

Uh oh!

SteveBronder commented Nov 4, 2025

Uh oh!

SteveBronder commented Nov 12, 2025

Uh oh!

stan-buildbot commented Nov 18, 2025

Uh oh!

stan-buildbot commented Dec 4, 2025

Uh oh!

charlesm93 commented Dec 9, 2025

Uh oh!

SteveBronder commented Dec 11, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stan-buildbot commented Dec 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants