Inference benchmark results

This page reports how each posterior inference algorithm shipped in quivers.inference recovers known posterior moments on a deterministic suite of synthetic problems. The grid is regenerated by tests/benchmarks/runner.py from the seeded data factories and analytical references in tests/benchmarks/.

What the suite tests

Every benchmark is an (algorithm, problem) cell. A problem fixes:

A generative model written in QVR and loaded from tests/benchmarks/models/*.qvr.
A deterministic data generator (fixed torch.manual_seed) that produces the observations the model is conditioned on.
A reference posterior moment for one latent site, computed analytically (conjugate problems), by quadrature on a dense grid (constrained-support problems), or by a long cached NUTS run (Eight Schools).
A scalar metric (almost always \(|\mathbb{E}_q[\cdot] - \mathbb{E}_{\text{ref}}[\cdot]|\)) and a tolerance.

A cell runs the algorithm on the problem, draws posterior samples for the target site, and compares the recovered moment against the reference.

Throughput is reported as SVI iterations per second for the variational guides and as posterior draws per second (summed across chains) for the MCMC kernels.

Cell statuses

PASS: recovered moment is within tolerance of the reference.
FAIL: algorithm runs cleanly but the moment is outside tolerance.
ERROR: algorithm raised during execution (NaN gradient, support-boundary explosion, divergent trajectory, etc.).
Capture problems invert the convention: PASS means the metric exceeds the tolerance, confirming a documented failure mode.
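The status rules reduce to a few comparisons. The sketch below is a hypothetical stand-in for the runner's logic: the names `moment_error` and `cell_status` are illustrative and not taken from tests/benchmarks/runner.py, and it assumes a metric exactly equal to the tolerance counts as within tolerance.

```python
from statistics import fmean


def moment_error(samples, reference):
    """|E_q[.] - E_ref[.]|, with E_q estimated by the mean of posterior draws."""
    return abs(fmean(samples) - reference)


def cell_status(metric, tolerance, *, capture=False, raised=False):
    """Classify one (algorithm, problem) cell as PASS, FAIL or ERROR.

    Hypothetical helper: assumes metric == tolerance counts as "within".
    """
    if raised:
        # The algorithm itself failed (NaN gradient, divergence, ...).
        return "ERROR"
    within = metric <= tolerance
    if capture:
        # Capture problems invert the convention: a large error confirms
        # the documented failure mode.
        return "FAIL" if within else "PASS"
    return "PASS" if within else "FAIL"


# Illustrative draws with mean 0.66 against a reference of 0.70.
err = moment_error([0.65, 0.66, 0.67], 0.70)
print(f"{err:.3f}", cell_status(err, 0.05))
```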

Determinism: every cell calls torch.manual_seed(0) before constructing the problem, so the same (algorithm, problem) pair reproduces across runs given fixed PyTorch and NumPy versions.

Algorithms

All algorithms are evaluated on every problem. Hyperparameters are fixed across problems, apart from the longer SVI schedule for positive-support sites, so that the grid measures the algorithms rather than per-problem tuning.

Algorithm	Family	Key hyperparameters
`AutoNormal`	Mean-field SVI, factorised diagonal Normal in unconstrained space	Adam, lr=0.05, 800 steps (1500 for positive-support sites), 1500 posterior draws
`AutoMVN`	Full-covariance SVI, single MVN in unconstrained space	Adam, lr=0.05, 800 steps (1500 for positive-support sites), `init_scale=0.3`, 1500 draws
`AutoLaplace`	MAP plus a Gaussian centred at the mode, with covariance equal to the inverse Hessian of the negative log joint	Adam, lr=0.05, 500 steps, 1500 draws
`HMC`	Hamiltonian Monte Carlo with fixed integrator length	`step_size=0.1` (adapted), `num_steps=10`, diagonal mass matrix (adapted), 200 warmup, 400 samples, 2 chains
`NUTS`	No-U-Turn HMC	`target_accept=0.8`, `max_tree_depth=8`, diagonal mass matrix, 200 warmup, 400 samples, 2 chains

Variational guides operate in unconstrained space via the bijector attached to each latent's support, so positive-support and bounded-support sites are exercised through exp / softplus / sigmoid transforms rather than through constrained Gaussian families.
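For intuition, here is a plain-Python sketch of the three transforms and the log-Jacobian terms a guide has to account for when it pushes an unconstrained Gaussian draw onto a constrained support. It is illustrative only: the function names are hypothetical and this is not the quivers bijector API.

```python
import math


def softplus(u):
    """log(1 + exp(u)), computed stably; maps R onto (0, inf)."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def softplus_inv(x):
    """Inverse of softplus for x > 0: log(exp(x) - 1), computed stably."""
    return x + math.log(-math.expm1(-x))


def sigmoid(u):
    """1 / (1 + exp(-u)), computed stably; maps R onto (0, 1)."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def log_abs_det_jacobian(name, u):
    """log |d f(u) / du| for f in {exp, softplus, sigmoid}."""
    if name == "exp":
        return u                                  # d exp(u)/du = exp(u)
    if name == "softplus":
        return math.log(sigmoid(u))               # d softplus(u)/du = sigmoid(u)
    if name == "sigmoid":
        s = sigmoid(u)
        return math.log(s) + math.log1p(-s)       # d sigmoid(u)/du = s (1 - s)
    raise ValueError(f"unknown transform: {name}")


u = -1.2
x = softplus(u)
print(x, softplus_inv(x), log_abs_det_jacobian("softplus", u))
```

Under these assumptions, a guide that draws \(u \sim \mathcal{N}(m, s)\) and returns \(x = \mathrm{softplus}(u)\) assigns \(\log q(x) = \log \mathcal{N}(u; m, s) - \log|\mathrm{d}x/\mathrm{d}u|\), which is where the log-Jacobian enters the ELBO.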

Tier 1: conjugate posteriors

Five textbook problems with closed-form posteriors. They establish a floor: every algorithm should match the analytical moment to within a tight tolerance.

Beta-Bernoulli

Model. Conjugate Beta prior on a Bernoulli rate:

\[ \theta \sim \mathrm{Beta}(2, 2), \qquad y_i \mid \theta \sim \mathrm{Bernoulli}(\theta), \quad i = 1, \dots, 50. \]

Data. \(N = 50\) Bernoulli draws at \(\theta^\star = 0.7\).

Reference. Conjugacy gives \(\theta \mid y \sim \mathrm{Beta}(\alpha_N, \beta_N)\) with \(\alpha_N = \alpha_0 + \sum_i y_i\) and \(\beta_N = \beta_0 + N - \sum_i y_i\), where \(\alpha_0 = \beta_0 = 2\); the posterior mean is \(\alpha_N / (\alpha_N + \beta_N)\).
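The reference is a two-line computation. A minimal sketch with a hypothetical function name; the counts in the example (35 successes in 50 trials) are illustrative and are not the benchmark's seeded draw.

```python
def beta_bernoulli_posterior_mean(y, alpha0=2.0, beta0=2.0):
    """Posterior mean of theta under a Beta(alpha0, beta0) prior and 0/1 data y."""
    successes = sum(y)
    alpha_n = alpha0 + successes
    beta_n = beta0 + len(y) - successes
    return alpha_n / (alpha_n + beta_n)


# 35 successes out of 50: (2 + 35) / (2 + 35 + 2 + 15) = 37 / 54.
print(beta_bernoulli_posterior_mean([1] * 35 + [0] * 15))
```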

Metric. |E[theta]_q - E[theta]_true|, tolerance 0.05.

Normal-Normal

Model. Conjugate Normal prior on a Normal mean with known variance:

\[ \mu \sim \mathcal{N}(0, 1), \qquad y_i \mid \mu \sim \mathcal{N}(\mu, 1), \quad i = 1, \dots, 30. \]

Data. \(N = 30\) Normal draws at \(\mu^\star = 1.5\), \(\sigma = 1\).

Reference. Posterior precision \(\tau_N = \tau_0 + N / \sigma^2\) gives a Normal posterior with mean \((\tau_0 \mu_0 + N \bar{y} / \sigma^2) / \tau_N\), where \(\mu_0 = 0\) and \(\tau_0 = 1\) are the prior mean and precision.
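The same closed form in code, as a sketch with a hypothetical function name; the constant data in the example are illustrative, not the benchmark's draws.

```python
def normal_normal_posterior(y, mu0=0.0, tau0=1.0, sigma=1.0):
    """Posterior mean and variance of mu for a known observation scale sigma."""
    n = len(y)
    ybar = sum(y) / n
    tau_n = tau0 + n / sigma**2                       # posterior precision
    mean = (tau0 * mu0 + n * ybar / sigma**2) / tau_n
    return mean, 1.0 / tau_n


# 30 observations all equal to 1.5: mean = 45 / 31, variance = 1 / 31.
mean, var = normal_normal_posterior([1.5] * 30)
print(mean, var)
```

The prior pulls the estimate from \(\bar{y} = 1.5\) toward 0 by a factor \(N / (N + 1)\), which is why even an exact method does not return the sample mean.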

Metric. |E[mu]_q - E[mu]_true|, tolerance 0.15.

Normal-Inverse-Gamma

Model. Joint conjugate prior on unknown mean and variance:

\[ \sigma^2 \sim \mathrm{InverseGamma}(3, 2), \qquad \mu \mid \sigma^2 \sim \mathcal{N}(0, \sigma), \qquad y_i \mid \mu, \sigma^2 \sim \mathcal{N}(\mu, \sigma), \quad i = 1, \dots, 60. \]

Data. \(N = 60\) Normal draws at \(\mu^\star = 0.3\), \(\sigma^{2\star} = 1.5\).

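Reference. NIG posterior updates (Murphy 2007 §5) give the marginal posterior mean \(\mu_N = (\kappa_0 \mu_0 + N \bar{y}) / (\kappa_0 + N)\), with \(\mu_0 = 0\) and \(\kappa_0 = 1\) because the conditional prior on \(\mu\) has scale \(\sigma\).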
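A sketch of the full NIG update, assuming the standard parameterisation \(\sigma^2 \sim \mathrm{InverseGamma}(a_0, b_0)\) (shape, scale) and \(\mu \mid \sigma^2\) Normal with mean \(\mu_0\) and variance \(\sigma^2 / \kappa_0\), with defaults matching the model above. The function name is hypothetical; tests/benchmarks/references.py may organise the computation differently, and only `mu_n` is scored by the benchmark.

```python
def nig_posterior(y, mu0=0.0, kappa0=1.0, a0=3.0, b0=2.0):
    """Normal-Inverse-Gamma update for an unknown mean and variance."""
    n = len(y)
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)        # within-sample sum of squares
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * ybar) / kappa_n    # the benchmark's target moment
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * ss + kappa0 * n * (ybar - mu0) ** 2 / (2 * kappa_n)
    return {
        "mu_n": mu_n,
        "kappa_n": kappa_n,
        "a_n": a_n,
        "b_n": b_n,
        "E_sigma2": b_n / (a_n - 1),               # posterior mean of sigma^2 (a_n > 1)
    }


print(nig_posterior([0.1, 0.9, -0.4, 1.2]))
```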

This problem stress-tests guides on two latents with mixed supports: the unconstrained \(\mu\) and the positive \(\sigma^2\), which the guide reaches through an \(\exp\) or softplus bijector.

Metric. |E[mu]_q - E[mu]_true|, tolerance 0.2.

Gamma-Exponential

Model. Conjugate Gamma prior on an Exponential rate:

\[ r \sim \mathrm{Gamma}(2, 1), \qquad y_i \mid r \sim \mathrm{Exponential}(r), \quad i = 1, \dots, 80. \]

Data. \(N = 80\) Exponential draws at \(r^\star = 2\).

Reference. \(r \mid y \sim \mathrm{Gamma}(a_N, b_N)\) with \(a_N = a_0 + N\) and \(b_N = b_0 + \sum_i y_i\) (shape and rate, \(a_0 = 2\), \(b_0 = 1\)), so the posterior mean is \(a_N / b_N\).
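As code, assuming the shape/rate convention implied by the conjugate update (the function name is hypothetical and the data are illustrative):

```python
def gamma_exponential_posterior_mean(y, a0=2.0, b0=1.0):
    """Posterior mean of the rate r under a Gamma(a0, b0) (shape, rate) prior."""
    a_n = a0 + len(y)
    b_n = b0 + sum(y)
    return a_n / b_n


# 80 observations of 0.5 each: a_N = 82, b_N = 41.
print(gamma_exponential_posterior_mean([0.5] * 80))
```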

Metric. |E[rate]_q - E[rate]_true|, tolerance 0.3.

Bayesian linear regression

Model. Two-parameter linear regression with iid standard-Normal design and known observation noise:

\[ a, b \sim \mathcal{N}(0, 1), \qquad x_i \sim \mathcal{N}(0, 1), \qquad y_i \mid a, b \sim \mathcal{N}(a + b x_i, \sigma), \quad i = 1, \dots, 60, \]

with \(\sigma = 0.3\), \(a^\star = 0.7\), \(b^\star = -0.5\).

Reference. Closed-form Gaussian posterior with precision \(\Sigma^{-1} = I + X^\top X / \sigma^2\) and mean \(\Sigma X^\top y / \sigma^2\), where \(X\) is the \(N \times 2\) design matrix with rows \((1, x_i)\).
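Because the design has only two columns, the posterior can be written out by hand with a 2×2 inverse. A sketch under those assumptions, with a hypothetical function name and plain Python in place of whatever tensor code the runner uses:

```python
def linreg_posterior(x, y, sigma):
    """Posterior mean and covariance of (a, b) for y ~ N(a + b x, sigma), (a, b) ~ N(0, I)."""
    n = len(x)
    s2 = sigma ** 2
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Precision P = I + X^T X / sigma^2 with design rows (1, x_i).
    p11, p12, p22 = 1 + n / s2, sx / s2, 1 + sxx / s2
    det = p11 * p22 - p12 * p12
    # Covariance Sigma = P^{-1} (closed-form 2x2 inverse).
    c11, c12, c22 = p22 / det, -p12 / det, p11 / det
    # Mean = Sigma X^T y / sigma^2.
    h1, h2 = sy / s2, sxy / s2
    mean = (c11 * h1 + c12 * h2, c12 * h1 + c22 * h2)
    return mean, ((c11, c12), (c12, c22))


# Two noiseless points on y = 0.7 - 0.5 x; the N(0, 1) prior shrinks both coefficients.
mean, cov = linreg_posterior([0.0, 1.0], [0.7, 0.2], sigma=0.3)
print(mean)
```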

Metric. |E[a]_q - E[a]_true|, tolerance 0.1.

Results

Posterior accuracy (metric / tolerance):

Problem	AutoNormal	AutoMVN	AutoLaplace	HMC	NUTS
Beta-Bernoulli	PASS `0.0398 / 0.05`	PASS `0.0402 / 0.05`	PASS `0.00926 / 0.05`	PASS `0.000594 / 0.05`	PASS `0.00157 / 0.05`
Normal-Normal	PASS `0.123 / 0.15`	PASS `0.124 / 0.15`	PASS `4.77e-07 / 0.15`	PASS `0.000607 / 0.15`	PASS `0.0225 / 0.15`
Normal-Inverse-Gamma	PASS `0.0345 / 0.2`	PASS `0.0298 / 0.2`	PASS `5.96e-08 / 0.2`	PASS `0.00715 / 0.2`	PASS `0.0068 / 0.2`
Gamma-Exponential	PASS `0.0513 / 0.3`	PASS `0.057 / 0.3`	PASS `0.0249 / 0.3`	PASS `0.00806 / 0.3`	PASS `0.0163 / 0.3`
Bayesian linear regression	PASS `0.0113 / 0.1`	PASS `0.00401 / 0.1`	PASS `1.79e-07 / 0.1`	PASS `0.000134 / 0.1`	PASS `0.00127 / 0.1`

Throughput (iters/s for SVI, draws/s for MCMC):

Problem	AutoNormal	AutoMVN	AutoLaplace	HMC	NUTS
Beta-Bernoulli	1149.7	868.0	1863.5	114.2	167.9
Normal-Normal	2013.3	1395.7	3451.5	236.8	256.7
Normal-Inverse-Gamma	883.0	571.8	1574.8	102.6	51.0
Gamma-Exponential	1803.4	1153.3	3276.4	233.1	180.0
Bayesian linear regression	1222.2	710.6	2305.4	163.5	40.8

Tier 2: hierarchical posteriors

Both parameterisations of the Eight Schools problem (Rubin 1981). Together they test how each algorithm handles the funnel geometry that arises when the group-level scale \(\tau\) shrinks toward zero.

Eight Schools (centered)

Model.

\[ \mu \sim \mathcal{N}(0, 10), \qquad \tau \sim \mathrm{HalfCauchy}(5), \qquad \theta_j \mid \mu, \tau \sim \mathcal{N}(\mu, \tau), \qquad y_j \mid \theta_j \sim \mathcal{N}(\theta_j, 12), \]

for \(j = 1, \dots, 8\) on the canonical Rubin (1981) effect sizes \(y = (28, 8, -3, 7, -1, 1, 18, 12)\).

Reference. Cached NUTS moments (4 chains, 5000 post-warmup draws): \(\mathbb{E}[\mu] \approx 5.4\), posterior standard deviation \(\approx 4\).

The tolerance is three reference standard deviations (\(3 \times 4 = 12\)), a loose target reflecting how hard the funnel geometry is for VI.

Metric. |E[mu]_q - mu_ref|, tolerance 12.

Eight Schools (non-centered)

Model. Same priors as the centered model, with the group-level draws reparameterised:

\[ \eta_j \sim \mathcal{N}(0, 1), \qquad \theta_j = \mu + \tau \eta_j, \]

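so the sampled latents \(\eta_j\) are a priori independent of \(\tau\) and \(\theta_j\) becomes a deterministic function of \((\mu, \tau, \eta_j)\). This removes the funnel from the prior geometry that the inference algorithm explores.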
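Both parameterisations define the same prior over \(\theta\): by the change of variables \(\theta_j = \mu + \tau \eta_j\), \(\log p(\theta \mid \mu, \tau) = \log p(\eta) - J \log \tau\) for \(J = 8\) schools. The sketch below checks that identity numerically; the helper names are hypothetical. What differs is the geometry the algorithm traverses: in the centered form every \(\theta_j\) has conditional scale \(\tau\), whereas \(\eta_j\) keeps unit scale for any \(\tau\).

```python
import math


def normal_logpdf(x, loc, scale):
    return -0.5 * math.log(2 * math.pi) - math.log(scale) - 0.5 * ((x - loc) / scale) ** 2


def non_centered_to_theta(mu, tau, eta):
    """theta_j = mu + tau * eta_j (deterministic given mu, tau, eta)."""
    return [mu + tau * e for e in eta]


def centered_log_prior(theta, mu, tau):
    """log p(theta | mu, tau) with theta_j ~ N(mu, tau)."""
    return sum(normal_logpdf(t, mu, tau) for t in theta)


def non_centered_log_prior(eta, tau):
    """log p(eta) minus the log-Jacobian J * log(tau) of the map eta -> theta."""
    return sum(normal_logpdf(e, 0.0, 1.0) for e in eta) - len(eta) * math.log(tau)


mu, tau = 5.0, 0.7
eta = [0.3, -1.2, 0.5, 2.0, -0.1, 0.0, 1.1, -0.8]
theta = non_centered_to_theta(mu, tau, eta)
print(centered_log_prior(theta, mu, tau), non_centered_log_prior(eta, tau))
```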

Reference. Same cached NUTS moments as the centered model.

The tolerance is tightened to two reference standard deviations (\(2 \times 4 = 8\)): the reparameterisation should pay off.

Metric. |E[mu]_q - mu_ref|, tolerance 8.

Results

Posterior accuracy (metric / tolerance):

Problem	AutoNormal	AutoMVN	AutoLaplace	HMC	NUTS
Eight Schools (centered)	PASS `5.4 / 12`	PASS `5.51 / 12`	PASS `5.4 / 12`	PASS `4.38 / 12`	PASS `5.75 / 12`
Eight Schools (non-centered)	PASS `0.891 / 8`	PASS `1.06 / 8`	PASS `2.01 / 8`	PASS `1.74 / 8`	PASS `1.45 / 8`

Throughput (iters/s for SVI, draws/s for MCMC):

Problem	AutoNormal	AutoMVN	AutoLaplace	HMC	NUTS
Eight Schools (centered)	776.5	559.8	1498.0	113.0	28.0
Eight Schools (non-centered)	729.2	572.4	1448.6	110.3	34.1

Tier 3: hard posterior geometry

Problems chosen to expose specific failure modes of mean-field VI and of HMC under poor preconditioning.

Correlated regression

Model. Linear regression as in Tier 1, but with a near-constant design:

\[ a, b \sim \mathcal{N}(0, 1), \qquad x_i = \rho + (1 - \rho) z_i, \quad z_i \sim \mathcal{N}(0, 1), \qquad y_i \mid a, b \sim \mathcal{N}(a + b x_i, 0.5), \]

with \(\rho = 0.95\) and \(N = 50\).

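Reference. Closed-form Gaussian posterior, as in Tier 1 with observation scale 0.5. Because every \(x_i\) sits near 0.95, the data pin down \(a + 0.95\, b\) but not \(a\) and \(b\) separately, so the posterior correlation between \(a\) and \(b\) is strongly negative, above 0.95 in magnitude.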

The mean-field guide (AutoNormal) cannot represent this correlation. The first-moment metric below still passes; the documented underfit shows up in the second moment (the marginal variances), which this cell does not score.
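The posterior covariance of a Gaussian linear model with known noise does not depend on \(y\), so the correlation can be computed from the design alone. A sketch assuming the model above, with hypothetical function names; the design is drawn with Python's `random` module purely for illustration and does not reproduce the benchmark's torch-seeded data.

```python
import math
import random


def correlated_design(n=50, rho=0.95, seed=0):
    """x_i = rho + (1 - rho) z_i with z_i ~ N(0, 1); illustrative, not the benchmark's draw."""
    rng = random.Random(seed)
    return [rho + (1 - rho) * rng.gauss(0.0, 1.0) for _ in range(n)]


def posterior_correlation(x, sigma=0.5):
    """corr(a, b) under (a, b) ~ N(0, I) and y_i ~ N(a + b x_i, sigma)."""
    s2 = sigma ** 2
    p11 = 1 + len(x) / s2                       # entries of the precision
    p12 = sum(x) / s2                           # matrix I + X^T X / sigma^2
    p22 = 1 + sum(xi * xi for xi in x) / s2
    # The 2x2 inverse gives corr = -p12 / sqrt(p11 * p22); y never enters.
    return -p12 / math.sqrt(p11 * p22)


print(posterior_correlation(correlated_design()))
```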

Metric. |E[a]_q - E[a]_true|, tolerance 0.2.

Neal's funnel (under-estimation capture)

Model. Neal's funnel:

\[ v \sim \mathcal{N}(0, 3), \qquad x_i \mid v \sim \mathcal{N}(0, e^{v / 2}), \quad i = 1, \dots, 9. \]

Data. Condition on \(x_i = 0\) (inference target is \(p(v \mid x = 0)\)).

Reference. The log-likelihood is linear in \(v\): \(\log p(x_i = 0 \mid v) = -\tfrac{1}{2}\log(2\pi) - v / 2\). Adding the \(\mathcal{N}(0, 3)\) log prior and completing the square gives a Gaussian conditional posterior whose variance is the prior variance \(3^2 = 9\) and whose mean is \(-3^2 \cdot N / 2 = -40.5\) at \(N = 9\). The joint prior over \((v, x)\) is funnel-shaped; the choice \(x = 0\) is what makes the likelihood linear in \(v\) and the conditional posterior tractable.
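The closed form is short enough to state in code, together with a brute-force grid check of the mean. A sketch with hypothetical names; the grid bounds are chosen here to cover the posterior with a wide margin and are not taken from the runner.

```python
import math


def funnel_conditional_posterior(n=9, prior_scale=3.0):
    """(mean, variance) of p(v | x_1..x_n = 0) for v ~ N(0, prior_scale), x_i ~ N(0, exp(v / 2))."""
    prior_var = prior_scale ** 2
    # Each x_i = 0 contributes -v/2 to the log-likelihood, so the total slope is
    # -n/2. A linear term shifts a Gaussian's mean by variance * slope and leaves
    # the variance unchanged.
    return prior_var * (-n / 2), prior_var


def funnel_grid_mean(n=9, prior_scale=3.0, lo=-80.0, hi=0.0, num_points=8001):
    """Brute-force check: posterior mean of v on a uniform grid."""
    step = (hi - lo) / (num_points - 1)
    grid = [lo + i * step for i in range(num_points)]
    log_post = [
        -0.5 * (v / prior_scale) ** 2 + n * (-0.5 * math.log(2 * math.pi) - v / 2)
        for v in grid
    ]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    return sum(v * w for v, w in zip(grid, weights)) / sum(weights)


print(funnel_conditional_posterior(), funnel_grid_mean())
```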

Capture semantics. All five algorithms under-estimate the magnitude of \(v\). PASS means the metric exceeds the tolerance of 20.25 (half the reference magnitude \(|\mathbb{E}[v]| = 40.5\)), confirming the documented underfit.

Metric. |E[v]_q - E[v]_true|, tolerance 20.25.

Ill-conditioned product Gaussian

Model. Five-dimensional product Gaussian whose prior scales span four orders of magnitude, with a fixed observation noise:

\[ x_d \sim \mathcal{N}(0, \sigma_d^{\text{prior}}), \qquad y_d \mid x_d \sim \mathcal{N}(x_d, 0.1), \qquad d = 1, \dots, 5, \]

with \(\sigma^{\text{prior}} = (100, 10, 1, 0.1, 0.01)\).

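Reference. Writing \(\sigma_d\) for \(\sigma_d^{\text{prior}}\) and using the same (location, scale) convention as the model, each dimension has a Gaussian posterior: \(x_d \mid y_d \sim \mathcal{N}\bigl(y_d / (1 + (0.1 / \sigma_d)^2),\ (1 / \sigma_d^2 + 1 / 0.1^2)^{-1/2}\bigr)\).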

The metric tracks the middle scale \(x_3\), where the diagonal mass matrix is roughly correct but the gradient signal is dwarfed by the larger-scale dimensions.
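Evaluating the reference per dimension also shows how much the likelihood compresses the spread. A sketch with hypothetical names; with the scales above the posterior standard deviations come out at roughly 0.1, 0.1, 0.0995, 0.0707 and 0.00995, so the posterior spans about one order of magnitude even though the prior spans four.

```python
PRIOR_SCALES = (100.0, 10.0, 1.0, 0.1, 0.01)
NOISE_SCALE = 0.1


def product_gaussian_posterior(y, prior_scales=PRIOR_SCALES, noise=NOISE_SCALE):
    """Per-dimension (posterior mean, posterior scale) for x_d ~ N(0, s_d), y_d ~ N(x_d, noise)."""
    result = []
    for y_d, s_d in zip(y, prior_scales):
        precision = 1 / s_d**2 + 1 / noise**2
        mean = (y_d / noise**2) / precision  # equals y_d / (1 + (noise / s_d)**2)
        result.append((mean, precision ** -0.5))
    return result


for d, (m, s) in enumerate(product_gaussian_posterior([1.0] * 5), start=1):
    print(f"x_{d}: mean {m:.4f}, scale {s:.5f}")
```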

Metric. |E[x_3]_q - E[x_3]_true|, tolerance 0.3.

Results

Posterior accuracy (metric / tolerance):

Problem	AutoNormal	AutoMVN	AutoLaplace	HMC	NUTS
Correlated regression	PASS `0.0365 / 0.2`	PASS `0.0308 / 0.2`	PASS `7.41e-05 / 0.2`	PASS `0.0512 / 0.2`	PASS `0.0569 / 0.2`
Neal's funnel (under-estimation capture) (capture)	PASS `40.6 / 20.2`	PASS `40.6 / 20.2`	PASS `40.5 / 20.2`	PASS `42 / 20.2`	PASS `40.9 / 20.2`
Ill-conditioned product Gaussian	PASS `0.00643 / 0.3`	PASS `0.00333 / 0.3`	PASS `7.15e-07 / 0.3`	PASS `0.086 / 0.3`	PASS `0.0424 / 0.3`

Throughput (iters/s for SVI, draws/s for MCMC):

Problem	AutoNormal	AutoMVN	AutoLaplace	HMC	NUTS
Correlated regression	1150.7	681.6	2314.7	167.3	67.1
Neal's funnel (under-estimation capture) (capture)	1795.3	1279.6	2807.7	281.0	1027.6
Ill-conditioned product Gaussian	482.0	461.1	876.7	68.3	54.5

Tier 6: constrained-support stress

Latents on a half-line or in a bounded interval. References come from dense-grid quadrature; variational guides must traverse a non-linear bijector to reach the constrained scale.

HalfNormal scale

Model.

\[ \sigma \sim \mathrm{HalfNormal}(2), \qquad y_i \mid \sigma \sim \mathcal{N}(0, \sigma), \quad i = 1, \dots, 80. \]

Reference. No conjugate form. Integrate

\[ p(\sigma \mid y) \propto \exp(-\sigma^2 / 8) \cdot \sigma^{-N} \cdot \exp\bigl(-\tfrac{1}{2 \sigma^2} \sum_i y_i^2\bigr) \]

on a 4096-point grid in \([0.05, 6]\) for the reference moments.
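A plain-Python sketch of that quadrature, assuming a uniform grid (so the grid spacing cancels in the normalised mean) and max-subtraction before exponentiating. The function name is hypothetical and the runner may use tensor code instead; the grid bounds and resolution are the ones stated above.

```python
import math


def halfnormal_scale_posterior_mean(y, prior_scale=2.0, lo=0.05, hi=6.0, num_points=4096):
    """E[sigma | y] for sigma ~ HalfNormal(prior_scale), y_i ~ N(0, sigma), by grid quadrature."""
    n = len(y)
    sum_sq = sum(yi * yi for yi in y)
    step = (hi - lo) / (num_points - 1)
    grid = [lo + i * step for i in range(num_points)]
    # Unnormalised log posterior: log prior + log likelihood, constants dropped.
    log_post = [
        -s * s / (2 * prior_scale**2) - n * math.log(s) - sum_sq / (2 * s * s)
        for s in grid
    ]
    peak = max(log_post)  # subtract the max before exponentiating to avoid overflow
    weights = [math.exp(lp - peak) for lp in log_post]
    # On a uniform grid the spacing cancels in the ratio of sums.
    return sum(s * w for s, w in zip(grid, weights)) / sum(weights)


# Illustrative data with sum of squares 80 * 1.5**2; not the benchmark's draw.
print(halfnormal_scale_posterior_mean([1.5, -1.5] * 40))
```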

Metric. |E[sigma]_q - E[sigma]_true|, tolerance 0.15.

TruncatedNormal recovery

Model.

\[ \mu \sim \mathrm{Uniform}(0, 1), \qquad y_i \mid \mu \sim \mathrm{TruncatedNormal}(\mu, 0.2, 0, 1), \quad i = 1, \dots, 60. \]

Reference. Evaluate the truncated-Normal log-likelihood on a 4096-point \(\mu\)-grid in \((0, 1)\) with stable log-CDF differences for the truncation constant; normalise for the posterior moments.
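A sketch of the grid reference in plain Python, with a hypothetical function name. It computes the truncation constant \(Z(\mu) = \Phi((1 - \mu)/0.2) - \Phi(-\mu/0.2)\) as a direct difference of `math.erf`-based CDFs rather than the stable log-CDF difference described above; on this problem \(Z(\mu)\) stays between about 0.5 and 0.99 for \(\mu \in (0, 1)\), so the direct difference loses no meaningful precision.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncnorm_mu_posterior_mean(y, scale=0.2, low=0.0, high=1.0, num_points=4096):
    """Posterior mean of mu under mu ~ Uniform(0, 1), y_i ~ TruncatedNormal(mu, scale, low, high)."""
    n = len(y)
    # Grid midpoints in the open interval (0, 1), the support of the Uniform prior.
    grid = [(i + 0.5) / num_points for i in range(num_points)]
    log_post = []
    for mu in grid:
        # Truncation constant Z(mu) = P(low < Y < high) for Y ~ N(mu, scale).
        z = std_normal_cdf((high - mu) / scale) - std_normal_cdf((low - mu) / scale)
        quad = sum(((yi - mu) / scale) ** 2 for yi in y)
        log_lik = -0.5 * quad - n * (math.log(scale) + 0.5 * math.log(2 * math.pi) + math.log(z))
        log_post.append(log_lik)  # flat prior: posterior = likelihood up to a constant
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    return sum(m * w for m, w in zip(grid, weights)) / sum(weights)


print(truncnorm_mu_posterior_mean([0.3] * 30 + [0.5] * 30))
```

With data centred at 0.4 the estimate lands slightly below 0.4: truncation at 0 removes more mass below \(\mu\) than truncation at 1 removes above it, so the observed mean overstates \(\mu\).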

Metric. |E[mu]_q - E[mu]_true|, tolerance 0.05.

Results

Posterior accuracy (metric / tolerance):

Problem	AutoNormal	AutoMVN	AutoLaplace	HMC	NUTS
HalfNormal scale	PASS `0.0265 / 0.15`	PASS `0.0458 / 0.15`	PASS `0.0247 / 0.15`	PASS `0.00324 / 0.15`	PASS `0.0124 / 0.15`
TruncatedNormal recovery	PASS `0.0318 / 0.05`	PASS `0.0309 / 0.05`	PASS `0.000249 / 0.05`	PASS `0.00095 / 0.05`	PASS `0.00257 / 0.05`

Throughput (iters/s for SVI, draws/s for MCMC):

Problem	AutoNormal	AutoMVN	AutoLaplace	HMC	NUTS
HalfNormal scale	1561.4	1045.9	2725.6	190.0	90.7
TruncatedNormal recovery	1355.7	1038.6	2152.0	145.2	105.1

Reproducing the grid

python -m tests.benchmarks.runner

The runner accepts --algorithms and --problems flags for partial runs and writes the regenerated table back to this file by default. See tests/benchmarks/runner.py for the cell definitions and tests/benchmarks/references.py for the reference posteriors.