MCMC
Markov-chain Monte Carlo: the HMC and NUTS kernels, the MCMC runner, and the MCMCResult summary.
The runner targets a MonadicProgram directly. For models that declare every latent as a sample site this is immediate; for models with nn.Parameters or intermediate latent sites, the bayesian_lift_parameters lift produces the matching MonadicProgram.
mcmc
MCMC kernels and driver.
Public surface (also re-exported from quivers.inference):
- MCMCKernel: ABC for Markov kernels on the flat unconstrained latent vector.
- HMCKernel: Hamiltonian Monte Carlo with leapfrog integration, dual-averaging step-size adaptation, and Welford mass-matrix adaptation.
- NUTSKernel: No-U-Turn Sampler with multinomial sampling.
- MCMC: Chain orchestrator with warmup, parallel chains, and posterior diagnostics (split-$\hat{R}$, effective sample size).
- MCMCResult: Posterior samples and per-chain diagnostics.
MCMC
MCMC(kernel: MCMCKernel, num_warmup: int, num_samples: int, num_chains: int = 4, init_strategy: InitStrategy = 'prior')
MCMC chain runner.
| PARAMETER | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|
| kernel | MCMCKernel | (required) | Markov kernel (e.g. HMCKernel or NUTSKernel). |
| num_warmup | int | (required) | Number of adaptation steps. The kernel's adaptation machinery (dual averaging, Welford covariance) runs over this prefix. |
| num_samples | int | (required) | Post-warmup samples per chain. |
| num_chains | int | 4 | Independent chains. |
| init_strategy | InitStrategy | 'prior' | How to pick each chain's initial position. |
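The num_warmup entry mentions Welford covariance accumulation during warmup. As a rough illustration of the idea, here is a minimal sketch of the diagonal (per-dimension variance) case. It does not reflect how quivers implements adaptation: WelfordVariance and its methods are hypothetical names, and the unit-variance fallback for fewer than two samples is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class WelfordVariance:
    """Streaming per-dimension mean and variance (hypothetical helper)."""

    dim: int
    count: int = 0
    mean: list[float] = field(default_factory=list)
    m2: list[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.mean = [0.0] * self.dim
        self.m2 = [0.0] * self.dim

    def update(self, sample: list[float]) -> None:
        """Fold one warmup draw into the running statistics."""
        self.count += 1
        for i, value in enumerate(sample):
            delta = value - self.mean[i]
            self.mean[i] += delta / self.count
            # Uses the updated mean, which keeps the recurrence numerically stable.
            self.m2[i] += delta * (value - self.mean[i])

    def variance(self) -> list[float]:
        """Unbiased variance; falls back to 1.0 (an assumption) with < 2 draws."""
        if self.count < 2:
            return [1.0] * self.dim
        return [m2 / (self.count - 1) for m2 in self.m2]
```

A diagonal mass matrix would typically be set from the inverse of these variances once warmup ends, so that each coordinate is rescaled to roughly unit scale.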
Source code in src/quivers/inference/mcmc/driver.py
run
run(model: MonadicProgram, x: Tensor, observations: dict[str, Tensor], guide: Guide | None = None) -> MCMCResult
Run the configured kernel for num_chains chains of
num_warmup + num_samples steps each.
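A usage sketch built only from the signatures documented on this page. How the MonadicProgram, the input x, and the observations dict are constructed is not covered here, so they are taken as arguments; run_nuts is a hypothetical helper name, and the warmup and sample counts are arbitrary illustrative values.

```python
def run_nuts(model, x, observations):
    """Run NUTS on an already-built MonadicProgram and return the MCMCResult."""
    # Imported lazily so this sketch can be loaded without quivers installed.
    from quivers.inference import MCMC, NUTSKernel

    # Keyword values mirror the documented NUTSKernel defaults.
    kernel = NUTSKernel(step_size=0.1, max_tree_depth=10, target_accept=0.8)

    # 500 warmup and 1000 sampling steps are illustrative, not recommendations.
    mcmc = MCMC(kernel, num_warmup=500, num_samples=1000, num_chains=4)
    return mcmc.run(model, x, observations)
```

The returned MCMCResult exposes samples, r_hat, ess, and divergence_counts, which are the usual first things to inspect after a run.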
Source code in src/quivers/inference/mcmc/driver.py
MCMCResult
dataclass
MCMCResult(samples: dict[str, Tensor], log_densities: Tensor, acceptance_rates: Tensor, divergence_counts: Tensor, r_hat: dict[str, Tensor], ess: dict[str, Tensor], num_warmup: int, num_samples: int)
Posterior samples and per-chain diagnostics.
| ATTRIBUTE | TYPE | DESCRIPTION |
|---|---|---|
| samples | dict[str, Tensor] | Per-site posterior draws on the constrained support. Shape … |
| log_densities | Tensor | Unconstrained-space log-density (Jacobian-corrected) at every posterior draw. Shape … |
| acceptance_rates | Tensor | Per-chain post-warmup acceptance rate. Shape … |
| divergence_counts | Tensor | Per-chain post-warmup divergence count. Shape … |
| r_hat | dict[str, Tensor] | Per-site split-$\hat{R}$. |
| ess | dict[str, Tensor] | Per-site effective sample size. Same shape convention as … |
| num_warmup | int | |
| num_samples | int | |
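For readers unfamiliar with the split-$\hat{R}$ diagnostic stored in r_hat, the following is a minimal sketch of the textbook computation for a single scalar quantity. It is not the library's implementation; split_r_hat is a hypothetical name, it works on plain Python lists rather than tensors, and it does not guard against zero within-chain variance.

```python
from statistics import mean, variance


def split_r_hat(chains: list[list[float]]) -> float:
    """Split-R-hat for one scalar; chains[c][t] is draw t of chain c."""
    half = len(chains[0]) // 2
    if half < 2:
        raise ValueError("need at least 4 draws per chain")

    # Split each chain into its first and last `half` draws, so that
    # within-chain drift shows up as between-half disagreement.
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[-half:])

    within = mean([variance(h) for h in halves])
    between = half * variance([mean(h) for h in halves])

    # Pooled variance estimate that mixes within- and between-half variance.
    pooled = (half - 1) / half * within + between / half
    return (pooled / within) ** 0.5
```

Values close to 1 indicate that the half-chains agree; a single chain that is still drifting produces a value well above 1 even though there is only one chain.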
HMCKernel
HMCKernel(step_size: float = 0.1, num_steps: int = 10, mass_matrix: MassMatrixKind = 'identity', target_accept: float = 0.65, divergence_threshold: float = 1000.0, adapt_step_size: bool = True, adapt_mass_matrix: bool = True)
Bases: MCMCKernel
Hamiltonian Monte Carlo kernel with fixed trajectory length.
| PARAMETER | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|
| step_size | float | 0.1 | Leapfrog step size. Adapted during warmup when adapt_step_size is True. |
| num_steps | int | 10 | Leapfrog steps per proposal. |
| mass_matrix | MassMatrixKind | 'identity' | Mass-matrix shape. Diagonal / dense are adapted during warmup from the empirical covariance of warmup samples. |
| target_accept | float | 0.65 | Target Metropolis acceptance for dual averaging. |
| divergence_threshold | float | 1000.0 | Energy-error threshold for marking a proposal as divergent. Divergent steps still respect Metropolis correctness but are reported separately so the user can spot pathological regions. |
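To make the roles of step_size, num_steps, and divergence_threshold concrete, here is a minimal sketch of one fixed-length HMC transition with an identity mass matrix. It is a pure-Python illustration under stated assumptions, not the HMCKernel code: hmc_step is a hypothetical function, positions are plain lists, value_and_grad is assumed to return the log-density and its gradient, and the initial gradient is recomputed rather than reused from cached state.

```python
import math
import random
from typing import Callable, Optional

Vector = list[float]
ValueAndGrad = Callable[[Vector], tuple[float, Vector]]


def hmc_step(
    position: Vector,
    value_and_grad: ValueAndGrad,
    step_size: float,
    num_steps: int,
    divergence_threshold: float = 1000.0,
    rng: Optional[random.Random] = None,
) -> tuple[Vector, bool, bool]:
    """One HMC transition; returns (new_position, accepted, diverged)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    rng = rng or random.Random()

    log_p, grad = value_and_grad(position)
    q = list(position)
    p = [rng.gauss(0.0, 1.0) for _ in q]  # momentum ~ N(0, I)
    initial_energy = -log_p + 0.5 * sum(pi * pi for pi in p)

    # Leapfrog: opening half momentum step, full steps, closing half step.
    p = [pi + 0.5 * step_size * gi for pi, gi in zip(p, grad)]
    for i in range(num_steps):
        q = [qi + step_size * pi for qi, pi in zip(q, p)]
        log_p, grad = value_and_grad(q)
        scale = 0.5 if i == num_steps - 1 else 1.0
        p = [pi + scale * step_size * gi for pi, gi in zip(p, grad)]

    energy_error = -log_p + 0.5 * sum(pi * pi for pi in p) - initial_energy
    finite = math.isfinite(energy_error)
    diverged = (not finite) or energy_error > divergence_threshold

    # The Metropolis test is applied either way; divergence is only reported.
    accept_prob = math.exp(min(0.0, -energy_error)) if finite else 0.0
    accepted = rng.random() < accept_prob
    return (q if accepted else list(position)), accepted, diverged
```

With an energy error above the default threshold of 1000 the acceptance probability underflows to zero, so in this sketch a divergent proposal is rejected by the ordinary Metropolis test rather than by a special case.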
Source code in src/quivers/inference/mcmc/hmc.py
NUTSKernel
NUTSKernel(step_size: float = 0.1, max_tree_depth: int = 10, mass_matrix: MassMatrixKind = 'diagonal', target_accept: float = 0.8, divergence_threshold: float = 1000.0, adapt_step_size: bool = True, adapt_mass_matrix: bool = True)
Bases: MCMCKernel
No-U-Turn Sampler with multinomial sampling and the standard U-turn termination (Hoffman-Gelman 2014 algorithms 3 + 6, Betancourt 2017's generalized slice variant for multinomial sampling).
| PARAMETER | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|
| step_size | float | 0.1 | Initial leapfrog step size; adapted via dual averaging. |
| max_tree_depth | int | 10 | Maximum tree doubling depth. |
| target_accept | float | 0.8 | Target tree-averaged Metropolis acceptance for dual averaging. |
| divergence_threshold | float | 1000.0 | Energy-error threshold above which a leapfrog substep is marked divergent and terminates the tree on its branch. |
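The step_size and target_accept entries both refer to dual averaging. The sketch below follows the dual-averaging recurrence from Hoffman and Gelman (2014) with their suggested constants (gamma = 0.05, t0 = 10, kappa = 0.75). Whether quivers uses these exact constants is an assumption, and DualAveraging is a hypothetical class name rather than part of the library.

```python
import math
from dataclasses import dataclass


@dataclass
class DualAveraging:
    """Dual-averaging step-size adaptation (hypothetical helper)."""

    initial_step_size: float
    target_accept: float = 0.8
    gamma: float = 0.05   # regularisation strength
    t0: float = 10.0      # damps the earliest iterations
    kappa: float = 0.75   # decay rate of the averaging weights

    def __post_init__(self) -> None:
        # Iterates are shrunk towards log(10 * initial step size).
        self.mu = math.log(10.0 * self.initial_step_size)
        self.h_bar = 0.0
        self.log_eps_bar = 0.0
        self.m = 0

    def update(self, accept_prob: float) -> float:
        """Consume one warmup acceptance statistic; return the next step size."""
        self.m += 1
        w = 1.0 / (self.m + self.t0)
        # Running average of (target - observed) acceptance.
        self.h_bar = (1.0 - w) * self.h_bar + w * (self.target_accept - accept_prob)
        log_eps = self.mu - math.sqrt(self.m) / self.gamma * self.h_bar
        eta = self.m ** -self.kappa
        self.log_eps_bar = eta * log_eps + (1.0 - eta) * self.log_eps_bar
        return math.exp(log_eps)

    def final_step_size(self) -> float:
        """Averaged step size to freeze once warmup ends."""
        return math.exp(self.log_eps_bar)
```

Acceptance above the target pushes the step size up and acceptance below it pushes the step size down; the averaged iterate from final_step_size is the value that would be held fixed during sampling.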
Source code in src/quivers/inference/mcmc/hmc.py
KernelState
dataclass
KernelState(position: Tensor, log_density: Tensor, grad_log_density: Tensor, step_count: int = 0, accept_count: int = 0, diverged: bool = False, extras: dict = dict())
Mutable container for one chain's MCMC state.
| ATTRIBUTE | TYPE | DESCRIPTION |
|---|---|---|
| position | Tensor | Current flat unconstrained latent vector. Shape … |
| log_density | Tensor | Unconstrained-space log-density (Jacobian-corrected log-joint) at position. |
| grad_log_density | Tensor | Gradient of log_density with respect to position. |
| step_count | int | Number of step calls taken so far. |
| accept_count | int | Number of proposals accepted across the chain. Useful for reporting acceptance rate. |
| diverged | bool | Whether the most recent step's energy error exceeded the kernel's divergence threshold. Reset by each kernel as appropriate. |
| extras | dict | Per-kernel additional state (e.g. NUTS tree depth, HMC step-size adaptation cumulants). |
MCMCKernel
Bases: ABC
Abstract Markov kernel on the flat unconstrained latent vector.
Concrete subclasses implement init and step.
Adaptation phases (warmup) typically mutate kernel-internal
state (step size, mass matrix) and freeze it for the sampling
phase; the kernel's is_adapting flag tracks that.
init
abstractmethod
init(registry: LatentRegistry, model: MonadicProgram, x: Tensor, observations: dict[str, Tensor], initial_position: Tensor) -> KernelState
Build the starting KernelState from the supplied
initial flat unconstrained vector. The initial gradient is
evaluated here so step can re-use it.
Source code in src/quivers/inference/mcmc/kernel.py
step
abstractmethod
step(state: KernelState, potential: PotentialFn) -> KernelState
Advance the chain one Metropolis step. The potential
function is constructed once per MCMC.run and
re-used across every step / chain.
Source code in src/quivers/inference/mcmc/kernel.py
start_adaptation
start_adaptation() -> None
Enter the adaptation (warmup) phase.
Source code in src/quivers/inference/mcmc/kernel.py
stop_adaptation
stop_adaptation() -> None
Freeze the kernel's adapted parameters.
Source code in src/quivers/inference/mcmc/kernel.py
PotentialFn
PotentialFn(model: MonadicProgram, registry: LatentRegistry, x: Tensor, observations: dict[str, Tensor])
Callable that maps a flat unconstrained position to the unconstrained-space negative log-density and its gradient.
HMC and NUTS need both the potential $U(z) = -\log \tilde{p}(z)$ (where $\tilde{p}(z) = p(T(z), y) \cdot |\det J_T(z)|$ is the Jacobian-corrected unconstrained-space joint) and its gradient $\nabla U(z)$. The two are computed in a single autograd pass and cached on the kernel state.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| model | MonadicProgram | Generative model. |
| registry | LatentRegistry | Latent-site registry for model. |
| x | Tensor | Model input. Shape … |
| observations | dict[str, Tensor] | Observed-site values and host data. |
Source code in src/quivers/inference/mcmc/kernel.py
log_density
log_density(z: Tensor) -> Tensor
Unconstrained-space log-density (Jacobian-corrected).
Trajectories that wander to the edge of a constrained
support can produce values that fall outside
torch.distributions' validation envelope (e.g. exact
zeros against a strictly-positive support after a long
leapfrog stride). Rather than letting the resulting
ValueError propagate and kill the chain, this method
returns -inf for those positions; the kernel reads
non-finite log-densities as divergent transitions and
rejects them in the Metropolis step.
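The behaviour described above can be pictured as a thin wrapper around the raw density evaluation. The sketch below is illustrative only: safe_log_density and rayleigh_log_density are hypothetical names, scalars stand in for tensors, and math.log raising ValueError at non-positive arguments stands in for torch.distributions' support validation.

```python
import math
from typing import Callable


def rayleigh_log_density(sigma: float) -> float:
    """Log-density of Rayleigh(1), supported on sigma > 0.

    math.log raises ValueError for sigma <= 0, mimicking a failed
    support check.
    """
    return math.log(sigma) - 0.5 * sigma * sigma


def safe_log_density(log_density: Callable[[float], float], value: float) -> float:
    """Evaluate log_density, mapping validation failures to -inf."""
    try:
        return log_density(value)
    except ValueError:
        # Outside the support: report zero probability instead of
        # letting the exception end the chain.
        return -math.inf
```

A proposal that lands on such a point then has acceptance probability zero, so the chain simply stays where it was.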
Source code in src/quivers/inference/mcmc/kernel.py
value_and_grad
value_and_grad(z: Tensor) -> tuple[Tensor, Tensor]
Return (log_density, grad_log_density) for z.
z is expected to be a detached tensor; we make a fresh
leaf with requires_grad=True so gradient propagation
doesn't leak into the kernel's accumulated state.
For divergent positions (where the log-density is
-inf), returns a zero gradient — the kernel rejects
the trajectory in the Metropolis step anyway, and a zero
gradient keeps the leapfrog integrator from producing NaN
downstream.
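The fresh-leaf and zero-gradient behaviour can be sketched with plain PyTorch as follows. This is a standalone function written for illustration, not the PotentialFn method itself: it takes the log-density as an explicit callable argument, which is an assumption of the sketch.

```python
from typing import Callable

import torch


def value_and_grad(
    log_density: Callable[[torch.Tensor], torch.Tensor],
    z: torch.Tensor,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Return (log_density, grad_log_density) at z without touching z."""
    # Fresh leaf: gradients attach to this copy, not to the caller's tensor.
    leaf = z.detach().clone().requires_grad_(True)
    value = log_density(leaf)

    if not torch.isfinite(value):
        # Non-finite log-density: hand back a zero gradient so the
        # leapfrog update stays free of NaNs.
        return value.detach(), torch.zeros_like(z)

    (grad,) = torch.autograd.grad(value, leaf)
    return value.detach(), grad
```

For example, with log_density = lambda t: -0.5 * (t ** 2).sum() the gradient at z is -z, and the caller's z keeps requires_grad=False afterwards.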
Source code in src/quivers/inference/mcmc/kernel.py