Bayesian Wrap
nuts_program_from_deduction lifts the deduction's learnable
log-weights into a MonadicProgram whose joint log-density is
\(-\tfrac{1}{2\sigma^2}\lVert \mathbf{w} \rVert^2
+ \sum_n \log Z(s_n; \mathbf{w})\), ready for
MCMC.
The sampler targets exactly that joint with a deterministic log-density and exact gradients. Whether the joint is the Bayesian posterior \(p(\mathbf{w} \mid S)\) depends on the modelling reading (CRF / globally normalised vs. PCFG / locally normalised); see the module docstring for the precise statement and the cancellation condition.
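As a minimal sketch of the density described above, the joint can be written in plain Python. The names `log_joint` and `toy_log_z` are hypothetical and are not part of quivers. `toy_log_z` is only a stand-in for the deduction's \(\log Z(s; \mathbf{w})\): it assumes every token can be analysed by either of two weighted entries, so that \(Z = (e^{w_0} + e^{w_1})^{|s|}\).

```python
import math
from typing import Callable, Sequence

Sentence = Sequence[str]


def log_joint(
    w: Sequence[float],
    corpus: Sequence[Sentence],
    log_z: Callable[[Sentence, Sequence[float]], float],
    prior_scale: float = 1.0,
) -> float:
    """Unnormalised log-density: -||w||^2 / (2 sigma^2) + sum_n log Z(s_n; w)."""
    log_prior = -sum(x * x for x in w) / (2.0 * prior_scale**2)
    log_score = sum(log_z(s, w) for s in corpus)
    return log_prior + log_score


def toy_log_z(sentence: Sentence, w: Sequence[float]) -> float:
    """Hypothetical partition function: each token picks one of two entries."""
    per_token = math.log(math.exp(w[0]) + math.exp(w[1]))
    return len(sentence) * per_token


# Two sentences, three tokens in total.
corpus = [["the", "dog"], ["barks"]]
value = log_joint([0.0, 0.0], corpus, toy_log_z)
```

With both weights at zero the prior term vanishes and each of the three tokens contributes \(\log 2\), so `value` equals \(3 \log 2\).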
bayes
Bayesian posterior wrapping for weighted deduction systems.
nuts_program_from_deduction lifts the deduction's
learnable log-weights into a
quivers.continuous.programs.MonadicProgram whose
log_joint is
\(-\tfrac{1}{2\sigma^2}\lVert \mathbf{w} \rVert^2
+ \sum_n \log Z(s_n; \mathbf{w})\). The resulting program is
ready for quivers.inference.MCMC with
quivers.inference.NUTSKernel.
Modelling note
The sampler targets exactly
\(\pi(\mathbf{w}) \propto \exp\bigl(-\lVert \mathbf{w}
\rVert^2/(2\sigma^2) + \sum_n \log Z(s_n; \mathbf{w})\bigr)\) with a
deterministic log-density and exact gradients. Whether that joint
is the Bayesian posterior \(p(\mathbf{w} \mid S)\) depends
on the modelling reading:
- Undirected / globally-normalised (CRF / log-linear / energy-based): \(\pi(\mathbf{w})\) is the posterior; the implementation is exact.
- Directed / locally-normalised PCFG: the true sentence likelihood is \(Z(s; \mathbf{w}) / \sum_{s'} Z(s'; \mathbf{w})\); the global normaliser depends on \(\mathbf{w}\) and is intractable. The sampler then targets a pseudo-posterior differing from the true posterior by a factor of \(\bigl(\sum_{s'} Z(s'; \mathbf{w})\bigr)^{-N}\). Users committed to this reading should constrain rule weights to local simplices via a Dirichlet + softmax surface rather than the free-parameter Normal lift this function provides.
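The relationship between the two readings can be written out explicitly. Assuming \(N\) is the number of sentences in the corpus \(S = (s_1, \dots, s_N)\) and \(p(\mathbf{w})\) is the Normal prior, the locally normalised likelihood gives:

\[
p(\mathbf{w} \mid S)
\;\propto\; p(\mathbf{w}) \prod_{n=1}^{N} \frac{Z(s_n; \mathbf{w})}{\sum_{s'} Z(s'; \mathbf{w})}
\;\propto\; \pi(\mathbf{w}) \Bigl(\sum_{s'} Z(s'; \mathbf{w})\Bigr)^{-N}
\]

so that \(\log p(\mathbf{w} \mid S) = \log \pi(\mathbf{w}) - N \log \sum_{s'} Z(s'; \mathbf{w}) + \text{const}\). The correction term drops out whenever \(\sum_{s'} Z(s'; \mathbf{w})\) does not depend on \(\mathbf{w}\), in which case the pseudo-posterior and the true posterior coincide.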
nuts_program_from_deduction
nuts_program_from_deduction(ded: DeductionSystem, corpus: Sequence[Sequence[str]], *, prior_scale: float = 1.0, site_prefix: str = 'log_w') -> tuple[MonadicProgram, Tensor, dict[str, Tensor]]
Lift a deduction system's learnable parameters to a
MonadicProgram suitable for NUTS / SVI.
The returned program has one
torch.distributions.Normal sample site per learnable
parameter (lexicon entries and rule bindings alike) plus one
score step that substitutes the sampled values into the
deduction's parameter slots and adds
\(\sum_n \log Z(s_n; \mathbf{w})\) to the joint.
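The decomposition into Normal sample sites plus one score step can be sketched without the library. This is an illustrative stand-in, not the quivers implementation: `normal_log_prob`, `lifted_log_joint`, `quadratic_log_joint`, `toy_score`, and the site names are all hypothetical. The sketch also checks that summing full Normal log-densities matches the quadratic prior term up to an additive constant that does not depend on the weights, so either form defines the same sampling target.

```python
import math
from typing import Callable, Dict

Sites = Dict[str, float]


def normal_log_prob(x: float, scale: float) -> float:
    """Log-density of Normal(0, scale) at x, including the normalising constant."""
    return -0.5 * (x / scale) ** 2 - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def lifted_log_joint(sites: Sites, score: Callable[[Sites], float], prior_scale: float = 1.0) -> float:
    """One Normal term per sample site plus a single score term."""
    log_prior = sum(normal_log_prob(v, prior_scale) for v in sites.values())
    return log_prior + score(sites)


def quadratic_log_joint(sites: Sites, score: Callable[[Sites], float], prior_scale: float = 1.0) -> float:
    """The same joint written as -||w||^2 / (2 sigma^2) + score, without constants."""
    return -sum(v * v for v in sites.values()) / (2.0 * prior_scale**2) + score(sites)


def toy_score(sites: Sites) -> float:
    """Hypothetical stand-in for sum_n log Z(s_n; w): three tokens, any entry may apply."""
    return 3.0 * math.log(sum(math.exp(v) for v in sites.values()))


# Hypothetical site names; the real naming scheme is set by site_prefix and the parameter path.
point_a = {"log_w.x": 0.3, "log_w.y": -1.2}
point_b = {"log_w.x": -0.7, "log_w.y": 2.0}

offset_a = lifted_log_joint(point_a, toy_score, 1.5) - quadratic_log_joint(point_a, toy_score, 1.5)
offset_b = lifted_log_joint(point_b, toy_score, 1.5) - quadratic_log_joint(point_b, toy_score, 1.5)
```

The two offsets agree because each site contributes the same constant \(-\log \sigma - \tfrac{1}{2}\log 2\pi\) regardless of its value.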
| PARAMETER | DESCRIPTION |
|---|---|
| ded | Deduction whose parameters are lifted. TYPE: DeductionSystem |
| corpus | Corpus the score step closes over. TYPE: Sequence[Sequence[str]] |
| prior_scale | Standard deviation of the Normal prior on each parameter. TYPE: float |
| site_prefix | Stem of each sample-site's name (the parameter's path is appended for round-trip mapping). TYPE: str |
| RETURNS | DESCRIPTION |
|---|---|
| (model, x, observations) | The lifted program plus a |
Source code in src/quivers/stochastic/deduction/bayes.py