Point-Estimate Fitting

adam_fit_deduction maximises the corpus log-marginal \(\sum_n \log Z(s_n; \mathbf{w})\) under an optional isotropic Normal prior on the deduction's learnable log-weights. Each \(\log Z\) is computed exactly by the chart's LogProb-semiring fixed point; autograd through the agenda's semiring operations gives the exact gradient \(\nabla_\mathbf{w} \log Z(s; \mathbf{w}) = \mathbb{E}_{d \mid s}[\phi(d)]\) (the standard inside-outside identity).

fit

Point-estimate fitting of a weighted deduction system.

adam_fit_deduction runs Adam on the deduction's learnable log-weights to maximise the corpus log-marginal \(\sum_n \log Z(s_n; \mathbf{w})\), optionally under an isotropic Normal prior (MAP). Each \(\log Z(s; \mathbf{w})\) is computed exactly by the chart's LogProb-semiring fixed point; autograd through the agenda's semiring operations yields the exact gradient \(\nabla_{\mathbf{w}} \log Z(s; \mathbf{w}) = \mathbb{E}_{d \mid s}[\phi(d)]\) (the standard inside-outside identity).
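The gradient identity can be checked numerically on a toy model. The sketch below is not the library's chart or agenda: it assumes a hypothetical sentence with three derivations, each scored by \(\mathbf{w} \cdot \phi(d)\) with hand-picked feature counts `PHI`, and compares a central-difference gradient of \(\log Z\) with the posterior expectation of \(\phi\). It uses only the standard library, so finite differences stand in for autograd.

```python
import math

# Hypothetical toy model: three derivations of one sentence, each described
# by a feature-count vector phi(d) over two learnable log-weights.
PHI = [(2.0, 0.0), (1.0, 1.0), (0.0, 3.0)]


def score(w, phi):
    """Log-weight of one derivation: the dot product w . phi(d)."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def log_z(w):
    """log Z = logsumexp of the derivation scores (max-shifted for stability)."""
    scores = [score(w, phi) for phi in PHI]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def expected_features(w):
    """E_{d|s}[phi(d)] under the posterior p(d | s) = exp(w . phi(d)) / Z."""
    lz = log_z(w)
    posterior = [math.exp(score(w, phi) - lz) for phi in PHI]
    return [
        sum(p * phi[k] for p, phi in zip(posterior, PHI))
        for k in range(len(w))
    ]


def finite_difference_grad(w, eps=1e-6):
    """Central-difference approximation to the gradient of log Z."""
    grad = []
    for k in range(len(w)):
        hi = list(w)
        lo = list(w)
        hi[k] += eps
        lo[k] -= eps
        grad.append((log_z(hi) - log_z(lo)) / (2 * eps))
    return grad


w = [0.3, -0.2]
analytic = expected_features(w)
numeric = finite_difference_grad(w)
print(analytic, numeric)
```

The two vectors should agree to within finite-difference error. In the library itself the same quantity comes from autograd rather than from enumerating derivations.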

adam_fit_deduction

adam_fit_deduction(ded: DeductionSystem, corpus: Sequence[Sequence[str]], *, steps: int = 300, lr: float = 0.05, prior_scale: float | None = None) -> list[float]

Maximise the corpus log-marginal under an optional Normal prior on the parameters.
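Written out, the loss that the source listing further down minimises is the following, with \(\sigma\) standing for `prior_scale` and the second term dropped when `prior_scale` is `None`:

\[
\mathcal{L}(\mathbf{w}) = -\sum_n \log Z(s_n; \mathbf{w}) + \frac{1}{2\sigma^2}\lVert \mathbf{w} \rVert^2
\]

Up to an additive constant this is the negative log-posterior under \(\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I)\), so its minimiser is the MAP estimate. Combined with the inside-outside identity, the gradient is

\[
\nabla_{\mathbf{w}} \mathcal{L}(\mathbf{w}) = -\sum_n \mathbb{E}_{d \mid s_n}[\phi(d)] + \frac{\mathbf{w}}{\sigma^2}.
\]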

PARAMETER	DESCRIPTION
`ded`	Deduction whose `_axiom_module` and `_rule_module` parameters are optimised. TYPE: `DeductionSystem`
`corpus`	Each sentence is a sequence of token strings the deduction's axiom injector accepts. TYPE: `Sequence[Sequence[str]]`
`steps`	Number of Adam steps. TYPE: `int` DEFAULT: `300`
`lr`	Adam learning rate. TYPE: `float` DEFAULT: `0.05`
`prior_scale`	Standard deviation \(\sigma\) of the Normal prior. If supplied, adds a Gaussian regulariser \(\tfrac{1}{2\sigma^2}\lVert \mathbf{w} \rVert^2\) to the loss (MAP). Defaults to `None` (MLE). TYPE: `float | None` DEFAULT: `None`

RETURNS	DESCRIPTION
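`list[float]`	The loss trajectory (negative log-marginal, plus the prior penalty when `prior_scale` is set), one entry per step, so length == `steps`. Empty if the deduction has no learnable parameters.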
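To illustrate how `prior_scale` and the returned trajectory behave, here is a minimal stand-alone sketch of the same loop shape. It does not call the library: `toy_adam_fit` and `neg_log_marginal_and_grad` are hypothetical names, the model is a single log-weight with \(\log Z = \log(e^w + 1)\), and Adam is written out by hand with its usual default constants so that only the standard library is needed.

```python
import math


def neg_log_marginal_and_grad(w):
    """Toy stand-in for -log Z and its gradient (hypothetical model).

    Two derivations with feature counts 1 and 0, so log Z = log(e^w + 1)
    and d(log Z)/dw = sigmoid(w), the expected feature count.
    """
    log_z = math.log1p(math.exp(w))
    grad_log_z = 1.0 / (1.0 + math.exp(-w))
    return -log_z, -grad_log_z


def toy_adam_fit(w=0.0, *, steps=300, lr=0.05, prior_scale=None):
    """Adam on -log Z plus an optional Gaussian penalty, for one scalar."""
    beta1, beta2, eps = 0.9, 0.999, 1e-8  # Adam's usual default constants
    m = 0.0
    v = 0.0
    history = []
    for t in range(1, steps + 1):
        loss, grad = neg_log_marginal_and_grad(w)
        if prior_scale is not None:
            inv_var = 1.0 / prior_scale**2
            loss += 0.5 * inv_var * w * w  # w^2 / (2 * sigma^2)
            grad += inv_var * w
        # Record the loss evaluated before this step's update.
        history.append(loss)
        # Bias-corrected Adam update.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, history


w_mle, hist_mle = toy_adam_fit(steps=50)                   # no prior: MLE
w_map, hist_map = toy_adam_fit(steps=50, prior_scale=1.0)  # Normal prior: MAP
print(len(hist_mle), w_mle, w_map)
```

In this toy model the unregularised objective has no finite maximiser, so `w_mle` keeps growing with the number of steps, while the prior pulls `w_map` towards zero. Both trajectories have one entry per step.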

Source code in src/quivers/stochastic/deduction/fit.py

def adam_fit_deduction(
    ded: DeductionSystem,
    corpus: Sequence[Sequence[str]],
    *,
    steps: int = 300,
    lr: float = 5e-2,
    prior_scale: float | None = None,
) -> list[float]:
    """Maximise the corpus log-marginal under an optional Normal
    prior on the parameters.

    Parameters
    ----------
    ded : DeductionSystem
        Deduction whose ``_axiom_module`` and ``_rule_module``
        parameters are optimised.
    corpus : sequence of sentences
        Each sentence is a sequence of token strings the
        deduction's axiom injector accepts.
    steps : int
        Adam steps.
    lr : float
        Adam learning rate.
    prior_scale : float, optional
        If supplied, adds a Gaussian regulariser
        :math:`\\tfrac{1}{2\\sigma^2}\\lVert \\mathbf{w} \\rVert^2`
        to the loss (MAP). Defaults to ``None`` (MLE).

    Returns
    -------
    list[float]
        The loss trajectory; length == ``steps``, or empty if the
        deduction has no learnable parameters.
    """
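    # Make sure the deduction's parameters exist before collecting them.
    materialise_parameters(ded, corpus)
    params = list(ded.parameters())
    if not params:
        # Nothing to optimise: return an empty trajectory.
        return []
    optim = torch.optim.Adam(params, lr=lr)
    history: list[float] = []
    for _ in range(steps):
        optim.zero_grad()
        # Corpus log-marginal: sum of log Z over all sentences.
        log_z = torch.zeros(())
        for sentence in corpus:
            log_z = log_z + ded(list(sentence)).goal_weight()
        # Adam minimises, so the loss is the negative log-marginal.
        loss = -log_z
        if prior_scale is not None:
            # Gaussian prior: add ||w||^2 / (2 * sigma^2) over all parameters.
            inv_var = 1.0 / (prior_scale**2)
            for p in params:
                loss = loss + 0.5 * inv_var * (p**2).sum()
        loss.backward()
        optim.step()
        # Record the loss evaluated before this step's update.
        history.append(float(loss.detach()))
    return history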