Point-Estimate Fitting

adam_fit_deduction maximises the corpus log-marginal \(\sum_n \log Z(s_n; \mathbf{w})\) under an optional isotropic Normal prior on the deduction's learnable log-weights. Each \(\log Z\) is computed exactly by the chart's LogProb-semiring fixed point; autograd through the agenda's semiring operations gives the exact gradient \(\nabla_\mathbf{w} \log Z(s; \mathbf{w}) = \mathbb{E}_{d \mid s}[\phi(d)]\) (the standard inside-outside identity).

fit

Point-estimate fitting of a weighted deduction system.

adam_fit_deduction runs Adam on the deduction's learnable log-weights to maximise the corpus log-marginal \(\sum_n \log Z(s_n; \mathbf{w})\), optionally under an isotropic Normal prior (MAP). Each \(\log Z(s; \mathbf{w})\) is computed exactly by the chart's LogProb-semiring fixed point; autograd through the agenda's semiring operations yields the exact gradient \(\nabla_{\mathbf{w}} \log Z(s; \mathbf{w}) = \mathbb{E}_{d \mid s}[\phi(d)]\) (the standard inside-outside identity).
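The gradient identity can be checked numerically on a toy model. The sketch below is not part of the library: it assumes a log-linear scoring \(\log Z(s; \mathbf{w}) = \log \sum_d \exp(\mathbf{w} \cdot \phi(d))\) over a hand-written, hypothetical set of three derivations, and compares a finite-difference gradient of \(\log Z\) with the posterior expectation of \(\phi\). It uses only the standard library, so finite differences stand in for autograd.

```python
import math

# Hypothetical toy: three derivations of one sentence, each described by a
# feature vector phi(d) over two learnable log-weights.
DERIVATIONS = [(2.0, 0.0), (1.0, 1.0), (0.0, 3.0)]


def score(w, phi):
    """Log-weight of one derivation: the dot product w . phi(d)."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def log_z(w):
    """Numerically stable log-sum-exp of the derivation scores."""
    scores = [score(w, phi) for phi in DERIVATIONS]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def expected_features(w):
    """E_{d|s}[phi(d)] under the posterior p(d|s) = exp(w . phi(d)) / Z."""
    lz = log_z(w)
    posterior = [math.exp(score(w, phi) - lz) for phi in DERIVATIONS]
    return [
        sum(p * phi[k] for p, phi in zip(posterior, DERIVATIONS))
        for k in range(len(w))
    ]


def finite_difference_grad(w, eps=1e-6):
    """Central-difference approximation of the gradient of log_z at w."""
    grad = []
    for k in range(len(w)):
        up = list(w)
        dn = list(w)
        up[k] += eps
        dn[k] -= eps
        grad.append((log_z(up) - log_z(dn)) / (2 * eps))
    return grad


w = [0.3, -0.2]
analytic = expected_features(w)
numeric = finite_difference_grad(w)
print(analytic, numeric)
```

The two vectors should agree up to finite-difference error, which is what makes the autograd gradient of an exactly computed \(\log Z\) equal to the expected feature vector.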

adam_fit_deduction

adam_fit_deduction(ded: DeductionSystem, corpus: Sequence[Sequence[str]], *, steps: int = 300, lr: float = 0.05, prior_scale: float | None = None) -> list[float]

Maximise the corpus log-marginal under an optional Normal prior on the parameters.

PARAMETER DESCRIPTION
ded

Deduction whose _axiom_module and _rule_module parameters are optimised.

TYPE: DeductionSystem

corpus

Each sentence is a sequence of token strings the deduction's axiom injector accepts.

TYPE: sequence of sentences

steps

Adam steps.

TYPE: int DEFAULT: 300

lr

Adam learning rate.

TYPE: float DEFAULT: 0.05

prior_scale

If supplied, adds a Gaussian regulariser \(\tfrac{1}{2\sigma^2}\lVert \mathbf{w} \rVert^2\), with \(\sigma\) equal to prior_scale, to the loss (MAP). Defaults to None (MLE).

TYPE: float | None DEFAULT: None
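To make the effect of prior_scale concrete, here is a minimal plain-Python sketch of the objective. The function `map_loss` and its inputs are hypothetical and not part of the library; it takes precomputed per-sentence \(\log Z\) values and a flat list of weights, and applies the penalty \(\tfrac{1}{2\sigma^2}\lVert \mathbf{w} \rVert^2\), which is the negative log-density of an isotropic Normal prior with standard deviation \(\sigma\), up to an additive constant.

```python
def map_loss(log_zs, weights, prior_scale=None):
    """Negative corpus log-marginal, plus a Gaussian penalty when prior_scale is set.

    Hypothetical helper for illustration only.
    """
    loss = -sum(log_zs)  # MLE part: minimise the negative sum of log Z
    if prior_scale is not None:
        inv_var = 1.0 / prior_scale**2
        # MAP part: ||w||^2 / (2 * sigma^2)
        loss += 0.5 * inv_var * sum(w * w for w in weights)
    return loss


log_zs = [-1.5, -2.0]   # made-up per-sentence log Z values
weights = [1.0, -2.0]   # made-up log-weights, squared norm 5.0

mle = map_loss(log_zs, weights)                        # 3.5
map_value = map_loss(log_zs, weights, prior_scale=2.0)  # 3.5 + 5.0 / 8.0
print(mle, map_value)
```

A smaller prior_scale means a tighter prior and therefore a stronger pull of the weights towards zero; passing None drops the penalty entirely.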

RETURNS DESCRIPTION
list[float]

The loss trajectory; length == steps, or an empty list if the deduction has no learnable parameters.
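Because the return value is a plain list of floats, it can be inspected without any library support. The helper below is a hypothetical sketch, not part of the package; it summarises a trajectory and handles the empty list that is returned when there is nothing to optimise.

```python
def summarise_history(history):
    """Hypothetical helper: first loss, last loss, and total decrease.

    Returns None for an empty trajectory.
    """
    if not history:
        return None
    return {
        "first": history[0],
        "last": history[-1],
        "decrease": history[0] - history[-1],
    }


# Made-up trajectory standing in for the return value of adam_fit_deduction.
summary = summarise_history([5.0, 4.0, 3.5])
print(summary)
```

In practice the argument would be the list returned by a call such as adam_fit_deduction(ded, corpus, prior_scale=1.0), given an already constructed DeductionSystem.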

Source code in src/quivers/stochastic/deduction/fit.py
def adam_fit_deduction(
    ded: DeductionSystem,
    corpus: Sequence[Sequence[str]],
    *,
    steps: int = 300,
    lr: float = 5e-2,
    prior_scale: float | None = None,
) -> list[float]:
    """Maximise the corpus log-marginal under an optional Normal
    prior on the parameters.

    Parameters
    ----------
    ded : DeductionSystem
        Deduction whose ``_axiom_module`` and ``_rule_module``
        parameters are optimised.
    corpus : sequence of sentences
        Each sentence is a sequence of token strings the
        deduction's axiom injector accepts.
    steps : int
        Adam steps.
    lr : float
        Adam learning rate.
    prior_scale : float, optional
        If supplied, adds a Gaussian regulariser
        :math:`\\tfrac{1}{2\\sigma^2}\\lVert \\mathbf{w} \\rVert^2`
        to the loss (MAP). Defaults to ``None`` (MLE).

    Returns
    -------
    list[float]
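        The loss trajectory; length == ``steps``, or an empty list if
        the deduction has no learnable parameters.
    """
    materialise_parameters(ded, corpus)
    params = list(ded.parameters())
    if not params:
        # Nothing to optimise: return an empty trajectory.
        return []
    optim = torch.optim.Adam(params, lr=lr)
    history: list[float] = []
    for _ in range(steps):
        optim.zero_grad()
        # Corpus log-marginal: sum of log Z(s) over all sentences.
        log_z = torch.zeros(())
        for sentence in corpus:
            log_z = log_z + ded(list(sentence)).goal_weight()
        # Minimise the negative log-marginal (MLE objective).
        loss = -log_z
        if prior_scale is not None:
            # MAP: add the Gaussian penalty ||w||^2 / (2 * prior_scale^2).
            inv_var = 1.0 / (prior_scale**2)
            for p in params:
                loss = loss + 0.5 * inv_var * (p**2).sum()
        loss.backward()
        optim.step()
        # Record the loss evaluated before this step's parameter update.
        history.append(float(loss.detach()))
    return history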