Forward Sampling

sample_corpus draws fixed-length yields from the chart's length-conditional distribution \(p(s \mid \text{length} = L, \mathbf{w}) = Z(s; \mathbf{w}) / \sum_{s' \text{ of length } L} Z(s'; \mathbf{w})\). The implementation enumerates every length-\(L\) sequence over the deduction's surface vocabulary \(V\), evaluates \(\log Z\) exactly via the chart, normalises the log-weights with a softmax, and draws from the resulting categorical distribution. The procedure is exact (there is no MCMC over derivations), at the price of \(|V|^L\) chart evaluations; this enumeration is the fundamental cost of forward sampling from a globally-normalised, chart-defined distribution.
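As a worked instance with made-up numbers, take \(V = \{a, b\}\), \(L = 2\), and hypothetical chart weights (writing \(Z(s)\) for \(Z(s; \mathbf{w})\)) \(Z(aa) = 1\), \(Z(ab) = 2\), \(Z(ba) = 0\) (no derivation) and \(Z(bb) = 1\). The normaliser sums over all \(|V|^L = 4\) sequences:

.. math::

   \sum_{s' \text{ of length } 2} Z(s'; \mathbf{w}) = 1 + 2 + 0 + 1 = 4,
   \qquad
   p(ab \mid \text{length} = 2,\, \mathbf{w}) = \tfrac{2}{4} = \tfrac{1}{2},
   \qquad
   p(aa \mid \cdot) = p(bb \mid \cdot) = \tfrac{1}{4},
   \qquad
   p(ba \mid \cdot) = 0.

In log space, \(Z(ba) = 0\) corresponds to \(\log Z = -\infty\), which is the non-finite case the source drops before the softmax.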

sample

Forward sampling of yields from a weighted deduction system.

sample_corpus draws length-fixed token sequences from the length-conditional distribution induced by the chart's weights:

.. math::

   p(s \mid \text{length} = L,\, \mathbf{w})
   \;=\; \frac{Z(s; \mathbf{w})}
              {\sum_{s' \text{ of length } L} Z(s'; \mathbf{w})}.

It enumerates every length-:math:`L` sequence over the deduction's surface vocabulary :math:`V`, evaluates :math:`\log Z(s; \mathbf{w})` for each via the chart, normalises the log-weights with a softmax, and draws from the resulting categorical distribution. Sequences whose log-weight is non-finite (no derivation) are discarded before normalisation. The procedure is exact (the chart already marginalises over the derivation forest); the :math:`|V|^L` enumeration cost is the fundamental cost of forward sampling from a globally-normalised chart-defined distribution.
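The enumerate, score, normalise, draw loop can be sketched in plain Python without quivers. This is an illustration under stated assumptions, not the library's code: the scorer `toy_log_z` is a hypothetical stand-in for `ded(tokens).goal_weight()` (a log-weight that is -inf when nothing parses), and `random.Random.choices` stands in for `torch.multinomial(..., replacement=True)`.

```python
import itertools
import math
import random
from typing import Callable, Optional, Sequence


def toy_log_z(seq: Sequence[str]) -> float:
    """Hypothetical stand-in for log Z(s; w): forbids 'b' directly followed by 'a'."""
    if any(x == "b" and y == "a" for x, y in zip(seq, seq[1:])):
        return -math.inf  # no derivation, so the sequence never parses
    return float(seq.count("a"))  # arbitrary log-weight for illustration


def length_conditional(
    log_z: Callable[[Sequence[str]], float], vocab: Sequence[str], length: int
) -> dict:
    """Exact p(s | length) by enumerating all |V|**length sequences."""
    scored = {}
    for combo in itertools.product(vocab, repeat=length):
        w = log_z(combo)
        if math.isfinite(w):  # drop unparseable sequences, mirroring sample_corpus
            scored[combo] = w
    if not scored:
        raise ValueError(f"no yield of length {length} parses")
    m = max(scored.values())  # subtract the max so exp() cannot overflow
    unnorm = {s: math.exp(w - m) for s, w in scored.items()}
    total = sum(unnorm.values())
    return {s: u / total for s, u in unnorm.items()}


def sample(log_z, vocab, length: int, n_samples: int, seed: Optional[int] = None):
    """Draw n_samples yields, with replacement, from the exact distribution."""
    dist = length_conditional(log_z, vocab, length)
    rng = random.Random(seed)
    picks = rng.choices(list(dist), weights=list(dist.values()), k=n_samples)
    return [list(s) for s in picks]


print(length_conditional(toy_log_z, ["a", "b"], 2))
print(sample(toy_log_z, ["a", "b"], 3, n_samples=5, seed=0))
```

Working in log space and shifting by the maximum keeps the normalisation stable even when individual weights would overflow a float.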

sample_corpus

sample_corpus(ded: DeductionSystem, *, length: int, n_samples: int, seed: int | None = None) -> list[list[str]]

Sample ``n_samples`` yields of length ``length`` from the chart's length-conditional distribution under the deduction's current parameters.

PARAMETER DESCRIPTION
ded

The deduction with materialised parameters.

TYPE: DeductionSystem

length

Length of yields to enumerate.

TYPE: int

n_samples

Number of yields to draw (with replacement).

TYPE: int

seed

Seed for the multinomial draws.

TYPE: int | None DEFAULT: None
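Because the source draws with ``replacement=True``, ``n_samples`` may exceed the number of parseable yields, repeated yields are expected, and empirical frequencies approach the exact probabilities as ``n_samples`` grows. A plain-Python illustration, assuming a hypothetical three-yield distribution and using ``random.Random`` in place of the torch generator:

```python
import random
from collections import Counter
from typing import Optional

# Hypothetical length-2 distribution, e.g. the output of the enumeration step.
dist = {("a", "a"): 0.25, ("a", "b"): 0.5, ("b", "b"): 0.25}


def draw(dist: dict, n_samples: int, seed: Optional[int] = None) -> list:
    """Draw n_samples yields with replacement (cf. replacement=True)."""
    rng = random.Random(seed)
    return rng.choices(list(dist), weights=list(dist.values()), k=n_samples)


samples = draw(dist, n_samples=10_000, seed=0)
freq = Counter(samples)
# Far more draws than distinct yields, so duplicates are unavoidable.
print(len(samples), len(freq))
print({s: c / len(samples) for s, c in freq.items()})
```

Passing the same integer seed reproduces the same draws; this mirrors, but is not, the torch-based path.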

Source code in src/quivers/stochastic/deduction/sample.py
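import itertools

import torch

# ``DeductionSystem`` and ``_vocabulary`` are provided by the surrounding
# quivers module (not shown in this excerpt).


def sample_corpus(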
    ded: DeductionSystem,
    *,
    length: int,
    n_samples: int,
    seed: int | None = None,
) -> list[list[str]]:
    """Sample ``n_samples`` yields of length ``length`` from the
    chart's length-conditional distribution under the deduction's
    current parameters.

    Parameters
    ----------
    ded : DeductionSystem
        The deduction with materialised parameters.
    length : int
        Length of yields to enumerate.
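    n_samples : int
        Number of yields to draw (with replacement).
    seed : int, optional
        Seed for the multinomial draws.

    Returns
    -------
    list of list of str
        The sampled yields, one token list per draw.

    Raises
    ------
    ValueError
        If the vocabulary cannot be determined, or if no yield of
        length ``length`` parses under the current parameters.
    """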
    vocab = _vocabulary(ded)
    if not vocab:
        raise ValueError(
            "sample_corpus: cannot determine the deduction's "
            "vocabulary; set ``ded._vocabulary`` explicitly or "
            "call ``materialise_parameters`` first"
        )

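    yields: list[list[str]] = []
    log_weights: list[torch.Tensor] = []
    # Only detached weights are used below, so avoid building |V|^L
    # autograd graphs during the enumeration.
    with torch.no_grad():
        for combo in itertools.product(vocab, repeat=length):
            chart = ded(list(combo))
            w = chart.goal_weight()
            # A non-finite log goal weight means the sequence has no derivation.
            if torch.isfinite(w):
                yields.append(list(combo))
                log_weights.append(w)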
    if not yields:
        raise ValueError(
            f"sample_corpus: no yield of length {length} parses under "
            f"the deduction's current parameters"
        )
    logw = torch.stack([w.detach() for w in log_weights])
    probs = torch.softmax(logw, dim=0)
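    gen = torch.Generator()
    if seed is not None:
        gen.manual_seed(seed)
    else:
        # A freshly constructed Generator starts from a fixed default seed;
        # reseed it non-deterministically so unseeded calls actually differ.
        gen.seed()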
    idxs = torch.multinomial(probs, n_samples, replacement=True, generator=gen)
    return [yields[int(i.item())] for i in idxs]
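One subtlety in the seeding lines, worth re-checking against the installed PyTorch version: a freshly constructed ``torch.Generator()`` starts from a fixed default seed rather than a random one, so skipping ``manual_seed`` does not by itself produce different samples on each call. The sketch below uses made-up probabilities and a hypothetical helper ``draw`` that mirrors the original seeding logic; ``Generator.seed()`` is the documented way to reseed non-deterministically.

```python
from typing import Optional

import torch


def draw(probs: torch.Tensor, n: int, seed: Optional[int]) -> torch.Tensor:
    """Mirror of the original seeding logic (no reseed when seed is None)."""
    gen = torch.Generator()
    if seed is not None:
        gen.manual_seed(seed)
    return torch.multinomial(probs, n, replacement=True, generator=gen)


probs = torch.tensor([0.25, 0.5, 0.25])  # hypothetical yield probabilities

# Two fresh, unseeded generators share the same default seed...
print(torch.Generator().initial_seed() == torch.Generator().initial_seed())
# ...so "unseeded" draws repeat exactly across calls.
print(torch.equal(draw(probs, 20, None), draw(probs, 20, None)))

# Generator.seed() picks a non-deterministic seed instead.
g = torch.Generator()
g.seed()
print(torch.multinomial(probs, 20, replacement=True, generator=g))
```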