Custom Sequent Rules¶
QVR Source¶
# Custom Rules
#
# An AB grammar (Ajdukiewicz-Bar-Hillel) plus harmonic
# composition, illustrating that every rule in a deduction block
# is a sequent declared in the DSL itself; there are no built-in
# named rule schemas.
#
# Deduction:
#
# fwd_app : X/Y, Y |- X forward application
# bwd_app : Y, X\Y |- X backward application
# fwd_comp : X/Y, Y/Z |- X/Z forward composition
# bwd_comp : Y\Z, X\Y |- X\Z backward composition
#
# Pattern variables are single-uppercase identifiers (X, Y, Z,
# I, J, K); every other identifier appearing in a rule pattern
# must be listed in the atoms block.
object Term : FinSet 16
deduction AB : Term -> Term [semiring=LogProb, start=S, depth=6]
atoms S, NP, N, VP, PP, Fwd, Bwd, span, the, dog, runs
rule fwd_app : span(I, K, Fwd(X, Y)), span(K, J, Y) |- span(I, J, X) #[learnable]
rule bwd_app : span(I, K, Y), span(K, J, Bwd(X, Y)) |- span(I, J, X) #[learnable]
rule fwd_comp : span(I, K, Fwd(X, Y)), span(K, J, Fwd(Y, Z)) |- span(I, J, Fwd(X, Z)) #[learnable]
rule bwd_comp : span(I, K, Bwd(Y, Z)), span(K, J, Bwd(X, Y)) |- span(I, J, Bwd(X, Z)) #[learnable]
lexicon
"the" : Fwd(NP, N) = the #[learnable]
"dog" : N = dog #[learnable]
"runs" : Bwd(S, NP) = runs #[learnable]
Overview¶
Every rule in a deduction { … } block is a sequent declared in the DSL itself; there are no built-in named rule schemas to import. This example defines an AB grammar (Ajdukiewicz-Bar-Hillel) plus harmonic composition over chart-spans span(I, J, X). Fwd(X, Y) is the forward-slash constructor X/Y; Bwd(X, Y) is the backward-slash constructor X\Y.
Walkthrough¶
atoms NAME, NAME, … lists every identifier the rules may match literally. Category atoms (S, NP, N, VP, PP), slash constructors (Fwd, Bwd), and the chart-item constructor (span) are atoms. Identifiers that appear in a rule pattern but are not listed in atoms are pattern variables; the convention is single uppercase letters (X, Y, Z, I, J, K).
Each rule's body is a sequent: comma-separated premises on the left of |-, a single conclusion on the right. The premise multiplicity determines whether the rule fires on a single chart cell (unary) or on a pair of adjacent cells (binary).
A variable appearing multiple times in the same rule unifies across occurrences: in fwd_app, the Y in the first premise must match the Y in the second premise. Different rules instantiate independently.
DSL Features¶
rule NAME : premises |- conclusion: a sequent rule. Arbitrary-arity premise lists are supported; the compiler dispatches to the appropriate chart-cell shape.- Pattern variables vs atoms: single-uppercase identifiers bind as wildcards; every other identifier in a rule pattern must appear in the
atomslist. - Constructor applications in patterns:
Fwd(X, Y),Bwd(X, Y),span(I, J, X)are patterns whose head is an atom and whose arguments are nested patterns. Matching is structural and unification-based.
Pattern Matching and Unification¶
The compiler uses first-order pattern matching with occurs-checking-free unification to fire rules:
- Variables (
X,Y,Z): metavariables that match any concrete subterm. Repeated occurrences within a rule must unify. - Constructor patterns (
Fwd(X, Y)): match constructor-application items whose head is the same atom and whose children unify.
The agenda fires each rule at every chart cell (unary) or every pair of adjacent cells (binary) whose categories unify with the rule's premises; the conclusion item is then inserted into the chart under the rule's weight.
Extending the Rule Set¶
Additional combinators are spelled out as more sequent rules in the same block:
- Type-raising (unary):
rule type_raise : span(I, J, X) |- span(I, J, Fwd(Y, Bwd(Y, X))), note that introducing a fresh wildcard likeYin the conclusion requires either anaxioms = …source or a downstream rule that pins it down. - Restricted composition:
rule restricted_comp : span(I, K, Fwd(X, Y)), span(K, J, Fwd(Y, NP)) |- span(I, J, Fwd(X, NP)), by replacing the third wildcard with the literalNPatom we constrain when the rule fires.
Rule premise multiplicity is unary or binary; combinators that need three or more premises are expressed as a chain of binary rules sharing intermediate categories.
Try it¶
Every #[learnable] lexicon entry and every #[learnable] rule
exposes a real nn.Parameter on the compiled
DeductionSystem. The
system is callable: ded(sentence) returns a
ChartView whose
goal_weight() is the differentiable log-marginal
\(\log Z(s; \mathbf{w}) = \log \sum_d \exp \langle \mathbf{w}, \phi(d) \rangle\)
summed over every derivation \(d\) that the start symbol licenses for
the input. Fitting the lexicon and rule weights together is then a
regression-style problem: minimise \(-\sum_n \log Z(s_n)\) over a
corpus of sentences. The
quivers.stochastic.deduction module ships the
two standard surfaces.
MAP fit (Adam on rule & lexicon weights)¶
import torch
from quivers.dsl import load
from quivers.stochastic.deduction import adam_fit_deduction, sample_corpus
torch.manual_seed(0)
prog = load("docs/examples/source/custom_rules.qvr")
ded = prog.deductions["AB"]
corpus = [["the", "dog", "runs"]] * 4
history = adam_fit_deduction(
ded, corpus, steps=300, lr=5e-2, prior_scale=1.0,
)
print(f"loss: {history[0]:.2f} → {history[-1]:.2f}") # strictly decreasing
# Forward-sample under the fitted parameters and check the
# dominant length-3 yield recovers the training corpus.
draws = sample_corpus(ded, length=3, n_samples=32, seed=0)
print("dominant yield:", max(set(map(tuple, draws)), key=draws.count))
# → ("the dog runs",)
adam_fit_deduction maximises the corpus log-marginal under an
optional Normal prior on the parameters; prior_scale=1.0 gives MAP
under a unit Normal. sample_corpus enumerates yields of the chosen
length and draws from the categorical defined by their chart weights; exact forward sampling because the chart marginalises the
derivation forest.
NUTS (full Bayesian posterior)¶
import torch
from quivers.dsl import load
from quivers.inference import MCMC, NUTSKernel
from quivers.stochastic.deduction import nuts_program_from_deduction
torch.manual_seed(0)
prog = load("docs/examples/source/custom_rules.qvr")
ded = prog.deductions["AB"]
corpus = [["the", "dog", "runs"]] * 4
model, x, observations = nuts_program_from_deduction(
ded, corpus, prior_scale=1.0,
)
kernel = NUTSKernel(step_size=0.1, max_tree_depth=4, target_accept=0.8)
mc = MCMC(kernel, num_warmup=50, num_samples=50, num_chains=2)
result = mc.run(model, x, observations)
print("acceptance:", float(result.acceptance_rates.mean()))
print("divergences:", int(result.divergence_counts.sum()))
posterior_means = {
name: float(samples.mean()) for name, samples in result.samples.items()
}
print("posterior mean log-weights:", posterior_means)
nuts_program_from_deduction lifts every learnable parameter of the
deduction into a Normal(0, σ) sample
site and adds the corpus log-marginal \(\log Z\) to the joint via a
score step. The standard
NUTSKernel drives the
posterior \(p(\mathbf{w} \mid s_1, \ldots, s_N) \propto p(\mathbf{w})
\cdot \prod_n Z(s_n; \mathbf{w})\). The same Bayesian object
bayesian_regression fits, with the chart
total in place of the Gaussian likelihood.
Categorical Perspective¶
Sequent rules are hyperedges in the rule-system multicategory. A binary rule is a 2-input / 1-output hyperedge whose endpoints are pattern templates; firing the rule against the chart is the substitution along a pattern morphism into the category of concrete chart items. Variable unification across premises is exactly the pullback in the category of variable assignments: two premise patterns sharing a variable Y constrain the two assignments to agree on Y's value. The agenda's least-pre-fixed-point computation in the LogProb-enriched lattice of charts is independent of firing order (Goodman 1999 §3); the runtime picks a default strategy from rule arities and semiring properties.