Type-Logical Grammar (Lambek Calculus)¶
QVR Source¶
# Weighted Type-Logical Grammar
#
# The non-commutative Lambek calculus L expressed as an
# agenda-based weighted deduction. Categories carry slash (Fwd,
# Bwd) and tensor (Tns) constructors; chart items are span(I,
# J, X) triples.
#
# Deduction:
#
# right_app : X/Y, Y |- X forward application
# left_app : Y, X\Y |- X backward application
# tensor_intro : A, B |- Tns(A, B) product introduction
# tensor_left : Tns(A, B) |- A left projection
# tensor_right : Tns(A, B) |- B right projection
#
# The Lambek calculus is resource-sensitive (every hypothesis is
# used exactly once) and order-preserving (no permutation). It
# is the internal logic of a biclosed monoidal category; the
# residuation law A (x) B |- C iff A |- C/B iff B |- A\C makes
# L the free residuated monoid over its atoms.
object Term : FinSet 16
deduction Lambek : Term -> Term [semiring=LogProb, start=S, depth=6]
atoms S, NP, N, VP, PP, Fwd, Bwd, Tns, span, every, dog, barks
rule right_app : span(I, K, Fwd(A, B)), span(K, J, B) |- span(I, J, A) #[learnable]
rule left_app : span(I, K, B), span(K, J, Bwd(A, B)) |- span(I, J, A) #[learnable]
rule tensor_intro : span(I, K, A), span(K, J, B) |- span(I, J, Tns(A, B)) #[learnable]
rule tensor_left : span(I, J, Tns(A, B)) |- span(I, J, A) #[learnable]
rule tensor_right : span(I, J, Tns(A, B)) |- span(I, J, B) #[learnable]
lexicon
"every" : Fwd(NP, N) = every #[learnable]
"dog" : N = dog #[learnable]
"barks" : Bwd(S, NP) = barks #[learnable]
Overview¶
Type-logical grammar, grounded in the non-commutative Lambek calculus, is a resource-conscious approach to syntax: every hypothesis is used exactly once and argument order is preserved. The deduction above lists slash (Fwd, Bwd) and tensor (Tns) constructors over chart-spans span(I, J, X), and licenses right / left application, product introduction, and product elimination as sequent rules.
Walkthrough¶
atoms NAME, NAME, ... declares the constructor vocabulary. Category atoms are S, NP, N, VP, PP; structural constructors are Fwd(A, B) ≡ A/B, Bwd(A, B) ≡ A\B, Tns(A, B) ≡ A⊗B. The chart-item constructor span(I, J, X) packages a derivation covering tokens [I, J) carrying category X. Single-uppercase identifiers (A, B, I, J, K) appearing in rule patterns bind as wildcards.
The rules realize the four logical core operations of the Lambek calculus:
right_app: modus ponens for forward slash:A/B, B ⊢ A.left_app: modus ponens for backward slash:B, A\B ⊢ A.tensor_intro: product introduction: adjacent derivations ofAandBcombine into a derivation ofA⊗B.tensor_left/tensor_right: product elimination: a derivation ofA⊗Bprojects to derivations of either component over the same span.
Together these rules yield the equational theory of the residuated monoid. The agenda runs to depth 6 by default; the LogProb semiring accumulates inside log-probabilities that flow back as gradients to learnable axiom weights.
DSL Features¶
- Sequent rules with arbitrary arity: rule bodies declare premises on the left of
|-and a single conclusion on the right; the compiler routes unary patterns to unary chart cells and binary patterns to binary chart cells. - Resource sensitivity is structural: there is no contraction or weakening rule, so every premise in a sequent must match a distinct chart cell.
- Order preservation: pattern variables appear in textual order; the parser enforces left-to-right span composition.
- Tensor and slash as user atoms: there is no special syntax,
Tns,Fwd,Bwdare atoms declared in theatoms NAME, NAME, ...block and may be replaced or extended by the user.
Try it¶
Every #[learnable] lexicon entry and every #[learnable] rule
exposes a real nn.Parameter on the compiled
DeductionSystem. The
system is callable: ded(sentence) returns a
ChartView whose
goal_weight() is the differentiable log-marginal
\(\log Z(s; \mathbf{w}) = \log \sum_d \exp \langle \mathbf{w}, \phi(d) \rangle\)
summed over every derivation \(d\) that the start symbol licenses for
the input. Fitting the lexicon and rule weights together is then a
regression-style problem: minimise \(-\sum_n \log Z(s_n)\) over a
corpus of sentences. The
quivers.stochastic.deduction module ships the
two standard surfaces.
MAP fit (Adam on rule & lexicon weights)¶
import torch
from quivers.dsl import load
from quivers.stochastic.deduction import adam_fit_deduction, sample_corpus
torch.manual_seed(0)
prog = load("docs/examples/source/type_logical.qvr")
ded = prog.deductions["Lambek"]
corpus = [["every", "dog", "barks"]] * 4
history = adam_fit_deduction(
ded, corpus, steps=300, lr=5e-2, prior_scale=1.0,
)
print(f"loss: {history[0]:.2f} → {history[-1]:.2f}") # strictly decreasing
# Forward-sample under the fitted parameters and check the
# dominant length-3 yield recovers the training corpus.
draws = sample_corpus(ded, length=3, n_samples=32, seed=0)
print("dominant yield:", max(set(map(tuple, draws)), key=draws.count))
# → ("every dog barks",)
adam_fit_deduction maximises the corpus log-marginal under an
optional Normal prior on the parameters; prior_scale=1.0 gives MAP
under a unit Normal. sample_corpus enumerates yields of the chosen
length and draws from the categorical defined by their chart weights; exact forward sampling because the chart marginalises the
derivation forest.
NUTS (full Bayesian posterior)¶
import torch
from quivers.dsl import load
from quivers.inference import MCMC, NUTSKernel
from quivers.stochastic.deduction import nuts_program_from_deduction
torch.manual_seed(0)
prog = load("docs/examples/source/type_logical.qvr")
ded = prog.deductions["Lambek"]
corpus = [["every", "dog", "barks"]] * 4
model, x, observations = nuts_program_from_deduction(
ded, corpus, prior_scale=1.0,
)
kernel = NUTSKernel(step_size=0.1, max_tree_depth=4, target_accept=0.8)
mc = MCMC(kernel, num_warmup=50, num_samples=50, num_chains=2)
result = mc.run(model, x, observations)
print("acceptance:", float(result.acceptance_rates.mean()))
print("divergences:", int(result.divergence_counts.sum()))
posterior_means = {
name: float(samples.mean()) for name, samples in result.samples.items()
}
print("posterior mean log-weights:", posterior_means)
nuts_program_from_deduction lifts every learnable parameter of the
deduction into a Normal(0, σ) sample
site and adds the corpus log-marginal \(\log Z\) to the joint via a
score step. The standard
NUTSKernel drives the
posterior \(p(\mathbf{w} \mid s_1, \ldots, s_N) \propto p(\mathbf{w})
\cdot \prod_n Z(s_n; \mathbf{w})\). The same Bayesian object
bayesian_regression fits, with the chart
total in place of the Gaussian likelihood.
Categorical Perspective¶
The Lambek calculus is the internal language of a residuated monoidal category (biclosed monoidal category). The tensor ⊗ is the monoidal product; the two slashes are its left and right adjoints. The residuation laws
A ⊗ B ⊢ C iff A ⊢ C/B iff B ⊢ A\C
are the statement of that adjunction. Because there is no contraction (copying) or weakening (discarding), every derivation consumes its input span exactly once; the agenda's span indexing enforces this by attaching each item to a single token range.
Connections to Other Formalisms¶
The Lambek calculus is strictly more expressive than context-free grammar (handling extraction, gapping) but remains decidable and efficiently parseable. Compared to CCG it is more restricted: CCG implicitly permits structural rules (weakening, contraction) that the Lambek calculus does not. The multimodal extensions (see multimodal-tlg) introduce controlled structural operators that license specific deviations from strict linearity.