Chart Parsers

Differentiable CKY chart parsers implemented as weighted deductive systems. ChartParser extends DeductiveSystem, composing LexicalAxiom, BinarySpanDeduction, UnarySpanDeduction, SpanGoal, and CKYSchedule into a single nn.Module. Supports learnable rule weights and semiring-parameterized scoring (log-probability, Viterbi, boolean, counting).

Use ChartParser.from_schema(schema, category_system, ...) to construct a parser from composable RuleSchema objects, or ChartParser.from_category_system(...) to construct from an explicit RuleSystem.

Includes concrete CCGParser and LambekParser convenience subclasses.

parsers

Chart parsers as weighted deductive systems.

A ChartParser is a DeductiveSystem that composes:

  • LexicalAxiom — populates length-1 spans from a learnable lexicon.
  • BinarySpanDeduction — applies binary structural rules.
  • UnarySpanDeduction — applies unary rules to convergence.
  • SpanGoal — extracts the start-symbol score for the full span.
  • CKYSchedule — bottom-up evaluation by span length.

Concrete grammar formalisms are specific choices of RuleSchema:

  • CCG = EVALUATION | HARMONIC_COMPOSITION | CROSSED_COMPOSITION
  • LAMBEK = EVALUATION | ADJUNCTION_UNITS | TENSOR_INTRODUCTION | TENSOR_PROJECTION
  • NL = EVALUATION

The ChartParser can also be constructed directly from a RuleSystem or from a RuleSchema + CategorySystem.

Categorical perspective

A chart parser implements a morphism

parse : FreeMonoid(T) -> 1

in the category of measurable spaces, mapping variable-length terminal strings to their log-probability under the grammar. The RuleSchema specifies the structural rules (the generating arrows of the free category), while the lexicon provides the interpretation functor mapping terminals to weighted category distributions.

The parsing algorithm is parameterized by a ChartSemiring (Goodman, 1999), making the parser agnostic to the scoring algebra.

ChartParser

ChartParser(rule_system: RuleSystem, n_terminals: int, start_idx: int, category_system: CategorySystem | None = None, semiring: ChartSemiring | None = None, learnable_rule_weights: bool = True)

Bases: DeductiveSystem

Differentiable CKY chart parser as a weighted deductive system.

Composes LexicalAxiom, BinarySpanDeduction, UnarySpanDeduction, SpanGoal, and CKYSchedule into a single DeductiveSystem (nn.Module).

PARAMETER DESCRIPTION
rule_system

The structural rules governing the grammar.

TYPE: RuleSystem

n_terminals

Vocabulary size.

TYPE: int

start_idx

Index of the start category in the rule system.

TYPE: int

category_system

Optional reference to the category system (for introspection).

TYPE: CategorySystem or None DEFAULT: None

semiring

Scoring algebra. Defaults to log-probability semiring.

TYPE: ChartSemiring or None DEFAULT: None

learnable_rule_weights

Whether rule weights are learnable parameters.

TYPE: bool DEFAULT: True

Source code in src/quivers/stochastic/parsers.py
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
def __init__(
    self,
    rule_system: RuleSystem,
    n_terminals: int,
    start_idx: int,
    category_system: CategorySystem | None = None,
    semiring: ChartSemiring | None = None,
    learnable_rule_weights: bool = True,
) -> None:
    axiom = LexicalAxiom(n_terminals, rule_system.n_categories)
    binary_ded = BinarySpanDeduction(
        rule_system,
        learnable=learnable_rule_weights,
    )
    unary_ded = UnarySpanDeduction(
        rule_system,
        learnable=learnable_rule_weights,
    )
    goal = SpanGoal(start_idx)
    schedule = CKYSchedule()

    super().__init__(
        axiom=axiom,
        deductions=[binary_ded, unary_ded],
        goal=goal,
        schedule=schedule,
        semiring=semiring,
    )

    self._rule_system = rule_system
    self._system = category_system
    self._n_term = n_terminals
    self._n_cat = rule_system.n_categories
    self._start_idx = start_idx

rule_system property

rule_system: RuleSystem

The structural rule system.

category_system property

category_system: CategorySystem | None

The category inventory (if available).

n_rules property

n_rules: int

Number of binary combination rules.

n_unary_rules property

n_unary_rules: int

Number of unary rules.

log_lexicon property

log_lexicon: Tensor

Log-probabilities of lexical category assignments.

RETURNS DESCRIPTION
Tensor

Shape (n_terminals, n_categories).

from_schema classmethod

from_schema(schema: RuleSchema, category_system: CategorySystem, n_terminals: int, start: str | Category = 'S', semiring: ChartSemiring | None = None, learnable_rule_weights: bool = True) -> ChartParser

Create a chart parser from a rule schema and category system.

PARAMETER DESCRIPTION
schema

The composable rule schema (e.g. CCG, LAMBEK).

TYPE: RuleSchema

category_system

The category inventory.

TYPE: CategorySystem

n_terminals

Vocabulary size.

TYPE: int

start

The start category.

TYPE: str or Category DEFAULT: 'S'

semiring

Scoring algebra.

TYPE: ChartSemiring or None DEFAULT: None

learnable_rule_weights

Whether rule weights are learnable.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
ChartParser

A new parser instance.

Source code in src/quivers/stochastic/parsers.py
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
@classmethod
def from_schema(
    cls,
    schema: RuleSchema,
    category_system: CategorySystem,
    n_terminals: int,
    start: str | Category = "S",
    semiring: ChartSemiring | None = None,
    learnable_rule_weights: bool = True,
) -> ChartParser:
    """Create a chart parser from a rule schema and category system.

    Parameters
    ----------
    schema : RuleSchema
        The composable rule schema (e.g. ``CCG``, ``LAMBEK``).
    category_system : CategorySystem
        The category inventory.
    n_terminals : int
        Vocabulary size.
    start : str or Category
        The start category.
    semiring : ChartSemiring or None
        Scoring algebra.
    learnable_rule_weights : bool
        Whether rule weights are learnable.

    Returns
    -------
    ChartParser
        A new parser instance.
    """
    rule_system = schema(category_system)

    return cls.from_category_system(
        category_system,
        rule_system,
        n_terminals,
        start,
        semiring,
        learnable_rule_weights,
    )

from_category_system classmethod

from_category_system(category_system: CategorySystem, rule_system: RuleSystem, n_terminals: int, start: str | Category = 'S', semiring: ChartSemiring | None = None, learnable_rule_weights: bool = True) -> ChartParser

Create a chart parser from a category system and rules.

PARAMETER DESCRIPTION
category_system

The category inventory.

TYPE: CategorySystem

rule_system

The structural rules.

TYPE: RuleSystem

n_terminals

Vocabulary size.

TYPE: int

start

The start category.

TYPE: str or Category DEFAULT: 'S'

semiring

Scoring algebra.

TYPE: ChartSemiring or None DEFAULT: None

learnable_rule_weights

Whether rule weights are learnable.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
ChartParser

A new parser instance.

Source code in src/quivers/stochastic/parsers.py
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
@classmethod
def from_category_system(
    cls,
    category_system: CategorySystem,
    rule_system: RuleSystem,
    n_terminals: int,
    start: str | Category = "S",
    semiring: ChartSemiring | None = None,
    learnable_rule_weights: bool = True,
) -> ChartParser:
    """Create a chart parser from a category system and rules.

    Parameters
    ----------
    category_system : CategorySystem
        The category inventory.
    rule_system : RuleSystem
        The structural rules.
    n_terminals : int
        Vocabulary size.
    start : str or Category
        The start category.
    semiring : ChartSemiring or None
        Scoring algebra.
    learnable_rule_weights : bool
        Whether rule weights are learnable.

    Returns
    -------
    ChartParser
        A new parser instance.
    """
    if isinstance(start, str):
        start = AtomicCategory(name=start)

    if start not in category_system:
        raise ValueError(f"start category {start!r} not in category system")

    return cls(
        rule_system=rule_system,
        n_terminals=n_terminals,
        start_idx=category_system.index(start),
        category_system=category_system,
        semiring=semiring,
        learnable_rule_weights=learnable_rule_weights,
    )

forward

forward(input: Tensor) -> Tensor

Compute sentence scores via CKY chart parsing.

PARAMETER DESCRIPTION
input

Integer tensor of terminal indices. Shape (batch, seq_len) or (seq_len,) for a single sentence.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

Score of each sentence under the grammar.

Source code in src/quivers/stochastic/parsers.py
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
def forward(self, input: torch.Tensor) -> torch.Tensor:
    """Compute sentence scores via CKY chart parsing.

    Parameters
    ----------
    input : torch.Tensor
        Integer tensor of terminal indices. Shape
        ``(batch, seq_len)`` or ``(seq_len,)`` for a single
        sentence.

    Returns
    -------
    torch.Tensor
        Score of each sentence under the grammar.
    """
    squeeze = False

    if input.dim() == 1:
        input = input.unsqueeze(0)
        squeeze = True

    if input.shape[1] == 0:
        raise ValueError("cannot parse empty sentences")

    result = super().forward(input)

    if squeeze:
        return result.squeeze(0)

    return result

inside_chart

inside_chart(tokens: Tensor) -> Tensor

Compute the full inside chart (for analysis/debugging).

PARAMETER DESCRIPTION
tokens

Integer tensor of terminal indices.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

Full chart of shape (batch, N, seq_len, seq_len+1).

Source code in src/quivers/stochastic/parsers.py
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
def inside_chart(self, tokens: torch.Tensor) -> torch.Tensor:
    """Compute the full inside chart (for analysis/debugging).

    Parameters
    ----------
    tokens : torch.Tensor
        Integer tensor of terminal indices.

    Returns
    -------
    torch.Tensor
        Full chart of shape ``(batch, N, seq_len, seq_len+1)``.
    """
    squeeze = False

    if tokens.dim() == 1:
        tokens = tokens.unsqueeze(0)
        squeeze = True

    if tokens.shape[1] == 0:
        raise ValueError("cannot parse empty sentences")

    # run axiom + schedule but skip goal extraction
    chart = self.axiom(tokens, self._semiring)
    chart = self._schedule.run(chart, self.deductions, self._semiring)

    if squeeze:
        return chart.squeeze(0)

    return chart

ccg

Weighted Combinatory Categorial Grammar (CCG) parser.

Implements a differentiable CKY-style chart parser for weighted CCG, supporting the full set of standard combinatory rules:

  • Forward application (>): X/Y Y → X
  • Backward application (<): Y X\Y → X
  • Forward composition (>B): X/Y Y/Z → X/Z
  • Backward composition (<B): Y\Z X\Y → X\Z
  • Forward crossed composition (>Bx): X/Y Y\Z → X\Z
  • Backward crossed composition (<Bx): Y/Z X\Y → X/Z
  • Generalized forward composition (>B^n): X/Y Y|_1 Z_1 ... |_n Z_n → X|_1 Z_1 ... |_n Z_n
  • Forward type raising (>T): X → T/(T\X)
  • Backward type raising (<T): X → T(T/X)

All computation is in log-space for numerical stability. The lexicon (word → category distribution) is a learnable stochastic morphism, while the combinatory rules are structural (fixed by the grammar).

Categorical perspective

CCG is the internal language of a closed monoidal category. The slash categories X/Y and X\Y are left and right internal homs, and the application rules are evaluation morphisms (counits of the hom-tensor adjunction). Composition rules correspond to composition of internal hom morphisms.

Examples:

>>> from quivers.stochastic.categories import CategorySystem
>>> from quivers.stochastic.ccg import CCGParser
>>> cs = CategorySystem.from_atoms(["S", "NP", "N"])
>>> cs.add_slash(cs["S"], cs["NP"], "\\")   # S\NP
>>> cs.add_slash(cs["S\\NP"], cs["NP"], "/")  # (S\NP)/NP
>>> cs.add_slash(cs["NP"], cs["N"], "/")      # NP/N
>>> parser = CCGParser(cs, n_terminals=100, start="S")
>>> tokens = torch.randint(0, 100, (4, 5))
>>> log_probs = parser(tokens)

CCGParser

CCGParser(category_system: CategorySystem, n_terminals: int, start: str | Category = 'S', enable_composition: bool = True, enable_crossed_composition: bool = True, enable_type_raising: bool = False, generalized_composition_depth: int = 1)

Bases: ChartParser

Weighted CKY chart parser for Combinatory Categorial Grammar.

A convenience subclass of ChartParser that builds CCG-specific rules from a CategorySystem and grammar options.

PARAMETER DESCRIPTION
category_system

The finite inventory of grammatical categories.

TYPE: CategorySystem

n_terminals

Vocabulary size.

TYPE: int

start

The start (sentence) category.

TYPE: str or Category DEFAULT: 'S'

enable_composition

Enable harmonic composition (>B, <B). Default True.

TYPE: bool DEFAULT: True

enable_crossed_composition

Enable crossed composition (>Bx, <Bx). Default True.

TYPE: bool DEFAULT: True

enable_type_raising

Enable type raising (>T, <T). Default False.

TYPE: bool DEFAULT: False

generalized_composition_depth

Maximum depth for generalized composition. Default 1.

TYPE: int DEFAULT: 1

Source code in src/quivers/stochastic/ccg.py
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
def __init__(
    self,
    category_system: CategorySystem,
    n_terminals: int,
    start: str | Category = "S",
    enable_composition: bool = True,
    enable_crossed_composition: bool = True,
    enable_type_raising: bool = False,
    generalized_composition_depth: int = 1,
) -> None:
    # resolve start category
    if isinstance(start, str):
        start_cat = AtomicCategory(name=start)

    else:
        start_cat = start

    if start_cat not in category_system:
        raise ValueError(f"start category {start_cat!r} not in category system")

    # build CCG rule system
    rules = ccg_rules(
        category_system,
        enable_composition=enable_composition,
        enable_crossed_composition=enable_crossed_composition,
        enable_type_raising=enable_type_raising,
        generalized_composition_depth=generalized_composition_depth,
    )

    super().__init__(
        rule_system=rules,
        n_terminals=n_terminals,
        start_idx=category_system.index(start_cat),
        category_system=category_system,
    )

lambek

Weighted type-logical grammar (Lambek calculus) parser.

Implements a differentiable chart parser for the non-commutative Lambek calculus (L), supporting:

  • Right application (modus ponens right): A/B, B ⊢ A
  • Left application (modus ponens left): B, A\B ⊢ A (equivalently written B, B\A ⊢ A with the convention A\B = A seeks B on the left)
  • Right lifting (/L): A ⊢ B/(A\B)
  • Left lifting (\L): A ⊢ (B/A)\B
  • Product introduction (⊗I): A, B ⊢ A⊗B
  • Product elimination (⊗E): A⊗B ⊢ A, B (projected)

The Lambek calculus is the internal logic of a residuated monoidal (bi-closed) category:

A ⊗ B ⊢ C   iff   A ⊢ C/B   iff   B ⊢ A\\C

Unlike CCG, the Lambek calculus does NOT permit: - Composition as a primitive rule (it's derivable) - Crossed composition - Permuting rules

This makes the Lambek calculus resource-sensitive and order-preserving, modeling word order constraints naturally.

Extended calculi

This module also supports:

  • NL (Non-associative Lambek): If associative=False, the parser uses tree-structured proof search, disallowing reassociation of the product.
  • LP (Lambek with permutation): If commutative=True, the product is commutative, relaxing word order constraints.
  • Displacement calculus: Not yet supported; would require additional modalities.
Categorical perspective

The Lambek calculus is the internal logic of a residuated monoidal category (a biclosed monoidal category). The types are objects; the proofs are morphisms. A valid derivation A_1 ⊗ ... ⊗ A_n ⊢ B corresponds to a morphism in the category.

The parsing algorithm searches for such morphisms (proofs), computing the total weight (probability) of all derivations.

Examples:

>>> from quivers.stochastic.categories import CategorySystem
>>> from quivers.stochastic.lambek import LambekParser
>>> cs = CategorySystem.from_atoms(["S", "NP", "N"])
>>> cs.add_slash(cs["S"], cs["NP"], "\\")   # S\NP
>>> cs.add_slash(cs["S\\NP"], cs["NP"], "/")  # (S\NP)/NP
>>> cs.add_slash(cs["NP"], cs["N"], "/")      # NP/N
>>> parser = LambekParser(cs, n_terminals=100, start="S")
>>> tokens = torch.randint(0, 100, (4, 5))
>>> log_probs = parser(tokens)

LambekParser

LambekParser(category_system: CategorySystem, n_terminals: int, start: str | Category = 'S', associative: bool = True, commutative: bool = False, enable_lifting: bool = True, enable_product: bool = True)

Bases: ChartParser

Weighted chart parser for type-logical grammar (Lambek calculus).

A convenience subclass of ChartParser that builds Lambek calculus rules from a CategorySystem and grammar options.

PARAMETER DESCRIPTION
category_system

The finite category inventory.

TYPE: CategorySystem

n_terminals

Vocabulary size.

TYPE: int

start

The start category (default "S").

TYPE: str or Category DEFAULT: 'S'

associative

Use associative Lambek calculus (default True).

TYPE: bool DEFAULT: True

commutative

Use commutative product / LP calculus (default False).

TYPE: bool DEFAULT: False

enable_lifting

Enable Lambek lifting rules (default True).

TYPE: bool DEFAULT: True

enable_product

Enable product introduction (default True).

TYPE: bool DEFAULT: True

Source code in src/quivers/stochastic/lambek.py
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
def __init__(
    self,
    category_system: CategorySystem,
    n_terminals: int,
    start: str | Category = "S",
    associative: bool = True,
    commutative: bool = False,
    enable_lifting: bool = True,
    enable_product: bool = True,
) -> None:
    # resolve start category
    if isinstance(start, str):
        start_cat = AtomicCategory(name=start)

    else:
        start_cat = start

    if start_cat not in category_system:
        raise ValueError(f"start category {start_cat!r} not in category system")

    # build Lambek rule system
    rules = lambek_rules(
        category_system,
        associative=associative,
        commutative=commutative,
        enable_lifting=enable_lifting,
        enable_product=enable_product,
    )

    super().__init__(
        rule_system=rules,
        n_terminals=n_terminals,
        start_idx=category_system.index(start_cat),
        category_system=category_system,
    )

    self._associative = associative
    self._commutative = commutative