Chart Parsers¶
Differentiable CKY chart parsers implemented as weighted deductive systems. ChartParser extends DeductiveSystem, composing LexicalAxiom, BinarySpanDeduction, UnarySpanDeduction, SpanGoal, and CKYSchedule into a single nn.Module. Supports learnable rule weights and semiring-parameterized scoring (log-probability, Viterbi, boolean, counting).
Use ChartParser.from_schema(schema, category_system, ...) to construct a parser from composable RuleSchema objects, or ChartParser.from_category_system(...) to construct from an explicit RuleSystem.
Includes concrete CCGParser and LambekParser convenience subclasses.
parsers
¶
Chart parsers as weighted deductive systems.
A ChartParser is a DeductiveSystem that composes:
LexicalAxiom— populates length-1 spans from a learnable lexicon.BinarySpanDeduction— applies binary structural rules.UnarySpanDeduction— applies unary rules to convergence.SpanGoal— extracts the start-symbol score for the full span.CKYSchedule— bottom-up evaluation by span length.
Concrete grammar formalisms are specific choices of RuleSchema:
CCG = EVALUATION | HARMONIC_COMPOSITION | CROSSED_COMPOSITIONLAMBEK = EVALUATION | ADJUNCTION_UNITS | TENSOR_INTRODUCTION | TENSOR_PROJECTIONNL = EVALUATION
The ChartParser can also be constructed directly from a RuleSystem
or from a RuleSchema + CategorySystem.
Categorical perspective
A chart parser implements a morphism
parse : FreeMonoid(T) -> 1
in the category of measurable spaces, mapping variable-length terminal
strings to their log-probability under the grammar. The RuleSchema
specifies the structural rules (the generating arrows of the free
category), while the lexicon provides the interpretation functor
mapping terminals to weighted category distributions.
The parsing algorithm is parameterized by a ChartSemiring
(Goodman, 1999), making the parser agnostic to the scoring algebra.
ChartParser
¶
ChartParser(rule_system: RuleSystem, n_terminals: int, start_idx: int, category_system: CategorySystem | None = None, semiring: ChartSemiring | None = None, learnable_rule_weights: bool = True)
Bases: DeductiveSystem
Differentiable CKY chart parser as a weighted deductive system.
Composes LexicalAxiom, BinarySpanDeduction,
UnarySpanDeduction, SpanGoal, and CKYSchedule
into a single DeductiveSystem (nn.Module).
| PARAMETER | DESCRIPTION |
|---|---|
rule_system
|
The structural rules governing the grammar.
TYPE:
|
n_terminals
|
Vocabulary size.
TYPE:
|
start_idx
|
Index of the start category in the rule system.
TYPE:
|
category_system
|
Optional reference to the category system (for introspection).
TYPE:
|
semiring
|
Scoring algebra. Defaults to log-probability semiring.
TYPE:
|
learnable_rule_weights
|
Whether rule weights are learnable parameters.
TYPE:
|
Source code in src/quivers/stochastic/parsers.py
83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 | |
category_system
property
¶
category_system: CategorySystem | None
The category inventory (if available).
log_lexicon
property
¶
log_lexicon: Tensor
Log-probabilities of lexical category assignments.
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Shape |
from_schema
classmethod
¶
from_schema(schema: RuleSchema, category_system: CategorySystem, n_terminals: int, start: str | Category = 'S', semiring: ChartSemiring | None = None, learnable_rule_weights: bool = True) -> ChartParser
Create a chart parser from a rule schema and category system.
| PARAMETER | DESCRIPTION |
|---|---|
schema
|
The composable rule schema (e.g.
TYPE:
|
category_system
|
The category inventory.
TYPE:
|
n_terminals
|
Vocabulary size.
TYPE:
|
start
|
The start category.
TYPE:
|
semiring
|
Scoring algebra.
TYPE:
|
learnable_rule_weights
|
Whether rule weights are learnable.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ChartParser
|
A new parser instance. |
Source code in src/quivers/stochastic/parsers.py
118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 | |
from_category_system
classmethod
¶
from_category_system(category_system: CategorySystem, rule_system: RuleSystem, n_terminals: int, start: str | Category = 'S', semiring: ChartSemiring | None = None, learnable_rule_weights: bool = True) -> ChartParser
Create a chart parser from a category system and rules.
| PARAMETER | DESCRIPTION |
|---|---|
category_system
|
The category inventory.
TYPE:
|
rule_system
|
The structural rules.
TYPE:
|
n_terminals
|
Vocabulary size.
TYPE:
|
start
|
The start category.
TYPE:
|
semiring
|
Scoring algebra.
TYPE:
|
learnable_rule_weights
|
Whether rule weights are learnable.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ChartParser
|
A new parser instance. |
Source code in src/quivers/stochastic/parsers.py
161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 | |
forward
¶
forward(input: Tensor) -> Tensor
Compute sentence scores via CKY chart parsing.
| PARAMETER | DESCRIPTION |
|---|---|
input
|
Integer tensor of terminal indices. Shape
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Score of each sentence under the grammar. |
Source code in src/quivers/stochastic/parsers.py
239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 | |
inside_chart
¶
inside_chart(tokens: Tensor) -> Tensor
Compute the full inside chart (for analysis/debugging).
| PARAMETER | DESCRIPTION |
|---|---|
tokens
|
Integer tensor of terminal indices.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Full chart of shape |
Source code in src/quivers/stochastic/parsers.py
270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 | |
ccg
¶
Weighted Combinatory Categorial Grammar (CCG) parser.
Implements a differentiable CKY-style chart parser for weighted CCG, supporting the full set of standard combinatory rules:
- Forward application (>): X/Y Y → X
- Backward application (<): Y X\Y → X
- Forward composition (>B): X/Y Y/Z → X/Z
- Backward composition (<B): Y\Z X\Y → X\Z
- Forward crossed composition (>Bx): X/Y Y\Z → X\Z
- Backward crossed composition (<Bx): Y/Z X\Y → X/Z
- Generalized forward composition (>B^n): X/Y Y|_1 Z_1 ... |_n Z_n → X|_1 Z_1 ... |_n Z_n
- Forward type raising (>T): X → T/(T\X)
- Backward type raising (<T): X → T(T/X)
All computation is in log-space for numerical stability. The lexicon (word → category distribution) is a learnable stochastic morphism, while the combinatory rules are structural (fixed by the grammar).
Categorical perspective
CCG is the internal language of a closed monoidal category. The slash categories X/Y and X\Y are left and right internal homs, and the application rules are evaluation morphisms (counits of the hom-tensor adjunction). Composition rules correspond to composition of internal hom morphisms.
Examples:
>>> from quivers.stochastic.categories import CategorySystem
>>> from quivers.stochastic.ccg import CCGParser
>>> cs = CategorySystem.from_atoms(["S", "NP", "N"])
>>> cs.add_slash(cs["S"], cs["NP"], "\\") # S\NP
>>> cs.add_slash(cs["S\\NP"], cs["NP"], "/") # (S\NP)/NP
>>> cs.add_slash(cs["NP"], cs["N"], "/") # NP/N
>>> parser = CCGParser(cs, n_terminals=100, start="S")
>>> tokens = torch.randint(0, 100, (4, 5))
>>> log_probs = parser(tokens)
CCGParser
¶
CCGParser(category_system: CategorySystem, n_terminals: int, start: str | Category = 'S', enable_composition: bool = True, enable_crossed_composition: bool = True, enable_type_raising: bool = False, generalized_composition_depth: int = 1)
Bases: ChartParser
Weighted CKY chart parser for Combinatory Categorial Grammar.
A convenience subclass of ChartParser that builds CCG-specific
rules from a CategorySystem and grammar options.
| PARAMETER | DESCRIPTION |
|---|---|
category_system
|
The finite inventory of grammatical categories.
TYPE:
|
n_terminals
|
Vocabulary size.
TYPE:
|
start
|
The start (sentence) category.
TYPE:
|
enable_composition
|
Enable harmonic composition (>B, <B). Default True.
TYPE:
|
enable_crossed_composition
|
Enable crossed composition (>Bx, <Bx). Default True.
TYPE:
|
enable_type_raising
|
Enable type raising (>T, <T). Default False.
TYPE:
|
generalized_composition_depth
|
Maximum depth for generalized composition. Default 1.
TYPE:
|
Source code in src/quivers/stochastic/ccg.py
77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 | |
lambek
¶
Weighted type-logical grammar (Lambek calculus) parser.
Implements a differentiable chart parser for the non-commutative Lambek calculus (L), supporting:
- Right application (modus ponens right): A/B, B ⊢ A
- Left application (modus ponens left): B, A\B ⊢ A (equivalently written B, B\A ⊢ A with the convention A\B = A seeks B on the left)
- Right lifting (/L): A ⊢ B/(A\B)
- Left lifting (\L): A ⊢ (B/A)\B
- Product introduction (⊗I): A, B ⊢ A⊗B
- Product elimination (⊗E): A⊗B ⊢ A, B (projected)
The Lambek calculus is the internal logic of a residuated monoidal (bi-closed) category:
A ⊗ B ⊢ C iff A ⊢ C/B iff B ⊢ A\\C
Unlike CCG, the Lambek calculus does NOT permit: - Composition as a primitive rule (it's derivable) - Crossed composition - Permuting rules
This makes the Lambek calculus resource-sensitive and order-preserving, modeling word order constraints naturally.
Extended calculi
This module also supports:
- NL (Non-associative Lambek): If
associative=False, the parser uses tree-structured proof search, disallowing reassociation of the product. - LP (Lambek with permutation): If
commutative=True, the product is commutative, relaxing word order constraints. - Displacement calculus: Not yet supported; would require additional modalities.
Categorical perspective
The Lambek calculus is the internal logic of a residuated monoidal category (a biclosed monoidal category). The types are objects; the proofs are morphisms. A valid derivation A_1 ⊗ ... ⊗ A_n ⊢ B corresponds to a morphism in the category.
The parsing algorithm searches for such morphisms (proofs), computing the total weight (probability) of all derivations.
Examples:
>>> from quivers.stochastic.categories import CategorySystem
>>> from quivers.stochastic.lambek import LambekParser
>>> cs = CategorySystem.from_atoms(["S", "NP", "N"])
>>> cs.add_slash(cs["S"], cs["NP"], "\\") # S\NP
>>> cs.add_slash(cs["S\\NP"], cs["NP"], "/") # (S\NP)/NP
>>> cs.add_slash(cs["NP"], cs["N"], "/") # NP/N
>>> parser = LambekParser(cs, n_terminals=100, start="S")
>>> tokens = torch.randint(0, 100, (4, 5))
>>> log_probs = parser(tokens)
LambekParser
¶
LambekParser(category_system: CategorySystem, n_terminals: int, start: str | Category = 'S', associative: bool = True, commutative: bool = False, enable_lifting: bool = True, enable_product: bool = True)
Bases: ChartParser
Weighted chart parser for type-logical grammar (Lambek calculus).
A convenience subclass of ChartParser that builds Lambek
calculus rules from a CategorySystem and grammar options.
| PARAMETER | DESCRIPTION |
|---|---|
category_system
|
The finite category inventory.
TYPE:
|
n_terminals
|
Vocabulary size.
TYPE:
|
start
|
The start category (default "S").
TYPE:
|
associative
|
Use associative Lambek calculus (default True).
TYPE:
|
commutative
|
Use commutative product / LP calculus (default False).
TYPE:
|
enable_lifting
|
Enable Lambek lifting rules (default True).
TYPE:
|
enable_product
|
Enable product introduction (default True).
TYPE:
|
Source code in src/quivers/stochastic/lambek.py
99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 | |