Weighted Combinatory Categorial Grammar

QVR Source

category S, NP, N, VP, PP
object Token : 256

let grammar = parser(
    rules=[evaluation, harmonic_composition, crossed_composition],
    terminal=Token,
    start=S
)

output grammar

Overview

Combinatory Categorial Grammar (CCG) assigns syntactic categories to words and applies combinatory rules to derive sentence structures. This example demonstrates symbolic category declarations, pre-defined rule schema combinators (evaluation, harmonic_composition, crossed_composition), and the schema-strategy separation in the QVR language.

Walkthrough

category S, NP, N, VP, PP introduces five symbolic categories. Unlike numeric indices, these are readable names that the compiler maps to indices internally. Categories can appear in slash expressions like S / NP.

object Token : 256 introduces a terminal category with 256 indices representing possible tokens.

The parser declaration specifies three rule schemas. The rules parameter lists abstract schema functors, not grounded rules. The evaluation schema generates forward application (\(X/Y, Y \Rightarrow X\)) and backward application (\(Y, X \backslash Y \Rightarrow X\)) for every pair of categories. harmonic_composition generates forward composition (\(X/Y, Y/Z \Rightarrow X/Z\)) and backward composition (\(Y \backslash Z, X \backslash Y \Rightarrow X \backslash Z\)) for all compatible triples. crossed_composition generates rules that permute argument order, handling phenomena like gapping and coordination.

terminal=Token tells the parser which category represents terminals. start=S specifies that derivations must produce a sentence.

DSL Features

Python Usage

Categorical Perspective

CCG is the internal language of a closed monoidal category. The forward slash \(X/Y\) and backward slash \(X \backslash Y\) are internal hom-objects (exponentials), and the application rule is the counit of the hom-tensor adjunction: \([Y, X] \otimes Y \to X\). Composition corresponds to chaining adjunctions; given \(X/Y\) and \(Y/Z\), transitivity yields \(X/Z\). Crossed composition relies on a braiding isomorphism to swap argument order. The type of an expression completely determines what it can combine with, because the closed structure forces all combination to go through the adjunction.

Grammar Expressiveness and Linguistic Coverage

CCG is more expressive than context-free grammar while remaining efficiently parseable. Functor categories let modifiers like VP \ VP attach flexibly without extra rules. The schemas provided here handle a wide variety of linguistic phenomena, though more exotic rule sets can be added via custom rules (see the custom-rules example).

Semiring Selection and Parsing Strategies

The choice of semiring affects the parser's behavior: ViterbiSemiring finds the highest-weight parse, LogSemiring uses log-probabilities for numerical stability, CountSemiring counts distinct parses, and BooleanSemiring checks membership without weights. Parametrizing the parser over the semiring lets the same grammar serve different objectives.