Encoders

Encoder is a torch.nn.Module that realizes an F-algebra homomorphism T_Σ → Vec_D from terms over a signature to fixed-length vectors. The recursion is supplied by the framework; the analyst supplies only the per-operation parametric functions (or omits them and accepts the compiler's scaffolded 2-layer MLP defaults with correct per-arg dimensions).

For binders, the framework threads a typed de-Bruijn context Γ through the recursion: each binder's annotations are compressed in the outer context, fresh variable embeddings are minted via the encoder's var_init_fns, and the scoped arguments are recursed under the extended context.

Graph signatures are compressed via forward_graph: per-vertex-kind initial embedders, finitely many message-passing rounds with per-edge-kind message functions and per-vertex-kind update functions, finally a readout reducing the per-vertex final embeddings to a single graph-level vector.

encoder

Encoder runtime: F-algebra homomorphisms T_Σ -> Vec_D.

A Encoder is a torch.nn.Module that, given a closed Term over a signature, returns its vector embedding. The recursion structure is supplied by the framework (algebra homomorphism); per-operation parametric functions are supplied by the user (declared in a encoder { … } DSL block) or by the default scaffolding.

Canonical term form

A term over Σ has exactly one of three argument shapes at each position, determined by the codomain sort's kind:

  • object — the argument is a Term whose op is a constructor of Σ, a binder of Σ, or the reserved built-in "BoundVar".
  • data — the argument is a raw Python value (string, int, float, …) consumed by the encoder's per-data-sort embedder.
  • index — the argument is a non-negative integer denoting a de-Bruijn index into the current scope context Γ.

No Term("Data", …) or other wrappings are accepted at runtime; the canonical form is the only form.

The reserved "BoundVar" op is the built-in de-Bruijn reference: Term("BoundVar", (i,)) at an object position reads the i-th in-scope variable's embedding from Γ.

Binders thread a Context of in-scope variables: each entry records the variable's sort, its embedding, and the type term captured at binding. Compressing a binder constructor:

  1. Compresses each of the binder's binds arguments in the OUTER context.
  2. Synthesises a fresh variable embedding from each via the encoder's var_init function.
  3. Pushes the new entries onto Γ and recurses on the scoped arguments under the extended context.

Graph-shaped signatures are compressed by forward_graph: per-vertex-kind init seeds initial embeddings, finitely many message_passing rounds alternate per-edge-kind message and per-vertex-kind update functions, and a readout reduces the final per-vertex embeddings to one fixed-length vector.

Encoder

Encoder(name: str, signature: Signature, sort_dims: dict[str, int], op_fns: dict[str, _PerOpFn], var_init_fns: dict[tuple[str, str] | str, Callable[[Tensor | None], Tensor]], data_embedders: dict[str, Callable[[DataLeaf], Tensor]], modules_owned: list[Module] | None = None, iterations: int = 0, init_fns: dict[str, Callable[[DataLeaf], Tensor]] | None = None, message_fns: dict[str, Callable[..., Tensor]] | None = None, update_fns: dict[str, Callable[..., Tensor]] | None = None, readout: Callable[[list[Tensor]], Tensor] | None = None)

Bases: Module

An F-algebra homomorphism T_Σ -> Vec_D realised as an nn.Module.

Construction parameters

name : str Identifier used in diagnostics. signature : Signature The Σ whose terms this encoder consumes. sort_dims : dict[str, int] Per-sort embedding dimension. op_fns : dict[str, _PerOpFn] One entry per constructor and binder of Σ. The Compiler scaffolds defaults for omitted ops. var_init : callable (type_embedding) -> variable_embedding. Called at every binder to mint a fresh variable embedding for each introduced scope variable. data_embedders : dict[str, callable] One entry per data sort: (raw_value) -> embedding.

For graph-shaped signatures, additionally:

iterations : int Number of message-passing rounds in forward_graph. init_fns : dict[str, callable] Per-vertex-kind initial embedders. message_fns : dict[str, callable] Per-edge-kind (src_e, tgt_e) -> message functions. update_fns : dict[str, callable] Per-vertex-kind (self_e, aggregated_msg) -> next_e. readout : callable (list[Vec_D]) -> Vec_D reducer.

Source code in src/quivers/structural/encoder.py
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
def __init__(
    self,
    name: str,
    signature: Signature,
    sort_dims: dict[str, int],
    op_fns: dict[str, _PerOpFn],
    var_init_fns: dict[
        tuple[str, str] | str,
        Callable[[torch.Tensor | None], torch.Tensor],
    ],
    data_embedders: dict[str, Callable[[DataLeaf], torch.Tensor]],
    modules_owned: list[nn.Module] | None = None,
    iterations: int = 0,
    init_fns: dict[str, Callable[[DataLeaf], torch.Tensor]] | None = None,
    message_fns: dict[str, Callable[..., torch.Tensor]] | None = None,
    update_fns: dict[str, Callable[..., torch.Tensor]] | None = None,
    readout: Callable[[list[torch.Tensor]], torch.Tensor] | None = None,
) -> None:
    super().__init__()
    self.name = name
    self.signature = signature
    self.sort_dims = dict(sort_dims)
    self.op_fns = dict(op_fns)
    # Keyed by `(var_sort, annot_sort)` for annotated binders or
    # by `var_sort` (a plain string) for unannotated ones.
    self.var_init_fns: dict[
        tuple[str, str] | str,
        Callable[[torch.Tensor | None], torch.Tensor],
    ] = dict(var_init_fns)
    self.data_embedders = dict(data_embedders)
    self.iterations = iterations
    self.init_fns = dict(init_fns or {})
    self.message_fns = dict(message_fns or {})
    self.update_fns = dict(update_fns or {})
    self.readout = readout
    for i, m in enumerate(modules_owned or []):
        self.add_module(f"_op_{i}", m)

make_default_op_fn

make_default_op_fn(op: str, arg_dims: tuple[int, ...], out_dim: int) -> tuple[Module, Callable[..., Tensor]]

Build a parametric per-op function as a 2-layer MLP over concatenated child embeddings.

arg_dims is the tuple of per-argument dimensions in the order in which the children are passed to the function. For binders this is (annot_dim_1, …, annot_dim_k, scoped_dim_1, …, scoped_dim_m).

Source code in src/quivers/structural/encoder.py
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
def make_default_op_fn(
    op: str,
    arg_dims: tuple[int, ...],
    out_dim: int,
) -> tuple[nn.Module, Callable[..., torch.Tensor]]:
    """Build a parametric per-op function as a 2-layer MLP over
    concatenated child embeddings.

    ``arg_dims`` is the tuple of per-argument dimensions in the
    order in which the children are passed to the function. For
    binders this is ``(annot_dim_1, …, annot_dim_k, scoped_dim_1,
    …, scoped_dim_m)``.
    """
    arity = len(arg_dims)
    if arity == 0:
        param = nn.Parameter(torch.randn(out_dim) * 0.1)
        mod = nn.Module()
        mod.register_parameter(f"const_{op}", param)

        def call() -> torch.Tensor:
            return param

        return mod, call

    in_dim = sum(arg_dims)
    hidden = max(out_dim, 64)
    mlp = nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )

    def call(*children: torch.Tensor) -> torch.Tensor:
        if len(children) != arity:
            raise RuntimeError(
                f"op {op!r}: expected {arity} children, got {len(children)}"
            )
        cat = torch.cat([c.reshape(-1) for c in children], dim=-1)
        return mlp(cat)

    return mlp, call

make_default_var_init

make_default_var_init(in_dim: int, out_dim: int) -> tuple[Module, Callable[[Tensor], Tensor]]

A parametric var_init(ty) function: an MLP mapping a type embedding to a fresh variable embedding.

Source code in src/quivers/structural/encoder.py
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
def make_default_var_init(
    in_dim: int, out_dim: int
) -> tuple[nn.Module, Callable[[torch.Tensor], torch.Tensor]]:
    """A parametric `var_init(ty)` function: an MLP mapping a type
    embedding to a fresh variable embedding."""
    mlp = nn.Sequential(
        nn.Linear(in_dim, max(out_dim, 64)),
        nn.Tanh(),
        nn.Linear(max(out_dim, 64), out_dim),
    )

    def call(ty: torch.Tensor) -> torch.Tensor:
        return mlp(ty.reshape(-1))

    return mlp, call