Bayesian Elman Network

Overview

The Elman network is a simple recurrent architecture that decomposes its cell into separate transformation and context-persistence morphisms composed with >>. This example demonstrates compositional morphism decomposition, where each morphism has a distinct role, and shows how scale parameters control initialization behavior.

QVR Source

object Token : 256
type Embedded = Euclidean 64
type Hidden = Euclidean 128
type Output = Euclidean 64

embed tok_embed : Token -> Embedded

continuous transition : Embedded * Hidden -> Hidden ~ Normal [scale=0.1]
continuous context_copy : Hidden -> Hidden ~ Normal [scale=0.01]
continuous output_proj : Hidden -> Output ~ Normal [scale=0.1]

let cell = transition >> context_copy
let elman = tok_embed >> scan(cell) >> output_proj

output elman

Walkthrough

Type Structure for Simple Recurrence

The Elman network uses the same type structure as the vanilla RNN: discrete tokens, a 64-dimensional embedding, a 128-dimensional hidden state, and a 64-dimensional output. No composite state vectors are needed.

Transition Morphism

continuous transition : Embedded * Hidden -> Hidden ~ Normal [scale=0.1] is the primary transformation. It takes the product of current embedding and previous hidden state and produces the next hidden state. All state evolution flows through this single morphism.

Context Copy Morphism

continuous context_copy : Hidden -> Hidden ~ Normal [scale=0.01] is an endomorphism (same source and target type) that refines the state after transition. Its small scale (0.01 vs. 0.1) initializes it near identity, so it initially passes most state information through with minimal modification. This separates the roles of transformation (transition) and state persistence (context_copy).

Cell Composition

let cell = transition >> context_copy composes the two into a single cell suitable for scanning. The cell is not a primitive morphism but a composite, making the information flow explicit: transition handles input-state transformation, context_copy handles state refinement.

Full Elman Network

let elman = tok_embed >> scan(cell) >> output_proj constructs the complete network. The scan combinator treats the composed cell as atomic at each time step, applying transition then context_copy internally.

DSL Features

Python Usage

Categorical Perspective

The Elman cell's explicit decomposition into \(\mathrm{transition} >> \mathrm{context\_copy}\) is a factorization of the cell morphism in the Kleisli category. The transition morphism maps \(\mathrm{Embedded} \times \mathrm{Hidden} \to \mathrm{Hidden}\), and context_copy is an endomorphism on \(\mathrm{Hidden}\). Endomorphisms on an object form a monoid under composition, meaning context_copy could be composed with itself or with other Hidden-to-Hidden morphisms without type errors.

The different scale parameters implement different priors: the small scale on context_copy concentrates probability mass near zero, so the morphism initially acts close to an identity. This makes the design intent explicit in the categorical structure rather than hiding it inside a single opaque cell. The tradeoff is that without gating, all information must survive repeated application of the same transition morphism, leading to the same gradient flow issues as the vanilla RNN.