Bayesian Bidirectional RNN

Overview

The bidirectional RNN processes a sequence in both forward and backward directions using independent recurrent paths, then combines their outputs. This example demonstrates the tensor product operator (@) for parallel composition of morphisms, showing how it differs from sequential composition with >>.

QVR Source

object Token : 256
type Embedded = Euclidean 64
type FwdHidden = Euclidean 64
type BwdHidden = Euclidean 64
type Combined = Euclidean 128
type Output = Euclidean 32

embed tok_embed : Token -> Embedded

continuous fwd_cell : Embedded * FwdHidden -> FwdHidden ~ Normal [scale=0.1]

let forward_path = tok_embed >> scan(fwd_cell)

continuous bwd_cell : Embedded * BwdHidden -> BwdHidden ~ Normal [scale=0.1]

let backward_path = tok_embed >> scan(bwd_cell)

continuous combine : Combined -> Output ~ Normal [scale=0.1]

let birnn = (forward_path @ backward_path) >> combine

output birnn

Walkthrough

Type Declarations for Bidirectional Processing

FwdHidden and BwdHidden are both 64-dimensional but are declared as distinct types so forward and backward hidden states cannot be accidentally mixed. Combined = Euclidean 128 is the concatenation of both directions (64 + 64).
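The dimension bookkeeping can be checked with a short pure-Python sketch (the variable names here are illustrative stand-ins, not part of the DSL):

```python
# Illustrative check that concatenating the two 64-dimensional hidden
# states yields the 128-dimensional Combined type (64 + 64 = 128).
FWD_DIM = 64
BWD_DIM = 64
COMBINED_DIM = 128

fwd_h = [0.0] * FWD_DIM   # stand-in for a FwdHidden value
bwd_h = [0.0] * BWD_DIM   # stand-in for a BwdHidden value
combined = fwd_h + bwd_h  # concatenation, as in Combined

assert len(combined) == COMBINED_DIM
```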

Shared Token Embedding

embed tok_embed : Token -> Embedded is shared by both forward and backward paths, giving tokens consistent initial representations in both directions while keeping the directional computations independent.
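In pure Python, the sharing amounts to both directional paths reading from a single lookup table. A sketch (the random values stand in for learned parameters; names are illustrative):

```python
import random

random.seed(0)

VOCAB = 256   # matches `object Token : 256`
EMB_DIM = 64  # matches `type Embedded = Euclidean 64`

# One shared embedding table; both directions look up the same vectors.
tok_embed = [[random.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
             for _ in range(VOCAB)]

def embed(token):
    # Identical lookup for the forward and backward paths.
    return tok_embed[token]

# Both paths see the very same initial representation for a token.
assert embed(42) is embed(42)
```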

Forward and Backward Paths

fwd_cell and bwd_cell have identical type signatures but independent parameters. Each is composed with the shared embedding and scanned over the sequence: let forward_path = tok_embed >> scan(fwd_cell) processes tokens left-to-right, while let backward_path = tok_embed >> scan(bwd_cell) processes them right-to-left. (Sequence reversal is handled at the data level, not in the DSL.)
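The semantics of scan can be sketched in plain Python as a stateful left fold that threads the hidden state through the sequence (an illustration of the behavior, not the DSL's implementation):

```python
def scan(cell, xs, h0):
    """Left fold that records every hidden state along the way."""
    h, hs = h0, []
    for x in xs:
        h = cell(x, h)
        hs.append(h)
    return hs

# A toy 1-dimensional cell: new state = old state + input.
cell = lambda x, h: h + x

states = scan(cell, [1, 2, 3], h0=0)
assert states == [1, 3, 6]

# Right-to-left processing is sequence reversal at the data level:
bwd_states = scan(cell, list(reversed([1, 2, 3])), h0=0)
assert bwd_states == [3, 5, 6]
```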

Tensor Product Composition

let birnn = (forward_path @ backward_path) >> combine uses the tensor product @ to run both paths in parallel on the same input. The @ operator applies both morphisms independently and pairs their outputs into a product type. The paired final hidden states (64 + 64 = 128 dimensions) are then passed to combine, which projects to 32-dimensional output.

The tensor product differs from >>: where >> threads data sequentially, @ applies morphisms in parallel without data dependency between them.
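The contrast between the two operators can be made concrete with a small pure-Python sketch (illustrative only; the DSL operators are not Python functions):

```python
def seq(f, g):
    """Sequential composition (>>): feed f's output into g."""
    return lambda x: g(f(x))

def tensor(f, g):
    """Parallel composition (@) on a shared input: apply both, pair results."""
    return lambda x: (f(x), g(x))

double = lambda x: 2 * x
negate = lambda x: -x

assert seq(double, negate)(3) == -6          # data flows through both in order
assert tensor(double, negate)(3) == (6, -3)  # independent, paired outputs
```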

DSL Features

This example exercises: object declarations with cardinality (object Token : 256), Euclidean type aliases, embed for token embeddings, continuous morphisms with Normal [scale=0.1] noise, scan for recurrence, sequential composition with >>, parallel composition with the tensor product @, let bindings for named sub-pipelines, and output to mark the exported morphism.

Python Usage
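Since no Python API for QVR is shown here, the following is only a hypothetical reference-semantics sketch of the compiled birnn pipeline in pure Python, with tiny stand-in dimensions; every name and helper below is illustrative, not part of any real QVR binding:

```python
import random

random.seed(0)

# Toy dimensions standing in for Embedded=64, Fwd/BwdHidden=64, Output=32.
EMB, HID, OUT = 8, 4, 2
VOCAB = 256

emb_table = [[random.gauss(0.0, 0.1) for _ in range(EMB)]
             for _ in range(VOCAB)]

def embed(tok):
    return emb_table[tok]

def scan(cell, xs, h0):
    h = h0
    for x in xs:
        h = cell(x, h)
    return h  # only the final hidden state is paired by @ here

def fwd_cell(x, h):
    # Toy recurrence: blend prior state with an input summary.
    s = sum(x)
    return [0.9 * hi + 0.1 * s for hi in h]

def bwd_cell(x, h):
    # Identical signature, independent "parameters".
    s = sum(x)
    return [0.8 * hi + 0.2 * s for hi in h]

def combine(z):
    # Project the 2*HID concatenation down to OUT dims by averaging halves.
    half = len(z) // OUT
    return [sum(z[i * half:(i + 1) * half]) / half for i in range(OUT)]

def birnn(tokens):
    embs = [embed(t) for t in tokens]
    fwd = scan(fwd_cell, embs, [0.0] * HID)
    bwd = scan(bwd_cell, list(reversed(embs)), [0.0] * HID)  # data-level reversal
    return combine(fwd + bwd)  # concatenate, then project

out = birnn([3, 1, 4, 1, 5])
assert len(out) == OUT
```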

Categorical Perspective

The tensor product @ is the monoidal product on morphisms. For morphisms \(f : A \to B\) and \(g : A \to C\) sharing a source, \(f \otimes g : A \to B \times C\) applies both independently and pairs the results (strictly, this shared-source form is the pairing that factors through the diagonal \(\Delta : A \to A \times A\) as \((f \times g) \circ \Delta\)). This captures computational parallelism: the two paths never interact until their outputs are combined, and each maintains its own state space and parameters.
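Categorically, the shared-source form \(f \otimes g : A \to B \times C\) factors as \((f \times g) \circ \Delta\) through the diagonal \(\Delta(a) = (a, a)\). A plain-Python sketch of that factorization (purely illustrative):

```python
diag = lambda x: (x, x)                            # Δ : A -> A × A
prod = lambda f, g: lambda p: (f(p[0]), g(p[1]))   # f × g acting on a pair
pair = lambda f, g: lambda x: prod(f, g)(diag(x))  # <f, g> = (f × g) ∘ Δ

f = lambda a: a + 1   # f : A -> B
g = lambda a: a * 2   # g : A -> C

assert pair(f, g)(5) == (6, 10)
```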

The combine morphism then acts as a projection from the product space \(\mathrm{FwdHidden} \times \mathrm{BwdHidden}\) into the output space. The monoidal structure is associative up to canonical isomorphism, so \((f \otimes g) \otimes h \cong f \otimes (g \otimes h)\) and more than two paths can be composed in parallel. The bidirectional architecture addresses the limitation that a unidirectional RNN at position \(t\) has no access to context from positions after \(t\); the tensor product makes the independence and parallelism of the two directional passes explicit in the categorical structure.