Core Concepts

This section introduces the fundamental design principles and architecture underlying bead.

Stand-off Annotation

Bead uses stand-off annotation to minimize data duplication and maintain provenance tracking throughout the experimental pipeline.

Objects reference each other by UUID rather than embedding full copies. An experimental list stores item UUIDs, not the items themselves. Items store filled template UUIDs, not the template content. This pattern creates a single source of truth for each piece of data.
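The reference pattern can be illustrated with a minimal sketch. The class and field names below (FilledTemplate, Item, ExperimentList, resolve_texts) are hypothetical stand-ins chosen for illustration, not bead's actual classes; the only thing taken from the description above is that each object stores UUIDs rather than copies.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class FilledTemplate:
    text: str
    id: UUID = field(default_factory=uuid4)


@dataclass
class Item:
    # Stores only the UUID of the filled template, not its content.
    filled_template_id: UUID
    id: UUID = field(default_factory=uuid4)


@dataclass
class ExperimentList:
    # Stores item UUIDs, not the items themselves.
    item_ids: list[UUID] = field(default_factory=list)


def resolve_texts(exp_list, items_by_id, filled_by_id):
    """Follow UUID references from a list down to the sentence text."""
    return [
        filled_by_id[items_by_id[item_id].filled_template_id].text
        for item_id in exp_list.item_ids
    ]


ft = FilledTemplate(text="The chef put the knife on the table.")
item = Item(filled_template_id=ft.id)
exp_list = ExperimentList(item_ids=[item.id])

filled_by_id = {ft.id: ft}
items_by_id = {item.id: item}
texts = resolve_texts(exp_list, items_by_id, filled_by_id)
```

Because the text lives in exactly one object, editing that object and resolving again yields the updated text everywhere it is referenced.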

Example: when you modify a template, every item that references it picks up the change the next time you regenerate those items. You don't need to update copies scattered across different files.

The stand-off approach enables:

When resolving references, pass metadata dictionaries mapping UUIDs to their data:

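from uuid import uuid4

# Placeholder UUIDs; in practice these come from existing items
uuid1, uuid2, uuid3 = uuid4(), uuid4(), uuid4()

# Items stored as UUIDs
item_uuids = [uuid1, uuid2, uuid3]

# Metadata for constraint evaluation, keyed by item UUID
item_metadata = {
    uuid1: {"verb": "put", "frame": "transitive"},
    uuid2: {"verb": "place", "frame": "transitive"},
    uuid3: {"verb": "drop", "frame": "transitive"},
}

# Partitioner receives both (partitioner is an already constructed bead partitioner)
partitioner.partition_with_batch_constraints(items=item_uuids, metadata=item_metadata)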
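To make the role of the metadata dictionary concrete, here is a small sketch of how a batch-level check might look up values by UUID. The helper max_count_per_value is hypothetical and is not part of bead; it only shows that a constraint can be evaluated from UUIDs plus metadata, without loading the items themselves.

```python
from collections import Counter
from uuid import uuid4


def max_count_per_value(item_ids, metadata, key):
    """Return the largest number of items in the batch that share one value of `key`."""
    counts = Counter(metadata[item_id][key] for item_id in item_ids)
    return max(counts.values(), default=0)


a, b, c = uuid4(), uuid4(), uuid4()
meta = {
    a: {"verb": "put", "frame": "transitive"},
    b: {"verb": "place", "frame": "transitive"},
    c: {"verb": "drop", "frame": "transitive"},
}
batch = [a, b, c]

verb_max = max_count_per_value(batch, meta, "verb")    # every verb is distinct
frame_max = max_count_per_value(batch, meta, "frame")  # all items share one frame
```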

BeadBaseModel

All bead data structures inherit from BeadBaseModel, which provides three standard fields:

Additionally, every model includes a metadata dictionary for arbitrary key-value pairs. Metadata flows through the entire pipeline: add a field to a lexical item, and it appears in filled templates, items, and experimental lists.

Example metadata usage:

{"id": "uuid", "lemma": "run", "pos": "V", "metadata": {"frequency": 1000, "source": "verbnet"}}

The frequency field propagates through filling, item construction, and partitioning, enabling constraint-based experimental design.
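The propagation idea can be sketched as follows. This is an assumption about the general mechanism (copy the upstream metadata and add stage-specific keys), not a description of how bead merges metadata internally; the function propagate and the keys "template" and "task" are hypothetical.

```python
def propagate(parent_metadata, **own):
    """Copy the upstream metadata and layer stage-specific keys on top."""
    return {**parent_metadata, **own}


lexical_item = {
    "lemma": "run",
    "pos": "V",
    "metadata": {"frequency": 1000, "source": "verbnet"},
}

# Each downstream object inherits the upstream keys and may add its own.
filled_meta = propagate(lexical_item["metadata"], template="intransitive")
item_meta = propagate(filled_meta, task="rating")

# A downstream constraint can then filter on a field set at the lexicon stage.
is_high_frequency = item_meta["frequency"] >= 500
```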

6-Stage Pipeline

Bead implements a linear pipeline with six stages. Data flows forward through these stages, with each stage consuming output from the previous one.

Stage 1: Resources

Create lexical items and templates with optional constraints.

Output: lexicons/*.jsonl, templates/*.jsonl

Stage 2: Templates

Fill template slots with lexical items using various strategies.

Output: filled_templates/*.jsonl

Stage 3: Items

Convert filled templates into experimental items with task-specific structure.

Output: items/*.jsonl

Stage 4: Lists

Partition items into experimental lists satisfying constraints.

Output: lists/*.jsonl

Stage 5: Deployment

Generate jsPsych 8.x experiments for JATOS deployment.

Output: experiment/ directory with HTML/JavaScript/CSS

Stage 6: Training

Train models on collected data, optionally using active learning.

Output: models/ directory with trained weights and config
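The linear dependency between stages can be summarized as data. The sketch below lists the six stages with the output locations named above and finds the first stage whose outputs are missing under a project root. The stage keys and the helpers has_output and next_stage are illustrative names, not part of bead.

```python
from pathlib import Path

# Stage name -> output locations, as listed in the stage descriptions.
STAGES = [
    ("resources", ["lexicons/*.jsonl", "templates/*.jsonl"]),
    ("templates", ["filled_templates/*.jsonl"]),
    ("items", ["items/*.jsonl"]),
    ("lists", ["lists/*.jsonl"]),
    ("deployment", ["experiment/"]),
    ("training", ["models/"]),
]


def has_output(root: Path, pattern: str) -> bool:
    """True if a directory output exists or a glob pattern matches a file."""
    if pattern.endswith("/"):
        return (root / pattern).is_dir()
    return any(root.glob(pattern))


def next_stage(root: Path):
    """Name of the first stage whose outputs are missing, or None if all exist."""
    for name, patterns in STAGES:
        if not all(has_output(root, p) for p in patterns):
            return name
    return None
```

Calling next_stage(Path(".")) in a project directory would report how far the pipeline has progressed, under the assumption that outputs sit at these relative paths.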

Task Types vs Judgment Types

Bead distinguishes between task types (UI presentation) and judgment types (underlying measurement).

Task types define the interface participants see:

Judgment types describe the measurement goal:

The same judgment type may use different task types depending on experimental goals. Acceptability can use ordinal scales (rate sentence naturalness) or forced choice (which sentence is more natural).
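A minimal sketch of the distinction, using only the examples mentioned in the paragraph above. The enum and class names are hypothetical and the enums are deliberately incomplete; bead's own type names may differ.

```python
from dataclasses import dataclass
from enum import Enum


class JudgmentType(Enum):
    # What is being measured.
    ACCEPTABILITY = "acceptability"


class TaskType(Enum):
    # How the question is presented to participants.
    ORDINAL_SCALE = "ordinal_scale"
    FORCED_CHOICE = "forced_choice"


@dataclass(frozen=True)
class TrialDesign:
    judgment: JudgmentType
    task: TaskType
    prompt: str


# One judgment type, two different presentations.
rating = TrialDesign(
    JudgmentType.ACCEPTABILITY, TaskType.ORDINAL_SCALE, "How natural is this sentence?"
)
choice = TrialDesign(
    JudgmentType.ACCEPTABILITY, TaskType.FORCED_CHOICE, "Which sentence is more natural?"
)
```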

Annotation Protocols

Above the task / judgment distinction sits a separate type-theoretic layer for what a question measures and how it is phrased. The bead.protocol package factors annotation design into four roles:

QuestionFamily packages an anchor with a realization strategy and a drift guard. AnnotationProtocol sequences families into the iterated dependent product Sigma(a_1 : Q_1(ctx)). Sigma(a_2 : Q_2(ctx, a_1)). ..., threading each response into the context so that later questions can condition on earlier answers. See the protocols user guide for the full walkthrough.
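The dependent sequencing can be sketched in plain Python. This does not use the bead.protocol API; run_protocol, q1, q2, and the scripted answers are hypothetical and exist only to show how question i is built from the context plus answers a_1 through a_{i-1}.

```python
def run_protocol(questions, ctx, answer):
    """Ask questions in order; each one is built from ctx and all earlier answers."""
    answers = []
    for make_question in questions:
        question = make_question(ctx, tuple(answers))
        answers.append(answer(question))
    return answers


def q1(ctx, prior):
    # Q_1 depends only on the context.
    return f"Did the event in '{ctx}' happen?"


def q2(ctx, prior):
    # Q_2 depends on the context and on a_1.
    if prior[0] == "yes":
        return "How confident are you that it happened?"
    return "How confident are you that it did not happen?"


asked = []
replies = iter(["no", "very"])  # scripted responses standing in for an annotator


def answer(question):
    asked.append(question)
    return next(replies)


answers = run_protocol([q1, q2], "Jo left.", answer)
```

Here the wording of the second question changes because the first answer was "no", which is the conditioning behavior the dependent product describes.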

Configuration-First Design

Bead orchestrates the entire pipeline from a single YAML configuration file. The config specifies paths, strategies, constraints, and parameters for all six stages.

Benefits:

The CLI commands read config files to set default parameters, reducing typing and ensuring consistency across pipeline stages.
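The way a config file can supply CLI defaults might look like the sketch below. The dictionary stands in for a parsed YAML file, and the section name, keys, and flags are hypothetical; bead's actual configuration schema and command-line options are not shown here.

```python
import argparse

# Stand-in for the parsed YAML config (hypothetical keys).
config = {"lists": {"n_lists": 4, "strategy": "balanced"}}


def build_parser(defaults):
    """Build a parser whose defaults come from one section of the config."""
    parser = argparse.ArgumentParser(prog="partition-sketch")
    parser.add_argument("--n-lists", type=int, default=defaults.get("n_lists"))
    parser.add_argument("--strategy", default=defaults.get("strategy"))
    return parser


parser = build_parser(config["lists"])

from_config = parser.parse_args([])                 # nothing typed: config values apply
overridden = parser.parse_args(["--n-lists", "8"])  # explicit flag wins over the config
```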

Language-Agnostic Principles

Bead works with any language supported by linguistic resources like UniMorph.

Language-specific information lives in:

The constraint system uses language-neutral DSL expressions, allowing the same constraint logic across languages. Template slot patterns remain language-independent.

When adapting experiments to new languages:

  1. Create lexicons with appropriate morphological features
  2. Generate templates matching the language's syntax
  3. Use UniMorph for inflection (if available)
  4. Adjust trial timing for language reading speed

The core pipeline stages remain identical across languages.
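As a rough illustration of language-neutral constraint logic, the sketch below applies one predicate to lexicons for two languages. A plain Python function stands in for bead's constraint DSL, whose syntax is not shown in this section; the lexicon entries and the semicolon-separated, UniMorph-style feature strings are illustrative examples.

```python
def is_past_verb(entry):
    """Language-neutral check: the entry carries both the V and PST features."""
    features = set(entry["features"].split(";"))
    return {"V", "PST"} <= features


def select(lexicon, constraint):
    """Return the forms of all entries that satisfy the constraint."""
    return [entry["form"] for entry in lexicon if constraint(entry)]


english = [
    {"form": "ran", "features": "V;PST"},
    {"form": "runs", "features": "V;PRS;3;SG"},
]
spanish = [
    {"form": "corrió", "features": "V;IND;PST;3;SG;PFV"},
    {"form": "corre", "features": "V;IND;PRS;3;SG"},
]

english_past = select(english, is_past_verb)
spanish_past = select(spanish, is_past_verb)
```

Only the lexicons differ between the two calls; the constraint itself is unchanged.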

Next Steps

Now that you understand bead's architecture, explore specific pipeline stages:

For a complete command reference, see the API documentation.