bead.lists

Stage 4 of the bead pipeline: list partitioning with constraint satisfaction.

Core Classes

experiment_list

Experiment list data model for organizing experimental items.

The ExperimentList model uses stand-off annotation: it stores only item UUIDs, not full Item objects. Items are looked up by UUID against an ItemCollection or Repository.
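As a rough illustration of the stand-off pattern, the sketch below resolves a tuple of UUIDs against a plain dictionary. The `items_by_id` mapping and the `resolve_items` helper are hypothetical stand-ins for an ItemCollection or Repository lookup; they are not part of bead.

```python
from uuid import UUID, uuid4


def resolve_items(item_refs: tuple[UUID, ...], items_by_id: dict[UUID, dict]) -> list[dict]:
    """Look up each referenced UUID, preserving the order of item_refs.

    Raises KeyError if a reference points at an item that is not stored.
    """
    return [items_by_id[ref] for ref in item_refs]


# Hypothetical item store: the list itself only ever holds the UUIDs.
a, b = uuid4(), uuid4()
items_by_id = {a: {"text": "first"}, b: {"text": "second"}}
item_refs = (b, a)

resolved = resolve_items(item_refs, items_by_id)
print([item["text"] for item in resolved])
```

Because the list holds references only, the same item store can back many lists without duplicating item data.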

ConstraintSatisfaction

Bases: BeadBaseModel

Records whether a single constraint is satisfied for the list.

Attributes:

Name Type Description
constraint_id UUID

UUID of the constraint.

satisfied bool

Whether the constraint holds.

ExperimentList

Bases: BeadBaseModel

A list of experimental items selected for participant presentation.

Attributes:

Name Type Description
name str

List name (e.g. "list_0", "practice_list").

list_number int

Numeric identifier (>= 0).

item_refs tuple[UUID, ...]

UUIDs of the items in this list, in insertion order.

list_constraints tuple[ListConstraint, ...]

Constraints the list must satisfy.

constraint_satisfaction tuple[ConstraintSatisfaction, ...]

Per-constraint satisfaction records.

presentation_order tuple[UUID, ...] | None

Explicit presentation order; None falls back to item_refs.

list_metadata dict[str, MetadataValue]

Metadata for this list.

balance_metrics dict[str, MetadataValue]

Metrics about list balance.

with_item(item_id: UUID) -> Self

Return a new list with item_id appended.

without_item(item_id: UUID) -> Self

Return a new list with item_id removed.

Raises:

Type Description
ValueError

If item_id is not in the list.

with_shuffled_order(seed: int | None = None) -> Self

Return a new list whose presentation_order is a shuffled permutation of item_refs. Pass seed for a reproducible shuffle.

get_presentation_order() -> tuple[UUID, ...]

Return presentation_order if set, else item_refs.

validate_presentation_order(experiment_list: ExperimentList) -> None

Raise ValueError if presentation_order and item_refs disagree.

The order must be a permutation of the item refs (no missing, extra, or duplicated UUIDs).
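The permutation rule can be sketched with a multiset comparison. This is a minimal illustration, not bead's validator: `check_presentation_order` is a hypothetical free function and the error message wording is an assumption.

```python
from collections import Counter
from uuid import uuid4


def check_presentation_order(item_refs, presentation_order):
    """Raise ValueError unless presentation_order is a permutation of item_refs."""
    if presentation_order is None:
        return  # nothing to validate; item_refs is used as the order
    # Equal multisets means no missing, extra, or duplicated UUIDs.
    if Counter(presentation_order) != Counter(item_refs):
        missing = set(item_refs) - set(presentation_order)
        extra = set(presentation_order) - set(item_refs)
        raise ValueError(
            f"presentation_order is not a permutation of item_refs "
            f"(missing={len(missing)}, extra={len(extra)})"
        )


a, b, c = uuid4(), uuid4(), uuid4()
check_presentation_order((a, b, c), (c, a, b))  # valid permutation, no error
```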

list_collection

List collection data model.

The ListCollection model groups multiple ExperimentList instances together with metadata describing the partitioning process that produced them.

CoverageValidationResult

Bases: TypedDict

Outcome of a coverage check across a ListCollection.

ListCollection

Bases: BeadBaseModel

A collection of experimental lists with partitioning metadata.

Attributes:

Name Type Description
name str

Collection name.

source_items_id UUID

UUID of the source ItemCollection.

partitioning_strategy str

Strategy name (e.g. "balanced", "random").

lists tuple[ExperimentList, ...]

Member experiment lists.

partitioning_config dict[str, MetadataValue]

Configuration for the partitioning process.

partitioning_stats dict[str, MetadataValue]

Statistics from the partitioning process.

with_list(exp_list: ExperimentList) -> Self

Return a new collection with exp_list appended.

get_list_by_number(list_number: int) -> ExperimentList | None

Return the list with the matching list_number, or None.

get_all_item_refs() -> tuple[UUID, ...]

Return every distinct item UUID referenced across all lists.

validate_coverage(all_item_ids: set[UUID]) -> CoverageValidationResult

Check that every item in all_item_ids is assigned exactly once.

Returns a report with keys valid, missing_items, duplicate_items, and total_assigned.
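A minimal sketch of such a coverage check follows, assuming the report is a plain dict with the four keys named above. The value types, the sort order, and whether total_assigned counts repeated assignments are assumptions; `coverage_report` is a hypothetical helper, not the method itself.

```python
from collections import Counter
from uuid import UUID, uuid4


def coverage_report(lists: list[list[UUID]], all_item_ids: set[UUID]) -> dict:
    """Report items that are never assigned or assigned more than once."""
    counts = Counter(uid for lst in lists for uid in lst)
    missing = all_item_ids - set(counts)
    duplicates = {uid for uid, n in counts.items() if n > 1}
    return {
        "valid": not missing and not duplicates,
        "missing_items": sorted(missing, key=str),
        "duplicate_items": sorted(duplicates, key=str),
        # Assumption: every assignment is counted, including repeats.
        "total_assigned": sum(counts.values()),
    }


a, b, c = uuid4(), uuid4(), uuid4()
report = coverage_report([[a, b], [b]], {a, b, c})
print(report["valid"], report["total_assigned"])
```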

to_jsonl(path: Path | str) -> None

Write each contained list as a JSONL line at path.

from_jsonl(path: Path | str, name: str = 'loaded_lists', source_items_id: UUID | None = None, partitioning_strategy: str = 'unknown') -> ListCollection classmethod

Build a collection from a JSONL file of experiment lists.

Constraints

constraints

Constraint models for experimental list composition.

List-level constraints govern composition of a single list (uniqueness, balance, quantile distribution, size, ordering, etc.). Batch-level constraints govern composition across a collection of lists (coverage, balance, diversity, minimum occurrence). Each family is a discriminated union rooted at ListConstraint / BatchConstraint; subclass construction takes the matching constraint_type value.

ListConstraint

Bases: BeadBaseModel, TaggedUnion

Discriminated union root for list-level constraints.

BatchConstraint

Bases: BeadBaseModel, TaggedUnion

Discriminated union root for batch-level constraints.

UniquenessConstraint

Bases: ListConstraint

No two items in a list share the same value of property_expression.

Attributes:

Name Type Description
property_expression str

DSL expression returning the value that must be unique across the list.

context dict[str, ContextValue]

Extra DSL evaluation context.

allow_null bool

Allow multiple items with a None value.

priority int

Higher values are weighted more heavily during partitioning.
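The effect of allow_null can be sketched on values that have already been extracted from the items. The DSL evaluation step is omitted here, and `is_unique` is a hypothetical helper rather than the library's checker.

```python
from collections import Counter


def is_unique(values, allow_null=False):
    """Return True if no value repeats; optionally ignore repeated None values."""
    counts = Counter(values)
    if allow_null:
        counts.pop(None, None)  # None may appear any number of times
    return all(n == 1 for n in counts.values())


print(is_unique(["verb_a", "verb_b", None, None], allow_null=True))
print(is_unique(["verb_a", "verb_b", None, None], allow_null=False))
```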

ConditionalUniquenessConstraint

Bases: ListConstraint

Uniqueness applied only when condition_expression evaluates true.

Attributes:

Name Type Description
property_expression str

DSL expression returning the value that must be unique.

condition_expression str

DSL boolean expression gating constraint application.

context dict[str, ContextValue]

Extra DSL evaluation context.

allow_null bool

Allow multiple items with a None value.

priority int

Constraint priority.

BalanceConstraint

Bases: ListConstraint

Balanced distribution of a categorical property within a list.

Attributes:

Name Type Description
property_expression str

DSL expression returning the category value.

context dict[str, ContextValue]

Extra DSL evaluation context.

target_counts dict[str, int] | None

Target counts per category. None means equal distribution.

tolerance float

Allowed deviation from target as a proportion (0.0-1.0).

priority int

Constraint priority.
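One possible reading of tolerance is sketched below: each category count may deviate from its target by at most tolerance times the target. That interpretation, and the `is_balanced` helper itself, are assumptions made for illustration; the source does not spell out the exact formula.

```python
from collections import Counter


def is_balanced(categories, target_counts=None, tolerance=0.1):
    """Check category counts against targets within a proportional tolerance."""
    counts = Counter(categories)
    if not counts:
        return True  # an empty list has nothing to balance
    if target_counts is None:
        # None means equal distribution across the observed categories.
        even_share = len(categories) / len(counts)
        target_counts = {category: even_share for category in counts}
    return all(
        abs(counts.get(category, 0) - target) <= tolerance * target
        for category, target in target_counts.items()
    )


print(is_balanced(["a"] * 5 + ["b"] * 5))
print(is_balanced(["a"] * 8 + ["b"] * 2))
```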

QuantileConstraint

Bases: ListConstraint

Uniform distribution of items across quantiles of a numeric property.

Attributes:

Name Type Description
property_expression str

DSL expression returning the numeric value to quantile.

context dict[str, ContextValue]

Extra DSL evaluation context.

n_quantiles int

Number of quantiles to create (>= 2).

items_per_quantile int

Target items per quantile (>= 1).

priority int

Constraint priority.

GroupedQuantileConstraint

Bases: ListConstraint

Quantile uniformity applied within groups defined by another expression.

Attributes:

Name Type Description
property_expression str

DSL expression returning the numeric value to quantile.

group_by_expression str

DSL expression returning the grouping key.

context dict[str, ContextValue]

Extra DSL evaluation context.

n_quantiles int

Quantiles per group.

items_per_quantile int

Target items per quantile per group.

priority int

Constraint priority.

DiversityConstraint

Bases: ListConstraint

Minimum number of unique values for a property within a list.

Attributes:

Name Type Description
property_expression str

DSL expression returning the value to count for diversity.

min_unique_values int

Minimum number of unique values required (>= 1).

context dict[str, ContextValue]

Extra DSL evaluation context.

priority int

Constraint priority.

SizeConstraint

Bases: ListConstraint

Size requirements for a list.

Specify exact_size, or min_size and/or max_size.

Attributes:

Name Type Description
min_size int | None

Minimum list size.

max_size int | None

Maximum list size.

exact_size int | None

Exact required size; mutually exclusive with min_size / max_size.

priority int

Constraint priority.
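The mutual-exclusion rule between exact_size and the min/max bounds can be sketched as a plain function. `check_size` is hypothetical and only mirrors the rule described above; the real model presumably enforces it at construction time.

```python
def check_size(n_items, min_size=None, max_size=None, exact_size=None):
    """Return True if n_items meets the size requirement."""
    if exact_size is not None and (min_size is not None or max_size is not None):
        raise ValueError("exact_size cannot be combined with min_size or max_size")
    if exact_size is not None:
        return n_items == exact_size
    if min_size is not None and n_items < min_size:
        return False
    if max_size is not None and n_items > max_size:
        return False
    return True


print(check_size(20, exact_size=20))
print(check_size(25, min_size=10, max_size=20))
```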

OrderingPair

Bases: BeadBaseModel

Precedence relation between two items in a list.

Attributes:

Name Type Description
before UUID

Item that must appear earlier in the list.

after UUID

Item that must appear later in the list.

OrderingConstraint

Bases: ListConstraint

Item presentation order requirements.

Enforced primarily at jsPsych runtime; the Python model stores the specification.

Attributes:

Name Type Description
precedence_pairs tuple[OrderingPair, ...]

Pairs (before, after) requiring before to precede after.

no_adjacent_property str | None

Property path; items sharing a value cannot be adjacent.

block_by_property str | None

Property path used to group items into contiguous blocks.

min_distance int | None

Minimum item separation between equal-property neighbours.

max_distance int | None

Maximum span between start and end of a property block.

practice_item_property str | None

Property identifying practice items, which precede main items.

randomize_within_blocks bool

Randomize order within property blocks.

priority int

Constraint priority (unused for static partitioning).
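Although enforcement happens mainly at jsPsych runtime, two of the checks are easy to sketch in Python for offline inspection of an order. Both helpers below are hypothetical, and the `condition` mapping stands in for whatever property lookup the runtime uses.

```python
from uuid import uuid4


def satisfies_precedence(order, pairs):
    """True if every (before, after) pair appears in that relative order."""
    position = {item: index for index, item in enumerate(order)}
    return all(position[before] < position[after] for before, after in pairs)


def has_adjacent_repeat(order, property_of):
    """True if any two neighbouring items share the same property value."""
    return any(property_of[x] == property_of[y] for x, y in zip(order, order[1:]))


practice, first, second = uuid4(), uuid4(), uuid4()
order = [practice, first, second]
condition = {practice: "practice", first: "A", second: "A"}

print(satisfies_precedence(order, [(practice, first)]))
print(has_adjacent_repeat(order, condition))
```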

BatchCoverageConstraint

Bases: BatchConstraint

All values of property_expression appear somewhere in the batch.

Attributes:

Name Type Description
property_expression str

DSL expression returning the property value to cover.

context dict[str, ContextValue]

Extra DSL evaluation context.

target_values tuple[str | int | float, ...] | None

Values that must be covered. None uses every observed value.

min_coverage float

Minimum fraction of target values that must appear (0.0-1.0).

priority int

Constraint priority.
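A coverage fraction of this kind might be computed as in the sketch below, which works on property values already extracted per list. The `coverage_fraction` helper and its `pool_values` parameter (the values observed in the full item pool, used when target_values is None) are assumptions for illustration.

```python
def coverage_fraction(batch_values, target_values=None, pool_values=()):
    """Fraction of target values that appear in at least one list of the batch."""
    observed = {value for values in batch_values for value in values}
    targets = set(target_values) if target_values is not None else set(pool_values)
    if not targets:
        return 1.0  # nothing to cover
    return len(observed & targets) / len(targets)


fraction = coverage_fraction([[0, 1], [1, 2]], target_values=range(4))
print(fraction, fraction >= 1.0)
```

With min_coverage=1.0 the batch above would fail, since target value 3 never appears.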

BatchBalanceConstraint

Bases: BatchConstraint

Balanced distribution of a categorical property across the entire batch.

Attributes:

Name Type Description
property_expression str

DSL expression returning the category value.

target_distribution dict[str, float]

Target proportions per category (values sum to ~1.0).

context dict[str, ContextValue]

Extra DSL evaluation context.

tolerance float

Allowed deviation from target.

priority int

Constraint priority.

BatchDiversityConstraint

Bases: BatchConstraint

No single value appears in too many lists.

Attributes:

Name Type Description
property_expression str

DSL expression returning the property value.

max_lists_per_value int

Maximum lists any value may appear in (>= 1).

context dict[str, ContextValue]

Extra DSL evaluation context.

priority int

Constraint priority.
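Counting how many lists each value appears in is the core of this check. The sketch below assumes the property values have already been extracted per list; `lists_per_value` is a hypothetical helper.

```python
from collections import Counter


def lists_per_value(batch_values):
    """Count the number of lists in which each value appears at least once."""
    counts = Counter()
    for values in batch_values:
        counts.update(set(values))  # a value counts once per list
    return counts


counts = lists_per_value([["a", "a", "b"], ["a"], ["c"]])
max_lists_per_value = 1
violations = sorted(v for v, n in counts.items() if n > max_lists_per_value)
print(violations)
```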

BatchMinOccurrenceConstraint

Bases: BatchConstraint

Each value of property_expression appears at least min_occurrences times.

Attributes:

Name Type Description
property_expression str

DSL expression returning the property value.

min_occurrences int

Minimum total occurrences across all lists (>= 1).

context dict[str, ContextValue]

Extra DSL evaluation context.

priority int

Constraint priority.

Partitioning

partitioner

List partitioning for experimental item distribution.

This module provides the ListPartitioner class for partitioning items into balanced experimental lists. It implements three strategies (random, balanced, and stratified) and follows the stand-off annotation pattern, working with UUIDs only.

ListPartitioner

Partitions items into balanced experimental lists.

Uses stand-off annotation: it stores only UUIDs, not full item objects. Constraint checking and balancing require an item metadata dict.

Implements three partitioning strategies:
  • Random: Simple round-robin after shuffling
  • Balanced: Greedy algorithm to minimize constraint violations
  • Stratified: Quantile-based stratification with balanced distribution
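The random strategy, described above as round-robin after shuffling, can be sketched with the standard library alone. This is an illustration under that description, not the ListPartitioner code; `round_robin_partition` is a hypothetical name and it returns bare UUID lists rather than ExperimentList objects.

```python
import random
from uuid import uuid4


def round_robin_partition(items, n_lists, seed=None):
    """Shuffle the items, then deal them out to n_lists in turn."""
    if n_lists < 1:
        raise ValueError("n_lists must be >= 1")
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Slice i takes every n_lists-th item starting at offset i.
    return [shuffled[i::n_lists] for i in range(n_lists)]


items = [uuid4() for _ in range(10)]
lists = round_robin_partition(items, n_lists=3, seed=42)
print([len(lst) for lst in lists])
```

Dealing in turn keeps list sizes within one item of each other, and a fixed seed reproduces the same partition.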

Parameters:

Name Type Description Default
random_seed int | None

Random seed for reproducibility.

None

Attributes:

Name Type Description
random_seed int | None

Random seed for reproducibility.

Examples:

>>> from uuid import uuid4
>>> partitioner = ListPartitioner(random_seed=42)
>>> items = [uuid4() for _ in range(100)]
>>> metadata = {uid: {"property": i} for i, uid in enumerate(items)}
>>> lists = partitioner.partition(items, n_lists=5, metadata=metadata)
>>> len(lists)
5

partition(items: list[UUID], n_lists: int, constraints: list[ListConstraint] | None = None, strategy: str = 'balanced', metadata: MetadataDict | None = None) -> list[ExperimentList]

Partition items into lists.

Parameters:

Name Type Description Default
items list[UUID]

Item UUIDs to partition.

required
n_lists int

Number of lists to create.

required
constraints list[ListConstraint] | None

Constraints to satisfy.

None
strategy str

Partitioning strategy ("balanced", "random", "stratified").

"balanced"
metadata dict[UUID, dict[str, Any]] | None

Metadata for each item UUID. Required for constraint checking.

None

Returns:

Type Description
list[ExperimentList]

The partitioned lists.

Raises:

Type Description
ValueError

If strategy is unknown or n_lists < 1.

partition_with_batch_constraints(items: list[UUID], n_lists: int, list_constraints: list[ListConstraint] | None = None, batch_constraints: list[BatchConstraint] | None = None, strategy: str = 'balanced', metadata: MetadataDict | None = None, max_iterations: int = 1000, tolerance: float = 0.05) -> list[ExperimentList]

Partition items with batch-level constraints.

Creates an initial partition using standard partitioning, then iteratively refines it through item swaps between lists until the batch constraints are satisfied.

Parameters:

Name Type Description Default
items list[UUID]

Item UUIDs to partition.

required
n_lists int

Number of lists to create.

required
list_constraints list[ListConstraint] | None

Per-list constraints to satisfy.

None
batch_constraints list[BatchConstraint] | None

Batch-level constraints to satisfy.

None
strategy str

Initial partitioning strategy ("balanced", "random", "stratified").

"balanced"
metadata dict[UUID, dict[str, Any]] | None

Metadata for each item UUID.

None
max_iterations int

Maximum refinement iterations.

1000
tolerance float

Tolerance for batch constraint satisfaction (score >= 1.0 - tolerance).

0.05

Returns:

Type Description
list[ExperimentList]

Partitioned lists satisfying both list and batch constraints.

Examples:

>>> from bead.lists.constraints import BatchCoverageConstraint
>>> partitioner = ListPartitioner(random_seed=42)
>>> constraint = BatchCoverageConstraint(
...     property_expression="item['template_id']",
...     target_values=tuple(range(26)),
...     min_coverage=1.0
... )
>>> lists = partitioner.partition_with_batch_constraints(
...     items=item_uids,
...     n_lists=8,
...     batch_constraints=[constraint],
...     metadata=metadata_dict,
...     max_iterations=500
... )
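The swap-based refinement can be pictured as a simple hill climb. The sketch below is an assumption-laden illustration: the random choice of swap candidates, the rule of keeping only improving swaps, and the `refine_by_swaps` helper with its `score` callback are all hypothetical. Only the stopping rule (score >= 1.0 - tolerance, or max_iterations reached) comes from the parameter descriptions above.

```python
import random


def refine_by_swaps(lists, score, max_iterations=1000, tolerance=0.05, seed=None):
    """Swap items between random pairs of lists, keeping swaps that raise the score."""
    rng = random.Random(seed)
    lists = [list(lst) for lst in lists]  # work on copies
    best = score(lists)
    if len(lists) < 2:
        return lists, best
    for _ in range(max_iterations):
        if best >= 1.0 - tolerance:
            break  # batch constraints satisfied within tolerance
        i, j = rng.sample(range(len(lists)), 2)
        if not lists[i] or not lists[j]:
            continue
        a, b = rng.randrange(len(lists[i])), rng.randrange(len(lists[j]))
        lists[i][a], lists[j][b] = lists[j][b], lists[i][a]
        candidate = score(lists)
        if candidate > best:
            best = candidate
        else:
            # Undo a swap that did not help.
            lists[i][a], lists[j][b] = lists[j][b], lists[i][a]
    return lists, best


def both_categories(lists):
    """Toy batch score: fraction of lists containing both 'x' and 'y'."""
    return sum({"x", "y"} <= set(lst) for lst in lists) / len(lists)


refined, final_score = refine_by_swaps([["x", "x"], ["y", "y"]], both_categories, seed=0)
print(final_score)
```

Swaps preserve list sizes and the overall multiset of items, so any size guarantees from the initial partition carry over.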

stratification

Stratification utilities for quantile-based item assignment.

This module provides utilities for assigning items to quantile bins based on numeric properties, with optional stratification by grouping variables.

assign_quantiles(items: list[T], property_getter: Callable[[T], float], n_quantiles: int = 10, stratify_by: Callable[[T], Hashable] | None = None) -> dict[T, int]

Assign quantile bins to items based on a numeric property.

Divides items into n_quantiles bins based on the distribution of a numeric property extracted via property_getter. Optionally stratifies by a grouping variable, computing separate quantiles for each group.

Parameters:

Name Type Description Default
items list[T]

List of items to assign to quantile bins.

required
property_getter Callable[[T], float]

Function that extracts a numeric value from each item. This value is used to compute quantiles.

required
n_quantiles int

Number of quantile bins (default: 10 for deciles). Must be >= 2.

10
stratify_by Callable[[T], Hashable] | None

Optional function that extracts a grouping variable from each item. If provided, quantiles are computed separately for each group. Groups must be hashable (str, int, UUID, tuple, etc.).

None

Returns:

Type Description
dict[T, int]

Dictionary mapping each item to its quantile bin (0 to n_quantiles-1).

Raises:

Type Description
ValueError

If n_quantiles < 2 or the items list is empty.

Examples:

Basic usage with simple numeric values:

>>> items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> result = assign_quantiles(
...     items,
...     property_getter=lambda x: x,
...     n_quantiles=4
... )
>>> result[1]  # First item in lowest quartile
0
>>> result[10]  # Last item in highest quartile
3

With Item objects and stratification:

>>> from bead.items.item import Item
>>> from uuid import uuid4
>>> items = [
...     Item(item_template_id=uuid4(), item_metadata={"score": 10.5, "group": "A"}),
...     Item(item_template_id=uuid4(), item_metadata={"score": 5.2, "group": "A"}),
...     Item(item_template_id=uuid4(), item_metadata={"score": 8.1, "group": "B"}),
...     Item(item_template_id=uuid4(), item_metadata={"score": 3.3, "group": "B"}),
... ]
>>> result = assign_quantiles(
...     items,
...     property_getter=lambda x: x.item_metadata["score"],
...     n_quantiles=2,
...     stratify_by=lambda x: x.item_metadata["group"]
... )

With UUID keys (common pattern):

>>> from uuid import uuid4
>>> item_uuids = [uuid4() for _ in range(100)]
>>> item_scores = {uid: float(i) for i, uid in enumerate(item_uuids)}
>>> result = assign_quantiles(
...     item_uuids,
...     property_getter=lambda uid: item_scores[uid],
...     n_quantiles=10
... )
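To make the binning concrete, here is a rank-based sketch written with the standard library. It is a swapped-in technique, not the function's actual implementation, which may compute percentile cut points and treat ties differently. `rank_quantiles` and its parameter names are hypothetical.

```python
from collections import defaultdict


def rank_quantiles(items, value_of, n_quantiles=10, group_of=None):
    """Assign each item a bin in 0..n_quantiles-1 by its rank within its group."""
    if n_quantiles < 2 or not items:
        raise ValueError("need n_quantiles >= 2 and a non-empty items list")
    groups = defaultdict(list)
    for item in items:
        # Without stratification, every item falls into one shared group.
        groups[group_of(item) if group_of else None].append(item)
    bins = {}
    for members in groups.values():
        ranked = sorted(members, key=value_of)
        for rank, item in enumerate(ranked):
            bins[item] = rank * n_quantiles // len(ranked)
    return bins


bins = rank_quantiles(list(range(1, 11)), value_of=lambda x: x, n_quantiles=4)
print(bins[1], bins[10])
```

On the ten-value example above, this sketch places 1 in bin 0 and 10 in bin 3, matching the documented endpoints.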

assign_quantiles_by_uuid(item_ids: list[UUID], item_metadata: dict[UUID, dict[str, MetadataValue]], property_key: str, n_quantiles: int = 10, stratify_by_key: str | None = None) -> dict[UUID, int]

Assign quantile bins to items by UUID with metadata lookup.

Convenience function for the common pattern of working with UUIDs and metadata dictionaries (stand-off annotation pattern).

Parameters:

Name Type Description Default
item_ids list[UUID]

List of item UUIDs.

required
item_metadata dict[UUID, dict[str, MetadataValue]]

Metadata dictionary mapping UUIDs to their metadata dicts.

required
property_key str

Key in item_metadata[uuid] dict to use for quantile computation.

required
n_quantiles int

Number of quantile bins (default: 10).

10
stratify_by_key str | None

Optional key in metadata dict to use for stratification.

None

Returns:

Type Description
dict[UUID, int]

Dictionary mapping each UUID to its quantile bin (0 to n_quantiles-1).

Raises:

Type Description
ValueError

If property_key is missing from any item's metadata.

KeyError

If any UUID is not found in item_metadata.

Examples:

>>> from uuid import uuid4
>>> uuids = [uuid4() for _ in range(100)]
>>> metadata = {
...     uid: {"score": float(i), "group": "A" if i < 50 else "B"}
...     for i, uid in enumerate(uuids)
... }
>>> result = assign_quantiles_by_uuid(
...     uuids,
...     metadata,
...     property_key="score",
...     n_quantiles=4,
...     stratify_by_key="group"
... )

Balancing

balancer

Quantile balancing for experimental list partitioning.

This module provides the QuantileBalancer class for ensuring uniform distribution of items across quantiles of a numeric property. It uses NumPy for efficient quantile computation and keeps to the stand-off annotation pattern (it works with UUIDs).

QuantileBalancer

Ensures uniform distribution of items across quantiles.

Used by the stratified partitioning strategy to create a balanced distribution of numeric properties (e.g., LM probabilities, word frequencies).

Works with UUIDs only (stand-off annotation). Requires a value_func callable that returns the numeric value for an item given its UUID.

Parameters:

Name Type Description Default
n_quantiles int

Number of quantiles to create (must be >= 2).

5
random_seed int | None

Random seed for reproducibility. If None, uses non-deterministic RNG.

None

Attributes:

Name Type Description
n_quantiles int

Number of quantiles to create.

random_seed int | None

Random seed for reproducibility.

Examples:

>>> from uuid import uuid4
>>> import numpy as np
>>> balancer = QuantileBalancer(n_quantiles=5, random_seed=42)
>>> # Create items with known values
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> value_func = lambda uid: values[uid]
>>> # Balance across 4 lists, 5 items per quantile per list
>>> lists = balancer.balance(items, value_func, n_lists=4,
...                          items_per_quantile_per_list=5)
>>> len(lists)
4

balance(item_ids: list[UUID], value_func: Callable[[UUID], float], n_lists: int, items_per_quantile_per_list: int) -> list[list[UUID]]

Balance items across lists and quantiles.

Distributes items uniformly across quantiles and lists to ensure balanced representation of the numeric property across all lists.

Parameters:

Name Type Description Default
item_ids list[UUID]

UUIDs of items to balance.

required
value_func Callable[[UUID], float]

Function to extract numeric value from item UUID.

required
n_lists int

Number of lists to create.

required
items_per_quantile_per_list int

Target number of items per quantile per list.

required

Returns:

Type Description
list[list[UUID]]

Balanced lists of item UUIDs.

Raises:

Type Description
ValueError

If n_lists < 1 or items_per_quantile_per_list < 1.

Examples:

>>> from uuid import uuid4
>>> balancer = QuantileBalancer(n_quantiles=5, random_seed=42)
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> lists = balancer.balance(items, lambda uid: values[uid], 4, 5)
>>> all(len(lst) == 25 for lst in lists)  # 5 quantiles * 5 items
True

Notes:
  • Items are assigned to quantiles using np.percentile and np.digitize
  • Within each quantile, items are shuffled before distribution
  • If insufficient items exist in a quantile, fewer items are assigned
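The three notes above can be sketched end to end with the standard library. Rank-based binning is substituted here for the np.percentile and np.digitize calls named in the notes, so bin boundaries may differ from the real class on tied or skewed data. `balance_sketch` is a hypothetical function, not QuantileBalancer.balance.

```python
import random
from uuid import uuid4


def balance_sketch(item_ids, value_func, n_lists, items_per_quantile_per_list,
                   n_quantiles=5, seed=None):
    """Bin items by rank, shuffle each bin, then give each list a slice of each bin."""
    if n_lists < 1 or items_per_quantile_per_list < 1:
        raise ValueError("n_lists and items_per_quantile_per_list must be >= 1")
    rng = random.Random(seed)
    ranked = sorted(item_ids, key=value_func)
    quantiles = [[] for _ in range(n_quantiles)]
    for rank, uid in enumerate(ranked):
        quantiles[rank * n_quantiles // len(ranked)].append(uid)
    lists = [[] for _ in range(n_lists)]
    for members in quantiles:
        rng.shuffle(members)
        for k, lst in enumerate(lists):
            start = k * items_per_quantile_per_list
            # Slicing past the end yields fewer items when a bin runs short.
            lst.extend(members[start:start + items_per_quantile_per_list])
    return lists


items = [uuid4() for _ in range(100)]
values = {uid: float(i) for i, uid in enumerate(items)}
lists = balance_sketch(items, values.__getitem__, n_lists=4, items_per_quantile_per_list=5)
print([len(lst) for lst in lists])
```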

compute_balance_score(item_ids: list[UUID], value_func: Callable[[UUID], float]) -> float

Compute balance score for items.

Score is 1.0 for perfect balance (uniform distribution across quantiles), lower for imbalanced distributions. Score is based on deviation from expected uniform distribution.

Parameters:

Name Type Description Default
item_ids list[UUID]

UUIDs of items to score.

required
value_func Callable[[UUID], float]

Function to extract numeric values.

required

Returns:

Type Description
float

Balance score (0.0-1.0, higher is better).

Examples:

>>> from uuid import uuid4
>>> balancer = QuantileBalancer(n_quantiles=5)
>>> # Uniformly distributed values
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> score = balancer.compute_balance_score(items, lambda uid: values[uid])
>>> score > 0.9  # Should be close to 1.0
True

Notes:
  • Returns 0.0 for empty item lists
  • Uses mean absolute deviation from expected uniform count
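One way to turn mean absolute deviation into a 0.0-1.0 score is sketched below. The normalisation (dividing by the expected count and clamping at zero) is an assumption, since the source states only that the score is based on deviation from the expected uniform count. `balance_score` is a hypothetical helper that takes per-quantile counts directly.

```python
def balance_score(bin_counts):
    """Score per-quantile counts: 1.0 when uniform, lower as counts diverge."""
    total = sum(bin_counts)
    if total == 0:
        return 0.0  # empty input scores 0.0, as in the notes above
    expected = total / len(bin_counts)
    mean_abs_dev = sum(abs(count - expected) for count in bin_counts) / len(bin_counts)
    # Assumed normalisation: deviation relative to the expected count, clamped to 0.
    return max(0.0, 1.0 - mean_abs_dev / expected)


print(balance_score([20, 20, 20, 20, 20]))
print(balance_score([100, 0, 0, 0, 0]))
```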