bead.lists

Stage 4 of the bead pipeline: list partitioning with constraint satisfaction.

Core Classes

experiment_list

Experiment list data model for organizing experimental items.

This module provides the ExperimentList model for organizing experimental items into lists for presentation to participants. Lists use stand-off annotation with UUID references to items rather than embedding full item objects.

The model supports:

- Item assignment tracking via UUIDs
- Presentation order specification
- Constraint satisfaction tracking
- Balance metrics computation

ExperimentList

Bases: BeadBaseModel

A list of experimental items for participant presentation.

Uses stand-off annotation: stores only item UUIDs, not full items. Items can be looked up by UUID from an ItemCollection or Repository.

Attributes:

Name Type Description
name str

Name of this list (e.g., "list_0", "practice_list").

list_number int

Numeric identifier for this list (must be >= 0).

item_refs list[UUID]

UUIDs of items in this list (stand-off annotation).

list_constraints list[ListConstraint]

Constraints this list must satisfy.

constraint_satisfaction dict[UUID, bool]

Map of constraint UUIDs to satisfaction status.

presentation_order list[UUID] | None

Explicit presentation order (if None, use item_refs order). Must contain exactly the same UUIDs as item_refs.

list_metadata dict[str, Any]

Metadata for this list.

balance_metrics dict[str, Any]

Metrics about list balance (e.g., distribution statistics).

Examples:

>>> from uuid import uuid4
>>> exp_list = ExperimentList(
...     name="list_0",
...     list_number=0
... )
>>> item_id = uuid4()
>>> exp_list.add_item(item_id)
>>> len(exp_list.item_refs)
1
>>> exp_list.shuffle_order(seed=42)
>>> exp_list.get_presentation_order()[0] == item_id
True

validate_name(v: str) -> str classmethod

Validate name is non-empty.

Parameters:

Name Type Description Default
v str

Name to validate.

required

Returns:

Type Description
str

Validated name (whitespace stripped).

Raises:

Type Description
ValueError

If name is empty or contains only whitespace.

validate_presentation_order() -> ExperimentList

Validate presentation_order matches item_refs.

If presentation_order is set, it must contain exactly the same UUIDs as item_refs (no more, no less, no duplicates).

Returns:

Type Description
ExperimentList

Validated list.

Raises:

Type Description
ValueError

If presentation_order doesn't match item_refs.
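The documented rule can be sketched in plain Python. This is a hypothetical standalone restatement of the validator's logic (the function name and multiset comparison are illustrative assumptions, not the library's actual code):

```python
from collections import Counter
from uuid import uuid4

def check_presentation_order(item_refs, presentation_order):
    """Sketch: if presentation_order is set, it must contain exactly the
    same UUIDs as item_refs (no extras, no omissions, no duplicates)."""
    if presentation_order is None:
        return  # order defaults to item_refs
    if Counter(presentation_order) != Counter(item_refs):
        raise ValueError("presentation_order must match item_refs exactly")

a, b = uuid4(), uuid4()
check_presentation_order([a, b], [b, a])      # OK: same UUIDs, reordered
try:
    check_presentation_order([a, b], [a, a])  # duplicate a, missing b
except ValueError as exc:
    print(exc)
```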

add_item(item_id: UUID) -> None

Add an item to this list.

Parameters:

Name Type Description Default
item_id UUID

UUID of item to add.

required

Examples:

>>> from uuid import uuid4
>>> exp_list = ExperimentList(name="test", list_number=0)
>>> item_id = uuid4()
>>> exp_list.add_item(item_id)
>>> item_id in exp_list.item_refs
True

remove_item(item_id: UUID) -> None

Remove an item from this list.

Parameters:

Name Type Description Default
item_id UUID

UUID of item to remove.

required

Raises:

Type Description
ValueError

If item_id is not in the list.

Examples:

>>> from uuid import uuid4
>>> exp_list = ExperimentList(name="test", list_number=0)
>>> item_id = uuid4()
>>> exp_list.add_item(item_id)
>>> exp_list.remove_item(item_id)
>>> item_id in exp_list.item_refs
False

shuffle_order(seed: int | None = None) -> None

Shuffle presentation order.

Creates a randomized presentation order from item_refs. Uses random.Random(seed) for reproducible shuffling.

Parameters:

Name Type Description Default
seed int | None

Random seed for reproducibility.

None

Examples:

>>> from uuid import uuid4
>>> exp_list = ExperimentList(name="test", list_number=0)
>>> exp_list.add_item(uuid4())
>>> exp_list.add_item(uuid4())
>>> exp_list.shuffle_order(seed=42)
>>> exp_list.presentation_order is not None
True
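Reproducibility follows from seeding random.Random: the same seed always yields the same permutation. A standalone stdlib illustration of that property (the helper is hypothetical, not bead code):

```python
import random
from uuid import uuid4

items = [uuid4() for _ in range(5)]

def shuffled(refs, seed):
    order = list(refs)  # copy so the original sequence is untouched
    random.Random(seed).shuffle(order)
    return order

# Same seed, same permutation; the result is always a reordering of the input.
assert shuffled(items, seed=42) == shuffled(items, seed=42)
assert sorted(shuffled(items, seed=42), key=str) == sorted(items, key=str)
```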

get_presentation_order() -> list[UUID]

Get the presentation order.

Returns presentation_order if set, otherwise returns item_refs.

Returns:

Type Description
list[UUID]

UUIDs in presentation order.

Examples:

>>> from uuid import uuid4
>>> exp_list = ExperimentList(name="test", list_number=0)
>>> item_id = uuid4()
>>> exp_list.add_item(item_id)
>>> exp_list.get_presentation_order()[0] == item_id
True

list_collection

List collection data model for managing multiple experimental lists.

This module provides the ListCollection model for managing multiple ExperimentList instances along with metadata about the partitioning process that created them.

The model supports:

- Multiple experimental lists
- Partitioning metadata tracking
- Coverage validation (ensuring all items are assigned exactly once)
- List lookup by number
- JSONL serialization (one list per line)

CoverageValidationResult

Bases: TypedDict

Result of coverage validation.
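Based on the report keys documented under validate_coverage below, the TypedDict plausibly looks like this (a sketch inferred from those docs, not the library source):

```python
from typing import TypedDict
from uuid import UUID

class CoverageValidationResult(TypedDict):
    valid: bool                   # whether validation passed
    missing_items: list[UUID]     # items not assigned to any list
    duplicate_items: list[UUID]   # items assigned to multiple lists
    total_assigned: int           # total assignments across all lists

result: CoverageValidationResult = {
    "valid": True,
    "missing_items": [],
    "duplicate_items": [],
    "total_assigned": 3,
}
```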

ListCollection

Bases: BeadBaseModel

A collection of experimental lists.

Contains multiple ExperimentList instances along with metadata about the partitioning process that created them.

Attributes:

Name Type Description
name str

Name of this collection.

source_items_id UUID

UUID of source ItemCollection.

lists list[ExperimentList]

The experimental lists.

partitioning_strategy str

Strategy used for partitioning (e.g., "balanced", "random", "stratified").

partitioning_config dict[str, Any]

Configuration for partitioning.

partitioning_stats dict[str, Any]

Statistics about the partitioning process.

Examples:

>>> from uuid import uuid4
>>> collection = ListCollection(
...     name="my_lists",
...     source_items_id=uuid4(),
...     partitioning_strategy="balanced"
... )
>>> exp_list = ExperimentList(name="list_0", list_number=0)
>>> collection.add_list(exp_list)
>>> len(collection.lists)
1

validate_non_empty_string(v: str) -> str classmethod

Validate string fields are non-empty.

Parameters:

Name Type Description Default
v str

String to validate.

required

Returns:

Type Description
str

Validated string (whitespace stripped).

Raises:

Type Description
ValueError

If string is empty or contains only whitespace.

validate_unique_list_numbers(v: list[ExperimentList]) -> list[ExperimentList] classmethod

Validate all list_numbers are unique.

Parameters:

Name Type Description Default
v list[ExperimentList]

Lists to validate.

required

Returns:

Type Description
list[ExperimentList]

Validated lists.

Raises:

Type Description
ValueError

If duplicate list_numbers found.

add_list(exp_list: ExperimentList) -> None

Add a list to the collection.

Parameters:

Name Type Description Default
exp_list ExperimentList

List to add.

required

Examples:

>>> from uuid import uuid4
>>> collection = ListCollection(
...     name="test",
...     source_items_id=uuid4(),
...     partitioning_strategy="balanced"
... )
>>> exp_list = ExperimentList(name="list_0", list_number=0)
>>> collection.add_list(exp_list)
>>> len(collection.lists)
1

get_list_by_number(list_number: int) -> ExperimentList | None

Get a list by its number.

Parameters:

Name Type Description Default
list_number int

List number to search for.

required

Returns:

Type Description
ExperimentList | None

List with matching number, or None if not found.

Examples:

>>> from uuid import uuid4
>>> collection = ListCollection(
...     name="test",
...     source_items_id=uuid4(),
...     partitioning_strategy="balanced"
... )
>>> exp_list = ExperimentList(name="list_0", list_number=0)
>>> collection.add_list(exp_list)
>>> found = collection.get_list_by_number(0)
>>> found is not None
True

get_all_item_refs() -> list[UUID]

Return all unique item UUIDs across all lists.

Returns:

Type Description
list[UUID]

All unique item UUIDs.

Examples:

>>> from uuid import uuid4
>>> collection = ListCollection(
...     name="test",
...     source_items_id=uuid4(),
...     partitioning_strategy="balanced"
... )
>>> exp_list = ExperimentList(name="list_0", list_number=0)
>>> item_id = uuid4()
>>> exp_list.add_item(item_id)
>>> collection.add_list(exp_list)
>>> item_id in collection.get_all_item_refs()
True

validate_coverage(all_item_ids: set[UUID]) -> CoverageValidationResult

Check that all items are assigned exactly once.

Validates that:

- All items in all_item_ids are assigned to at least one list
- No item appears in multiple lists (items assigned exactly once)

Parameters:

Name Type Description Default
all_item_ids set[UUID]

Set of all item UUIDs that should be assigned.

required

Returns:

Type Description
CoverageValidationResult

Validation report with keys:

- "valid": bool - Whether validation passed
- "missing_items": list[UUID] - Items not assigned to any list
- "duplicate_items": list[UUID] - Items assigned to multiple lists
- "total_assigned": int - Total assignments across all lists

Examples:

>>> from uuid import uuid4
>>> collection = ListCollection(
...     name="test",
...     source_items_id=uuid4(),
...     partitioning_strategy="balanced"
... )
>>> item_id = uuid4()
>>> exp_list = ExperimentList(name="list_0", list_number=0)
>>> exp_list.add_item(item_id)
>>> collection.add_list(exp_list)
>>> result = collection.validate_coverage({item_id})
>>> result["valid"]
True
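The documented semantics reduce to counting assignments per UUID. A standalone sketch (hypothetical function operating on plain lists of UUIDs, not the library's implementation):

```python
from collections import Counter
from uuid import uuid4

def check_coverage(lists, all_item_ids):
    """Sketch: every id in all_item_ids must appear in exactly one list."""
    counts = Counter(uid for lst in lists for uid in lst)
    missing = [uid for uid in all_item_ids if counts[uid] == 0]
    duplicates = [uid for uid, n in counts.items() if n > 1]
    return {
        "valid": not missing and not duplicates,
        "missing_items": missing,
        "duplicate_items": duplicates,
        "total_assigned": sum(counts.values()),
    }

a, b, c = uuid4(), uuid4(), uuid4()
report = check_coverage([[a, b], [c]], {a, b, c})
assert report["valid"] and report["total_assigned"] == 3
bad = check_coverage([[a], [a]], {a, b})  # a twice, b never
assert not bad["valid"]
```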

to_jsonl(path: Path | str) -> None

Write lists to a JSONL file (one list per line).

Parameters:

Name Type Description Default
path Path | str

Path to output JSONL file.

required

Examples:

>>> from uuid import uuid4
>>> collection = ListCollection(
...     name="test",
...     source_items_id=uuid4(),
...     partitioning_strategy="balanced"
... )
>>> exp_list = ExperimentList(name="list_0", list_number=0)
>>> collection.add_list(exp_list)
>>> collection.to_jsonl("lists.jsonl")

from_jsonl(path: Path | str, name: str = 'loaded_lists', source_items_id: UUID | None = None, partitioning_strategy: str = 'unknown') -> ListCollection classmethod

Load lists from a JSONL file (one list per line).

Parameters:

Name Type Description Default
path Path | str

Path to JSONL file containing experiment lists.

required
name str

Name for the collection (default: "loaded_lists").

'loaded_lists'
source_items_id UUID | None

Source items UUID. If None, uses a nil UUID.

None
partitioning_strategy str

Strategy name (default: "unknown").

'unknown'

Returns:

Type Description
ListCollection

Collection containing the loaded lists.

Examples:

>>> collection = ListCollection.from_jsonl("lists.jsonl")
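JSONL here means one JSON object per line. The generic read/write pattern looks like this (a stdlib sketch with toy dicts; the real methods serialize and parse full ExperimentList models):

```python
import json
import tempfile
from pathlib import Path

lists = [
    {"name": "list_0", "list_number": 0, "item_refs": []},
    {"name": "list_1", "list_number": 1, "item_refs": []},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "lists.jsonl"
    # Write: one JSON object per line.
    path.write_text("\n".join(json.dumps(lst) for lst in lists) + "\n")
    # Read: parse each non-empty line independently.
    loaded = [json.loads(line) for line in path.read_text().splitlines() if line]

assert loaded == lists
```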

Constraints

constraints

Constraint models for experimental list composition.

This module defines constraints that can be applied to experimental lists to ensure balanced, well-distributed item selections. Constraints can specify:

- Uniqueness: No duplicate property values
- Balance: Balanced distribution across categories
- Quantile: Uniform distribution across quantiles
- Size: List size requirements
- Ordering: Item presentation order constraints (runtime enforcement)

All constraints inherit from BeadBaseModel and use Pydantic discriminated unions for type-safe deserialization.

UniquenessConstraint

Bases: BeadBaseModel

Constraint requiring unique values for a property.

Ensures that no two items in a list have the same value for the specified property. Useful for preventing duplicate target verbs, sentence structures, or other experimental materials.

Attributes:

Name Type Description
constraint_type Literal['uniqueness']

Discriminator field for constraint type (always "uniqueness").

property_expression str

DSL expression that extracts the value that must be unique. The item is available as 'item' in the expression. Examples: "item.metadata.target_verb", "item.templates.sentence.text"

context dict[str, ContextValue]

Additional context variables for DSL evaluation.

allow_null bool, default=False

Whether to allow null/None values. If False, None values count as duplicates. If True, multiple None values are allowed.

priority int, default=1

Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily.

Examples:

>>> # No two items with same target verb (high priority)
>>> constraint = UniquenessConstraint(
...     property_expression="item.metadata.target_verb",
...     allow_null=False,
...     priority=5
... )
>>> constraint.priority
5

validate_property_expression(v: str) -> str classmethod

Validate property expression is non-empty.

Parameters:

Name Type Description Default
v str

Property expression to validate.

required

Returns:

Type Description
str

Validated property expression.

Raises:

Type Description
ValueError

If property expression is empty or contains only whitespace.

BalanceConstraint

Bases: BeadBaseModel

Constraint requiring balanced distribution.

Ensures balanced distribution of a categorical property across items in a list. Can specify target counts for each category or request equal distribution.

Attributes:

Name Type Description
constraint_type Literal['balance']

Discriminator field for constraint type (always "balance").

property_expression str

DSL expression that extracts the category value to balance. The item is available as 'item' in the expression. Example: "item.metadata.transitivity"

context dict[str, ContextValue]

Additional context variables for DSL evaluation.

target_counts dict[str, int] | None, default=None

Target counts for each category value. If None, equal distribution is assumed. Keys are category values, values are target counts.

tolerance float, default=0.1

Allowed deviation from target as a proportion (0.0-1.0). For example, 0.1 means up to 10% deviation is acceptable.

priority int, default=1

Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily.

Examples:

>>> # Equal number of transitive and intransitive verbs
>>> constraint = BalanceConstraint(
...     property_expression="item.metadata.transitivity",
...     tolerance=0.1
... )
>>> # 2:1 ratio with high priority
>>> constraint2 = BalanceConstraint(
...     property_expression="item.metadata.grammatical",
...     target_counts={"true": 20, "false": 10},
...     tolerance=0.05,
...     priority=3
... )

validate_property_expression(v: str) -> str classmethod

Validate property expression is non-empty.

Parameters:

Name Type Description Default
v str

Property expression to validate.

required

Returns:

Type Description
str

Validated property expression.

Raises:

Type Description
ValueError

If property expression is empty or contains only whitespace.

validate_target_counts(v: dict[str, int] | None) -> dict[str, int] | None classmethod

Validate target counts are non-negative.

Parameters:

Name Type Description Default
v dict[str, int] | None

Target counts to validate.

required

Returns:

Type Description
dict[str, int] | None

Validated target counts.

Raises:

Type Description
ValueError

If any count is negative.

QuantileConstraint

Bases: BeadBaseModel

Constraint requiring uniform distribution across quantiles.

Ensures uniform distribution of items across quantiles of a numeric property. Useful for balancing language model probabilities, word frequencies, or other continuous variables. Supports complex DSL expressions for computing derived metrics.

Attributes:

Name Type Description
constraint_type Literal['quantile']

Discriminator field for constraint type (always "quantile").

property_expression str

DSL expression that computes the numeric value to quantile. The item is available as 'item' in the expression. Can be simple (e.g., "item.metadata.lm_prob") or complex (e.g., "variance([item['val1'], item['val2'], item['val3']])")

context dict[str, ContextValue]

Additional context variables for DSL evaluation. Example: {"hyp_keys": ["hyp1", "hyp2", "hyp3"]}

n_quantiles int, default=5

Number of quantiles to create (must be >= 2).

items_per_quantile int, default=2

Target number of items per quantile (must be >= 1).

priority int, default=1

Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily.

Examples:

>>> # Uniform distribution of LM probabilities across 5 quantiles
>>> constraint = QuantileConstraint(
...     property_expression="item.metadata.lm_prob",
...     n_quantiles=5,
...     items_per_quantile=2
... )
>>> # Variance of precomputed NLI scores
>>> constraint2 = QuantileConstraint(
...     property_expression="item['nli_variance']",
...     n_quantiles=5,
...     items_per_quantile=2
... )
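The binning idea behind quantile constraints can be illustrated with a simple rank-based scheme. This is a hypothetical sketch of the concept only; the library's actual quantile algorithm may differ:

```python
def quantile_bins(values, n_quantiles):
    """Assign each value a quantile index 0..n_quantiles-1 by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_quantiles // len(values), n_quantiles - 1)
    return bins

probs = [0.01, 0.2, 0.35, 0.5, 0.62, 0.7, 0.81, 0.9, 0.95, 0.99]
print(quantile_bins(probs, n_quantiles=5))  # → [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

With n_quantiles=5 and items_per_quantile=2, a constraint of this kind would ask for two items per bin, as in the output above.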

validate_property_expression(v: str) -> str classmethod

Validate property expression is non-empty.

Parameters:

Name Type Description Default
v str

Property expression to validate.

required

Returns:

Type Description
str

Validated property expression.

Raises:

Type Description
ValueError

If property expression is empty or contains only whitespace.

GroupedQuantileConstraint

Bases: BeadBaseModel

Constraint requiring uniform quantile distribution within groups.

Ensures uniform distribution across quantiles of a numeric property within each group defined by a grouping property. Useful for balancing a continuous variable independently within categorical groups.

Attributes:

Name Type Description
constraint_type Literal['grouped_quantile']

Discriminator field for constraint type (always "grouped_quantile").

property_expression str

DSL expression that computes the numeric value to quantile. The item is available as 'item' in the expression. Example: "item.metadata.lm_prob"

group_by_expression str

DSL expression that computes the grouping key. The item is available as 'item' in the expression. Example: "item.metadata.condition"

context dict[str, ContextValue]

Additional context variables for DSL evaluation.

n_quantiles int, default=5

Number of quantiles to create per group (must be >= 2).

items_per_quantile int, default=2

Target number of items per quantile per group (must be >= 1).

priority int, default=1

Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily.

Examples:

>>> # Balance LM probability quantiles within each condition
>>> constraint = GroupedQuantileConstraint(
...     property_expression="item.metadata.lm_prob",
...     group_by_expression="item.metadata.condition",
...     n_quantiles=5,
...     items_per_quantile=2
... )
>>> # Balance embedding similarity IQR within semantic categories
>>> constraint2 = GroupedQuantileConstraint(
...     property_expression="item['embedding_iqr']",
...     group_by_expression="item['semantic_category']",
...     n_quantiles=4,
...     items_per_quantile=3
... )

validate_expression(v: str) -> str classmethod

Validate expression is non-empty.

Parameters:

Name Type Description Default
v str

Expression to validate.

required

Returns:

Type Description
str

Validated expression.

Raises:

Type Description
ValueError

If expression is empty or contains only whitespace.

ConditionalUniquenessConstraint

Bases: BeadBaseModel

Constraint requiring uniqueness when a condition is met.

Ensures that values are unique only when a boolean condition is satisfied. Useful for enforcing uniqueness on a subset of items while allowing duplicates in others.

Attributes:

Name Type Description
constraint_type Literal['conditional_uniqueness']

Discriminator field for constraint type (always "conditional_uniqueness").

property_expression str

DSL expression that computes the value that must be unique. The item is available as 'item' in the expression. Example: "item.metadata.target_word"

condition_expression str

DSL boolean expression that determines if constraint applies. The item is available as 'item' in the expression. Example: "item.metadata.is_critical == True"

context dict[str, ContextValue]

Additional context variables for DSL evaluation.

allow_null bool, default=False

Whether to allow multiple null values when condition is true.

priority int, default=1

Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily.

Examples:

>>> # Unique target words only for critical items
>>> constraint = ConditionalUniquenessConstraint(
...     property_expression="item.metadata.target_word",
...     condition_expression="item.metadata.is_critical == True",
...     allow_null=False,
...     priority=3
... )
>>> # Unique sentences only when grammaticality is tested
>>> constraint2 = ConditionalUniquenessConstraint(
...     property_expression="item.templates.sentence.text",
...     condition_expression="item.metadata.test_type in test_grammaticality",
...     context={"test_grammaticality": {"gram", "acceptability"}},
...     allow_null=True
... )

validate_expression(v: str) -> str classmethod

Validate expression is non-empty.

Parameters:

Name Type Description Default
v str

Expression to validate.

required

Returns:

Type Description
str

Validated expression.

Raises:

Type Description
ValueError

If expression is empty or contains only whitespace.

DiversityConstraint

Bases: BeadBaseModel

Constraint requiring minimum diversity (unique values) for a property.

Ensures that a list contains at least a minimum number of unique values for a specified property. Useful for ensuring template diversity, verb diversity, or other experimental richness requirements.

Attributes:

Name Type Description
constraint_type Literal['diversity']

Discriminator field for constraint type (always "diversity").

property_expression str

DSL expression that extracts the value to count for diversity. The item is available as 'item' in the expression. Examples: "item.metadata.template_id", "item.metadata.verb_lemma"

min_unique_values int

Minimum number of unique values required in the list.

context dict[str, ContextValue]

Additional context variables for DSL evaluation.

priority int, default=1

Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily.

Examples:

>>> # Ensure at least 15 unique templates per list
>>> constraint = DiversityConstraint(
...     property_expression="item.metadata.template_id",
...     min_unique_values=15,
...     priority=2
... )
>>> constraint.min_unique_values
15

validate_property_expression(v: str) -> str classmethod

Validate property expression is non-empty.

Parameters:

Name Type Description Default
v str

Property expression to validate.

required

Returns:

Type Description
str

Validated property expression.

Raises:

Type Description
ValueError

If property expression is empty or contains only whitespace.

SizeConstraint

Bases: BeadBaseModel

Constraint on list size.

Specifies size requirements for a list. Can specify exact size, minimum size, maximum size, or a range (min and max).

Often used with high priority to ensure participants do equal work.

Attributes:

Name Type Description
constraint_type Literal['size']

Discriminator field for constraint type (always "size").

min_size int | None, default=None

Minimum list size (must be >= 0 if set).

max_size int | None, default=None

Maximum list size (must be >= 0 if set).

exact_size int | None, default=None

Exact required size (must be >= 0 if set). Cannot be used with min_size or max_size.

priority int, default=1

Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily. Size constraints often use high priority (e.g., 10) to ensure participants do exactly equal amounts of work.

Examples:

>>> # Exactly 40 items per list (highest priority)
>>> constraint = SizeConstraint(exact_size=40, priority=10)
>>> # Between 30-50 items per list
>>> constraint2 = SizeConstraint(min_size=30, max_size=50)
>>> # At least 20 items
>>> constraint3 = SizeConstraint(min_size=20)
>>> # At most 100 items
>>> constraint4 = SizeConstraint(max_size=100)

validate_size_params() -> SizeConstraint

Validate size parameter combinations.

Ensures that:

- At least one size parameter is set
- exact_size is not used with min_size or max_size
- min_size <= max_size if both are set

Returns:

Type Description
SizeConstraint

Validated constraint.

Raises:

Type Description
ValueError

If validation fails.
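The three rules translate directly into plain Python (a standalone restatement with a hypothetical helper name, not the library's validator):

```python
def check_size_params(min_size=None, max_size=None, exact_size=None):
    """Sketch of the documented rules for SizeConstraint parameters."""
    if min_size is None and max_size is None and exact_size is None:
        raise ValueError("at least one size parameter must be set")
    if exact_size is not None and (min_size is not None or max_size is not None):
        raise ValueError("exact_size cannot be combined with min_size/max_size")
    if min_size is not None and max_size is not None and min_size > max_size:
        raise ValueError("min_size must be <= max_size")

check_size_params(exact_size=40)             # OK
check_size_params(min_size=30, max_size=50)  # OK
try:
    check_size_params(exact_size=40, min_size=30)  # invalid combination
except ValueError as exc:
    print(exc)
```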

OrderingConstraint

Bases: BeadBaseModel

Constraint on item presentation order.

CRITICAL: This constraint is primarily enforced at jsPsych runtime, not during static list construction. The Python data model stores the constraint specification, which is then translated to JavaScript code for runtime enforcement during per-participant randomization.

Attributes:

Name Type Description
constraint_type Literal['ordering']

Discriminator for constraint type.

precedence_pairs list[tuple[UUID, UUID]]

Pairs of (item_a_id, item_b_id) where item_a must appear before item_b.

no_adjacent_property str | None

Property path; items with same value cannot be adjacent. Example: "item_metadata.condition" prevents AA, BB patterns.

block_by_property str | None

Property path to group items into contiguous blocks. Example: "item_metadata.block_type" creates blocked design.

min_distance int | None

Minimum number of items between items with same no_adjacent_property value.

max_distance int | None

Maximum number of items between start and end of items with same block_by_property value (enforces tight blocking).

practice_item_property str | None

Property path identifying practice items (should appear first). Example: "item_metadata.is_practice" with value True.

randomize_within_blocks bool

Whether to randomize order within blocks (default True). Only applies when block_by_property is set.

Examples:

>>> # No adjacent items with same condition
>>> constraint = OrderingConstraint(
...     no_adjacent_property="item_metadata.condition"
... )
>>> # Practice items first, then main items
>>> constraint = OrderingConstraint(
...     practice_item_property="item_metadata.is_practice"
... )
>>> # Blocked by condition, randomized within blocks
>>> constraint = OrderingConstraint(
...     block_by_property="item_metadata.condition",
...     randomize_within_blocks=True
... )
>>> # Item A before Item B
>>> from uuid import uuid4
>>> item_a, item_b = uuid4(), uuid4()
>>> constraint = OrderingConstraint(
...     precedence_pairs=[(item_a, item_b)]
... )

validate_distance_constraints() -> OrderingConstraint

Validate distance constraint combinations.

Returns:

Type Description
OrderingConstraint

Validated constraint.

Raises:

Type Description
ValueError

If validation fails.

BatchCoverageConstraint

Bases: BeadBaseModel

Constraint ensuring all values appear somewhere in the batch.

Ensures that all values of a property appear across the collection of lists. Useful for guaranteeing coverage of experimental conditions, templates, or stimulus categories across all participants.

Attributes:

Name Type Description
constraint_type Literal['coverage']

Discriminator field for constraint type (always "coverage").

property_expression str

DSL expression that extracts the property value to check coverage. The item is available as 'item' in the expression (metadata dict). Example: "item['template_id']"

context dict[str, ContextValue]

Additional context variables for DSL evaluation.

target_values list[str | int | float] | None

Target values that must be covered. If None, uses all observed values.

min_coverage float, default=1.0

Minimum coverage fraction (0.0-1.0). 1.0 means 100% of target values must appear.

priority int, default=1

Constraint priority (higher = more important).

Examples:

>>> # Ensure all 26 templates appear across all lists
>>> constraint = BatchCoverageConstraint(
...     property_expression="item['template_id']",
...     target_values=list(range(26)),
...     min_coverage=1.0
... )
>>> # Ensure at least 90% of verbs are covered
>>> constraint = BatchCoverageConstraint(
...     property_expression="item['verb_lemma']",
...     target_values=["run", "jump", "eat", "sleep", "think"],
...     min_coverage=0.9
... )

validate_property_expression(v: str) -> str classmethod

Validate property expression is non-empty.

BatchBalanceConstraint

Bases: BeadBaseModel

Constraint ensuring balanced distribution across the entire batch.

Ensures balanced distribution of a categorical property across all lists combined. Unlike per-list balance constraints, this operates on the aggregate distribution across the entire batch.

Attributes:

Name Type Description
constraint_type Literal['balance']

Discriminator field for constraint type (always "balance").

property_expression str

DSL expression that extracts the category value to balance. Example: "item['pair_type']"

context dict[str, ContextValue]

Additional context variables for DSL evaluation.

target_distribution dict[str, float]

Target distribution (values sum to 1.0). Keys are category values, values are target proportions.

tolerance float, default=0.1

Allowed deviation from target as a proportion (0.0-1.0).

priority int, default=1

Constraint priority (higher = more important).

Examples:

>>> # Ensure 50/50 balance of pair types across all lists
>>> constraint = BatchBalanceConstraint(
...     property_expression="item['pair_type']",
...     target_distribution={"same_verb": 0.5, "different_verb": 0.5},
...     tolerance=0.05
... )
>>> # Three-way split across conditions
>>> constraint = BatchBalanceConstraint(
...     property_expression="item['condition']",
...     target_distribution={"A": 0.333, "B": 0.333, "C": 0.334},
...     tolerance=0.1
... )

validate_property_expression(v: str) -> str classmethod

Validate property expression is non-empty.

validate_target_distribution(v: dict[str, float]) -> dict[str, float] classmethod

Validate target distribution sums to ~1.0 and values are in [0, 1].
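The check can be sketched as follows. The function name and the exact tolerance for "~1.0" (here abs_tol=0.01) are assumptions for illustration:

```python
import math

def check_target_distribution(dist):
    """Sketch: proportions must lie in [0, 1] and sum to ~1.0."""
    if any(not 0.0 <= p <= 1.0 for p in dist.values()):
        raise ValueError("proportions must be in [0, 1]")
    if not math.isclose(sum(dist.values()), 1.0, abs_tol=0.01):
        raise ValueError("proportions must sum to ~1.0")

check_target_distribution({"A": 0.333, "B": 0.333, "C": 0.334})  # OK
try:
    check_target_distribution({"A": 0.6, "B": 0.6})  # sums to 1.2
except ValueError as exc:
    print(exc)
```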

BatchDiversityConstraint

Bases: BeadBaseModel

Constraint preventing values from appearing in too many lists.

Ensures that no single value of a property appears in too many lists, promoting diversity across lists. Useful for ensuring that stimuli (e.g., verbs, nouns) are distributed across participants rather than concentrated in a few lists.

Attributes:

Name Type Description
constraint_type Literal['diversity']

Discriminator field for constraint type (always "diversity").

property_expression str

DSL expression that extracts the property value to check diversity. Example: "item['verb_lemma']"

context dict[str, ContextValue]

Additional context variables for DSL evaluation.

max_lists_per_value int

Maximum number of lists any value can appear in.

priority int, default=1

Constraint priority (higher = more important).

Examples:

>>> # No verb should appear in more than 3 out of 8 lists
>>> constraint = BatchDiversityConstraint(
...     property_expression="item['verb_lemma']",
...     max_lists_per_value=3
... )
>>> # No template in more than half the lists
>>> constraint = BatchDiversityConstraint(
...     property_expression="item['template_id']",
...     max_lists_per_value=4
... )
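The underlying check amounts to counting how many lists each value appears in. A standalone sketch (hypothetical helper operating on plain lists of extracted values):

```python
from collections import Counter

def lists_per_value(lists_of_values, max_lists_per_value):
    """Count lists containing each value (a repeat within one list counts
    once, via set) and return the values that exceed the cap."""
    counts = Counter(v for values in lists_of_values for v in set(values))
    return {v: n for v, n in counts.items() if n > max_lists_per_value}

lists_of_values = [["run", "jump"], ["run", "eat"], ["run"], ["eat"]]
print(lists_per_value(lists_of_values, max_lists_per_value=2))  # → {'run': 3}
```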

validate_property_expression(v: str) -> str classmethod

Validate property expression is non-empty.

BatchMinOccurrenceConstraint

Bases: BeadBaseModel

Constraint ensuring minimum representation across the batch.

Ensures that each value of a property appears at least a minimum number of times across all lists. Useful for guaranteeing sufficient data for each experimental condition or stimulus category.

Attributes:

Name Type Description
constraint_type Literal['min_occurrence']

Discriminator field for constraint type (always "min_occurrence").

property_expression str

DSL expression that extracts the property value to check occurrences. Example: "item['quantile']"

context dict[str, ContextValue]

Additional context variables for DSL evaluation.

min_occurrences int

Minimum number of times each value must appear across all lists.

priority int, default=1

Constraint priority (higher = more important).

Examples:

>>> # Each quantile appears at least 50 times across all lists
>>> constraint = BatchMinOccurrenceConstraint(
...     property_expression="item['quantile']",
...     min_occurrences=50
... )
>>> # Each template at least 5 times
>>> constraint = BatchMinOccurrenceConstraint(
...     property_expression="item['template_id']",
...     min_occurrences=5
... )
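
Analogously, a minimum-occurrence check totals how often each value appears across all lists. A minimal illustrative sketch (not the library's code), again over plain metadata dicts:

```python
from collections import Counter

def check_min_occurrence(lists, key, min_occurrences):
    """Illustrative check: every value of `key` must occur at least
    `min_occurrences` times summed over all lists."""
    counts = Counter(item[key] for items in lists for item in items)
    return all(n >= min_occurrences for n in counts.values())

# quantile 0 occurs three times, quantile 1 twice
lists = [[{"q": 0}, {"q": 1}], [{"q": 0}, {"q": 1}], [{"q": 0}]]
```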

validate_property_expression(v: str) -> str classmethod

Validate property expression is non-empty.

Partitioning

partitioner

List partitioning for experimental item distribution.

This module provides the ListPartitioner class for partitioning items into balanced experimental lists. Implements three strategies: random, balanced, and stratified. Uses stand-off annotation (works with UUIDs only).

ListPartitioner

Partitions items into balanced experimental lists.

Uses stand-off annotation: only stores UUIDs, not full item objects. Requires item metadata dict for constraint checking and balancing.

Implements three partitioning strategies:
  • Random: simple round-robin assignment after shuffling
  • Balanced: greedy assignment that minimizes constraint violations
  • Stratified: quantile-based stratification with balanced distribution

Parameters:

Name Type Description Default
random_seed int | None

Random seed for reproducibility.

None

Attributes:

Name Type Description
random_seed int | None

Random seed for reproducibility.

Examples:

>>> from uuid import uuid4
>>> partitioner = ListPartitioner(random_seed=42)
>>> items = [uuid4() for _ in range(100)]
>>> metadata = {uid: {"property": i} for i, uid in enumerate(items)}
>>> lists = partitioner.partition(items, n_lists=5, metadata=metadata)
>>> len(lists)
5

partition(items: list[UUID], n_lists: int, constraints: list[ListConstraint] | None = None, strategy: str = 'balanced', metadata: MetadataDict | None = None) -> list[ExperimentList]

Partition items into lists.

Parameters:

Name Type Description Default
items list[UUID]

Item UUIDs to partition.

required
n_lists int

Number of lists to create.

required
constraints list[ListConstraint] | None

Constraints to satisfy.

None
strategy str

Partitioning strategy ("balanced", "random", "stratified").

"balanced"
metadata dict[UUID, dict[str, Any]] | None

Metadata for each item UUID. Required for constraint checking.

None

Returns:

Type Description
list[ExperimentList]

The partitioned lists.

Raises:

Type Description
ValueError

If strategy is unknown or n_lists < 1.

partition_with_batch_constraints(items: list[UUID], n_lists: int, list_constraints: list[ListConstraint] | None = None, batch_constraints: list[BatchConstraint] | None = None, strategy: str = 'balanced', metadata: MetadataDict | None = None, max_iterations: int = 1000, tolerance: float = 0.05) -> list[ExperimentList]

Partition items with batch-level constraints.

Creates initial partition using standard partitioning, then iteratively refines to satisfy batch constraints through item swaps between lists.

Parameters:

Name Type Description Default
items list[UUID]

Item UUIDs to partition.

required
n_lists int

Number of lists to create.

required
list_constraints list[ListConstraint] | None

Per-list constraints to satisfy.

None
batch_constraints list[BatchConstraint] | None

Batch-level constraints to satisfy.

None
strategy str

Initial partitioning strategy ("balanced", "random", "stratified").

"balanced"
metadata dict[UUID, dict[str, Any]] | None

Metadata for each item UUID.

None
max_iterations int

Maximum refinement iterations.

1000
tolerance float

Tolerance for batch constraint satisfaction: a constraint counts as satisfied when its score >= 1.0 - tolerance.

0.05

Returns:

Type Description
list[ExperimentList]

Partitioned lists satisfying both list and batch constraints.

Examples:

>>> from bead.lists.constraints import BatchCoverageConstraint
>>> partitioner = ListPartitioner(random_seed=42)
>>> constraint = BatchCoverageConstraint(
...     property_expression="item['template_id']",
...     target_values=list(range(26)),
...     min_coverage=1.0
... )
>>> lists = partitioner.partition_with_batch_constraints(
...     items=item_uids,
...     n_lists=8,
...     batch_constraints=[constraint],
...     metadata=metadata_dict,
...     max_iterations=500
... )
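
The refinement phase described above (swap items between lists until the batch score falls within tolerance) can be sketched as a simple hill climb. This is an illustrative simplification, not bead's actual algorithm; `score_batch` is a hypothetical callable mapping the current lists to a score in [0, 1]:

```python
import random

def refine(lists, score_batch, max_iterations=1000, tolerance=0.05, seed=42):
    """Propose random item swaps between two lists; keep a swap only if
    it improves the batch score. Stops once score >= 1.0 - tolerance."""
    rng = random.Random(seed)
    best = score_batch(lists)
    for _ in range(max_iterations):
        if best >= 1.0 - tolerance:
            break
        a, b = rng.sample(range(len(lists)), 2)
        i, j = rng.randrange(len(lists[a])), rng.randrange(len(lists[b]))
        lists[a][i], lists[b][j] = lists[b][j], lists[a][i]
        new_score = score_batch(lists)
        if new_score > best:
            best = new_score
        else:  # revert a non-improving swap
            lists[a][i], lists[b][j] = lists[b][j], lists[a][i]
    return lists, best

# toy score: 1.0 when both lists have equal sums
balanced, score = refine(
    [[1, 1, 1, 1], [0, 0, 0, 0]],
    lambda ls: 1.0 - abs(sum(ls[0]) - sum(ls[1])) / 4.0,
)
```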

stratification

Stratification utilities for quantile-based item assignment.

This module provides utilities for assigning items to quantile bins based on numeric properties, with optional stratification by grouping variables.

assign_quantiles(items: list[T], property_getter: Callable[[T], float], n_quantiles: int = 10, stratify_by: Callable[[T], Hashable] | None = None) -> dict[T, int]

Assign quantile bins to items based on numeric property.

Divides items into n_quantiles bins based on the distribution of a numeric property extracted via property_getter. Optionally stratifies by a grouping variable, computing separate quantiles for each group.

Parameters:

Name Type Description Default
items list[T]

List of items to assign to quantile bins.

required
property_getter Callable[[T], float]

Function that extracts a numeric value from each item. This value is used to compute quantiles.

required
n_quantiles int

Number of quantile bins (default: 10 for deciles). Must be >= 2.

10
stratify_by Callable[[T], Hashable] | None

Optional function that extracts a grouping variable from each item. If provided, quantiles are computed separately for each group. Groups must be hashable (str, int, UUID, tuple, etc.).

None

Returns:

Type Description
dict[T, int]

Dictionary mapping each item to its quantile bin (0 to n_quantiles-1).

Raises:

Type Description
ValueError

If n_quantiles < 2 or items list is empty.

Examples:

Basic usage with simple numeric values:

>>> items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> result = assign_quantiles(
...     items,
...     property_getter=lambda x: x,
...     n_quantiles=4
... )
>>> result[1]  # First item in lowest quartile
0
>>> result[10]  # Last item in highest quartile
3

With Item objects and stratification:

>>> from bead.items.item import Item
>>> from uuid import uuid4
>>> items = [
...     Item(item_template_id=uuid4(), item_metadata={"score": 10.5, "group": "A"}),
...     Item(item_template_id=uuid4(), item_metadata={"score": 5.2, "group": "A"}),
...     Item(item_template_id=uuid4(), item_metadata={"score": 8.1, "group": "B"}),
...     Item(item_template_id=uuid4(), item_metadata={"score": 3.3, "group": "B"}),
... ]
>>> result = assign_quantiles(
...     items,
...     property_getter=lambda x: x.item_metadata["score"],
...     n_quantiles=2,
...     stratify_by=lambda x: x.item_metadata["group"]
... )

With UUID keys (common pattern):

>>> from uuid import UUID
>>> item_uuids = [uuid4() for _ in range(100)]
>>> item_scores = {uid: float(i) for i, uid in enumerate(item_uuids)}
>>> result = assign_quantiles(
...     item_uuids,
...     property_getter=lambda uid: item_scores[uid],
...     n_quantiles=10
... )

assign_quantiles_by_uuid(item_ids: list[UUID], item_metadata: dict[UUID, dict[str, MetadataValue]], property_key: str, n_quantiles: int = 10, stratify_by_key: str | None = None) -> dict[UUID, int]

Assign quantile bins to items by UUID with metadata lookup.

Convenience function for the common case of working with UUIDs and metadata dictionaries (the stand-off annotation pattern).

Parameters:

Name Type Description Default
item_ids list[UUID]

List of item UUIDs.

required
item_metadata dict[UUID, dict[str, MetadataValue]]

Metadata dictionary mapping UUIDs to their metadata dicts.

required
property_key str

Key in item_metadata[uuid] dict to use for quantile computation.

required
n_quantiles int

Number of quantile bins (default: 10).

10
stratify_by_key str | None

Optional key in metadata dict to use for stratification.

None

Returns:

Type Description
dict[UUID, int]

Dictionary mapping each UUID to its quantile bin (0 to n_quantiles-1).

Raises:

Type Description
ValueError

If property_key missing from any item's metadata.

KeyError

If any UUID not found in item_metadata.

Examples:

>>> from uuid import uuid4
>>> uuids = [uuid4() for _ in range(100)]
>>> metadata = {
...     uid: {"score": float(i), "group": "A" if i < 50 else "B"}
...     for i, uid in enumerate(uuids)
... }
>>> result = assign_quantiles_by_uuid(
...     uuids,
...     metadata,
...     property_key="score",
...     n_quantiles=4,
...     stratify_by_key="group"
... )
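
The stratified case (stratify_by_key) computes quantile edges separately per group, so each group's items fill bins 0 to n_quantiles-1 on their own scale. The per-group binning can be illustrated with this standalone sketch (a hypothetical helper, not bead's implementation), using plain dict keys in place of UUIDs:

```python
from collections import defaultdict
import numpy as np

def stratified_quantiles(values_by_key, group_by_key, n_quantiles=2):
    """Compute quantile bins per group: edges come from np.percentile over
    each group's values alone, then np.digitize assigns the bin."""
    groups = defaultdict(list)
    for key, grp in group_by_key.items():
        groups[grp].append(key)
    result = {}
    for keys in groups.values():
        vals = np.array([values_by_key[k] for k in keys])
        edges = np.percentile(vals, np.linspace(0, 100, n_quantiles + 1)[1:-1])
        for k, v in zip(keys, vals):
            result[k] = int(np.digitize(v, edges))
    return result

# group A spans 1-2, group B spans 10-20; each gets its own median split
bins = stratified_quantiles(
    {"a": 1.0, "b": 2.0, "c": 10.0, "d": 20.0},
    {"a": "A", "b": "A", "c": "B", "d": "B"},
)
```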

Balancing

balancer

Quantile balancing for experimental list partitioning.

This module provides the QuantileBalancer class for ensuring uniform distribution of items across quantiles of a numeric property. Uses NumPy for efficient quantile computation and maintains the stand-off annotation pattern (works with UUIDs).

QuantileBalancer

Ensures uniform distribution of items across quantiles.

Used by stratified partitioning strategy to create balanced distribution of numeric properties (e.g., LM probabilities, word frequencies).

Works with UUIDs only (stand-off annotation). Requires a value_func callable that extracts a numeric value from an item's UUID.

Parameters:

Name Type Description Default
n_quantiles int

Number of quantiles to create (must be >= 2).

5
random_seed int | None

Random seed for reproducibility. If None, uses non-deterministic RNG.

None

Attributes:

Name Type Description
n_quantiles int

Number of quantiles to create.

random_seed int | None

Random seed for reproducibility.

Examples:

>>> from uuid import uuid4
>>> import numpy as np
>>> balancer = QuantileBalancer(n_quantiles=5, random_seed=42)
>>> # Create items with known values
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> value_func = lambda uid: values[uid]
>>> # Balance across 4 lists, 5 items per quantile per list
>>> lists = balancer.balance(items, value_func, n_lists=4,
...                          items_per_quantile_per_list=5)
>>> len(lists)
4

balance(item_ids: list[UUID], value_func: Callable[[UUID], float], n_lists: int, items_per_quantile_per_list: int) -> list[list[UUID]]

Balance items across lists and quantiles.

Distributes items uniformly across quantiles and lists to ensure balanced representation of the numeric property across all lists.

Parameters:

Name Type Description Default
item_ids list[UUID]

UUIDs of items to balance.

required
value_func Callable[[UUID], float]

Function to extract numeric value from item UUID.

required
n_lists int

Number of lists to create.

required
items_per_quantile_per_list int

Target number of items per quantile per list.

required

Returns:

Type Description
list[list[UUID]]

Balanced lists of item UUIDs.

Raises:

Type Description
ValueError

If n_lists < 1 or items_per_quantile_per_list < 1.

Examples:

>>> from uuid import uuid4
>>> balancer = QuantileBalancer(n_quantiles=5, random_seed=42)
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> lists = balancer.balance(items, lambda uid: values[uid], 4, 5)
>>> all(len(lst) == 25 for lst in lists)  # 5 quantiles * 5 items
True
Notes
  • Items are assigned to quantiles using np.percentile and np.digitize
  • Within each quantile, items are shuffled before distribution
  • If insufficient items exist in a quantile, fewer items are assigned
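
The mechanism those notes describe (np.percentile to find quantile edges, np.digitize to bin, shuffle within bins, then deal items out per list) can be sketched as follows; this is a simplified illustration, not bead's implementation:

```python
import random
import numpy as np

def balance_sketch(item_ids, value_func, n_lists, per_quantile_per_list,
                   n_quantiles=5, seed=42):
    """Bin items into quantiles of value_func(item), shuffle each bin,
    then give each list `per_quantile_per_list` items from every bin."""
    rng = random.Random(seed)
    values = np.array([value_func(i) for i in item_ids])
    edges = np.percentile(values, np.linspace(0, 100, n_quantiles + 1)[1:-1])
    bins = np.digitize(values, edges)
    lists = [[] for _ in range(n_lists)]
    for q in range(n_quantiles):
        members = [i for i, b in zip(item_ids, bins) if b == q]
        rng.shuffle(members)  # randomize within the quantile
        for list_idx in range(n_lists):
            start = list_idx * per_quantile_per_list
            lists[list_idx].extend(members[start:start + per_quantile_per_list])
    return lists

# 100 items, 5 quantiles of 20 each -> each of 4 lists gets 5 per quantile
lists = balance_sketch(list(range(100)), float, n_lists=4, per_quantile_per_list=5)
```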

compute_balance_score(item_ids: list[UUID], value_func: Callable[[UUID], float]) -> float

Compute balance score for items.

Score is 1.0 for perfect balance (uniform distribution across quantiles), lower for imbalanced distributions. The score reflects the deviation of per-quantile counts from the expected uniform count.

Parameters:

Name Type Description Default
item_ids list[UUID]

UUIDs of items to score.

required
value_func Callable[[UUID], float]

Function to extract numeric values.

required

Returns:

Type Description
float

Balance score (0.0-1.0, higher is better).

Examples:

>>> from uuid import uuid4
>>> balancer = QuantileBalancer(n_quantiles=5)
>>> # Uniformly distributed values
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> score = balancer.compute_balance_score(items, lambda uid: values[uid])
>>> score > 0.9  # Should be close to 1.0
True
Notes
  • Returns 0.0 for empty item lists
  • Uses mean absolute deviation from expected uniform count
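
The notes above pin the score to the mean absolute deviation from the expected uniform count; a hypothetical sketch of that computation (not the library's code), taking raw values rather than UUIDs:

```python
import numpy as np

def balance_score_sketch(values, n_quantiles=5):
    """Score = 1 - (mean |count - expected|) / expected, clamped at 0.
    Returns 0.0 for empty input, 1.0 for a perfectly uniform spread."""
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return 0.0
    edges = np.percentile(values, np.linspace(0, 100, n_quantiles + 1)[1:-1])
    counts = np.bincount(np.digitize(values, edges), minlength=n_quantiles)
    expected = values.size / n_quantiles
    mad = np.mean(np.abs(counts - expected))
    return float(max(0.0, 1.0 - mad / expected))

uniform_score = balance_score_sketch(list(range(100)))  # even spread
skewed_score = balance_score_sketch([0.0] * 95 + [1.0, 2.0, 3.0, 4.0, 5.0])
```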