bead.lists¶
Stage 4 of the bead pipeline: list partitioning with constraint satisfaction.
Core Classes¶
experiment_list
¶
Experiment list data model for organizing experimental items.
This module provides the ExperimentList model for organizing experimental items into lists for presentation to participants. Lists use stand-off annotation with UUID references to items rather than embedding full item objects.
The model supports:

- Item assignment tracking via UUIDs
- Presentation order specification
- Constraint satisfaction tracking
- Balance metrics computation
ExperimentList
¶
Bases: BeadBaseModel
A list of experimental items for participant presentation.
Uses stand-off annotation: stores only item UUIDs, not full items. Items can be looked up by UUID from an ItemCollection or Repository.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Name of this list (e.g., "list_0", "practice_list"). |
| `list_number` | `int` | Numeric identifier for this list (must be >= 0). |
| `item_refs` | `list[UUID]` | UUIDs of items in this list (stand-off annotation). |
| `list_constraints` | `list[ListConstraint]` | Constraints this list must satisfy. |
| `constraint_satisfaction` | `dict[UUID, bool]` | Map of constraint UUIDs to satisfaction status. |
| `presentation_order` | `list[UUID] \| None` | Explicit presentation order (if None, use item_refs order). Must contain exactly the same UUIDs as item_refs. |
| `list_metadata` | `dict[str, Any]` | Metadata for this list. |
| `balance_metrics` | `dict[str, Any]` | Metrics about list balance (e.g., distribution statistics). |
Examples:
>>> from uuid import uuid4
>>> exp_list = ExperimentList(
... name="list_0",
... list_number=0
... )
>>> item_id = uuid4()
>>> exp_list.add_item(item_id)
>>> len(exp_list.item_refs)
1
>>> exp_list.shuffle_order(seed=42)
>>> exp_list.get_presentation_order()[0] == item_id
True
validate_name(v: str) -> str
classmethod
¶
Validate name is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `str` | Name to validate. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Validated name (whitespace stripped). |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If name is empty or contains only whitespace. |
validate_presentation_order() -> ExperimentList
¶
Validate presentation_order matches item_refs.
If presentation_order is set, it must contain exactly the same UUIDs as item_refs (no more, no less, no duplicates).
Returns:

| Type | Description |
|---|---|
| `ExperimentList` | Validated list. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If presentation_order doesn't match item_refs. |
add_item(item_id: UUID) -> None
¶
Add an item to this list.
remove_item(item_id: UUID) -> None
¶
Remove an item from this list.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item_id` | `UUID` | UUID of item to remove. | required |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If item_id is not in the list. |
Examples:
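The contract above can be sketched with a plain list of UUIDs; `remove_item` here is a hypothetical stand-in mirroring the documented behavior, not the model's implementation:

```python
from uuid import uuid4

# Removing a UUID that is absent raises ValueError, as documented.
def remove_item(item_refs: list, item_id) -> None:
    if item_id not in item_refs:
        raise ValueError(f"Item {item_id} not in list")
    item_refs.remove(item_id)

refs = [uuid4(), uuid4()]
first = refs[0]
remove_item(refs, first)

try:
    remove_item(refs, first)  # already removed
    removed_twice = True
except ValueError:
    removed_twice = False

assert first not in refs and len(refs) == 1
assert removed_twice is False
```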
shuffle_order(seed: int | None = None) -> None
¶
Shuffle presentation order.
Creates a randomized presentation order from item_refs. Uses random.Random(seed) for reproducible shuffling.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `seed` | `int \| None` | Random seed for reproducibility. | `None` |
Examples:
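A minimal sketch of the seeded-shuffle behavior described above, using only the standard library (`shuffled` is an illustrative helper, not the method itself):

```python
import random
from uuid import uuid4

item_refs = [uuid4() for _ in range(5)]

# random.Random(seed) yields a reproducible permutation without
# disturbing the global RNG, as the docstring describes.
def shuffled(refs, seed=None):
    order = list(refs)
    random.Random(seed).shuffle(order)
    return order

assert shuffled(item_refs, seed=42) == shuffled(item_refs, seed=42)
assert set(shuffled(item_refs, seed=42)) == set(item_refs)
```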
get_presentation_order() -> list[UUID]
¶
Get the presentation order.
Returns presentation_order if set, otherwise returns item_refs.
Returns:

| Type | Description |
|---|---|
| `list[UUID]` | UUIDs in presentation order. |
Examples:
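The fallback rule is simple enough to state as a one-liner; this is an illustrative stand-alone function, not the model method:

```python
from uuid import uuid4

# Explicit presentation_order wins; otherwise fall back to item_refs.
def get_presentation_order(item_refs, presentation_order=None):
    return list(presentation_order) if presentation_order is not None else list(item_refs)

a, b = uuid4(), uuid4()
assert get_presentation_order([a, b]) == [a, b]
assert get_presentation_order([a, b], presentation_order=[b, a]) == [b, a]
```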
list_collection
¶
List collection data model for managing multiple experimental lists.
This module provides the ListCollection model for managing multiple ExperimentList instances along with metadata about the partitioning process that created them.
The model supports:

- Multiple experimental lists
- Partitioning metadata tracking
- Coverage validation (ensuring all items are assigned exactly once)
- List lookup by number
- JSONL serialization (one list per line)
CoverageValidationResult
¶
Bases: TypedDict
Result of coverage validation.
ListCollection
¶
Bases: BeadBaseModel
A collection of experimental lists.
Contains multiple ExperimentList instances along with metadata about the partitioning process that created them.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Name of this collection. |
| `source_items_id` | `UUID` | UUID of source ItemCollection. |
| `lists` | `list[ExperimentList]` | The experimental lists. |
| `partitioning_strategy` | `str` | Strategy used for partitioning (e.g., "balanced", "random", "stratified"). |
| `partitioning_config` | `dict[str, Any]` | Configuration for partitioning. |
| `partitioning_stats` | `dict[str, Any]` | Statistics about the partitioning process. |
Examples:
>>> from uuid import uuid4
>>> collection = ListCollection(
... name="my_lists",
... source_items_id=uuid4(),
... partitioning_strategy="balanced"
... )
>>> exp_list = ExperimentList(name="list_0", list_number=0)
>>> collection.add_list(exp_list)
>>> len(collection.lists)
1
validate_non_empty_string(v: str) -> str
classmethod
¶
Validate string fields are non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `str` | String to validate. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Validated string (whitespace stripped). |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If string is empty or contains only whitespace. |
validate_unique_list_numbers(v: list[ExperimentList]) -> list[ExperimentList]
classmethod
¶
Validate all list_numbers are unique.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `list[ExperimentList]` | Lists to validate. | required |

Returns:

| Type | Description |
|---|---|
| `list[ExperimentList]` | Validated lists. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If duplicate list_numbers found. |
add_list(exp_list: ExperimentList) -> None
¶
Add a list to the collection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `exp_list` | `ExperimentList` | List to add. | required |
Examples:
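A hypothetical dict-based sketch of the unique-list_number invariant that the collection's validator enforces (this is not the Pydantic model itself):

```python
# Adding a list whose list_number already exists is rejected, mirroring
# validate_unique_list_numbers.
def add_list(lists, exp_list):
    if any(existing["list_number"] == exp_list["list_number"] for existing in lists):
        raise ValueError(f"Duplicate list_number: {exp_list['list_number']}")
    lists.append(exp_list)

lists = []
add_list(lists, {"name": "list_0", "list_number": 0})
try:
    add_list(lists, {"name": "dup", "list_number": 0})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

assert len(lists) == 1 and duplicate_rejected
```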
get_list_by_number(list_number: int) -> ExperimentList | None
¶
Get a list by its number.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `list_number` | `int` | List number to search for. | required |

Returns:

| Type | Description |
|---|---|
| `ExperimentList \| None` | List with matching number, or None if not found. |
Examples:
>>> from uuid import uuid4
>>> collection = ListCollection(
... name="test",
... source_items_id=uuid4(),
... partitioning_strategy="balanced"
... )
>>> exp_list = ExperimentList(name="list_0", list_number=0)
>>> collection.add_list(exp_list)
>>> found = collection.get_list_by_number(0)
>>> found is not None
True
get_all_item_refs() -> list[UUID]
¶
Return all unique item UUIDs across all lists.
Returns:

| Type | Description |
|---|---|
| `list[UUID]` | All unique item UUIDs. |
Examples:
>>> from uuid import uuid4
>>> collection = ListCollection(
... name="test",
... source_items_id=uuid4(),
... partitioning_strategy="balanced"
... )
>>> exp_list = ExperimentList(name="list_0", list_number=0)
>>> item_id = uuid4()
>>> exp_list.add_item(item_id)
>>> collection.add_list(exp_list)
>>> item_id in collection.get_all_item_refs()
True
validate_coverage(all_item_ids: set[UUID]) -> CoverageValidationResult
¶
Check that all items are assigned exactly once.
Validates that:

- All items in all_item_ids are assigned to at least one list
- No item appears in multiple lists (items assigned exactly once)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `all_item_ids` | `set[UUID]` | Set of all item UUIDs that should be assigned. | required |

Returns:

| Type | Description |
|---|---|
| `CoverageValidationResult` | Validation report with keys: `"valid"` (bool), whether validation passed; `"missing_items"` (list[UUID]), items not assigned to any list; `"duplicate_items"` (list[UUID]), items assigned to multiple lists; `"total_assigned"` (int), total assignments across all lists. |
Examples:
>>> from uuid import uuid4
>>> collection = ListCollection(
... name="test",
... source_items_id=uuid4(),
... partitioning_strategy="balanced"
... )
>>> item_id = uuid4()
>>> exp_list = ExperimentList(name="list_0", list_number=0)
>>> exp_list.add_item(item_id)
>>> collection.add_list(exp_list)
>>> result = collection.validate_coverage({item_id})
>>> result["valid"]
True
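The coverage rules can be sketched with `collections.Counter`; `check_coverage` below is an illustrative stand-in for the method, not the library code:

```python
from collections import Counter
from uuid import uuid4

# Every expected item must be assigned at least once, and none twice.
def check_coverage(lists_of_refs, all_item_ids):
    counts = Counter(ref for refs in lists_of_refs for ref in refs)
    missing = [i for i in all_item_ids if i not in counts]
    duplicates = [i for i, c in counts.items() if c > 1]
    return {
        "valid": not missing and not duplicates and set(counts) <= all_item_ids,
        "missing_items": missing,
        "duplicate_items": duplicates,
        "total_assigned": sum(counts.values()),
    }

a, b = uuid4(), uuid4()
report = check_coverage([[a], [b]], {a, b})
assert report["valid"] and report["total_assigned"] == 2

bad = check_coverage([[a], [a]], {a, b})
assert not bad["valid"] and bad["missing_items"] == [b]
```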
to_jsonl(path: Path | str) -> None
¶
Write lists to a JSONL file (one list per line).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path \| str` | Path to output JSONL file. | required |
Examples:
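The JSONL layout is one JSON object per line; the field names below are illustrative, not the model's exact serialization schema:

```python
import json
import tempfile
from pathlib import Path

lists = [
    {"name": "list_0", "list_number": 0, "item_refs": []},
    {"name": "list_1", "list_number": 1, "item_refs": []},
]

# One list per line, as to_jsonl documents.
path = Path(tempfile.mkdtemp()) / "lists.jsonl"
with path.open("w", encoding="utf-8") as fh:
    for exp_list in lists:
        fh.write(json.dumps(exp_list) + "\n")

assert len(path.read_text(encoding="utf-8").splitlines()) == 2
```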
from_jsonl(path: Path | str, name: str = 'loaded_lists', source_items_id: UUID | None = None, partitioning_strategy: str = 'unknown') -> ListCollection
classmethod
¶
Load lists from a JSONL file (one list per line).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path \| str` | Path to JSONL file containing experiment lists. | required |
| `name` | `str` | Name for the collection. | `'loaded_lists'` |
| `source_items_id` | `UUID \| None` | Source items UUID. If None, uses a nil UUID. | `None` |
| `partitioning_strategy` | `str` | Strategy name. | `'unknown'` |

Returns:

| Type | Description |
|---|---|
| `ListCollection` | Collection containing the loaded lists. |
Examples:
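A round-trip sketch of the one-object-per-line convention; the real classmethod additionally attaches collection metadata such as `name` and `partitioning_strategy`, and the field names here are illustrative:

```python
import json
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "lists.jsonl"
path.write_text('{"name": "list_0", "list_number": 0}\n'
                '{"name": "list_1", "list_number": 1}\n', encoding="utf-8")

# Parse each non-empty line as one experiment list record.
loaded = [json.loads(line)
          for line in path.read_text(encoding="utf-8").splitlines()
          if line.strip()]

assert [entry["list_number"] for entry in loaded] == [0, 1]
```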
Constraints¶
constraints
¶
Constraint models for experimental list composition.
This module defines constraints that can be applied to experimental lists to ensure balanced, well-distributed item selections. Constraints can specify:

- Uniqueness: No duplicate property values
- Balance: Balanced distribution across categories
- Quantile: Uniform distribution across quantiles
- Size: List size requirements
- Ordering: Item presentation order constraints (runtime enforcement)
All constraints inherit from BeadBaseModel and use Pydantic discriminated unions for type-safe deserialization.
UniquenessConstraint
¶
Bases: BeadBaseModel
Constraint requiring unique values for a property.
Ensures that no two items in a list have the same value for the specified property. Useful for preventing duplicate target verbs, sentence structures, or other experimental materials.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['uniqueness']` | Discriminator field for constraint type (always "uniqueness"). |
| `property_expression` | `str` | DSL expression that extracts the value that must be unique. The item is available as 'item' in the expression. Examples: "item.metadata.target_verb", "item.templates.sentence.text" |
| `context` | `dict[str, ContextValue]` | Additional context variables for DSL evaluation. |
| `allow_null` | `bool`, default=False | Whether to allow null/None values. If False, None values count as duplicates. If True, multiple None values are allowed. |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily. |
Examples:
>>> # No two items with same target verb (high priority)
>>> constraint = UniquenessConstraint(
... property_expression="item.metadata.target_verb",
... allow_null=False,
... priority=5
... )
>>> constraint.priority
5
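The allow_null semantics can be sketched as a plain check over extracted values (an illustrative helper, not the constraint's evaluator):

```python
# None values count as duplicates unless allow_null is True, as the
# attribute documentation states.
def is_unique(values, allow_null=False):
    seen = set()
    for v in values:
        if v is None and allow_null:
            continue
        if v in seen:
            return False
        seen.add(v)
    return True

assert is_unique(["run", "jump", None, None], allow_null=True)
assert not is_unique(["run", "jump", None, None], allow_null=False)
assert not is_unique(["run", "run"])
```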
validate_property_expression(v: str) -> str
classmethod
¶
Validate property expression is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `str` | Property expression to validate. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Validated property expression. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If property expression is empty or contains only whitespace. |
BalanceConstraint
¶
Bases: BeadBaseModel
Constraint requiring balanced distribution.
Ensures balanced distribution of a categorical property across items in a list. Can specify target counts for each category or request equal distribution.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['balance']` | Discriminator field for constraint type (always "balance"). |
| `property_expression` | `str` | DSL expression that extracts the category value to balance. The item is available as 'item' in the expression. Example: "item.metadata.transitivity" |
| `context` | `dict[str, ContextValue]` | Additional context variables for DSL evaluation. |
| `target_counts` | `dict[str, int] \| None`, default=None | Target counts for each category value. If None, equal distribution is assumed. Keys are category values, values are target counts. |
| `tolerance` | `float`, default=0.1 | Allowed deviation from target as a proportion (0.0-1.0). For example, 0.1 means up to 10% deviation is acceptable. |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily. |
Examples:
>>> # Equal number of transitive and intransitive verbs
>>> constraint = BalanceConstraint(
... property_expression="item.metadata.transitivity",
... tolerance=0.1
... )
>>> # 2:1 ratio with high priority
>>> constraint2 = BalanceConstraint(
... property_expression="item.metadata.grammatical",
... target_counts={"true": 20, "false": 10},
... tolerance=0.05,
... priority=3
... )
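The tolerance semantics can be sketched with `collections.Counter`; `within_tolerance` is an illustrative check, not the constraint's evaluator:

```python
from collections import Counter

# Observed counts may deviate from each target count by at most
# tolerance * target, per the attribute documentation.
def within_tolerance(values, target_counts, tolerance=0.1):
    observed = Counter(values)
    return all(abs(observed.get(cat, 0) - target) <= tolerance * target
               for cat, target in target_counts.items())

assert within_tolerance(["t"] * 20 + ["f"] * 10, {"t": 20, "f": 10}, tolerance=0.05)
assert not within_tolerance(["t"] * 25 + ["f"] * 5, {"t": 20, "f": 10}, tolerance=0.1)
```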
validate_property_expression(v: str) -> str
classmethod
¶
Validate property expression is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `str` | Property expression to validate. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Validated property expression. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If property expression is empty or contains only whitespace. |
validate_target_counts(v: dict[str, int] | None) -> dict[str, int] | None
classmethod
¶
Validate target counts are non-negative.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `dict[str, int] \| None` | Target counts to validate. | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, int] \| None` | Validated target counts. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If any count is negative. |
QuantileConstraint
¶
Bases: BeadBaseModel
Constraint requiring uniform distribution across quantiles.
Ensures uniform distribution of items across quantiles of a numeric property. Useful for balancing language model probabilities, word frequencies, or other continuous variables. Supports complex DSL expressions for computing derived metrics.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['quantile']` | Discriminator field for constraint type (always "quantile"). |
| `property_expression` | `str` | DSL expression that computes the numeric value to quantile. The item is available as 'item' in the expression. Can be simple (e.g., "item.metadata.lm_prob") or complex (e.g., "variance([item['val1'], item['val2'], item['val3']])") |
| `context` | `dict[str, ContextValue]` | Additional context variables for DSL evaluation. Example: {"hyp_keys": ["hyp1", "hyp2", "hyp3"]} |
| `n_quantiles` | `int`, default=5 | Number of quantiles to create (must be >= 2). |
| `items_per_quantile` | `int`, default=2 | Target number of items per quantile (must be >= 1). |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily. |
Examples:
>>> # Uniform distribution of LM probabilities across 5 quantiles
>>> constraint = QuantileConstraint(
... property_expression="item.metadata.lm_prob",
... n_quantiles=5,
... items_per_quantile=2
... )
>>> # Variance of precomputed NLI scores
>>> constraint2 = QuantileConstraint(
... property_expression="item['nli_variance']",
... n_quantiles=5,
... items_per_quantile=2
... )
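Quantile bucketing can be sketched with `statistics.quantiles` from the standard library (an illustrative computation, not the constraint's implementation):

```python
import statistics

# An item's bucket is the number of cut points its value exceeds.
def quantile_of(value, cuts):
    return sum(value > c for c in cuts)

values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
cuts = statistics.quantiles(values, n=5)  # 4 cut points for 5 quantiles
buckets = [quantile_of(v, cuts) for v in values]

# These ten evenly spread values fill all five quantiles uniformly.
assert set(buckets) == {0, 1, 2, 3, 4}
assert all(buckets.count(q) == 2 for q in range(5))
```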
validate_property_expression(v: str) -> str
classmethod
¶
Validate property expression is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `str` | Property expression to validate. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Validated property expression. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If property expression is empty or contains only whitespace. |
GroupedQuantileConstraint
¶
Bases: BeadBaseModel
Constraint requiring uniform quantile distribution within groups.
Ensures uniform distribution across quantiles of a numeric property within each group defined by a grouping property. Useful for balancing a continuous variable independently within categorical groups.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['grouped_quantile']` | Discriminator field for constraint type (always "grouped_quantile"). |
| `property_expression` | `str` | DSL expression that computes the numeric value to quantile. The item is available as 'item' in the expression. Example: "item.metadata.lm_prob" |
| `group_by_expression` | `str` | DSL expression that computes the grouping key. The item is available as 'item' in the expression. Example: "item.metadata.condition" |
| `context` | `dict[str, ContextValue]` | Additional context variables for DSL evaluation. |
| `n_quantiles` | `int`, default=5 | Number of quantiles to create per group (must be >= 2). |
| `items_per_quantile` | `int`, default=2 | Target number of items per quantile per group (must be >= 1). |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily. |
Examples:
>>> # Balance LM probability quantiles within each condition
>>> constraint = GroupedQuantileConstraint(
... property_expression="item.metadata.lm_prob",
... group_by_expression="item.metadata.condition",
... n_quantiles=5,
... items_per_quantile=2
... )
>>> # Balance embedding similarity IQR within semantic categories
>>> constraint2 = GroupedQuantileConstraint(
... property_expression="item['embedding_iqr']",
... group_by_expression="item['semantic_category']",
... n_quantiles=4,
... items_per_quantile=3
... )
validate_expression(v: str) -> str
classmethod
¶
Validate expression is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `str` | Expression to validate. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Validated expression. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If expression is empty or contains only whitespace. |
ConditionalUniquenessConstraint
¶
Bases: BeadBaseModel
Constraint requiring uniqueness when a condition is met.
Ensures that values are unique only when a boolean condition is satisfied. Useful for enforcing uniqueness on a subset of items while allowing duplicates in others.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['conditional_uniqueness']` | Discriminator field for constraint type (always "conditional_uniqueness"). |
| `property_expression` | `str` | DSL expression that computes the value that must be unique. The item is available as 'item' in the expression. Example: "item.metadata.target_word" |
| `condition_expression` | `str` | DSL boolean expression that determines if constraint applies. The item is available as 'item' in the expression. Example: "item.metadata.is_critical == True" |
| `context` | `dict[str, ContextValue]` | Additional context variables for DSL evaluation. |
| `allow_null` | `bool`, default=False | Whether to allow multiple null values when condition is true. |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily. |
Examples:
>>> # Unique target words only for critical items
>>> constraint = ConditionalUniquenessConstraint(
... property_expression="item.metadata.target_word",
... condition_expression="item.metadata.is_critical == True",
... allow_null=False,
... priority=3
... )
>>> # Unique sentences only when grammaticality is tested
>>> constraint2 = ConditionalUniquenessConstraint(
... property_expression="item.templates.sentence.text",
... condition_expression="item.metadata.test_type in test_grammaticality",
... context={"test_grammaticality": {"gram", "acceptability"}},
... allow_null=True
... )
validate_expression(v: str) -> str
classmethod
¶
Validate expression is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `str` | Expression to validate. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Validated expression. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If expression is empty or contains only whitespace. |
DiversityConstraint
¶
Bases: BeadBaseModel
Constraint requiring minimum diversity (unique values) for a property.
Ensures that a list contains at least a minimum number of unique values for a specified property. Useful for ensuring template diversity, verb diversity, or other experimental richness requirements.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['diversity']` | Discriminator field for constraint type (always "diversity"). |
| `property_expression` | `str` | DSL expression that extracts the value to count for diversity. The item is available as 'item' in the expression. Examples: "item.metadata.template_id", "item.metadata.verb_lemma" |
| `min_unique_values` | `int` | Minimum number of unique values required in the list. |
| `context` | `dict[str, ContextValue]` | Additional context variables for DSL evaluation. |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily. |
Examples:
>>> # Ensure at least 15 unique templates per list
>>> constraint = DiversityConstraint(
... property_expression="item.metadata.template_id",
... min_unique_values=15,
... priority=2
... )
>>> constraint.min_unique_values
15
validate_property_expression(v: str) -> str
classmethod
¶
Validate property expression is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `str` | Property expression to validate. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Validated property expression. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If property expression is empty or contains only whitespace. |
SizeConstraint
¶
Bases: BeadBaseModel
Constraint on list size.
Specifies size requirements for a list. Can specify exact size, minimum size, maximum size, or a range (min and max).
Often used with high priority to ensure participants do equal work.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['size']` | Discriminator field for constraint type (always "size"). |
| `min_size` | `int \| None`, default=None | Minimum list size (must be >= 0 if set). |
| `max_size` | `int \| None`, default=None | Maximum list size (must be >= 0 if set). |
| `exact_size` | `int \| None`, default=None | Exact required size (must be >= 0 if set). Cannot be used with min_size or max_size. |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). When partitioning, violations of higher-priority constraints are penalized more heavily. Size constraints often use high priority (e.g., 10) to ensure participants do exactly equal amounts of work. |
Examples:
>>> # Exactly 40 items per list (highest priority)
>>> constraint = SizeConstraint(exact_size=40, priority=10)
>>> # Between 30-50 items per list
>>> constraint2 = SizeConstraint(min_size=30, max_size=50)
>>> # At least 20 items
>>> constraint3 = SizeConstraint(min_size=20)
>>> # At most 100 items
>>> constraint4 = SizeConstraint(max_size=100)
validate_size_params() -> SizeConstraint
¶
Validate size parameter combinations.
Ensures that:

- At least one size parameter is set
- exact_size is not used with min_size or max_size
- min_size <= max_size if both are set
Returns:

| Type | Description |
|---|---|
| `SizeConstraint` | Validated constraint. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |
OrderingConstraint
¶
Bases: BeadBaseModel
Constraint on item presentation order.
CRITICAL: This constraint is primarily enforced at jsPsych runtime, not during static list construction. The Python data model stores the constraint specification, which is then translated to JavaScript code for runtime enforcement during per-participant randomization.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['ordering']` | Discriminator for constraint type. |
| `precedence_pairs` | `list[tuple[UUID, UUID]]` | Pairs of (item_a_id, item_b_id) where item_a must appear before item_b. |
| `no_adjacent_property` | `str \| None` | Property path; items with same value cannot be adjacent. Example: "item_metadata.condition" prevents AA, BB patterns. |
| `block_by_property` | `str \| None` | Property path to group items into contiguous blocks. Example: "item_metadata.block_type" creates blocked design. |
| `min_distance` | `int \| None` | Minimum number of items between items with same no_adjacent_property value. |
| `max_distance` | `int \| None` | Maximum number of items between start and end of items with same block_by_property value (enforces tight blocking). |
| `practice_item_property` | `str \| None` | Property path identifying practice items (should appear first). Example: "item_metadata.is_practice" with value True. |
| `randomize_within_blocks` | `bool` | Whether to randomize order within blocks (default True). Only applies when block_by_property is set. |
Examples:
>>> # No adjacent items with same condition
>>> constraint = OrderingConstraint(
... no_adjacent_property="item_metadata.condition"
... )
>>> # Practice items first, then main items
>>> constraint = OrderingConstraint(
... practice_item_property="item_metadata.is_practice"
... )
>>> # Blocked by condition, randomized within blocks
>>> constraint = OrderingConstraint(
... block_by_property="item_metadata.condition",
... randomize_within_blocks=True
... )
>>> # Item A before Item B
>>> from uuid import uuid4
>>> item_a, item_b = uuid4(), uuid4()
>>> constraint = OrderingConstraint(
... precedence_pairs=[(item_a, item_b)]
... )
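Since this constraint is enforced at runtime, the no_adjacent check amounts to a scan over consecutive items. A stdlib sketch of that check (illustrative, not the generated JavaScript):

```python
# No two consecutive items may share the keyed property value.
def no_adjacent(sequence, key):
    return all(key(a) != key(b) for a, b in zip(sequence, sequence[1:]))

items = [{"condition": c} for c in ["A", "B", "A", "B"]]
assert no_adjacent(items, key=lambda i: i["condition"])
assert not no_adjacent([{"condition": "A"}, {"condition": "A"}],
                       key=lambda i: i["condition"])
```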
validate_distance_constraints() -> OrderingConstraint
¶
Validate distance constraint combinations.
Returns:

| Type | Description |
|---|---|
| `OrderingConstraint` | Validated constraint. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |
BatchCoverageConstraint
¶
Bases: BeadBaseModel
Constraint ensuring all values appear somewhere in the batch.
Ensures that all values of a property appear across the collection of lists. Useful for guaranteeing coverage of experimental conditions, templates, or stimulus categories across all participants.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['coverage']` | Discriminator field for constraint type (always "coverage"). |
| `property_expression` | `str` | DSL expression that extracts the property value to check coverage. The item is available as 'item' in the expression (metadata dict). Example: "item['template_id']" |
| `context` | `dict[str, ContextValue]` | Additional context variables for DSL evaluation. |
| `target_values` | `list[str \| int \| float] \| None` | Target values that must be covered. If None, uses all observed values. |
| `min_coverage` | `float`, default=1.0 | Minimum coverage fraction (0.0-1.0). 1.0 means 100% of target values must appear. |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). |
Examples:
>>> # Ensure all 26 templates appear across all lists
>>> constraint = BatchCoverageConstraint(
... property_expression="item['template_id']",
... target_values=list(range(26)),
... min_coverage=1.0
... )
>>> # Ensure at least 90% of verbs are covered
>>> constraint = BatchCoverageConstraint(
... property_expression="item['verb_lemma']",
... target_values=["run", "jump", "eat", "sleep", "think"],
... min_coverage=0.9
... )
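The min_coverage semantics reduce to a fraction over target values; `coverage_ok` below is an illustrative check, not the constraint's evaluator:

```python
# The fraction of target values observed anywhere in the batch must
# reach min_coverage.
def coverage_ok(observed, target_values, min_coverage=1.0):
    covered = sum(1 for v in target_values if v in observed)
    return covered / len(target_values) >= min_coverage

targets = ["run", "jump", "eat", "sleep", "think"]
assert coverage_ok({"run", "jump", "eat", "sleep"}, targets, min_coverage=0.8)
assert not coverage_ok({"run"}, ["run", "jump"], min_coverage=1.0)
```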
validate_property_expression(v: str) -> str
classmethod
¶
Validate property expression is non-empty.
BatchBalanceConstraint
¶
Bases: BeadBaseModel
Constraint ensuring balanced distribution across the entire batch.
Ensures balanced distribution of a categorical property across all lists combined. Unlike per-list balance constraints, this operates on the aggregate distribution across the entire batch.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['balance']` | Discriminator field for constraint type (always "balance"). |
| `property_expression` | `str` | DSL expression that extracts the category value to balance. Example: "item['pair_type']" |
| `context` | `dict[str, ContextValue]` | Additional context variables for DSL evaluation. |
| `target_distribution` | `dict[str, float]` | Target distribution (values sum to 1.0). Keys are category values, values are target proportions. |
| `tolerance` | `float`, default=0.1 | Allowed deviation from target as a proportion (0.0-1.0). |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). |
Examples:
>>> # Ensure 50/50 balance of pair types across all lists
>>> constraint = BatchBalanceConstraint(
... property_expression="item['pair_type']",
... target_distribution={"same_verb": 0.5, "different_verb": 0.5},
... tolerance=0.05
... )
>>> # Three-way split across conditions
>>> constraint = BatchBalanceConstraint(
... property_expression="item['condition']",
... target_distribution={"A": 0.333, "B": 0.333, "C": 0.334},
... tolerance=0.1
... )
BatchDiversityConstraint
¶
Bases: BeadBaseModel
Constraint preventing values from appearing in too many lists.
Ensures that no single value of a property appears in too many lists, promoting diversity across lists. Useful for ensuring that stimuli (e.g., verbs, nouns) are distributed across participants rather than concentrated in a few lists.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['diversity']` | Discriminator field for constraint type (always "diversity"). |
| `property_expression` | `str` | DSL expression that extracts the property value to check diversity. Example: "item['verb_lemma']" |
| `context` | `dict[str, ContextValue]` | Additional context variables for DSL evaluation. |
| `max_lists_per_value` | `int` | Maximum number of lists any value can appear in. |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). |
Examples:
>>> # No verb should appear in more than 3 out of 8 lists
>>> constraint = BatchDiversityConstraint(
... property_expression="item['verb_lemma']",
... max_lists_per_value=3
... )
>>> # No template in more than half the lists
>>> constraint = BatchDiversityConstraint(
... property_expression="item['template_id']",
... max_lists_per_value=4
... )
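The max_lists_per_value rule can be sketched by counting, per value, how many lists it appears in (an illustrative check, not the library code):

```python
from collections import Counter

# Count list-memberships per value (set() so repeats within one list
# count once), then compare each against max_lists_per_value.
def satisfies_diversity(lists_of_values, max_lists_per_value):
    appearances = Counter()
    for values in lists_of_values:
        for v in set(values):
            appearances[v] += 1
    return all(c <= max_lists_per_value for c in appearances.values())

assert satisfies_diversity([["run", "eat"], ["run"], ["eat"]], max_lists_per_value=2)
assert not satisfies_diversity([["run"], ["run"], ["run"]], max_lists_per_value=2)
```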
validate_property_expression(v: str) -> str
classmethod
¶
Validate property expression is non-empty.
BatchMinOccurrenceConstraint
¶
Bases: BeadBaseModel
Constraint ensuring minimum representation across the batch.
Ensures that each value of a property appears at least a minimum number of times across all lists. Useful for guaranteeing sufficient data for each experimental condition or stimulus category.
Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_type` | `Literal['min_occurrence']` | Discriminator field for constraint type (always "min_occurrence"). |
| `property_expression` | `str` | DSL expression that extracts the property value to check occurrences. Example: "item['quantile']" |
| `context` | `dict[str, ContextValue]` | Additional context variables for DSL evaluation. |
| `min_occurrences` | `int` | Minimum number of times each value must appear across all lists. |
| `priority` | `int`, default=1 | Constraint priority (higher = more important). |
Examples:
>>> # Each quantile appears at least 50 times across all lists
>>> constraint = BatchMinOccurrenceConstraint(
... property_expression="item['quantile']",
... min_occurrences=50
... )
>>> # Each template at least 5 times
>>> constraint = BatchMinOccurrenceConstraint(
... property_expression="item['template_id']",
... min_occurrences=5
... )
validate_property_expression(v: str) -> str
classmethod
¶
Validate property expression is non-empty.
Partitioning¶
partitioner
¶
List partitioning for experimental item distribution.
This module provides the ListPartitioner class for partitioning items into balanced experimental lists. Implements three strategies: random, balanced, and stratified. Uses stand-off annotation (works with UUIDs only).
ListPartitioner
¶
Partitions items into balanced experimental lists.
Uses stand-off annotation: only stores UUIDs, not full item objects. Requires item metadata dict for constraint checking and balancing.
Implements three partitioning strategies:

- Random: Simple round-robin after shuffling
- Balanced: Greedy algorithm to minimize constraint violations
- Stratified: Quantile-based stratification with balanced distribution
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| random_seed | int \| None | Random seed for reproducibility. | None |

Attributes:

| Name | Type | Description |
|---|---|---|
| random_seed | int \| None | Random seed for reproducibility. |
Examples:
>>> from uuid import uuid4
>>> partitioner = ListPartitioner(random_seed=42)
>>> items = [uuid4() for _ in range(100)]
>>> metadata = {uid: {"property": i} for i, uid in enumerate(items)}
>>> lists = partitioner.partition(items, n_lists=5, metadata=metadata)
>>> len(lists)
5
partition(items: list[UUID], n_lists: int, constraints: list[ListConstraint] | None = None, strategy: str = 'balanced', metadata: MetadataDict | None = None) -> list[ExperimentList]
¶
Partition items into lists.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| items | list[UUID] | Item UUIDs to partition. | required |
| n_lists | int | Number of lists to create. | required |
| constraints | list[ListConstraint] \| None | Constraints to satisfy. | None |
| strategy | str | Partitioning strategy ("balanced", "random", "stratified"). | "balanced" |
| metadata | dict[UUID, dict[str, Any]] \| None | Metadata for each item UUID. Required for constraint checking. | None |

Returns:

| Type | Description |
|---|---|
| list[ExperimentList] | The partitioned lists. |

Raises:

| Type | Description |
|---|---|
| ValueError | If strategy is unknown or n_lists < 1. |
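The "random" strategy described above (shuffle, then deal round-robin) can be sketched in a few lines; the function below is illustrative, not the library's internal code:

```python
import random

# Minimal sketch of shuffle-then-round-robin partitioning: shuffle a copy of
# the items, then deal them into n_lists buckets in turn, so bucket sizes
# differ by at most one.
def round_robin_partition(items, n_lists, seed=None):
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    buckets = [[] for _ in range(n_lists)]
    for i, item in enumerate(shuffled):
        buckets[i % n_lists].append(item)
    return buckets

lists = round_robin_partition(list(range(22)), n_lists=5, seed=42)
print([len(lst) for lst in lists])  # [5, 5, 4, 4, 4]
```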
partition_with_batch_constraints(items: list[UUID], n_lists: int, list_constraints: list[ListConstraint] | None = None, batch_constraints: list[BatchConstraint] | None = None, strategy: str = 'balanced', metadata: MetadataDict | None = None, max_iterations: int = 1000, tolerance: float = 0.05) -> list[ExperimentList]
¶
Partition items with batch-level constraints.
Creates initial partition using standard partitioning, then iteratively refines to satisfy batch constraints through item swaps between lists.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| items | list[UUID] | Item UUIDs to partition. | required |
| n_lists | int | Number of lists to create. | required |
| list_constraints | list[ListConstraint] \| None | Per-list constraints to satisfy. | None |
| batch_constraints | list[BatchConstraint] \| None | Batch-level constraints to satisfy. | None |
| strategy | str | Initial partitioning strategy ("balanced", "random", "stratified"). | "balanced" |
| metadata | dict[UUID, dict[str, Any]] \| None | Metadata for each item UUID. | None |
| max_iterations | int | Maximum refinement iterations. | 1000 |
| tolerance | float | Tolerance for batch constraint satisfaction (score >= 1.0 - tolerance). | 0.05 |

Returns:

| Type | Description |
|---|---|
| list[ExperimentList] | Partitioned lists satisfying both list and batch constraints. |
Examples:
>>> from bead.lists.constraints import BatchCoverageConstraint
>>> partitioner = ListPartitioner(random_seed=42)
>>> constraint = BatchCoverageConstraint(
... property_expression="item['template_id']",
... target_values=list(range(26)),
... min_coverage=1.0
... )
>>> lists = partitioner.partition_with_batch_constraints(
... items=item_uids,
... n_lists=8,
... batch_constraints=[constraint],
... metadata=metadata_dict,
... max_iterations=500
... )
stratification
¶
Stratification utilities for quantile-based item assignment.
This module provides utilities for assigning items to quantile bins based on numeric properties, with optional stratification by grouping variables.
assign_quantiles(items: list[T], property_getter: Callable[[T], float], n_quantiles: int = 10, stratify_by: Callable[[T], Hashable] | None = None) -> dict[T, int]
¶
Assign quantile bins to items based on numeric property.
Divides items into n_quantiles bins based on the distribution of a numeric property extracted via property_getter. Optionally stratifies by a grouping variable, computing separate quantiles for each group.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| items | list[T] | List of items to assign to quantile bins. | required |
| property_getter | Callable[[T], float] | Function that extracts a numeric value from each item. This value is used to compute quantiles. | required |
| n_quantiles | int | Number of quantile bins (default: 10 for deciles). Must be >= 2. | 10 |
| stratify_by | Callable[[T], Hashable] \| None | Optional function that extracts a grouping variable from each item. If provided, quantiles are computed separately for each group. Groups must be hashable (str, int, UUID, tuple, etc.). | None |

Returns:

| Type | Description |
|---|---|
| dict[T, int] | Dictionary mapping each item to its quantile bin (0 to n_quantiles-1). |

Raises:

| Type | Description |
|---|---|
| ValueError | If n_quantiles < 2 or items list is empty. |
Examples:
Basic usage with simple numeric values:
>>> items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> result = assign_quantiles(
... items,
... property_getter=lambda x: x,
... n_quantiles=4
... )
>>> result[1] # First item in lowest quartile
0
>>> result[10] # Last item in highest quartile
3
With Item objects and stratification:
>>> from bead.items.item import Item
>>> from uuid import uuid4
>>> items = [
... Item(item_template_id=uuid4(), item_metadata={"score": 10.5, "group": "A"}),
... Item(item_template_id=uuid4(), item_metadata={"score": 5.2, "group": "A"}),
... Item(item_template_id=uuid4(), item_metadata={"score": 8.1, "group": "B"}),
... Item(item_template_id=uuid4(), item_metadata={"score": 3.3, "group": "B"}),
... ]
>>> result = assign_quantiles(
... items,
... property_getter=lambda x: x.item_metadata["score"],
... n_quantiles=2,
... stratify_by=lambda x: x.item_metadata["group"]
... )
For the common pattern of UUID keys with metadata dictionaries, see assign_quantiles_by_uuid below.
assign_quantiles_by_uuid(item_ids: list[UUID], item_metadata: dict[UUID, dict[str, MetadataValue]], property_key: str, n_quantiles: int = 10, stratify_by_key: str | None = None) -> dict[UUID, int]
¶
Assign quantile bins to items by UUID with metadata lookup.
Convenience function for the common pattern of working with UUIDs and metadata dictionaries (stand-off annotation pattern).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| item_ids | list[UUID] | List of item UUIDs. | required |
| item_metadata | dict[UUID, dict[str, MetadataValue]] | Metadata dictionary mapping UUIDs to their metadata dicts. | required |
| property_key | str | Key in item_metadata[uuid] dict to use for quantile computation. | required |
| n_quantiles | int | Number of quantile bins (default: 10). | 10 |
| stratify_by_key | str \| None | Optional key in metadata dict to use for stratification. | None |

Returns:

| Type | Description |
|---|---|
| dict[UUID, int] | Dictionary mapping each UUID to its quantile bin (0 to n_quantiles-1). |

Raises:

| Type | Description |
|---|---|
| ValueError | If property_key missing from any item's metadata. |
| KeyError | If any UUID not found in item_metadata. |
Examples:
>>> from uuid import uuid4
>>> uuids = [uuid4() for _ in range(100)]
>>> metadata = {
... uid: {"score": float(i), "group": "A" if i < 50 else "B"}
... for i, uid in enumerate(uuids)
... }
>>> result = assign_quantiles_by_uuid(
... uuids,
... metadata,
... property_key="score",
... n_quantiles=4,
... stratify_by_key="group"
... )
Balancing¶
balancer
¶
Quantile balancing for experimental list partitioning.
This module provides the QuantileBalancer class for ensuring uniform distribution of items across quantiles of a numeric property. Uses NumPy for efficient quantile computation and maintains stand-off annotation pattern (works with UUIDs).
QuantileBalancer
¶
Ensures uniform distribution of items across quantiles.
Used by stratified partitioning strategy to create balanced distribution of numeric properties (e.g., LM probabilities, word frequencies).
Works with UUIDs only (stand-off annotation). Requires value_func callable to extract numeric values from items via their UUIDs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| n_quantiles | int | Number of quantiles to create (must be >= 2). | 5 |
| random_seed | int \| None | Random seed for reproducibility. If None, uses non-deterministic RNG. | None |

Attributes:

| Name | Type | Description |
|---|---|---|
| n_quantiles | int | Number of quantiles to create. |
| random_seed | int \| None | Random seed for reproducibility. |
Examples:
>>> from uuid import uuid4
>>> import numpy as np
>>> balancer = QuantileBalancer(n_quantiles=5, random_seed=42)
>>> # Create items with known values
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> value_func = lambda uid: values[uid]
>>> # Balance across 4 lists, 5 items per quantile per list
>>> lists = balancer.balance(items, value_func, n_lists=4,
... items_per_quantile_per_list=5)
>>> len(lists)
4
balance(item_ids: list[UUID], value_func: Callable[[UUID], float], n_lists: int, items_per_quantile_per_list: int) -> list[list[UUID]]
¶
Balance items across lists and quantiles.
Distributes items uniformly across quantiles and lists to ensure balanced representation of the numeric property across all lists.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| item_ids | list[UUID] | UUIDs of items to balance. | required |
| value_func | Callable[[UUID], float] | Function to extract numeric value from item UUID. | required |
| n_lists | int | Number of lists to create. | required |
| items_per_quantile_per_list | int | Target number of items per quantile per list. | required |

Returns:

| Type | Description |
|---|---|
| list[list[UUID]] | Balanced lists of item UUIDs. |

Raises:

| Type | Description |
|---|---|
| ValueError | If n_lists < 1 or items_per_quantile_per_list < 1. |
Examples:
>>> from uuid import uuid4
>>> balancer = QuantileBalancer(n_quantiles=5, random_seed=42)
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> lists = balancer.balance(items, lambda uid: values[uid], 4, 5)
>>> all(len(lst) == 25 for lst in lists) # 5 quantiles * 5 items
True
Notes
- Items are assigned to quantiles using np.percentile and np.digitize
- Within each quantile, items are shuffled before distribution
- If insufficient items exist in a quantile, fewer items are assigned
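The percentile/digitize assignment described in the notes can be sketched with NumPy as follows; the helper is illustrative and not the library's internal implementation:

```python
import numpy as np

# Sketch of quantile assignment via np.percentile + np.digitize: compute the
# interior percentile edges (e.g. the 20th/40th/60th/80th percentiles for
# 5 bins), then digitize each value into a bin index 0..n_quantiles-1.
def quantile_bins(values, n_quantiles=5):
    arr = np.asarray(values, dtype=float)
    edges = np.percentile(arr, np.linspace(0, 100, n_quantiles + 1)[1:-1])
    return np.digitize(arr, edges)

bins = quantile_bins(np.arange(100), n_quantiles=5)
print(np.bincount(bins))  # uniform input -> 20 items per bin: [20 20 20 20 20]
```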
compute_balance_score(item_ids: list[UUID], value_func: Callable[[UUID], float]) -> float
¶
Compute balance score for items.
Score is 1.0 for perfect balance (uniform distribution across quantiles), lower for imbalanced distributions. Score is based on deviation from expected uniform distribution.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| item_ids | list[UUID] | UUIDs of items to score. | required |
| value_func | Callable[[UUID], float] | Function to extract numeric values. | required |

Returns:

| Type | Description |
|---|---|
| float | Balance score (0.0-1.0, higher is better). |
Examples:
>>> from uuid import uuid4
>>> balancer = QuantileBalancer(n_quantiles=5)
>>> # Uniformly distributed values
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> score = balancer.compute_balance_score(items, lambda uid: values[uid])
>>> score > 0.9 # Should be close to 1.0
True
Notes
- Returns 0.0 for empty item lists
- Uses mean absolute deviation from expected uniform count
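The scoring idea in the notes (mean absolute deviation from the expected uniform count per quantile) can be sketched as follows; this mirrors the described behavior but is not guaranteed to be the library's exact formula:

```python
import numpy as np

# Illustrative balance score: bin values into quantiles, then penalize the
# mean absolute deviation of per-bin counts from the expected uniform count.
# Returns 0.0 for empty input, 1.0 for a perfectly uniform distribution.
def balance_score(values, n_quantiles=5):
    arr = np.asarray(values, dtype=float)
    if arr.size == 0:
        return 0.0
    edges = np.percentile(arr, np.linspace(0, 100, n_quantiles + 1)[1:-1])
    counts = np.bincount(np.digitize(arr, edges), minlength=n_quantiles)
    expected = arr.size / n_quantiles
    mad = np.mean(np.abs(counts - expected))
    return max(0.0, 1.0 - mad / expected)

print(balance_score(np.arange(100)))  # uniform distribution -> 1.0
```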