# bead.lists

Stage 4 of the bead pipeline: list partitioning with constraint satisfaction.

## Core Classes

### experiment_list
Experiment list data model for organizing experimental items.
The ExperimentList model uses stand-off annotation: it stores only
item UUIDs, not full Item objects. Items are looked up by UUID
against an ItemCollection or Repository.
#### ConstraintSatisfaction

Bases: `BeadBaseModel`

Whether a single constraint is satisfied for the list.

Attributes:

| Name | Type | Description |
|---|---|---|
| `constraint_id` | `UUID` | UUID of the constraint. |
| `satisfied` | `bool` | Whether the constraint holds. |
#### ExperimentList

Bases: `BeadBaseModel`

A list of experimental items selected for participant presentation.

Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | List name (e.g. …). |
| `list_number` | `int` | Numeric identifier (>= 0). |
| `item_refs` | `tuple[UUID, ...]` | UUIDs of the items in this list, in insertion order. |
| `list_constraints` | `tuple[ListConstraint, ...]` | Constraints the list must satisfy. |
| `constraint_satisfaction` | `tuple[ConstraintSatisfaction, ...]` | Per-constraint satisfaction records. |
| `presentation_order` | `tuple[UUID, ...] \| None` | Explicit presentation order; falls back to `item_refs` when `None`. |
| `list_metadata` | `dict[str, MetadataValue]` | Metadata for this list. |
| `balance_metrics` | `dict[str, MetadataValue]` | Metrics about list balance. |
##### with_item(item_id: UUID) -> Self

Return a new list with `item_id` appended.

##### without_item(item_id: UUID) -> Self

Return a new list with `item_id` removed.

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `item_id` is not in the list. |

##### with_shuffled_order(seed: int | None = None) -> Self

Return a new list whose `presentation_order` is a shuffle of the items.

##### get_presentation_order() -> tuple[UUID, ...]

Return `presentation_order` if set, else `item_refs`.

#### validate_presentation_order(experiment_list: ExperimentList) -> None

Raise `ValueError` if `presentation_order` and `item_refs` disagree. The order must be a permutation of the item refs (no missing, extra, or duplicated UUIDs).
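The permutation rule can be sketched in a few lines. This is a standalone illustration, not the library's implementation; `check_permutation` is a hypothetical helper:

```python
from collections import Counter
from uuid import uuid4

def check_permutation(item_refs, presentation_order):
    # Counter equality catches missing, extra, and duplicated UUIDs alike.
    return Counter(presentation_order) == Counter(item_refs)

refs = tuple(uuid4() for _ in range(4))
assert check_permutation(refs, refs[::-1])                   # reordering is fine
assert not check_permutation(refs, refs[:-1])                # missing an item
assert not check_permutation(refs, refs[:-1] + (refs[0],))   # duplicated item
```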
### list_collection

List collection data model.

The ListCollection model groups multiple ExperimentList instances together with metadata describing the partitioning process that produced them.

#### CoverageValidationResult

Bases: `TypedDict`

Outcome of a coverage check across a ListCollection.
#### ListCollection

Bases: `BeadBaseModel`

A collection of experimental lists with partitioning metadata.

Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Collection name. |
| `source_items_id` | `UUID` | UUID of the source item collection. |
| `partitioning_strategy` | `str` | Strategy name (e.g. `"balanced"`). |
| `lists` | `tuple[ExperimentList, ...]` | Member experiment lists. |
| `partitioning_config` | `dict[str, MetadataValue]` | Configuration for the partitioning process. |
| `partitioning_stats` | `dict[str, MetadataValue]` | Statistics from the partitioning process. |
##### with_list(exp_list: ExperimentList) -> Self

Return a new collection with `exp_list` appended.

##### get_list_by_number(list_number: int) -> ExperimentList | None

Return the list with the matching `list_number`, or `None`.

##### get_all_item_refs() -> tuple[UUID, ...]

Return every distinct item UUID referenced across all lists.

##### validate_coverage(all_item_ids: set[UUID]) -> CoverageValidationResult

Check that every item in `all_item_ids` is assigned exactly once. Returns a report with keys `valid`, `missing_items`, `duplicate_items`, and `total_assigned`.
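The exactly-once check can be sketched independently of the model. `coverage_report` is a hypothetical helper that mirrors the CoverageValidationResult fields, not the library's code:

```python
from collections import Counter
from uuid import uuid4

def coverage_report(lists, all_item_ids):
    # Count every assignment across all lists, then derive the report fields.
    counts = Counter(uid for lst in lists for uid in lst)
    missing = [uid for uid in all_item_ids if uid not in counts]
    duplicates = [uid for uid, n in counts.items() if n > 1]
    return {
        "valid": not missing and not duplicates,
        "missing_items": missing,
        "duplicate_items": duplicates,
        "total_assigned": sum(counts.values()),
    }

a, b, c = uuid4(), uuid4(), uuid4()
assert coverage_report([[a, b], [c]], {a, b, c})["valid"]
bad = coverage_report([[a, b], [b]], {a, b, c})   # b duplicated, c unassigned
assert bad["missing_items"] == [c] and bad["duplicate_items"] == [b]
```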
##### to_jsonl(path: Path | str) -> None

Write each contained list as a JSONL line at `path`.

##### from_jsonl(path: Path | str, name: str = 'loaded_lists', source_items_id: UUID | None = None, partitioning_strategy: str = 'unknown') -> ListCollection

*classmethod*

Build a collection from a JSONL file of experiment lists.
## Constraints

### constraints

Constraint models for experimental list composition.

List-level constraints govern the composition of a single list (uniqueness, balance, quantile distribution, size, ordering, etc.). Batch-level constraints govern composition across a collection of lists (coverage, balance, diversity, minimum occurrence). Each family is a discriminated union rooted at ListConstraint / BatchConstraint; subclass construction takes the matching `constraint_type` value.
#### ListConstraint

#### BatchConstraint
#### UniquenessConstraint

Bases: `ListConstraint`

No two items in a list share the same value of `property_expression`.

Attributes:

| Name | Type | Description |
|---|---|---|
| `property_expression` | `str` | DSL expression returning the value that must be unique across the list. |
| `context` | `dict[str, ContextValue]` | Extra DSL evaluation context. |
| `allow_null` | `bool` | Allow multiple items with a null value for the property. |
| `priority` | `int` | Higher values are weighted more heavily during partitioning. |
#### ConditionalUniquenessConstraint

Bases: `ListConstraint`

Uniqueness applied only when `condition_expression` evaluates true.

Attributes:

| Name | Type | Description |
|---|---|---|
| `property_expression` | `str` | DSL expression returning the value that must be unique. |
| `condition_expression` | `str` | DSL boolean expression gating constraint application. |
| `context` | `dict[str, ContextValue]` | Extra DSL evaluation context. |
| `allow_null` | `bool` | Allow multiple items with a null value for the property. |
| `priority` | `int` | Constraint priority. |
#### BalanceConstraint

Bases: `ListConstraint`

Balanced distribution of a categorical property within a list.

Attributes:

| Name | Type | Description |
|---|---|---|
| `property_expression` | `str` | DSL expression returning the category value. |
| `context` | `dict[str, ContextValue]` | Extra DSL evaluation context. |
| `target_counts` | `dict[str, int] \| None` | Target counts per category. |
| `tolerance` | `float` | Allowed deviation from target as a proportion (0.0-1.0). |
| `priority` | `int` | Constraint priority. |
#### QuantileConstraint

Bases: `ListConstraint`

Uniform distribution of items across quantiles of a numeric property.

Attributes:

| Name | Type | Description |
|---|---|---|
| `property_expression` | `str` | DSL expression returning the numeric value to quantile. |
| `context` | `dict[str, ContextValue]` | Extra DSL evaluation context. |
| `n_quantiles` | `int` | Number of quantiles to create (>= 2). |
| `items_per_quantile` | `int` | Target items per quantile (>= 1). |
| `priority` | `int` | Constraint priority. |
#### GroupedQuantileConstraint

Bases: `ListConstraint`

Quantile uniformity applied within groups defined by another expression.

Attributes:

| Name | Type | Description |
|---|---|---|
| `property_expression` | `str` | DSL expression returning the numeric value to quantile. |
| `group_by_expression` | `str` | DSL expression returning the grouping key. |
| `context` | `dict[str, ContextValue]` | Extra DSL evaluation context. |
| `n_quantiles` | `int` | Quantiles per group. |
| `items_per_quantile` | `int` | Target items per quantile per group. |
| `priority` | `int` | Constraint priority. |
#### DiversityConstraint

Bases: `ListConstraint`

Minimum number of unique values for a property within a list.

Attributes:

| Name | Type | Description |
|---|---|---|
| `property_expression` | `str` | DSL expression returning the value to count for diversity. |
| `min_unique_values` | `int` | Minimum number of unique values required (>= 1). |
| `context` | `dict[str, ContextValue]` | Extra DSL evaluation context. |
| `priority` | `int` | Constraint priority. |
#### SizeConstraint

Bases: `ListConstraint`

Size requirements for a list.

Specify `exact_size`, or `min_size` and/or `max_size`.

Attributes:

| Name | Type | Description |
|---|---|---|
| `min_size` | `int \| None` | Minimum list size. |
| `max_size` | `int \| None` | Maximum list size. |
| `exact_size` | `int \| None` | Exact required size; mutually exclusive with `min_size` and `max_size`. |
| `priority` | `int` | Constraint priority. |
#### OrderingPair

Bases: `BeadBaseModel`

Precedence relation between two items in a list.

Attributes:

| Name | Type | Description |
|---|---|---|
| `before` | `UUID` | Item that must appear earlier in the list. |
| `after` | `UUID` | Item that must appear later in the list. |
#### OrderingConstraint

Bases: `ListConstraint`

Item presentation order requirements.

Enforced primarily at jsPsych runtime; the Python model stores the specification.

Attributes:

| Name | Type | Description |
|---|---|---|
| `precedence_pairs` | `tuple[OrderingPair, ...]` | Pairs of items where `before` must precede `after`. |
| `no_adjacent_property` | `str \| None` | Property path; items sharing a value cannot be adjacent. |
| `block_by_property` | `str \| None` | Property path used to group items into contiguous blocks. |
| `min_distance` | `int \| None` | Minimum item separation between equal-property neighbours. |
| `max_distance` | `int \| None` | Maximum span between start and end of a property block. |
| `practice_item_property` | `str \| None` | Property identifying practice items, which precede main items. |
| `randomize_within_blocks` | `bool` | Randomize order within property blocks. |
| `priority` | `int` | Constraint priority (unused for static partitioning). |
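Two of these checks are easy to state as code. The helpers below (`satisfies_precedence`, `violates_no_adjacent`) are illustrative sketches under the semantics described above, not the runtime's implementation:

```python
from uuid import uuid4

def satisfies_precedence(order, precedence_pairs):
    # Each (before, after) pair must respect positions in the presented order.
    pos = {uid: i for i, uid in enumerate(order)}
    return all(pos[before] < pos[after] for before, after in precedence_pairs)

def violates_no_adjacent(property_values):
    # True when two neighbouring items share the same (non-null) value.
    return any(x is not None and x == y
               for x, y in zip(property_values, property_values[1:]))

a, b, c = uuid4(), uuid4(), uuid4()
assert satisfies_precedence([a, b, c], [(a, c), (b, c)])
assert not satisfies_precedence([c, a, b], [(a, c)])
assert violates_no_adjacent(["filler", "filler", "target"])
assert not violates_no_adjacent(["filler", "target", "filler"])
```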
#### BatchCoverageConstraint

Bases: `BatchConstraint`

All values of `property_expression` appear somewhere in the batch.

Attributes:

| Name | Type | Description |
|---|---|---|
| `property_expression` | `str` | DSL expression returning the property value to cover. |
| `context` | `dict[str, ContextValue]` | Extra DSL evaluation context. |
| `target_values` | `tuple[str \| int \| float, ...] \| None` | Values that must be covered. |
| `min_coverage` | `float` | Minimum fraction of target values that must appear (0.0-1.0). |
| `priority` | `int` | Constraint priority. |
#### BatchBalanceConstraint

Bases: `BatchConstraint`

Balanced distribution of a categorical property across the entire batch.

Attributes:

| Name | Type | Description |
|---|---|---|
| `property_expression` | `str` | DSL expression returning the category value. |
| `target_distribution` | `dict[str, float]` | Target proportions per category (values sum to ~1.0). |
| `context` | `dict[str, ContextValue]` | Extra DSL evaluation context. |
| `tolerance` | `float` | Allowed deviation from target. |
| `priority` | `int` | Constraint priority. |
#### BatchDiversityConstraint

Bases: `BatchConstraint`

No single value appears in too many lists.

Attributes:

| Name | Type | Description |
|---|---|---|
| `property_expression` | `str` | DSL expression returning the property value. |
| `max_lists_per_value` | `int` | Maximum number of lists any value may appear in (>= 1). |
| `context` | `dict[str, ContextValue]` | Extra DSL evaluation context. |
| `priority` | `int` | Constraint priority. |
#### BatchMinOccurrenceConstraint

Bases: `BatchConstraint`

Each value of `property_expression` appears at least `min_occurrences` times.

Attributes:

| Name | Type | Description |
|---|---|---|
| `property_expression` | `str` | DSL expression returning the property value. |
| `min_occurrences` | `int` | Minimum total occurrences across all lists (>= 1). |
| `context` | `dict[str, ContextValue]` | Extra DSL evaluation context. |
| `priority` | `int` | Constraint priority. |
## Partitioning

### partitioner

List partitioning for experimental item distribution.

This module provides the ListPartitioner class for partitioning items into balanced experimental lists. It implements three strategies (random, balanced, and stratified) and uses stand-off annotation, working with UUIDs only.

#### ListPartitioner

Partitions items into balanced experimental lists.

Uses stand-off annotation: only stores UUIDs, not full item objects. Requires an item metadata dict for constraint checking and balancing.

Implements three partitioning strategies:

- Random: simple round-robin after shuffling
- Balanced: greedy algorithm to minimize constraint violations
- Stratified: quantile-based stratification with balanced distribution

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `random_seed` | `int \| None` | Random seed for reproducibility. | `None` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `random_seed` | `int \| None` | Random seed for reproducibility. |

Examples:

```python
>>> from uuid import uuid4
>>> partitioner = ListPartitioner(random_seed=42)
>>> items = [uuid4() for _ in range(100)]
>>> metadata = {uid: {"property": i} for i, uid in enumerate(items)}
>>> lists = partitioner.partition(items, n_lists=5, metadata=metadata)
>>> len(lists)
5
```
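The "random" strategy (shuffle, then deal round-robin) can be sketched without the library. `random_partition` is a hypothetical standalone helper, not ListPartitioner's actual code:

```python
import random
from uuid import uuid4

def random_partition(items, n_lists, seed=None):
    # Shuffle a copy, then deal items round-robin across the lists.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    lists = [[] for _ in range(n_lists)]
    for i, uid in enumerate(shuffled):
        lists[i % n_lists].append(uid)
    return lists

items = [uuid4() for _ in range(100)]
lists = random_partition(items, 5, seed=42)
assert len(lists) == 5 and all(len(lst) == 20 for lst in lists)
assert random_partition(items, 5, seed=42) == lists  # same seed, same result
```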
##### partition(items: list[UUID], n_lists: int, constraints: list[ListConstraint] | None = None, strategy: str = 'balanced', metadata: MetadataDict | None = None) -> list[ExperimentList]

Partition items into lists.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `items` | `list[UUID]` | Item UUIDs to partition. | *required* |
| `n_lists` | `int` | Number of lists to create. | *required* |
| `constraints` | `list[ListConstraint] \| None` | Constraints to satisfy. | `None` |
| `strategy` | `str` | Partitioning strategy (`"balanced"`, `"random"`, or `"stratified"`). | `"balanced"` |
| `metadata` | `dict[UUID, dict[str, Any]] \| None` | Metadata for each item UUID; required for constraint checking. | `None` |

Returns:

| Type | Description |
|---|---|
| `list[ExperimentList]` | The partitioned lists. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `strategy` is unknown or `n_lists < 1`. |
##### partition_with_batch_constraints(items: list[UUID], n_lists: int, list_constraints: list[ListConstraint] | None = None, batch_constraints: list[BatchConstraint] | None = None, strategy: str = 'balanced', metadata: MetadataDict | None = None, max_iterations: int = 1000, tolerance: float = 0.05) -> list[ExperimentList]

Partition items with batch-level constraints.

Creates an initial partition using standard partitioning, then iteratively refines it to satisfy batch constraints through item swaps between lists.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `items` | `list[UUID]` | Item UUIDs to partition. | *required* |
| `n_lists` | `int` | Number of lists to create. | *required* |
| `list_constraints` | `list[ListConstraint] \| None` | Per-list constraints to satisfy. | `None` |
| `batch_constraints` | `list[BatchConstraint] \| None` | Batch-level constraints to satisfy. | `None` |
| `strategy` | `str` | Initial partitioning strategy (`"balanced"`, `"random"`, or `"stratified"`). | `"balanced"` |
| `metadata` | `dict[UUID, dict[str, Any]] \| None` | Metadata for each item UUID. | `None` |
| `max_iterations` | `int` | Maximum refinement iterations. | `1000` |
| `tolerance` | `float` | Tolerance for batch constraint satisfaction (score >= 1.0 - tolerance). | `0.05` |

Returns:

| Type | Description |
|---|---|
| `list[ExperimentList]` | Partitioned lists satisfying both list and batch constraints. |

Examples:

```python
>>> from bead.lists.constraints import BatchCoverageConstraint
>>> partitioner = ListPartitioner(random_seed=42)
>>> constraint = BatchCoverageConstraint(
...     property_expression="item['template_id']",
...     target_values=list(range(26)),
...     min_coverage=1.0
... )
>>> lists = partitioner.partition_with_batch_constraints(
...     items=item_uids,
...     n_lists=8,
...     batch_constraints=[constraint],
...     metadata=metadata_dict,
...     max_iterations=500
... )
```
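The refinement step (swap items between lists until a batch score clears the tolerance) can be sketched as a simple hill-climb. Both `refine_by_swaps` and the toy `sum_balance` score below are illustrative assumptions, not the library's implementation:

```python
import random

def refine_by_swaps(lists, score, max_iterations=1000, tolerance=0.05, seed=None):
    # Hill-climb: try a random cross-list swap, keep it only if the batch
    # score does not get worse, and stop once the score reaches 1.0 - tolerance.
    rng = random.Random(seed)
    best = score(lists)
    for _ in range(max_iterations):
        if best >= 1.0 - tolerance:
            break
        a, b = rng.sample(range(len(lists)), 2)
        i, j = rng.randrange(len(lists[a])), rng.randrange(len(lists[b]))
        lists[a][i], lists[b][j] = lists[b][j], lists[a][i]
        trial = score(lists)
        if trial >= best:
            best = trial
        else:  # revert a worsening swap
            lists[a][i], lists[b][j] = lists[b][j], lists[a][i]
    return lists

def sum_balance(lists):
    # Toy batch score: how evenly the list sums are balanced (1.0 = even).
    sums = [sum(lst) for lst in lists]
    return 1.0 - (max(sums) - min(sums)) / (sum(sums) or 1)

start = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
before = sum_balance(start)
after = sum_balance(refine_by_swaps(start, sum_balance, seed=0))
assert after >= before  # the loop never accepts a worsening swap
```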
### stratification

Stratification utilities for quantile-based item assignment.

This module provides utilities for assigning items to quantile bins based on numeric properties, with optional stratification by grouping variables.

#### assign_quantiles(items: list[T], property_getter: Callable[[T], float], n_quantiles: int = 10, stratify_by: Callable[[T], Hashable] | None = None) -> dict[T, int]

Assign quantile bins to items based on a numeric property.

Divides items into `n_quantiles` bins based on the distribution of a numeric property extracted via `property_getter`. Optionally stratifies by a grouping variable, computing separate quantiles for each group.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `items` | `list[T]` | List of items to assign to quantile bins. | *required* |
| `property_getter` | `Callable[[T], float]` | Function that extracts a numeric value from each item; this value is used to compute quantiles. | *required* |
| `n_quantiles` | `int` | Number of quantile bins (default: 10 for deciles). Must be >= 2. | `10` |
| `stratify_by` | `Callable[[T], Hashable] \| None` | Optional function that extracts a grouping variable from each item. If provided, quantiles are computed separately for each group. Groups must be hashable (str, int, UUID, tuple, etc.). | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[T, int]` | Dictionary mapping each item to its quantile bin (0 to n_quantiles - 1). |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `n_quantiles < 2` or the items list is empty. |

Examples:

Basic usage with simple numeric values:

```python
>>> items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> result = assign_quantiles(
...     items,
...     property_getter=lambda x: x,
...     n_quantiles=4
... )
>>> result[1]  # First item in lowest quartile
0
>>> result[10]  # Last item in highest quartile
3
```

With Item objects and stratification:

```python
>>> from bead.items.item import Item
>>> from uuid import uuid4
>>> items = [
...     Item(item_template_id=uuid4(), item_metadata={"score": 10.5, "group": "A"}),
...     Item(item_template_id=uuid4(), item_metadata={"score": 5.2, "group": "A"}),
...     Item(item_template_id=uuid4(), item_metadata={"score": 8.1, "group": "B"}),
...     Item(item_template_id=uuid4(), item_metadata={"score": 3.3, "group": "B"}),
... ]
>>> result = assign_quantiles(
...     items,
...     property_getter=lambda x: x.item_metadata["score"],
...     n_quantiles=2,
...     stratify_by=lambda x: x.item_metadata["group"]
... )
```

With UUID keys (a common pattern):

```python
>>> from uuid import uuid4
>>> item_uuids = [uuid4() for _ in range(100)]
>>> item_scores = {uid: float(i) for i, uid in enumerate(item_uuids)}
>>> result = assign_quantiles(
...     item_uuids,
...     property_getter=lambda uid: item_scores[uid],
...     n_quantiles=10
... )
```
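The binning itself can be approximated with a rank-based sketch. `assign_quantiles_sketch` is a hypothetical stand-in (the real implementation may use percentile cut points rather than ranks, so tie handling can differ):

```python
def assign_quantiles_sketch(items, property_getter, n_quantiles=10):
    # Sort by the property, then slice the ranking into n_quantiles
    # near-equal bins (bin index = rank * n_quantiles // n_items).
    if n_quantiles < 2 or not items:
        raise ValueError("need n_quantiles >= 2 and a non-empty item list")
    ranked = sorted(items, key=property_getter)
    n = len(ranked)
    return {item: rank * n_quantiles // n for rank, item in enumerate(ranked)}

result = assign_quantiles_sketch(list(range(1, 11)), lambda x: x, n_quantiles=4)
assert result[1] == 0    # lowest value lands in the bottom bin
assert result[10] == 3   # highest value lands in the top bin
```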
#### assign_quantiles_by_uuid(item_ids: list[UUID], item_metadata: dict[UUID, dict[str, MetadataValue]], property_key: str, n_quantiles: int = 10, stratify_by_key: str | None = None) -> dict[UUID, int]

Assign quantile bins to items by UUID with metadata lookup.

Convenience function for the common pattern of working with UUIDs and metadata dictionaries (the stand-off annotation pattern).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item_ids` | `list[UUID]` | List of item UUIDs. | *required* |
| `item_metadata` | `dict[UUID, dict[str, MetadataValue]]` | Metadata dictionary mapping UUIDs to their metadata dicts. | *required* |
| `property_key` | `str` | Key in each item's metadata dict to use for quantile computation. | *required* |
| `n_quantiles` | `int` | Number of quantile bins (default: 10). | `10` |
| `stratify_by_key` | `str \| None` | Optional key in the metadata dict to use for stratification. | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[UUID, int]` | Dictionary mapping each UUID to its quantile bin (0 to n_quantiles - 1). |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `property_key` is missing from any item's metadata. |
| `KeyError` | If any UUID is not found in `item_metadata`. |

Examples:

```python
>>> from uuid import uuid4
>>> uuids = [uuid4() for _ in range(100)]
>>> metadata = {
...     uid: {"score": float(i), "group": "A" if i < 50 else "B"}
...     for i, uid in enumerate(uuids)
... }
>>> result = assign_quantiles_by_uuid(
...     uuids,
...     metadata,
...     property_key="score",
...     n_quantiles=4,
...     stratify_by_key="group"
... )
```
## Balancing

### balancer

Quantile balancing for experimental list partitioning.

This module provides the QuantileBalancer class for ensuring uniform distribution of items across quantiles of a numeric property. It uses NumPy for efficient quantile computation and maintains the stand-off annotation pattern (works with UUIDs).

#### QuantileBalancer

Ensures uniform distribution of items across quantiles.

Used by the stratified partitioning strategy to create a balanced distribution of numeric properties (e.g., LM probabilities, word frequencies).

Works with UUIDs only (stand-off annotation). Requires a `value_func` callable to extract numeric values from items via their UUIDs.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n_quantiles` | `int` | Number of quantiles to create (must be >= 2). | `5` |
| `random_seed` | `int \| None` | Random seed for reproducibility. If `None`, uses a non-deterministic RNG. | `None` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `n_quantiles` | `int` | Number of quantiles to create. |
| `random_seed` | `int \| None` | Random seed for reproducibility. |

Examples:

```python
>>> from uuid import uuid4
>>> balancer = QuantileBalancer(n_quantiles=5, random_seed=42)
>>> # Create items with known values
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> value_func = lambda uid: values[uid]
>>> # Balance across 4 lists, 5 items per quantile per list
>>> lists = balancer.balance(items, value_func, n_lists=4,
...                          items_per_quantile_per_list=5)
>>> len(lists)
4
```
##### balance(item_ids: list[UUID], value_func: Callable[[UUID], float], n_lists: int, items_per_quantile_per_list: int) -> list[list[UUID]]

Balance items across lists and quantiles.

Distributes items uniformly across quantiles and lists to ensure balanced representation of the numeric property across all lists.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item_ids` | `list[UUID]` | UUIDs of items to balance. | *required* |
| `value_func` | `Callable[[UUID], float]` | Function to extract a numeric value from an item UUID. | *required* |
| `n_lists` | `int` | Number of lists to create. | *required* |
| `items_per_quantile_per_list` | `int` | Target number of items per quantile per list. | *required* |

Returns:

| Type | Description |
|---|---|
| `list[list[UUID]]` | Balanced lists of item UUIDs. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `n_lists < 1` or `items_per_quantile_per_list < 1`. |

Examples:

```python
>>> from uuid import uuid4
>>> balancer = QuantileBalancer(n_quantiles=5, random_seed=42)
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> lists = balancer.balance(items, lambda uid: values[uid], 4, 5)
>>> all(len(lst) == 25 for lst in lists)  # 5 quantiles * 5 items
True
```

Notes:

- Items are assigned to quantiles using `np.percentile` and `np.digitize`.
- Within each quantile, items are shuffled before distribution.
- If insufficient items exist in a quantile, fewer items are assigned.
##### compute_balance_score(item_ids: list[UUID], value_func: Callable[[UUID], float]) -> float

Compute a balance score for items.

The score is 1.0 for perfect balance (uniform distribution across quantiles) and lower for imbalanced distributions; it is based on deviation from the expected uniform distribution.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item_ids` | `list[UUID]` | UUIDs of items to score. | *required* |
| `value_func` | `Callable[[UUID], float]` | Function to extract numeric values. | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Balance score (0.0-1.0, higher is better). |

Examples:

```python
>>> from uuid import uuid4
>>> balancer = QuantileBalancer(n_quantiles=5)
>>> # Uniformly distributed values
>>> items = [uuid4() for _ in range(100)]
>>> values = {item: float(i) for i, item in enumerate(items)}
>>> score = balancer.compute_balance_score(items, lambda uid: values[uid])
>>> score > 0.9  # Should be close to 1.0
True
```

Notes:

- Returns 0.0 for empty item lists.
- Uses mean absolute deviation from the expected uniform count.
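The mean-absolute-deviation scoring rule in the notes above can be sketched directly on bin assignments. `balance_score_sketch` is an illustrative stand-in that assumes the penalty is MAD normalised by the expected uniform count, clipped at zero; the library's exact normalisation may differ:

```python
from collections import Counter

def balance_score_sketch(bin_assignments, n_quantiles):
    # 1.0 for a perfectly uniform spread over the bins; penalise by the
    # mean absolute deviation from the expected uniform count per bin.
    if not bin_assignments:
        return 0.0
    counts = Counter(bin_assignments)
    expected = len(bin_assignments) / n_quantiles
    mad = sum(abs(counts.get(q, 0) - expected)
              for q in range(n_quantiles)) / n_quantiles
    return max(0.0, 1.0 - mad / expected)

assert balance_score_sketch([0, 1, 2, 3, 4] * 20, 5) == 1.0  # uniform
assert balance_score_sketch([0] * 100, 5) == 0.0             # fully skewed
assert balance_score_sketch([], 5) == 0.0                    # empty input
```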