bead.simulation

Framework for simulating annotator responses with configurable noise models and task-specific strategies.
Runner

runner

Simulation runner for orchestrating multi-annotator simulations.

SimulationRunner

Orchestrates multi-annotator simulation.

Can simulate:

- Multiple independent annotators
- Correlated annotators (shared noise component)
- Mixed strategies (some LM-based, some random)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulationRunnerConfig` | Configuration for simulation. | *required* |
Examples:
>>> from bead.config.simulation import (
... SimulationRunnerConfig,
... SimulatedAnnotatorConfig,
... )
>>> config = SimulationRunnerConfig(
... annotator_configs=[
... SimulatedAnnotatorConfig(strategy="lm_score", random_state=1),
... SimulatedAnnotatorConfig(strategy="lm_score", random_state=2),
... ],
... n_annotators=2
... )
>>> runner = SimulationRunner(config)
>>> # results = runner.run(items, templates)
run(items: list[Item], templates: list[ItemTemplate] | ItemTemplate) -> dict[str, list[str | int | float | list[str]]]

Run simulation with all annotators.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `items` | `list[Item]` | Items to annotate. | *required* |
| `templates` | `list[ItemTemplate] \| ItemTemplate` | Templates (one per item or shared). | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, list[str \| int \| float \| list[str]]]` | Results: `{"item_ids": [...], "annotator_0": [...], "annotator_1": [...], ...}` |

save_results(results: dict[str, list[str | int | float | list[str]]]) -> None

Save results to file.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `results` | `dict[str, list[str \| int \| float \| list[str]]]` | Simulation results. | *required* |
Annotators

base

Base class for simulated annotators.

SimulatedAnnotator

Bases: ABC

Abstract base for simulated annotators.

An annotator combines:

- Task-specific strategy (how to respond to each task type)
- Noise model (how to add human-like variability)
- Configuration (model output keys, random seed, etc.)

The annotator orchestrates the simulation process and provides a unified interface for generating judgments.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration for annotator. | *required* |
| `random_state` | `int \| None` | Random seed (overrides config if provided). | `None` |

from_config(config: SimulatedAnnotatorConfig) -> SimulatedAnnotator

classmethod

Create annotator from configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration specifying annotator type and parameters. | *required* |

Returns:

| Type | Description |
|---|---|
| `SimulatedAnnotator` | Configured annotator instance. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If strategy is unknown. |
Examples:
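The factory dispatch behind `from_config` can be sketched in plain Python. The registry mapping and stub classes below are illustrative assumptions, not bead's actual implementation; they only show the strategy-name-to-class pattern and the `ValueError` contract for unknown strategies:

```python
# Hypothetical sketch of the from_config dispatch: strategy name -> class.
# The registry below is an assumption for illustration; the real mapping
# lives inside bead.
class _Stub:
    def __init__(self, config):
        self.config = config

class RandomAnnotator(_Stub): ...
class OracleAnnotator(_Stub): ...
class LMBasedAnnotator(_Stub): ...

_REGISTRY = {
    "random": RandomAnnotator,
    "oracle": OracleAnnotator,
    "lm_score": LMBasedAnnotator,
}

def from_config(config: dict):
    strategy = config["strategy"]
    try:
        cls = _REGISTRY[strategy]
    except KeyError:
        # Unknown strategies raise ValueError, as documented above.
        raise ValueError(f"unknown strategy: {strategy!r}")
    return cls(config)

annotator = from_config({"strategy": "lm_score"})
print(type(annotator).__name__)  # LMBasedAnnotator
```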
annotate(item: Item, item_template: ItemTemplate) -> str | int | float | list[str]

abstractmethod

Generate annotation for single item.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task structure. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| list[str]` | Annotation (format depends on task type). |

annotate_batch(items: list[Item], item_templates: list[ItemTemplate] | ItemTemplate) -> dict[str, str | int | float | list[str]]

Generate annotations for batch of items.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `items` | `list[Item]` | Items to annotate. | *required* |
| `item_templates` | `list[ItemTemplate] \| ItemTemplate` | Templates (one per item or single template for all). | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, str \| int \| float \| list[str]]` | Mapping from item ID to annotation. |
Examples:
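The template-broadcasting behaviour of `annotate_batch` (a single template shared by all items, or one template per item) can be sketched as follows. The item/template types here are stand-ins, and the per-item `annotate` is a dummy; only the broadcasting logic mirrors the documented API:

```python
# Illustrative sketch of batch annotation with a shared template.
# The item dicts and string templates are stand-ins for bead's Item and
# ItemTemplate types.
def annotate(item, template):
    return f"{template}:{item['id']}"

def annotate_batch(items, item_templates):
    # A single template is broadcast to every item; a list must align 1:1.
    if not isinstance(item_templates, list):
        item_templates = [item_templates] * len(items)
    if len(item_templates) != len(items):
        raise ValueError("need one template per item or a single shared template")
    return {item["id"]: annotate(item, tpl)
            for item, tpl in zip(items, item_templates)}

items = [{"id": "a"}, {"id": "b"}]
result = annotate_batch(items, "shared")
print(result)  # {'a': 'shared:a', 'b': 'shared:b'}
```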
get_strategy(task_type: str) -> SimulationStrategy

Get strategy for task type.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `task_type` | `str` | Task type (e.g., "forced_choice"). | *required* |

Returns:

| Type | Description |
|---|---|
| `SimulationStrategy` | Strategy for this task type. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If task type not supported. |

oracle

Oracle (perfect performance) annotator.

OracleAnnotator

Bases: SimulatedAnnotator

Perfect performance annotator using ground truth.

Returns ground truth labels from item.item_metadata['ground_truth']. Falls back to random when ground truth is not available.

Useful for establishing an upper bound on performance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration for annotator. | *required* |
Examples:
>>> from bead.config.simulation import SimulatedAnnotatorConfig
>>> config = SimulatedAnnotatorConfig(strategy="oracle", random_state=42)
>>> annotator = OracleAnnotator(config)
>>> # judgment = annotator.annotate(item, template)
annotate(item: Item, item_template: ItemTemplate) -> str | int | float | bool | list[str]

Generate oracle annotation using ground truth.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| bool \| list[str]` | Ground truth annotation or random fallback. |

random

Random baseline annotator.

RandomAnnotator

Bases: SimulatedAnnotator

Pure random baseline annotator.

Generates random responses that respect task spec constraints (options, scale ranges, etc.) but are otherwise uninformed.

Useful for establishing baseline performance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration for annotator. | *required* |
Examples:
>>> from bead.config.simulation import SimulatedAnnotatorConfig
>>> config = SimulatedAnnotatorConfig(strategy="random", random_state=42)
>>> annotator = RandomAnnotator(config)
>>> # judgment = annotator.annotate(item, template)
annotate(item: Item, item_template: ItemTemplate) -> str | int | float | bool | list[str]

Generate random annotation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate (ignored). | *required* |
| `item_template` | `ItemTemplate` | Template defining task constraints. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| bool \| list[str]` | Random annotation (format depends on task type). |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If task type is not supported. |

lm_based

LM score-based annotator.

LMBasedAnnotator

Bases: SimulatedAnnotator

Annotator using language model scores for decisions.

Uses LM log probabilities or scores from Item.model_outputs to make informed decisions. Applies a noise model for variability.

Supports all task types via pluggable strategies.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration for annotator. | *required* |
Examples:
>>> from bead.config.simulation import SimulatedAnnotatorConfig, NoiseModelConfig
>>> config = SimulatedAnnotatorConfig(
... strategy="lm_score",
... model_output_key="lm_score",
... noise_model=NoiseModelConfig(noise_type="temperature", temperature=1.5)
... )
>>> annotator = LMBasedAnnotator(config)
>>> # judgment = annotator.annotate(item, template)
annotate(item: Item, item_template: ItemTemplate) -> str | int | float | list[str]

Generate annotation using LM scores.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| list[str]` | Annotation (format depends on task type). |

distance_based

Distance-based annotator using embeddings.

DistanceBasedAnnotator

Bases: SimulatedAnnotator

Annotator using embedding distances for decisions.

Uses embeddings from Item.model_outputs to compute similarity/distance metrics, then makes decisions based on those distances.

For forced choice, selects the option with the lowest distance (highest similarity). For ordinal scales, maps distance to scale values. For binary, thresholds the distance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration for annotator. | *required* |
Examples:
>>> from bead.config.simulation import SimulatedAnnotatorConfig, NoiseModelConfig
>>> config = SimulatedAnnotatorConfig(
... strategy="distance",
... model_output_key="embedding",
... noise_model=NoiseModelConfig(noise_type="none")
... )
>>> annotator = DistanceBasedAnnotator(config)
>>> # judgment = annotator.annotate(item, template)
annotate(item: Item, item_template: ItemTemplate) -> str | int | float | bool | list[str]

Generate annotation using embedding distances.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| bool \| list[str]` | Annotation (format depends on task type). |

Notes

For distance-based decisions, we convert embeddings to scores:

- Cosine similarity ranges from -1 (opposite) to 1 (identical)
- We convert to a score by: score = similarity * 10
- This allows reuse of existing strategies
Noise Models

base

Base class for noise models.

NoiseModel

Bases: ABC

Abstract base for noise models.

Noise models add human-like variability to simulated responses. They can:

- Scale probabilities by temperature
- Add systematic biases (length, frequency, position)
- Inject random noise

apply(value: str | int | float | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | list[str]

abstractmethod

Apply noise to value.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `str \| int \| float \| list[str]` | Original value (probability, score, choice, etc.). | *required* |
| `context` | `dict[str, str \| int \| float \| bool \| list[str]]` | Additional context (item, template, strategy, etc.). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| list[str]` | Value with noise applied. |

temperature

Temperature-based noise model.

TemperatureNoiseModel

Bases: NoiseModel

Temperature scaling for probability distributions.

Scales logits or probabilities by temperature before sampling:

- temperature < 1.0: more deterministic (sharper distribution)
- temperature = 1.0: no change
- temperature > 1.0: more random (flatter distribution)

For forced choice, modifies the softmax: P_i = exp(score_i / T) / sum(exp(score_j / T))

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `temperature` | `float` | Temperature scaling factor (> 0). Default: 1.0. | `1.0` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If temperature <= 0. |
Examples:
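The temperature-scaled softmax described above can be illustrated with a self-contained sketch (plain Python, independent of the bead API). The specific score values are arbitrary:

```python
import math

def softmax(scores, temperature=1.0):
    # P_i = exp(score_i / T) / sum_j exp(score_j / T), with a max-shift
    # for numerical stability.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.0]
sharp = softmax(scores, temperature=0.5)  # T < 1: sharper distribution
base = softmax(scores, temperature=1.0)   # T = 1: unchanged
flat = softmax(scores, temperature=3.0)   # T > 1: flatter distribution

# Lower temperature concentrates probability mass on the best option:
# sharp[0] > base[0] > flat[0].
print(round(sharp[0], 3), round(base[0], 3), round(flat[0], 3))
```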
apply(value: str | int | float | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | list[str]

Apply temperature scaling.

For forced_choice, re-samples with scaled probabilities. For ordinal_scale, adds scaled noise to the value.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `str \| int \| float \| list[str]` | Original value (choice, rating, etc.). | *required* |
| `context` | `dict[str, str \| int \| float \| bool \| list[str]]` | Context with item, template, strategy. | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| list[str]` | Value with temperature applied. |
random_noise

Random noise injection model.

RandomNoiseModel

Bases: NoiseModel

Random noise injection model.

Adds random noise to responses:

- Gaussian noise for numeric values
- Uniform noise for numeric values
- Random flipping for choice tasks

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `noise_type` | `str` | Type of noise ("gaussian" or "uniform"). Default: "gaussian". | `'gaussian'` |
| `strength` | `float` | Noise strength (stddev for gaussian, range for uniform). Default: 1.0. | `1.0` |
Examples:
>>> noise_model = RandomNoiseModel(noise_type="gaussian", strength=0.5)
>>> # Adds gaussian noise with stddev=0.5 to numeric responses
apply(value: str | int | float | bool | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | bool | list[str]

Apply random noise.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `str \| int \| float \| bool \| list[str]` | Original value. | *required* |
| `context` | `dict` | Context with item, template, strategy. | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| bool \| list[str]` | Value with noise applied. |

systematic

Systematic bias noise model.

SystematicNoiseModel

Bases: NoiseModel

Systematic bias noise model.

Adds consistent biases to responses:

- length: prefer shorter/longer options
- frequency: prefer common/rare words
- position: prefer first/last option
- endpoint: prefer endpoints on ordinal scales
- midpoint: prefer the midpoint on ordinal scales

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `bias_type` | `str` | Type of bias ("length", "frequency", "position", "endpoint", "midpoint"). Default: "position". | `'position'` |
| `bias_strength` | `float` | Strength of bias (0.0-1.0). Default: 0.0. | `0.0` |
Examples:
>>> noise_model = SystematicNoiseModel(bias_type="position", bias_strength=0.3)
>>> # Adds 30% bias toward first option in forced choice
apply(value: str | int | float | bool | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | bool | list[str]

Apply systematic bias.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `str \| int \| float \| bool \| list[str]` | Original value. | *required* |
| `context` | `dict` | Context with item, template, strategy. | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| bool \| list[str]` | Value with bias applied. |
Task-Specific Strategies

base

Base class for simulation strategies.

SimulationStrategy

Bases: ABC

Abstract base for task-specific simulation strategies.

Each strategy handles one task type (forced_choice, ordinal_scale, etc.) and converts model outputs into appropriate responses.

Strategies should:

1. Validate item compatibility with task type
2. Extract relevant model outputs
3. Generate response in correct format for task
4. Handle missing model outputs gracefully

supported_task_type: str

abstractmethod property

Return supported task type (e.g., 'forced_choice').

Returns:

| Type | Description |
|---|---|
| `str` | Task type identifier. |

validate_item(item: Item, item_template: ItemTemplate) -> None

abstractmethod

Validate item is compatible with this strategy.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task structure. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If item incompatible with this strategy. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str | int | float | list[str]

abstractmethod

Generate simulated response for item.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task structure. | *required* |
| `model_output_key` | `str` | Key to extract from model outputs. | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| list[str]` | Simulated response (format depends on task type). |

extract_model_outputs(item: Item, key: str, required_count: int | None = None) -> list[float] | None

Extract model outputs from item.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to extract from. | *required* |
| `key` | `str` | Key to look for. | *required* |
| `required_count` | `int \| None` | Expected number of outputs. | `None` |

Returns:

| Type | Description |
|---|---|
| `list[float] \| None` | Extracted values or None if missing. |
forced_choice
¶
Forced choice simulation strategy.
ForcedChoiceStrategy
¶
Bases: SimulationStrategy
Strategy for forced_choice tasks (n-AFC).
Handles 2AFC, 3AFC, 4AFC, etc. Uses model outputs to compute preference probabilities, then samples categorically.
For 2AFC with LM scores: P(choose A) = sigmoid((score_A - score_B) / temperature)
For n-AFC with LM scores: P(choose i) = softmax([score_1, ..., score_n] / temperature)[i]
Examples:
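The two formulas above can be checked against each other in a self-contained sketch: for two options, the temperature-scaled softmax reduces exactly to the sigmoid of the score difference. The `choose` helper and the score values are illustrative, not bead's API:

```python
import math
import random

def softmax(scores, temperature=1.0):
    # P(choose i) = softmax(scores / T)[i], with a max-shift for stability.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def choose(options, scores, rng, temperature=1.0):
    # Sample an option categorically from the temperature-scaled probs.
    probs = softmax(scores, temperature)
    return rng.choices(options, weights=probs, k=1)[0]

# For 2AFC the n-way softmax reduces to a sigmoid of the difference:
# P(choose A) = sigmoid((score_A - score_B) / T)
score_a, score_b, T = -1.2, -2.0, 1.0
p_a_softmax = softmax([score_a, score_b], T)[0]
p_a_sigmoid = 1.0 / (1.0 + math.exp(-(score_a - score_b) / T))
assert abs(p_a_softmax - p_a_sigmoid) < 1e-12

rng = random.Random(42)
print(choose(["A", "B"], [score_a, score_b], rng))
```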
supported_task_type: str

property

Return 'forced_choice'.

Returns:

| Type | Description |
|---|---|
| `str` | Task type identifier. |

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for forced choice.

Checks:

- task_type is 'forced_choice'
- task_spec.options is defined
- Item has appropriate model outputs OR can fall back

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str

Generate forced choice response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Chosen option name. |
ordinal_scale

Ordinal scale simulation strategy.

OrdinalScaleStrategy

Bases: SimulationStrategy

Strategy for ordinal_scale tasks (Likert scales).

Handles discrete ordinal scales (e.g., 1-7, 1-5). Maps model outputs to scale positions, then samples with noise around that position.

For ordinal scales with LM score:

- Map score to continuous position on scale
- Add noise
- Round to nearest integer within bounds
Examples:
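The map-add-noise-round pipeline above can be sketched in plain Python. The assumed score range `[score_lo, score_hi]` and the Gaussian noise level are illustrative assumptions; bead's actual mapping may differ:

```python
import random

def ordinal_response(score, scale_min, scale_max, rng, noise_sd=0.5,
                     score_lo=-10.0, score_hi=0.0):
    # Map an LM score (assumed to lie in [score_lo, score_hi]) linearly
    # onto the scale, add Gaussian noise, then round and clip to bounds.
    # The score range and noise level are illustrative assumptions.
    frac = (score - score_lo) / (score_hi - score_lo)
    position = scale_min + frac * (scale_max - scale_min)
    noisy = position + rng.gauss(0.0, noise_sd)
    return max(scale_min, min(scale_max, round(noisy)))

rng = random.Random(7)
ratings = [ordinal_response(-3.0, 1, 7, rng) for _ in range(5)]
print(ratings)  # five ratings, each an int in 1..7
```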
supported_task_type: str

property

Return 'ordinal_scale'.

Returns:

| Type | Description |
|---|---|
| `str` | Task type identifier. |

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for ordinal scale.

Checks:

- task_type is 'ordinal_scale'
- task_spec.scale_bounds is defined
- scale_bounds has valid min/max

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> int

Generate ordinal scale response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `int` | Rating on ordinal scale. |
binary

Binary choice simulation strategy.

BinaryStrategy

Bases: SimulatedStrategy

Strategy for binary tasks (yes/no, true/false).

Uses model outputs to compute the probability of a "yes" response, then samples from a Bernoulli distribution.

For binary tasks with LM score: P(yes) = sigmoid(score / temperature)
Examples:
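The sigmoid-then-Bernoulli rule above can be shown in a few lines of plain Python (the helper name and score values are illustrative):

```python
import math
import random

def binary_response(score, rng, temperature=1.0):
    # P(yes) = sigmoid(score / temperature); sample from Bernoulli(P(yes)).
    # A score of 0 gives P(yes) = 0.5; positive scores favour True.
    p_yes = 1.0 / (1.0 + math.exp(-score / temperature))
    return rng.random() < p_yes

rng = random.Random(0)
# A strongly positive score almost always yields True.
print(binary_response(8.0, rng))  # True
```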
supported_task_type: str

property

Return 'binary'.

Returns:

| Type | Description |
|---|---|
| `str` | Task type identifier. |

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for binary choice.

Checks:

- task_type is 'binary'
- Item has appropriate model outputs OR can fall back

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> bool

Generate binary response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `bool` | Binary response (True/False). |
categorical

Categorical choice simulation strategy.

CategoricalStrategy

Bases: SimulationStrategy

Strategy for categorical tasks (unordered multi-class).

Similar to forced_choice but for unordered categories (e.g., NLI labels, sentiment classes). Uses softmax over model outputs.

For categorical with LM scores: P(category_i) = softmax([score_1, ..., score_n] / temperature)[i]
Examples:
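The same softmax sampling as forced choice applies, just over named categories. A self-contained sketch, with illustrative NLI-style labels and scores (not bead's API):

```python
import math
import random

def categorical_response(label_scores, rng, temperature=1.0):
    # P(category_i) = softmax(scores / T)[i]; sample a label categorically.
    # label_scores maps category names (e.g. NLI labels) to model scores.
    labels = list(label_scores)
    scaled = [label_scores[label] / temperature for label in labels]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(labels, weights=weights, k=1)[0]

rng = random.Random(1)
scores = {"entailment": -0.4, "neutral": -1.1, "contradiction": -2.3}
print(categorical_response(scores, rng))
```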
supported_task_type: str

property

Return 'categorical'.

Returns:

| Type | Description |
|---|---|
| `str` | Task type identifier. |

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for categorical choice.

Checks:

- task_type is 'categorical'
- task_spec.options is defined
- At least 2 options available

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str

Generate categorical response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Chosen category name. |
multi_select

Multi-select simulation strategy.

MultiSelectStrategy

Bases: SimulationStrategy

Strategy for multi_select tasks.

Handles tasks where multiple options can be selected independently. Uses model outputs to compute independent selection probabilities for each option via sigmoid.

For each option i: P(select option i) = sigmoid(score_i / temperature)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `float` | Probability threshold for selection. Default: 0.5. | `0.5` |
| `temperature` | `float` | Temperature for scaling decisions. Default: 1.0. | `1.0` |
Examples:
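The per-option sigmoid with a selection threshold can be sketched as follows. The option names and scores are illustrative, and the deterministic thresholding here is a simplification (a stochastic variant would sample each option instead):

```python
import math

def multi_select_response(option_scores, threshold=0.5, temperature=1.0):
    # Each option is an independent decision:
    # P(select i) = sigmoid(score_i / T); select when P exceeds threshold.
    selected = []
    for option, score in option_scores.items():
        p = 1.0 / (1.0 + math.exp(-score / temperature))
        if p > threshold:
            selected.append(option)
    return selected

scores = {"fluent": 2.0, "coherent": 0.5, "offensive": -3.0}
print(multi_select_response(scores))  # ['fluent', 'coherent']
```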
supported_task_type: str

property

Return 'multi_select'.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for multi-select.

Checks:

- task_type is 'multi_select'
- task_spec.options is defined
- At least 2 options

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> list[str]

Generate multi-select response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `list[str]` | List of selected option names. |
magnitude

Magnitude estimation simulation strategy.

MagnitudeStrategy

Bases: SimulationStrategy

Strategy for magnitude estimation tasks.

Handles unbounded numeric magnitude estimation. Converts model outputs (typically LM scores) to positive magnitude values.

For LM scores (typically negative log probabilities): magnitude = exp(-score / scale_factor)

This maps:

- Better scores (less negative) -> larger magnitudes
- Worse scores (more negative) -> smaller magnitudes

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `scale_factor` | `float` | Scaling factor for converting scores to magnitudes. Higher values produce more variation. Default: 10.0. | `10.0` |
Examples:
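The documented score-to-magnitude mapping can be shown as a one-liner; the score values are illustrative. Note that since the formula negates the score, negative-log-probability scores produce magnitudes greater than 1:

```python
import math

def magnitude_response(score, scale_factor=10.0):
    # magnitude = exp(-score / scale_factor), as documented above.
    # With LM scores that are negative log probabilities, -score is
    # positive, so magnitudes come out greater than 1.
    return math.exp(-score / scale_factor)

print(magnitude_response(-5.0))   # exp(0.5)  ~ 1.649
print(magnitude_response(-20.0))  # exp(2.0)  ~ 7.389
```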
supported_task_type: str

property

Return 'magnitude'.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for magnitude estimation.

Checks:

- task_type is 'magnitude'
- Item has model outputs OR can fall back

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> float

Generate magnitude estimation response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Estimated magnitude (positive value). |
free_text

Free text simulation strategy.

FreeTextStrategy

Bases: SimulationStrategy

Strategy for free_text tasks.

Handles free text generation using rule-based approaches. For simulations, this typically:

- Extracts text from rendered_elements
- Uses templates if provided
- Falls back to simple defaults

Note: This is a simplified implementation for simulation purposes. For realistic free text generation, consider using LLMs.
Examples:
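The rule-based fallback chain above can be sketched as follows. The parameter names (`rendered_elements`, `response_template`) and the fixed default string are assumptions for illustration; the real strategy's field names may differ:

```python
def free_text_response(rendered_elements, response_template=None,
                       default="No response."):
    # Illustrative fallback chain for simulated free text:
    # prefer a provided template, then any rendered element text,
    # then a fixed default.
    if response_template is not None:
        return response_template.format(**rendered_elements)
    if rendered_elements:
        return next(iter(rendered_elements.values()))
    return default

print(free_text_response({"sentence": "The cat slept."},
                         response_template="Paraphrase: {sentence}"))
# Paraphrase: The cat slept.
```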
supported_task_type: str

property

Return 'free_text'.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for free text.

Checks:

- task_type is 'free_text'

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str

Generate free text response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (unused for free text). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Generated text response. |
cloze

Cloze (fill-in-the-blank) simulation strategy using MLM scores.

ClozeStrategy

Bases: SimulationStrategy

MLM-based strategy for cloze (fill-in-the-blank) tasks.

Uses masked language model scores to select fillers for unfilled slots. For constrained slots (with specific options), selects the highest-scoring option. For unconstrained slots, uses rendered_elements or metadata as a fallback.

The strategy expects model outputs to contain MLM scores for each slot, stored as separate ModelOutput instances with operation="mlm_score" and inputs containing {"slot_name": slot_name, "candidate": candidate_value}.
Examples:
>>> from bead.simulation.strategies.cloze import ClozeStrategy
>>> strategy = ClozeStrategy()
>>> # item with unfilled_slots and model_outputs with MLM scores
>>> # response = strategy.simulate_response(item, template, "mlm_score", rng)
>>> # Returns: {"determiner": "the", "verb": "chased", "object": "mouse"}
supported_task_type: str

property

Get supported task type.

Returns:

| Type | Description |
|---|---|
| `str` | Always returns "cloze". |

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item is compatible with cloze strategy.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> dict[str, str]

Simulate cloze response using MLM scores.

For each unfilled slot, selects the filler with the highest MLM score. Falls back to random selection or metadata if MLM scores are unavailable.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task constraints. | *required* |
| `model_output_key` | `str` | Key identifying which model outputs to use (e.g., "mlm_score"). | *required* |
| `rng` | `RandomState` | Random number generator for stochasticity. | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, str]` | Dictionary mapping slot names to selected fillers. |
Examples:
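The per-slot argmax with random fallback described above can be sketched in plain Python. The `(slot, candidate) -> score` dictionary is a simplified stand-in for the ModelOutput instances with `{"slot_name": ..., "candidate": ...}` inputs described earlier:

```python
import random

def cloze_response(slot_candidates, mlm_scores, rng):
    # For each unfilled slot, pick the candidate with the highest MLM
    # score; fall back to a random candidate when no scores are present.
    # mlm_scores maps (slot_name, candidate) -> score, a simplified
    # stand-in for bead's per-candidate ModelOutput records.
    filled = {}
    for slot, candidates in slot_candidates.items():
        scored = [(mlm_scores[(slot, c)], c) for c in candidates
                  if (slot, c) in mlm_scores]
        if scored:
            filled[slot] = max(scored)[1]
        else:
            filled[slot] = rng.choice(candidates)
    return filled

slots = {"verb": ["chased", "ignored"], "object": ["mouse", "ball"]}
scores = {("verb", "chased"): -1.2, ("verb", "ignored"): -4.0,
          ("object", "mouse"): -0.8, ("object", "ball"): -2.5}
print(cloze_response(slots, scores, random.Random(0)))
# {'verb': 'chased', 'object': 'mouse'}
```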