bead.simulation

Framework for simulating annotator responses with configurable noise models and task-specific strategies.
Runner

runner

Simulation runner for orchestrating multi-annotator simulations.

SimulationRunner

Orchestrates multi-annotator simulation.

Can simulate:

- Multiple independent annotators
- Correlated annotators (shared noise component)
- Mixed strategies (some LM-based, some random)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulationRunnerConfig` | Configuration for simulation. | *required* |
Examples:
>>> from bead.config.simulation import (
... SimulationRunnerConfig,
... SimulatedAnnotatorConfig,
... )
>>> config = SimulationRunnerConfig(
... annotator_configs=[
... SimulatedAnnotatorConfig(strategy="lm_score", random_state=1),
... SimulatedAnnotatorConfig(strategy="lm_score", random_state=2),
... ],
... n_annotators=2
... )
>>> runner = SimulationRunner(config)
>>> # results = runner.run(items, templates)
run(items: list[Item], templates: list[ItemTemplate] | ItemTemplate) -> dict[str, list[str | int | float | list[str]]]

Run simulation with all annotators.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `items` | `list[Item]` | Items to annotate. | *required* |
| `templates` | `list[ItemTemplate] \| ItemTemplate` | Templates (one per item or shared). | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, list[str \| int \| float \| list[str]]]` | Results: `{"item_ids": [...], "annotator_0": [...], "annotator_1": [...], ...}` |

save_results(results: dict[str, list[str | int | float | list[str]]]) -> None

Save results to file.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `results` | `dict[str, list[str \| int \| float \| list[str]]]` | Simulation results. | *required* |
Annotators

base

Base class for simulated annotators.

SimulatedAnnotator

Bases: ABC

Abstract base for simulated annotators.

An annotator combines:

- Task-specific strategy (how to respond to each task type)
- Noise model (how to add human-like variability)
- Configuration (model output keys, random seed, etc.)

The annotator orchestrates the simulation process and provides a unified interface for generating judgments.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration for annotator. | *required* |
| `random_state` | `int \| None` | Random seed (overrides config if provided). | `None` |

from_config(config: SimulatedAnnotatorConfig) -> SimulatedAnnotator

classmethod

Create annotator from configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration specifying annotator type and parameters. | *required* |

Returns:

| Type | Description |
|---|---|
| `SimulatedAnnotator` | Configured annotator instance. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If strategy is unknown. |
Examples:
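The factory dispatch behind `from_config` can be sketched in plain Python. The registry mapping and stub classes below are illustrative assumptions, not bead's actual implementation; they only show the strategy-name-to-class pattern and the `ValueError` contract for unknown strategies:

```python
# Hypothetical sketch of the from_config dispatch: strategy name -> class.
# The registry below is an assumption for illustration; the real mapping
# lives inside bead.
class _Stub:
    def __init__(self, config):
        self.config = config

class RandomAnnotator(_Stub): ...
class OracleAnnotator(_Stub): ...
class LMBasedAnnotator(_Stub): ...

_REGISTRY = {
    "random": RandomAnnotator,
    "oracle": OracleAnnotator,
    "lm_score": LMBasedAnnotator,
}

def from_config(config: dict):
    strategy = config["strategy"]
    try:
        cls = _REGISTRY[strategy]
    except KeyError:
        # Unknown strategies raise ValueError, as documented above.
        raise ValueError(f"unknown strategy: {strategy!r}")
    return cls(config)

annotator = from_config({"strategy": "lm_score"})
print(type(annotator).__name__)  # LMBasedAnnotator
```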
annotate(item: Item, item_template: ItemTemplate) -> str | int | float | list[str]

abstractmethod

Generate annotation for single item.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task structure. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| list[str]` | Annotation (format depends on task type). |

annotate_batch(items: list[Item], item_templates: list[ItemTemplate] | ItemTemplate) -> dict[str, str | int | float | list[str]]

Generate annotations for batch of items.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `items` | `list[Item]` | Items to annotate. | *required* |
| `item_templates` | `list[ItemTemplate] \| ItemTemplate` | Templates (one per item or single template for all). | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, str \| int \| float \| list[str]]` | Mapping from item ID to annotation. |
Examples:
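The template-broadcasting behaviour of `annotate_batch` (a single template shared by all items, or one template per item) can be sketched as follows. The item/template types here are stand-ins, and the per-item `annotate` is a dummy; only the broadcasting logic mirrors the documented API:

```python
# Illustrative sketch of batch annotation with a shared template.
# The item dicts and string templates are stand-ins for bead's Item and
# ItemTemplate types.
def annotate(item, template):
    return f"{template}:{item['id']}"

def annotate_batch(items, item_templates):
    # A single template is broadcast to every item; a list must align 1:1.
    if not isinstance(item_templates, list):
        item_templates = [item_templates] * len(items)
    if len(item_templates) != len(items):
        raise ValueError("need one template per item or a single shared template")
    return {item["id"]: annotate(item, tpl)
            for item, tpl in zip(items, item_templates)}

items = [{"id": "a"}, {"id": "b"}]
result = annotate_batch(items, "shared")
print(result)  # {'a': 'shared:a', 'b': 'shared:b'}
```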
get_strategy(task_type: str) -> SimulationStrategy

Get strategy for task type.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `task_type` | `str` | Task type (e.g., "forced_choice"). | *required* |

Returns:

| Type | Description |
|---|---|
| `SimulationStrategy` | Strategy for this task type. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If task type not supported. |

oracle

Oracle (perfect performance) annotator.

OracleAnnotator

Bases: SimulatedAnnotator

Perfect performance annotator using ground truth.

Returns ground truth labels from item.item_metadata['ground_truth']. Falls back to random when ground truth is not available.

Useful for establishing an upper bound on performance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration for annotator. | *required* |
Examples:
>>> from bead.config.simulation import SimulatedAnnotatorConfig
>>> config = SimulatedAnnotatorConfig(strategy="oracle", random_state=42)
>>> annotator = OracleAnnotator(config)
>>> # judgment = annotator.annotate(item, template)
annotate(item: Item, item_template: ItemTemplate) -> str | int | float | bool | list[str]

Generate oracle annotation using ground truth.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| bool \| list[str]` | Ground truth annotation or random fallback. |

random

Random baseline annotator.

RandomAnnotator

Bases: SimulatedAnnotator

Pure random baseline annotator.

Generates random responses that respect task spec constraints (options, scale ranges, etc.) but are otherwise uninformed.

Useful for establishing baseline performance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration for annotator. | *required* |
Examples:
>>> from bead.config.simulation import SimulatedAnnotatorConfig
>>> config = SimulatedAnnotatorConfig(strategy="random", random_state=42)
>>> annotator = RandomAnnotator(config)
>>> # judgment = annotator.annotate(item, template)
annotate(item: Item, item_template: ItemTemplate) -> str | int | float | bool | list[str]

Generate random annotation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate (ignored). | *required* |
| `item_template` | `ItemTemplate` | Template defining task constraints. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| bool \| list[str]` | Random annotation (format depends on task type). |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If task type is not supported. |

lm_based

LM score-based annotator.

LMBasedAnnotator

Bases: SimulatedAnnotator

Annotator using language model scores for decisions.

Uses LM log probabilities or scores from Item.model_outputs to make informed decisions. Applies a noise model for variability.

Supports all task types via pluggable strategies.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration for annotator. | *required* |
Examples:
>>> from bead.config.simulation import SimulatedAnnotatorConfig, NoiseModelConfig
>>> config = SimulatedAnnotatorConfig(
... strategy="lm_score",
... model_output_key="lm_score",
... noise_model=NoiseModelConfig(noise_type="temperature", temperature=1.5)
... )
>>> annotator = LMBasedAnnotator(config)
>>> # judgment = annotator.annotate(item, template)
annotate(item: Item, item_template: ItemTemplate) -> str | int | float | list[str]

Generate annotation using LM scores.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| list[str]` | Annotation (format depends on task type). |

distance_based

Distance-based annotator using embeddings.

DistanceBasedAnnotator

Bases: SimulatedAnnotator

Annotator using embedding distances for decisions.

Uses embeddings from Item.model_outputs to compute similarity/distance metrics, then makes decisions based on those distances.

For forced choice, selects the option with the lowest distance (highest similarity). For ordinal scales, maps distance to scale values. For binary, thresholds the distance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `SimulatedAnnotatorConfig` | Configuration for annotator. | *required* |
Examples:
>>> from bead.config.simulation import SimulatedAnnotatorConfig, NoiseModelConfig
>>> config = SimulatedAnnotatorConfig(
... strategy="distance",
... model_output_key="embedding",
... noise_model=NoiseModelConfig(noise_type="none")
... )
>>> annotator = DistanceBasedAnnotator(config)
>>> # judgment = annotator.annotate(item, template)
annotate(item: Item, item_template: ItemTemplate) -> str | int | float | bool | list[str]

Generate annotation using embedding distances.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| bool \| list[str]` | Annotation (format depends on task type). |

Notes

For distance-based decisions, we convert embeddings to scores:

- Cosine similarity ranges from -1 (opposite) to 1 (identical)
- We convert to a score by: score = similarity * 10
- This allows reuse of existing strategies
Noise Models

base

Base class for noise models.

NoiseModel

Bases: ABC

Abstract base for noise models.

Noise models add human-like variability to simulated responses. They can:

- Scale probabilities by temperature
- Add systematic biases (length, frequency, position)
- Inject random noise

apply(value: str | int | float | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | list[str]

abstractmethod

Apply noise to value.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `str \| int \| float \| list[str]` | Original value (probability, score, choice, etc.). | *required* |
| `context` | `dict[str, str \| int \| float \| bool \| list[str]]` | Additional context (item, template, strategy, etc.). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| list[str]` | Value with noise applied. |

temperature

Temperature-based noise model.

TemperatureNoiseModel

Bases: NoiseModel

Temperature scaling for probability distributions.

Scales logits or probabilities by temperature before sampling:

- temperature < 1.0: more deterministic (sharper distribution)
- temperature = 1.0: no change
- temperature > 1.0: more random (flatter distribution)

For forced choice, modifies the softmax: P_i = exp(score_i / T) / sum(exp(score_j / T))

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `temperature` | `float` | Temperature scaling factor (> 0). Default: 1.0. | `1.0` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If temperature <= 0. |
Examples:
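The temperature-scaled softmax described above can be illustrated with a self-contained sketch (plain Python, independent of the bead API). The specific score values are arbitrary:

```python
import math

def softmax(scores, temperature=1.0):
    # P_i = exp(score_i / T) / sum_j exp(score_j / T), with a max-shift
    # for numerical stability.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.0]
sharp = softmax(scores, temperature=0.5)  # T < 1: sharper distribution
base = softmax(scores, temperature=1.0)   # T = 1: unchanged
flat = softmax(scores, temperature=3.0)   # T > 1: flatter distribution

# Lower temperature concentrates probability mass on the best option:
# sharp[0] > base[0] > flat[0].
print(round(sharp[0], 3), round(base[0], 3), round(flat[0], 3))
```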
apply(value: str | int | float | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | list[str]

Apply temperature scaling.

For forced_choice, re-samples with scaled probabilities. For ordinal_scale, adds scaled noise to the value.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `str \| int \| float \| list[str]` | Original value (choice, rating, etc.). | *required* |
| `context` | `dict[str, str \| int \| float \| bool \| list[str]]` | Context with item, template, strategy. | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| list[str]` | Value with temperature applied. |
random_noise

Random noise injection model.

RandomNoiseModel

Bases: NoiseModel

Random noise injection model.

Adds random noise to responses:

- Gaussian noise for numeric values
- Uniform noise for numeric values
- Random flipping for choice tasks

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `noise_type` | `str` | Type of noise ("gaussian" or "uniform"). Default: "gaussian". | `'gaussian'` |
| `strength` | `float` | Noise strength (stddev for gaussian, range for uniform). Default: 1.0. | `1.0` |
Examples:
>>> noise_model = RandomNoiseModel(noise_type="gaussian", strength=0.5)
>>> # Adds gaussian noise with stddev=0.5 to numeric responses
apply(value: str | int | float | bool | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | bool | list[str]

Apply random noise.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `str \| int \| float \| bool \| list[str]` | Original value. | *required* |
| `context` | `dict` | Context with item, template, strategy. | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| bool \| list[str]` | Value with noise applied. |

systematic

Systematic bias noise model.

SystematicNoiseModel

Bases: NoiseModel

Systematic bias noise model.

Adds consistent biases to responses:

- length: prefer shorter/longer options
- frequency: prefer common/rare words
- position: prefer first/last option
- endpoint: prefer endpoints on ordinal scales
- midpoint: prefer the midpoint on ordinal scales

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `bias_type` | `str` | Type of bias ("length", "frequency", "position", "endpoint", "midpoint"). Default: "position". | `'position'` |
| `bias_strength` | `float` | Strength of bias (0.0-1.0). Default: 0.0. | `0.0` |
Examples:
>>> noise_model = SystematicNoiseModel(bias_type="position", bias_strength=0.3)
>>> # Adds 30% bias toward first option in forced choice
apply(value: str | int | float | bool | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | bool | list[str]

Apply systematic bias.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `str \| int \| float \| bool \| list[str]` | Original value. | *required* |
| `context` | `dict` | Context with item, template, strategy. | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| bool \| list[str]` | Value with bias applied. |
Task-Specific Strategies

base

Base class for simulation strategies.

SimulationStrategy

Bases: ABC

Abstract base for task-specific simulation strategies.

Each strategy handles one task type (forced_choice, ordinal_scale, etc.) and converts model outputs into appropriate responses.

Strategies should:

1. Validate item compatibility with task type
2. Extract relevant model outputs
3. Generate response in correct format for task
4. Handle missing model outputs gracefully

supported_task_type: str

abstractmethod property

Return supported task type (e.g., 'forced_choice').

Returns:

| Type | Description |
|---|---|
| `str` | Task type identifier. |

validate_item(item: Item, item_template: ItemTemplate) -> None

abstractmethod

Validate item is compatible with this strategy.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task structure. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If item incompatible with this strategy. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str | int | float | list[str]

abstractmethod

Generate simulated response for item.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task structure. | *required* |
| `model_output_key` | `str` | Key to extract from model outputs. | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str \| int \| float \| list[str]` | Simulated response (format depends on task type). |

extract_model_outputs(item: Item, key: str, required_count: int | None = None) -> list[float] | None

Extract model outputs from item.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to extract from. | *required* |
| `key` | `str` | Key to look for. | *required* |
| `required_count` | `int \| None` | Expected number of outputs. | `None` |

Returns:

| Type | Description |
|---|---|
| `list[float] \| None` | Extracted values or None if missing. |
forced_choice
¶
Forced choice simulation strategy.
ForcedChoiceStrategy
¶
Bases: SimulationStrategy
Strategy for forced_choice tasks (n-AFC).
Handles 2AFC, 3AFC, 4AFC, etc. Uses model outputs to compute preference probabilities, then samples categorically.
For 2AFC with LM scores: P(choose A) = sigmoid((score_A - score_B) / temperature)
For n-AFC with LM scores: P(choose i) = softmax([score_1, ..., score_n] / temperature)[i]
Examples:
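The two formulas above can be checked against each other in a self-contained sketch: for two options, the temperature-scaled softmax reduces exactly to the sigmoid of the score difference. The `choose` helper and the score values are illustrative, not bead's API:

```python
import math
import random

def softmax(scores, temperature=1.0):
    # P(choose i) = softmax(scores / T)[i], with a max-shift for stability.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def choose(options, scores, rng, temperature=1.0):
    # Sample an option categorically from the temperature-scaled probs.
    probs = softmax(scores, temperature)
    return rng.choices(options, weights=probs, k=1)[0]

# For 2AFC the n-way softmax reduces to a sigmoid of the difference:
# P(choose A) = sigmoid((score_A - score_B) / T)
score_a, score_b, T = -1.2, -2.0, 1.0
p_a_softmax = softmax([score_a, score_b], T)[0]
p_a_sigmoid = 1.0 / (1.0 + math.exp(-(score_a - score_b) / T))
assert abs(p_a_softmax - p_a_sigmoid) < 1e-12

rng = random.Random(42)
print(choose(["A", "B"], [score_a, score_b], rng))
```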
supported_task_type: str

property

Return 'forced_choice'.

Returns:

| Type | Description |
|---|---|
| `str` | Task type identifier. |

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for forced choice.

Checks:

- task_type is 'forced_choice'
- task_spec.options is defined
- Item has appropriate model outputs OR can fall back

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str

Generate forced choice response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Chosen option name. |
ordinal_scale

Ordinal scale simulation strategy.

OrdinalScaleStrategy

Bases: SimulationStrategy

Strategy for ordinal_scale tasks (Likert scales).

Handles discrete ordinal scales (e.g., 1-7, 1-5). Maps model outputs to scale positions, then samples with noise around that position.

For ordinal scales with LM score:

- Map score to continuous position on scale
- Add noise
- Round to nearest integer within bounds
Examples:
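The map-add-noise-round pipeline above can be sketched in plain Python. The assumed score range `[score_lo, score_hi]` and the Gaussian noise level are illustrative assumptions; bead's actual mapping may differ:

```python
import random

def ordinal_response(score, scale_min, scale_max, rng, noise_sd=0.5,
                     score_lo=-10.0, score_hi=0.0):
    # Map an LM score (assumed to lie in [score_lo, score_hi]) linearly
    # onto the scale, add Gaussian noise, then round and clip to bounds.
    # The score range and noise level are illustrative assumptions.
    frac = (score - score_lo) / (score_hi - score_lo)
    position = scale_min + frac * (scale_max - scale_min)
    noisy = position + rng.gauss(0.0, noise_sd)
    return max(scale_min, min(scale_max, round(noisy)))

rng = random.Random(7)
ratings = [ordinal_response(-3.0, 1, 7, rng) for _ in range(5)]
print(ratings)  # five ratings, each an int in 1..7
```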
supported_task_type: str

property

Return 'ordinal_scale'.

Returns:

| Type | Description |
|---|---|
| `str` | Task type identifier. |

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for ordinal scale.

Checks:

- task_type is 'ordinal_scale'
- task_spec.scale_bounds is defined
- scale_bounds has valid min/max

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> int

Generate ordinal scale response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `int` | Rating on ordinal scale. |
binary

Binary choice simulation strategy.

BinaryStrategy

Bases: SimulatedStrategy

Strategy for binary tasks (yes/no, true/false).

Uses model outputs to compute the probability of a "yes" response, then samples from a Bernoulli distribution.

For binary tasks with LM score: P(yes) = sigmoid(score / temperature)
Examples:
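The sigmoid-then-Bernoulli rule above can be shown in a few lines of plain Python (the helper name and score values are illustrative):

```python
import math
import random

def binary_response(score, rng, temperature=1.0):
    # P(yes) = sigmoid(score / temperature); sample from Bernoulli(P(yes)).
    # A score of 0 gives P(yes) = 0.5; positive scores favour True.
    p_yes = 1.0 / (1.0 + math.exp(-score / temperature))
    return rng.random() < p_yes

rng = random.Random(0)
# A strongly positive score almost always yields True.
print(binary_response(8.0, rng))  # True
```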
supported_task_type: str

property

Return 'binary'.

Returns:

| Type | Description |
|---|---|
| `str` | Task type identifier. |

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for binary choice.

Checks:

- task_type is 'binary'
- Item has appropriate model outputs OR can fall back

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> bool

Generate binary response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `bool` | Binary response (True/False). |
categorical

Categorical choice simulation strategy.

CategoricalStrategy

Bases: SimulationStrategy

Strategy for categorical tasks (unordered multi-class).

Similar to forced_choice but for unordered categories (e.g., NLI labels, sentiment classes). Uses softmax over model outputs.

For categorical with LM scores: P(category_i) = softmax([score_1, ..., score_n] / temperature)[i]
Examples:
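The same softmax sampling as forced choice applies, just over named categories. A self-contained sketch, with illustrative NLI-style labels and scores (not bead's API):

```python
import math
import random

def categorical_response(label_scores, rng, temperature=1.0):
    # P(category_i) = softmax(scores / T)[i]; sample a label categorically.
    # label_scores maps category names (e.g. NLI labels) to model scores.
    labels = list(label_scores)
    scaled = [label_scores[label] / temperature for label in labels]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(labels, weights=weights, k=1)[0]

rng = random.Random(1)
scores = {"entailment": -0.4, "neutral": -1.1, "contradiction": -2.3}
print(categorical_response(scores, rng))
```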
supported_task_type: str

property

Return 'categorical'.

Returns:

| Type | Description |
|---|---|
| `str` | Task type identifier. |

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for categorical choice.

Checks:

- task_type is 'categorical'
- task_spec.options is defined
- At least 2 options available

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str

Generate categorical response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Chosen category name. |
multi_select

Multi-select simulation strategy.

MultiSelectStrategy

Bases: SimulationStrategy

Strategy for multi_select tasks.

Handles tasks where multiple options can be selected independently. Uses model outputs to compute independent selection probabilities for each option via sigmoid.

For each option i: P(select option i) = sigmoid(score_i / temperature)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `float` | Probability threshold for selection. Default: 0.5. | `0.5` |
| `temperature` | `float` | Temperature for scaling decisions. Default: 1.0. | `1.0` |
Examples:
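The per-option sigmoid with a selection threshold can be sketched as follows. The option names and scores are illustrative, and the deterministic thresholding here is a simplification (a stochastic variant would sample each option instead):

```python
import math

def multi_select_response(option_scores, threshold=0.5, temperature=1.0):
    # Each option is an independent decision:
    # P(select i) = sigmoid(score_i / T); select when P exceeds threshold.
    selected = []
    for option, score in option_scores.items():
        p = 1.0 / (1.0 + math.exp(-score / temperature))
        if p > threshold:
            selected.append(option)
    return selected

scores = {"fluent": 2.0, "coherent": 0.5, "offensive": -3.0}
print(multi_select_response(scores))  # ['fluent', 'coherent']
```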
supported_task_type: str

property

Return 'multi_select'.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for multi-select.

Checks:

- task_type is 'multi_select'
- task_spec.options is defined
- At least 2 options

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> list[str]

Generate multi-select response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `list[str]` | List of selected option names. |
magnitude

Magnitude estimation simulation strategy.

MagnitudeStrategy

Bases: SimulationStrategy

Strategy for magnitude estimation tasks.

Handles unbounded numeric magnitude estimation. Converts model outputs (typically LM scores) to positive magnitude values.

For LM scores (typically negative log probabilities): magnitude = exp(-score / scale_factor)

This maps:

- Better scores (less negative) -> larger magnitudes
- Worse scores (more negative) -> smaller magnitudes

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `scale_factor` | `float` | Scaling factor for converting scores to magnitudes. Higher values produce more variation. Default: 10.0. | `10.0` |
Examples:
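The documented score-to-magnitude mapping can be shown as a one-liner; the score values are illustrative. Note that since the formula negates the score, negative-log-probability scores produce magnitudes greater than 1:

```python
import math

def magnitude_response(score, scale_factor=10.0):
    # magnitude = exp(-score / scale_factor), as documented above.
    # With LM scores that are negative log probabilities, -score is
    # positive, so magnitudes come out greater than 1.
    return math.exp(-score / scale_factor)

print(magnitude_response(-5.0))   # exp(0.5)  ~ 1.649
print(magnitude_response(-20.0))  # exp(2.0)  ~ 7.389
```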
supported_task_type: str

property

Return 'magnitude'.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for magnitude estimation.

Checks:

- task_type is 'magnitude'
- Item has model outputs OR can fall back

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> float

Generate magnitude estimation response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (e.g., "lm_score"). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Estimated magnitude (positive value). |
free_text

Free text simulation strategy.

FreeTextStrategy

Bases: SimulationStrategy

Strategy for free_text tasks.

Handles free text generation using rule-based approaches. For simulations, this typically:

- Extracts text from rendered_elements
- Uses templates if provided
- Falls back to simple defaults

Note: This is a simplified implementation for simulation purposes. For realistic free text generation, consider using LLMs.
Examples:
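The rule-based fallback chain above can be sketched as follows. The parameter names (`rendered_elements`, `response_template`) and the fixed default string are assumptions for illustration; the real strategy's field names may differ:

```python
def free_text_response(rendered_elements, response_template=None,
                       default="No response."):
    # Illustrative fallback chain for simulated free text:
    # prefer a provided template, then any rendered element text,
    # then a fixed default.
    if response_template is not None:
        return response_template.format(**rendered_elements)
    if rendered_elements:
        return next(iter(rendered_elements.values()))
    return default

print(free_text_response({"sentence": "The cat slept."},
                         response_template="Paraphrase: {sentence}"))
# Paraphrase: The cat slept.
```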
supported_task_type: str

property

Return 'free_text'.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for free text.

Checks:

- task_type is 'free_text'

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str

Generate free text response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to respond to. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |
| `model_output_key` | `str` | Key for model outputs (unused for free text). | *required* |
| `rng` | `RandomState` | Random number generator. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Generated text response. |
cloze

Cloze (fill-in-the-blank) simulation strategy using MLM scores.

ClozeStrategy

Bases: SimulationStrategy

MLM-based strategy for cloze (fill-in-the-blank) tasks.

Uses masked language model scores to select fillers for unfilled slots. For constrained slots (with specific options), selects the highest-scoring option. For unconstrained slots, uses rendered_elements or metadata as a fallback.

The strategy expects model outputs to contain MLM scores for each slot, stored as separate ModelOutput instances with operation="mlm_score" and inputs containing {"slot_name": slot_name, "candidate": candidate_value}.
Examples:
>>> from bead.simulation.strategies.cloze import ClozeStrategy
>>> strategy = ClozeStrategy()
>>> # item with unfilled_slots and model_outputs with MLM scores
>>> # response = strategy.simulate_response(item, template, "mlm_score", rng)
>>> # Returns: {"determiner": "the", "verb": "chased", "object": "mouse"}
supported_task_type: str

property

Get supported task type.

Returns:

| Type | Description |
|---|---|
| `str` | Always returns "cloze". |

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item is compatible with cloze strategy.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to validate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task. | *required* |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If validation fails. |

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> dict[str, str]

Simulate cloze response using MLM scores.

For each unfilled slot, selects the filler with the highest MLM score. Falls back to random selection or metadata if MLM scores are unavailable.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Item` | Item to annotate. | *required* |
| `item_template` | `ItemTemplate` | Template defining task constraints. | *required* |
| `model_output_key` | `str` | Key identifying which model outputs to use (e.g., "mlm_score"). | *required* |
| `rng` | `RandomState` | Random number generator for stochasticity. | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, str]` | Dictionary mapping slot names to selected fillers. |
Examples:
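The per-slot argmax with random fallback described above can be sketched in plain Python. The `(slot, candidate) -> score` dictionary is a simplified stand-in for the ModelOutput instances with `{"slot_name": ..., "candidate": ...}` inputs described earlier:

```python
import random

def cloze_response(slot_candidates, mlm_scores, rng):
    # For each unfilled slot, pick the candidate with the highest MLM
    # score; fall back to a random candidate when no scores are present.
    # mlm_scores maps (slot_name, candidate) -> score, a simplified
    # stand-in for bead's per-candidate ModelOutput records.
    filled = {}
    for slot, candidates in slot_candidates.items():
        scored = [(mlm_scores[(slot, c)], c) for c in candidates
                  if (slot, c) in mlm_scores]
        if scored:
            filled[slot] = max(scored)[1]
        else:
            filled[slot] = rng.choice(candidates)
    return filled

slots = {"verb": ["chased", "ignored"], "object": ["mouse", "ball"]}
scores = {("verb", "chased"): -1.2, ("verb", "ignored"): -4.0,
          ("object", "mouse"): -0.8, ("object", "ball"): -2.5}
print(cloze_response(slots, scores, random.Random(0)))
# {'verb': 'chased', 'object': 'mouse'}
```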