bead.simulation

Framework for simulating annotator responses with configurable noise models and task-specific strategies.

Runner

runner

Simulation runner for orchestrating multi-annotator simulations.

SimulationRunner

Orchestrates multi-annotator simulation.

Can simulate:

- Multiple independent annotators
- Correlated annotators (shared noise component)
- Mixed strategies (some LM-based, some random)

Parameters:

    config : SimulationRunnerConfig (required)
        Configuration for simulation.

Examples:

>>> from bead.config.simulation import (
...     SimulationRunnerConfig,
...     SimulatedAnnotatorConfig,
... )
>>> config = SimulationRunnerConfig(
...     annotator_configs=[
...         SimulatedAnnotatorConfig(strategy="lm_score", random_state=1),
...         SimulatedAnnotatorConfig(strategy="lm_score", random_state=2),
...     ],
...     n_annotators=2
... )
>>> runner = SimulationRunner(config)
>>> # results = runner.run(items, templates)

run(items: list[Item], templates: list[ItemTemplate] | ItemTemplate) -> dict[str, list[str | int | float | list[str]]]

Run simulation with all annotators.

Parameters:

    items : list[Item] (required)
        Items to annotate.
    templates : list[ItemTemplate] | ItemTemplate (required)
        Templates (one per item or shared).

Returns:

    dict[str, list[str | int | float | list[str]]]
        Results of the form: {"item_ids": [...], "annotator_0": [...], "annotator_1": [...], ...}

save_results(results: dict[str, list[str | int | float | list[str]]]) -> None

Save results to file.

Parameters:

    results : dict[str, list[str | int | float | list[str]]] (required)
        Simulation results.
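
Taken together, an illustrative end-to-end sketch (assuming `items` and `templates` were built elsewhere, and that save_results writes to the location given in the runner configuration):

>>> # results = runner.run(items, templates)
>>> # results["item_ids"]     # IDs of the annotated items
>>> # results["annotator_0"]  # responses from the first annotator
>>> # runner.save_results(results)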

Annotators

base

Base class for simulated annotators.

SimulatedAnnotator

Bases: ABC

Abstract base for simulated annotators.

An annotator combines:

- Task-specific strategy (how to respond to each task type)
- Noise model (how to add human-like variability)
- Configuration (model output keys, random seed, etc.)

The annotator orchestrates the simulation process and provides a unified interface for generating judgments.

Parameters:

    config : SimulatedAnnotatorConfig (required)
        Configuration for annotator.
    random_state : int | None (default: None)
        Random seed (overrides config if provided).

from_config(config: SimulatedAnnotatorConfig) -> SimulatedAnnotator classmethod

Create annotator from configuration.

Parameters:

    config : SimulatedAnnotatorConfig (required)
        Configuration specifying annotator type and parameters.

Returns:

    SimulatedAnnotator
        Configured annotator instance.

Raises:

    ValueError
        If strategy is unknown.

Examples:

>>> from bead.config.simulation import SimulatedAnnotatorConfig
>>> config = SimulatedAnnotatorConfig(strategy="lm_score")
>>> annotator = SimulatedAnnotator.from_config(config)

annotate(item: Item, item_template: ItemTemplate) -> str | int | float | list[str] abstractmethod

Generate annotation for single item.

Parameters:

    item : Item (required)
        Item to annotate.
    item_template : ItemTemplate (required)
        Template defining task structure.

Returns:

    str | int | float | list[str]
        Annotation (format depends on task type).

annotate_batch(items: list[Item], item_templates: list[ItemTemplate] | ItemTemplate) -> dict[str, str | int | float | list[str]]

Generate annotations for batch of items.

Parameters:

    items : list[Item] (required)
        Items to annotate.
    item_templates : list[ItemTemplate] | ItemTemplate (required)
        Templates (one per item or single template for all).

Returns:

    dict[str, str | int | float | list[str]]
        Mapping from item ID to annotation.

Examples:

>>> annotations = annotator.annotate_batch(items, template)
>>> annotations[str(items[0].id)]
'option_a'

get_strategy(task_type: str) -> SimulationStrategy

Get strategy for task type.

Parameters:

    task_type : str (required)
        Task type (e.g., "forced_choice").

Returns:

    SimulationStrategy
        Strategy for this task type.

Raises:

    ValueError
        If task type is not supported.
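
An illustrative usage sketch (assuming `annotator` was created as in the from_config example above):

>>> # strategy = annotator.get_strategy("forced_choice")
>>> # strategy.supported_task_type  -> 'forced_choice'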

oracle

Oracle (perfect performance) annotator.

OracleAnnotator

Bases: SimulatedAnnotator

Perfect performance annotator using ground truth.

Returns ground truth labels from item.item_metadata['ground_truth']. Falls back to random when ground truth is not available.

Useful for establishing an upper bound on performance.

Parameters:

    config : SimulatedAnnotatorConfig (required)
        Configuration for annotator.

Examples:

>>> from bead.config.simulation import SimulatedAnnotatorConfig
>>> config = SimulatedAnnotatorConfig(strategy="oracle", random_state=42)
>>> annotator = OracleAnnotator(config)
>>> # judgment = annotator.annotate(item, template)

annotate(item: Item, item_template: ItemTemplate) -> str | int | float | bool | list[str]

Generate oracle annotation using ground truth.

Parameters:

    item : Item (required)
        Item to annotate.
    item_template : ItemTemplate (required)
        Template defining task.

Returns:

    str | int | float | bool | list[str]
        Ground truth annotation or random fallback.
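
A sketch of how ground truth is expected to be supplied, per the description above (the assignment shown is illustrative, not the exact item-construction API):

>>> # item.item_metadata["ground_truth"] = "option_a"
>>> # annotator.annotate(item, template)  -> 'option_a'
>>> # Without the key, the annotator falls back to a random response.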

random

Random baseline annotator.

RandomAnnotator

Bases: SimulatedAnnotator

Pure random baseline annotator.

Generates random responses that respect task spec constraints (options, scale ranges, etc.) but are otherwise uninformed.

Useful for establishing a performance baseline.

Parameters:

    config : SimulatedAnnotatorConfig (required)
        Configuration for annotator.

Examples:

>>> from bead.config.simulation import SimulatedAnnotatorConfig
>>> config = SimulatedAnnotatorConfig(strategy="random", random_state=42)
>>> annotator = RandomAnnotator(config)
>>> # judgment = annotator.annotate(item, template)

annotate(item: Item, item_template: ItemTemplate) -> str | int | float | bool | list[str]

Generate random annotation.

Parameters:

    item : Item (required)
        Item to annotate (ignored).
    item_template : ItemTemplate (required)
        Template defining task constraints.

Returns:

    str | int | float | bool | list[str]
        Random annotation (format depends on task type).

Raises:

    ValueError
        If task type is not supported.

lm_based

LM score-based annotator.

LMBasedAnnotator

Bases: SimulatedAnnotator

Annotator using language model scores for decisions.

Uses LM log probabilities or scores from Item.model_outputs to make informed decisions. Applies noise model for variability.

Supports all task types via pluggable strategies.

Parameters:

    config : SimulatedAnnotatorConfig (required)
        Configuration for annotator.

Examples:

>>> from bead.config.simulation import SimulatedAnnotatorConfig, NoiseModelConfig
>>> config = SimulatedAnnotatorConfig(
...     strategy="lm_score",
...     model_output_key="lm_score",
...     noise_model=NoiseModelConfig(noise_type="temperature", temperature=1.5)
... )
>>> annotator = LMBasedAnnotator(config)
>>> # judgment = annotator.annotate(item, template)

annotate(item: Item, item_template: ItemTemplate) -> str | int | float | list[str]

Generate annotation using LM scores.

Parameters:

    item : Item (required)
        Item to annotate.
    item_template : ItemTemplate (required)
        Template defining task.

Returns:

    str | int | float | list[str]
        Annotation (format depends on task type).

distance_based

Distance-based annotator using embeddings.

DistanceBasedAnnotator

Bases: SimulatedAnnotator

Annotator using embedding distances for decisions.

Uses embeddings from Item.model_outputs to compute similarity/distance metrics, then makes decisions based on those distances.

For forced choice, selects option with lowest distance (highest similarity). For ordinal scales, maps distance to scale values. For binary, thresholds distance.

Parameters:

    config : SimulatedAnnotatorConfig (required)
        Configuration for annotator.

Examples:

>>> from bead.config.simulation import SimulatedAnnotatorConfig, NoiseModelConfig
>>> config = SimulatedAnnotatorConfig(
...     strategy="distance",
...     model_output_key="embedding",
...     noise_model=NoiseModelConfig(noise_type="none")
... )
>>> annotator = DistanceBasedAnnotator(config)
>>> # judgment = annotator.annotate(item, template)

annotate(item: Item, item_template: ItemTemplate) -> str | int | float | bool | list[str]

Generate annotation using embedding distances.

Parameters:

    item : Item (required)
        Item to annotate.
    item_template : ItemTemplate (required)
        Template defining task.

Returns:

    str | int | float | bool | list[str]
        Annotation (format depends on task type).

Notes

For distance-based decisions, we convert embeddings to scores:

- Cosine similarity ranges from -1 (opposite) to 1 (identical)
- We convert to a "score" by: score = similarity * 10
- This allows reuse of existing strategies
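
A minimal NumPy sketch of this conversion (the vectors are illustrative):

>>> import numpy as np
>>> a, b = np.array([1.0, 0.0]), np.array([0.6, 0.8])
>>> similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
>>> score = similarity * 10  # maps [-1, 1] onto [-10, 10]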

Noise Models

base

Base class for noise models.

NoiseModel

Bases: ABC

Abstract base for noise models.

Noise models add human-like variability to simulated responses. They can:

- Scale probabilities by temperature
- Add systematic biases (length, frequency, position)
- Inject random noise

apply(value: str | int | float | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | list[str] abstractmethod

Apply noise to value.

Parameters:

    value : str | int | float | list[str] (required)
        Original value (probability, score, choice, etc.).
    context : dict[str, str | int | float | bool | list[str]] (required)
        Additional context (item, template, strategy, etc.).
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    str | int | float | list[str]
        Value with noise applied.
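
A skeletal custom noise model, sketched against the abstract interface above (the import path and jitter logic are assumptions for illustration):

>>> from bead.simulation.noise.base import NoiseModel  # import path assumed
>>> class JitterNoise(NoiseModel):
...     """Add uniform jitter to numeric values; pass everything else through."""
...     def apply(self, value, context, rng):
...         if isinstance(value, (int, float)) and not isinstance(value, bool):
...             return value + rng.uniform(-0.5, 0.5)
...         return value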

temperature

Temperature-based noise model.

TemperatureNoiseModel

Bases: NoiseModel

Temperature scaling for probability distributions.

Scales logits or probabilities by temperature before sampling:

- temperature < 1.0: More deterministic (sharper distribution)
- temperature = 1.0: No change
- temperature > 1.0: More random (flatter distribution)

For forced choice, modifies the softmax: P_i = exp(score_i / T) / sum(exp(score_j / T))
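
A small numeric sketch of that scaling (NumPy only; the scores are illustrative):

>>> import numpy as np
>>> def softmax_t(scores, T):
...     z = np.exp((scores - scores.max()) / T)  # subtract max for numerical stability
...     return z / z.sum()
>>> scores = np.array([2.0, 1.0, 0.5])
>>> sharp = softmax_t(scores, 0.5)  # T < 1: sharper than the T=1 distribution
>>> flat = softmax_t(scores, 2.0)   # T > 1: flatter, closer to uniform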

Parameters:

    temperature : float (default: 1.0)
        Temperature scaling factor (> 0).

Raises:

    ValueError
        If temperature <= 0.

Examples:

>>> noise_model = TemperatureNoiseModel(temperature=2.0)
>>> # More random decisions

apply(value: str | int | float | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | list[str]

Apply temperature scaling.

For forced_choice, re-samples with scaled probabilities. For ordinal_scale, adds scaled noise to value.

Parameters:

    value : str | int | float | list[str] (required)
        Original value (choice, rating, etc.).
    context : dict[str, str | int | float | bool | list[str]] (required)
        Context with item, template, strategy.
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    str | int | float | list[str]
        Value with temperature applied.

random_noise

Random noise injection model.

RandomNoiseModel

Bases: NoiseModel

Random noise injection model.

Adds random noise to responses:

- Gaussian noise for numeric values
- Uniform noise for numeric values
- Random flipping for choice tasks
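
A sketch of the two numeric noise types in NumPy terms (illustrative; the model applies the equivalent internally):

>>> import numpy as np
>>> rng = np.random.RandomState(42)
>>> value, strength = 4.0, 0.5
>>> noisy_gaussian = value + rng.normal(0.0, strength)        # stddev = strength
>>> noisy_uniform = value + rng.uniform(-strength, strength)  # range = strength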

Parameters:

    noise_type : str (default: "gaussian")
        Type of noise ("gaussian" or "uniform").
    strength : float (default: 1.0)
        Noise strength (stddev for gaussian, range for uniform).

Examples:

>>> noise_model = RandomNoiseModel(noise_type="gaussian", strength=0.5)
>>> # Adds gaussian noise with stddev=0.5 to numeric responses

apply(value: str | int | float | bool | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | bool | list[str]

Apply random noise.

Parameters:

    value : str | int | float | bool | list[str] (required)
        Original value.
    context : dict[str, str | int | float | bool | list[str]] (required)
        Context with item, template, strategy.
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    str | int | float | bool | list[str]
        Value with noise applied.

systematic

Systematic bias noise model.

SystematicNoiseModel

Bases: NoiseModel

Systematic bias noise model.

Adds consistent biases to responses:

- length: Prefer shorter/longer options
- frequency: Prefer common/rare words
- position: Prefer first/last option
- endpoint: Prefer endpoints on ordinal scales
- midpoint: Prefer midpoint on ordinal scales

Parameters:

    bias_type : str (default: "position")
        Type of bias ("length", "frequency", "position", "endpoint", "midpoint").
    bias_strength : float (default: 0.0)
        Strength of bias (0.0-1.0).

Examples:

>>> noise_model = SystematicNoiseModel(bias_type="position", bias_strength=0.3)
>>> # Adds 30% bias toward first option in forced choice
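
One plausible reading of how a position bias of strength s could combine with a strategy's choice distribution (an assumption for illustration only; the actual mixing rule is internal to the noise model):

>>> import numpy as np
>>> probs = np.array([0.4, 0.35, 0.25])   # strategy's choice probabilities
>>> s = 0.3                               # bias_strength
>>> first = np.array([1.0, 0.0, 0.0])     # one-hot on the first option
>>> biased = (1 - s) * probs + s * first  # assumed convex mixing; still sums to 1.0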

apply(value: str | int | float | bool | list[str], context: dict[str, str | int | float | bool | list[str]], rng: np.random.RandomState) -> str | int | float | bool | list[str]

Apply systematic bias.

Parameters:

    value : str | int | float | bool | list[str] (required)
        Original value.
    context : dict[str, str | int | float | bool | list[str]] (required)
        Context with item, template, strategy.
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    str | int | float | bool | list[str]
        Value with bias applied.

Task-Specific Strategies

base

Base class for simulation strategies.

SimulationStrategy

Bases: ABC

Abstract base for task-specific simulation strategies.

Each strategy handles one task type (forced_choice, ordinal_scale, etc.) and converts model outputs into appropriate responses.

Strategies should:

1. Validate item compatibility with task type
2. Extract relevant model outputs
3. Generate response in correct format for task
4. Handle missing model outputs gracefully

supported_task_type: str abstractmethod property

Return supported task type (e.g., 'forced_choice').

Returns:

    str
        Task type identifier.

validate_item(item: Item, item_template: ItemTemplate) -> None abstractmethod

Validate item is compatible with this strategy.

Parameters:

    item : Item (required)
        Item to validate.
    item_template : ItemTemplate (required)
        Template defining task structure.

Raises:

    ValueError
        If item incompatible with this strategy.

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str | int | float | list[str] abstractmethod

Generate simulated response for item.

Parameters:

    item : Item (required)
        Item to respond to.
    item_template : ItemTemplate (required)
        Template defining task structure.
    model_output_key : str (required)
        Key to extract from model outputs.
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    str | int | float | list[str]
        Simulated response (format depends on task type).

extract_model_outputs(item: Item, key: str, required_count: int | None = None) -> list[float] | None

Extract model outputs from item.

Parameters:

    item : Item (required)
        Item to extract from.
    key : str (required)
        Key to look for.
    required_count : int | None (default: None)
        Expected number of outputs.

Returns:

    list[float] | None
        Extracted values or None if missing.
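
A skeletal custom strategy, sketched against the abstract interface above (the import path is inferred from the cloze example later in this page, and the task_type attribute access is an assumption):

>>> from bead.simulation.strategies.base import SimulationStrategy  # path inferred
>>> class CoinFlipStrategy(SimulationStrategy):
...     """Hypothetical binary strategy that ignores model outputs."""
...     @property
...     def supported_task_type(self) -> str:
...         return "binary"
...     def validate_item(self, item, item_template):
...         if item_template.task_type != "binary":  # attribute assumed
...             raise ValueError("CoinFlipStrategy only supports binary tasks")
...     def simulate_response(self, item, item_template, model_output_key, rng):
...         return bool(rng.rand() < 0.5)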

forced_choice

Forced choice simulation strategy.

ForcedChoiceStrategy

Bases: SimulationStrategy

Strategy for forced_choice tasks (n-AFC).

Handles 2AFC, 3AFC, 4AFC, etc. Uses model outputs to compute preference probabilities, then samples categorically.

For 2AFC with LM scores: P(choose A) = sigmoid((score_A - score_B) / temperature)

For n-AFC with LM scores: P(choose i) = softmax([score_1, ..., score_n] / temperature)[i]
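
A NumPy sketch of both rules (the scores are illustrative):

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> T = 1.0
>>> # 2AFC: P(choose A) from the score difference
>>> score_a, score_b = -1.1, -2.3
>>> p_a = 1.0 / (1.0 + np.exp(-(score_a - score_b) / T))
>>> # n-AFC: categorical sample from the softmax over all scores
>>> scores = np.array([-1.1, -2.3, -1.7])
>>> probs = np.exp(scores / T) / np.exp(scores / T).sum()
>>> choice = rng.choice(len(scores), p=probs)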

Examples:

>>> strategy = ForcedChoiceStrategy()
>>> strategy.supported_task_type
'forced_choice'

supported_task_type: str property

Return 'forced_choice'.

Returns:

    str
        Task type identifier.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for forced choice.

Checks:

- task_type is 'forced_choice'
- task_spec.options is defined
- Item has appropriate model outputs OR can fall back

Parameters:

    item : Item (required)
        Item to validate.
    item_template : ItemTemplate (required)
        Template defining task.

Raises:

    ValueError
        If validation fails.

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str

Generate forced choice response.

Parameters:

    item : Item (required)
        Item to respond to.
    item_template : ItemTemplate (required)
        Template defining task.
    model_output_key : str (required)
        Key for model outputs (e.g., "lm_score").
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    str
        Chosen option name.

ordinal_scale

Ordinal scale simulation strategy.

OrdinalScaleStrategy

Bases: SimulationStrategy

Strategy for ordinal_scale tasks (Likert scales).

Handles discrete ordinal scales (e.g., 1-7, 1-5). Maps model outputs to scale positions, then samples with noise around that position.

For ordinal scales with LM score:

- Map score to continuous position on scale
- Add noise
- Round to nearest integer within bounds
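
A sketch of that mapping under simple assumptions (a linear map from an assumed score interval onto the scale; the real mapping is internal to the strategy):

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> lo, hi = 1, 7                          # scale_bounds
>>> score, s_min, s_max = -2.0, -5.0, 0.0  # assumed score range
>>> position = lo + (score - s_min) / (s_max - s_min) * (hi - lo)  # map to scale
>>> rating = int(np.clip(round(position + rng.normal(0.0, 0.5)), lo, hi))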

Examples:

>>> strategy = OrdinalScaleStrategy()
>>> strategy.supported_task_type
'ordinal_scale'

supported_task_type: str property

Return 'ordinal_scale'.

Returns:

    str
        Task type identifier.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for ordinal scale.

Checks:

- task_type is 'ordinal_scale'
- task_spec.scale_bounds is defined
- scale_bounds has valid min/max

Parameters:

    item : Item (required)
        Item to validate.
    item_template : ItemTemplate (required)
        Template defining task.

Raises:

    ValueError
        If validation fails.

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> int

Generate ordinal scale response.

Parameters:

    item : Item (required)
        Item to respond to.
    item_template : ItemTemplate (required)
        Template defining task.
    model_output_key : str (required)
        Key for model outputs (e.g., "lm_score").
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    int
        Rating on ordinal scale.

binary

Binary choice simulation strategy.

BinaryStrategy

Bases: SimulationStrategy

Strategy for binary tasks (yes/no, true/false).

Uses model outputs to compute probability of "yes" response, then samples from Bernoulli distribution.

For binary tasks with LM score: P(yes) = sigmoid(score / temperature)
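
A minimal sketch of that rule (illustrative score):

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> score, T = 0.8, 1.0
>>> p_yes = 1.0 / (1.0 + np.exp(-score / T))  # sigmoid
>>> response = bool(rng.rand() < p_yes)       # Bernoulli sample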

Examples:

>>> strategy = BinaryStrategy()
>>> strategy.supported_task_type
'binary'

supported_task_type: str property

Return 'binary'.

Returns:

    str
        Task type identifier.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for binary choice.

Checks:

- task_type is 'binary'
- Item has appropriate model outputs OR can fall back

Parameters:

    item : Item (required)
        Item to validate.
    item_template : ItemTemplate (required)
        Template defining task.

Raises:

    ValueError
        If validation fails.

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> bool

Generate binary response.

Parameters:

    item : Item (required)
        Item to respond to.
    item_template : ItemTemplate (required)
        Template defining task.
    model_output_key : str (required)
        Key for model outputs (e.g., "lm_score").
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    bool
        Binary response (True/False).

categorical

Categorical choice simulation strategy.

CategoricalStrategy

Bases: SimulationStrategy

Strategy for categorical tasks (unordered multi-class).

Similar to forced_choice but for unordered categories (e.g., NLI labels, sentiment classes). Uses softmax over model outputs.

For categorical with LM scores: P(category_i) = softmax([score_1, ..., score_n] / temperature)[i]
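
A minimal sketch of that rule, using NLI-style labels as an illustrative example:

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> labels = ["entailment", "neutral", "contradiction"]
>>> scores = np.array([-0.4, -1.9, -2.6])
>>> probs = np.exp(scores / 1.0) / np.exp(scores / 1.0).sum()  # softmax, T = 1.0
>>> label = labels[rng.choice(len(labels), p=probs)]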

Examples:

>>> strategy = CategoricalStrategy()
>>> strategy.supported_task_type
'categorical'

supported_task_type: str property

Return 'categorical'.

Returns:

    str
        Task type identifier.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for categorical choice.

Checks:

- task_type is 'categorical'
- task_spec.options is defined
- At least 2 options available

Parameters:

    item : Item (required)
        Item to validate.
    item_template : ItemTemplate (required)
        Template defining task.

Raises:

    ValueError
        If validation fails.

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str

Generate categorical response.

Parameters:

    item : Item (required)
        Item to respond to.
    item_template : ItemTemplate (required)
        Template defining task.
    model_output_key : str (required)
        Key for model outputs (e.g., "lm_score").
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    str
        Chosen category name.

multi_select

Multi-select simulation strategy.

MultiSelectStrategy

Bases: SimulationStrategy

Strategy for multi_select tasks.

Handles tasks where multiple options can be selected independently. Uses model outputs to compute independent selection probabilities for each option via sigmoid.

For each option i: P(select option i) = sigmoid(score_i / temperature)
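
A minimal sketch of independent per-option selection with the documented threshold (option names and scores are illustrative):

>>> import numpy as np
>>> options = ["red", "round", "rough"]
>>> scores = np.array([2.0, -0.5, 0.1])
>>> T, threshold = 1.0, 0.5
>>> p_select = 1.0 / (1.0 + np.exp(-scores / T))  # independent sigmoids
>>> selected = [o for o, p in zip(options, p_select) if p > threshold]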

Parameters:

    threshold : float (default: 0.5)
        Probability threshold for selection.
    temperature : float (default: 1.0)
        Temperature for scaling decisions.

Examples:

>>> strategy = MultiSelectStrategy()
>>> strategy.supported_task_type
'multi_select'

supported_task_type: str property

Return 'multi_select'.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for multi-select.

Checks:

- task_type is 'multi_select'
- task_spec.options is defined
- At least 2 options

Parameters:

    item : Item (required)
        Item to validate.
    item_template : ItemTemplate (required)
        Template defining task.

Raises:

    ValueError
        If validation fails.

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> list[str]

Generate multi-select response.

Parameters:

    item : Item (required)
        Item to respond to.
    item_template : ItemTemplate (required)
        Template defining task.
    model_output_key : str (required)
        Key for model outputs (e.g., "lm_score").
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    list[str]
        List of selected option names.

magnitude

Magnitude estimation simulation strategy.

MagnitudeStrategy

Bases: SimulationStrategy

Strategy for magnitude estimation tasks.

Handles unbounded numeric magnitude estimation. Converts model outputs (typically LM scores) to positive magnitude values.

For LM scores (typically negative log probabilities, where lower is better): magnitude = exp(-score / scale_factor)

This maps:

- Better scores (lower negative log probability) -> larger magnitudes
- Worse scores (higher negative log probability) -> smaller magnitudes
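
A numeric sketch of the mapping (the scores are illustrative):

>>> import numpy as np
>>> scale_factor = 10.0
>>> nll_good, nll_bad = 2.0, 20.0              # negative log probabilities; lower is better
>>> m_good = np.exp(-nll_good / scale_factor)  # ~0.82 -> larger magnitude
>>> m_bad = np.exp(-nll_bad / scale_factor)    # ~0.14 -> smaller magnitude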

Parameters:

    scale_factor : float (default: 10.0)
        Scaling factor for converting scores to magnitudes. Smaller values
        spread scores over a wider range of magnitudes.

Examples:

>>> strategy = MagnitudeStrategy()
>>> strategy.supported_task_type
'magnitude'

supported_task_type: str property

Return 'magnitude'.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for magnitude estimation.

Checks:

- task_type is 'magnitude'
- Item has model outputs OR can fall back

Parameters:

    item : Item (required)
        Item to validate.
    item_template : ItemTemplate (required)
        Template defining task.

Raises:

    ValueError
        If validation fails.

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> float

Generate magnitude estimation response.

Parameters:

    item : Item (required)
        Item to respond to.
    item_template : ItemTemplate (required)
        Template defining task.
    model_output_key : str (required)
        Key for model outputs (e.g., "lm_score").
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    float
        Estimated magnitude (positive value).

free_text

Free text simulation strategy.

FreeTextStrategy

Bases: SimulationStrategy

Strategy for free_text tasks.

Handles free text generation using rule-based approaches. For simulations, this typically:

- Extracts text from rendered_elements
- Uses templates if provided
- Falls back to simple defaults

Note: This is a simplified implementation for simulation purposes. For realistic free text generation, consider using LLMs.

Examples:

>>> strategy = FreeTextStrategy()
>>> strategy.supported_task_type
'free_text'

supported_task_type: str property

Return 'free_text'.

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item for free text.

Checks:

- task_type is 'free_text'

Parameters:

    item : Item (required)
        Item to validate.
    item_template : ItemTemplate (required)
        Template defining task.

Raises:

    ValueError
        If validation fails.

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> str

Generate free text response.

Parameters:

    item : Item (required)
        Item to respond to.
    item_template : ItemTemplate (required)
        Template defining task.
    model_output_key : str (required)
        Key for model outputs (unused for free text).
    rng : np.random.RandomState (required)
        Random number generator.

Returns:

    str
        Generated text response.

cloze

Cloze (fill-in-the-blank) simulation strategy using MLM scores.

ClozeStrategy

Bases: SimulationStrategy

MLM-based strategy for cloze (fill-in-the-blank) tasks.

Uses masked language model scores to select fillers for unfilled slots. For constrained slots (with specific options), selects highest-scoring option. For unconstrained slots, uses rendered_elements or metadata as fallback.

The strategy expects model outputs to contain MLM scores for each slot, stored as separate ModelOutput instances with operation="mlm_score" and inputs containing {"slot_name": slot_name, "candidate": candidate_value}.

Examples:

>>> from bead.simulation.strategies.cloze import ClozeStrategy
>>> strategy = ClozeStrategy()
>>> # item with unfilled_slots and model_outputs with MLM scores
>>> # response = strategy.simulate_response(item, template, "mlm_score", rng)
>>> # Returns: {"determiner": "the", "verb": "chased", "object": "mouse"}

supported_task_type: str property

Get supported task type.

Returns:

    str
        Always returns "cloze".

validate_item(item: Item, item_template: ItemTemplate) -> None

Validate item is compatible with cloze strategy.

Parameters:

    item : Item (required)
        Item to validate.
    item_template : ItemTemplate (required)
        Template defining task.

Raises:

    ValueError
        If validation fails.

simulate_response(item: Item, item_template: ItemTemplate, model_output_key: str, rng: np.random.RandomState) -> dict[str, str]

Simulate cloze response using MLM scores.

For each unfilled slot, selects the filler with highest MLM score. Falls back to random selection or metadata if MLM scores unavailable.

Parameters:

    item : Item (required)
        Item to annotate.
    item_template : ItemTemplate (required)
        Template defining task constraints.
    model_output_key : str (required)
        Key identifying which model outputs to use (e.g., "mlm_score").
    rng : np.random.RandomState (required)
        Random number generator for stochasticity.

Returns:

    dict[str, str]
        Dictionary mapping slot names to selected fillers.

Examples:

>>> response = {"determiner": "the", "verb": "chased", "object": "mouse"}