bead.items

Stage 3 of the bead pipeline: experimental item construction with 9 task types.

Core Classes

item

Data models for constructed experimental items.

UnfilledSlot

Bases: BeadBaseModel

An unfilled slot in a cloze task item.

Represents a slot in a partially filled template where the participant must provide a response. The UI widget for collecting the response is inferred from the slot's constraints at deployment time.

Attributes:

Name Type Description
slot_name str

Name of the unfilled template slot.

position int

Token index position in the rendered text.

constraint_ids list[UUID]

UUIDs of constraints that apply to this slot.

Examples:

>>> from uuid import UUID
>>> # Extensional constraint slot (will render as dropdown)
>>> UnfilledSlot(
...     slot_name="determiner",
...     position=0,
...     constraint_ids=[UUID("12345678-1234-5678-1234-567812345678")]
... )
>>> # Unconstrained slot (will render as text input)
>>> UnfilledSlot(
...     slot_name="adjective",
...     position=2,
...     constraint_ids=[]
... )

validate_slot_name(v: str) -> str classmethod

Validate slot name is not empty.

Parameters:

Name Type Description Default
v str

Slot name to validate.

required

Returns:

Type Description
str

Validated slot name.

Raises:

Type Description
ValueError

If slot name is empty or contains only whitespace.
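For illustration, a minimal sketch of the failure mode (assuming BeadBaseModel is a standard Pydantic model, so the ValueError surfaces as a ValidationError at construction time):

>>> from pydantic import ValidationError
>>> try:
...     UnfilledSlot(slot_name="   ", position=0, constraint_ids=[])
... except ValidationError:
...     print("rejected: empty slot name")
rejected: empty slot name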

ModelOutput

Bases: BeadBaseModel

Output from a model computation.

Attributes:

Name Type Description
model_name str

Name/identifier of the model.

model_version str

Version of the model.

operation str

Operation performed (e.g., "log_probability", "nli", "embedding").

inputs dict[str, MetadataValue]

Inputs to the model.

output MetadataValue

Model output.

cache_key str

Cache key for this computation.

computation_metadata dict[str, MetadataValue]

Metadata about the computation (timestamp, device, etc.).

Examples:

>>> output = ModelOutput(
...     model_name="gpt2",
...     model_version="latest",
...     operation="log_probability",
...     inputs={"text": "The cat broke the vase"},
...     output=-12.4,
...     cache_key="abc123..."
... )

validate_non_empty_strings(v: str) -> str classmethod

Validate required string fields are not empty.

Parameters:

Name Type Description Default
v str

String value to validate.

required

Returns:

Type Description
str

Validated string.

Raises:

Type Description
ValueError

If string is empty or contains only whitespace.

Item

Bases: BeadBaseModel

A constructed experimental item.

Items are discrete stimuli presented to participants or models for judgment collection. They are constructed from item templates and filled templates.

Attributes:

Name Type Description
item_template_id UUID

UUID of the item template this was constructed from.

filled_template_refs list[UUID]

UUIDs of filled templates used in this item.

rendered_elements dict[str, str]

Rendered text for each element (by element_name).

options list[str]

Choice options for forced_choice/multi_select tasks. Each string is one option text. Order matters (first option is displayed first).

unfilled_slots list[UnfilledSlot]

Unfilled slots for cloze tasks (UI widgets inferred from constraints).

model_outputs list[ModelOutput]

All model computations for this item.

constraint_satisfaction dict[UUID, bool]

Constraint UUIDs mapped to satisfaction status.

item_metadata dict[str, MetadataValue]

Additional metadata for this item.

spans list[Span]

Span annotations for this item (default: empty).

span_relations list[SpanRelation]

Relations between spans, directed or undirected (default: empty).

tokenized_elements dict[str, list[str]]

Tokenized text for span indexing, keyed by element name (default: empty).

token_space_after dict[str, list[bool]]

Per-token space_after flags for artifact-free rendering (default: empty).

Examples:

>>> # Simple item
>>> item = Item(
...     item_template_id=UUID("..."),
...     filled_template_refs=[UUID("...")],
...     rendered_elements={"sentence": "The cat broke the vase"}
... )
>>> # Forced-choice item with options
>>> fc_item = Item(
...     item_template_id=UUID("..."),
...     options=["The cat sat on the mat.", "The cats sat on the mat."],
...     item_metadata={"n_options": 2}
... )
>>> # Cloze item with unfilled slots
>>> cloze_item = Item(
...     item_template_id=UUID("..."),
...     rendered_elements={"sentence": "The ___ cat ___ the ___"},
...     unfilled_slots=[
...         UnfilledSlot(slot_name="determiner", position=0, constraint_ids=[...]),
...         UnfilledSlot(slot_name="verb", position=2, constraint_ids=[...])
...     ]
... )

validate_span_relations() -> Item

Validate all span_relations reference valid span_ids from spans.

Returns:

Type Description
Item

Validated item.

Raises:

Type Description
ValueError

If a relation references a span_id not present in spans.

get_model_output(model_name: str, operation: str, inputs: dict[str, MetadataValue] | None = None) -> ModelOutput | None

Get a specific model output.

Parameters:

Name Type Description Default
model_name str

Name of the model.

required
operation str

Operation type.

required
inputs dict[str, MetadataValue] | None

Optional input filter.

None

Returns:

Type Description
ModelOutput | None

The model output if found, None otherwise.

Examples:

>>> output = item.get_model_output("gpt2", "log_probability")
>>> if output:
...     print(f"Log prob: {output.output}")
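The optional inputs filter disambiguates when the same model and operation were run on different inputs; a short sketch reusing the ModelOutput example above:

>>> output = item.get_model_output(
...     "gpt2",
...     "log_probability",
...     inputs={"text": "The cat broke the vase"}
... )
>>> if output is not None:
...     print(f"Log prob: {output.output}")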

add_model_output(output: ModelOutput) -> None

Add a model output to this item.

Parameters:

Name Type Description Default
output ModelOutput

Model output to add.

required

Examples:

>>> item.add_model_output(my_output)
>>> print(f"Item now has {len(item.model_outputs)} model outputs")

ItemCollection

Bases: BeadBaseModel

A collection of constructed items.

Attributes:

Name Type Description
name str

Name of this collection.

source_template_collection_id UUID

UUID of the source item template collection.

source_filled_collection_id UUID

UUID of the source filled template collection.

items list[Item]

The constructed items.

construction_stats dict[str, int]

Statistics about item construction.

Examples:

>>> collection = ItemCollection(
...     name="acceptability_items",
...     source_template_collection_id=UUID("..."),
...     source_filled_collection_id=UUID("...")
... )
>>> collection.add_item(item)

validate_name(v: str) -> str classmethod

Validate collection name is not empty.

Parameters:

Name Type Description Default
v str

Collection name to validate.

required

Returns:

Type Description
str

Validated collection name.

Raises:

Type Description
ValueError

If name is empty or contains only whitespace.

add_item(item: Item) -> None

Add an item to the collection.

Parameters:

Name Type Description Default
item Item

Item to add.

required

Examples:

>>> collection.add_item(my_item)
>>> print(f"Collection now has {len(collection.items)} items")

item_template

Data models for experimental item templates.

ChunkingSpec

Bases: BeadBaseModel

Specification for text segmentation in incremental presentation.

Defines how to segment text for self-paced reading or timed sequence presentation. Supports character-level, word-level, sentence-level, constituent-based (with parsing), or custom boundary segmentation.

Attributes:

Name Type Description
unit ChunkingUnit

Segmentation unit type. Defaults to "word".

parse_type ParseType | None

Type of parsing for constituent chunking ("constituency" or "dependency").

constituent_labels list[str] | None

Labels for constituent chunking. For constituency parsing, these are constituent types (e.g., ["NP", "VP", "S"]). For dependency parsing, these are dependency relations (e.g., ["nsubj", "dobj", "root"]).

parser Literal['stanza', 'spacy'] | None

Parser library to use for constituent chunking.

parse_language str | None

ISO 639 language code for parser (e.g., "en", "es", "zh").

custom_boundaries list[int] | None

Token indices for custom chunking boundaries.

Examples:

>>> # Word-by-word chunking
>>> ChunkingSpec(unit="word")
>>> # Chunk by noun phrases (constituency)
>>> ChunkingSpec(
...     unit="constituent",
...     parse_type="constituency",
...     constituent_labels=["NP"],
...     parser="stanza",
...     parse_language="en"
... )
>>> # Chunk by subjects and objects (dependency)
>>> ChunkingSpec(
...     unit="constituent",
...     parse_type="dependency",
...     constituent_labels=["nsubj", "dobj"],
...     parser="spacy",
...     parse_language="en"
... )
>>> # Custom boundaries at specific token positions
>>> ChunkingSpec(unit="custom", custom_boundaries=[0, 3, 7, 10])

TimingParams

Bases: BeadBaseModel

Timing parameters for stimulus presentation.

Defines timing constraints for timed sequence presentations, including per-chunk duration, inter-stimulus intervals, and response timeouts.

Attributes:

Name Type Description
duration_ms int | None

Duration in milliseconds to display each chunk (for timed sequences).

isi_ms int | None

Inter-stimulus interval in milliseconds between chunks.

timeout_ms int | None

Maximum time in milliseconds to wait for response.

mask_char str | None

Character to use for masking non-current chunks (e.g., "_").

cumulative bool

If True, show all previous chunks; if False, show only current chunk.

Examples:

>>> # RSVP (Rapid Serial Visual Presentation)
>>> TimingParams(
...     duration_ms=250,
...     isi_ms=50,
...     cumulative=False,
...     mask_char="_"
... )
>>> # Self-paced with timeout
>>> TimingParams(timeout_ms=5000, cumulative=True)

TaskSpec

Bases: BeadBaseModel

Parameters for the response collection task.

Specifies task-specific parameters such as prompts, options, scale bounds, and validation rules. Which parameters apply depends on the task_type specified in the ItemTemplate; task_type itself is not a field here because it belongs to the ItemTemplate structure.

Attributes:

Name Type Description
prompt str

Question or instruction shown to participants.

scale_bounds tuple[int, int] | None

Min and max values for ordinal_scale task.

scale_labels dict[int, str] | None

Optional labels for specific scale points (ordinal_scale).

options list[str] | None

Available options for forced_choice, multi_select, or categorical tasks. For forced_choice/multi_select: element names to choose from. For categorical: category labels.

min_selections int | None

Minimum number of selections required (multi_select only).

max_selections int | None

Maximum number of selections allowed (multi_select only).

text_validation_pattern str | None

Regular expression pattern for validating free_text responses.

max_length int | None

Maximum character length for free_text responses.

span_spec SpanSpec | None

Span labeling specification (for span_labeling tasks or composite tasks with span overlays).

Examples:

>>> # Ordinal scale task (e.g., acceptability rating)
>>> TaskSpec(
...     prompt="How natural does this sentence sound?",
...     scale_bounds=(1, 7),
...     scale_labels={1: "Very unnatural", 7: "Very natural"}
... )
>>> # Categorical task (e.g., NLI)
>>> TaskSpec(
...     prompt="What is the relationship?",
...     options=["Entailment", "Neutral", "Contradiction"]
... )
>>> # Binary task
>>> TaskSpec(
...     prompt="Is this sentence grammatical?"
... )
>>> # Forced choice task (e.g., minimal pair)
>>> TaskSpec(
...     prompt="Which sounds more natural?",
...     options=["sentence_a", "sentence_b"]
... )
>>> # Multi-select task (e.g., select all grammatical)
>>> TaskSpec(
...     prompt="Select all grammatical sentences:",
...     options=["sent_a", "sent_b", "sent_c"],
...     min_selections=1
... )
>>> # Free text task
>>> TaskSpec(
...     prompt="Who performed the action?",
...     max_length=50
... )

validate_prompt(v: str) -> str classmethod

Validate prompt is not empty.

Parameters:

Name Type Description Default
v str

Prompt to validate.

required

Returns:

Type Description
str

Validated prompt.

Raises:

Type Description
ValueError

If prompt is empty or contains only whitespace.

PresentationSpec

Bases: BeadBaseModel

Specification of stimulus presentation method.

Defines how stimuli are displayed to participants (static, self-paced, or timed sequence), including segmentation and timing parameters. It is kept separate from the judgment specification to maintain a clean separation of concerns.

Attributes:

Name Type Description
mode PresentationMode

Presentation mode (static, self_paced, or timed_sequence). Defaults to "static".

chunking ChunkingSpec

Chunking specification for incremental presentations. Defaults to word-level chunking.

timing TimingParams

Timing parameters for timed presentations. Defaults to cumulative display with no fixed durations.

display_format dict[str, str | int | float | bool]

Additional display formatting options.

tokenizer_config TokenizerConfig | None

Display tokenizer configuration for span annotation. When set, controls how text is tokenized for span indexing and display.

Examples:

>>> # Static presentation (default)
>>> PresentationSpec()
>>> # Self-paced word-by-word reading
>>> PresentationSpec(
...     mode="self_paced",
...     chunking=ChunkingSpec(unit="word")
... )
>>> # Self-paced by noun phrases
>>> PresentationSpec(
...     mode="self_paced",
...     chunking=ChunkingSpec(
...         unit="constituent",
...         parse_type="constituency",
...         constituent_labels=["NP"],
...         parser="stanza",
...         parse_language="en"
...     )
... )
>>> # RSVP (timed sequence)
>>> PresentationSpec(
...     mode="timed_sequence",
...     chunking=ChunkingSpec(unit="word"),
...     timing=TimingParams(duration_ms=250, isi_ms=50, cumulative=False)
... )

ItemElement

Bases: BeadBaseModel

A structured element within an item template.

ItemElements represent distinct parts of a complex item, such as context, target sentence, question, or response options. Elements can be static text or references to filled templates.

Attributes:

Name Type Description
element_type ElementRefType

Type of element ("text" or "filled_template_ref").

element_name str

Unique name for this element within the item.

content str | None

Static text content (for text elements).

filled_template_ref_id UUID | None

UUID of filled template (for reference elements).

element_metadata dict[str, MetadataValue]

Additional element-specific metadata.

order int | None

Display order for this element (optional).

Examples:

>>> # Text element
>>> context = ItemElement(
...     element_type="text",
...     element_name="context",
...     content="Mary loves books.",
...     order=1
... )
>>> # Template reference element
>>> target = ItemElement(
...     element_type="filled_template_ref",
...     element_name="target",
...     filled_template_ref_id=UUID("..."),
...     order=2
... )

is_text: bool property

Check if this is a text element.

Returns:

Type Description
bool

True if element_type is "text".

is_template_ref: bool property

Check if this references a filled template.

Returns:

Type Description
bool

True if element_type is "filled_template_ref".

validate_element_name(v: str) -> str classmethod

Validate element name is not empty.

Parameters:

Name Type Description Default
v str

Element name to validate.

required

Returns:

Type Description
str

Validated element name.

Raises:

Type Description
ValueError

If name is empty or contains only whitespace.

ItemTemplate

Bases: BeadBaseModel

Template specification for constructing experimental items.

ItemTemplate defines how to construct an experimental item with three orthogonal dimensions: what semantic property to measure (judgment_type), how to collect the response (task_type), and how to present the stimulus (presentation_spec).

This is distinct from Template (in bead.resources.structures), which defines linguistic structure. ItemTemplate defines experimental structure.

Attributes:

Name Type Description
name str

Template name (e.g., "acceptability_rating").

description str | None

Human-readable description of this item template.

judgment_type JudgmentType

Semantic property being measured (acceptability, inference, etc.).

task_type TaskType

Response collection method (forced_choice, ordinal_scale, etc.).

elements list[ItemElement]

Elements that compose this item.

constraints list[UUID]

UUIDs of constraints on items (typically model-based).

task_spec TaskSpec

Task-specific parameters (prompt, options, scale bounds, etc.).

presentation_spec PresentationSpec

Specification of how to present stimuli.

presentation_order list[str] | None

Order to present elements (by element_name).

template_metadata dict[str, MetadataValue]

Additional template metadata.

Examples:

>>> # Acceptability judgment with ordinal scale task
>>> template = ItemTemplate(
...     name="acceptability_rating",
...     judgment_type="acceptability",
...     task_type="ordinal_scale",
...     task_spec=TaskSpec(
...         prompt="How natural is this sentence?",
...         scale_bounds=(1, 7),
...         scale_labels={1: "Very unnatural", 7: "Very natural"}
...     ),
...     presentation_spec=PresentationSpec(mode="static"),
...     elements=[
...         ItemElement(
...             element_type="filled_template_ref",
...             element_name="sentence",
...             filled_template_ref_id=UUID("...")
...         )
...     ]
... )
>>> # Minimal pair: acceptability judgment with forced choice task
>>> minimal_pair = ItemTemplate(
...     name="minimal_pair",
...     judgment_type="acceptability",
...     task_type="forced_choice",
...     elements=[
...         ItemElement(
...             element_type="text", element_name="sent_a", content="Who..."
...         ),
...         ItemElement(
...             element_type="text", element_name="sent_b", content="Whom..."
...         )
...     ],
...     task_spec=TaskSpec(
...         prompt="Which sounds more natural?",
...         options=["sent_a", "sent_b"]
...     ),
...     presentation_spec=PresentationSpec(mode="static")
... )
>>> # Odd-man-out: similarity judgment with forced choice task
>>> odd_man_out = ItemTemplate(
...     name="odd_man_out",
...     judgment_type="similarity",
...     task_type="forced_choice",
...     elements=[...],  # 4 elements
...     task_spec=TaskSpec(
...         prompt="Which is most different?",
...         options=["opt_a", "opt_b", "opt_c", "opt_d"]
...     ),
...     presentation_spec=PresentationSpec(mode="static")
... )

validate_name(v: str) -> str classmethod

Validate template name is not empty.

Parameters:

Name Type Description Default
v str

Template name to validate.

required

Returns:

Type Description
str

Validated template name.

Raises:

Type Description
ValueError

If name is empty or contains only whitespace.

validate_unique_element_names(v: list[ItemElement]) -> list[ItemElement] classmethod

Validate all element names are unique within template.

Parameters:

Name Type Description Default
v list[ItemElement]

List of elements to validate.

required

Returns:

Type Description
list[ItemElement]

Validated elements.

Raises:

Type Description
ValueError

If duplicate element names found.

validate_presentation_order(v: list[str] | None, info: ValidationInfo) -> list[str] | None classmethod

Validate presentation_order matches element names.

Parameters:

Name Type Description Default
v list[str] | None

Presentation order list to validate.

required
info ValidationInfo

Pydantic validation info containing other field values.

required

Returns:

Type Description
list[str] | None

Validated presentation order.

Raises:

Type Description
ValueError

If presentation_order contains names not in elements, or is missing names from elements.
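A minimal sketch of a valid ordering (hedged: the judgment_type and task_type strings are assumed to be valid enum values, and "..." stands in for real content):

>>> template = ItemTemplate(
...     name="context_then_target",
...     judgment_type="acceptability",
...     task_type="binary",
...     task_spec=TaskSpec(prompt="Is the target sentence acceptable?"),
...     presentation_spec=PresentationSpec(),
...     elements=[
...         ItemElement(element_type="text", element_name="context", content="..."),
...         ItemElement(element_type="text", element_name="target", content="...")
...     ],
...     presentation_order=["context", "target"]
... )
>>> # presentation_order=["context"] would raise: "target" is missing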

get_element_by_name(name: str) -> ItemElement | None

Get an element by its name.

Parameters:

Name Type Description Default
name str

Element name to search for.

required

Returns:

Type Description
ItemElement | None

Element with matching name, or None if not found.

Examples:

>>> elem = template.get_element_by_name("sentence")
>>> if elem:
...     print(elem.element_type)

get_template_ref_elements() -> list[ItemElement]

Get all elements that reference filled templates.

Returns:

Type Description
list[ItemElement]

Elements with element_type="filled_template_ref".

Examples:

>>> refs = template.get_template_ref_elements()
>>> print(f"Found {len(refs)} template references")

ItemTemplateCollection

Bases: BeadBaseModel

A collection of item templates.

Attributes:

Name Type Description
name str

Name of this collection.

description str | None

Description of this collection.

templates list[ItemTemplate]

Item templates in this collection.

Examples:

>>> collection = ItemTemplateCollection(
...     name="acceptability_study",
...     description="Templates for acceptability judgments"
... )
>>> collection.add_template(template)

validate_name(v: str) -> str classmethod

Validate collection name is not empty.

Parameters:

Name Type Description Default
v str

Collection name to validate.

required

Returns:

Type Description
str

Validated collection name.

Raises:

Type Description
ValueError

If name is empty or contains only whitespace.

add_template(template: ItemTemplate) -> None

Add a template to the collection.

Parameters:

Name Type Description Default
template ItemTemplate

Template to add.

required

Examples:

>>> collection.add_template(my_template)
>>> print(f"Collection now has {len(collection.templates)} templates")

Task-Type Utilities

forced_choice

Utilities for creating N-AFC (forced-choice) experimental items.

This module provides language-agnostic utilities for creating forced-choice items where participants select from N alternatives (2AFC, 3AFC, 4AFC, etc.).

create_forced_choice_item(*options: str, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create an N-AFC (forced-choice) item from N text options.

Parameters:

Name Type Description Default
*options str

Text for each option (2 or more required).

()
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Forced-choice item with options stored in the options field.

Raises:

Type Description
ValueError

If fewer than 2 options provided.

Examples:

>>> item = create_forced_choice_item(
...     "The cat sat on the mat.",
...     "The cats sat on the mat.",
...     metadata={"contrast": "number"}
... )
>>> item.options[0]
'The cat sat on the mat.'
>>> item.options[1]
'The cats sat on the mat.'
>>> # 4AFC item
>>> item = create_forced_choice_item(
...     "Option A text",
...     "Option B text",
...     "Option C text",
...     "Option D text"
... )
>>> len(item.options)
4

create_forced_choice_items_from_groups(items: list[Item], group_by: Callable[[Item], Any], n_alternatives: int = 2, *, extract_text: Callable[[Item], str] | None = None, include_group_metadata: bool = True, item_template_id: UUID | None = None) -> list[Item]

Create forced-choice items by grouping source items.

Groups items by a property, then creates all N-way combinations within each group as forced-choice items.

Parameters:

Name Type Description Default
items list[Item]

Source items to group and combine.

required
group_by Callable[[Item], Any]

Function to extract grouping key from items.

required
n_alternatives int

Number of alternatives per forced-choice item (default: 2 for 2AFC).

2
extract_text Callable[[Item], str] | None

Function to extract text from item. If None, tries common keys ("text", "sentence", "content") from rendered_elements.

None
include_group_metadata bool

Whether to include group key in item metadata.

True
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None

Returns:

Type Description
list[Item]

Forced-choice items created from groupings.

Examples:

Create 2AFC items with same verb (same-verb minimal pairs):

>>> items = [
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "She walks."},
...         item_metadata={"verb": "walk", "frame": "intransitive"}
...     ),
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "She walks the dog."},
...         item_metadata={"verb": "walk", "frame": "transitive"}
...     )
... ]
>>> fc_items = create_forced_choice_items_from_groups(
...     items,
...     group_by=lambda item: item.item_metadata["verb"],
...     n_alternatives=2
... )
>>> len(fc_items)
1
>>> fc_items[0].rendered_elements["option_a"]
'She walks.'

Create 3AFC items grouped by template:

>>> fc_items = create_forced_choice_items_from_groups(
...     items,
...     group_by=lambda item: item.item_template_id,
...     n_alternatives=3
... )

create_forced_choice_items_cross_product(group1_items: list[Item], group2_items: list[Item], n_from_group1: int = 1, n_from_group2: int = 1, *, extract_text: Callable[[Item], str] | None = None, item_template_id: UUID | None = None, metadata_fn: Callable[[list[Item], list[Item]], dict[str, MetadataValue]] | None = None) -> list[Item]

Create forced-choice items from cross-product of two groups.

Combines n_from_group1 items from group1 with n_from_group2 items from group2 to create (n_from_group1 + n_from_group2)-AFC items.

Parameters:

Name Type Description Default
group1_items list[Item]

Items in first group.

required
group2_items list[Item]

Items in second group.

required
n_from_group1 int

Number of items to select from group1 per combination (default: 1).

1
n_from_group2 int

Number of items to select from group2 per combination (default: 1).

1
extract_text Callable[[Item], str] | None

Function to extract text from items.

None
item_template_id UUID | None

Template ID for all created items.

None
metadata_fn Callable[[list[Item], list[Item]], dict[str, MetadataValue]] | None

Function to generate metadata from (group1_items_used, group2_items_used).

None

Returns:

Type Description
list[Item]

Forced-choice items from cross-product.

Examples:

Create 2AFC items pairing grammatical with ungrammatical:

>>> grammatical = [
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "She walks."},
...         item_metadata={"grammatical": True}
...     )
... ]
>>> ungrammatical = [
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "She walk."},
...         item_metadata={"grammatical": False}
...     )
... ]
>>> fc_items = create_forced_choice_items_cross_product(
...     grammatical,
...     ungrammatical,
...     n_from_group1=1,
...     n_from_group2=1
... )
>>> len(fc_items)
1
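Selecting more than one item per group yields larger N; a hedged sketch (assuming grammatical holds at least two items, since combinations are drawn within each group):

>>> # 3AFC: two grammatical alternatives against one ungrammatical foil
>>> fc_items = create_forced_choice_items_cross_product(
...     grammatical,
...     ungrammatical,
...     n_from_group1=2,
...     n_from_group2=1
... )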

create_filtered_forced_choice_items(items: list[Item], group_by: Callable[[Item], Any], n_alternatives: int = 2, *, item_filter: Callable[[Item], bool] | None = None, group_filter: Callable[[Any, list[Item]], bool] | None = None, combination_filter: Callable[[tuple[Item, ...]], bool] | None = None, extract_text: Callable[[Item], str] | None = None, item_template_id: UUID | None = None) -> list[Item]

Create forced-choice items with multi-level filtering.

Parameters:

Name Type Description Default
items list[Item]

Source items.

required
group_by Callable[[Item], Any]

Grouping function.

required
n_alternatives int

Number of alternatives per item.

2
item_filter Callable[[Item], bool] | None

Filter individual items before grouping.

None
group_filter Callable[[Any, list[Item]], bool] | None

Filter groups (receives group_key and group_items).

None
combination_filter Callable[[tuple[Item, ...]], bool] | None

Filter specific combinations.

None
extract_text Callable[[Item], str] | None

Text extraction function.

None
item_template_id UUID | None

Template ID for created items.

None

Returns:

Type Description
list[Item]

Filtered forced-choice items.

Examples:

>>> fc_items = create_filtered_forced_choice_items(
...     items,
...     group_by=lambda i: i.item_metadata["verb"],
...     n_alternatives=2,
...     item_filter=lambda i: i.item_metadata.get("valid", True),
...     group_filter=lambda key, items: len(items) >= 2,
...     combination_filter=lambda combo: combo[0].id != combo[1].id
... )

ordinal_scale

Utilities for creating ordinal scale experimental items.

This module provides language-agnostic utilities for creating ordinal scale items where participants rate a single stimulus on an ordered discrete scale (e.g., 1-7 Likert scale, acceptability ratings).

Integration Points
  • Active Learning: bead/active_learning/models/ordinal_scale.py
  • Simulation: bead/simulation/strategies/ordinal_scale.py
  • Deployment: bead/deployment/jspsych/ (slider or radio buttons)

create_ordinal_scale_item(text: str, scale_bounds: tuple[int, int] = (1, 7), prompt: str | None = None, scale_labels: dict[int, str] | None = None, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create an ordinal scale rating item.

Parameters:

Name Type Description Default
text str

The stimulus text to rate.

required
scale_bounds tuple[int, int]

Tuple of (min, max) for the scale. Both must be integers with min < max. Default: (1, 7) for a 7-point scale.

(1, 7)
prompt str | None

Optional question/prompt for the rating. If None, uses "Rate this item:".

None
scale_labels dict[int, str] | None

Optional labels for specific scale values (e.g., {1: "Bad", 7: "Good"}). All keys must be within [scale_min, scale_max].

None
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Ordinal scale item with text and prompt in rendered_elements.

Raises:

Type Description
ValueError

If text is empty, if scale_bounds are invalid, or if scale_labels contain values outside scale bounds.

Examples:

>>> item = create_ordinal_scale_item(
...     text="The cat sat on the mat.",
...     scale_bounds=(1, 7),
...     prompt="How natural is this sentence?",
...     metadata={"task": "acceptability"}
... )
>>> item.rendered_elements["text"]
'The cat sat on the mat.'
>>> item.item_metadata["scale_min"]
1
>>> item.item_metadata["scale_max"]
7
>>> # 5-point Likert with labels
>>> item = create_ordinal_scale_item(
...     text="I enjoy linguistics.",
...     scale_bounds=(1, 5),
...     scale_labels={1: "Strongly Disagree", 5: "Strongly Agree"}
... )
>>> item.item_metadata["scale_labels"][1]
'Strongly Disagree'

create_ordinal_scale_items_from_texts(texts: list[str], scale_bounds: tuple[int, int] = (1, 7), prompt: str | None = None, scale_labels: dict[int, str] | None = None, *, item_template_id: UUID | None = None, metadata_fn: Callable[[str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create ordinal scale items from a list of texts.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
scale_bounds tuple[int, int]

Scale bounds (min, max) for all items.

(1, 7)
prompt str | None

The question/prompt for all items.

None
scale_labels dict[int, str] | None

Optional scale labels for all items.

None
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None
metadata_fn Callable[[str], dict[str, MetadataValue]] | None

Function to generate metadata from each text.

None

Returns:

Type Description
list[Item]

Ordinal scale items for each text.

Examples:

>>> texts = ["She walks.", "She walk.", "They walk."]
>>> items = create_ordinal_scale_items_from_texts(
...     texts,
...     scale_bounds=(1, 5),
...     prompt="How acceptable is this sentence?",
...     metadata_fn=lambda t: {"text_length": len(t)}
... )
>>> len(items)
3
>>> items[0].item_metadata["scale_min"]
1

create_ordinal_scale_items_from_groups(items: list[Item], group_by: Callable[[Item], Hashable], scale_bounds: tuple[int, int] = (1, 7), prompt: str | None = None, scale_labels: dict[int, str] | None = None, *, extract_text: Callable[[Item], str] | None = None, include_group_metadata: bool = True, item_template_id: UUID | None = None) -> list[Item]

Create ordinal scale items from grouped source items.

Groups items and creates one ordinal scale item per source item, preserving group information in metadata.

Parameters:

Name Type Description Default
items list[Item]

Source items to process.

required
group_by Callable[[Item], Hashable]

Function to extract grouping key from items.

required
scale_bounds tuple[int, int]

Scale bounds (min, max) for all items.

(1, 7)
prompt str | None

The question/prompt for all items.

None
scale_labels dict[int, str] | None

Optional scale labels for all items.

None
extract_text Callable[[Item], str] | None

Function to extract text from item. If None, tries common keys.

None
include_group_metadata bool

Whether to include group key in item metadata.

True
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None

Returns:

Type Description
list[Item]

Ordinal scale items from source items.

Examples:

>>> source_items = [
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "She walks."},
...         item_metadata={"verb": "walk"}
...     )
... ]
>>> ordinal_items = create_ordinal_scale_items_from_groups(
...     source_items,
...     group_by=lambda i: i.item_metadata["verb"],
...     scale_bounds=(1, 7),
...     prompt="Rate the acceptability:"
... )
>>> len(ordinal_items)
1

create_ordinal_scale_items_cross_product(texts: list[str], prompts: list[str], scale_bounds: tuple[int, int] = (1, 7), scale_labels: dict[int, str] | None = None, *, item_template_id: UUID | None = None, metadata_fn: Callable[[str, str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create ordinal scale items from cross-product of texts and prompts.

Useful when you want to apply multiple prompts to each text.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
prompts list[str]

List of prompts to apply.

required
scale_bounds tuple[int, int]

Scale bounds (min, max) for all items.

(1, 7)
scale_labels dict[int, str] | None

Optional scale labels for all items.

None
item_template_id UUID | None

Template ID for all created items.

None
metadata_fn Callable[[str, str], dict[str, MetadataValue]] | None

Function to generate metadata from (text, prompt).

None

Returns:

Type Description
list[Item]

Ordinal scale items from cross-product.

Examples:

>>> texts = ["The cat sat.", "The dog ran."]
>>> prompts = ["How natural is this?", "How acceptable is this?"]
>>> items = create_ordinal_scale_items_cross_product(
...     texts, prompts, scale_bounds=(1, 5)
... )
>>> len(items)
4

create_filtered_ordinal_scale_items(items: list[Item], scale_bounds: tuple[int, int] = (1, 7), prompt: str | None = None, scale_labels: dict[int, str] | None = None, *, item_filter: Callable[[Item], bool] | None = None, extract_text: Callable[[Item], str] | None = None, item_template_id: UUID | None = None) -> list[Item]

Create ordinal scale items with filtering.

Parameters:

Name Type Description Default
items list[Item]

Source items.

required
scale_bounds tuple[int, int]

Scale bounds (min, max) for all items.

(1, 7)
prompt str | None

The question/prompt for all items.

None
scale_labels dict[int, str] | None

Optional scale labels for all items.

None
item_filter Callable[[Item], bool] | None

Filter individual items.

None
extract_text Callable[[Item], str] | None

Text extraction function.

None
item_template_id UUID | None

Template ID for created items.

None

Returns:

Type Description
list[Item]

Filtered ordinal scale items.

Examples:

>>> ordinal_items = create_filtered_ordinal_scale_items(
...     items,
...     scale_bounds=(1, 7),
...     prompt="Rate the acceptability:",
...     item_filter=lambda i: i.item_metadata.get("valid", True)
... )

create_likert_5_item(text: str, prompt: str | None = None, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a 5-point Likert scale item.

Convenience function for standard 5-point Likert scale with "Strongly Disagree" to "Strongly Agree" labels.

Parameters:

Name Type Description Default
text str

The stimulus text (statement) to rate.

required
prompt str | None

Optional prompt. If None, uses "Rate your agreement:".

None
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

5-point Likert scale item.

Examples:

>>> item = create_likert_5_item("I enjoy studying linguistics.")
>>> item.item_metadata["scale_min"]
1
>>> item.item_metadata["scale_max"]
5

create_likert_7_item(text: str, prompt: str | None = None, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a 7-point Likert scale item.

Convenience function for standard 7-point Likert scale with "Strongly Disagree" to "Strongly Agree" labels.

Parameters:

Name Type Description Default
text str

The stimulus text (statement) to rate.

required
prompt str | None

Optional prompt. If None, uses "Rate your agreement:".

None
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

7-point Likert scale item.

Examples:

>>> item = create_likert_7_item("I enjoy studying linguistics.")
>>> item.item_metadata["scale_min"]
1
>>> item.item_metadata["scale_max"]
7

binary

Utilities for creating binary experimental items.

This module provides language-agnostic utilities for creating binary items where participants make yes/no or true/false judgments about a single stimulus.

IMPORTANT: Binary tasks are semantically distinct from 2AFC tasks:
  • Binary: Absolute judgment about a single stimulus ("Is this grammatical?")
  • 2AFC: Relative choice between two stimuli ("Which is more natural?")

Integration Points
  • Active Learning: bead/active_learning/models/binary.py
  • Simulation: bead/simulation/strategies/binary.py
  • Deployment: bead/deployment/jspsych/ (binary button plugin)
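To make the contrast concrete, a side-by-side sketch using the two constructors documented on this page (the import path is assumed from this page's module layout):

>>> from bead.items.forced_choice import create_forced_choice_item
>>> # Binary: absolute judgment about one stimulus
>>> binary = create_binary_item(
...     "She walk.",
...     prompt="Is this sentence grammatical?"
... )
>>> # 2AFC: relative choice between two stimuli
>>> afc = create_forced_choice_item(
...     "She walks.",
...     "She walk."
... )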

create_binary_item(text: str, prompt: str = 'Yes/No?', binary_options: tuple[str, str] = ('yes', 'no'), item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a binary judgment item for a single stimulus.

Parameters:

Name Type Description Default
text str

The stimulus text to judge.

required
prompt str

The question/prompt for the judgment (default: "Yes/No?").

'Yes/No?'
binary_options tuple[str, str]

The two response options (default: ("yes", "no")). Can also be ("true", "false"), ("acceptable", "unacceptable"), etc.

('yes', 'no')
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Binary item with text and prompt in rendered_elements.

Raises:

Type Description
ValueError

If text is empty or if binary_options doesn't have exactly 2 values.

Examples:

>>> item = create_binary_item(
...     "The cat sat on the mat.",
...     prompt="Is this sentence grammatical?",
...     metadata={"judgment": "grammaticality"}
... )
>>> item.rendered_elements["text"]
'The cat sat on the mat.'
>>> item.rendered_elements["prompt"]
'Is this sentence grammatical?'
>>> item.item_metadata["binary_options"]
['yes', 'no']
>>> # Truth value judgment
>>> item = create_binary_item(
...     "The sky is blue.",
...     prompt="Is this statement true?",
...     binary_options=("true", "false")
... )
>>> item.item_metadata["binary_options"]
['true', 'false']

create_binary_items_from_texts(texts: list[str], prompt: str, binary_options: tuple[str, str] = ('yes', 'no'), *, item_template_id: UUID | None = None, metadata_fn: Callable[[str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create binary items from a list of texts with the same prompt.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
prompt str

The question/prompt for all items.

required
binary_options tuple[str, str]

The two response options (default: ("yes", "no")).

('yes', 'no')
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None
metadata_fn Callable[[str], dict[str, MetadataValue]] | None

Function to generate metadata from each text.

None

Returns:

Type Description
list[Item]

Binary items for each text.

Examples:

>>> texts = [
...     "She walks.",
...     "She walk.",
...     "They walk.",
...     "They walks."
... ]
>>> items = create_binary_items_from_texts(
...     texts,
...     prompt="Is this sentence grammatical?",
...     binary_options=("yes", "no")
... )
>>> len(items)
4
>>> items[0].rendered_elements["text"]
'She walks.'

create_binary_items_with_context(contexts: list[str], targets: list[str], prompt: str, binary_options: tuple[str, str] = ('yes', 'no'), *, context_label: str = 'Context', target_label: str = 'Statement', item_template_id: UUID | None = None, metadata_fn: Callable[[str, str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create binary items with context + target structure.

Useful for judgments like "Given context X, is statement Y true?".

Parameters:

Name Type Description Default
contexts list[str]

Context texts (same length as targets).

required
targets list[str]

Target texts to judge given context.

required
prompt str

The question/prompt for the judgment.

required
binary_options tuple[str, str]

The two response options (default: ("yes", "no")).

('yes', 'no')
context_label str

Label for context in rendered text (default: "Context").

'Context'
target_label str

Label for target in rendered text (default: "Statement").

'Statement'
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None
metadata_fn Callable[[str, str], dict[str, MetadataValue]] | None

Function to generate metadata from (context, target).

None

Returns:

Type Description
list[Item]

Binary items with context + target structure.

Raises:

Type Description
ValueError

If contexts and targets have different lengths.

Examples:

>>> contexts = ["The dog barked loudly."]
>>> targets = ["The dog made a sound."]
>>> items = create_binary_items_with_context(
...     contexts,
...     targets,
...     prompt="Is the statement true given the context?",
...     binary_options=("true", "false")
... )
>>> len(items)
1
>>> "Context:" in items[0].rendered_elements["text"]
True

create_binary_items_from_groups(items: list[Item], group_by: Callable[[Item], Hashable], prompt: str, binary_options: tuple[str, str] = ('yes', 'no'), *, extract_text: Callable[[Item], str] | None = None, include_group_metadata: bool = True, item_template_id: UUID | None = None) -> list[Item]

Create binary items from grouped source items.

Groups items and creates one binary item per source item, preserving group information in metadata.

Parameters:

Name Type Description Default
items list[Item]

Source items to process.

required
group_by Callable[[Item], Hashable]

Function to extract grouping key from items.

required
prompt str

The question/prompt for all items.

required
binary_options tuple[str, str]

The two response options (default: ("yes", "no")).

('yes', 'no')
extract_text Callable[[Item], str] | None

Function to extract text from item. If None, tries common keys.

None
include_group_metadata bool

Whether to include group key in item metadata.

True
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None

Returns:

Type Description
list[Item]

Binary items from source items.

Examples:

>>> source_items = [
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "She walks."},
...         item_metadata={"verb": "walk"}
...     ),
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "She runs."},
...         item_metadata={"verb": "run"}
...     )
... ]
>>> binary_items = create_binary_items_from_groups(
...     source_items,
...     group_by=lambda i: i.item_metadata["verb"],
...     prompt="Is this sentence grammatical?"
... )
>>> len(binary_items)
2

create_binary_items_cross_product(texts: list[str], prompts: list[str], binary_options: tuple[str, str] = ('yes', 'no'), *, item_template_id: UUID | None = None, metadata_fn: Callable[[str, str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create binary items from cross-product of texts and prompts.

Useful when you want to apply multiple prompts to each text.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
prompts list[str]

List of prompts to apply.

required
binary_options tuple[str, str]

The two response options (default: ("yes", "no")).

('yes', 'no')
item_template_id UUID | None

Template ID for all created items.

None
metadata_fn Callable[[str, str], dict[str, MetadataValue]] | None

Function to generate metadata from (text, prompt).

None

Returns:

Type Description
list[Item]

Binary items from cross-product.

Examples:

>>> texts = ["The cat sat.", "The dog ran."]
>>> prompts = ["Is this grammatical?", "Is this natural?"]
>>> items = create_binary_items_cross_product(texts, prompts)
>>> len(items)
4

create_filtered_binary_items(items: list[Item], prompt: str, binary_options: tuple[str, str] = ('yes', 'no'), *, item_filter: Callable[[Item], bool] | None = None, extract_text: Callable[[Item], str] | None = None, item_template_id: UUID | None = None) -> list[Item]

Create binary items with filtering.

Parameters:

Name Type Description Default
items list[Item]

Source items.

required
prompt str

The question/prompt for all items.

required
binary_options tuple[str, str]

The two response options (default: ("yes", "no")).

('yes', 'no')
item_filter Callable[[Item], bool] | None

Filter individual items.

None
extract_text Callable[[Item], str] | None

Text extraction function.

None
item_template_id UUID | None

Template ID for created items.

None

Returns:

Type Description
list[Item]

Filtered binary items.

Examples:

>>> binary_items = create_filtered_binary_items(
...     items,
...     prompt="Is this grammatical?",
...     item_filter=lambda i: i.item_metadata.get("valid", True)
... )

categorical

Utilities for creating categorical experimental items.

This module provides language-agnostic utilities for creating categorical items where participants select from N unordered categories (e.g., NLI labels, POS tags, semantic relations).

Integration Points
  • Active Learning: bead/active_learning/models/categorical.py
  • Simulation: bead/simulation/strategies/categorical.py
  • Deployment: bead/deployment/jspsych/ (dropdown or radio buttons)

create_categorical_item(text: str, categories: list[str], prompt: str | None = None, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a categorical classification item.

Parameters:

Name Type Description Default
text str

The stimulus text to classify.

required
categories list[str]

List of category labels (unordered). Must have at least 2 categories.

required
prompt str | None

Optional question/prompt for the classification. If None, uses "Select a category:".

None
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Categorical item with text and prompt in rendered_elements.

Raises:

Type Description
ValueError

If text is empty or if fewer than 2 categories provided.

Examples:

>>> item = create_categorical_item(
...     text="Premise: All dogs bark. Hypothesis: Some dogs bark.",
...     categories=["entailment", "neutral", "contradiction"],
...     prompt="What is the relationship?",
...     metadata={"task": "nli"}
... )
>>> item.rendered_elements["text"]
'Premise: All dogs bark. Hypothesis: Some dogs bark.'
>>> item.rendered_elements["prompt"]
'What is the relationship?'
>>> item.item_metadata["categories"]
['entailment', 'neutral', 'contradiction']
>>> # POS tagging
>>> item = create_categorical_item(
...     text="The cat sat on the mat.",
...     categories=["noun", "verb", "adjective", "determiner", "preposition"],
...     prompt="What is the part of speech of 'cat'?"
... )
>>> len(item.item_metadata["categories"])
5

create_nli_item(premise: str, hypothesis: str, categories: list[str] | None = None, prompt: str | None = None, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a Natural Language Inference (NLI) item.

Specialized helper for NLI tasks with automatic formatting and default categories.

Parameters:

Name Type Description Default
premise str

The premise text.

required
hypothesis str

The hypothesis text.

required
categories list[str] | None

Category labels. If None, uses ["entailment", "neutral", "contradiction"].

None
prompt str | None

Question/prompt. If None, uses "What is the relationship?".

None
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

NLI categorical item.

Examples:

>>> item = create_nli_item(
...     premise="All dogs bark.",
...     hypothesis="Some dogs bark."
... )
>>> "Premise:" in item.rendered_elements["text"]
True
>>> "Hypothesis:" in item.rendered_elements["text"]
True
>>> item.item_metadata["categories"]
['entailment', 'neutral', 'contradiction']
>>> item.item_metadata["premise"]
'All dogs bark.'
>>> # Custom categories
>>> item = create_nli_item(
...     premise="The cat is on the mat.",
...     hypothesis="There is an animal on the mat.",
...     categories=["entails", "contradicts", "neither"]
... )
>>> item.item_metadata["categories"]
['entails', 'contradicts', 'neither']

create_categorical_items_from_texts(texts: list[str], categories: list[str], prompt: str | None = None, *, item_template_id: UUID | None = None, metadata_fn: Callable[[str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create categorical items from a list of texts with the same categories.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
categories list[str]

Category labels for all items.

required
prompt str | None

The question/prompt for all items.

None
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None
metadata_fn Callable[[str], dict[str, MetadataValue]] | None

Function to generate metadata from each text.

None

Returns:

Type Description
list[Item]

Categorical items for each text.

Examples:

>>> texts = ["The cat sat.", "The dog ran.", "The bird flew."]
>>> categories = ["past", "present", "future"]
>>> items = create_categorical_items_from_texts(
...     texts,
...     categories=categories,
...     prompt="What is the tense?"
... )
>>> len(items)
3
>>> items[0].item_metadata["categories"]
['past', 'present', 'future']

create_categorical_items_from_pairs(pairs: list[tuple[str, str]], categories: list[str], prompt: str | None = None, *, pair_label1: str = 'Text 1', pair_label2: str = 'Text 2', item_template_id: UUID | None = None, metadata_fn: Callable[[str, str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create categorical items from pairs of texts.

Useful for NLI, paraphrase detection, semantic similarity, etc.

Parameters:

Name Type Description Default
pairs list[tuple[str, str]]

List of (text1, text2) pairs.

required
categories list[str]

Category labels for all items.

required
prompt str | None

The question/prompt for all items.

None
pair_label1 str

Label for first text in pair (default: "Text 1").

'Text 1'
pair_label2 str

Label for second text in pair (default: "Text 2").

'Text 2'
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None
metadata_fn Callable[[str, str], dict[str, MetadataValue]] | None

Function to generate metadata from (text1, text2).

None

Returns:

Type Description
list[Item]

Categorical items from pairs.

Examples:

>>> pairs = [
...     ("All dogs bark.", "Some dogs bark."),
...     ("The sky is blue.", "The sky is not blue.")
... ]
>>> items = create_categorical_items_from_pairs(
...     pairs,
...     categories=["entailment", "neutral", "contradiction"],
...     prompt="What is the relationship?",
...     pair_label1="Premise",
...     pair_label2="Hypothesis"
... )
>>> len(items)
2
>>> "Premise:" in items[0].rendered_elements["text"]
True

create_categorical_items_from_groups(items: list[Item], group_by: Callable[[Item], Hashable], categories: list[str], prompt: str | None = None, *, extract_text: Callable[[Item], str] | None = None, include_group_metadata: bool = True, item_template_id: UUID | None = None) -> list[Item]

Create categorical items from grouped source items.

Groups items and creates one categorical item per source item, preserving group information in metadata.

Parameters:

Name Type Description Default
items list[Item]

Source items to process.

required
group_by Callable[[Item], Hashable]

Function to extract grouping key from items.

required
categories list[str]

Category labels for all items.

required
prompt str | None

The question/prompt for all items.

None
extract_text Callable[[Item], str] | None

Function to extract text from item. If None, tries common keys.

None
include_group_metadata bool

Whether to include group key in item metadata.

True
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None

Returns:

Type Description
list[Item]

Categorical items from source items.

Examples:

>>> source_items = [
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "The cat sat."},
...         item_metadata={"tense": "past"}
...     ),
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "The dog runs."},
...         item_metadata={"tense": "present"}
...     )
... ]
>>> categorical_items = create_categorical_items_from_groups(
...     source_items,
...     group_by=lambda i: i.item_metadata["tense"],
...     categories=["past", "present", "future"],
...     prompt="What is the tense?"
... )
>>> len(categorical_items)
2

create_categorical_items_cross_product(texts: list[str], prompts: list[str], categories: list[str], *, item_template_id: UUID | None = None, metadata_fn: Callable[[str, str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create categorical items from cross-product of texts and prompts.

Useful when you want to apply multiple prompts to each text.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
prompts list[str]

List of prompts to apply.

required
categories list[str]

Category labels for all items.

required
item_template_id UUID | None

Template ID for all created items.

None
metadata_fn Callable[[str, str], dict[str, MetadataValue]] | None

Function to generate metadata from (text, prompt).

None

Returns:

Type Description
list[Item]

Categorical items from cross-product.

Examples:

>>> texts = ["The cat sat.", "The dog ran."]
>>> prompts = ["What is the tense?", "What is the aspect?"]
>>> categories = ["past", "present", "future"]
>>> items = create_categorical_items_cross_product(
...     texts, prompts, categories
... )
>>> len(items)
4

create_filtered_categorical_items(items: list[Item], categories: list[str], prompt: str | None = None, *, item_filter: Callable[[Item], bool] | None = None, extract_text: Callable[[Item], str] | None = None, item_template_id: UUID | None = None) -> list[Item]

Create categorical items with filtering.

Parameters:

Name Type Description Default
items list[Item]

Source items.

required
categories list[str]

Category labels for all items.

required
prompt str | None

The question/prompt for all items.

None
item_filter Callable[[Item], bool] | None

Filter individual items.

None
extract_text Callable[[Item], str] | None

Text extraction function.

None
item_template_id UUID | None

Template ID for created items.

None

Returns:

Type Description
list[Item]

Filtered categorical items.

Examples:

>>> categorical_items = create_filtered_categorical_items(
...     items,
...     categories=["past", "present", "future"],
...     prompt="What is the tense?",
...     item_filter=lambda i: i.item_metadata.get("valid", True)
... )

multi_select

Utilities for creating multi-select experimental items.

This module provides language-agnostic utilities for creating multi-select items where participants select one or more options from a set (checkboxes).

Integration Points
  • Active Learning: bead/active_learning/models/multi_select.py
  • Simulation: bead/simulation/strategies/multi_select.py
  • Deployment: bead/deployment/jspsych/ (checkbox plugin)

create_multi_select_item(*options: str, min_selections: int = 1, max_selections: int | None = None, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a multi-select item from N text options.

Parameters:

Name Type Description Default
*options str

Text for each option (2 or more required).

()
min_selections int

Minimum number of options that must be selected (default: 1).

1
max_selections int | None

Maximum number of options that can be selected. If None, defaults to the number of options (i.e., no effective upper limit).

None
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Multi-select item with options stored in the options field.

Raises:

Type Description
ValueError

If fewer than 2 options are provided, if min_selections < 1, if min_selections > max_selections, or if max_selections exceeds the number of options.

Examples:

>>> item = create_multi_select_item(
...     "She walks.",
...     "She walk.",
...     "They walks.",
...     "They walk.",
...     min_selections=1,
...     max_selections=4,
...     metadata={"task": "select_grammatical"}
... )
>>> item.options[0]
'She walks.'
>>> item.item_metadata["min_selections"]
1
>>> item.item_metadata["max_selections"]
4
>>> # Multi-select with default max (all options)
>>> item = create_multi_select_item(
...     "Option A",
...     "Option B",
...     "Option C"
... )
>>> item.item_metadata["max_selections"]
3

create_multi_select_items_from_groups(items: list[Item], group_by: Callable[[Item], Any], n_options: int | None = None, min_selections: int = 1, max_selections: int | None = None, *, extract_text: Callable[[Item], str] | None = None, include_group_metadata: bool = True, item_template_id: UUID | None = None) -> list[Item]

Create multi-select items by grouping source items.

Groups items by a property, then creates multi-select items using each group's items as the options.

Parameters:

Name Type Description Default
items list[Item]

Source items to group and combine.

required
group_by Callable[[Item], Any]

Function to extract grouping key from items.

required
n_options int | None

Number of options per multi-select item. If None, uses all items in each group.

None
min_selections int

Minimum number of selections required (default: 1).

1
max_selections int | None

Maximum number of selections allowed. If None, defaults to n_options.

None
extract_text Callable[[Item], str] | None

Function to extract text from item. If None, tries common keys ("text", "sentence", "content") from rendered_elements.

None
include_group_metadata bool

Whether to include group key in item metadata.

True
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None

Returns:

Type Description
list[Item]

Multi-select items created from groupings.

Examples:

Create multi-select items grouped by verb (select all acceptable frames):

>>> items = [
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "She walks."},
...         item_metadata={"verb": "walk", "frame": "intransitive"}
...     ),
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "She walks the dog."},
...         item_metadata={"verb": "walk", "frame": "transitive"}
...     ),
...     Item(
...         item_template_id=uuid4(),
...         rendered_elements={"text": "She walks to school."},
...         item_metadata={"verb": "walk", "frame": "intransitive_pp"}
...     )
... ]
>>> ms_items = create_multi_select_items_from_groups(
...     items,
...     group_by=lambda item: item.item_metadata["verb"],
...     min_selections=1,
...     max_selections=3
... )
>>> len(ms_items)
1
>>> len(ms_items[0].options)
3

create_multi_select_items_with_foils(correct_items: list[Item], foil_items: list[Item], n_correct: int = 2, n_foils: int = 2, *, extract_text: Callable[[Item], str] | None = None, item_template_id: UUID | None = None, metadata_fn: Callable[[list[Item], list[Item]], dict[str, MetadataValue]] | None = None) -> list[Item]

Create multi-select items by combining correct items with foils.

Useful for tasks like "Select all grammatical sentences" where some options are correct and others are foils (distractors).

Parameters:

Name Type Description Default
correct_items list[Item]

Items that are correct (should be selected).

required
foil_items list[Item]

Items that are foils/distractors (should not be selected).

required
n_correct int

Number of correct items to include per multi-select item (default: 2).

2
n_foils int

Number of foil items to include per multi-select item (default: 2).

2
extract_text Callable[[Item], str] | None

Function to extract text from items.

None
item_template_id UUID | None

Template ID for all created items.

None
metadata_fn Callable[[list[Item], list[Item]], dict[str, MetadataValue]] | None

Function to generate metadata from (correct_items_used, foil_items_used).

None

Returns:

Type Description
list[Item]

Multi-select items with correct items and foils.

Examples:

>>> grammatical = [
...     Item(uuid4(), rendered_elements={"text": "She walks."},
...          item_metadata={"grammatical": True}),
...     Item(uuid4(), rendered_elements={"text": "They walk."},
...          item_metadata={"grammatical": True})
... ]
>>> ungrammatical = [
...     Item(uuid4(), rendered_elements={"text": "She walk."},
...          item_metadata={"grammatical": False}),
...     Item(uuid4(), rendered_elements={"text": "They walks."},
...          item_metadata={"grammatical": False})
... ]
>>> ms_items = create_multi_select_items_with_foils(
...     grammatical,
...     ungrammatical,
...     n_correct=2,
...     n_foils=2
... )
>>> len(ms_items)
1
>>> ms_items[0].item_metadata["min_selections"]
1
>>> ms_items[0].item_metadata["max_selections"]
4

create_multi_select_items_cross_product(group1_items: list[Item], group2_items: list[Item], n_from_group1: int = 1, n_from_group2: int = 1, min_selections: int = 1, max_selections: int | None = None, *, extract_text: Callable[[Item], str] | None = None, item_template_id: UUID | None = None, metadata_fn: Callable[[list[Item], list[Item]], dict[str, MetadataValue]] | None = None) -> list[Item]

Create multi-select items from cross-product of two groups.

Combines n_from_group1 items from group1 with n_from_group2 items from group2 to create multi-select items with (n_from_group1 + n_from_group2) options.

Parameters:

Name Type Description Default
group1_items list[Item]

Items in first group.

required
group2_items list[Item]

Items in second group.

required
n_from_group1 int

Number of items to select from group1 per combination (default: 1).

1
n_from_group2 int

Number of items to select from group2 per combination (default: 1).

1
min_selections int

Minimum number of selections required (default: 1).

1
max_selections int | None

Maximum number of selections allowed. If None, defaults to total options.

None
extract_text Callable[[Item], str] | None

Function to extract text from items.

None
item_template_id UUID | None

Template ID for all created items.

None
metadata_fn Callable[[list[Item], list[Item]], dict[str, MetadataValue]] | None

Function to generate metadata from (group1_items_used, group2_items_used).

None

Returns:

Type Description
list[Item]

Multi-select items from cross-product.

Examples:

>>> active = [Item(uuid4(), rendered_elements={"text": "She walks."})]
>>> passive = [Item(uuid4(), rendered_elements={"text": "She is walked."})]
>>> ms_items = create_multi_select_items_cross_product(
...     active, passive,
...     n_from_group1=1,
...     n_from_group2=1,
...     min_selections=1,
...     max_selections=2
... )
>>> len(ms_items)
1

create_filtered_multi_select_items(items: list[Item], group_by: Callable[[Item], Any], n_options: int | None = None, min_selections: int = 1, max_selections: int | None = None, *, item_filter: Callable[[Item], bool] | None = None, group_filter: Callable[[Any, list[Item]], bool] | None = None, combination_filter: Callable[[tuple[Item, ...]], bool] | None = None, extract_text: Callable[[Item], str] | None = None, item_template_id: UUID | None = None) -> list[Item]

Create multi-select items with multi-level filtering.

Parameters:

Name Type Description Default
items list[Item]

Source items.

required
group_by Callable[[Item], Any]

Grouping function.

required
n_options int | None

Number of options per item. If None, uses all items in each group.

None
min_selections int

Minimum number of selections required.

1
max_selections int | None

Maximum number of selections allowed.

None
item_filter Callable[[Item], bool] | None

Filter individual items before grouping.

None
group_filter Callable[[Any, list[Item]], bool] | None

Filter groups (receives group_key and group_items).

None
combination_filter Callable[[tuple[Item, ...]], bool] | None

Filter specific combinations.

None
extract_text Callable[[Item], str] | None

Text extraction function.

None
item_template_id UUID | None

Template ID for created items.

None

Returns:

Type Description
list[Item]

Filtered multi-select items.

Examples:

>>> ms_items = create_filtered_multi_select_items(
...     items,
...     group_by=lambda i: i.item_metadata["verb"],
...     n_options=3,
...     item_filter=lambda i: i.item_metadata.get("valid", True),
...     group_filter=lambda key, items: len(items) >= 3,
...     min_selections=1,
...     max_selections=3
... )

magnitude

Utilities for creating magnitude experimental items.

This module provides language-agnostic utilities for creating magnitude items where participants enter numeric values (bounded or unbounded), such as reading times, confidence ratings, or counts.

Integration Points
  • Active Learning: bead/active_learning/models/magnitude.py
  • Simulation: bead/simulation/strategies/magnitude.py
  • Deployment: bead/deployment/jspsych/ (numeric input)

create_magnitude_item(text: str, unit: str | None = None, bounds: tuple[int | float | None, int | float | None] = (None, None), prompt: str | None = None, step: int | float | None = None, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a magnitude (numeric input) item.

Parameters:

Name Type Description Default
text str

The stimulus text or question.

required
unit str | None

Optional unit for the value (e.g., "ms", "%", "count").

None
bounds tuple[int | float | None, int | float | None]

Tuple of (min, max) bounds. None means unbounded in that direction. Default: (None, None) for fully unbounded.

(None, None)
prompt str | None

Optional prompt for the numeric input. If None, uses "Enter a value:".

None
step int | float | None

Optional step size for input validation (e.g., 1 for integers, 0.01 for hundredths).

None
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Magnitude item with text and prompt in rendered_elements.

Raises:

Type Description
ValueError

If text is empty or if both bounds are provided and min >= max.

Examples:

>>> item = create_magnitude_item(
...     text="How long did it take to read this sentence?",
...     unit="ms",
...     bounds=(0, None),
...     prompt="Enter time in milliseconds:"
... )
>>> item.rendered_elements["text"]
'How long did it take to read this sentence?'
>>> item.item_metadata["unit"]
'ms'
>>> item.item_metadata["min_value"]
0
>>> item.item_metadata["max_value"] is None
True
>>> # Confidence with bounded range
>>> item = create_magnitude_item(
...     text="How confident are you in your answer?",
...     unit="%",
...     bounds=(0, 100),
...     step=1
... )
>>> item.item_metadata["max_value"]
100

create_magnitude_items_from_texts(texts: list[str], unit: str | None = None, bounds: tuple[int | float | None, int | float | None] = (None, None), prompt: str | None = None, step: int | float | None = None, *, item_template_id: UUID | None = None, metadata_fn: Callable[[str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create magnitude items from a list of texts.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
unit str | None

Optional unit for all items.

None
bounds tuple[int | float | None, int | float | None]

Bounds (min, max) for all items.

(None, None)
prompt str | None

The question/prompt for all items.

None
step int | float | None

Step size for all items.

None
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None
metadata_fn Callable[[str], dict[str, MetadataValue]] | None

Function to generate metadata from each text.

None

Returns:

Type Description
list[Item]

Magnitude items for each text.

Examples:

>>> texts = ["Sentence 1", "Sentence 2", "Sentence 3"]
>>> items = create_magnitude_items_from_texts(
...     texts,
...     unit="ms",
...     bounds=(0, None),
...     prompt="Reading time?",
...     metadata_fn=lambda t: {"text_length": len(t)}
... )
>>> len(items)
3
>>> items[0].item_metadata["unit"]
'ms'

create_magnitude_items_from_groups(items: list[Item], group_by: Callable[[Item], Any], unit: str | None = None, bounds: tuple[int | float | None, int | float | None] = (None, None), prompt: str | None = None, step: int | float | None = None, *, extract_text: Callable[[Item], str] | None = None, include_group_metadata: bool = True, item_template_id: UUID | None = None) -> list[Item]

Create magnitude items from grouped source items.

Groups items and creates one magnitude item per source item, preserving group information in metadata.

Parameters:

Name Type Description Default
items list[Item]

Source items to process.

required
group_by Callable[[Item], Any]

Function to extract grouping key from items.

required
unit str | None

Optional unit for all items.

None
bounds tuple[int | float | None, int | float | None]

Bounds (min, max) for all items.

(None, None)
prompt str | None

The question/prompt for all items.

None
step int | float | None

Step size for all items.

None
extract_text Callable[[Item], str] | None

Function to extract text from item. If None, tries common keys.

None
include_group_metadata bool

Whether to include group key in item metadata.

True
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None

Returns:

Type Description
list[Item]

Magnitude items from source items.

Examples:

>>> source_items = [
...     Item(
...         uuid4(),
...         rendered_elements={"text": "The cat sat."},
...         item_metadata={"category": "simple"}
...     )
... ]
>>> magnitude_items = create_magnitude_items_from_groups(
...     source_items,
...     group_by=lambda i: i.item_metadata["category"],
...     unit="ms",
...     bounds=(0, None),
...     prompt="Reading time:"
... )
>>> len(magnitude_items)
1

create_magnitude_items_cross_product(texts: list[str], prompts: list[str], unit: str | None = None, bounds: tuple[int | float | None, int | float | None] = (None, None), step: int | float | None = None, *, item_template_id: UUID | None = None, metadata_fn: Callable[[str, str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create magnitude items from cross-product of texts and prompts.

Useful when you want to apply multiple prompts to each text.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
prompts list[str]

List of prompts to apply.

required
unit str | None

Optional unit for all items.

None
bounds tuple[int | float | None, int | float | None]

Bounds (min, max) for all items.

(None, None)
step int | float | None

Step size for all items.

None
item_template_id UUID | None

Template ID for all created items.

None
metadata_fn Callable[[str, str], dict[str, MetadataValue]] | None

Function to generate metadata from (text, prompt).

None

Returns:

Type Description
list[Item]

Magnitude items from cross-product.

Examples:

>>> texts = ["Sentence 1.", "Sentence 2."]
>>> prompts = ["Reading time?", "Processing time?"]
>>> items = create_magnitude_items_cross_product(
...     texts, prompts, unit="ms", bounds=(0, None)
... )
>>> len(items)
4

create_filtered_magnitude_items(items: list[Item], unit: str | None = None, bounds: tuple[int | float | None, int | float | None] = (None, None), prompt: str | None = None, step: int | float | None = None, *, item_filter: Callable[[Item], bool] | None = None, extract_text: Callable[[Item], str] | None = None, item_template_id: UUID | None = None) -> list[Item]

Create magnitude items with filtering.

Parameters:

Name Type Description Default
items list[Item]

Source items.

required
unit str | None

Optional unit for all items.

None
bounds tuple[int | float | None, int | float | None]

Bounds (min, max) for all items.

(None, None)
prompt str | None

The question/prompt for all items.

None
step int | float | None

Step size for all items.

None
item_filter Callable[[Item], bool] | None

Filter individual items.

None
extract_text Callable[[Item], str] | None

Text extraction function.

None
item_template_id UUID | None

Template ID for created items.

None

Returns:

Type Description
list[Item]

Filtered magnitude items.

Examples:

>>> magnitude_items = create_filtered_magnitude_items(
...     items,
...     unit="ms",
...     bounds=(0, None),
...     prompt="Reading time:",
...     item_filter=lambda i: i.item_metadata.get("valid", True)
... )

create_reading_time_item(text: str, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a reading time measurement item.

Convenience function for reading time in milliseconds with a lower bound of 0 (no upper bound).

Parameters:

Name Type Description Default
text str

The text to measure reading time for.

required
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Reading time magnitude item.

Examples:

>>> item = create_reading_time_item("The cat sat on the mat.")
>>> item.item_metadata["unit"]
'ms'
>>> item.item_metadata["min_value"]
0

create_confidence_item(text: str, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a confidence rating item (0-100%).

Convenience function for confidence percentage with bounds (0, 100).

Parameters:

Name Type Description Default
text str

The text or question to rate confidence for.

required
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Confidence magnitude item.

Examples:

>>> item = create_confidence_item("Is this sentence grammatical?")
>>> item.item_metadata["unit"]
'%'
>>> item.item_metadata["max_value"]
100

free_text

Utilities for creating free text experimental items.

This module provides language-agnostic utilities for creating free text items where participants provide open-ended text responses (e.g., paraphrasing, question answering, cloze completion).

Integration Points
  • Active Learning: bead/active_learning/models/free_text.py
  • Simulation: bead/simulation/strategies/free_text.py
  • Deployment: bead/deployment/jspsych/ (text input or textarea)

create_free_text_item(text: str, prompt: str, max_length: int | None = None, validation_pattern: str | None = None, min_length: int | None = None, multiline: bool = False, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a free text (open-ended) item.

Parameters:

Name Type Description Default
text str

The stimulus text or context.

required
prompt str

The question/instruction for what to enter (required).

required
max_length int | None

Maximum character limit. None means unlimited.

None
validation_pattern str | None

Optional regex pattern for validation (validated at deployment).

None
min_length int | None

Minimum characters required. None means no minimum.

None
multiline bool

True for textarea (multiline), False for single-line input (default).

False
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Free text item with text and prompt in rendered_elements.

Raises:

Type Description
ValueError

If text or prompt is empty, or if min_length > max_length.

Examples:

>>> item = create_free_text_item(
...     text="The dog chased the cat.",
...     prompt="Who chased whom?",
...     max_length=100
... )
>>> item.rendered_elements["text"]
'The dog chased the cat.'
>>> item.rendered_elements["prompt"]
'Who chased whom?'
>>> item.item_metadata["max_length"]
100
>>> # Multiline paraphrase task
>>> item = create_free_text_item(
...     text="The quick brown fox jumps over the lazy dog.",
...     prompt="Rewrite this sentence in your own words:",
...     multiline=True,
...     max_length=200
... )
>>> item.item_metadata["multiline"]
True

create_free_text_items_from_texts(texts: list[str], prompt: str, max_length: int | None = None, validation_pattern: str | None = None, min_length: int | None = None, multiline: bool = False, *, item_template_id: UUID | None = None, metadata_fn: Callable[[str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create free text items from a list of texts with the same prompt.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
prompt str

The question/instruction for all items (required).

required
max_length int | None

Maximum character limit for all items.

None
validation_pattern str | None

Optional regex pattern for validation.

None
min_length int | None

Minimum characters required.

None
multiline bool

True for textarea, False for single-line input.

False
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None
metadata_fn Callable[[str], dict[str, MetadataValue]] | None

Function to generate metadata from each text.

None

Returns:

Type Description
list[Item]

Free text items for each text.

Examples:

>>> texts = ["Sentence 1", "Sentence 2", "Sentence 3"]
>>> items = create_free_text_items_from_texts(
...     texts,
...     prompt="Paraphrase this:",
...     multiline=True,
...     max_length=200,
...     metadata_fn=lambda t: {"original_length": len(t)}
... )
>>> len(items)
3
>>> items[0].item_metadata["original_length"]
10

create_free_text_items_with_context(contexts: list[str], prompts: list[str], max_length: int | None = None, validation_pattern: str | None = None, min_length: int | None = None, multiline: bool = False, *, item_template_id: UUID | None = None, metadata_fn: Callable[[str, str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create free text items with context + prompt pairs.

Useful for reading comprehension or question answering, where each context has a specific question.

Parameters:

Name Type Description Default
contexts list[str]

Context texts (same length as prompts).

required
prompts list[str]

Prompts/questions for each context.

required
max_length int | None

Maximum character limit for all items.

None
validation_pattern str | None

Optional regex pattern for validation.

None
min_length int | None

Minimum characters required.

None
multiline bool

True for textarea, False for single-line input.

False
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None
metadata_fn Callable[[str, str], dict[str, MetadataValue]] | None

Function to generate metadata from (context, prompt).

None

Returns:

Type Description
list[Item]

Free text items with context + prompt structure.

Raises:

Type Description
ValueError

If contexts and prompts have different lengths.

Examples:

>>> contexts = ["The cat sat on the mat."]
>>> prompts = ["What sat on the mat?"]
>>> items = create_free_text_items_with_context(
...     contexts,
...     prompts,
...     max_length=50
... )
>>> len(items)
1
>>> items[0].rendered_elements["text"]
'The cat sat on the mat.'
>>> items[0].rendered_elements["prompt"]
'What sat on the mat?'

create_free_text_items_from_groups(items: list[Item], group_by: Callable[[Item], Any], prompt: str, max_length: int | None = None, validation_pattern: str | None = None, min_length: int | None = None, multiline: bool = False, *, extract_text: Callable[[Item], str] | None = None, include_group_metadata: bool = True, item_template_id: UUID | None = None) -> list[Item]

Create free text items from grouped source items.

Groups items and creates one free text item per source item, preserving group information in metadata.

Parameters:

Name Type Description Default
items list[Item]

Source items to process.

required
group_by Callable[[Item], Any]

Function to extract grouping key from items.

required
prompt str

The question/instruction for all items (required).

required
max_length int | None

Maximum character limit.

None
validation_pattern str | None

Optional regex pattern for validation.

None
min_length int | None

Minimum characters required.

None
multiline bool

True for textarea, False for single-line input.

False
extract_text Callable[[Item], str] | None

Function to extract text from item. If None, tries common keys.

None
include_group_metadata bool

Whether to include group key in item metadata.

True
item_template_id UUID | None

Template ID for all created items. If None, generates one per item.

None

Returns:

Type Description
list[Item]

Free text items from source items.

Examples:

>>> source_items = [
...     Item(
...         uuid4(),
...         rendered_elements={"text": "Sentence 1"},
...         item_metadata={"type": "simple"}
...     )
... ]
>>> free_text_items = create_free_text_items_from_groups(
...     source_items,
...     group_by=lambda i: i.item_metadata["type"],
...     prompt="Paraphrase this:",
...     multiline=True
... )
>>> len(free_text_items)
1

create_free_text_items_cross_product(texts: list[str], prompts: list[str], max_length: int | None = None, validation_pattern: str | None = None, min_length: int | None = None, multiline: bool = False, *, item_template_id: UUID | None = None, metadata_fn: Callable[[str, str], dict[str, MetadataValue]] | None = None) -> list[Item]

Create free text items from cross-product of texts and prompts.

Useful when you want to apply multiple prompts to each text.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
prompts list[str]

List of prompts to apply.

required
max_length int | None

Maximum character limit for all items.

None
validation_pattern str | None

Optional regex pattern for validation.

None
min_length int | None

Minimum characters required.

None
multiline bool

True for textarea, False for single-line input.

False
item_template_id UUID | None

Template ID for all created items.

None
metadata_fn Callable[[str, str], dict[str, MetadataValue]] | None

Function to generate metadata from (text, prompt).

None

Returns:

Type Description
list[Item]

Free text items from cross-product.

Examples:

>>> texts = ["Sentence 1", "Sentence 2"]
>>> prompts = ["Paraphrase this:", "Summarize this:"]
>>> items = create_free_text_items_cross_product(
...     texts, prompts, multiline=True, max_length=200
... )
>>> len(items)
4

create_filtered_free_text_items(items: list[Item], prompt: str, max_length: int | None = None, validation_pattern: str | None = None, min_length: int | None = None, multiline: bool = False, *, item_filter: Callable[[Item], bool] | None = None, extract_text: Callable[[Item], str] | None = None, item_template_id: UUID | None = None) -> list[Item]

Create free text items with filtering.

Parameters:

Name Type Description Default
items list[Item]

Source items.

required
prompt str

The question/instruction for all items (required).

required
max_length int | None

Maximum character limit.

None
validation_pattern str | None

Optional regex pattern for validation.

None
min_length int | None

Minimum characters required.

None
multiline bool

True for textarea, False for single-line input.

False
item_filter Callable[[Item], bool] | None

Filter individual items.

None
extract_text Callable[[Item], str] | None

Text extraction function.

None
item_template_id UUID | None

Template ID for created items.

None

Returns:

Type Description
list[Item]

Filtered free text items.

Examples:

>>> free_text_items = create_filtered_free_text_items(
...     items,
...     prompt="Paraphrase this:",
...     multiline=True,
...     item_filter=lambda i: i.item_metadata.get("valid", True)
... )

create_paraphrase_item(text: str, instruction: str = 'Rewrite in your own words:', item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a paraphrase generation item.

Convenience function for paraphrase tasks with multiline input.

Parameters:

Name Type Description Default
text str

The text to paraphrase.

required
instruction str

The instruction for paraphrasing (default: "Rewrite in your own words:").

'Rewrite in your own words:'
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Paraphrase free text item.

Examples:

>>> item = create_paraphrase_item(
...     "The quick brown fox jumps over the lazy dog."
... )
>>> item.rendered_elements["prompt"]
'Rewrite in your own words:'
>>> item.item_metadata["multiline"]
True

create_wh_question_item(text: str, question_word: str = 'Who', item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a WH-question answering item.

Convenience function for WH-question answering with short text input.

Parameters:

Name Type Description Default
text str

The context/passage for the question.

required
question_word str

The question word to use (default: "Who").

'Who'
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

WH-question free text item.

Examples:

>>> item = create_wh_question_item(
...     "The dog chased the cat.",
...     question_word="What"
... )
>>> "What" in item.rendered_elements["prompt"]
True
>>> item.item_metadata["max_length"]
100

cloze

Utilities for creating cloze experimental items.

This module provides language-agnostic utilities for creating cloze items where participants fill in missing words/phrases in partially-filled templates.

SPECIAL: This is the ONLY task type that uses the Item.unfilled_slots field.

Cloze items are unique in that:
  • They use partially-filled templates with specific slots left blank
  • UI widgets are inferred from slot constraints at deployment time:
      - Extensional constraint (finite set) → dropdown
      - Intensional constraint (rules) → text input with validation
      - No constraint → free text input
  • Multiple slots can be unfilled in a single item

Integration Points
  • Active Learning: bead/active_learning/models/cloze.py
  • Simulation: bead/simulation/strategies/cloze.py
  • Deployment: bead/deployment/jspsych/ (dynamic widget generation)
  • Resources: bead/resources/template.py (Template and Slot models)

create_cloze_item(template: Any, unfilled_slot_names: list[str], filled_slots: dict[str, str] | None = None, instructions: str | None = None, *, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a cloze item from a template with specific slots unfilled.

Parameters:

Name Type Description Default
template Template

Source template with slots.

required
unfilled_slot_names list[str]

Names of slots to leave unfilled (must exist in template.slots).

required
filled_slots dict[str, str] | None

Pre-filled slots (keys must be valid slot names, disjoint from unfilled).

None
instructions str | None

Optional instructions for filling (e.g., "Fill in the verb").

None
item_template_id UUID | None

Template ID for the item. If None, generates new UUID.

None
metadata dict[str, MetadataValue] | None

Additional metadata for item_metadata field.

None

Returns:

Type Description
Item

Cloze item with unfilled_slots populated.

Raises:

Type Description
ValueError

If any name in unfilled_slot_names is missing from the template, if any key in filled_slots is missing from the template, if the unfilled and filled slot sets overlap, if no slots are left unfilled, or if validation fails.

Examples:

>>> from bead.resources.template import Template, Slot
>>> template = Template(
...     name="simple",
...     template_string="{det} {noun} {verb}.",
...     slots={
...         "det": Slot(name="det"),
...         "noun": Slot(name="noun"),
...         "verb": Slot(name="verb")
...     }
... )
>>> item = create_cloze_item(
...     template,
...     unfilled_slot_names=["verb"],
...     filled_slots={"det": "The", "noun": "cat"}
... )
>>> item.rendered_elements["text"]
'The cat ___.'
>>> len(item.unfilled_slots)
1
>>> item.unfilled_slots[0].slot_name
'verb'
>>> item.unfilled_slots[0].position
2

create_cloze_items_from_template(template: Any, n_unfilled: int = 1, strategy: str = 'all_combinations', unfilled_combinations: list[list[str]] | None = None, instructions: str | None = None, *, item_template_id: UUID | None = None, metadata_fn: Callable[[list[str]], dict[str, MetadataValue]] | None = None) -> list[Item]

Create multiple cloze items from a template, varying unfilled slots.

Parameters:

Name Type Description Default
template Template

Source template.

required
n_unfilled int

Number of slots to leave unfilled per item (default: 1).

1
strategy str

How to choose unfilled slots:
  • 'random': Randomly sample combinations
  • 'all_combinations': Generate all C(n_slots, n_unfilled) combinations
  • 'specified': Use provided list

'all_combinations'
unfilled_combinations list[list[str]] | None

For strategy='specified', list of slot name combinations to unfill.

None
instructions str | None

Instructions for all items.

None
item_template_id UUID | None

Template ID for all items.

None
metadata_fn Callable[[list[str]], dict[str, MetadataValue]] | None

Generate metadata from unfilled slot names.

None

Returns:

Type Description
list[Item]

Cloze items with varying unfilled slots.

Raises:

Type Description
ValueError

If n_unfilled is invalid, if strategy='specified' is used without unfilled_combinations, or if any combination contains invalid slot names.

Examples:

>>> # Generate all single-slot cloze items
>>> items = create_cloze_items_from_template(
...     template, n_unfilled=1, strategy='all_combinations'
... )
>>> len(items)  # One for each slot
3

create_simple_cloze_item(text: str, blank_positions: list[int], blank_labels: list[str] | None = None, instructions: str | None = None, *, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a cloze item from plain text (no template).

Replaces words at specified positions with blanks. This is a simplified helper for creating cloze items without the template infrastructure.

Parameters:

Name Type Description Default
text str

Full text with no blanks.

required
blank_positions list[int]

Word positions to blank (0-indexed).

required
blank_labels list[str] | None

Optional labels for blanks (for slot_name field). If None, uses generic labels like "blank_0", "blank_1".

None
instructions str | None

Optional instructions.

None
item_template_id UUID | None

Template ID for the item.

None
metadata dict[str, MetadataValue] | None

Additional metadata.

None

Returns:

Type Description
Item

Cloze item with text-based blanks.

Raises:

Type Description
ValueError

If any blank position is out of range or if blank_labels and blank_positions differ in length.

Examples:

>>> item = create_simple_cloze_item(
...     text="The quick brown fox",
...     blank_positions=[1],  # "quick"
...     blank_labels=["adjective"]
... )
>>> item.rendered_elements["text"]
'The ___ brown fox'
>>> item.unfilled_slots[0].slot_name
'adjective'
>>> item.unfilled_slots[0].position
1

create_cloze_items_from_groups(items: list[Item], group_by: Callable[[Item], Any], n_slots_to_unfill: int = 1, *, extract_text: Callable[[Item], str] | None = None, include_group_metadata: bool = True, item_template_id: UUID | None = None) -> list[Item]

Create cloze items from grouped source items.

Groups items and creates cloze items from them. If source items have template metadata, uses template-based cloze. Otherwise, falls back to simple text-based cloze.

Parameters:

Name Type Description Default
items list[Item]

Source items to group.

required
group_by Callable[[Item], Any]

Grouping function.

required
n_slots_to_unfill int

Number of slots/words to unfill.

1
extract_text Callable[[Item], str] | None

Text extraction function. If None, tries common keys.

None
include_group_metadata bool

Whether to include group_key in metadata.

True
item_template_id UUID | None

Template ID for created items.

None

Returns:

Type Description
list[Item]

Cloze items from grouped source items.

Examples:

>>> cloze_items = create_cloze_items_from_groups(
...     items=source_items,
...     group_by=lambda i: i.item_metadata.get("category"),
...     n_slots_to_unfill=1
... )

create_filtered_cloze_items(templates: list[Any], n_slots_to_unfill: int = 1, *, template_filter: Callable[[Any], bool] | None = None, slot_filter: Callable[[str, Any], bool] | None = None, item_template_id: UUID | None = None) -> list[Item]

Create cloze items with multi-level filtering.

Filters templates and/or slots before creating cloze items.

Parameters:

Name Type Description Default
templates list[Template]

Source templates.

required
n_slots_to_unfill int

Number of slots to unfill.

1
template_filter Callable[[Template], bool] | None

Filter templates.

None
slot_filter Callable[[str, Slot], bool] | None

Filter which slots can be unfilled (receives slot_name and Slot object).

None
item_template_id UUID | None

Template ID for created items.

None

Returns:

Type Description
list[Item]

Filtered cloze items.

Examples:

>>> # Only unfill slots with constraints
>>> cloze_items = create_filtered_cloze_items(
...     templates=all_templates,
...     n_slots_to_unfill=1,
...     template_filter=lambda t: len(t.slots) >= 3,
...     slot_filter=lambda name, slot: len(slot.constraints) > 0
... )

Span Annotation Models

spans

Core span annotation models.

Provides data models for labeled spans, span segments, span labels, span relations, and span specifications. Supports discontiguous spans, overlapping spans (nested and intersecting), static and interactive modes, and two label sources (fixed sets and Wikidata entity search).

SpanSegment

Bases: BeadBaseModel

Contiguous or discontiguous indices within a single element.

Attributes:

Name Type Description
element_name str

Which rendered element this segment belongs to.

indices list[int]

Token or character indices within the element.
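
Examples:

A minimal sketch, assuming keyword construction (as in the other bead models):

>>> SpanSegment(element_name="text", indices=[0, 1, 2])
>>> # Discontiguous indices within a single element are permitted
>>> SpanSegment(element_name="text", indices=[0, 3, 4])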

validate_element_name(v: str) -> str classmethod

Validate element name is not empty.

Parameters:

Name Type Description Default
v str

Element name to validate.

required

Returns:

Type Description
str

Validated element name.

Raises:

Type Description
ValueError

If element name is empty.

validate_indices(v: list[int]) -> list[int] classmethod

Validate indices are not empty and non-negative.

Parameters:

Name Type Description Default
v list[int]

Indices to validate.

required

Returns:

Type Description
list[int]

Validated indices.

Raises:

Type Description
ValueError

If indices are empty or contain negative values.

SpanLabel

Bases: BeadBaseModel

Label applied to a span or relation.

Attributes:

Name Type Description
label str

Human-readable label text.

label_id str | None

External identifier (e.g. Wikidata QID "Q5").

confidence float | None

Confidence score for model-assigned labels.

validate_label(v: str) -> str classmethod

Validate label is not empty.

Parameters:

Name Type Description Default
v str

Label to validate.

required

Returns:

Type Description
str

Validated label.

Raises:

Type Description
ValueError

If label is empty.

Span

Bases: BeadBaseModel

Labeled span across one or more elements.

Supports discontiguous, overlapping, and nested spans.

Attributes:

Name Type Description
span_id str

Unique identifier within the item.

segments list[SpanSegment]

Index segments composing this span.

head_index int | None

Syntactic head token index.

label SpanLabel | None

Label applied to this span (None = to-be-labeled).

span_type str | None

Semantic category (e.g. "entity", "event", "role").

span_metadata dict[str, MetadataValue]

Additional span-specific metadata.
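
Examples:

A minimal sketch, assuming keyword construction:

>>> span = Span(
...     span_id="s1",
...     segments=[SpanSegment(element_name="text", indices=[1, 2])],
...     label=SpanLabel(label="PERSON"),
...     span_type="entity"
... )
>>> span.label.label
'PERSON'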

validate_span_id(v: str) -> str classmethod

Validate span_id is not empty.

Parameters:

Name Type Description Default
v str

Span ID to validate.

required

Returns:

Type Description
str

Validated span ID.

Raises:

Type Description
ValueError

If span_id is empty.

SpanRelation

Bases: BeadBaseModel

A typed, directed relation between two spans.

Used for semantic role labeling, relation extraction, entity linking, coreference, and similar tasks.

Attributes:

Name Type Description
relation_id str

Unique identifier within the item.

source_span_id str

span_id of the source span.

target_span_id str

span_id of the target span.

label SpanLabel | None

Relation label (reuses SpanLabel for consistency).

directed bool

Whether the relation is directed (A->B) or undirected (A--B).

relation_metadata dict[str, MetadataValue]

Additional relation-specific metadata.
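
Examples:

A minimal sketch, assuming keyword construction and two spans "s1" and "s2" already defined on the item:

>>> relation = SpanRelation(
...     relation_id="r1",
...     source_span_id="s1",
...     target_span_id="s2",
...     label=SpanLabel(label="agent"),
...     directed=True
... )
>>> relation.directed
True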

validate_relation_id(v: str) -> str classmethod

Validate relation_id is not empty.

Parameters:

Name Type Description Default
v str

Relation ID to validate.

required

Returns:

Type Description
str

Validated relation ID.

Raises:

Type Description
ValueError

If relation_id is empty.

validate_span_ids(v: str) -> str classmethod

Validate span IDs are not empty.

Parameters:

Name Type Description Default
v str

Span ID to validate.

required

Returns:

Type Description
str

Validated span ID.

Raises:

Type Description
ValueError

If span ID is empty.

SpanSpec

Bases: BeadBaseModel

Specification for span labeling behavior.

Configures how spans are displayed, created, and labeled in an experiment. Supports both fixed label sets and Wikidata entity search for both span labels and relation labels.

Attributes:

Name Type Description
index_mode SpanIndexMode

Whether spans index by token or character position.

interaction_mode SpanInteractionMode

"static" for read-only highlights, "interactive" for participant annotation.

label_source LabelSourceType

Source of span labels ("fixed" or "wikidata").

labels list[str] | None

Fixed span label set (when label_source is "fixed").

label_colors dict[str, str] | None

CSS colors keyed by label name.

allow_overlapping bool

Whether overlapping spans are permitted.

min_spans int | None

Minimum number of spans required (interactive mode).

max_spans int | None

Maximum number of spans allowed (interactive mode).

enable_relations bool

Whether relation annotation is enabled.

relation_label_source LabelSourceType

Source of relation labels.

relation_labels list[str] | None

Fixed relation label set.

relation_label_colors dict[str, str] | None

CSS colors keyed by relation label name.

relation_directed bool

Default directionality for new relations.

min_relations int | None

Minimum number of relations required (interactive mode).

max_relations int | None

Maximum number of relations allowed (interactive mode).

wikidata_language str

Language for Wikidata entity search.

wikidata_entity_types list[str] | None

Restrict Wikidata search to these entity types.

wikidata_result_limit int

Maximum number of Wikidata search results.
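
Examples:

A minimal sketch of an interactive spec with a fixed label set, assuming the string values quoted in the attribute descriptions above and defaults for the remaining fields:

>>> spec = SpanSpec(
...     interaction_mode="interactive",
...     label_source="fixed",
...     labels=["person", "location"],
...     allow_overlapping=False,
...     max_spans=5
... )
>>> spec.labels
['person', 'location']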

Span Labeling Utilities

span_labeling

Utilities for creating span labeling experimental items.

This module provides language-agnostic utilities for creating items with span annotations. Spans can be added to any existing item type (composability) or used as standalone span labeling tasks.

Integration Points
  • Active Learning: bead/active_learning/ (via alignment module)
  • Deployment: bead/deployment/jspsych/ (span-label plugin)
  • Tokenization: bead/tokenization/ (display-level tokens)

tokenize_item(item: Item, tokenizer_config: TokenizerConfig | None = None) -> Item

Tokenize an item's rendered_elements.

Populates tokenized_elements and token_space_after using the configured tokenizer. Returns a new Item (does not mutate).

Parameters:

Name Type Description Default
item Item

Item to tokenize.

required
tokenizer_config TokenizerConfig | None

Tokenizer configuration. If None, uses default (spaCy English).

None

Returns:

Type Description
Item

New item with populated tokenized_elements and token_space_after.
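
Examples:

A minimal sketch, assuming the default spaCy English tokenizer (exact token boundaries depend on the configured tokenizer):

>>> item = Item(
...     item_template_id=uuid4(),
...     rendered_elements={"text": "The cat sat."}
... )
>>> tokenized = tokenize_item(item)
>>> tokenized.tokenized_elements["text"]
['The', 'cat', 'sat', '.']
>>> tokenized is item  # a new Item is returned; the input is not mutated
False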

create_span_item(text: str, spans: list[Span], prompt: str, tokenizer_config: TokenizerConfig | None = None, tokens: list[str] | None = None, labels: list[str] | None = None, span_spec: SpanSpec | None = None, item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create a standalone span labeling item.

Tokenizes text using config, validates span indices against tokens.

Parameters:

Name Type Description Default
text str

The stimulus text.

required
spans list[Span]

Pre-defined span annotations.

required
prompt str

Question or instruction for the participant.

required
tokenizer_config TokenizerConfig | None

Tokenizer configuration. Ignored if tokens is provided.

None
tokens list[str] | None

Pre-tokenized text (overrides tokenizer).

None
labels list[str] | None

Fixed label set for span labeling.

None
span_spec SpanSpec | None

Span specification. If None, creates a default static spec.

None
item_template_id UUID | None

Template ID. If None, generates a new UUID.

None
metadata dict[str, MetadataValue] | None

Additional item metadata.

None

Returns:

Type Description
Item

Span labeling item.

Raises:

Type Description
ValueError

If text is empty or span indices are out of bounds.
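
Examples:

A minimal sketch, passing pre-tokenized text so the span indices are explicit:

>>> span = Span(
...     span_id="s1",
...     segments=[SpanSegment(element_name="text", indices=[1])],
...     label=SpanLabel(label="animal")
... )
>>> item = create_span_item(
...     text="The cat sat.",
...     spans=[span],
...     prompt="Review the highlighted entity.",
...     tokens=["The", "cat", "sat", "."],
...     labels=["animal", "object"]
... )
>>> len(item.spans)
1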

create_interactive_span_item(text: str, prompt: str, tokenizer_config: TokenizerConfig | None = None, tokens: list[str] | None = None, label_set: list[str] | None = None, label_source: LabelSourceType = 'fixed', item_template_id: UUID | None = None, metadata: dict[str, MetadataValue] | None = None) -> Item

Create an item for interactive span selection by participants.

Parameters:

Name Type Description Default
text str

The stimulus text.

required
prompt str

Instruction for the participant.

required
tokenizer_config TokenizerConfig | None

Tokenizer configuration.

None
tokens list[str] | None

Pre-tokenized text (overrides tokenizer).

None
label_set list[str] | None

Fixed label set (when label_source is "fixed").

None
label_source LabelSourceType

Label source type ("fixed" or "wikidata").

'fixed'
item_template_id UUID | None

Template ID. If None, generates a new UUID.

None
metadata dict[str, MetadataValue] | None

Additional item metadata.

None

Returns:

Type Description
Item

Interactive span labeling item (no pre-defined spans).
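
Examples:

A minimal sketch with a fixed label set. Spans are created by participants at deployment time, so the item starts with none:

>>> item = create_interactive_span_item(
...     text="Alice met Bob in Paris.",
...     prompt="Highlight all person names.",
...     label_set=["person", "location"]
... )
>>> item.spans
[]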

add_spans_to_item(item: Item, spans: list[Span], tokenizer_config: TokenizerConfig | None = None, span_spec: SpanSpec | None = None) -> Item

Add span annotations to any existing item.

This is the key composability function: any item (rating, forced choice, binary, etc.) can have spans added as an overlay. Tokenizes rendered_elements if not already tokenized. Returns a new Item.

Parameters:

Name Type Description Default
item Item

Existing item to add spans to.

required
spans list[Span]

Span annotations to add.

required
tokenizer_config TokenizerConfig | None

Tokenizer configuration (used only if item lacks tokenization).

None
span_spec SpanSpec | None

Span specification.

None

Returns:

Type Description
Item

New item with spans added.

Raises:

Type Description
ValueError

If span indices are out of bounds.
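
Examples:

A minimal sketch overlaying a span on a plain item:

>>> base = Item(
...     item_template_id=uuid4(),
...     rendered_elements={"text": "The cat sat."}
... )
>>> span = Span(
...     span_id="s1",
...     segments=[SpanSegment(element_name="text", indices=[1])]
... )
>>> annotated = add_spans_to_item(base, [span])
>>> len(annotated.spans)
1
>>> len(base.spans)  # the original item is unchanged
0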

create_span_items_from_texts(texts: list[str], span_extractor: Callable[[str, list[str]], list[Span]], prompt: str, tokenizer_config: TokenizerConfig | None = None, labels: list[str] | None = None, item_template_id: UUID | None = None) -> list[Item]

Batch create span items with automatic tokenization.

Parameters:

Name Type Description Default
texts list[str]

List of stimulus texts.

required
span_extractor Callable[[str, list[str]], list[Span]]

Function that takes (text, tokens) and returns spans.

required
prompt str

Question or instruction for the participant.

required
tokenizer_config TokenizerConfig | None

Tokenizer configuration.

None
labels list[str] | None

Fixed label set.

None
item_template_id UUID | None

Shared template ID. If None, generates one per item.

None

Returns:

Type Description
list[Item]

Span labeling items.
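
Examples:

A minimal sketch with a hypothetical extractor that marks the first token of each text:

>>> def first_token_span(text, tokens):
...     return [Span(
...         span_id="s1",
...         segments=[SpanSegment(element_name="text", indices=[0])]
...     )]
>>> items = create_span_items_from_texts(
...     texts=["The cat sat.", "The dog ran."],
...     span_extractor=first_token_span,
...     prompt="Review the highlighted token."
... )
>>> len(items)
2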

Item Construction

constructor

Item constructor for building experimental items from templates.

This module provides the ItemConstructor class which transforms filled templates into experimental items by applying model-based constraints and collecting model outputs for analysis.

ItemConstructor

Construct experimental items from filled templates.

Transforms filled templates into items by:
  1. Resolving element references to text
  2. Computing required model outputs (from constraints)
  3. Evaluating constraints with model outputs
  4. Creating Item instances with metadata

Parameters:

Name Type Description Default
model_registry ModelAdapterRegistry

Registry of model adapters for constraint evaluation.

required
cache ModelOutputCache

Cache for model outputs to avoid redundant computation.

required
constraint_resolver ConstraintResolver | None

Resolver for evaluating non-model constraints. If None, only model-based constraints can be evaluated.

None

Attributes:

Name Type Description
model_registry ModelAdapterRegistry

Registry of model adapters for constraint evaluation.

cache ModelOutputCache

Cache for model outputs to avoid redundant computation.

constraint_resolver ConstraintResolver | None

Resolver for evaluating constraints (not used for model constraints).

Examples:

>>> from bead.items.adapters.registry import default_registry
>>> from bead.items.cache import ModelOutputCache
>>> cache = ModelOutputCache(backend="memory")
>>> constructor = ItemConstructor(default_registry, cache)
>>> constraints = {constraint_id: constraint_obj}
>>> items = list(constructor.construct_items(
...     template, filled_templates, constraints
... ))

construct_items(item_template: ItemTemplate, filled_templates: dict[UUID, FilledTemplate], constraints: dict[UUID, Constraint]) -> Iterator[Item]

Construct items from template and filled templates.

For each combination of filled templates:
  1. Render elements (resolve filled_template_ref → text)
  2. Compute required model outputs (from constraints)
  3. Check constraints using model outputs
  4. Yield item if all constraints satisfied

Parameters:

Name Type Description Default
item_template ItemTemplate

Template defining item structure and constraints.

required
filled_templates dict[UUID, FilledTemplate]

Map of filled template UUIDs to FilledTemplate instances.

required
constraints dict[UUID, Constraint]

Map of constraint UUIDs to Constraint objects.

required

Yields:

Type Description
Item

Constructed items that satisfy all constraints.

Raises:

Type Description
ValueError

If template references missing filled templates or constraints.

RuntimeError

If constraint evaluation or model computation fails.

Examples:

>>> template = ItemTemplate(...)
>>> filled = {uuid1: filled1, uuid2: filled2}
>>> constraints = {c_id: constraint_obj}
>>> items = list(constructor.construct_items(
...     template, filled, constraints
... ))
>>> len(items)
2

generation

Utilities for generating cross-product items from templates and lexicons.

This module provides language-agnostic utilities for generating items by combining templates with lexical resources in various patterns.

RELATIONSHIP TO ItemConstructor:
  • This module (generation.py): Generates cross-product combinations of templates × lexical items BEFORE template filling. Creates lightweight Item objects with just template_id, metadata, and unfilled information. Use when you want to systematically explore all combinations of a lexical property (e.g., every verb in every frame).
  • ItemConstructor (constructor.py): Builds Items FROM ItemTemplates + FilledTemplates, with constraint evaluation and model scoring. Takes filled templates and combines them into experimental items with multi-slot constraints checked. Use when you have filled templates and want to construct experimental items with model-based constraint checking.

These modules are COMPLEMENTARY, not redundant. The typical pipeline, sketched below, is:
  1. generation.py: Generate cross-product → unfilled item specifications
  2. Template filling: Fill template slots → FilledTemplates
  3. constructor.py: Construct items → Items with constraints checked
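
A sketch of that pipeline, with the template-filling step (which lives elsewhere in bead) and the constraint objects stubbed out:

>>> specs = create_cross_product_items(templates, lexicons)  # 1. cross-product
>>> # 2. fill template slots elsewhere -> filled_templates
>>> items = list(constructor.construct_items(  # 3. constraint-checked items
...     item_template, filled_templates, constraints
... ))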

create_cross_product_items(templates: list[Template], lexicons: dict[str, Lexicon], *, cross_product_slot: str = 'verb', metadata_extractor: Callable[[Template, LexicalItem], dict[str, MetadataValue]] | None = None, filter_fn: Callable[[Template, LexicalItem], bool] | None = None) -> Iterator[Item]

Generate cross-product items from templates and lexicons.

Creates an item for each combination of template × lexical item from the specified slot's lexicon. This is useful for systematic exploration of a lexical property (e.g., every verb in every frame).

Items are generated lazily via iterator for memory efficiency with large cross-products.

Parameters:

Name Type Description Default
templates list[Template]

Templates to use for generation.

required
lexicons dict[str, Lexicon]

Lexicons keyed by slot name.

required
cross_product_slot str

Slot name to vary across items (default: "verb"). This slot's lexicon will be crossed with all templates.

'verb'
metadata_extractor Callable[[Template, LexicalItem], dict[str, MetadataValue]] | None

Optional function to extract metadata from template and lexical item. Receives (template, lexical_item) and returns dict for item_metadata.

None
filter_fn Callable[[Template, LexicalItem], bool] | None

Optional filter function. Receives (template, lexical_item) and returns True to include, False to skip.

None

Yields:

Type Description
Item

Items representing template × lexical item combinations.

Examples:

Basic verb × template cross-product:

>>> from uuid import uuid4
>>> templates = [
...     Template(
...         name="transitive",
...         template_string="{subject} {verb} {object}.",
...         slots={}
...     )
... ]
>>> verb_lex = Lexicon(name="verbs")
>>> verb_lex.add(LexicalItem(lemma="walk"))
>>> verb_lex.add(LexicalItem(lemma="eat"))
>>> lexicons = {"verb": verb_lex}
>>> items = list(create_cross_product_items(templates, lexicons))
>>> len(items)
2

With metadata extraction:

>>> def extract_metadata(template, item):
...     return {
...         "verb_lemma": item.lemma,
...         "template_name": template.name,
...         "verb_pos": item.pos
...     }
>>> items = list(create_cross_product_items(
...     templates,
...     lexicons,
...     metadata_extractor=extract_metadata
... ))

With filtering:

>>> def filter_transitive_only(template, item):
...     return "transitive" in template.name
>>> items = list(create_cross_product_items(
...     templates,
...     lexicons,
...     filter_fn=filter_transitive_only
... ))

create_filtered_cross_product_items(templates: list[Template], lexicons: dict[str, Lexicon], *, cross_product_slot: str = 'verb', template_filter: Callable[[Template], bool] | None = None, item_filter: Callable[[LexicalItem], bool] | None = None, combination_filter: Callable[[Template, LexicalItem], bool] | None = None, metadata_extractor: Callable[[Template, LexicalItem], dict[str, MetadataValue]] | None = None) -> Iterator[Item]

Generate cross-product items with multiple filter levels.

Provides separate filters for templates, lexical items, and their combinations, offering more control than the basic cross-product function.

Parameters:

Name Type Description Default
templates list[Template]

Templates to use for generation.

required
lexicons dict[str, Lexicon]

Lexicons keyed by slot name.

required
cross_product_slot str

Slot name to vary across items.

'verb'
template_filter Callable[[Template], bool] | None

Filter for templates (applied before cross-product).

None
item_filter Callable[[LexicalItem], bool] | None

Filter for lexical items (applied before cross-product).

None
combination_filter Callable[[Template, LexicalItem], bool] | None

Filter for combinations (applied during generation).

None
metadata_extractor Callable[[Template, LexicalItem], dict[str, MetadataValue]] | None

Metadata extraction function.

None

Yields:

Type Description
Item

Filtered cross-product items.

Examples:

Filter at multiple levels:

>>> def template_filter(t):
...     return "transitive" in t.name
>>> def item_filter(i):
...     return i.pos == "VERB"
>>> def combination_filter(t, i):
...     # Only combine if verb is compatible with template
...     return True
>>> items = list(create_filtered_cross_product_items(
...     templates,
...     lexicons,
...     template_filter=template_filter,
...     item_filter=item_filter,
...     combination_filter=combination_filter
... ))

create_stratified_cross_product_items(templates: list[Template], lexicons: dict[str, Lexicon], *, cross_product_slot: str = 'verb', stratify_by: Callable[[LexicalItem], str], items_per_stratum: int, metadata_extractor: Callable[[Template, LexicalItem], dict[str, MetadataValue]] | None = None) -> Iterator[Item]

Generate stratified sample of cross-product items.

Instead of full cross-product, samples a fixed number of lexical items from each stratum (defined by stratify_by function) and crosses them with all templates.

Parameters:

Name Type Description Default
templates list[Template]

Templates to use for generation.

required
lexicons dict[str, Lexicon]

Lexicons keyed by slot name.

required
cross_product_slot str

Slot name to vary across items.

'verb'
stratify_by Callable[[LexicalItem], str]

Function to extract stratum key from lexical items.

required
items_per_stratum int

Number of items to sample from each stratum.

required
metadata_extractor Callable[[Template, LexicalItem], dict[str, MetadataValue]] | None

Metadata extraction function.

None

Yields:

Type Description
Item

Stratified cross-product items.

Examples:

Sample verbs stratified by frequency:

>>> def stratify_by_frequency(item):
...     freq = item.attributes.get("frequency", 0)
...     if freq > 1000:
...         return "high"
...     elif freq > 100:
...         return "medium"
...     else:
...         return "low"
>>> items = list(create_stratified_cross_product_items(
...     templates,
...     lexicons,
...     stratify_by=stratify_by_frequency,
...     items_per_stratum=10
... ))

items_to_jsonl(items: Iterator[Item], output_path: str, progress_interval: int = 1000) -> int

Write iterator of items to JSONL file with progress tracking.

Utility function for efficient streaming write of large item sets.

Parameters:

Name Type Description Default
items Iterator[Item]

Items to write.

required
output_path str

Path to output JSONL file.

required
progress_interval int

Print progress every N items (default: 1000).

1000

Returns:

Type Description
int

Number of items written.

Examples:

>>> items = create_cross_product_items(templates, lexicons)
>>> n = items_to_jsonl(items, "output.jsonl")
>>> print(f"Wrote {n} items")

Validation and Scoring

validation

Validation utilities for constructed items.

This module provides validation functions to ensure constructed items meet all requirements and contain complete, valid data.

validate_item(item: Item, item_template: ItemTemplate) -> list[str]

Validate a constructed item against its template.

Check that the item has all required fields, references valid templates, has consistent constraint satisfaction, and contains valid model outputs.

Parameters:

Name Type Description Default
item Item

Item to validate.

required
item_template ItemTemplate

Template the item was constructed from.

required

Returns:

Type Description
list[str]

List of validation error messages. Empty list if valid.

Examples:

>>> errors = validate_item(item, template)
>>> if errors:
...     print(f"Item is invalid: {errors}")
>>> else:
...     print("Item is valid")

validate_model_output(output: ModelOutput) -> list[str]

Validate a model output.

Check that the model output has all required fields and valid values.

Parameters:

Name Type Description Default
output ModelOutput

Model output to validate.

required

Returns:

Type Description
list[str]

List of validation error messages. Empty list if valid.

Examples:

>>> errors = validate_model_output(output)
>>> if not errors:
...     print("Model output is valid")

validate_constraint_satisfaction(item: Item, item_template: ItemTemplate) -> list[str]

Validate constraint satisfaction consistency.

Check that all constraints in the template have been evaluated and that the results are boolean values.

Parameters:

Name Type Description Default
item Item

Item to validate.

required
item_template ItemTemplate

Template with constraints.

required

Returns:

Type Description
list[str]

List of validation error messages. Empty list if valid.

Examples:

>>> errors = validate_constraint_satisfaction(item, template)
>>> if not errors:
...     print("Constraint satisfaction is valid")

validate_metadata_completeness(item: Item) -> list[str]

Validate that item metadata is complete.

Check that the item has all expected metadata fields populated. Since Item inherits from BeadBaseModel, id, created_at, and modified_at are always present. This function is kept for consistency and future extensibility.

Parameters:

Name Type Description Default
item Item

Item to validate.

required

Returns:

Type Description
list[str]

List of validation error messages. Empty list if valid.

Examples:

>>> errors = validate_metadata_completeness(item)
>>> if not errors:
...     print("Metadata is complete")

item_passes_all_constraints(item: Item) -> bool

Check if item satisfies all constraints.

Convenience function to check if all constraints are satisfied.

Parameters:

Name Type Description Default
item Item

Item to check.

required

Returns:

Type Description
bool

True if all constraints satisfied, False otherwise.

Examples:

>>> if item_passes_all_constraints(item):
...     print("Item is valid")

get_task_type_requirements(task_type: TaskType) -> dict[str, list[str] | str]

Get validation requirements for a task type.

Returns a dictionary describing the structural requirements for items of the specified task type. Useful for introspection, error messages, and documentation generation.

Parameters:

Name Type Description Default
task_type TaskType

Task type to get requirements for.

required

Returns:

Type Description
dict[str, list[str] | str]

Requirements specification with keys:

- required_rendered_keys: List of required rendered_elements keys
- required_metadata_keys: List of required item_metadata keys
- optional_metadata_keys: List of optional item_metadata keys
- special_fields: List of special fields (e.g., ["unfilled_slots"])
- description: Human-readable description

Examples:

>>> reqs = get_task_type_requirements("ordinal_scale")
>>> print(reqs["required_rendered_keys"])
['text']
>>> print(reqs["required_metadata_keys"])
['scale_min', 'scale_max']

validate_item_for_task_type(item: Item, task_type: TaskType) -> bool

Validate that an Item's structure matches requirements for a task type.

Checks that the item has the required rendered_elements keys, item_metadata keys, and special fields for the specified task type. Raises descriptive ValueError if validation fails.

Parameters:

Name Type Description Default
item Item

Item to validate.

required
task_type TaskType

Expected task type (from bead.items.item_template.TaskType).

required

Returns:

Type Description
bool

True if valid.

Raises:

Type Description
ValueError

If item structure doesn't match task type requirements, with detailed explanation of what's wrong.

Examples:

>>> from bead.items.ordinal_scale import create_ordinal_scale_item
>>> item = create_ordinal_scale_item("How natural?", scale_bounds=(1, 7))
>>> validate_item_for_task_type(item, "ordinal_scale")
True
>>> from bead.items.forced_choice import create_forced_choice_item
>>> fc_item = create_forced_choice_item("A", "B")
>>> validate_item_for_task_type(fc_item, "ordinal_scale")
ValueError: ordinal_scale items must have 'text' in rendered_elements...

infer_task_type_from_item(item: Item) -> TaskType

Infer most likely task type from Item structure.

Examines the item's rendered_elements, item_metadata, and special fields to determine which task type it matches. Uses priority order to handle ambiguous cases.

Parameters:

Name Type Description Default
item Item

Item to infer from.

required

Returns:

Type Description
TaskType

Inferred task type.

Raises:

Type Description
ValueError

If item structure doesn't match any task type or is ambiguous.

Examples:

>>> from bead.items.ordinal_scale import create_likert_7_item
>>> item = create_likert_7_item("How natural is this sentence?")
>>> infer_task_type_from_item(item)
'ordinal_scale'
>>> from bead.items.categorical import create_nli_item
>>> item2 = create_nli_item("All dogs bark", "Some dogs bark")
>>> infer_task_type_from_item(item2)
'categorical'
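
Inference is handy for bucketing mixed item sets; a minimal sketch grouping the items above by their inferred task type:

>>> from collections import defaultdict
>>> by_task = defaultdict(list)
>>> for it in [item, item2]:
...     by_task[infer_task_type_from_item(it)].append(it)
>>> sorted(by_task)
['categorical', 'ordinal_scale']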

scoring

Abstract base classes for item scoring with language models.

This module provides language-agnostic base classes for scoring items using various metrics (log probability, perplexity, embeddings).

ItemScorer

Bases: ABC

Abstract base class for item scoring.

ItemScorer provides a framework for assigning numeric scores to items based on various criteria (language model probability, acceptability, similarity, etc.).

Examples:

Implementing a custom scorer:

>>> class AcceptabilityScorer(ItemScorer):
...     def _compute_acceptability(self, text):
...         # Placeholder metric (substitute a real acceptability model):
...         # here, longer sentences simply score lower
...         return -float(len(text.split()))
...
...     def score(self, item):
...         # Score based on some acceptability metric
...         text = item.rendered_elements.get("text", "")
...         return self._compute_acceptability(text)
...
...     def score_batch(self, items):
...         return [self.score(item) for item in items]

score(item: Item) -> float abstractmethod

Compute score for a single item.

Parameters:

Name Type Description Default
item Item

Item to score.

required

Returns:

Type Description
float

Numeric score for the item.

score_batch(items: list[Item]) -> list[float]

Compute scores for multiple items.

Default implementation calls score() for each item sequentially. Subclasses can override for batch processing optimization.

Parameters:

Name Type Description Default
items list[Item]

Items to score.

required

Returns:

Type Description
list[float]

Scores for each item.

Examples:

>>> scorer = ConcreteScorer()
>>> items = [item1, item2, item3]
>>> scores = scorer.score_batch(items)
>>> len(scores) == len(items)
True

score_with_metadata(items: list[Item]) -> dict[UUID, dict[str, float | str]]

Score items and return results with metadata.

Parameters:

Name Type Description Default
items list[Item]

Items to score.

required

Returns:

Type Description
dict[UUID, dict[str, float | str]]

Dictionary mapping item UUIDs to score dictionaries. Each score dict contains at least a "score" key.

Examples:

>>> scorer = ConcreteScorer()
>>> results = scorer.score_with_metadata([item1, item2])
>>> results[item1.id]["score"]
-42.5

LanguageModelScorer

Bases: ItemScorer

Scorer using language model log probabilities.

Scores items based on their log probability under a language model. Uses HuggingFace adapters for model inference and supports caching.

Parameters:

Name Type Description Default
model_name str

HuggingFace model identifier (e.g., "gpt2", "gpt2-medium").

required
cache_dir Path | str | None

Directory for caching model outputs. If None, no caching.

None
device str

Device to run model on ("cpu", "cuda", "mps").

'cpu'
text_key str

Key in item.rendered_elements to use as text (default: "text").

'text'
model_version str

Version string for cache tracking.

'unknown'

Examples:

>>> from pathlib import Path
>>> scorer = LanguageModelScorer(
...     model_name="gpt2",
...     cache_dir=Path(".cache"),
...     device="cpu"
... )
>>> score = scorer.score(item)
>>> score < 0  # Log probabilities are negative
True

model: HuggingFaceLanguageModel property

Get the model, loading if necessary.

Returns:

Type Description
HuggingFaceLanguageModel

The language model adapter.

score(item: Item) -> float

Compute log probability score for an item.

Parameters:

Name Type Description Default
item Item

Item to score.

required

Returns:

Type Description
float

Log probability of the item's text under the language model.

Raises:

Type Description
KeyError

If text_key not found in item.rendered_elements.

score_batch(items: list[Item], batch_size: int | None = None) -> list[float]

Compute scores for multiple items efficiently using batched inference.

Parameters:

Name Type Description Default
items list[Item]

Items to score.

required
batch_size int | None

Number of items to process in each batch. If None, automatically infers optimal batch size based on available resources.

None

Returns:

Type Description
list[float]

Log probabilities for each item.

score_with_metadata(items: list[Item]) -> dict[UUID, dict[str, float | str]]

Score items and return results with additional metrics.

Returns log probability and perplexity for each item.

Parameters:

Name Type Description Default
items list[Item]

Items to score.

required

Returns:

Type Description
dict[UUID, dict[str, float | str]]

Dictionary with "score" (log prob) and "perplexity" for each item.

ForcedChoiceScorer

Bases: ItemScorer

Scorer for N-AFC (forced-choice) items with multiple options.

Computes comparison scores for forced-choice items by scoring each option and applying a comparison function (e.g., max difference, variance, entropy).

Parameters:

Name Type Description Default
base_scorer ItemScorer

Base scorer to use for individual options.

required
comparison_fn callable | None

Function that takes a list of scores and returns a comparison metric. Default is the standard deviation of the scores.

None
option_prefix str

Prefix for option names in rendered_elements (default: "option").

'option'

Examples:

>>> base = LanguageModelScorer("gpt2", device="cpu")
>>> fc_scorer = ForcedChoiceScorer(
...     base_scorer=base,
...     comparison_fn=lambda scores: max(scores) - min(scores)  # Range
... )
>>> # Item with option_a, option_b, option_c, ...
>>> score = fc_scorer.score(forced_choice_item)
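
The docstring mentions entropy as another comparison metric; one way to realize it (a sketch, not part of the library) is to softmax the option scores and take the Shannon entropy:

>>> import math
>>> def entropy(scores):
...     # Softmax over the scores, then Shannon entropy of the distribution
...     exps = [math.exp(s - max(scores)) for s in scores]
...     ps = [e / sum(exps) for e in exps]
...     return -sum(p * math.log(p) for p in ps if p > 0)
>>> fc_scorer = ForcedChoiceScorer(base_scorer=base, comparison_fn=entropy)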

score(item: Item) -> float

Score a forced-choice item.

Extracts all options from item.rendered_elements (option_a, option_b, ...), scores each option, and applies comparison function.

Parameters:

Name Type Description Default
item Item

Forced-choice item with multiple options.

required

Returns:

Type Description
float

Comparison score across all options.

Raises:

Type Description
ValueError

If item doesn't contain option elements or has precomputed scores.

Model Output Cache

cache

Content-addressable cache for judgment model outputs.

This module provides caching infrastructure for model outputs during item construction. It supports multiple backends (filesystem, in-memory) and various operation types including log probabilities, NLI scores, embeddings, and similarity metrics.

Note: This cache is distinct from bead.templates.adapters.cache, which handles MLM predictions for template filling. This module caches judgment model outputs used in item construction.

CacheBackend

Bases: ABC

Abstract base class for cache backends.

Defines the interface that all cache backends must implement.

get(key: str) -> dict[str, object] | None abstractmethod

Retrieve cache entry by key.

Parameters:

Name Type Description Default
key str

Cache key to retrieve.

required

Returns:

Type Description
dict[str, object] | None

Cache entry data if found, None otherwise.

set(key: str, data: dict[str, object]) -> None abstractmethod

Store cache entry with key.

Parameters:

Name Type Description Default
key str

Cache key.

required
data dict[str, object]

Cache entry data to store.

required

delete(key: str) -> None abstractmethod

Delete cache entry by key.

Parameters:

Name Type Description Default
key str

Cache key to delete.

required

clear() -> None abstractmethod

Clear all cache entries.

keys() -> list[str] abstractmethod

Return all cache keys.

Returns:

Type Description
list[str]

List of all cache keys in the backend.

FilesystemBackend

Bases: CacheBackend

Filesystem-based cache backend.

Stores each cache entry as a separate JSON file with the cache key as the filename.

Parameters:

Name Type Description Default
cache_dir Path

Directory for cache storage.

required

Attributes:

Name Type Description
cache_dir Path

Directory where cache files are stored.

Examples:

>>> from pathlib import Path
>>> backend = FilesystemBackend(cache_dir=Path(".cache"))
>>> backend.set("abc123", {"result": 42})
>>> backend.get("abc123")
{'result': 42}

get(key: str) -> dict[str, object] | None

Retrieve cache entry from filesystem.

Parameters:

Name Type Description Default
key str

Cache key.

required

Returns:

Type Description
dict[str, object] | None

Cache entry data if found, None otherwise.

set(key: str, data: dict[str, object]) -> None

Store cache entry to filesystem.

Parameters:

Name Type Description Default
key str

Cache key.

required
data dict[str, object]

Cache entry data.

required

delete(key: str) -> None

Delete cache entry from filesystem.

Parameters:

Name Type Description Default
key str

Cache key to delete.

required

clear() -> None

Clear all cache entries from filesystem.

keys() -> list[str]

Return all cache keys from filesystem.

Returns:

Type Description
list[str]

List of cache keys (filenames without .json extension).

InMemoryBackend

Bases: CacheBackend

In-memory cache backend.

Stores cache entries in a dictionary. No persistence across program runs. Useful for testing and temporary caching scenarios.

Examples:

>>> backend = InMemoryBackend()
>>> backend.set("xyz789", {"result": 3.14})
>>> backend.get("xyz789")
{'result': 3.14}

get(key: str) -> dict[str, object] | None

Retrieve cache entry from memory.

Parameters:

Name Type Description Default
key str

Cache key.

required

Returns:

Type Description
dict[str, object] | None

Cache entry data if found, None otherwise.

set(key: str, data: dict[str, object]) -> None

Store cache entry in memory.

Parameters:

Name Type Description Default
key str

Cache key.

required
data dict[str, object]

Cache entry data.

required

delete(key: str) -> None

Delete cache entry from memory.

Parameters:

Name Type Description Default
key str

Cache key to delete.

required

clear() -> None

Clear all cache entries from memory.

keys() -> list[str]

Return all cache keys from memory.

Returns:

Type Description
list[str]

List of cache keys.

ModelOutputCache

Content-addressable cache for judgment model outputs.

Caches results from various model operations to avoid redundant computation. Supports multiple operation types including log probabilities, perplexity, NLI scores, embeddings, and similarity metrics.

Cache keys are automatically generated using SHA-256 hashing of the model name, operation type, and all input parameters, ensuring deterministic cache hits for identical inputs.

Parameters:

Name Type Description Default
cache_dir Path | None

Directory for cache files (filesystem backend only). Defaults to ~/.cache/bead/models if not specified.

None
backend ('filesystem', 'memory')

Cache backend type. "filesystem" persists across runs, "memory" is ephemeral.

"filesystem"
enabled bool

Whether caching is enabled.

True

Attributes:

Name Type Description
enabled bool

Whether caching is enabled. When False, all operations are no-ops.

Examples:

Basic usage with filesystem backend:

>>> from pathlib import Path
>>> cache = ModelOutputCache(cache_dir=Path(".cache"))
>>> result = cache.get("gpt2", "log_probability", text="Hello world")
>>> if result is None:
...     result = -2.5
...     cache.set("gpt2", "log_probability", result, text="Hello world")

Caching NLI scores:

>>> nli_scores = cache.get("roberta-nli", "nli",
...                        premise="Mary loves books",
...                        hypothesis="Mary enjoys reading")
>>> if nli_scores is None:
...     nli_scores = {"entailment": 0.9, "neutral": 0.08, "contradiction": 0.02}
...     cache.set("roberta-nli", "nli", nli_scores,
...              premise="Mary loves books", hypothesis="Mary enjoys reading")

Caching embeddings:

>>> import numpy as np
>>> embedding = cache.get("bert-base", "embedding", text="Hello")
>>> if embedding is None:
...     embedding = np.random.rand(768)
...     cache.set("bert-base", "embedding", embedding, text="Hello")

generate_cache_key(model_name: str, operation: str, **inputs: str | int | float | bool | None) -> str

Generate deterministic cache key from inputs.

Parameters:

Name Type Description Default
model_name str

Model identifier.

required
operation str

Operation type (e.g., "log_probability", "embedding").

required
**inputs str | int | float | bool | None

Input parameters for the operation (text, premise, hypothesis).

{}

Returns:

Type Description
str

SHA-256 hex digest as cache key.
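
Examples:

Keys are deterministic for identical inputs; a minimal sketch, assuming the method is called on a cache instance:

>>> k1 = cache.generate_cache_key("gpt2", "log_probability", text="Hello")
>>> k2 = cache.generate_cache_key("gpt2", "log_probability", text="Hello")
>>> k1 == k2
True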

get(model_name: str, operation: str, **inputs: str | int | float | bool | None) -> Any

Retrieve cached result.

Parameters:

Name Type Description Default
model_name str

Model identifier.

required
operation str

Operation type (e.g., "log_probability", "nli", "embedding").

required
**inputs str | int | float | bool | None

Input parameters for the operation (text, premise, hypothesis).

{}

Returns:

Type Description
Any

Cached result if found, None otherwise.

set(model_name: str, operation: str, result: float | dict[str, float] | list[float] | np.ndarray, model_version: str | None = None, **inputs: str | int | float | bool | None) -> None

Store result in cache.

Parameters:

Name Type Description Default
model_name str

Model identifier.

required
operation str

Operation type (e.g., "log_probability", "nli", "embedding").

required
result float | dict[str, float] | list[float] | ndarray

Result to cache (log probability, NLI scores, embedding, etc.).

required
model_version str | None

Optional model version string for tracking.

None
**inputs str | int | float | bool | None

Input parameters for the operation (text, premise, hypothesis).

{}

invalidate(model_name: str, operation: str, **inputs: str | int | float | bool | None) -> None

Invalidate specific cache entry.

Parameters:

Name Type Description Default
model_name str

Model identifier.

required
operation str

Operation type.

required
**inputs str | int | float | bool | None

Input parameters for the operation.

{}

clear_model(model_name: str) -> None

Clear all cache entries for a specific model.

Parameters:

Name Type Description Default
model_name str

Model identifier.

required

clear() -> None

Clear all cache entries.

Model Adapters

base

Base class for model adapters used in item construction.

This module defines the abstract ModelAdapter interface that all model adapters must implement to support judgment prediction operations during Stage 3 (Item Construction).

This is SEPARATE from template filling model adapters (bead.templates.models.adapter), which are used in Stage 2.

ModelAdapter

Bases: ABC

Base class for model adapters used in item construction.

All model adapters must implement this interface to support judgment prediction operations during Stage 3 (Item Construction).

This is SEPARATE from template filling model adapters (bead.templates.models.adapter), which are used in Stage 2.

Parameters:

Name Type Description Default
model_name str

Model identifier (e.g., "gpt2", "roberta-large-mnli").

required
cache ModelOutputCache

Cache instance for storing model outputs.

required
model_version str

Version of the model for cache tracking.

'unknown'

Attributes:

Name Type Description
model_name str

Model identifier (e.g., "gpt2", "roberta-large-mnli").

model_version str

Version of the model.

cache ModelOutputCache

Cache for model outputs.

compute_log_probability(text: str) -> float abstractmethod

Compute log probability of text under language model.

Required for language model constraints. Should raise NotImplementedError if not supported by model type.

Parameters:

Name Type Description Default
text str

Text to compute log probability for.

required

Returns:

Type Description
float

Log probability of the text.

Raises:

Type Description
NotImplementedError

If this operation is not supported by the model type.

compute_perplexity(text: str) -> float abstractmethod

Compute perplexity of text.

Required for complexity-based filtering. Should raise NotImplementedError if not supported by model type.

Parameters:

Name Type Description Default
text str

Text to compute perplexity for.

required

Returns:

Type Description
float

Perplexity of the text (must be positive).

Raises:

Type Description
NotImplementedError

If this operation is not supported by the model type.

get_embedding(text: str) -> np.ndarray[tuple[int, ...], np.dtype[np.float64]] abstractmethod

Get embedding vector for text.

Required for similarity computations and semantic clustering. Should raise NotImplementedError if not supported by model type.

Parameters:

Name Type Description Default
text str

Text to embed.

required

Returns:

Type Description
ndarray

Embedding vector for the text.

Raises:

Type Description
NotImplementedError

If this operation is not supported by the model type.

compute_nli(premise: str, hypothesis: str) -> dict[str, float] abstractmethod

Compute natural language inference scores.

Must return dict with keys: "entailment", "neutral", "contradiction". Required for inference-based constraints. Should raise NotImplementedError if not supported by model type.

Parameters:

Name Type Description Default
premise str

Premise text.

required
hypothesis str

Hypothesis text.

required

Returns:

Type Description
dict[str, float]

Dictionary with keys "entailment", "neutral", "contradiction" mapping to probability scores that sum to ~1.0.

Raises:

Type Description
NotImplementedError

If this operation is not supported by the model type.

compute_similarity(text1: str, text2: str) -> float

Compute similarity between two texts.

Default implementation using cosine similarity of embeddings. Can be overridden for specialized similarity computation.

Parameters:

Name Type Description Default
text1 str

First text.

required
text2 str

Second text.

required

Returns:

Type Description
float

Similarity score in [-1, 1] (cosine similarity).

Raises:

Type Description
NotImplementedError

If embeddings are not supported by the model type.
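
Examples:

The default is equivalent to cosine similarity over the two embedding vectors; a minimal sketch of that computation (cosine is an illustrative helper, not a library function):

>>> import numpy as np
>>> def cosine(a, b):
...     # Cosine similarity: dot product over the product of norms
...     return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))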

get_nli_label(premise: str, hypothesis: str) -> str

Get predicted NLI label (max score).

Default implementation using argmax over compute_nli() scores.

Parameters:

Name Type Description Default
premise str

Premise text.

required
hypothesis str

Hypothesis text.

required

Returns:

Type Description
str

Predicted label: "entailment", "neutral", or "contradiction".

Raises:

Type Description
NotImplementedError

If NLI is not supported by the model type.
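
Examples:

The default behavior is an argmax over the compute_nli() distribution; a minimal sketch:

>>> scores = {"entailment": 0.9, "neutral": 0.08, "contradiction": 0.02}
>>> max(scores, key=scores.get)
'entailment'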

huggingface

HuggingFace model adapters for language models and NLI.

This module provides adapters for HuggingFace Transformers models:

- HuggingFaceLanguageModel: Causal LMs (GPT-2, GPT-Neo, Llama, Mistral)
- HuggingFaceMaskedLanguageModel: Masked LMs (BERT, RoBERTa, ALBERT)
- HuggingFaceNLI: NLI models (RoBERTa-MNLI, DeBERTa-MNLI, BART-MNLI)

HuggingFaceLanguageModel

Bases: HuggingFaceAdapterMixin, ModelAdapter

Adapter for HuggingFace causal language models.

Supports models like GPT-2, GPT-Neo, Llama, Mistral, and other autoregressive (left-to-right) language models.

Parameters:

Name Type Description Default
model_name str

HuggingFace model identifier (e.g., "gpt2", "gpt2-medium").

required
cache ModelOutputCache

Cache instance for storing model outputs.

required
device ('cpu', 'cuda', 'mps')

Device to run model on. Falls back to CPU if device unavailable.

"cpu"
model_version str

Version string for cache tracking.

'unknown'

Examples:

>>> from pathlib import Path
>>> from bead.items.cache import ModelOutputCache
>>> cache = ModelOutputCache(cache_dir=Path(".cache"))
>>> model = HuggingFaceLanguageModel("gpt2", cache, device="cpu")
>>> log_prob = model.compute_log_probability("The cat sat on the mat.")
>>> perplexity = model.compute_perplexity("The cat sat on the mat.")
>>> embedding = model.get_embedding("The cat sat on the mat.")

model: PreTrainedModel property

Get the model, loading if necessary.

tokenizer: PreTrainedTokenizerBase property

Get the tokenizer, loading if necessary.

compute_log_probability(text: str) -> float

Compute log probability of text under language model.

Uses the model's loss with labels=input_ids to compute the negative log-likelihood of the text.

Parameters:

Name Type Description Default
text str

Text to compute log probability for.

required

Returns:

Type Description
float

Log probability of the text.
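
Examples:

A minimal sketch of the underlying computation with the standard transformers API (the adapter's actual implementation may differ; note the model's loss is the mean negative log-likelihood over the shifted label positions):

>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> tok = AutoTokenizer.from_pretrained("gpt2")
>>> lm = AutoModelForCausalLM.from_pretrained("gpt2")
>>> enc = tok("The cat sat on the mat.", return_tensors="pt")
>>> with torch.no_grad():
...     loss = lm(**enc, labels=enc["input_ids"]).loss
>>> num_scored = enc["input_ids"].shape[1] - 1  # causal shift drops one position
>>> log_prob = -loss.item() * num_scored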

compute_log_probability_batch(texts: list[str], batch_size: int | None = None) -> list[float]

Compute log probabilities for multiple texts efficiently.

Uses batched tokenization and inference for significant speedup. Checks cache before computing, only processes uncached texts.

Parameters:

Name Type Description Default
texts list[str]

Texts to compute log probabilities for.

required
batch_size int | None

Number of texts to process in each batch. If None, automatically infers optimal batch size based on available device memory and model size.

None

Returns:

Type Description
list[float]

Log probabilities for each text, in the same order as input.

Examples:

>>> texts = ["The cat sat.", "The dog ran.", "The bird flew."]
>>> log_probs = model.compute_log_probability_batch(texts)
>>> len(log_probs) == len(texts)
True

compute_perplexity(text: str) -> float

Compute perplexity of text.

Perplexity is exp(average negative log-likelihood per token).

Parameters:

Name Type Description Default
text str

Text to compute perplexity for.

required

Returns:

Type Description
float

Perplexity of the text (positive value).
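
Examples:

A worked example of the relation between a summed log probability and perplexity:

>>> import math
>>> log_prob, num_tokens = -12.0, 6
>>> math.exp(-log_prob / num_tokens)
7.38905609893065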

get_embedding(text: str) -> np.ndarray

Get embedding vector for text.

Uses mean pooling of last hidden states as the text embedding.

Parameters:

Name Type Description Default
text str

Text to embed.

required

Returns:

Type Description
ndarray

Embedding vector for the text.

compute_nli(premise: str, hypothesis: str) -> dict[str, float]

Compute natural language inference scores.

Not supported for causal language models.

Raises:

Type Description
NotImplementedError

Always raised, as causal LMs don't support NLI directly.

HuggingFaceMaskedLanguageModel

Bases: HuggingFaceAdapterMixin, ModelAdapter

Adapter for HuggingFace masked language models.

Supports models like BERT, RoBERTa, ALBERT, and other masked language models (MLMs).

Parameters:

Name Type Description Default
model_name str

HuggingFace model identifier (e.g., "bert-base-uncased").

required
cache ModelOutputCache

Cache instance for storing model outputs.

required
device ('cpu', 'cuda', 'mps')

Device to run model on. Falls back to CPU if device unavailable.

"cpu"
model_version str

Version string for cache tracking.

'unknown'

Examples:

>>> from pathlib import Path
>>> from bead.items.cache import ModelOutputCache
>>> cache = ModelOutputCache(cache_dir=Path(".cache"))
>>> model = HuggingFaceMaskedLanguageModel("bert-base-uncased", cache)
>>> log_prob = model.compute_log_probability("The cat sat on the mat.")
>>> embedding = model.get_embedding("The cat sat on the mat.")

model: PreTrainedModel property

Get the model, loading if necessary.

tokenizer: PreTrainedTokenizerBase property

Get the tokenizer, loading if necessary.

compute_log_probability(text: str) -> float

Compute log probability of text using pseudo-log-likelihood.

For MLMs, we use pseudo-log-likelihood: mask each token one at a time and sum the log probabilities of predicting each token.

This is computationally expensive, so caching is critical.

Parameters:

Name Type Description Default
text str

Text to compute log probability for.

required

Returns:

Type Description
float

Pseudo-log-probability of the text.
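
Examples:

A minimal sketch of pseudo-log-likelihood with the standard transformers API (illustrative only; the adapter's actual implementation may batch the masked copies):

>>> import torch
>>> from transformers import AutoModelForMaskedLM, AutoTokenizer
>>> tok = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
>>> ids = tok("The cat sat.", return_tensors="pt")["input_ids"][0]
>>> pll = 0.0
>>> for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
...     masked = ids.clone()
...     masked[i] = tok.mask_token_id  # mask one token at a time
...     with torch.no_grad():
...         logits = mlm(masked.unsqueeze(0)).logits[0, i]
...     pll += torch.log_softmax(logits, dim=-1)[ids[i]].item()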

compute_perplexity(text: str) -> float

Compute perplexity based on pseudo-log-likelihood.

Parameters:

Name Type Description Default
text str

Text to compute perplexity for.

required

Returns:

Type Description
float

Perplexity of the text (positive value).

get_embedding(text: str) -> np.ndarray

Get embedding vector for text.

Uses the [CLS] token embedding from the last layer.

Parameters:

Name Type Description Default
text str

Text to embed.

required

Returns:

Type Description
ndarray

Embedding vector for the text.

compute_nli(premise: str, hypothesis: str) -> dict[str, float]

Compute natural language inference scores.

Not supported for masked language models.

Raises:

Type Description
NotImplementedError

Always raised, as MLMs don't support NLI directly.

HuggingFaceNLI

Bases: HuggingFaceAdapterMixin, ModelAdapter

Adapter for HuggingFace NLI models.

Supports NLI models trained on MNLI and similar datasets (e.g., "roberta-large-mnli", "microsoft/deberta-base-mnli").

Parameters:

Name Type Description Default
model_name str

HuggingFace model identifier for NLI model.

required
cache ModelOutputCache

Cache instance for storing model outputs.

required
device ('cpu', 'cuda', 'mps')

Device to run model on. Falls back to CPU if device unavailable.

"cpu"
model_version str

Version string for cache tracking.

'unknown'

Examples:

>>> from pathlib import Path
>>> from bead.items.cache import ModelOutputCache
>>> cache = ModelOutputCache(cache_dir=Path(".cache"))
>>> nli = HuggingFaceNLI("roberta-large-mnli", cache, device="cpu")
>>> scores = nli.compute_nli(
...     premise="Mary loves reading books.",
...     hypothesis="Mary enjoys literature."
... )
>>> label = nli.get_nli_label(
...     premise="Mary loves reading books.",
...     hypothesis="Mary enjoys literature."
... )

model: PreTrainedModel property

Get the model, loading if necessary.

tokenizer: PreTrainedTokenizerBase property

Get the tokenizer, loading if necessary.

compute_log_probability(text: str) -> float

Compute log probability of text.

Not supported for NLI models.

Raises:

Type Description
NotImplementedError

Always raised, as NLI models don't provide log probabilities.

compute_perplexity(text: str) -> float

Compute perplexity of text.

Not supported for NLI models.

Raises:

Type Description
NotImplementedError

Always raised, as NLI models don't provide perplexity.

get_embedding(text: str) -> np.ndarray

Get embedding vector for text.

Uses the model's encoder to get embeddings. Note that NLI models are typically fine-tuned for classification, so embeddings may not be optimal for general similarity tasks.

Parameters:

Name Type Description Default
text str

Text to embed.

required

Returns:

Type Description
ndarray

Embedding vector for the text.

compute_nli(premise: str, hypothesis: str) -> dict[str, float]

Compute natural language inference scores.

Parameters:

Name Type Description Default
premise str

Premise text.

required
hypothesis str

Hypothesis text.

required

Returns:

Type Description
dict[str, float]

Dictionary with keys "entailment", "neutral", "contradiction" mapping to probability scores that sum to ~1.0.

openai

OpenAI API adapter for item construction.

This module provides a ModelAdapter implementation for OpenAI's API, supporting GPT models for various NLP tasks including log probability computation, embeddings, and natural language inference via prompting.

OpenAIAdapter

Bases: ModelAdapter

Adapter for OpenAI API models.

Provides access to OpenAI's GPT models for language model operations, embeddings, and prompted natural language inference.

Parameters:

Name Type Description Default
model_name str

OpenAI model identifier (default: "gpt-3.5-turbo").

'gpt-3.5-turbo'
api_key str | None

OpenAI API key. If None, uses OPENAI_API_KEY environment variable.

None
cache ModelOutputCache | None

Cache for model outputs. If None, creates in-memory cache.

None
model_version str

Model version for cache tracking (default: "latest").

'latest'
embedding_model str

Model to use for embeddings (default: "text-embedding-ada-002").

'text-embedding-ada-002'

Attributes:

Name Type Description
model_name str

OpenAI model identifier (e.g., "gpt-3.5-turbo", "gpt-4").

client OpenAI

OpenAI API client.

embedding_model str

Model to use for embeddings (default: "text-embedding-ada-002").

Raises:

Type Description
ValueError

If no API key is provided and OPENAI_API_KEY is not set.

compute_log_probability(text: str) -> float

Compute log probability of text using OpenAI completions API.

Uses the completions API with logprobs to get token-level log probabilities and sums them to get the total log probability.

Parameters:

Name Type Description Default
text str

Text to compute log probability for.

required

Returns:

Type Description
float

Log probability of the text (sum of token log probabilities).

compute_perplexity(text: str) -> float

Compute perplexity of text.

Perplexity is computed as exp(-log_prob / num_tokens).

Parameters:

Name Type Description Default
text str

Text to compute perplexity for.

required

Returns:

Type Description
float

Perplexity of the text (must be positive).

get_embedding(text: str) -> np.ndarray

Get embedding vector for text using OpenAI embeddings API.

Parameters:

Name Type Description Default
text str

Text to embed.

required

Returns:

Type Description
ndarray

Embedding vector for the text.

compute_nli(premise: str, hypothesis: str) -> dict[str, float]

Compute natural language inference scores via prompting.

Uses chat completions API with a prompt to classify the relationship between premise and hypothesis.

Parameters:

Name Type Description Default
premise str

Premise text.

required
hypothesis str

Hypothesis text.

required

Returns:

Type Description
dict[str, float]

Dictionary with keys "entailment", "neutral", "contradiction" mapping to probability scores.

anthropic

Anthropic API adapter for item construction.

This module provides a ModelAdapter implementation for Anthropic's Claude API, supporting natural language inference via prompting. Note that Claude API does not provide direct access to log probabilities or embeddings.

AnthropicAdapter

Bases: ModelAdapter

Adapter for Anthropic Claude API models.

Provides access to Claude models for prompted natural language inference. Note that Claude API does not support log probability computation or embeddings, so those methods will raise NotImplementedError.

Parameters:

Name Type Description Default
model_name str

Claude model identifier (default: "claude-3-5-sonnet-20241022").

'claude-3-5-sonnet-20241022'
api_key str | None

Anthropic API key. If None, uses ANTHROPIC_API_KEY environment variable.

None
cache ModelOutputCache | None

Cache for model outputs. If None, creates in-memory cache.

None
model_version str

Model version for cache tracking (default: "latest").

'latest'

Attributes:

Name Type Description
model_name str

Claude model identifier (e.g., "claude-3-5-sonnet-20241022").

client Anthropic

Anthropic API client.

Raises:

Type Description
ValueError

If no API key is provided and ANTHROPIC_API_KEY is not set.

compute_log_probability(text: str) -> float

Compute log probability of text.

Not supported by Anthropic API.

Raises:

Type Description
NotImplementedError

Always raised - Claude API does not provide log probabilities.

compute_perplexity(text: str) -> float

Compute perplexity of text.

Not supported by Anthropic API (requires log probabilities).

Raises:

Type Description
NotImplementedError

Always raised - requires log probability support.

get_embedding(text: str) -> np.ndarray

Get embedding vector for text.

Not supported by Anthropic API.

Raises:

Type Description
NotImplementedError

Always raised - Claude API does not provide embeddings.

compute_nli(premise: str, hypothesis: str) -> dict[str, float]

Compute natural language inference scores via prompting.

Uses Claude's messages API with a prompt to classify the relationship between premise and hypothesis.

Parameters:

Name Type Description Default
premise str

Premise text.

required
hypothesis str

Hypothesis text.

required

Returns:

Type Description
dict[str, float]

Dictionary with keys "entailment", "neutral", "contradiction" mapping to probability scores.

google

Google Generative AI adapter for item construction.

This module provides a ModelAdapter implementation for Google's Generative AI models (Gemini), supporting natural language inference via prompting and embeddings. Note that Gemini API does not provide direct access to log probabilities.

GoogleAdapter

Bases: ModelAdapter

Adapter for Google Generative AI models (Gemini).

Provides access to Gemini models for natural language inference and embeddings. Note that Gemini API does not support log probability computation.

Parameters:

Name Type Description Default
model_name str

Gemini model identifier (default: "gemini-pro").

'gemini-pro'
api_key str | None

Google API key. If None, uses GOOGLE_API_KEY environment variable.

None
cache ModelOutputCache | None

Cache for model outputs. If None, creates in-memory cache.

None
model_version str

Model version for cache tracking (default: "latest").

'latest'
embedding_model str

Model to use for embeddings (default: "models/embedding-001").

'models/embedding-001'

Attributes:

Name Type Description
model_name str

Gemini model identifier (e.g., "gemini-pro").

model GenerativeModel

Google Generative AI model instance.

embedding_model str

Model to use for embeddings (default: "models/embedding-001").

Raises:

Type Description
ValueError

If no API key is provided and GOOGLE_API_KEY is not set.

compute_log_probability(text: str) -> float

Compute log probability of text.

Not supported by Google Generative AI API.

Raises:

Type Description
NotImplementedError

Always raised - Gemini API does not provide log probabilities.

compute_perplexity(text: str) -> float

Compute perplexity of text.

Not supported by Google Generative AI API (requires log probabilities).

Raises:

Type Description
NotImplementedError

Always raised - requires log probability support.

get_embedding(text: str) -> np.ndarray

Get embedding vector for text using Google's embedding model.

Parameters:

Name Type Description Default
text str

Text to embed.

required

Returns:

Type Description
ndarray

Embedding vector for the text.

compute_nli(premise: str, hypothesis: str) -> dict[str, float]

Compute natural language inference scores via prompting.

Uses Gemini's generation API with a prompt to classify the relationship between premise and hypothesis.

Parameters:

Name Type Description Default
premise str

Premise text.

required
hypothesis str

Hypothesis text.

required

Returns:

Type Description
dict[str, float]

Dictionary with keys "entailment", "neutral", "contradiction" mapping to probability scores.

togetherai

Together AI adapter for item construction.

This module provides a ModelAdapter implementation for Together AI's API, which provides access to various open-source models. Together AI uses an OpenAI-compatible API, so we use the OpenAI client with a custom base URL.

TogetherAIAdapter

Bases: ModelAdapter

Adapter for Together AI models.

Together AI provides access to various open-source models through an OpenAI-compatible API. This adapter uses the OpenAI client with a custom base URL.

Parameters:

Name Type Description Default
model_name str

Together AI model identifier (default: "meta-llama/Llama-3-70b-chat-hf").

'meta-llama/Llama-3-70b-chat-hf'
api_key str | None

Together AI API key. If None, uses TOGETHER_API_KEY environment variable.

None
cache ModelOutputCache | None

Cache for model outputs. If None, creates in-memory cache.

None
model_version str

Model version for cache tracking (default: "latest").

'latest'

Attributes:

Name Type Description
model_name str

Together AI model identifier (e.g., "meta-llama/Llama-3-70b-chat-hf").

client OpenAI

OpenAI-compatible client configured for Together AI.

Raises:

Type Description
ValueError

If no API key is provided and TOGETHER_API_KEY is not set.

compute_log_probability(text: str) -> float

Compute log probability of text using Together AI API.

Uses the completions API with logprobs to get token-level log probabilities and sums them to get the total log probability.

Parameters:

Name Type Description Default
text str

Text to compute log probability for.

required

Returns:

Type Description
float

Log probability of the text (sum of token log probabilities).

compute_perplexity(text: str) -> float

Compute perplexity of text.

Perplexity is computed as exp(-log_prob / num_tokens).

Parameters:

Name Type Description Default
text str

Text to compute perplexity for.

required

Returns:

Type Description
float

Perplexity of the text (must be positive).

Raises:

Type Description
NotImplementedError

If log probability computation is not supported.

get_embedding(text: str) -> np.ndarray

Get embedding vector for text.

Not supported by Together AI (no embedding-specific models).

Raises:

Type Description
NotImplementedError

Always raised - Together AI does not provide embeddings.

compute_nli(premise: str, hypothesis: str) -> dict[str, float]

Compute natural language inference scores via prompting.

Uses chat completions API with a prompt to classify the relationship between premise and hypothesis.

Parameters:

Name Type Description Default
premise str

Premise text.

required
hypothesis str

Hypothesis text.

required

Returns:

Type Description
dict[str, float]

Dictionary with keys "entailment", "neutral", "contradiction" mapping to probability scores.

sentence_transformers

Sentence transformer adapter for semantic embeddings.

This module provides an adapter for sentence-transformers models, which are optimized for generating sentence embeddings for semantic similarity tasks.

HuggingFaceSentenceTransformer

Bases: ModelAdapter

Adapter for sentence-transformers models.

Supports sentence-transformers models like "all-MiniLM-L6-v2", "all-mpnet-base-v2", etc. These models are optimized for generating sentence embeddings for semantic similarity tasks.

Parameters:

Name Type Description Default
model_name str

Sentence transformer model identifier.

required
cache ModelOutputCache

Cache instance for storing model outputs.

required
device str | None

Device to run model on. If None, uses sentence-transformers default.

None
model_version str

Version string for cache tracking.

'unknown'
normalize_embeddings bool

Whether to normalize embeddings to unit length.

True

Examples:

>>> from pathlib import Path
>>> from bead.items.cache import ModelOutputCache
>>> cache = ModelOutputCache(cache_dir=Path(".cache"))
>>> model = HuggingFaceSentenceTransformer("all-MiniLM-L6-v2", cache)
>>> embedding = model.get_embedding("The cat sat on the mat.")
>>> similarity = model.compute_similarity("The cat sat.", "The dog stood.")

model: SentenceTransformer property

Get the model, loading if necessary.

compute_log_probability(text: str) -> float

Compute log probability of text.

Not supported for sentence transformer models.

Raises:

Type Description
NotImplementedError

Always raised, as sentence transformers don't provide log probabilities.

compute_perplexity(text: str) -> float

Compute perplexity of text.

Not supported for sentence transformer models.

Raises:

Type Description
NotImplementedError

Always raised, as sentence transformers don't provide perplexity.

get_embedding(text: str) -> np.ndarray

Get embedding vector for text.

Uses sentence-transformers encode() method to generate optimized sentence embeddings.

Parameters:

Name Type Description Default
text str

Text to embed.

required

Returns:

Type Description
ndarray

Embedding vector for the text.

compute_nli(premise: str, hypothesis: str) -> dict[str, float]

Compute natural language inference scores.

Not supported for sentence transformer models.

Raises:

Type Description
NotImplementedError

Always raised, as sentence transformers don't support NLI directly.

compute_similarity(text1: str, text2: str) -> float

Compute similarity between two texts.

Uses cosine similarity of embeddings. For sentence transformers, this is optimized as embeddings are already normalized (if normalize_embeddings=True).

Parameters:

Name Type Description Default
text1 str

First text.

required
text2 str

Second text.

required

Returns:

Type Description
float

Similarity score in [-1, 1] (cosine similarity).

registry

Model adapter registry for centralized adapter management.

This module provides a registry for managing all model adapters, both local (HuggingFace) and API-based (OpenAI, Anthropic, etc.).

AdapterKwargs

Bases: TypedDict

Keyword arguments for adapter initialization.

ModelAdapterRegistry

Registry for all model adapters (local and API-based).

Provides centralized management of adapter types and instances, with automatic instance caching to avoid redundant initialization.

Attributes:

Name Type Description
adapters dict[str, type[ModelAdapter]]

Registered adapter classes keyed by adapter type name.

instances dict[str, ModelAdapter]

Cached adapter instances keyed by unique identifier.

register(name: str, adapter_class: type[ModelAdapter]) -> None

Register an adapter class.

Parameters:

Name Type Description Default
name str

Unique name for the adapter type (e.g., "openai", "huggingface_lm").

required
adapter_class type[ModelAdapter]

Adapter class to register (must inherit from ModelAdapter).

required

Raises:

Type Description
ValueError

If adapter class does not inherit from ModelAdapter.

get_adapter(adapter_type: str, model_name: str, **kwargs: Unpack[AdapterKwargs]) -> ModelAdapter

Get or create adapter instance (with caching).

Creates a new adapter instance if not cached, otherwise returns the cached instance. Instances are cached by adapter type and model name.

Parameters:

Name Type Description Default
adapter_type str

Type of adapter (must be registered).

required
model_name str

Model identifier for the adapter.

required
**kwargs Unpack[AdapterKwargs]

Additional keyword arguments to pass to adapter constructor (api_key, device, model_version, embedding_model, etc.).

{}

Returns:

Type Description
ModelAdapter

Adapter instance (cached or newly created).

Raises:

Type Description
ValueError

If adapter type is not registered.

Examples:

>>> registry = ModelAdapterRegistry()
>>> registry.register("openai", OpenAIAdapter)
>>> adapter = registry.get_adapter("openai", "gpt-4", api_key="...")
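
Repeated requests with the same adapter type and model name return the cached instance, as documented above:

>>> adapter2 = registry.get_adapter("openai", "gpt-4", api_key="...")
>>> adapter is adapter2
True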

clear_cache() -> None

Clear all cached adapter instances.

Useful for testing or when you want to force recreation of adapters with different parameters.

list_adapters() -> list[str]

List all registered adapter types.

Returns:

Type Description
list[str]

List of registered adapter type names.

api_utils

Utilities for API-based model adapters.

This module provides shared utilities for API-based model adapters, including retry logic with exponential backoff and rate limiting.

RateLimiter

Rate limiter for API calls.

Tracks call timestamps and enforces a maximum rate of calls per minute. Uses a sliding window algorithm to ensure the rate limit is respected.

Parameters:

Name Type Description Default
calls_per_minute int

Maximum number of calls allowed per minute (default: 60).

60

Attributes:

Name Type Description
calls_per_minute int

Maximum number of calls allowed per minute.

call_times list[float]

Timestamps of recent API calls.

wait_if_needed() -> None

Wait if rate limit would be exceeded.

Checks if making a call now would exceed the rate limit. If so, sleeps until enough time has passed.
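
Examples:

A minimal sketch of the sliding-window idea (illustrative only, not the library's implementation):

>>> import time
>>> def wait_if_needed(call_times, calls_per_minute):
...     now = time.monotonic()
...     # Keep only timestamps within the last 60-second window
...     call_times[:] = [t for t in call_times if now - t < 60.0]
...     if len(call_times) >= calls_per_minute:
...         time.sleep(60.0 - (now - call_times[0]))  # wait for oldest call to expire
...     call_times.append(time.monotonic())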

retry_with_backoff(max_retries: int = 3, initial_delay: float = 1.0, backoff_factor: float = 2.0, exceptions: tuple[type[Exception], ...] = (Exception,)) -> Callable[[Callable[..., T]], Callable[..., T]]

Decorate function with retry logic and exponential backoff.

Retries a function call on specified exceptions with exponential backoff between attempts. The delay between retries grows exponentially: delay = initial_delay * (backoff_factor ** attempt).

Parameters:

Name Type Description Default
max_retries int

Maximum number of retry attempts (default: 3).

3
initial_delay float

Initial delay in seconds before first retry (default: 1.0).

1.0
backoff_factor float

Multiplicative factor for delay between retries (default: 2.0).

2.0
exceptions tuple[type[Exception], ...]

Tuple of exception types to catch and retry on (default: (Exception,)).

(Exception,)

Returns:

Type Description
Callable

Decorated function with retry logic.

Examples:

>>> @retry_with_backoff(max_retries=3, initial_delay=1.0)
... def call_api():
...     # May raise transient errors
...     return api.get_data()
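
With the defaults above, the waits between successive retries grow as initial_delay * backoff_factor ** attempt:

>>> [1.0 * 2.0 ** attempt for attempt in range(3)]
[1.0, 2.0, 4.0]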

rate_limit(calls_per_minute: int = 60) -> Callable[[Callable[P, T]], Callable[P, T]]

Decorate function with rate limiting for API calls.

Enforces a maximum rate of API calls per minute using a shared RateLimiter instance. Calls that would exceed the rate limit will block until the limit resets.

Parameters:

Name Type Description Default
calls_per_minute int

Maximum number of calls allowed per minute (default: 60).

60

Returns:

Type Description
Callable

Decorated function with rate limiting.

Examples:

>>> @rate_limit(calls_per_minute=30)
... def call_api():
...     return api.get_data()