bead.resources

Stage 1 of the bead pipeline: lexical items, templates, and constraints.

Lexical Items and Lexicons

lexical_item

Lexical item models for words and multi-word expressions.

Lexical items are the atomic units that fill template slots during sentence generation. The module covers single words, multi-word expressions (MWEs), and the components that make up an MWE.

LexicalItem

Bases: BeadBaseModel

A lexical item with linguistic features.

Follows the UniMorph structure of lemma, surface form, and feature bundle.

Attributes:

Name Type Description
lemma str

Base / citation form (e.g. "walk", "the").

form str | None

Inflected surface form. None means the form equals the lemma.

language_code LanguageCode

ISO 639-3 language code.

features dict[str, JsonValue]

Feature bundle (POS, morphological features, lexical-resource information).

source str | None

Provenance (e.g. "VerbNet", "UniMorph", "manual").

MWEComponent

Bases: LexicalItem

A component of a multi-word expression.

Attributes:

Name Type Description
role str

Role within the MWE (e.g. "verb", "particle").

required bool

Whether the component must be present.

constraints tuple[Constraint, ...]

Component-specific constraints (in addition to base LexicalItem constraints).

MultiWordExpression

Bases: LexicalItem

Multi-word expression as a lexical item.

Attributes:

Name Type Description
components tuple[MWEComponent, ...]

Component lexical items that make up the MWE.

separable bool

Whether components can be separated by intervening words.

adjacency_pattern str | None

DSL expression defining valid adjacency patterns. Available variables are the component roles plus the distance between components.

lexicon

Lexicon management for collections of lexical items.

Provides the Lexicon class for managing, querying, and manipulating collections of lexical items. Supports filtering, searching, merging, and conversion to and from pandas / polars DataFrames.

Lexicon

Bases: BeadBaseModel

A collection of lexical items keyed by their UUIDs.

Items are stored as a tuple; by_id provides O(n) lookup. Mutating methods (with_item, without_item, with_items) return new instances.

Attributes:

Name Type Description
name str

Name of the lexicon.

description str | None

Optional description.

language_code LanguageCode | None

ISO 639-1 or ISO 639-3 language code.

items tuple[LexicalItem, ...]

Items in insertion order.

tags tuple[str, ...]

Categorization tags.

__len__() -> int

Return the number of items in the lexicon.

__iter__() -> Iterator[LexicalItem]

Iterate over the lexicon's items.

__contains__(item_id: UUID) -> bool

Return whether item_id is present.

by_id(item_id: UUID) -> LexicalItem | None

Return the item with the matching UUID, or None.

with_item(item: LexicalItem) -> Self

Return a new lexicon with item appended.

Raises:

Type Description
ValueError

If an item with the same id already exists.

with_items(items: tuple[LexicalItem, ...] | list[LexicalItem]) -> Self

Return a new lexicon with each of items appended.

without_item(item_id: UUID) -> tuple[Self, LexicalItem]

Return (new_lexicon, removed_item) with item_id removed.

Raises:

Type Description
KeyError

If item_id is not present.

filter(predicate: Callable[[LexicalItem], bool]) -> Self

Return a new lexicon containing only items satisfying predicate.

filter_by_pos(pos: str) -> Self

Return items whose features['pos'] equals pos.

filter_by_lemma(lemma: str) -> Self

Return items whose lemma equals lemma.

filter_by_feature(feature_name: str, feature_value: JsonValue) -> Self

Return items whose feature equals feature_value.

filter_by_attribute(attr_name: str, attr_value: JsonValue) -> Self

Alias for filter_by_feature.

search(query: str, field: str = 'lemma') -> Self

Return a new lexicon with case-insensitive substring matches on field.

Parameters:

Name Type Description Default
query str

Substring to look for.

required
field str

One of "lemma", "pos", or "form".

'lemma'

Raises:

Type Description
ValueError

If field is not one of the supported names.

merge(other: Lexicon, strategy: Literal['keep_first', 'keep_second', 'error'] = 'keep_first') -> Lexicon

Combine self and other into a new lexicon.

Parameters:

Name Type Description Default
other Lexicon

Lexicon to merge into self.

required
strategy Literal['keep_first', 'keep_second', 'error']

Conflict policy when items share an id.

'keep_first'

Raises:

Type Description
ValueError

If strategy="error" and any duplicate ids are present.
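The three conflict policies can be sketched over plain dicts keyed by item id (a simplified stand-in for Lexicon instances; the real method returns a new Lexicon):

```python
# Sketch of the merge conflict policies, using plain dicts keyed by
# item id in place of real Lexicon objects.

def merge_by_id(first: dict, second: dict, strategy: str = "keep_first") -> dict:
    duplicates = first.keys() & second.keys()
    if strategy == "error" and duplicates:
        raise ValueError(f"duplicate ids: {sorted(duplicates)}")
    if strategy == "keep_first":
        # Items from `first` win on conflict.
        return {**second, **first}
    if strategy == "keep_second":
        # Items from `second` win on conflict.
        return {**first, **second}
    raise ValueError(f"unknown strategy: {strategy}")

a = {1: "walk", 2: "run"}
b = {2: "sprint", 3: "jump"}

assert merge_by_id(a, b, "keep_first")[2] == "run"
assert merge_by_id(a, b, "keep_second")[2] == "sprint"
```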

to_dataframe(backend: Literal['pandas', 'polars'] = 'pandas') -> DataFrame

Render the lexicon as a pandas or polars DataFrame.

Columns include id, lemma, form, language_code, source, created_at, modified_at, plus a feature_<name> column for every feature key seen across all items.

from_dataframe(df: DataFrame, name: str) -> Lexicon classmethod

Build a lexicon from a pandas or polars DataFrame.

The DataFrame must have a lemma column. Columns named pos, feature_<name>, or attr_<name> populate each item's features dict; language_code, form, and source populate the corresponding fields.
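The column conventions can be sketched per row with plain dicts (a simplified model of the real classmethod, which builds LexicalItem objects):

```python
# Sketch of the from_dataframe column conventions, one row at a time.

def row_to_item(row: dict) -> dict:
    if "lemma" not in row:
        raise ValueError("a 'lemma' column is required")
    item = {"lemma": row["lemma"], "features": {}}
    for col, value in row.items():
        if col == "pos":
            item["features"]["pos"] = value
        elif col.startswith(("feature_", "attr_")):
            # Strip the prefix to recover the feature name.
            name = col.split("_", 1)[1]
            item["features"][name] = value
        elif col in ("language_code", "form", "source"):
            item[col] = value
    return item

item = row_to_item({"lemma": "walk", "pos": "VERB", "feature_number": "sg"})
assert item["features"] == {"pos": "VERB", "number": "sg"}
```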

to_jsonl(path: str) -> None

Write the lexicon as JSONLines, one LexicalItem per line.

from_jsonl(path: str, name: str) -> Lexicon classmethod

Read a JSONLines file and return a new lexicon.
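The JSONLines format is one JSON object per line; a minimal round-trip with the standard library (using plain dicts in place of LexicalItem models) looks like:

```python
import json
import tempfile
from pathlib import Path

items = [
    {"lemma": "walk", "form": None, "language_code": "eng"},
    {"lemma": "the", "form": None, "language_code": "eng"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "lexicon.jsonl"
    # Write: one JSON object per line.
    path.write_text("\n".join(json.dumps(item) for item in items))
    # Read: parse each non-empty line back into a dict.
    loaded = [json.loads(line) for line in path.read_text().splitlines() if line]

assert loaded == items
```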

Templates and Collections

template

Template and structure models for sentence generation.

Templates contain slots that are filled with lexical items during sentence generation. Templates may be combined into sequences or hierarchical trees.

Slot

Bases: BeadBaseModel

A slot in a template that can be filled with a lexical item.

Attributes:

Name Type Description
name str

Unique name for the slot within the template.

description str | None

Human-readable description.

constraints tuple[Constraint, ...]

Constraints that determine valid fillers.

required bool

Whether the slot must be filled.

default_value str | None

Default string used if the slot is not filled.

Template

Bases: BeadBaseModel

A sentence template with named slots.

Attributes:

Name Type Description
name str

Unique template name.

template_string str

Template body with {slot_name} placeholders.

slots dict[str, Slot]

Slot definitions keyed by slot name.

constraints tuple[Constraint, ...]

Multi-slot constraints (slot names appear as DSL variables).

description str | None

Human-readable description.

language_code LanguageCode | None

ISO 639-1 or 639-3 language code.

tags tuple[str, ...]

Categorization tags.

metadata dict[str, JsonValue]

Additional metadata.

required_slot_names: frozenset[str] property

Names of all slots flagged as required.

fill_with_values(slot_values: dict[str, str], strategy_name: str = 'manual') -> FilledTemplate

Build a FilledTemplate from a mapping of slot names to strings.

Each slot value becomes a minimal LexicalItem whose lemma is the supplied string.
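The placeholder convention matches Python's str.format syntax, so a manual fill reduces to (template and values here are illustrative):

```python
# A template body with {slot_name} placeholders, filled from a
# slot-name -> string mapping as fill_with_values does.
template_string = "The {subject} {verb} the {object}."
slot_values = {"subject": "dog", "verb": "chases", "object": "ball"}

filled = template_string.format(**slot_values)
assert filled == "The dog chases the ball."
```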

TemplateSequence

Bases: BeadBaseModel

A sequence of templates to be filled together.

Attributes:

Name Type Description
name str

Unique name for the sequence.

templates tuple[Template, ...]

Ordered list of templates.

constraints tuple[Constraint, ...]

Cross-template constraints.

TemplateTree

Bases: BeadBaseModel

A tree of templates, used to model discourse structure.

Attributes:

Name Type Description
name str

Unique tree name.

root Template

Root template.

children tuple[TemplateTree, ...]

Child subtrees.

slots_match_template(template: Template) -> None

Raise ValueError if template's slot dict and string disagree.

Validates that every {slot_name} placeholder has a matching entry in slots, no extraneous slots are defined, and each slot's name matches its dict key.
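The validation logic can be sketched with the standard library's placeholder parser (a simplified model; the real validator also checks each slot's name against its dict key):

```python
from string import Formatter

def check_slots(template_string: str, slot_names: set[str]) -> None:
    """Raise ValueError if placeholders and slot definitions disagree."""
    placeholders = {
        field for _, field, _, _ in Formatter().parse(template_string)
        if field is not None
    }
    if missing := placeholders - slot_names:
        raise ValueError(f"placeholders without slot definitions: {missing}")
    if extra := slot_names - placeholders:
        raise ValueError(f"slots without placeholders: {extra}")

check_slots("The {subject} {verb}.", {"subject", "verb"})  # passes silently
```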

template_collection

Template collection management.

The TemplateCollection class manages collections of sentence templates.

TemplateCollection

Bases: BeadBaseModel

A collection of templates supporting filtering, search, and merging.

Templates are stored as a tuple in insertion order; mutating methods (with_template, without_template, with_templates) return new instances.

Attributes:

Name Type Description
name str

Collection name.

description str | None

Optional description.

language_code str | None

ISO 639-1 or 639-3 language code.

templates tuple[Template, ...]

Templates in insertion order.

tags tuple[str, ...]

Categorization tags.

__len__() -> int

Return the number of templates in the collection.

__iter__() -> Iterator[Template]

Iterate over the templates.

__contains__(template_id: UUID) -> bool

Return whether a template with template_id is present.

by_id(template_id: UUID) -> Template | None

Return the template with the matching id, or None.

with_template(template: Template) -> Self

Return a new collection with template appended.

with_templates(templates: tuple[Template, ...] | list[Template]) -> Self

Return a new collection with each template appended.

without_template(template_id: UUID) -> tuple[Self, Template]

Return (new_collection, removed_template).

filter(predicate: Callable[[Template], bool]) -> Self

Return a new collection containing only templates matching predicate.

filter_by_tag(tag: str) -> Self

Return a new collection of templates carrying tag.

filter_by_slot_count(count: int) -> Self

Return a new collection of templates with exactly count slots.

search(query: str, field: str = 'name') -> Self

Return a new collection of templates matching query in field.

Parameters:

Name Type Description Default
query str

Substring to search for (case-insensitive).

required
field str

One of "name" or "template_string".

'name'

Raises:

Type Description
ValueError

If field is not a supported name.

merge(other: TemplateCollection, strategy: Literal['keep_first', 'keep_second', 'error'] = 'keep_first') -> TemplateCollection

Combine self and other into a new collection.

Parameters:

Name Type Description Default
other TemplateCollection

Collection to merge with.

required
strategy Literal['keep_first', 'keep_second', 'error']

Conflict policy when templates share an id.

'keep_first'

Raises:

Type Description
ValueError

If strategy="error" and any duplicate ids are present.

to_dataframe(backend: Literal['pandas', 'polars'] = 'pandas') -> DataFrame

Render the collection as a pandas or polars DataFrame.

Columns are id, name, template_string, description, slot_count, slot_names (comma-joined), tags, created_at, modified_at.

from_dataframe(df: DataFrame, name: str) -> TemplateCollection classmethod

Build an empty collection bound to name.

DataFrame ingestion of Template objects requires their slot definitions, which are not present in tabular form. Use from_jsonl for full deserialization.

to_jsonl(path: str) -> None

Write the collection as JSONLines, one Template per line.

from_jsonl(path: str, name: str) -> TemplateCollection classmethod

Read a JSONLines file and return a collection.

Constraints

constraints

Constraint models for lexical item selection.

Universal constraint model based on DSL expressions. Each constraint is a DSL expression with optional context variables; scope is determined by storage location:

  • Slot.constraints — single-slot constraints (self = slot filler).
  • Template.constraints — multi-slot constraints (slot names as variables).
  • TemplateSequence.constraints — cross-template constraints.

Constraint

Bases: BeadBaseModel

Universal constraint expressed via a DSL expression.

Attributes:

Name Type Description
expression str

DSL expression (must return a boolean) evaluated against the context.

context dict[str, ContextValue]

Context variables available during evaluation. Values are JSON-shaped (scalars, lists, dicts); the DSL evaluator coerces list values into sets when the surrounding expression uses them as a membership test.

description str | None

Optional human-readable description.

combine(*constraints: Constraint, logic: str = 'and') -> Constraint classmethod

Combine multiple constraints into one with AND or OR logic.

Parameters:

Name Type Description Default
*constraints Constraint

Constraints to combine.

()
logic str

"and" or "or".

'and'

Returns:

Type Description
Constraint

New constraint with combined expressions and merged contexts.

Raises:

Type Description
ValueError

If no constraints are provided or logic is invalid.
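At the expression level, combination reduces to parenthesizing and joining (a sketch; the real classmethod also merges the constraints' context dicts):

```python
def combine_expressions(*expressions: str, logic: str = "and") -> str:
    if not expressions:
        raise ValueError("at least one expression is required")
    if logic not in ("and", "or"):
        raise ValueError(f"invalid logic: {logic}")
    # Parenthesize each sub-expression so operator precedence is preserved.
    return f" {logic} ".join(f"({e})" for e in expressions)

expr = combine_expressions("x.number == 'sg'", "y.pos == 'NOUN'", logic="and")
assert expr == "(x.number == 'sg') and (y.pos == 'NOUN')"
```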

constraint_builders

Abstract base classes for programmatic constraint generation.

This module provides language-agnostic base classes for building constraints programmatically. Language-specific implementations should extend these bases.

ConstraintBuilder

Bases: ABC

Abstract base class for programmatic constraint generation.

Constraint builders encapsulate logic for generating DSL constraints based on configuration and rules. Subclasses implement specific constraint generation strategies.

Examples:

>>> class NumberAgreementBuilder(ConstraintBuilder):
...     def build(self, *slot_names: str) -> Constraint:
...         # Generate number agreement constraint
...         pairs = []
...         for i, slot1 in enumerate(slot_names):
...             for slot2 in slot_names[i+1:]:
...                 pairs.append(f"{slot1}.number == {slot2}.number")
...         return Constraint(
...             expression=" and ".join(pairs),
...             description=f"Number agreement: {', '.join(slot_names)}"
...         )

build(*args: Any, **kwargs: Any) -> Constraint abstractmethod

Build a Constraint object.

Parameters:

Name Type Description Default
*args Any

Positional arguments (slot names, properties, etc.).

()
**kwargs Any

Keyword arguments (configuration options).

{}

Returns:

Type Description
Constraint

Generated constraint.

AgreementConstraintBuilder

Bases: ConstraintBuilder

Builder for feature agreement constraints.

Generates constraints that enforce feature agreement across slots (e.g., number, gender, case). Supports exact matching or equivalence classes via agreement rules.

Parameters:

Name Type Description Default
feature_name str

Name of the feature to enforce agreement on (e.g., "number", "gender").

required
agreement_rules dict[str, list[str]] | None

Optional equivalence classes mapping a canonical value to its equivalent values, e.g. {"singular": ["sing", "sg"], "plural": ["pl"]}.

None

Examples:

Exact number agreement:

>>> builder = AgreementConstraintBuilder("number")
>>> constraint = builder.build("subject", "verb")
>>> expr = "subject.features.get('number') == verb.features.get('number')"
>>> expr in constraint.expression
True

Agreement with equivalence rules:

>>> rules = {"singular": ["sing", "sg"], "plural": ["pl"]}
>>> builder = AgreementConstraintBuilder("number", agreement_rules=rules)
>>> constraint = builder.build("det", "noun")
>>> "equiv_" in constraint.expression  # Uses equivalence class checks
True

build(*slot_names: str) -> Constraint

Build agreement constraint for given slots.

Parameters:

Name Type Description Default
*slot_names str

Names of slots to enforce agreement between (≥2 required).

()

Returns:

Type Description
Constraint

Agreement constraint.

Raises:

Type Description
ValueError

If fewer than 2 slot names provided.

ConditionalConstraintBuilder

Bases: ConstraintBuilder

Builder for IF-THEN (conditional) constraints.

Generates constraints that enforce requirements when conditions are met. Implements logical implication: IF condition THEN requirement.

Examples:

>>> builder = ConditionalConstraintBuilder()
>>> constraint = builder.build(
...     condition="det.lemma == 'a'",
...     requirement="noun.features.get('number') == 'singular'",
...     description="'a' requires singular noun"
... )
>>> "not (" in constraint.expression  # IF-THEN encoded as: not cond or req
True

build(*, condition: str, requirement: str, description: str | None = None, context: dict[str, Any] | None = None) -> Constraint

Build conditional constraint.

Parameters:

Name Type Description Default
condition str

Condition expression (IF part).

required
requirement str

Requirement expression (THEN part).

required
description str | None

Human-readable description.

None
context dict[str, Any] | None

Context variables for evaluation.

None

Returns:

Type Description
Constraint

Conditional constraint.

Notes

Logical implication (IF A THEN B) is encoded as: (NOT A) OR B

SetMembershipConstraintBuilder

Bases: ConstraintBuilder

Builder for whitelist/blacklist constraints.

Generates constraints that restrict slot properties to allowed values (whitelist) or exclude forbidden values (blacklist).

Parameters:

Name Type Description Default
slot_name str

Name of slot to constrain.

required
property_path str

Dot-separated path to property (e.g., "lemma", "features.number").

required
allowed_values set | None

Whitelist of allowed values (mutually exclusive with forbidden_values).

required
forbidden_values set | None

Blacklist of forbidden values.

required
description str | None

Custom description.

required

Examples:

Whitelist constraint:

>>> builder = SetMembershipConstraintBuilder()
>>> constraint = builder.build(
...     slot_name="verb",
...     property_path="lemma",
...     allowed_values={"walk", "run", "jump"},
...     description="Motion verbs only"
... )
>>> "verb.lemma in allowed_values" in constraint.expression
True

Blacklist constraint:

>>> constraint = builder.build(
...     slot_name="verb",
...     property_path="lemma",
...     forbidden_values={"be", "have"},
...     description="Exclude copula and auxiliary"
... )
>>> "verb.lemma not in forbidden_values" in constraint.expression
True

build(*, slot_name: str, property_path: str, allowed_values: set[str] | None = None, forbidden_values: set[str] | None = None, description: str | None = None) -> Constraint

Build set membership constraint.

Parameters:

Name Type Description Default
slot_name str

Slot to constrain.

required
property_path str

Property path within slot.

required
allowed_values set | None

Whitelist of allowed values.

None
forbidden_values set | None

Blacklist of forbidden values.

None
description str | None

Constraint description.

None

Returns:

Type Description
Constraint

Set membership constraint.

Raises:

Type Description
ValueError

If neither or both of allowed_values/forbidden_values provided.

Resource Loading

loaders

Lexicon loading utilities for various data formats.

This module provides class methods for loading Lexicon objects from various data formats (CSV, TSV) with flexible column mapping.

from_csv(path: str | Path, name: str, *, language_code: LanguageCode, column_mapping: dict[str, str] | None = None, feature_columns: list[str] | None = None, pos: str | None = None, description: str | None = None, **csv_kwargs: Any) -> Lexicon

Load lexicon from CSV file with flexible column mapping.

Parameters:

Name Type Description Default
path str | Path

Path to the CSV file.

required
name str

Name for the lexicon.

required
language_code LanguageCode

ISO 639-3 language code for all items.

required
column_mapping dict[str, str] | None

Mapping from CSV column names to feature names. Example: {"word": "lemma"}

None
feature_columns list[str] | None

CSV column names to include in features dict. Example: ["number", "tense", "countability", "semantic_class"]

None
pos str | None

Part-of-speech tag to assign to all items (e.g., "NOUN", "VERB"). Will be added to features dict as "pos".

None
description str | None

Optional description of the lexicon.

None
**csv_kwargs Any

Additional keyword arguments passed to pandas.read_csv().

{}

Returns:

Type Description
Lexicon

New lexicon loaded from CSV.

Raises:

Type Description
ValueError

If required "lemma" column/mapping is missing.

FileNotFoundError

If CSV file does not exist.

Examples:

>>> lexicon = from_csv(
...     "bleached_nouns.csv",
...     "nouns",
...     language_code="eng",
...     column_mapping={"word": "lemma"},
...     feature_columns=["number", "countability", "semantic_class"],
...     pos="NOUN"
... )

from_tsv(path: str | Path, name: str, *, language_code: LanguageCode, column_mapping: dict[str, str] | None = None, feature_columns: list[str] | None = None, pos: str | None = None, description: str | None = None, **tsv_kwargs: Any) -> Lexicon

Load lexicon from TSV file with flexible column mapping.

This is a convenience wrapper around from_csv() that sets sep="\t".

Parameters:

Name Type Description Default
path str | Path

Path to the TSV file.

required
name str

Name for the lexicon.

required
language_code LanguageCode

ISO 639-3 language code for all items.

required
column_mapping dict[str, str] | None

Mapping from TSV column names to feature names.

None
feature_columns list[str] | None

TSV column names to include in features dict.

None
pos str | None

Part-of-speech tag to assign to all items.

None
description str | None

Optional description of the lexicon.

None
**tsv_kwargs Any

Additional keyword arguments passed to pandas.read_csv().

{}

Returns:

Type Description
Lexicon

New lexicon loaded from TSV.

Examples:

>>> lexicon = from_tsv(
...     "verbs.tsv",
...     "verbs",
...     language_code="eng",
...     feature_columns=["tense", "aspect"],
...     pos="VERB"
... )

template_generation

Abstract base class for mapping external frame inventories to Templates.

This module provides language-agnostic base classes for generating Template objects from external linguistic frame inventories (e.g., VerbNet, FrameNet, PropBank, valency lexicons).

FrameToTemplateMapper

Bases: ABC

Abstract base class for mapping frame inventories to Templates.

This class provides a framework for generating Template objects from external linguistic frame data. Subclasses implement language- and resource-specific mapping logic.

Examples:

Implementing a VerbNet mapper:

>>> class VerbNetMapper(FrameToTemplateMapper):
...     def generate_from_frame(self, verb_lemma, frame_data):
...         slots = self.map_frame_to_slots(frame_data)
...         constraints = self.generate_constraints(frame_data, slots)
...         return Template(
...             name=f"{verb_lemma}_{frame_data['id']}",
...             template_string=frame_data['template_string'],
...             slots=slots,
...             constraints=constraints
...         )
...
...     def map_frame_to_slots(self, frame_data):
...         # Extract slots from VerbNet syntax
...         return {}
...
...     def generate_constraints(self, frame_data, slots):
...         # Generate constraints from VerbNet restrictions
...         return []

generate_from_frame(*args: Any, **kwargs: Any) -> Template | list[Template] abstractmethod

Generate Template(s) from a frame specification.

This is the main entry point for template generation. Subclasses implement the specific logic for their frame inventory.

Parameters:

Name Type Description Default
*args Any

Positional arguments (frame data, identifiers, etc.).

()
**kwargs Any

Keyword arguments (configuration options, etc.).

{}

Returns:

Type Description
Template | list[Template]

Generated template(s). May return multiple templates if the frame has multiple realizations (e.g., different complementizer types, alternations).

Examples:

VerbNet implementation:

>>> mapper.generate_from_frame(
...     verb_lemma="think",
...     verbnet_class="29.9",
...     frame_data={"primary": "NP V that S"}
... )

map_frame_to_slots(frame_data: Any) -> dict[str, Slot] abstractmethod

Map frame elements to Template slots.

Converts frame-specific element descriptions into Slot objects with appropriate constraints.

Parameters:

Name Type Description Default
frame_data Any

Frame specification from the external inventory. Type depends on the specific resource (dict, object, etc.).

required

Returns:

Type Description
dict[str, Slot]

Slots keyed by slot name.

Examples:

Mapping VerbNet syntax to slots:

>>> slots = mapper.map_frame_to_slots({
...     "syntax": [
...         ("NP", "Agent"),
...         ("V", None),
...         ("NP", "Theme")
...     ]
... })
>>> "subject" in slots
True

generate_constraints(frame_data: Any, slots: dict[str, Slot]) -> list[Constraint] abstractmethod

Generate multi-slot constraints from frame specifications.

Converts frame-specific restrictions into DSL Constraint objects that enforce relationships between slots.

Parameters:

Name Type Description Default
frame_data Any

Frame specification from the external inventory.

required
slots dict[str, Slot]

Slots that have been created for this frame.

required

Returns:

Type Description
list[Constraint]

Multi-slot constraints for the template.

Examples:

Generating constraints from VerbNet restrictions:

>>> constraints = mapper.generate_constraints(
...     frame_data={"restrictions": [...]},
...     slots={"subject": ..., "verb": ...}
... )

create_template_name(*identifiers: str, separator: str = '_') -> str

Create a unique template name from identifiers.

Utility method for generating consistent template names. Sanitizes identifiers by replacing spaces, dots, and hyphens.

Parameters:

Name Type Description Default
*identifiers str

Components to include in the name (e.g., verb, class, frame).

()
separator str

Separator between components (default: "_").

'_'

Returns:

Type Description
str

Sanitized template name.

Examples:

>>> mapper = ConcreteMapper()
>>> mapper.create_template_name("think", "29.9", "that-clause")
'think_29_9_that_clause'

create_template_metadata(frame_data: dict[str, Any], **additional_metadata: Any) -> dict[str, Any]

Create metadata dictionary for template.

Utility method for extracting and organizing frame metadata. Subclasses can override to add resource-specific metadata.

Parameters:

Name Type Description Default
frame_data dict[str, Any]

Frame specification from the external inventory.

required
**additional_metadata Any

Additional metadata to include.

{}

Returns:

Type Description
dict[str, Any]

Metadata dictionary for Template.metadata field.

Examples:

>>> mapper = ConcreteMapper()
>>> metadata = mapper.create_template_metadata(
...     frame_data={"id": "29.9-1", "examples": [...]},
...     verb_lemma="think"
... )

MultiFrameMapper

Bases: FrameToTemplateMapper

Mapper that generates multiple template variants from a single frame.

Some frame specifications support multiple realizations (e.g., different complementizer types, voice alternations). This class provides a framework for generating all variants.

Examples:

>>> class ClausalMapper(MultiFrameMapper):
...     def get_frame_variants(self, frame_data):
...         # Return list of variant specifications
...         return [
...             {"comp": "that", "mood": "declarative"},
...             {"comp": "whether", "mood": "interrogative"},
...         ]
...
...     def generate_from_frame(self, verb, frame_data):
...         variants = self.get_frame_variants(frame_data)
...         return [self._generate_variant(verb, v) for v in variants]
...
...     def map_frame_to_slots(self, frame_data):
...         return {}
...
...     def generate_constraints(self, frame_data, slots):
...         return []

get_frame_variants(frame_data: Any) -> list[Any] abstractmethod

Extract all variants from frame specification.

Parameters:

Name Type Description Default
frame_data Any

Frame specification from the external inventory.

required

Returns:

Type Description
list[Any]

List of variant specifications, each representing one possible realization of the frame.

Examples:

>>> variants = mapper.get_frame_variants({
...     "complementizers": ["that", "whether", "if"]
... })
>>> len(variants)
3

generate_from_frame(*args: Any, **kwargs: Any) -> list[Template]

Generate templates for all frame variants.

Default implementation calls get_frame_variants() and generates a template for each variant. Subclasses can override for custom logic.

Parameters:

Name Type Description Default
*args Any

Positional arguments passed to variant generation.

()
**kwargs Any

Keyword arguments passed to variant generation.

{}

Returns:

Type Description
list[Template]

Templates for all variants.

Classification

classification

Linguistic classification models for lexical items and templates.

Models for grouping lexical items and templates by linguistic properties. LexicalItemClass and TemplateClass support cross-linguistic analysis and alignment.

LexicalItemClass

Bases: BeadBaseModel

A group of lexical items sharing a linguistic property.

Items are stored as a tuple in insertion order; mutating methods (with_item, without_item) return new instances.

Attributes:

Name Type Description
name str

Class name.

description str | None

Description of the classification.

property_name str

The linguistic property defining the class.

property_value JsonValue

Specific value of the property.

items tuple[LexicalItem, ...]

Member items in insertion order.

tags tuple[str, ...]

Categorization tags.

class_metadata dict[str, JsonValue]

Additional metadata.

__len__() -> int

Return the number of items in the class.

__contains__(item_id: UUID) -> bool

Return whether an item with item_id is present.

__iter__() -> Iterator[LexicalItem]

Iterate over class members.

by_id(item_id: UUID) -> LexicalItem | None

Return the item with the matching id, or None.

with_item(item: LexicalItem) -> Self

Return a new class with item appended.

without_item(item_id: UUID) -> tuple[Self, LexicalItem]

Return (new_class, removed_item).

languages() -> frozenset[str]

Return the set of language codes (lowercased) present in the class.

get_items_by_language(language_code: str) -> tuple[LexicalItem, ...]

Return items whose language code matches language_code.

Codes are normalized via validate_iso639_code before comparison.

is_monolingual() -> bool

Return whether the class spans at most one language.

is_multilingual() -> bool

Return whether the class spans more than one language.
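The language-span helpers reduce to set logic over normalized codes (a sketch over bare code strings rather than LexicalItem members):

```python
def languages(codes: list[str]) -> frozenset[str]:
    # Normalize to lowercase, skipping missing codes.
    return frozenset(c.lower() for c in codes if c)

def is_monolingual(codes: list[str]) -> bool:
    # At most one distinct language counts as monolingual.
    return len(languages(codes)) <= 1

assert languages(["ENG", "eng", "kor"]) == {"eng", "kor"}
assert is_monolingual(["eng", "ENG"])
assert not is_monolingual(["eng", "kor"])
```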

TemplateClass

Bases: BeadBaseModel

A group of templates sharing a linguistic property.

Attributes:

Name Type Description
name str

Class name.

description str | None

Description.

property_name str

Defining linguistic property.

property_value JsonValue

Specific property value.

templates tuple[Template, ...]

Member templates in insertion order.

tags tuple[str, ...]

Categorization tags.

class_metadata dict[str, JsonValue]

Additional metadata.

__len__() -> int

Return the number of templates in the class.

__contains__(template_id: UUID) -> bool

Return whether a template with template_id is present.

__iter__() -> Iterator[Template]

Iterate over the templates in the class.

by_id(template_id: UUID) -> Template | None

Return the template with the matching id, or None.

with_template(template: Template) -> Self

Return a new class with template appended.

without_template(template_id: UUID) -> tuple[Self, Template]

Return (new_class, removed_template).

languages() -> frozenset[str]

Return the set of language codes (lowercased) in the class.

get_templates_by_language(language_code: str) -> tuple[Template, ...]

Return templates whose language code matches language_code.

is_monolingual() -> bool

Return whether the class spans at most one language.

is_multilingual() -> bool

Return whether the class spans more than one language.

Resource Adapters

base

Abstract base class for external resource adapters.

This module defines the interface that all resource adapters must implement to fetch lexical items from external linguistic databases.

ResourceAdapter

Bases: ABC

Abstract base class for external resource adapters.

Resource adapters fetch lexical items from external linguistic databases and convert them to the bead LexicalItem format. All adapters must implement language_code filtering to support multi-language workflows.

Subclasses must implement:

  • fetch_items() — retrieve items from the external resource.
  • is_available() — check whether the external resource is accessible.

Examples:

>>> class MyAdapter(ResourceAdapter):
...     def fetch_items(self, query=None, language_code=None, **kwargs):
...         # Fetch from external resource
...         return [LexicalItem(lemma="walk", pos="VERB", language_code="en")]
...     def is_available(self):
...         return True
>>> adapter = MyAdapter()
>>> items = adapter.fetch_items(query="walk", language_code="en")
>>> len(items) > 0
True

fetch_items(query: str | None = None, language_code: LanguageCode = None, **kwargs: Any) -> list[LexicalItem] abstractmethod

Fetch lexical items from external resource.

Parameters:

Name Type Description Default
query str | None

Query string in adapter-specific format (e.g., lemma, predicate name, class identifier). If None, behavior is adapter-specific (may return all items, raise an error, or use a default query).

None
language_code LanguageCode

ISO 639-1 (2-letter) or ISO 639-3 (3-letter) language code to filter results. Examples: "en", "eng", "ko", "kor". If None, returns items for all available languages.

None
**kwargs Any

Additional adapter-specific parameters (e.g., pos="VERB", resource="verbnet", include_features=True).

{}

Returns:

Type Description
list[LexicalItem]

Lexical items fetched from the external resource. Each item should have language_code set if known.

Raises:

Type Description
ValueError

If query is invalid or required parameters are missing.

RuntimeError

If the external resource is unavailable or the request fails.

Examples:

>>> adapter = MyAdapter()
>>> items = adapter.fetch_items(query="break", language_code="en")
>>> all(item.language_code == "en" for item in items)
True

is_available() -> bool abstractmethod

Check if the external resource is available.

This method should verify that the external resource can be accessed, whether via installed packages, accessible data files, or network APIs.

Returns:

Type Description
bool

True if the resource can be accessed, False otherwise.

Examples:

>>> adapter = MyAdapter()
>>> adapter.is_available()
True

glazing

Adapter for glazing package (VerbNet, PropBank, FrameNet).

This module provides an adapter to fetch lexical items from VerbNet, PropBank, and FrameNet via the glazing package using the proper loader classes.

GlazingAdapter

Bases: ResourceAdapter

Adapter for glazing package (VerbNet, PropBank, FrameNet).

This adapter fetches verb frame information from VerbNet, PropBank, or FrameNet and converts it to LexicalItem format. Frame information is stored in the attributes field.

Parameters:

Name Type Description Default
resource Literal['verbnet', 'propbank', 'framenet']

Which glazing resource to use.

'verbnet'
cache AdapterCache | None

Optional cache instance. If None, no caching is performed.

None

Examples:

>>> adapter = GlazingAdapter(resource="verbnet")
>>> items = adapter.fetch_items(query="break", language_code="en")
>>> all(item.language_code == "en" for item in items)
True

fetch_items(query: str | None = None, language_code: LanguageCode = None, **kwargs: Any) -> list[LexicalItem]

Fetch items from glazing resource.

Parameters:

Name Type Description Default
query str | None

Lemma or predicate to query (e.g., "break", "run"). If None, fetches all items from the resource.

None
language_code LanguageCode

Language code filter. Glazing resources are primarily English, so language_code="en" is typical. Other languages may not be supported.

None
**kwargs Any

Additional parameters:

- include_frames (bool): Include detailed frame information (syntax, examples, descriptions). Default: False.

{}

Returns:

Type Description
list[LexicalItem]

Lexical items with frame information in attributes.

Raises:

Type Description
RuntimeError

If glazing resource access fails.

Examples:

>>> # Query specific verb
>>> adapter = GlazingAdapter(resource="verbnet")
>>> items = adapter.fetch_items(query="break", language_code="en")
>>> len(items) > 0
True
>>> # Fetch all items from resource
>>> all_items = adapter.fetch_items(query=None, language_code="en")
>>> len(all_items) > 100
True
>>> # Include detailed frame information
>>> items = adapter.fetch_items(
...     query="break", language_code="en", include_frames=True
... )
>>> "frames" in items[0].attributes
True

is_available() -> bool

Check if glazing package is available.

Returns:

Type Description
bool

True if glazing can be imported and data is initialized, False otherwise.

Examples:

>>> adapter = GlazingAdapter()
>>> adapter.is_available()
True

unimorph

Adapter for UniMorph morphological paradigms.

This module provides an adapter to fetch morphological paradigms from UniMorph data and convert them to LexicalItem format with morphological features.

UniMorphAdapter

Bases: ResourceAdapter

Adapter for UniMorph morphological paradigms.

This adapter fetches morphological paradigms from UniMorph and converts them to LexicalItem format. Morphological features are stored in the features field using UniMorph feature schema.

Parameters:

Name Type Description Default
cache AdapterCache | None

Optional cache instance. If None, no caching is performed.

None

Examples:

>>> adapter = UniMorphAdapter()
>>> items = adapter.fetch_items(query="walk", language_code="en")
>>> all(item.language_code == "en" for item in items)
True
>>> all("tense" in item.features for item in items if item.features)
True

__init__(cache: AdapterCache | None = None) -> None

Initialize UniMorph adapter.

Parameters:

Name Type Description Default
cache AdapterCache | None

Optional cache instance.

None

fetch_items(query: str | None = None, language_code: LanguageCode = None, **kwargs: Any) -> list[LexicalItem]

Fetch morphological paradigms from UniMorph.

Parameters:

Name Type Description Default
query str | None

Lemma to query (e.g., "walk", "먹다", "hamba").

None
language_code LanguageCode

Required language code (e.g., "en", "ko", "zu"). UniMorph is organized by language, so this parameter is essential.

None
**kwargs Any

Additional parameters (e.g., pos="VERB").

{}

Returns:

Type Description
list[LexicalItem]

Lexical items representing inflected forms with morphological features in the features field.

Raises:

Type Description
ValueError

If language_code is None (required for UniMorph).

RuntimeError

If UniMorph access fails.

Examples:

>>> adapter = UniMorphAdapter()
>>> items = adapter.fetch_items(query="walk", language_code="en")
>>> len(items) > 0
True
>>> items[0].features.get("pos") == "VERB"
True

is_available() -> bool

Check if UniMorph package is available.

Returns:

Type Description
bool

True if unimorph can be imported and accessed, False otherwise.

Examples:

>>> adapter = UniMorphAdapter()
>>> adapter.is_available()
True

cache

Caching for adapter fetch results.

This module provides an in-memory cache to avoid redundant fetches from external resources when the same query is repeated.

AdapterCache

In-memory cache for adapter fetch results.

The cache stores results keyed by a hash of query parameters. This avoids redundant fetches when the same query is made multiple times.

Examples:

>>> cache = AdapterCache()
>>> items = [LexicalItem(lemma="walk", pos="VERB")]
>>> key = cache.make_key("glazing", query="walk", language_code="en")
>>> cache.set(key, items)
>>> cached = cache.get(key)
>>> cached == items
True

get(key: str) -> list[LexicalItem] | None

Get cached result.

Parameters:

Name Type Description Default
key str

Cache key generated by make_key().

required

Returns:

Type Description
list[LexicalItem] | None

Cached items if key exists, None otherwise.

Examples:

>>> cache = AdapterCache()
>>> cache.get("nonexistent") is None
True

set(key: str, items: list[LexicalItem]) -> None

Cache result.

Parameters:

Name Type Description Default
key str

Cache key generated by make_key().

required
items list[LexicalItem]

Items to cache.

required

Examples:

>>> cache = AdapterCache()
>>> items = [LexicalItem(lemma="walk")]
>>> cache.set("key1", items)
>>> cache.get("key1") == items
True

clear() -> None

Clear entire cache.

Examples:

>>> cache = AdapterCache()
>>> cache.set("key1", [])
>>> cache.clear()
>>> cache.get("key1") is None
True

make_key(adapter_name: str, query: str | None = None, **kwargs: Any) -> str

Generate cache key from query parameters.

Create a deterministic hash key from adapter name, query, and additional parameters. The same inputs always produce the same key.

Parameters:

Name Type Description Default
adapter_name str

Name of the adapter (e.g., "glazing", "unimorph").

required
query str | None

Query string.

None
**kwargs Any

Additional query parameters (e.g., language_code, pos).

{}

Returns:

Type Description
str

Cache key (hexadecimal hash string).

Examples:

>>> cache = AdapterCache()
>>> key1 = cache.make_key("glazing", query="walk", language_code="en")
>>> key2 = cache.make_key("glazing", query="walk", language_code="en")
>>> key1 == key2
True
>>> key3 = cache.make_key("glazing", query="run", language_code="en")
>>> key1 != key3
True

registry

Registry for managing resource adapters.

This module provides a registry for discovering and instantiating adapters by name.

AdapterRegistry

Registry for managing resource adapters.

The registry allows adapters to be registered by name and retrieved with custom initialization parameters.

Examples:

>>> from bead.resources.adapters.glazing import GlazingAdapter
>>> registry = AdapterRegistry()
>>> registry.register("glazing", GlazingAdapter)
>>> adapter = registry.get("glazing", resource="verbnet")
>>> isinstance(adapter, GlazingAdapter)
True

register(name: str, adapter_class: type[ResourceAdapter]) -> None

Register an adapter class.

Parameters:

Name Type Description Default
name str

Adapter name (e.g., "glazing", "unimorph").

required
adapter_class type[ResourceAdapter]

Adapter class (not instance) that subclasses ResourceAdapter.

required

Raises:

Type Description
ValueError

If name is empty or adapter_class is not a ResourceAdapter subclass.

Examples:

>>> from bead.resources.adapters.glazing import GlazingAdapter
>>> registry = AdapterRegistry()
>>> registry.register("glazing", GlazingAdapter)
>>> "glazing" in registry.list_available()
True

get(name: str, **kwargs: Any) -> ResourceAdapter

Get adapter instance by name.

Parameters:

Name Type Description Default
name str

Adapter name (must be registered).

required
**kwargs Any

Arguments passed to adapter constructor.

{}

Returns:

Type Description
ResourceAdapter

Adapter instance.

Raises:

Type Description
KeyError

If adapter name is not registered.

Examples:

>>> from bead.resources.adapters.glazing import GlazingAdapter
>>> registry = AdapterRegistry()
>>> registry.register("glazing", GlazingAdapter)
>>> adapter = registry.get("glazing", resource="verbnet")
>>> adapter.resource
'verbnet'

list_available() -> list[str]

List names of available adapters.

Returns:

Type Description
list[str]

Sorted list of registered adapter names.

Examples:

>>> registry = AdapterRegistry()
>>> registry.list_available()
[]
>>> from bead.resources.adapters.glazing import GlazingAdapter
>>> registry.register("glazing", GlazingAdapter)
>>> registry.list_available()
['glazing']
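The registry pattern above reduces to a dict from names to classes, with constructor kwargs forwarded at get() time. A minimal sketch (the subclass validation documented for register() is omitted here for brevity, and `FakeGlazingAdapter` is a hypothetical stand-in):

```python
class AdapterRegistry:
    """Minimal sketch of the registry behavior described above."""

    def __init__(self) -> None:
        self._adapters: dict[str, type] = {}

    def register(self, name: str, adapter_class: type) -> None:
        if not name:
            raise ValueError("adapter name must be non-empty")
        self._adapters[name] = adapter_class

    def get(self, name: str, **kwargs):
        if name not in self._adapters:
            raise KeyError(f"adapter not registered: {name!r}")
        # Constructor kwargs are forwarded, e.g. resource="verbnet".
        return self._adapters[name](**kwargs)

    def list_available(self) -> list[str]:
        return sorted(self._adapters)


# Stand-in adapter class (hypothetical) to exercise the registry.
class FakeGlazingAdapter:
    def __init__(self, resource: str = "verbnet") -> None:
        self.resource = resource


registry = AdapterRegistry()
registry.register("glazing", FakeGlazingAdapter)
print(registry.list_available())  # ['glazing']
print(registry.get("glazing", resource="propbank").resource)  # propbank
```

Storing classes rather than instances means each get() call produces a fresh adapter, so two callers can hold differently configured adapters under the same registered name.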