bead.resources¶
Stage 1 of the bead pipeline: lexical items, templates, and constraints.
Lexical Items and Lexicons¶
lexical_item
¶
Lexical item models for words and multi-word expressions.
Lexical items are the atomic units that fill template slots during sentence generation. The module covers single words, multi-word expressions (MWEs), and the components that make up an MWE.
LexicalItem
¶
Bases: BeadBaseModel
A lexical item with linguistic features.
Follows the UniMorph structure of lemma, surface form, and feature bundle.
Attributes:

| Name | Type | Description |
|---|---|---|
| `lemma` | `str` | Base / citation form (e.g. …). |
| `form` | `str \| None` | Inflected surface form. |
| `language_code` | `LanguageCode` | ISO 639-3 language code. |
| `features` | `dict[str, JsonValue]` | Feature bundle (POS, morphological features, lexical-resource information). |
| `source` | `str \| None` | Provenance (e.g. …). |
MWEComponent
¶
Bases: LexicalItem
A component of a multi-word expression.
Attributes:

| Name | Type | Description |
|---|---|---|
| `role` | `str` | Role within the MWE (e.g. …). |
| `required` | `bool` | Whether the component must be present. |
| `constraints` | `tuple[Constraint, ...]` | Component-specific constraints (in addition to base …). |
MultiWordExpression
¶
Bases: LexicalItem
Multi-word expression as a lexical item.
Attributes:

| Name | Type | Description |
|---|---|---|
| `components` | `tuple[MWEComponent, ...]` | Component lexical items that make up the MWE. |
| `separable` | `bool` | Whether components can be separated by intervening words. |
| `adjacency_pattern` | `str \| None` | DSL expression defining valid adjacency patterns; variables are component roles plus …. |
lexicon
¶
Lexicon management for collections of lexical items.
Provides the Lexicon class for managing, querying, and manipulating
collections of lexical items. Supports filtering, searching, merging, and
conversion to and from pandas / polars DataFrames.
Lexicon
¶
Bases: BeadBaseModel
A collection of lexical items keyed by their UUIDs.
Items are stored as a tuple; by_id provides O(n) lookup. Mutating
methods (with_item, without_item, with_items) return new
instances.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Name of the lexicon. |
| `description` | `str \| None` | Optional description. |
| `language_code` | `LanguageCode \| None` | ISO 639-1 or ISO 639-3 language code. |
| `items` | `tuple[LexicalItem, ...]` | Items in insertion order. |
| `tags` | `tuple[str, ...]` | Categorization tags. |
__len__() -> int
¶
Return the number of items in the lexicon.
__iter__() -> Iterator[LexicalItem]
¶
Iterate over the lexicon's items.
__contains__(item_id: UUID) -> bool
¶
Return whether item_id is present.
by_id(item_id: UUID) -> LexicalItem | None
¶
Return the item with the matching UUID, or None.
with_item(item: LexicalItem) -> Self
¶
Return a new lexicon with item appended.
Raises:

| Type | Description |
|---|---|
| `ValueError` | If an item with the same id already exists. |
with_items(items: tuple[LexicalItem, ...] | list[LexicalItem]) -> Self
¶
Return a new lexicon with each of items appended.
without_item(item_id: UUID) -> tuple[Self, LexicalItem]
¶
Return (new_lexicon, removed_item) with item_id removed.
Raises:

| Type | Description |
|---|---|
| `KeyError` | If item_id is not present. |
filter(predicate: Callable[[LexicalItem], bool]) -> Self
¶
Return a new lexicon containing only items satisfying predicate.
filter_by_pos(pos: str) -> Self
¶
Return items whose features['pos'] equals pos.
filter_by_lemma(lemma: str) -> Self
¶
Return items whose lemma equals lemma.
filter_by_feature(feature_name: str, feature_value: JsonValue) -> Self
¶
Return items whose feature equals feature_value.
filter_by_attribute(attr_name: str, attr_value: JsonValue) -> Self
¶
Alias for filter_by_feature.
search(query: str, field: str = 'lemma') -> Self
¶
Return a new lexicon with case-insensitive substring matches on field.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | Substring to look for. | required |
| `field` | `str` | Field to match; one of the supported field names. | `'lemma'` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `field` is not one of the supported names. |
merge(other: Lexicon, strategy: Literal['keep_first', 'keep_second', 'error'] = 'keep_first') -> Lexicon
¶
Combine self and other into a new lexicon.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `other` | `Lexicon` | Lexicon to merge into self. | required |
| `strategy` | `Literal['keep_first', 'keep_second', 'error']` | Conflict policy when items share an id. | `'keep_first'` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If items share an id and `strategy` is `'error'`. |
to_dataframe(backend: Literal['pandas', 'polars'] = 'pandas') -> DataFrame
¶
Render the lexicon as a pandas or polars DataFrame.
Columns include id, lemma, form, language_code,
source, created_at, modified_at, plus a
feature_<name> column for every feature key seen across all
items.
from_dataframe(df: DataFrame, name: str) -> Lexicon
classmethod
¶
Build a lexicon from a pandas or polars DataFrame.
The DataFrame must have a lemma column. Columns named pos,
feature_<name>, or attr_<name> populate each item's
features dict; language_code, form, and source
populate the corresponding fields.
to_jsonl(path: str) -> None
¶
Write the lexicon as JSONLines, one LexicalItem per line.
from_jsonl(path: str, name: str) -> Lexicon
classmethod
¶
Read a JSONLines file and return a new lexicon.
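The copy-on-write contract described above (`with_item` returns a new lexicon, leaving the original untouched, and a duplicate id raises `ValueError`) can be sketched with frozen dataclasses; the names here are illustrative, not the library's internals:

```python
from dataclasses import dataclass, replace
from uuid import UUID, uuid4

@dataclass(frozen=True)
class MiniItem:
    lemma: str
    id: UUID

@dataclass(frozen=True)
class MiniLexicon:
    name: str
    items: tuple = ()

    def by_id(self, item_id: UUID):
        # linear scan over the tuple, hence the O(n) lookup noted above
        return next((i for i in self.items if i.id == item_id), None)

    def with_item(self, item: MiniItem) -> "MiniLexicon":
        if self.by_id(item.id) is not None:
            raise ValueError(f"duplicate id: {item.id}")
        return replace(self, items=self.items + (item,))

base = MiniLexicon("verbs")
walk = MiniItem("walk", uuid4())
bigger = base.with_item(walk)   # `base` itself is unchanged
```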
Templates and Collections¶
template
¶
Template and structure models for sentence generation.
Templates contain slots that are filled with lexical items during sentence generation. Templates may be combined into sequences or hierarchical trees.
Slot
¶
Bases: BeadBaseModel
A slot in a template that can be filled with a lexical item.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Unique name for the slot within the template. |
| `description` | `str \| None` | Human-readable description. |
| `constraints` | `tuple[Constraint, ...]` | Constraints that determine valid fillers. |
| `required` | `bool` | Whether the slot must be filled. |
| `default_value` | `str \| None` | Default string used if the slot is not filled. |
Template
¶
Bases: BeadBaseModel
A sentence template with named slots.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Unique template name. |
| `template_string` | `str` | Template body with `{slot_name}` placeholders. |
| `slots` | `dict[str, Slot]` | Slot definitions keyed by slot name. |
| `constraints` | `tuple[Constraint, ...]` | Multi-slot constraints (slot names appear as DSL variables). |
| `description` | `str \| None` | Human-readable description. |
| `language_code` | `LanguageCode \| None` | ISO 639-1 or 639-3 language code. |
| `tags` | `tuple[str, ...]` | Categorization tags. |
| `metadata` | `dict[str, JsonValue]` | Additional metadata. |
required_slot_names: frozenset[str]
property
¶
Names of all slots flagged as required.
fill_with_values(slot_values: dict[str, str], strategy_name: str = 'manual') -> FilledTemplate
¶
Build a FilledTemplate from a mapping of slot names to strings.
Each slot value becomes a minimal LexicalItem whose lemma is
the supplied string.
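Setting aside the LexicalItem wrapping, the substitution itself behaves like keyed string formatting (a sketch of the documented behavior, not the library code):

```python
template_string = "The {subject} {verb} the {object}."
slot_values = {"subject": "dog", "verb": "chases", "object": "ball"}

# each supplied string stands in for a minimal item's lemma
sentence = template_string.format(**slot_values)
print(sentence)  # The dog chases the ball.
```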
TemplateSequence
¶
Bases: BeadBaseModel
A sequence of templates to be filled together.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Unique name for the sequence. |
| `templates` | `tuple[Template, ...]` | Ordered list of templates. |
| `constraints` | `tuple[Constraint, ...]` | Cross-template constraints. |
TemplateTree
¶
Bases: BeadBaseModel
A tree of templates, used to model discourse structure.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Unique tree name. |
| `root` | `Template` | Root template. |
| `children` | `tuple[TemplateTree, ...]` | Child subtrees. |
slots_match_template(template: Template) -> None
¶
Raise ValueError if template's slot dict and string disagree.
Validates that every {slot_name} placeholder has a matching entry
in slots, no extraneous slots are defined, and each slot's name
matches its dict key.
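The consistency check can be sketched with string.Formatter, which extracts the `{slot_name}` placeholders from a template body; this illustrates the rule, not the actual validator:

```python
from string import Formatter

def check_slots(template_string: str, slots: dict) -> None:
    # collect every {name} placeholder appearing in the template body
    placeholders = {name for _, name, _, _ in Formatter().parse(template_string)
                    if name is not None}
    if placeholders != set(slots):
        raise ValueError(f"slot mismatch: {placeholders ^ set(slots)}")

check_slots("The {subject} {verb}.", {"subject": 1, "verb": 2})  # passes silently
```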
template_collection
¶
Template collection management.
The TemplateCollection class manages collections of sentence templates.
TemplateCollection
¶
Bases: BeadBaseModel
A collection of templates supporting filtering, search, and merging.
Templates are stored as a tuple in insertion order; mutating methods
(with_template, without_template, with_templates) return new
instances.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Collection name. |
| `description` | `str \| None` | Optional description. |
| `language_code` | `str \| None` | ISO 639-1 or 639-3 language code. |
| `templates` | `tuple[Template, ...]` | Templates in insertion order. |
| `tags` | `tuple[str, ...]` | Categorization tags. |
__len__() -> int
¶
Return the number of templates in the collection.
__iter__() -> Iterator[Template]
¶
Iterate over the templates.
__contains__(template_id: UUID) -> bool
¶
Return whether a template with template_id is present.
by_id(template_id: UUID) -> Template | None
¶
Return the template with the matching id, or None.
with_template(template: Template) -> Self
¶
Return a new collection with template appended.
with_templates(templates: tuple[Template, ...] | list[Template]) -> Self
¶
Return a new collection with each template appended.
without_template(template_id: UUID) -> tuple[Self, Template]
¶
Return (new_collection, removed_template).
filter(predicate: Callable[[Template], bool]) -> Self
¶
Return a new collection containing only templates matching predicate.
filter_by_tag(tag: str) -> Self
¶
Return a new collection of templates carrying tag.
filter_by_slot_count(count: int) -> Self
¶
Return a new collection of templates with exactly count slots.
search(query: str, field: str = 'name') -> Self
¶
Return a new collection of templates matching query in field.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | Substring to search for (case-insensitive). | required |
| `field` | `str` | Field to match; one of the supported field names. | `'name'` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `field` is not a supported name. |
merge(other: TemplateCollection, strategy: Literal['keep_first', 'keep_second', 'error'] = 'keep_first') -> TemplateCollection
¶
Combine self and other into a new collection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `other` | `TemplateCollection` | Collection to merge with. | required |
| `strategy` | `Literal['keep_first', 'keep_second', 'error']` | Conflict policy when templates share an id. | `'keep_first'` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If templates share an id and `strategy` is `'error'`. |
to_dataframe(backend: Literal['pandas', 'polars'] = 'pandas') -> DataFrame
¶
Render the collection as a pandas or polars DataFrame.
Columns are id, name, template_string, description,
slot_count, slot_names (comma-joined), tags,
created_at, modified_at.
from_dataframe(df: DataFrame, name: str) -> TemplateCollection
classmethod
¶
Build an empty collection bound to name.
DataFrame ingestion of Template objects requires their slot definitions, which are not present in tabular form. Use from_jsonl for full deserialization.
to_jsonl(path: str) -> None
¶
Write the collection as JSONLines, one Template per line.
from_jsonl(path: str, name: str) -> TemplateCollection
classmethod
¶
Read a JSONLines file and return a collection.
Constraints¶
constraints
¶
Constraint models for lexical item selection.
Universal constraint model based on DSL expressions. Each constraint is a DSL expression with optional context variables; scope is determined by storage location:
- `Slot.constraints` — single-slot constraints (`self` = slot filler).
- `Template.constraints` — multi-slot constraints (slot names as variables).
- `TemplateSequence.constraints` — cross-template constraints.
Constraint
¶
Bases: BeadBaseModel
Universal constraint expressed via a DSL expression.
Attributes:

| Name | Type | Description |
|---|---|---|
| `expression` | `str` | DSL expression (must return a boolean) evaluated against the context. |
| `context` | `dict[str, ContextValue]` | Context variables available during evaluation. Values are JSON-shaped (scalars, lists, dicts); the DSL evaluator coerces list values into sets when the surrounding expression uses them as a membership test. |
| `description` | `str \| None` | Optional human-readable description. |
combine(*constraints: Constraint, logic: str = 'and') -> Constraint
classmethod
¶
Combine multiple constraints into one with AND or OR logic.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*constraints` | `Constraint` | Constraints to combine. | `()` |
| `logic` | `str` | Either `'and'` or `'or'`. | `'and'` |

Returns:

| Type | Description |
|---|---|
| `Constraint` | New constraint with combined expressions and merged contexts. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If no constraints are provided or `logic` is invalid. |
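The documented semantics — expressions joined by the chosen logic, contexts merged — can be sketched as a plain function. This is a hypothetical helper, not the classmethod itself; in particular, the exact parenthesization is an assumption:

```python
def combine(exprs: list[str], contexts: list[dict], logic: str = "and"):
    if not exprs:
        raise ValueError("no constraints to combine")
    if logic not in ("and", "or"):
        raise ValueError(f"invalid logic: {logic!r}")
    merged: dict = {}
    for ctx in contexts:          # later contexts win on key collisions
        merged.update(ctx)
    # join the sub-expressions under the chosen boolean operator
    expression = f" {logic} ".join(f"({e})" for e in exprs)
    return expression, merged

expr, ctx = combine(
    ["subject.number == verb.number", "verb.lemma in allowed"],
    [{"allowed": ["walk", "run"]}],
)
```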
constraint_builders
¶
Abstract base classes for programmatic constraint generation.
This module provides language-agnostic base classes for building constraints programmatically. Language-specific implementations should extend these bases.
ConstraintBuilder
¶
Bases: ABC
Abstract base class for programmatic constraint generation.
Constraint builders encapsulate logic for generating DSL constraints based on configuration and rules. Subclasses implement specific constraint generation strategies.
Examples:
>>> class NumberAgreementBuilder(ConstraintBuilder):
... def build(self, *slot_names: str) -> Constraint:
... # Generate number agreement constraint
... pairs = []
... for i, slot1 in enumerate(slot_names):
... for slot2 in slot_names[i+1:]:
... pairs.append(f"{slot1}.number == {slot2}.number")
... return Constraint(
... expression=" and ".join(pairs),
... description=f"Number agreement: {', '.join(slot_names)}"
... )
build(*args: Any, **kwargs: Any) -> Constraint
abstractmethod
¶
Build a Constraint object.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*args` | `Any` | Positional arguments (slot names, properties, etc.). | `()` |
| `**kwargs` | `Any` | Keyword arguments (configuration options). | `{}` |

Returns:

| Type | Description |
|---|---|
| `Constraint` | Generated constraint. |
AgreementConstraintBuilder
¶
Bases: ConstraintBuilder
Builder for feature agreement constraints.
Generates constraints that enforce feature agreement across slots (e.g., number, gender, case). Supports exact matching or equivalence classes via agreement rules.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `feature_name` | `str` | Name of the feature to enforce agreement on (e.g., "number", "gender"). | required |
| `agreement_rules` | `dict[str, list[str]] \| None` | Optional equivalence classes mapping a canonical value to its equivalent values, e.g. `{"singular": ["sing", "sg"]}`. | `None` |
Examples:
Exact number agreement:
>>> builder = AgreementConstraintBuilder("number")
>>> constraint = builder.build("subject", "verb")
>>> expr = "subject.features.get('number') == verb.features.get('number')"
>>> expr in constraint.expression
True
Agreement with equivalence rules:
>>> rules = {"singular": ["sing", "sg"], "plural": ["pl"]}
>>> builder = AgreementConstraintBuilder("number", agreement_rules=rules)
>>> constraint = builder.build("det", "noun")
>>> "equiv_" in constraint.expression # Uses equivalence class checks
True
build(*slot_names: str) -> Constraint
¶
Build agreement constraint for given slots.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*slot_names` | `str` | Names of slots to enforce agreement between (≥2 required). | `()` |

Returns:

| Type | Description |
|---|---|
| `Constraint` | Agreement constraint. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If fewer than 2 slot names provided. |
ConditionalConstraintBuilder
¶
Bases: ConstraintBuilder
Builder for IF-THEN (conditional) constraints.
Generates constraints that enforce requirements when conditions are met. Implements logical implication: IF condition THEN requirement.
Examples:
>>> builder = ConditionalConstraintBuilder()
>>> constraint = builder.build(
... condition="det.lemma == 'a'",
... requirement="noun.features.get('number') == 'singular'",
... description="'a' requires singular noun"
... )
>>> "not (" in constraint.expression # IF-THEN encoded as: not cond or req
True
build(*, condition: str, requirement: str, description: str | None = None, context: dict[str, Any] | None = None) -> Constraint
¶
Build conditional constraint.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `condition` | `str` | Condition expression (IF part). | required |
| `requirement` | `str` | Requirement expression (THEN part). | required |
| `description` | `str \| None` | Human-readable description. | `None` |
| `context` | `dict[str, Any] \| None` | Context variables for evaluation. | `None` |

Returns:

| Type | Description |
|---|---|
| `Constraint` | Conditional constraint. |
Notes
Logical implication (IF A THEN B) is encoded as: (NOT A) OR B
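The encoding is easy to verify with a short truth table: (NOT A) OR B is false only when the condition holds and the requirement fails.

```python
# rows: (condition, requirement, encoded implication)
rows = [(a, b, (not a) or b) for a in (True, False) for b in (True, False)]
assert [encoded for _, _, encoded in rows] == [True, False, True, True]
```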
SetMembershipConstraintBuilder
¶
Bases: ConstraintBuilder
Builder for whitelist/blacklist constraints.
Generates constraints that restrict slot properties to allowed values (whitelist) or exclude forbidden values (blacklist).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `slot_name` | `str` | Name of slot to constrain. | required |
| `property_path` | `str` | Dot-separated path to property (e.g., "lemma", "features.number"). | required |
| `allowed_values` | `set \| None` | Whitelist of allowed values (mutually exclusive with forbidden_values). | required |
| `forbidden_values` | `set \| None` | Blacklist of forbidden values. | required |
| `description` | `str \| None` | Custom description. | required |
Examples:
Whitelist constraint:
>>> builder = SetMembershipConstraintBuilder()
>>> constraint = builder.build(
... slot_name="verb",
... property_path="lemma",
... allowed_values={"walk", "run", "jump"},
... description="Motion verbs only"
... )
>>> "verb.lemma in allowed_values" in constraint.expression
True
Blacklist constraint:
>>> constraint = builder.build(
... slot_name="verb",
... property_path="lemma",
... forbidden_values={"be", "have"},
... description="Exclude copula and auxiliary"
... )
>>> "verb.lemma not in forbidden_values" in constraint.expression
True
build(*, slot_name: str, property_path: str, allowed_values: set[str] | None = None, forbidden_values: set[str] | None = None, description: str | None = None) -> Constraint
¶
Build set membership constraint.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `slot_name` | `str` | Slot to constrain. | required |
| `property_path` | `str` | Property path within slot. | required |
| `allowed_values` | `set \| None` | Whitelist of allowed values. | `None` |
| `forbidden_values` | `set \| None` | Blacklist of forbidden values. | `None` |
| `description` | `str \| None` | Constraint description. | `None` |

Returns:

| Type | Description |
|---|---|
| `Constraint` | Set membership constraint. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If neither or both of allowed_values/forbidden_values provided. |
Resource Loading¶
loaders
¶
Lexicon loading utilities for various data formats.
This module provides class methods for loading Lexicon objects from various data formats (CSV, TSV) with flexible column mapping.
from_csv(path: str | Path, name: str, *, language_code: LanguageCode, column_mapping: dict[str, str] | None = None, feature_columns: list[str] | None = None, pos: str | None = None, description: str | None = None, **csv_kwargs: Any) -> Lexicon
¶
Load lexicon from CSV file with flexible column mapping.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to the CSV file. | required |
| `name` | `str` | Name for the lexicon. | required |
| `language_code` | `LanguageCode` | ISO 639-3 language code for all items. | required |
| `column_mapping` | `dict[str, str] \| None` | Mapping from CSV column names to feature names, e.g. `{"word": "lemma"}`. | `None` |
| `feature_columns` | `list[str] \| None` | CSV column names to include in the features dict, e.g. `["number", "tense", "countability", "semantic_class"]`. | `None` |
| `pos` | `str \| None` | Part-of-speech tag assigned to all items (e.g., "NOUN", "VERB"); added to the features dict as "pos". | `None` |
| `description` | `str \| None` | Optional description of the lexicon. | `None` |
| `**csv_kwargs` | `Any` | Additional keyword arguments passed to pandas.read_csv(). | `{}` |

Returns:

| Type | Description |
|---|---|
| `Lexicon` | New lexicon loaded from CSV. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the required "lemma" column/mapping is missing. |
| `FileNotFoundError` | If the CSV file does not exist. |
Examples:
>>> lexicon = from_csv(
... "bleached_nouns.csv",
... "nouns",
... language_code="eng",
... column_mapping={"word": "lemma"},
... feature_columns=["number", "countability", "semantic_class"],
... pos="NOUN"
... )
from_tsv(path: str | Path, name: str, *, language_code: LanguageCode, column_mapping: dict[str, str] | None = None, feature_columns: list[str] | None = None, pos: str | None = None, description: str | None = None, **tsv_kwargs: Any) -> Lexicon
¶
Load lexicon from TSV file with flexible column mapping.
This is a convenience wrapper around from_csv() that sets sep="\t".
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to the TSV file. | required |
| `name` | `str` | Name for the lexicon. | required |
| `language_code` | `LanguageCode` | ISO 639-3 language code for all items. | required |
| `column_mapping` | `dict[str, str] \| None` | Mapping from TSV column names to feature names. | `None` |
| `feature_columns` | `list[str] \| None` | TSV column names to include in the features dict. | `None` |
| `pos` | `str \| None` | Part-of-speech tag to assign to all items. | `None` |
| `description` | `str \| None` | Optional description of the lexicon. | `None` |
| `**tsv_kwargs` | `Any` | Additional keyword arguments passed to pandas.read_csv(). | `{}` |

Returns:

| Type | Description |
|---|---|
| `Lexicon` | New lexicon loaded from TSV. |
Examples:
>>> lexicon = from_tsv(
... "verbs.tsv",
... "verbs",
... language_code="eng",
... feature_columns=["tense", "aspect"],
... pos="VERB"
... )
template_generation
¶
Abstract base class for mapping external frame inventories to Templates.
This module provides language-agnostic base classes for generating Template objects from external linguistic frame inventories (e.g., VerbNet, FrameNet, PropBank, valency lexicons).
FrameToTemplateMapper
¶
Bases: ABC
Abstract base class for mapping frame inventories to Templates.
This class provides a framework for generating Template objects from external linguistic frame data. Subclasses implement language- and resource-specific mapping logic.
Examples:
Implementing a VerbNet mapper:
>>> class VerbNetMapper(FrameToTemplateMapper):
... def generate_from_frame(self, verb_lemma, frame_data):
... slots = self.map_frame_to_slots(frame_data)
... constraints = self.generate_constraints(frame_data, slots)
... return Template(
... name=f"{verb_lemma}_{frame_data['id']}",
... template_string=frame_data['template_string'],
... slots=slots,
... constraints=constraints
... )
...
... def map_frame_to_slots(self, frame_data):
... # Extract slots from VerbNet syntax
... return {}
...
... def generate_constraints(self, frame_data, slots):
... # Generate constraints from VerbNet restrictions
... return []
generate_from_frame(*args: Any, **kwargs: Any) -> Template | list[Template]
abstractmethod
¶
Generate Template(s) from a frame specification.
This is the main entry point for template generation. Subclasses implement the specific logic for their frame inventory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*args` | `Any` | Positional arguments (frame data, identifiers, etc.). | `()` |
| `**kwargs` | `Any` | Keyword arguments (configuration options, etc.). | `{}` |

Returns:

| Type | Description |
|---|---|
| `Template \| list[Template]` | Generated template(s). May return multiple templates if the frame has multiple realizations (e.g., different complementizer types, alternations). |
Examples:
VerbNet implementation:
>>> mapper.generate_from_frame(
... verb_lemma="think",
... verbnet_class="29.9",
... frame_data={"primary": "NP V that S"}
... )
map_frame_to_slots(frame_data: Any) -> dict[str, Slot]
abstractmethod
¶
Map frame elements to Template slots.
Converts frame-specific element descriptions into Slot objects with appropriate constraints.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `frame_data` | `Any` | Frame specification from the external inventory. Type depends on the specific resource (dict, object, etc.). | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Slot]` | Slots keyed by slot name. |
Examples:
Mapping VerbNet syntax to slots:
>>> slots = mapper.map_frame_to_slots({
... "syntax": [
... ("NP", "Agent"),
... ("V", None),
... ("NP", "Theme")
... ]
... })
>>> "subject" in slots
True
generate_constraints(frame_data: Any, slots: dict[str, Slot]) -> list[Constraint]
abstractmethod
¶
Generate multi-slot constraints from frame specifications.
Converts frame-specific restrictions into DSL Constraint objects that enforce relationships between slots.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `frame_data` | `Any` | Frame specification from the external inventory. | required |
| `slots` | `dict[str, Slot]` | Slots that have been created for this frame. | required |

Returns:

| Type | Description |
|---|---|
| `list[Constraint]` | Multi-slot constraints for the template. |
Examples:
Generating constraints from VerbNet restrictions:
>>> constraints = mapper.generate_constraints(
... frame_data={"restrictions": [...]},
... slots={"subject": ..., "verb": ...}
... )
create_template_name(*identifiers: str, separator: str = '_') -> str
¶
Create a unique template name from identifiers.
Utility method for generating consistent template names. Sanitizes identifiers by replacing spaces, dots, and hyphens.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*identifiers` | `str` | Components to include in the name (e.g., verb, class, frame). | `()` |
| `separator` | `str` | Separator between components (default: "_"). | `'_'` |

Returns:

| Type | Description |
|---|---|
| `str` | Sanitized template name. |
Examples:
>>> mapper = ConcreteMapper()
>>> mapper.create_template_name("think", "29.9", "that-clause")
'think_29_9_that_clause'
create_template_metadata(frame_data: dict[str, Any], **additional_metadata: Any) -> dict[str, Any]
¶
Create metadata dictionary for template.
Utility method for extracting and organizing frame metadata. Subclasses can override to add resource-specific metadata.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `frame_data` | `dict[str, Any]` | Frame specification from the external inventory. | required |
| `**additional_metadata` | `Any` | Additional metadata to include. | `{}` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Metadata dictionary for Template.metadata field. |
Examples:
>>> mapper = ConcreteMapper()
>>> metadata = mapper.create_template_metadata(
... frame_data={"id": "29.9-1", "examples": [...]},
... verb_lemma="think"
... )
MultiFrameMapper
¶
Bases: FrameToTemplateMapper
Mapper that generates multiple template variants from a single frame.
Some frame specifications support multiple realizations (e.g., different complementizer types, voice alternations). This class provides a framework for generating all variants.
Examples:
>>> class ClausalMapper(MultiFrameMapper):
... def get_frame_variants(self, frame_data):
... # Return list of variant specifications
... return [
... {"comp": "that", "mood": "declarative"},
... {"comp": "whether", "mood": "interrogative"},
... ]
...
... def generate_from_frame(self, verb, frame_data):
... variants = self.get_frame_variants(frame_data)
... return [self._generate_variant(verb, v) for v in variants]
...
... def map_frame_to_slots(self, frame_data):
... return {}
...
... def generate_constraints(self, frame_data, slots):
... return []
get_frame_variants(frame_data: Any) -> list[Any]
abstractmethod
¶
Extract all variants from frame specification.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `frame_data` | `Any` | Frame specification from the external inventory. | required |

Returns:

| Type | Description |
|---|---|
| `list[Any]` | List of variant specifications, each representing one possible realization of the frame. |
Examples:
>>> variants = mapper.get_frame_variants({
... "complementizers": ["that", "whether", "if"]
... })
>>> len(variants)
3
generate_from_frame(*args: Any, **kwargs: Any) -> list[Template]
¶
Generate templates for all frame variants.
Default implementation calls get_frame_variants() and generates a template for each variant. Subclasses can override for custom logic.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*args` | `Any` | Positional arguments passed to variant generation. | `()` |
| `**kwargs` | `Any` | Keyword arguments passed to variant generation. | `{}` |

Returns:

| Type | Description |
|---|---|
| `list[Template]` | Templates for all variants. |
Classification¶
classification
¶
Linguistic classification models for lexical items and templates.
Models for grouping lexical items and templates by linguistic properties.
LexicalItemClass and TemplateClass support cross-linguistic analysis
and alignment.
LexicalItemClass
¶
Bases: BeadBaseModel
A group of lexical items sharing a linguistic property.
Items are stored as a tuple in insertion order; mutating methods
(with_item, without_item) return new instances.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Class name. |
| `description` | `str \| None` | Description of the classification. |
| `property_name` | `str` | The linguistic property defining the class. |
| `property_value` | `JsonValue` | Specific value of the property. |
| `items` | `tuple[LexicalItem, ...]` | Member items in insertion order. |
| `tags` | `tuple[str, ...]` | Categorization tags. |
| `class_metadata` | `dict[str, JsonValue]` | Additional metadata. |
__len__() -> int
¶
Return the number of items in the class.
__contains__(item_id: UUID) -> bool
¶
Return whether an item with item_id is present.
__iter__() -> Iterator[LexicalItem]
¶
Iterate over class members.
by_id(item_id: UUID) -> LexicalItem | None
¶
Return the item with the matching id, or None.
with_item(item: LexicalItem) -> Self
¶
Return a new class with item appended.
without_item(item_id: UUID) -> tuple[Self, LexicalItem]
¶
Return (new_class, removed_item).
languages() -> frozenset[str]
¶
Return the set of language codes (lowercased) present in the class.
get_items_by_language(language_code: str) -> tuple[LexicalItem, ...]
¶
Return items whose language code matches language_code.
Codes are normalized via validate_iso639_code before comparison.
is_monolingual() -> bool
¶
Return whether the class spans at most one language.
is_multilingual() -> bool
¶
Return whether the class spans more than one language.
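Both predicates reduce to the size of the case-normalized language-code set (a sketch of the documented behavior, not the library's implementation):

```python
def languages(codes) -> frozenset:
    # normalize case before comparison, as the class does
    return frozenset(code.lower() for code in codes)

class_codes = languages(["ENG", "eng", "deu"])
monolingual = len(class_codes) <= 1    # False here: two distinct languages
```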
TemplateClass
¶
Bases: BeadBaseModel
A group of templates sharing a linguistic property.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Class name. |
| `description` | `str \| None` | Description. |
| `property_name` | `str` | Defining linguistic property. |
| `property_value` | `JsonValue` | Specific property value. |
| `templates` | `tuple[Template, ...]` | Member templates in insertion order. |
| `tags` | `tuple[str, ...]` | Categorization tags. |
| `class_metadata` | `dict[str, JsonValue]` | Additional metadata. |
__len__() -> int
¶
Return the number of templates in the class.
__contains__(template_id: UUID) -> bool
¶
Return whether a template with template_id is present.
__iter__() -> Iterator[Template]
¶
Iterate over the templates in the class.
by_id(template_id: UUID) -> Template | None
¶
Return the template with the matching id, or None.
with_template(template: Template) -> Self
¶
Return a new class with template appended.
without_template(template_id: UUID) -> tuple[Self, Template]
¶
Return (new_class, removed_template).
languages() -> frozenset[str]
¶
Return the set of language codes (lowercased) in the class.
get_templates_by_language(language_code: str) -> tuple[Template, ...]
¶
Return templates whose language code matches language_code.
is_monolingual() -> bool
¶
Return whether the class spans at most one language.
is_multilingual() -> bool
¶
Return whether the class spans more than one language.
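The with_template / without_template methods above follow an immutable-update pattern: rather than mutating the class in place, each call returns a new instance and leaves the original untouched. A minimal self-contained sketch of that pattern, using frozen-dataclass stand-ins rather than the actual bead models:

```python
import uuid
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Template:
    # stand-in for bead's Template; only the id matters for this sketch
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    language_code: str = "eng"


@dataclass(frozen=True)
class TemplateClass:
    name: str
    templates: tuple[Template, ...] = ()

    def with_template(self, template: Template) -> "TemplateClass":
        # returns a new instance; self is never mutated
        return replace(self, templates=(*self.templates, template))

    def without_template(self, template_id: uuid.UUID) -> tuple["TemplateClass", Template]:
        removed = next(t for t in self.templates if t.id == template_id)
        rest = tuple(t for t in self.templates if t.id != template_id)
        return replace(self, templates=rest), removed


tc = TemplateClass(name="passives")
t = Template()
tc2 = tc.with_template(t)
print(len(tc.templates), len(tc2.templates))  # original is unchanged
```

Because instances are immutable, older references (here, `tc`) remain valid snapshots after an update, which is the design the tuple-based storage supports.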
Resource Adapters¶
base
¶
Abstract base class for external resource adapters.
This module defines the interface that all resource adapters must implement to fetch lexical items from external linguistic databases.
ResourceAdapter
¶
Bases: ABC
Abstract base class for external resource adapters.
Resource adapters fetch lexical items from external linguistic databases and convert them to the bead LexicalItem format. All adapters must implement language_code filtering to support multi-language workflows.
Subclasses must implement:
- fetch_items(): retrieve items from the external resource
- is_available(): check whether the external resource is accessible
Examples:
>>> class MyAdapter(ResourceAdapter):
... def fetch_items(self, query=None, language_code=None, **kwargs):
... # Fetch from external resource
... return [LexicalItem(lemma="walk", pos="VERB", language_code="en")]
... def is_available(self):
... return True
>>> adapter = MyAdapter()
>>> items = adapter.fetch_items(query="walk", language_code="en")
>>> len(items) > 0
True
fetch_items(query: str | None = None, language_code: LanguageCode = None, **kwargs: Any) -> list[LexicalItem]
abstractmethod
¶
Fetch lexical items from external resource.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str \| None | Query string in adapter-specific format (e.g., lemma, predicate name, class identifier). If None, behavior is adapter-specific (it may return all items, raise an error, or use a default query). | None |
| language_code | LanguageCode | ISO 639-1 (2-letter) or ISO 639-3 (3-letter) language code used to filter results, e.g. "en", "eng", "ko", "kor". If None, items for all available languages are returned. | None |
| **kwargs | Any | Additional adapter-specific parameters (e.g., pos="VERB", resource="verbnet", include_features=True). | {} |
Returns:

| Type | Description |
|---|---|
| list[LexicalItem] | Lexical items fetched from the external resource. Each item should have language_code set if known. |
Raises:

| Type | Description |
|---|---|
| ValueError | If query is invalid or required parameters are missing. |
| RuntimeError | If the external resource is unavailable or the request fails. |
Examples:
>>> adapter = MyAdapter()
>>> items = adapter.fetch_items(query="break", language_code="en")
>>> all(item.language_code == "en" for item in items)
True
is_available() -> bool
abstractmethod
¶
Check if the external resource is available.
This method should verify that the external resource can be accessed, whether via installed packages, accessible data files, or network APIs.
Returns:
| Type | Description |
|---|---|
bool
|
True if the resource can be accessed, False otherwise. |
Examples:
>>> adapter = MyAdapter()
>>> adapter.is_available()
True
glazing
¶
Adapter for glazing package (VerbNet, PropBank, FrameNet).
This module provides an adapter that fetches lexical items from VerbNet, PropBank, and FrameNet via the glazing package's loader classes.
GlazingAdapter
¶
Bases: ResourceAdapter
Adapter for glazing package (VerbNet, PropBank, FrameNet).
This adapter fetches verb frame information from VerbNet, PropBank, or FrameNet and converts it to LexicalItem format. Frame information is stored in the attributes field.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| resource | Literal['verbnet', 'propbank', 'framenet'] | Which glazing resource to use. | 'verbnet' |
| cache | AdapterCache \| None | Optional cache instance. If None, no caching is performed. | None |
Examples:
>>> adapter = GlazingAdapter(resource="verbnet")
>>> items = adapter.fetch_items(query="break", language_code="en")
>>> all(item.language_code == "en" for item in items)
True
fetch_items(query: str | None = None, language_code: LanguageCode = None, **kwargs: Any) -> list[LexicalItem]
¶
Fetch items from glazing resource.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str \| None | Lemma or predicate to query (e.g., "break", "run"). If None, fetches all items from the resource. | None |
| language_code | LanguageCode | Language-code filter. Glazing resources are primarily English, so language_code="en" is typical; other languages may not be supported. | None |
| **kwargs | Any | Additional parameters: include_frames (bool) includes detailed frame information (syntax, examples, descriptions); default False. | {} |
Returns:

| Type | Description |
|---|---|
| list[LexicalItem] | Lexical items with frame information in attributes. |
Raises:

| Type | Description |
|---|---|
| RuntimeError | If glazing resource access fails. |
Examples:
>>> # Query specific verb
>>> adapter = GlazingAdapter(resource="verbnet")
>>> items = adapter.fetch_items(query="break", language_code="en")
>>> len(items) > 0
True
>>> # Fetch all items from resource
>>> all_items = adapter.fetch_items(query=None, language_code="en")
>>> len(all_items) > 100
True
>>> # Include detailed frame information
>>> items = adapter.fetch_items(
... query="break", language_code="en", include_frames=True
... )
>>> "frames" in items[0].attributes
True
is_available() -> bool
¶
Check if glazing package is available.
Returns:

| Type | Description |
|---|---|
| bool | True if glazing can be imported and its data is initialized, False otherwise. |
Examples:
>>> adapter = GlazingAdapter()
>>> adapter.is_available()
True
unimorph
¶
Adapter for UniMorph morphological paradigms.
This module provides an adapter to fetch morphological paradigms from UniMorph data and convert them to LexicalItem format with morphological features.
UniMorphAdapter
¶
Bases: ResourceAdapter
Adapter for UniMorph morphological paradigms.
This adapter fetches morphological paradigms from UniMorph and converts them to LexicalItem format. Morphological features are stored in the features field using UniMorph feature schema.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| cache | AdapterCache \| None | Optional cache instance. If None, no caching is performed. | None |
Examples:
>>> adapter = UniMorphAdapter()
>>> items = adapter.fetch_items(query="walk", language_code="en")
>>> all(item.language_code == "en" for item in items)
True
>>> all("tense" in item.features for item in items if item.features)
True
__init__(cache: AdapterCache | None = None) -> None
¶
Initialize UniMorph adapter.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| cache | AdapterCache \| None | Optional cache instance. | None |
fetch_items(query: str | None = None, language_code: LanguageCode = None, **kwargs: Any) -> list[LexicalItem]
¶
Fetch morphological paradigms from UniMorph.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str \| None | Lemma to query (e.g., "walk", "먹다", "hamba"). | None |
| language_code | LanguageCode | Required language code (e.g., "en", "ko", "zu"). UniMorph is organized by language, so this parameter is essential. | None |
| **kwargs | Any | Additional parameters (e.g., pos="VERB"). | {} |
Returns:

| Type | Description |
|---|---|
| list[LexicalItem] | Lexical items representing inflected forms, with morphological features in the features field. |
Raises:

| Type | Description |
|---|---|
| ValueError | If language_code is None (it is required for UniMorph). |
| RuntimeError | If UniMorph access fails. |
Examples:
>>> adapter = UniMorphAdapter()
>>> items = adapter.fetch_items(query="walk", language_code="en")
>>> len(items) > 0
True
>>> items[0].features.get("pos") == "VERB"
True
is_available() -> bool
¶
Check if UniMorph package is available.
Returns:

| Type | Description |
|---|---|
| bool | True if unimorph can be imported and accessed, False otherwise. |
Examples:
>>> adapter = UniMorphAdapter()
>>> adapter.is_available()
True
cache
¶
Caching for adapter fetch results.
This module provides an in-memory cache to avoid redundant fetches from external resources when the same query is repeated.
AdapterCache
¶
In-memory cache for adapter fetch results.
The cache stores results keyed by a hash of query parameters. This avoids redundant fetches when the same query is made multiple times.
Examples:
>>> cache = AdapterCache()
>>> items = [LexicalItem(lemma="walk", pos="VERB")]
>>> key = cache.make_key("glazing", query="walk", language_code="en")
>>> cache.set(key, items)
>>> cached = cache.get(key)
>>> cached == items
True
get(key: str) -> list[LexicalItem] | None
¶
Get cached result.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Cache key generated by make_key(). | required |
Returns:

| Type | Description |
|---|---|
| list[LexicalItem] \| None | Cached items if the key exists, None otherwise. |
Examples:
>>> cache = AdapterCache()
>>> cache.get("nonexistent") is None
True
set(key: str, items: list[LexicalItem]) -> None
¶
Cache result.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Cache key generated by make_key(). | required |
| items | list[LexicalItem] | Items to cache. | required |
Examples:
>>> cache = AdapterCache()
>>> items = [LexicalItem(lemma="walk")]
>>> cache.set("key1", items)
>>> cache.get("key1") == items
True
clear() -> None
¶
Clear entire cache.
Examples:
>>> cache = AdapterCache()
>>> cache.set("key1", [])
>>> cache.clear()
>>> cache.get("key1") is None
True
make_key(adapter_name: str, query: str | None = None, **kwargs: Any) -> str
¶
Generate a cache key from query parameters.
Create a deterministic hash key from the adapter name, query, and additional parameters; the same inputs always produce the same key.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| adapter_name | str | Name of the adapter (e.g., "glazing", "unimorph"). | required |
| query | str \| None | Query string. | None |
| **kwargs | Any | Additional query parameters (e.g., language_code, pos). | {} |
Returns:

| Type | Description |
|---|---|
| str | Cache key (hexadecimal hash string). |
Examples:
>>> cache = AdapterCache()
>>> key1 = cache.make_key("glazing", query="walk", language_code="en")
>>> key2 = cache.make_key("glazing", query="walk", language_code="en")
>>> key1 == key2
True
>>> key3 = cache.make_key("glazing", query="run", language_code="en")
>>> key1 != key3
True
registry
¶
Registry for managing resource adapters.
This module provides a registry for discovering and instantiating adapters by name.
AdapterRegistry
¶
Registry for managing resource adapters.
The registry allows adapters to be registered by name and retrieved with custom initialization parameters.
Examples:
>>> from bead.resources.adapters.glazing import GlazingAdapter
>>> registry = AdapterRegistry()
>>> registry.register("glazing", GlazingAdapter)
>>> adapter = registry.get("glazing", resource="verbnet")
>>> isinstance(adapter, GlazingAdapter)
True
register(name: str, adapter_class: type[ResourceAdapter]) -> None
¶
Register an adapter class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Adapter name (e.g., "glazing", "unimorph"). | required |
| adapter_class | type[ResourceAdapter] | Adapter class (not an instance) that subclasses ResourceAdapter. | required |
Raises:

| Type | Description |
|---|---|
| ValueError | If name is empty or adapter_class is not a ResourceAdapter subclass. |
Examples:
>>> from bead.resources.adapters.glazing import GlazingAdapter
>>> registry = AdapterRegistry()
>>> registry.register("glazing", GlazingAdapter)
>>> "glazing" in registry.list_available()
True
get(name: str, **kwargs: Any) -> ResourceAdapter
¶
Get adapter instance by name.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Adapter name (must be registered). | required |
| **kwargs | Any | Arguments passed to the adapter constructor. | {} |
Returns:

| Type | Description |
|---|---|
| ResourceAdapter | Adapter instance. |
Raises:

| Type | Description |
|---|---|
| KeyError | If the adapter name is not registered. |
Examples:
>>> from bead.resources.adapters.glazing import GlazingAdapter
>>> registry = AdapterRegistry()
>>> registry.register("glazing", GlazingAdapter)
>>> adapter = registry.get("glazing", resource="verbnet")
>>> adapter.resource
'verbnet'
list_available() -> list[str]
¶
List names of available adapters.
Returns:

| Type | Description |
|---|---|
| list[str] | Sorted list of registered adapter names. |
Examples:
>>> registry = AdapterRegistry()
>>> registry.list_available()
[]
>>> from bead.resources.adapters.glazing import GlazingAdapter
>>> registry.register("glazing", GlazingAdapter)
>>> registry.list_available()
['glazing']
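The registry semantics documented above (name validation on register, constructor kwargs forwarded by get, KeyError on unknown names, sorted listing) fit in a few lines. The following is stand-in code illustrating those semantics, not the bead implementation:

```python
from typing import Any


class ResourceAdapter:
    # stand-in base; the real one is bead's abstract ResourceAdapter
    pass


class AdapterRegistry:
    def __init__(self) -> None:
        self._classes: dict[str, type[ResourceAdapter]] = {}

    def register(self, name: str, adapter_class: type[ResourceAdapter]) -> None:
        if not name:
            raise ValueError("adapter name must be non-empty")
        if not (isinstance(adapter_class, type) and issubclass(adapter_class, ResourceAdapter)):
            raise ValueError("adapter_class must subclass ResourceAdapter")
        self._classes[name] = adapter_class

    def get(self, name: str, **kwargs: Any) -> ResourceAdapter:
        if name not in self._classes:
            raise KeyError(f"no adapter registered under {name!r}")
        return self._classes[name](**kwargs)  # kwargs go to the constructor

    def list_available(self) -> list[str]:
        return sorted(self._classes)


class DummyAdapter(ResourceAdapter):
    # hypothetical adapter standing in for GlazingAdapter
    def __init__(self, resource: str = "verbnet") -> None:
        self.resource = resource


registry = AdapterRegistry()
registry.register("dummy", DummyAdapter)
print(registry.get("dummy", resource="propbank").resource)  # → propbank
```

Storing classes rather than instances is the design choice that lets get() build a fresh adapter with per-call configuration (e.g., a different glazing resource each time).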