bead.resources¶
Stage 1 of the bead pipeline: lexical items, templates, and constraints.
Lexical Items and Lexicons¶
lexical_item
¶
Lexical item models for words and multi-word expressions.
This module provides data models for representing lexical items in the bead system. Lexical items are the atomic units that fill template slots during sentence generation. Includes support for single words and multi-word expressions (MWEs).
LexicalItem
¶
Bases: BeadBaseModel
A lexical item with linguistic features.
Follows the UniMorph structure of lemma, form, and feature bundle:

- lemma: base/citation form
- form: inflected surface form (None if same as lemma)
- features: feature bundle (pos, tense, person, number, etc.)
Attributes:
| Name | Type | Description |
|---|---|---|
| lemma | str | Base/citation form (e.g., "walk", "the"). |
| form | str \| None | Inflected surface form if different from lemma (e.g., "walked", "walking"). None means the form equals the lemma. |
| language_code | LanguageCode | ISO 639-3 language code (e.g., "eng"). |
| features | dict[str, Any] | Feature bundle of grammatical/linguistic features: pos (e.g., "VERB", "DET", "NOUN", "ADJ", "ADP"); morphological features such as tense, person, number, case, gender; unimorph_features (e.g., "V;PRS;3;SG"); lexical resource info such as verbnet_class, themroles, frame_info. |
| source | str \| None | Provenance (e.g., "VerbNet", "UniMorph", "manual"). |
Examples:
>>> # Inflected verb
>>> verb = LexicalItem(
... lemma="walk",
... form="walked",
... language_code="eng",
... features={"pos": "VERB", "tense": "PST"},
... source="UniMorph"
... )
>>> verb.form
'walked'
>>>
>>> # Uninflected determiner
>>> det = LexicalItem(
... lemma="the",
... form=None,
... language_code="eng",
... features={"pos": "DET"},
... source="manual"
... )
>>> det.form is None
True
validate_lemma(v: str) -> str
classmethod
¶
Validate that lemma is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The lemma value to validate. | required |

Returns:

| Type | Description |
|---|---|
| str | The validated lemma. |

Raises:

| Type | Description |
|---|---|
| ValueError | If lemma is empty or contains only whitespace. |
MWEComponent
¶
Bases: LexicalItem
A component of a multi-word expression.
Components represent individual parts of an MWE (e.g., verb and particle in a phrasal verb). Each component has a role within the MWE and can have its own constraints.
Attributes:
| Name | Type | Description |
|---|---|---|
| role | str | Role of this component in the MWE (e.g., "verb", "particle", "noun"). |
| required | bool | Whether this component must be present (default: True). |
| constraints | list[Constraint] | Component-specific constraints (in addition to base LexicalItem constraints). |
Examples:
>>> # Verb component of "take off"
>>> verb = MWEComponent(
... lemma="take",
... pos="VERB",
... role="verb",
... required=True
... )
>>> # Particle component
>>> particle = MWEComponent(
... lemma="off",
... pos="PART",
... role="particle",
... required=True
... )
MultiWordExpression
¶
Bases: LexicalItem
Multi-word expression as a lexical item.
MWEs are lexical items composed of multiple components. They can be separable (components can be non-adjacent) or inseparable. MWEs support component-level constraints and adjacency patterns.
Attributes:
| Name | Type | Description |
|---|---|---|
| components | list[MWEComponent] | Components that make up this MWE. |
| separable | bool | Whether components can be separated by other words (default: False). Example: "take the ball off" (separable) vs. "kick the bucket" (inseparable). |
| adjacency_pattern | str \| None | DSL expression defining valid adjacency patterns. Variables: component roles and the distance between components. Example: "distance(verb, particle) <= 3". |
Examples:
>>> # Inseparable phrasal verb "look after"
>>> mwe1 = MultiWordExpression(
... lemma="look after",
... pos="VERB",
... components=[
... MWEComponent(lemma="look", pos="VERB", role="verb"),
... MWEComponent(lemma="after", pos="ADP", role="particle")
... ],
... separable=False
... )
>>>
>>> # Separable phrasal verb "take off"
>>> mwe2 = MultiWordExpression(
... lemma="take off",
... pos="VERB",
... components=[
... MWEComponent(lemma="take", pos="VERB", role="verb"),
... MWEComponent(lemma="off", pos="PART", role="particle")
... ],
... separable=True,
... adjacency_pattern="distance(verb, particle) <= 3"
... )
>>>
>>> # MWE with constraints on components
>>> mwe3 = MultiWordExpression(
... lemma="break down",
... pos="VERB",
... components=[
... MWEComponent(
... lemma="break",
... pos="VERB",
... role="verb",
... constraints=[
... Constraint(
... expression="self.lemma in motion_verbs",
... context={"motion_verbs": {"break", "take", "give"}}
... )
... ]
... ),
... MWEComponent(lemma="down", pos="PART", role="particle")
... ],
... separable=True
... )
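The adjacency_pattern DSL above can be read as a check over token positions. A minimal sketch of that semantics, assuming distance is the absolute difference between component positions (the helper names here are illustrative, not part of bead):

```python
def distance(positions: dict[str, int], a: str, b: str) -> int:
    """Absolute difference between two component token positions (assumed semantics)."""
    return abs(positions[a] - positions[b])


def satisfies_adjacency(positions: dict[str, int], max_distance: int = 3) -> bool:
    """Mimic the pattern "distance(verb, particle) <= 3" for a separable MWE."""
    return distance(positions, "verb", "particle") <= max_distance


# "take the ball off": verb at position 0, particle at position 3
near = satisfies_adjacency({"verb": 0, "particle": 3})
# particle too far away from the verb
far = satisfies_adjacency({"verb": 0, "particle": 5})
```

Under this reading, `near` holds while `far` does not.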
lexicon
¶
Lexicon management for collections of lexical items.
This module provides the Lexicon class for managing, querying, and manipulating collections of lexical items. It supports filtering, searching, merging, and conversion to/from pandas and polars DataFrames.
Lexicon
¶
Bases: BeadBaseModel
A collection of lexical items with operations for filtering and analysis.
The Lexicon class manages collections of LexicalItem objects and provides methods for:

- Adding and removing items (CRUD operations)
- Filtering by properties, features, and attributes
- Searching by text
- Merging with other lexicons
- Converting to/from pandas and polars DataFrames
- Serialization to JSONLines
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Name of the lexicon. |
| description | str \| None | Optional description of the lexicon's purpose. |
| language_code | LanguageCode \| None | ISO 639-1 (2-letter) or ISO 639-3 (3-letter) language code. Examples: "en", "eng", "ko", "kor", "zu", "zul". Automatically validated and normalized to lowercase. |
| items | dict[UUID, LexicalItem] | Dictionary of items indexed by their UUIDs. |
| tags | list[str] | Tags for categorizing the lexicon. |
Examples:
>>> lexicon = Lexicon(name="verbs")
>>> item = LexicalItem(
...     lemma="walk", language_code="eng", features={"pos": "VERB"}
... )
>>> lexicon.add(item)
>>> len(lexicon)
1
>>> verbs = lexicon.filter_by_pos("VERB")
>>> len(verbs.items)
1
__len__() -> int
¶
Return the number of items in the lexicon.
__iter__() -> Iterator[LexicalItem]
¶
Iterate over items in lexicon.
Returns:

| Type | Description |
|---|---|
| Iterator[LexicalItem] | Iterator over lexical items. |
__contains__(item_id: UUID) -> bool
¶
Check if item ID is in lexicon.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| item_id | UUID | The item ID to check. | required |

Returns:

| Type | Description |
|---|---|
| bool | True if item ID exists in lexicon. |
add(item: LexicalItem) -> None
¶
Add a lexical item to the lexicon.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| item | LexicalItem | The item to add. | required |

Raises:

| Type | Description |
|---|---|
| ValueError | If item with same ID already exists. |
add_many(items: list[LexicalItem]) -> None
¶
Add multiple items to the lexicon.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| items | list[LexicalItem] | The items to add. | required |

Raises:

| Type | Description |
|---|---|
| ValueError | If any item with same ID already exists. |
remove(item_id: UUID) -> LexicalItem
¶
Remove and return an item by ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| item_id | UUID | The ID of the item to remove. | required |

Returns:

| Type | Description |
|---|---|
| LexicalItem | The removed item. |

Raises:

| Type | Description |
|---|---|
| KeyError | If item ID not found. |
get(item_id: UUID) -> LexicalItem | None
¶
Get an item by ID, or None if not found.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| item_id | UUID | The ID of the item to get. | required |

Returns:

| Type | Description |
|---|---|
| LexicalItem \| None | The item if found, None otherwise. |
filter(predicate: Callable[[LexicalItem], bool]) -> Lexicon
¶
Filter items by a predicate function.
Creates a new lexicon containing only items that satisfy the predicate.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predicate | Callable[[LexicalItem], bool] | Function that returns True for items to include. | required |

Returns:

| Type | Description |
|---|---|
| Lexicon | New lexicon with filtered items. |
Examples:
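The filter semantics amount to a dict comprehension over (UUID, item) pairs. A stdlib sketch with a plain dict standing in for Lexicon.items (filter_items is a hypothetical helper, not the bead API):

```python
from uuid import uuid4

# Plain-dict stand-in for Lexicon.items (UUID -> item).
items = {
    uuid4(): {"lemma": "walk", "features": {"pos": "VERB"}},
    uuid4(): {"lemma": "dog", "features": {"pos": "NOUN"}},
}


def filter_items(items: dict, predicate) -> dict:
    """Keep only entries whose item satisfies the predicate."""
    return {item_id: item for item_id, item in items.items() if predicate(item)}


verbs = filter_items(items, lambda item: item["features"]["pos"] == "VERB")
```

The real method wraps the surviving entries in a new Lexicon rather than returning a bare dict.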
filter_by_pos(pos: str) -> Lexicon
¶
Filter items by part of speech.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pos | str | The part of speech to filter by. | required |

Returns:

| Type | Description |
|---|---|
| Lexicon | New lexicon with items matching the POS. |
Examples:
>>> lexicon = Lexicon(name="test", language_code="eng")
>>> lexicon.add(LexicalItem(
... lemma="walk", language_code="eng", features={"pos": "VERB"}
... ))
>>> lexicon.add(LexicalItem(
... lemma="dog", language_code="eng", features={"pos": "NOUN"}
... ))
>>> verbs = lexicon.filter_by_pos("VERB")
>>> len(verbs.items)
1
filter_by_lemma(lemma: str) -> Lexicon
¶
Filter items by lemma (exact match).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| lemma | str | The lemma to filter by. | required |

Returns:

| Type | Description |
|---|---|
| Lexicon | New lexicon with items matching the lemma. |
filter_by_feature(feature_name: str, feature_value: Any) -> Lexicon
¶
Filter items by a specific feature value.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| feature_name | str | The name of the feature. | required |
| feature_value | Any | The value to match. | required |

Returns:

| Type | Description |
|---|---|
| Lexicon | New lexicon with items having the specified feature value. |
filter_by_attribute(attr_name: str, attr_value: Any) -> Lexicon
¶
Filter items by a specific attribute value.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| attr_name | str | The name of the attribute. | required |
| attr_value | Any | The value to match. | required |

Returns:

| Type | Description |
|---|---|
| Lexicon | New lexicon with items having the specified attribute value. |
Examples:
>>> lexicon = Lexicon(name="test")
>>> lexicon.add(LexicalItem(
... lemma="walk", language_code="eng", features={"frequency": 1000}
... ))
>>> lexicon.add(LexicalItem(
... lemma="saunter", language_code="eng", features={"frequency": 10}
... ))
>>> high_freq = lexicon.filter_by_attribute("frequency", 1000)
>>> len(high_freq.items)
1
search(query: str, field: str = 'lemma') -> Lexicon
¶
Search for items containing query string in specified field.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search string (case-insensitive substring match). | required |
| field | str | Field to search in ("lemma", "pos", "form"). | 'lemma' |

Returns:

| Type | Description |
|---|---|
| Lexicon | New lexicon with matching items. |

Raises:

| Type | Description |
|---|---|
| ValueError | If field is not a valid searchable field. |
Examples:
merge(other: Lexicon, strategy: Literal['keep_first', 'keep_second', 'error'] = 'keep_first') -> Lexicon
¶
Merge with another lexicon.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| other | Lexicon | The lexicon to merge with. | required |
| strategy | Literal['keep_first', 'keep_second', 'error'] | How to handle duplicate IDs: "keep_first" keeps the item from self; "keep_second" keeps the item from other; "error" raises on duplicates. | 'keep_first' |

Returns:

| Type | Description |
|---|---|
| Lexicon | New merged lexicon. |

Raises:

| Type | Description |
|---|---|
| ValueError | If strategy is "error" and duplicates found. |
Examples:
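The three duplicate-handling strategies correspond to plain dict merges. A stdlib sketch of the semantics, with string keys standing in for UUIDs (merge_ids is a hypothetical helper, not the bead implementation):

```python
def merge_ids(first: dict, second: dict, strategy: str = "keep_first") -> dict:
    """Merge two id->item dicts under the documented duplicate strategies."""
    if strategy == "keep_first":
        return {**second, **first}  # entries in `first` win on duplicate keys
    if strategy == "keep_second":
        return {**first, **second}  # entries in `second` win on duplicate keys
    if strategy == "error":
        duplicates = first.keys() & second.keys()
        if duplicates:
            raise ValueError(f"duplicate IDs: {duplicates}")
        return {**first, **second}
    raise ValueError(f"unknown strategy: {strategy}")


a = {"id1": "walk (from self)"}
b = {"id1": "walk (from other)", "id2": "run"}
```

With these inputs, "keep_first" preserves `a`'s entry for "id1", "keep_second" takes `b`'s, and "error" raises.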
to_dataframe(backend: Literal['pandas', 'polars'] = 'pandas') -> DataFrame
¶
Convert lexicon to DataFrame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| backend | Literal['pandas', 'polars'] | DataFrame backend to use (default: "pandas"). | 'pandas' |

Returns:

| Type | Description |
|---|---|
| DataFrame | pandas or polars DataFrame with columns: id, lemma, pos, form, source, created_at, modified_at, plus separate columns for each feature and attribute. |
Examples:
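Conceptually, to_dataframe flattens each item into one row with features promoted to their own columns; a DataFrame can then be built from the list of row dicts. A stdlib sketch of that flattening (item_to_row is an illustrative helper, not the bead implementation):

```python
def item_to_row(item: dict) -> dict:
    """Flatten an item dict: base fields plus one column per feature."""
    row = {"lemma": item["lemma"], "form": item.get("form")}
    row.update(item.get("features", {}))  # promote features to columns
    return row


rows = [
    item_to_row(
        {"lemma": "walk", "form": "walked",
         "features": {"pos": "VERB", "tense": "PST"}}
    )
]
# pandas.DataFrame(rows) or polars.DataFrame(rows) would accept this shape.
```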
from_dataframe(df: DataFrame, name: str) -> Lexicon
classmethod
¶
Create lexicon from DataFrame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | pandas or polars DataFrame with at minimum a 'lemma' column. | required |
| name | str | Name for the lexicon. | required |

Returns:

| Type | Description |
|---|---|
| Lexicon | New lexicon created from DataFrame. |

Raises:

| Type | Description |
|---|---|
| ValueError | If DataFrame does not have a 'lemma' column. |
to_jsonl(path: str) -> None
¶
Save lexicon to JSONLines file (one item per line).
Templates and Collections¶
template
¶
Template and structure models for sentence generation.
This module provides models for sentence templates and their structures. Templates contain slots that are filled with lexical items during sentence generation.
Slot
¶
Bases: BeadBaseModel
A slot in a template that can be filled with a lexical item.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Unique name for the slot within the template. |
| description | str \| None | Human-readable description of the slot's purpose. |
| constraints | list[Constraint] | Constraints that determine valid fillers. |
| required | bool | Whether the slot must be filled. |
| default_value | str \| None | Default value if slot is not filled. |
Examples:
>>> from bead.resources.constraints import Constraint
>>> slot = Slot(
... name="subject",
... description="The subject of the sentence",
... constraints=[
... Constraint(expression="self.features.pos == 'NOUN'")
... ],
... required=True
... )
validate_name(v: str) -> str
classmethod
¶
Validate that name is a valid Python identifier.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The slot name to validate. | required |

Returns:

| Type | Description |
|---|---|
| str | The validated slot name. |

Raises:

| Type | Description |
|---|---|
| ValueError | If name is not a valid Python identifier. |
Template
¶
Bases: BeadBaseModel
A sentence template with slots for lexical items.
Templates define the structure of generated sentences. They contain:

- A template string with slot placeholders (e.g., "{subject} {verb} {object}")
- Slot definitions with constraints
- An optional language code
- Optional metadata
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Unique name for the template. |
| template_string | str | Template with {slot_name} placeholders. |
| slots | dict[str, Slot] | Slot definitions keyed by slot name. |
| constraints | list[Constraint] | Multi-slot constraints (slot names as variables in DSL expressions). |
| description | str \| None | Human-readable description. |
| language_code | LanguageCode \| None | ISO 639-1 (2-letter) or ISO 639-3 (3-letter) language code. Examples: "en", "eng", "ko", "kor", "zu", "zul". Required for cross-linguistic classification via TemplateClass. |
| tags | list[str] | Tags for categorization. |
| metadata | dict[str, MetadataValue] | Additional metadata. |
Examples:
>>> template = Template(
... name="simple_transitive",
... template_string="{subject} {verb} {object}.",
... slots={
... "subject": Slot(name="subject", required=True),
... "verb": Slot(name="verb", required=True),
... "object": Slot(name="object", required=True)
... },
... tags=["transitive", "simple"]
... )
required_slot_names: set[str]
property
¶
Get names of all required slots.
Returns:

| Type | Description |
|---|---|
| set[str] | Set of slot names where required=True. |
validate_name(v: str) -> str
classmethod
¶
Validate that name is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The template name to validate. | required |

Returns:

| Type | Description |
|---|---|
| str | The validated template name. |

Raises:

| Type | Description |
|---|---|
| ValueError | If name is empty. |
validate_template_string(v: str) -> str
classmethod
¶
Validate that template_string is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The template string to validate. | required |

Returns:

| Type | Description |
|---|---|
| str | The validated template string. |

Raises:

| Type | Description |
|---|---|
| ValueError | If template_string is empty. |
validate_slots_match_template() -> Template
¶
Validate that template_string and slots are consistent.
Ensures that:

- All slot names in template_string exist in the slots dict
- All slots in the dict are referenced in template_string
- Slot names match their keys in the dict

Returns:

| Type | Description |
|---|---|
| Template | The validated template. |

Raises:

| Type | Description |
|---|---|
| ValueError | If template_string and slots are inconsistent. |
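The placeholder/slot consistency check can be sketched with the stdlib string.Formatter, which parses {slot_name} fields out of a format string (the helper names are illustrative, not the bead implementation):

```python
from string import Formatter


def placeholder_names(template_string: str) -> set[str]:
    """Collect {slot_name} placeholders appearing in the template string."""
    return {
        name
        for _, name, _, _ in Formatter().parse(template_string)
        if name is not None
    }


def slots_consistent(template_string: str, slot_names: set[str]) -> bool:
    """True when placeholders and declared slot names coincide exactly."""
    return placeholder_names(template_string) == slot_names
```

For example, "{subject} {verb} {object}." is consistent with slots named subject, verb, and object, and inconsistent if any slot is missing or extra.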
fill_with_values(slot_values: dict[str, str], strategy_name: str = 'manual') -> FilledTemplate
¶
Create a FilledTemplate by filling slots with string values.
This is a lightweight alternative to CSPFiller for cases where you already have the values and just need a FilledTemplate object.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| slot_values | dict[str, str] | Mapping of slot names to string values to fill them with. | required |
| strategy_name | str | Name of strategy used (for metadata). | 'manual' |

Returns:

| Type | Description |
|---|---|
| FilledTemplate | A filled template with the provided values. |
Examples:
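Since template_string uses {slot_name} placeholders, filling with known values is essentially str.format. A minimal sketch of the surface-string step (the FilledTemplate bookkeeping is omitted):

```python
# Fill a placeholder template with concrete strings via str.format.
template_string = "{subject} {verb} {object}."
slot_values = {"subject": "The dog", "verb": "chased", "object": "the ball"}

surface = template_string.format(**slot_values)  # -> "The dog chased the ball."
```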
TemplateSequence
¶
Bases: BeadBaseModel
A sequence of templates to be filled together.
Template sequences allow multiple templates to be filled with related constraints (e.g., relational constraints across templates).
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Unique name for the sequence. |
| templates | list[Template] | Ordered list of templates. |
| constraints | list[Constraint] | Cross-template constraints (span multiple templates). |
Examples:
>>> sequence = TemplateSequence(
... name="question_answer",
... templates=[question_template, answer_template],
... constraints=[...]
... )
validate_name(v: str) -> str
classmethod
¶
Validate that name is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The sequence name to validate. | required |

Returns:

| Type | Description |
|---|---|
| str | The validated sequence name. |

Raises:

| Type | Description |
|---|---|
| ValueError | If name is empty. |
TemplateTree
¶
Bases: BeadBaseModel
A tree structure of templates.
Template trees represent hierarchical relationships between templates (e.g., a discourse structure).
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Unique name for the tree. |
| root | Template | Root template. |
| children | list[TemplateTree] | Child subtrees. |
Examples:
>>> tree = TemplateTree(
... name="discourse",
... root=intro_template,
... children=[
... TemplateTree(name="body", root=body_template, children=[]),
... TemplateTree(name="conclusion", root=conclusion_template, children=[])
... ]
... )
validate_name(v: str) -> str
classmethod
¶
Validate that name is non-empty.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The tree name to validate. | required |

Returns:

| Type | Description |
|---|---|
| str | The validated tree name. |

Raises:

| Type | Description |
|---|---|
| ValueError | If name is empty. |
template_collection
¶
Template collection management.
This module provides the TemplateCollection class for managing collections of sentence templates.
TemplateCollection
¶
Bases: BeadBaseModel
A collection of templates with operations for filtering and analysis.
Similar to Lexicon, but for Template objects. The TemplateCollection class manages collections of Template objects and provides methods for:

- Adding and removing templates (CRUD operations)
- Filtering by properties and tags
- Searching by name or template string
- Merging with other collections
- Converting to/from pandas and polars DataFrames
- Serialization to JSONLines
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Name of the collection. |
| description | str \| None | Optional description. |
| language_code | str \| None | ISO 639-1 or 639-3 language code (e.g., "en", "es", "eng"). |
| templates | dict[UUID, Template] | Dictionary of templates indexed by their UUIDs. |
| tags | list[str] | Tags for categorization. |
Examples:
>>> from bead.resources import Slot
>>> collection = TemplateCollection(name="transitive")
>>> template = Template(
... name="simple",
... template_string="{subject} {verb} {object}.",
... slots={
... "subject": Slot(name="subject"),
... "verb": Slot(name="verb"),
... "object": Slot(name="object"),
... }
... )
>>> collection.add(template)
>>> len(collection)
1
__len__() -> int
¶
Return the number of templates in the collection.
__iter__() -> Iterator[Template]
¶
Iterate over templates in collection.
Returns:

| Type | Description |
|---|---|
| Iterator[Template] | Iterator over templates. |
Examples:
>>> from bead.resources import Slot
>>> collection = TemplateCollection(name="test")
>>> t1 = Template(
... name="t1", template_string="{x}.", slots={"x": Slot(name="x")}
... )
>>> t2 = Template(
... name="t2", template_string="{y}.", slots={"y": Slot(name="y")}
... )
>>> collection.add(t1)
>>> collection.add(t2)
>>> [t.name for t in collection]
['t1', 't2']
__contains__(template_id: UUID) -> bool
¶
Check if template ID is in collection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| template_id | UUID | The template ID to check. | required |

Returns:

| Type | Description |
|---|---|
| bool | True if template ID exists in collection. |
add(template: Template) -> None
¶
Add a template to the collection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| template | Template | The template to add. | required |

Raises:

| Type | Description |
|---|---|
| ValueError | If template with same ID already exists. |
add_many(templates: list[Template]) -> None
¶
Add multiple templates to the collection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| templates | list[Template] | The templates to add. | required |

Raises:

| Type | Description |
|---|---|
| ValueError | If any template with same ID already exists. |
Examples:
>>> from bead.resources import Slot
>>> collection = TemplateCollection(name="test")
>>> t1 = Template(
... name="t1", template_string="{x}.", slots={"x": Slot(name="x")}
... )
>>> t2 = Template(
... name="t2", template_string="{y}.", slots={"y": Slot(name="y")}
... )
>>> collection.add_many([t1, t2])
>>> len(collection)
2
remove(template_id: UUID) -> Template
¶
Remove and return a template by ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| template_id | UUID | The ID of the template to remove. | required |

Returns:

| Type | Description |
|---|---|
| Template | The removed template. |

Raises:

| Type | Description |
|---|---|
| KeyError | If template ID not found. |
Examples:
>>> from bead.resources import Slot
>>> collection = TemplateCollection(name="test")
>>> template = Template(
... name="test", template_string="{x}.", slots={"x": Slot(name="x")}
... )
>>> collection.add(template)
>>> removed = collection.remove(template.id)
>>> removed.name
'test'
>>> len(collection)
0
get(template_id: UUID) -> Template | None
¶
Get a template by ID, or None if not found.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| template_id | UUID | The ID of the template to get. | required |

Returns:

| Type | Description |
|---|---|
| Template \| None | The template if found, None otherwise. |
Examples:
>>> from bead.resources import Slot
>>> collection = TemplateCollection(name="test")
>>> template = Template(
... name="test", template_string="{x}.", slots={"x": Slot(name="x")}
... )
>>> collection.add(template)
>>> retrieved = collection.get(template.id)
>>> retrieved.name
'test'
>>> from uuid import uuid4
>>> collection.get(uuid4()) is None
True
filter(predicate: Callable[[Template], bool]) -> TemplateCollection
¶
Filter templates by a predicate function.
Creates a new collection containing only templates that satisfy the predicate.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predicate | Callable[[Template], bool] | Function that returns True for templates to include. | required |

Returns:

| Type | Description |
|---|---|
| TemplateCollection | New collection with filtered templates. |
Examples:
>>> from bead.resources import Slot
>>> collection = TemplateCollection(name="test")
>>> t1 = Template(
... name="t1",
... template_string="{x}.",
... slots={"x": Slot(name="x")},
... tags=["simple"],
... )
>>> t2 = Template(
... name="t2",
... template_string="{y} {z}.",
... slots={"y": Slot(name="y"), "z": Slot(name="z")},
... tags=["complex"],
... )
>>> collection.add(t1)
>>> collection.add(t2)
>>> simple = collection.filter(lambda t: "simple" in t.tags)
>>> len(simple.templates)
1
filter_by_tag(tag: str) -> TemplateCollection
¶
Filter templates by tag.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | The tag to filter by. | required |

Returns:

| Type | Description |
|---|---|
| TemplateCollection | New collection with templates having the specified tag. |
Examples:
>>> from bead.resources import Slot
>>> collection = TemplateCollection(name="test")
>>> t1 = Template(
... name="t1",
... template_string="{x}.",
... slots={"x": Slot(name="x")},
... tags=["simple"],
... )
>>> t2 = Template(
... name="t2",
... template_string="{y}.",
... slots={"y": Slot(name="y")},
... tags=["complex"],
... )
>>> collection.add(t1)
>>> collection.add(t2)
>>> simple = collection.filter_by_tag("simple")
>>> len(simple.templates)
1
filter_by_slot_count(count: int) -> TemplateCollection
¶
Filter templates by number of slots.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| count | int | The number of slots to filter by. | required |

Returns:

| Type | Description |
|---|---|
| TemplateCollection | New collection with templates having the specified slot count. |
Examples:
>>> from bead.resources import Slot
>>> collection = TemplateCollection(name="test")
>>> t1 = Template(
... name="t1", template_string="{x}.", slots={"x": Slot(name="x")}
... )
>>> t2 = Template(
... name="t2",
... template_string="{y} {z}.",
... slots={"y": Slot(name="y"), "z": Slot(name="z")},
... )
>>> collection.add(t1)
>>> collection.add(t2)
>>> single_slot = collection.filter_by_slot_count(1)
>>> len(single_slot.templates)
1
search(query: str, field: str = 'name') -> TemplateCollection
¶
Search for templates containing query string in specified field.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search string (case-insensitive substring match). | required |
| field | str | Field to search in ("name", "template_string"). | 'name' |

Returns:

| Type | Description |
|---|---|
| TemplateCollection | New collection with matching templates. |

Raises:

| Type | Description |
|---|---|
| ValueError | If field is not a valid searchable field. |
merge(other: TemplateCollection, strategy: Literal['keep_first', 'keep_second', 'error'] = 'keep_first') -> TemplateCollection
¶
Merge with another collection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| other | TemplateCollection | The collection to merge with. | required |
| strategy | Literal['keep_first', 'keep_second', 'error'] | How to handle duplicate IDs: "keep_first" keeps the template from self; "keep_second" keeps the template from other; "error" raises on duplicates. | 'keep_first' |

Returns:

| Type | Description |
|---|---|
| TemplateCollection | New merged collection. |

Raises:

| Type | Description |
|---|---|
| ValueError | If strategy is "error" and duplicates found. |
Examples:
>>> from bead.resources import Slot
>>> c1 = TemplateCollection(name="c1")
>>> c1.add(
... Template(
... name="t1", template_string="{x}.", slots={"x": Slot(name="x")}
... )
... )
>>> c2 = TemplateCollection(name="c2")
>>> c2.add(
... Template(
... name="t2", template_string="{y}.", slots={"y": Slot(name="y")}
... )
... )
>>> merged = c1.merge(c2)
>>> len(merged.templates)
2
to_dataframe(backend: Literal['pandas', 'polars'] = 'pandas') -> DataFrame
¶
Convert collection to DataFrame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| backend | Literal['pandas', 'polars'] | DataFrame backend to use (default: "pandas"). | 'pandas' |

Returns:

| Type | Description |
|---|---|
| DataFrame | pandas or polars DataFrame with columns: id, name, template_string, description, slot_count, slot_names, tags, created_at, modified_at. |
Examples:
>>> from bead.resources import Slot
>>> collection = TemplateCollection(name="test")
>>> template = Template(
... name="test", template_string="{x}.", slots={"x": Slot(name="x")}
... )
>>> collection.add(template)
>>> df = collection.to_dataframe()
>>> "name" in df.columns
True
>>> "template_string" in df.columns
True
from_dataframe(df: DataFrame, name: str) -> TemplateCollection
classmethod
¶
Create collection from DataFrame.
Note: This method creates templates without slot definitions since DataFrame representation doesn't include full slot information. Use from_jsonl for full template serialization.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | pandas or polars DataFrame with at minimum 'name' and 'template_string' columns. | required |
| name | str | Name for the collection. | required |

Returns:

| Type | Description |
|---|---|
| TemplateCollection | New collection created from DataFrame. |

Raises:

| Type | Description |
|---|---|
| ValueError | If DataFrame does not have required columns. |
to_jsonl(path: str) -> None
¶
Save collection to JSONLines file (one template per line).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to the output file. | required |
from_jsonl(path: str, name: str) -> TemplateCollection
classmethod
¶
Load collection from JSONLines file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to the input file. | required |
| name | str | Name for the collection. | required |

Returns:

| Type | Description |
|---|---|
| TemplateCollection | New collection loaded from file. |
Examples:
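The JSONLines round trip used by to_jsonl/from_jsonl (one JSON object per line) can be sketched with the stdlib json module; an in-memory buffer stands in for the file path here:

```python
import io
import json

templates = [
    {"name": "t1", "template_string": "{x}."},
    {"name": "t2", "template_string": "{y}."},
]

# Write: serialize one JSON object per line.
buf = io.StringIO()
for t in templates:
    buf.write(json.dumps(t) + "\n")

# Read back: parse each non-empty line independently.
buf.seek(0)
loaded = [json.loads(line) for line in buf if line.strip()]
```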
Constraints¶
constraints
¶
Constraint models for lexical item selection.
This module provides a universal constraint model based on DSL expressions. Constraints are pure DSL expressions with optional context variables.
Scope is determined by storage location:

- Slot.constraints → single-slot constraints (self = slot filler)
- Template.constraints → multi-slot constraints (slot names as variables)
- TemplateSequence.constraints → cross-template constraints
Constraint
¶
Bases: BeadBaseModel
Universal constraint expressed via DSL.
All constraints are DSL expressions evaluated with a context dictionary. The scope of a constraint is determined by where it is stored:

- Slot.constraints: single-slot constraints where 'self' refers to the slot filler
- Template.constraints: multi-slot constraints where slot names are variables
- TemplateSequence.constraints: cross-template constraints
Attributes:
| Name | Type | Description |
|---|---|---|
| expression | str | DSL expression to evaluate (must return boolean). |
| context | dict[str, ContextValue] | Context variables available during evaluation (e.g., whitelists, constants). |
| description | str \| None | Optional human-readable description of the constraint. |
| compiled | ASTNode \| None | Cached compiled AST after first compilation (optimization). |
Examples:
Extensional (whitelist):
>>> constraint = Constraint(
... expression="self.lemma in motion_verbs",
... context={"motion_verbs": {"walk", "run", "jump"}}
... )
Intensional (feature-based):
>>> constraint = Constraint(
... expression="self.pos == 'VERB' and self.features.number == 'singular'"
... )
Binary agreement:
>>> constraint = Constraint(
...     expression="det.features.number == noun.features.number"
... )
IF-THEN conditional:
>>> constraint = Constraint(
... expression="det.lemma != 'a' or noun.features.number == 'singular'"
... )
combine(*constraints: Constraint, logic: str = 'and') -> Constraint
classmethod
¶
Combine multiple constraints with AND or OR logic.
Merges all context dictionaries from input constraints and combines their expressions using the specified logical operator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *constraints | Constraint | Variable number of constraints to combine. | () |
| logic | str | Logical operator to use: "and" or "or" (default: "and"). | 'and' |
Returns:
| Type | Description |
|---|---|
| Constraint | New constraint with combined expressions and merged contexts. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no constraints provided or invalid logic operator. |
Examples:
>>> c1 = Constraint(
... expression="self.pos == 'VERB'",
... description="Must be a verb"
... )
>>> c2 = Constraint(
... expression="self.features.tense == 'present'",
... description="Must be present tense"
... )
>>> combined = Constraint.combine(c1, c2)
>>> "and" in combined.expression
True
>>> combined.description
'Must be a verb; Must be present tense'
With OR logic and contexts:
>>> c1 = Constraint(
... expression="self.lemma in verbs",
... context={"verbs": {"walk", "run"}},
... description="Motion verb"
... )
>>> c2 = Constraint(
... expression="self.lemma in actions",
... context={"actions": {"jump", "hop"}},
... description="Action verb"
... )
>>> combined = Constraint.combine(c1, c2, logic="or")
>>> " or " in combined.expression
True
>>> len(combined.context)
2
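Conceptually, combine() parenthesizes each expression, joins them with the chosen operator, and merges the context dictionaries. A minimal sketch of that behavior (a simplified stand-in, not the actual implementation, which returns a Constraint object):

```python
def combine(expressions, contexts, logic="and"):
    # Reject anything other than the two supported operators.
    if logic not in {"and", "or"}:
        raise ValueError("logic must be 'and' or 'or'")
    # Merge all context dictionaries (later entries win on key clashes).
    merged = {}
    for ctx in contexts:
        merged.update(ctx)
    # Parenthesize each sub-expression before joining, so operator
    # precedence inside the parts cannot change the combined meaning.
    combined = f" {logic} ".join(f"({e})" for e in expressions)
    return combined, merged

expr, ctx = combine(
    ["self.pos == 'VERB'", "self.lemma in verbs"],
    [{}, {"verbs": {"walk", "run"}}],
)
# expr == "(self.pos == 'VERB') and (self.lemma in verbs)"
```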
constraint_builders
¶
Abstract base classes for programmatic constraint generation.
This module provides language-agnostic base classes for building constraints programmatically. Language-specific implementations should extend these bases.
ConstraintBuilder
¶
Bases: ABC
Abstract base class for programmatic constraint generation.
Constraint builders encapsulate logic for generating DSL constraints based on configuration and rules. Subclasses implement specific constraint generation strategies.
Examples:
>>> class NumberAgreementBuilder(ConstraintBuilder):
... def build(self, *slot_names: str) -> Constraint:
... # Generate number agreement constraint
... pairs = []
... for i, slot1 in enumerate(slot_names):
... for slot2 in slot_names[i+1:]:
... pairs.append(f"{slot1}.number == {slot2}.number")
... return Constraint(
... expression=" and ".join(pairs),
... description=f"Number agreement: {', '.join(slot_names)}"
... )
build(*args: Any, **kwargs: Any) -> Constraint
abstractmethod
¶
Build a Constraint object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *args | Any | Positional arguments (slot names, properties, etc.). | () |
| **kwargs | Any | Keyword arguments (configuration options). | {} |
Returns:
| Type | Description |
|---|---|
| Constraint | Generated constraint. |
AgreementConstraintBuilder
¶
Bases: ConstraintBuilder
Builder for feature agreement constraints.
Generates constraints that enforce feature agreement across slots (e.g., number, gender, case). Supports exact matching or equivalence classes via agreement rules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| feature_name | str | Name of the feature to enforce agreement on (e.g., "number", "gender"). | required |
| agreement_rules | dict[str, list[str]] \| None | Optional equivalence classes. Maps a canonical value to its list of equivalent values, e.g. {"singular": ["singular", "sing", "sg"], "plural": ["plural", "pl"]}. | None |
Examples:
Exact number agreement:
>>> builder = AgreementConstraintBuilder("number")
>>> constraint = builder.build("subject", "verb")
>>> expr = "subject.features.get('number') == verb.features.get('number')"
>>> expr in constraint.expression
True
Agreement with equivalence rules:
>>> rules = {"singular": ["sing", "sg"], "plural": ["pl"]}
>>> builder = AgreementConstraintBuilder("number", agreement_rules=rules)
>>> constraint = builder.build("det", "noun")
>>> "equiv_" in constraint.expression # Uses equivalence class checks
True
build(*slot_names: str) -> Constraint
¶
Build agreement constraint for given slots.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *slot_names | str | Names of slots to enforce agreement between (≥2 required). | () |
Returns:
| Type | Description |
|---|---|
| Constraint | Agreement constraint. |
Raises:
| Type | Description |
|---|---|
| ValueError | If fewer than 2 slot names provided. |
ConditionalConstraintBuilder
¶
Bases: ConstraintBuilder
Builder for IF-THEN (conditional) constraints.
Generates constraints that enforce requirements when conditions are met. Implements logical implication: IF condition THEN requirement.
Examples:
>>> builder = ConditionalConstraintBuilder()
>>> constraint = builder.build(
... condition="det.lemma == 'a'",
... requirement="noun.features.get('number') == 'singular'",
... description="'a' requires singular noun"
... )
>>> "not (" in constraint.expression # IF-THEN encoded as: not cond or req
True
build(*, condition: str, requirement: str, description: str | None = None, context: dict[str, Any] | None = None) -> Constraint
¶
Build conditional constraint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| condition | str | Condition expression (IF part). | required |
| requirement | str | Requirement expression (THEN part). | required |
| description | str \| None | Human-readable description. | None |
| context | dict[str, Any] \| None | Context variables for evaluation. | None |
Returns:
| Type | Description |
|---|---|
| Constraint | Conditional constraint. |
Notes
Logical implication (IF A THEN B) is encoded as: (NOT A) OR B
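The encoding can be checked directly against the truth table for material implication: the combined expression is false only when the condition holds and the requirement does not.

```python
def implies(condition, requirement):
    # IF condition THEN requirement, encoded as (NOT condition) OR requirement.
    return (not condition) or requirement

# The implication is violated only in the True/False case.
assert implies(True, True)
assert not implies(True, False)
assert implies(False, True)
assert implies(False, False)
```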
SetMembershipConstraintBuilder
¶
Bases: ConstraintBuilder
Builder for whitelist/blacklist constraints.
Generates constraints that restrict slot properties to allowed values (whitelist) or exclude forbidden values (blacklist).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| slot_name | str | Name of slot to constrain. | required |
| property_path | str | Dot-separated path to property (e.g., "lemma", "features.number"). | required |
| allowed_values | set \| None | Whitelist of allowed values (mutually exclusive with forbidden_values). | None |
| forbidden_values | set \| None | Blacklist of forbidden values. | None |
| description | str \| None | Custom description. | None |
Examples:
Whitelist constraint:
>>> builder = SetMembershipConstraintBuilder()
>>> constraint = builder.build(
... slot_name="verb",
... property_path="lemma",
... allowed_values={"walk", "run", "jump"},
... description="Motion verbs only"
... )
>>> "verb.lemma in allowed_values" in constraint.expression
True
Blacklist constraint:
>>> constraint = builder.build(
... slot_name="verb",
... property_path="lemma",
... forbidden_values={"be", "have"},
... description="Exclude copula and auxiliary"
... )
>>> "verb.lemma not in forbidden_values" in constraint.expression
True
build(*, slot_name: str, property_path: str, allowed_values: set[str] | None = None, forbidden_values: set[str] | None = None, description: str | None = None) -> Constraint
¶
Build set membership constraint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| slot_name | str | Slot to constrain. | required |
| property_path | str | Property path within slot. | required |
| allowed_values | set \| None | Whitelist of allowed values. | None |
| forbidden_values | set \| None | Blacklist of forbidden values. | None |
| description | str \| None | Constraint description. | None |
Returns:
| Type | Description |
|---|---|
| Constraint | Set membership constraint. |
Raises:
| Type | Description |
|---|---|
| ValueError | If neither or both of allowed_values/forbidden_values provided. |
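The "neither or both" check is an exclusive-or over the two arguments. A sketch of that validation under the assumption that exactly one of the two sets must be supplied (function name is illustrative):

```python
def validate_membership_args(allowed_values=None, forbidden_values=None):
    # Exactly one of the two must be given: if both are None, or both
    # are set, the two None-checks compare equal and we reject.
    if (allowed_values is None) == (forbidden_values is None):
        raise ValueError(
            "provide exactly one of allowed_values or forbidden_values"
        )

validate_membership_args(allowed_values={"walk"})  # whitelist only: ok
try:
    validate_membership_args()  # neither provided
except ValueError:
    pass  # rejected, as expected
```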
Resource Loading¶
loaders
¶
Lexicon loading utilities for various data formats.
This module provides class methods for loading Lexicon objects from various data formats (CSV, TSV) with flexible column mapping.
from_csv(path: str | Path, name: str, *, language_code: LanguageCode, column_mapping: dict[str, str] | None = None, feature_columns: list[str] | None = None, pos: str | None = None, description: str | None = None, **csv_kwargs: Any) -> Lexicon
¶
Load lexicon from CSV file with flexible column mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to the CSV file. | required |
| name | str | Name for the lexicon. | required |
| language_code | LanguageCode | ISO 639-3 language code for all items. | required |
| column_mapping | dict[str, str] \| None | Mapping from CSV column names to feature names. Example: {"word": "lemma"} | None |
| feature_columns | list[str] \| None | CSV column names to include in features dict. Example: ["number", "tense", "countability", "semantic_class"] | None |
| pos | str \| None | Part-of-speech tag to assign to all items (e.g., "NOUN", "VERB"). Will be added to features dict as "pos". | None |
| description | str \| None | Optional description of the lexicon. | None |
| **csv_kwargs | Any | Additional keyword arguments passed to pandas.read_csv(). | {} |
Returns:
| Type | Description |
|---|---|
| Lexicon | New lexicon loaded from CSV. |
Raises:
| Type | Description |
|---|---|
| ValueError | If required "lemma" column/mapping is missing. |
| FileNotFoundError | If CSV file does not exist. |
Examples:
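The column-mapping step can be sketched with the standard library alone, assuming the real method uses pandas and builds LexicalItem objects rather than plain dicts (this is an illustrative stand-in, not the actual implementation):

```python
import csv
import io

def rows_to_items(text, column_mapping, feature_columns, pos=None):
    # Rename mapped columns, then collect feature columns into a dict.
    items = []
    for row in csv.DictReader(io.StringIO(text)):
        renamed = {column_mapping.get(col, col): val for col, val in row.items()}
        features = {col: renamed[col] for col in feature_columns if renamed.get(col)}
        if pos is not None:
            features["pos"] = pos
        items.append({"lemma": renamed["lemma"], "features": features})
    return items

text = "word,number\ndog,singular\ndogs,plural\n"
items = rows_to_items(text, {"word": "lemma"}, ["number"], pos="NOUN")
# items[0] == {"lemma": "dog", "features": {"number": "singular", "pos": "NOUN"}}
```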
from_tsv(path: str | Path, name: str, *, language_code: LanguageCode, column_mapping: dict[str, str] | None = None, feature_columns: list[str] | None = None, pos: str | None = None, description: str | None = None, **tsv_kwargs: Any) -> Lexicon
¶
Load lexicon from TSV file with flexible column mapping.
This is a convenience wrapper around from_csv() that sets sep="\t".
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to the TSV file. | required |
| name | str | Name for the lexicon. | required |
| language_code | LanguageCode | ISO 639-3 language code for all items. | required |
| column_mapping | dict[str, str] \| None | Mapping from TSV column names to feature names. | None |
| feature_columns | list[str] \| None | TSV column names to include in features dict. | None |
| pos | str \| None | Part-of-speech tag to assign to all items. | None |
| description | str \| None | Optional description of the lexicon. | None |
| **tsv_kwargs | Any | Additional keyword arguments passed to pandas.read_csv(). | {} |
Returns:
| Type | Description |
|---|---|
| Lexicon | New lexicon loaded from TSV. |
Examples:
template_generation
¶
Abstract base class for mapping external frame inventories to Templates.
This module provides language-agnostic base classes for generating Template objects from external linguistic frame inventories (e.g., VerbNet, FrameNet, PropBank, valency lexicons).
FrameToTemplateMapper
¶
Bases: ABC
Abstract base class for mapping frame inventories to Templates.
This class provides a framework for generating Template objects from external linguistic frame data. Subclasses implement language- and resource-specific mapping logic.
Examples:
Implementing a VerbNet mapper:
>>> class VerbNetMapper(FrameToTemplateMapper):
... def generate_from_frame(self, verb_lemma, frame_data):
... slots = self.map_frame_to_slots(frame_data)
... constraints = self.generate_constraints(frame_data, slots)
... return Template(
... name=f"{verb_lemma}_{frame_data['id']}",
... template_string=frame_data['template_string'],
... slots=slots,
... constraints=constraints
... )
...
... def map_frame_to_slots(self, frame_data):
... # Extract slots from VerbNet syntax
... return {}
...
... def generate_constraints(self, frame_data, slots):
... # Generate constraints from VerbNet restrictions
... return []
generate_from_frame(*args: Any, **kwargs: Any) -> Template | list[Template]
abstractmethod
¶
Generate Template(s) from a frame specification.
This is the main entry point for template generation. Subclasses implement the specific logic for their frame inventory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *args | Any | Positional arguments (frame data, identifiers, etc.). | () |
| **kwargs | Any | Keyword arguments (configuration options, etc.). | {} |
Returns:
| Type | Description |
|---|---|
| Template \| list[Template] | Generated template(s). May return multiple templates if the frame has multiple realizations (e.g., different complementizer types, alternations). |
Examples:
VerbNet implementation:
map_frame_to_slots(frame_data: Any) -> dict[str, Slot]
abstractmethod
¶
Map frame elements to Template slots.
Converts frame-specific element descriptions into Slot objects with appropriate constraints.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frame_data | Any | Frame specification from the external inventory. Type depends on the specific resource (dict, object, etc.). | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Slot] | Slots keyed by slot name. |
Examples:
Mapping VerbNet syntax to slots:
generate_constraints(frame_data: Any, slots: dict[str, Slot]) -> list[Constraint]
abstractmethod
¶
Generate multi-slot constraints from frame specifications.
Converts frame-specific restrictions into DSL Constraint objects that enforce relationships between slots.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frame_data | Any | Frame specification from the external inventory. | required |
| slots | dict[str, Slot] | Slots that have been created for this frame. | required |
Returns:
| Type | Description |
|---|---|
| list[Constraint] | Multi-slot constraints for the template. |
Examples:
Generating constraints from VerbNet restrictions:
create_template_name(*identifiers: str, separator: str = '_') -> str
¶
Create a unique template name from identifiers.
Utility method for generating consistent template names. Sanitizes identifiers by replacing spaces, dots, and hyphens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *identifiers | str | Components to include in the name (e.g., verb, class, frame). | () |
| separator | str | Separator between components (default: "_"). | '_' |
Returns:
| Type | Description |
|---|---|
| str | Sanitized template name. |
Examples:
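The sanitization described above can be sketched as a regex replacement followed by a join; this is a hypothetical reconstruction of the behavior, not the actual implementation:

```python
import re

def create_template_name(*identifiers, separator="_"):
    # Replace runs of spaces, dots, and hyphens with the separator,
    # then join the sanitized components.
    cleaned = [re.sub(r"[ .\-]+", separator, part.strip()) for part in identifiers]
    return separator.join(cleaned)

name = create_template_name("give", "13.1", "NP V NP")
# name == 'give_13_1_NP_V_NP'
```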
create_template_metadata(frame_data: dict[str, Any], **additional_metadata: Any) -> dict[str, Any]
¶
Create metadata dictionary for template.
Utility method for extracting and organizing frame metadata. Subclasses can override to add resource-specific metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frame_data | dict[str, Any] | Frame specification from the external inventory. | required |
| **additional_metadata | Any | Additional metadata to include. | {} |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Metadata dictionary for Template.metadata field. |
Examples:
MultiFrameMapper
¶
Bases: FrameToTemplateMapper
Mapper that generates multiple template variants from a single frame.
Some frame specifications support multiple realizations (e.g., different complementizer types, voice alternations). This class provides a framework for generating all variants.
Examples:
>>> class ClausalMapper(MultiFrameMapper):
... def get_frame_variants(self, frame_data):
... # Return list of variant specifications
... return [
... {"comp": "that", "mood": "declarative"},
... {"comp": "whether", "mood": "interrogative"},
... ]
...
... def generate_from_frame(self, verb, frame_data):
... variants = self.get_frame_variants(frame_data)
... return [self._generate_variant(verb, v) for v in variants]
...
... def map_frame_to_slots(self, frame_data):
... return {}
...
... def generate_constraints(self, frame_data, slots):
... return []
get_frame_variants(frame_data: Any) -> list[Any]
abstractmethod
¶
Extract all variants from frame specification.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frame_data | Any | Frame specification from the external inventory. | required |
Returns:
| Type | Description |
|---|---|
| list[Any] | List of variant specifications, each representing one possible realization of the frame. |
Examples:
generate_from_frame(*args: Any, **kwargs: Any) -> list[Template]
¶
Generate templates for all frame variants.
Default implementation calls get_frame_variants() and generates a template for each variant. Subclasses can override for custom logic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *args | Any | Positional arguments passed to variant generation. | () |
| **kwargs | Any | Keyword arguments passed to variant generation. | {} |
Returns:
| Type | Description |
|---|---|
| list[Template] | Templates for all variants. |
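The default loop can be sketched as: fetch the variants, then build one output per variant. The class and field names below are illustrative stand-ins (real subclasses return Template objects, not strings):

```python
class DemoMapper:
    # A sketch of MultiFrameMapper's default behavior: one generated
    # output per variant returned by get_frame_variants().
    def get_frame_variants(self, frame_data):
        return frame_data["variants"]

    def generate_from_frame(self, frame_data):
        return [f"template_{v['comp']}" for v in self.get_frame_variants(frame_data)]

frame = {"variants": [{"comp": "that"}, {"comp": "whether"}]}
DemoMapper().generate_from_frame(frame)
# ['template_that', 'template_whether']
```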
Classification¶
classification
¶
Linguistic classification models for lexical items and templates.
This module provides models for grouping lexical items and templates by linguistic properties. These classifications enable cross-linguistic analysis and alignment, supporting both monolingual and multilingual classification.
LexicalItemClass and TemplateClass are NOT subclasses of Lexicon and TemplateCollection. This is a deliberate architectural choice: - Lexicon/TemplateCollection: Operational resource management for experiments - LexicalItemClass/TemplateClass: Analytical linguistic classification
Primary use cases: - Cross-linguistic analysis and comparison - Aligning resources across languages for meta-analysis - Combining experimental results by linguistic class - Linguistic typology studies
LexicalItemClass
¶
Bases: BeadBaseModel
Groups lexical items that share linguistic properties.
LexicalItemClass represents a linguistic classification that can span a single language (e.g., "all causative verbs in English") or multiple languages (e.g., "all causative verbs across English, Korean, Zulu").
Primary use cases: - Cross-linguistic analysis and comparison - Aligning lexical items across languages for meta-analysis - Combining experimental results by lexical class - Linguistic typology studies
NOT typically used for: - Experiment generation (use Lexicon for that) - Resource storage (use Lexicon for that)
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Name of this lexical item class. |
| description | str \| None | Description of the classification (e.g., "Causative verbs"). |
| property_name | str | The linguistic property that defines this class (e.g., "causative", "transitive", "stative"). |
| property_value | Any \| None | Optional specific value for the property (e.g., True, "agentive"). |
| items | dict[UUID, LexicalItem] | Dictionary of lexical items in this class, indexed by UUID. |
| tags | list[str] | Tags for organization and search. |
| class_metadata | dict[str, Any] | Additional metadata about this classification. |
Examples:
>>> # Monolingual classification
>>> causative_en = LexicalItemClass(
... name="causative_verbs_en",
... description="Causative verbs in English",
... property_name="causative",
... property_value=True
... )
>>> # Multilingual cross-linguistic classification
>>> causatives_multi = LexicalItemClass(
... name="causative_verbs_crossling",
... description="Causative verbs across EN, KO, ZU",
... property_name="causative",
... property_value=True
... )
>>> english_break = LexicalItem(lemma="break", language_code="en")
>>> korean_kkakta = LexicalItem(lemma="kkakta", language_code="ko")
>>> causatives_multi.add(english_break)
>>> causatives_multi.add(korean_kkakta)
>>> len(causatives_multi)
2
>>> causatives_multi.is_multilingual()
True
>>> for lang in causatives_multi.languages():
... items = causatives_multi.get_items_by_language(lang)
... print(f"{lang}: {len(items)} causative verbs")
en: 1 causative verbs
ko: 1 causative verbs
validate_name(v: str) -> str
classmethod
¶
Validate that name is non-empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The name to validate. | required |
Returns:
| Type | Description |
|---|---|
| str | The validated name. |
Raises:
| Type | Description |
|---|---|
| ValueError | If name is empty or contains only whitespace. |
validate_property_name(v: str) -> str
classmethod
¶
Validate that property_name is non-empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The property name to validate. | required |
Returns:
| Type | Description |
|---|---|
| str | The validated property name. |
Raises:
| Type | Description |
|---|---|
| ValueError | If property_name is empty or contains only whitespace. |
languages() -> set[str]
¶
Return set of language codes present in this class.
Items without language_code are excluded from the result.
Returns:
| Type | Description |
|---|---|
| set[str] | Set of language codes (lowercase) found in this class. |
Examples:
get_items_by_language(language_code: str) -> list[LexicalItem]
¶
Filter items by language code.
Accepts both ISO 639-1 (2-letter) and ISO 639-3 (3-letter) codes. The query code is normalized to ISO 639-3 for comparison.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| language_code | str | Language code to filter by (e.g., "en", "eng", "ko", "kor"). | required |
Returns:
| Type | Description |
|---|---|
| list[LexicalItem] | List of items matching the language code. |
Examples:
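The ISO 639-1 to ISO 639-3 normalization step can be sketched as a lookup table; the table below is a small illustrative excerpt covering only the codes used in the examples (the real mapping is much larger):

```python
# Map 2-letter ISO 639-1 codes to their 3-letter ISO 639-3 equivalents;
# 3-letter codes pass through unchanged.
ISO1_TO_ISO3 = {"en": "eng", "ko": "kor", "zu": "zul"}

def normalize(code):
    code = code.lower()
    return ISO1_TO_ISO3.get(code, code)

normalize("EN") == normalize("eng")  # both normalize to "eng"
```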
is_monolingual() -> bool
¶
Check if class contains only one language.
Returns:
| Type | Description |
|---|---|
| bool | True if class contains items from only one language (or no items). |
Examples:
is_multilingual() -> bool
¶
Check if class contains multiple languages.
Returns:
| Type | Description |
|---|---|
| bool | True if class contains items from more than one language. |
Examples:
add(item: LexicalItem) -> None
¶
Add a lexical item to the class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| item | LexicalItem | The item to add. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If item with same ID already exists. |
Examples:
remove(item_id: UUID) -> LexicalItem
¶
Remove and return an item by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| item_id | UUID | The ID of the item to remove. | required |
Returns:
| Type | Description |
|---|---|
| LexicalItem | The removed item. |
Raises:
| Type | Description |
|---|---|
| KeyError | If item ID not found. |
Examples:
get(item_id: UUID) -> LexicalItem | None
¶
Get an item by ID, or None if not found.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| item_id | UUID | The ID of the item to get. | required |
Returns:
| Type | Description |
|---|---|
| LexicalItem \| None | The item if found, None otherwise. |
Examples:
__len__() -> int
¶
Return the number of items in the class.
__contains__(item_id: UUID) -> bool
¶
Check if item ID is in class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| item_id | UUID | The item ID to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if item ID exists in class. |
Examples:
__iter__() -> Iterator[LexicalItem]
¶
Iterate over items in class.
Returns:
| Type | Description |
|---|---|
| Iterator[LexicalItem] | Iterator over lexical items. |
Examples:
TemplateClass
¶
Bases: BeadBaseModel
Groups templates that share linguistic properties.
TemplateClass represents a linguistic classification that can span a single language (e.g., "transitive templates in English that vary only in adjuncts") or multiple languages (e.g., "causative-inchoative alternation templates across languages").
Primary use cases: - Cross-linguistic analysis and comparison - Identifying systematic variation patterns (e.g., adjunct variation) - Aligning templates across languages for meta-analysis - Combining experimental results by template class - Linguistic typology studies
NOT typically used for: - Experiment generation (use TemplateCollection for that) - Operational template storage (use TemplateCollection for that)
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Name of this template class. |
| description | str \| None | Description of the classification (e.g., "Transitive with adjunct variation"). |
| property_name | str | The linguistic property that defines this class (e.g., "transitive", "causative_inchoative", "wh_question"). |
| property_value | Any \| None | Optional specific value for the property. |
| templates | dict[UUID, Template] | Dictionary of templates in this class, indexed by UUID. |
| tags | list[str] | Tags for organization and search. |
| class_metadata | dict[str, Any] | Additional metadata about this classification. |
Examples:
>>> from bead.resources.structures import Slot
>>> # Monolingual classification
>>> transitive_en = TemplateClass(
... name="transitive_templates_en",
... description="Transitive templates in English",
... property_name="transitive",
... property_value=True
... )
>>> # Multilingual cross-linguistic classification
>>> transitives_multi = TemplateClass(
... name="transitive_templates_crossling",
... description="Transitive templates across languages",
... property_name="transitive",
... property_value=True
... )
>>> en_template = Template(
... name="svo",
... template_string="{subject} {verb} {object}.",
... slots={"subject": Slot(name="subject"), "verb": Slot(name="verb"),
... "object": Slot(name="object")},
... language_code="en"
... )
>>> transitives_multi.add(en_template)
>>> len(transitives_multi)
1
validate_name(v: str) -> str
classmethod
¶
Validate that name is non-empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The name to validate. | required |
Returns:
| Type | Description |
|---|---|
| str | The validated name. |
Raises:
| Type | Description |
|---|---|
| ValueError | If name is empty or contains only whitespace. |
validate_property_name(v: str) -> str
classmethod
¶
Validate that property_name is non-empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The property name to validate. | required |
Returns:
| Type | Description |
|---|---|
| str | The validated property name. |
Raises:
| Type | Description |
|---|---|
| ValueError | If property_name is empty or contains only whitespace. |
languages() -> set[str]
¶
Return set of language codes present in this class.
Templates without language_code are excluded from the result.
Returns:
| Type | Description |
|---|---|
| set[str] | Set of language codes (lowercase) found in this class. |
Examples:
>>> from bead.resources.structures import Slot
>>> cls = TemplateClass(name="test", property_name="transitive")
>>> t1 = Template(
... name="en_svo",
... template_string="{s} {v} {o}.",
... slots={"s": Slot(name="s"), "v": Slot(name="v"), "o": Slot(name="o")},
... language_code="en"
... )
>>> cls.add(t1)
>>> cls.languages()
{'en'}
get_templates_by_language(language_code: str) -> list[Template]
¶
Filter templates by language code.
Accepts both ISO 639-1 (2-letter) and ISO 639-3 (3-letter) codes. The query code is normalized to ISO 639-3 for comparison.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| language_code | str | Language code to filter by (e.g., "en", "eng", "ko", "kor"). | required |
Returns:
| Type | Description |
|---|---|
| list[Template] | List of templates matching the language code. |
Examples:
>>> from bead.resources.structures import Slot
>>> cls = TemplateClass(name="test", property_name="transitive")
>>> t1 = Template(
... name="en_svo",
... template_string="{s} {v} {o}.",
... slots={"s": Slot(name="s"), "v": Slot(name="v"), "o": Slot(name="o")},
... language_code="en"
... )
>>> cls.add(t1)
>>> en_templates = cls.get_templates_by_language("en")
>>> len(en_templates)
1
>>> en_templates[0].name
'en_svo'
is_monolingual() -> bool
¶
Check if class contains only one language.
Returns:
| Type | Description |
|---|---|
| bool | True if class contains templates from only one language (or no templates). |
Examples:
>>> from bead.resources.structures import Slot
>>> cls = TemplateClass(name="test", property_name="transitive")
>>> t1 = Template(
... name="en_svo",
... template_string="{s} {v} {o}.",
... slots={"s": Slot(name="s"), "v": Slot(name="v"), "o": Slot(name="o")},
... language_code="en"
... )
>>> cls.add(t1)
>>> cls.is_monolingual()
True
is_multilingual() -> bool
¶
Check if class contains multiple languages.
Returns:
| Type | Description |
|---|---|
| bool | True if class contains templates from more than one language. |
Examples:
>>> from bead.resources.structures import Slot
>>> cls = TemplateClass(name="test", property_name="transitive")
>>> t1 = Template(
... name="en_svo",
... template_string="{s} {v} {o}.",
... slots={"s": Slot(name="s"), "v": Slot(name="v"), "o": Slot(name="o")},
... language_code="en"
... )
>>> cls.add(t1)
>>> cls.is_multilingual()
False
add(template: Template) -> None
¶
Add a template to the class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| template | Template | The template to add. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If template with same ID already exists. |
Examples:
remove(template_id: UUID) -> Template
¶
Remove and return a template by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| template_id | UUID | The ID of the template to remove. | required |
Returns:
| Type | Description |
|---|---|
| Template | The removed template. |
Raises:
| Type | Description |
|---|---|
| KeyError | If template ID not found. |
Examples:
>>> from bead.resources.structures import Slot
>>> cls = TemplateClass(name="test", property_name="transitive")
>>> t1 = Template(
... name="svo",
... template_string="{s} {v} {o}.",
... slots={"s": Slot(name="s"), "v": Slot(name="v"), "o": Slot(name="o")}
... )
>>> cls.add(t1)
>>> removed = cls.remove(t1.id)
>>> removed.name
'svo'
>>> len(cls)
0
get(template_id: UUID) -> Template | None
¶
Get a template by ID, or None if not found.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| template_id | UUID | The ID of the template to get. | required |
Returns:
| Type | Description |
|---|---|
| Template \| None | The template if found, None otherwise. |
Examples:
>>> from bead.resources.structures import Slot
>>> cls = TemplateClass(name="test", property_name="transitive")
>>> t1 = Template(
... name="svo",
... template_string="{s} {v} {o}.",
... slots={"s": Slot(name="s"), "v": Slot(name="v"), "o": Slot(name="o")}
... )
>>> cls.add(t1)
>>> retrieved = cls.get(t1.id)
>>> retrieved.name
'svo'
>>> from uuid import uuid4
>>> cls.get(uuid4()) is None
True
__len__() -> int
¶
Return the number of templates in the class.
__contains__(template_id: UUID) -> bool
¶
Check if template ID is in class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| template_id | UUID | The template ID to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if template ID exists in class. |
Examples:
>>> from bead.resources.structures import Slot
>>> cls = TemplateClass(name="test", property_name="transitive")
>>> t1 = Template(
... name="svo",
... template_string="{s} {v} {o}.",
... slots={"s": Slot(name="s"), "v": Slot(name="v"), "o": Slot(name="o")}
... )
>>> cls.add(t1)
>>> t1.id in cls
True
__iter__() -> Iterator[Template]
¶
Iterate over templates in class.
Returns:
| Type | Description |
|---|---|
| Iterator[Template] | Iterator over templates. |
Examples:
>>> from bead.resources.structures import Slot
>>> cls = TemplateClass(name="test", property_name="transitive")
>>> t1 = Template(
... name="svo1",
... template_string="{s} {v} {o}.",
... slots={"s": Slot(name="s"), "v": Slot(name="v"), "o": Slot(name="o")}
... )
>>> t2 = Template(
... name="svo2",
... template_string="{s} {v} {o}.",
... slots={"s": Slot(name="s"), "v": Slot(name="v"), "o": Slot(name="o")}
... )
>>> cls.add(t1)
>>> cls.add(t2)
>>> [t.name for t in cls]
['svo1', 'svo2']
Resource Adapters¶
base
¶
Abstract base class for external resource adapters.
This module defines the interface that all resource adapters must implement to fetch lexical items from external linguistic databases.
ResourceAdapter
¶
Bases: ABC
Abstract base class for external resource adapters.
Resource adapters fetch lexical items from external linguistic databases and convert them to the bead LexicalItem format. All adapters must implement language_code filtering to support multi-language workflows.
Subclasses must implement:
- fetch_items(): Retrieve items from the external resource
- is_available(): Check if the external resource is accessible
Examples:
>>> class MyAdapter(ResourceAdapter):
... def fetch_items(self, query=None, language_code=None, **kwargs):
... # Fetch from external resource
... return [LexicalItem(lemma="walk", language_code="en", features={"pos": "VERB"})]
... def is_available(self):
... return True
>>> adapter = MyAdapter()
>>> items = adapter.fetch_items(query="walk", language_code="en")
>>> len(items) > 0
True
fetch_items(query: str | None = None, language_code: LanguageCode = None, **kwargs: Any) -> list[LexicalItem]
abstractmethod
¶
Fetch lexical items from external resource.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str \| None | Query string in adapter-specific format (e.g., lemma, predicate name, class identifier). If None, behavior is adapter-specific (may return all items, raise an error, or use a default query). | None |
| language_code | LanguageCode | ISO 639-1 (2-letter) or ISO 639-3 (3-letter) language code to filter results. Examples: "en", "eng", "ko", "kor". If None, returns items for all available languages. | None |
| **kwargs | Any | Additional adapter-specific parameters (e.g., pos="VERB", resource="verbnet", include_features=True). | {} |
Returns:
| Type | Description |
|---|---|
| list[LexicalItem] | Lexical items fetched from the external resource. Each item should have language_code set if known. |
Raises:
| Type | Description |
|---|---|
| ValueError | If query is invalid or required parameters are missing. |
| RuntimeError | If the external resource is unavailable or the request fails. |
Examples:
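The contract above can be sketched with a minimal in-memory adapter. LexicalItem, ResourceAdapter, and StaticAdapter below are simplified stand-ins written for illustration, not the actual bead implementations:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LexicalItem:
    """Simplified stand-in for bead's LexicalItem model."""
    lemma: str
    language_code: str
    features: dict[str, Any] = field(default_factory=dict)


class ResourceAdapter(ABC):
    """Stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def fetch_items(self, query=None, language_code=None, **kwargs):
        ...

    @abstractmethod
    def is_available(self):
        ...


class StaticAdapter(ResourceAdapter):
    """Toy adapter backed by a hard-coded word list."""

    _DATA = {"walk": "VERB", "the": "DET"}

    def fetch_items(self, query=None, language_code=None, **kwargs):
        # query=None returns everything; an unknown lemma returns [].
        if query is not None and query not in self._DATA:
            return []
        lemmas = [query] if query is not None else sorted(self._DATA)
        return [
            LexicalItem(
                lemma=lemma,
                language_code=language_code or "eng",
                features={"pos": self._DATA[lemma]},
            )
            for lemma in lemmas
        ]

    def is_available(self):
        return True  # in-memory data is always reachable


adapter = StaticAdapter()
items = adapter.fetch_items(query="walk", language_code="eng")
print([(i.lemma, i.features["pos"]) for i in items])  # [('walk', 'VERB')]
```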
is_available() -> bool
abstractmethod
¶
Check if the external resource is available.
This method should verify that the external resource can be accessed, whether via installed packages, accessible data files, or network APIs.
Returns:
| Type | Description |
|---|---|
| bool | True if the resource can be accessed, False otherwise. |
Examples:
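A common pattern is to probe is_available() before fetching and degrade gracefully when the backing resource is missing. BrokenAdapter and safe_fetch below are hypothetical illustrations, not part of bead:

```python
class BrokenAdapter:
    """Hypothetical adapter whose backing data files are absent."""

    def is_available(self) -> bool:
        return False

    def fetch_items(self, query=None, language_code=None, **kwargs):
        raise RuntimeError("resource unavailable")


def safe_fetch(adapter, query, language_code):
    # Skip unavailable adapters instead of raising mid-pipeline.
    if not adapter.is_available():
        return []
    return adapter.fetch_items(query=query, language_code=language_code)


print(safe_fetch(BrokenAdapter(), "walk", "eng"))  # []
```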
glazing
¶
Adapter for glazing package (VerbNet, PropBank, FrameNet).
This module provides an adapter to fetch lexical items from VerbNet, PropBank, and FrameNet via the glazing package using the proper loader classes.
GlazingAdapter
¶
Bases: ResourceAdapter
Adapter for glazing package (VerbNet, PropBank, FrameNet).
This adapter fetches verb frame information from VerbNet, PropBank, or FrameNet and converts it to LexicalItem format. Frame information is stored in the attributes field.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| resource | Literal['verbnet', 'propbank', 'framenet'] | Which glazing resource to use. | 'verbnet' |
| cache | AdapterCache \| None | Optional cache instance. If None, no caching is performed. | None |
Examples:
>>> adapter = GlazingAdapter(resource="verbnet")
>>> items = adapter.fetch_items(query="break", language_code="en")
>>> all(item.language_code == "en" for item in items)
True
fetch_items(query: str | None = None, language_code: LanguageCode = None, **kwargs: Any) -> list[LexicalItem]
¶
Fetch items from glazing resource.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str \| None | Lemma or predicate to query (e.g., "break", "run"). If None, fetches ALL items from the resource. | None |
| language_code | LanguageCode | Language code filter. Glazing resources are primarily English, so language_code="en" is typical. Other languages may not be supported. | None |
| **kwargs | Any | Additional parameters: include_frames (bool): include detailed frame information (syntax, examples, descriptions). Default: False. | {} |
Returns:
| Type | Description |
|---|---|
| list[LexicalItem] | Lexical items with frame information in attributes. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If glazing resource access fails. |
Examples:
>>> # Query specific verb
>>> adapter = GlazingAdapter(resource="verbnet")
>>> items = adapter.fetch_items(query="break", language_code="en")
>>> len(items) > 0
True
>>> # Fetch all items from resource
>>> all_items = adapter.fetch_items(query=None, language_code="en")
>>> len(all_items) > 100
True
>>> # Include detailed frame information
>>> items = adapter.fetch_items(
... query="break", language_code="en", include_frames=True
... )
>>> "frames" in items[0].attributes
True
is_available() -> bool
¶
Check whether the glazing resource can be accessed.
unimorph
¶
Adapter for UniMorph morphological paradigms.
This module provides an adapter to fetch morphological paradigms from UniMorph data and convert them to LexicalItem format with morphological features.
UniMorphAdapter
¶
Bases: ResourceAdapter
Adapter for UniMorph morphological paradigms.
This adapter fetches morphological paradigms from UniMorph and converts them to LexicalItem format. Morphological features are stored in the features field using UniMorph feature schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cache | AdapterCache \| None | Optional cache instance. If None, no caching is performed. | None |
Examples:
>>> adapter = UniMorphAdapter()
>>> items = adapter.fetch_items(query="walk", language_code="en")
>>> all(item.language_code == "en" for item in items)
True
>>> all("tense" in item.features for item in items if item.features)
True
__init__(cache: AdapterCache | None = None) -> None
¶
Initialize UniMorph adapter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cache | AdapterCache \| None | Optional cache instance. | None |
fetch_items(query: str | None = None, language_code: LanguageCode = None, **kwargs: Any) -> list[LexicalItem]
¶
Fetch morphological paradigms from UniMorph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str \| None | Lemma to query (e.g., "walk", "먹다", "hamba"). | None |
| language_code | LanguageCode | Required language code (e.g., "en", "ko", "zu"). UniMorph is organized by language, so this parameter is essential. | None |
| **kwargs | Any | Additional parameters (e.g., pos="VERB"). | {} |
Returns:
| Type | Description |
|---|---|
| list[LexicalItem] | Lexical items representing inflected forms with morphological features in the features field. |
Raises:
| Type | Description |
|---|---|
| ValueError | If language_code is None (required for UniMorph). |
| RuntimeError | If UniMorph access fails. |
Examples:
is_available() -> bool
¶
Check whether UniMorph data can be accessed.
cache
¶
Caching for adapter fetch results.
This module provides an in-memory cache to avoid redundant fetches from external resources when the same query is repeated.
AdapterCache
¶
In-memory cache for adapter fetch results.
The cache stores results keyed by a hash of query parameters. This avoids redundant fetches when the same query is made multiple times.
Examples:
>>> cache = AdapterCache()
>>> items = [LexicalItem(lemma="walk", language_code="eng", features={"pos": "VERB"})]
>>> key = cache.make_key("glazing", query="walk", language_code="en")
>>> cache.set(key, items)
>>> cached = cache.get(key)
>>> cached == items
True
get(key: str) -> list[LexicalItem] | None
¶
Get cached result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Cache key generated by make_key(). | required |
Returns:
| Type | Description |
|---|---|
| list[LexicalItem] \| None | Cached items if key exists, None otherwise. |
Examples:
set(key: str, items: list[LexicalItem]) -> None
¶
Cache result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Cache key generated by make_key(). | required |
| items | list[LexicalItem] | Items to cache. | required |
Examples:
clear() -> None
¶
Clear all cached results.
make_key(adapter_name: str, query: str | None = None, **kwargs: Any) -> str
¶
Generate cache key from query parameters.
Create a deterministic hash key from adapter name, query, and additional parameters. Same inputs always produce same key.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adapter_name | str | Name of the adapter (e.g., "glazing", "unimorph"). | required |
| query | str \| None | Query string. | None |
| **kwargs | Any | Additional query parameters (e.g., language_code, pos). | {} |
Returns:
| Type | Description |
|---|---|
| str | Cache key (hexadecimal hash string). |
Examples:
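The documented property (same inputs always produce the same key, regardless of keyword order) can be illustrated with a minimal sketch that hashes a canonical JSON encoding of the parameters. The real make_key may use a different hashing scheme; this make_key is a hypothetical stand-in:

```python
import hashlib
import json


def make_key(adapter_name, query=None, **kwargs):
    # Sort keys so keyword order cannot change the serialized payload,
    # then hash the payload into a fixed-length hexadecimal string.
    payload = json.dumps(
        {"adapter": adapter_name, "query": query, **kwargs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = make_key("glazing", query="walk", language_code="en")
k2 = make_key("glazing", language_code="en", query="walk")
print(k1 == k2)  # True: keyword order does not affect the key
print(k1 == make_key("unimorph", query="walk", language_code="en"))  # False
```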
registry
¶
Registry for managing resource adapters.
This module provides a registry for discovering and instantiating adapters by name.
AdapterRegistry
¶
Registry for managing resource adapters.
The registry allows adapters to be registered by name and retrieved with custom initialization parameters.
Examples:
>>> from bead.resources.adapters.glazing import GlazingAdapter
>>> registry = AdapterRegistry()
>>> registry.register("glazing", GlazingAdapter)
>>> adapter = registry.get("glazing", resource="verbnet")
>>> isinstance(adapter, GlazingAdapter)
True
register(name: str, adapter_class: type[ResourceAdapter]) -> None
¶
Register an adapter class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Adapter name (e.g., "glazing", "unimorph"). | required |
| adapter_class | type[ResourceAdapter] | Adapter class (not instance) that subclasses ResourceAdapter. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If name is empty or adapter_class is not a ResourceAdapter subclass. |
Examples:
get(name: str, **kwargs: Any) -> ResourceAdapter
¶
Get adapter instance by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Adapter name (must be registered). | required |
| **kwargs | Any | Arguments passed to adapter constructor. | {} |
Returns:
| Type | Description |
|---|---|
| ResourceAdapter | Adapter instance. |
Raises:
| Type | Description |
|---|---|
| KeyError | If adapter name is not registered. |
Examples:
list_available() -> list[str]
¶
List names of available adapters.
Returns:
| Type | Description |
|---|---|
| list[str] | Sorted list of registered adapter names. |
Examples:
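The register/get/list_available workflow can be sketched end to end with a minimal stand-in registry mirroring the documented contracts (ValueError on empty names, KeyError on unknown names, constructor kwargs forwarded by get). AdapterRegistry and DummyAdapter below are simplified illustrations, not the bead implementations:

```python
class DummyAdapter:
    """Hypothetical adapter used only for illustration."""

    def __init__(self, resource="verbnet"):
        self.resource = resource


class AdapterRegistry:
    """Stand-in registry mirroring the documented interface."""

    def __init__(self):
        self._adapters: dict[str, type] = {}

    def register(self, name, adapter_class):
        if not name:
            raise ValueError("adapter name must be non-empty")
        self._adapters[name] = adapter_class

    def get(self, name, **kwargs):
        if name not in self._adapters:
            raise KeyError(f"adapter not registered: {name}")
        # kwargs are forwarded to the adapter constructor.
        return self._adapters[name](**kwargs)

    def list_available(self):
        return sorted(self._adapters)


registry = AdapterRegistry()
registry.register("dummy", DummyAdapter)
adapter = registry.get("dummy", resource="propbank")
print(adapter.resource)           # propbank
print(registry.list_available())  # ['dummy']
```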