bead.transforms

Value-level text transforms (str -> str, parameterised by a TransformContext) used when rendering template slots and item prompts. Transforms are registered by name in a TransformRegistry; any callable conforming to the SpanTextTransform protocol can be registered.

Core Abstractions

base

Core abstractions for the span text transform system.

Defines the SpanTextTransform protocol, TransformContext for passing metadata to transforms, TransformPipeline for composing transforms, and TransformRegistry for name-based lookup.

The transforms operate at the value level (str -> str parameterised by a TransformContext). Use dx.Iso or dx.Lens directly when the transformation crosses schema boundaries.

TransformContext

Bases: BeadBaseModel

Metadata available to transforms at resolution time.

Attributes:

language_code (str | None): ISO 639 code (e.g. "eng", "en").

lemma (str | None): Lemma of the span head, if known.

pos (str | None): Universal POS tag of the span head (e.g. "VERB").

head_index (int | None): Token index of the syntactic head within the span.

tokens (tuple[str, ...]): Individual tokens of the span text. Empty when unknown.

metadata (dict[str, JsonValue]): Arbitrary extra metadata.

SpanTextTransform

Bases: Protocol

Protocol for a single text transform.

Any callable (str, TransformContext) -> str satisfies this protocol. Implementations may ignore the context when the transform is purely textual (e.g. lowercasing).

__call__(text: str, context: TransformContext) -> str

Apply the transform to text.
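
As an illustration of what satisfying the protocol looks like, the sketch below uses stand-in definitions of TransformContext and SpanTextTransform so that it runs without bead installed; the real classes carry more fields and may differ. The strip_edges function and the Truncate class are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TransformContext:
    """Stand-in for bead's TransformContext (only one field shown)."""

    tokens: tuple[str, ...] = ()


class SpanTextTransform(Protocol):
    """Stand-in protocol: any (str, TransformContext) -> str callable."""

    def __call__(self, text: str, context: TransformContext) -> str: ...


def strip_edges(text: str, context: TransformContext) -> str:
    """Plain function; purely textual, so the context is ignored."""
    return text.strip()


class Truncate:
    """Class-based transform (hypothetical) that keeps the first n tokens."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __call__(self, text: str, context: TransformContext) -> str:
        # Prefer the tokens supplied by the context; fall back to whitespace.
        tokens = context.tokens or tuple(text.split())
        return " ".join(tokens[: self.n])


# Both a function and a callable object type-check as SpanTextTransform.
transforms: list[SpanTextTransform] = [strip_edges, Truncate(2)]
ctx = TransformContext(tokens=("run", "to", "the", "store"))
results = [t("  run to the store  ", ctx) for t in transforms]
print(results)
```

Because the protocol is structural, neither callable needs to inherit from anything.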

TransformPipeline

An ordered chain of transforms applied left-to-right.

Examples:

>>> from bead.transforms.text import LowerTransform, CapitalizeTransform
>>> ctx = TransformContext()
>>> pipe = TransformPipeline([LowerTransform(), CapitalizeTransform()])
>>> pipe("HELLO WORLD", ctx)
'Hello world'

__call__(text: str, context: TransformContext) -> str

Apply each transform in sequence.

__len__() -> int

Return the number of transforms in the pipeline.

__repr__() -> str

Return a debug-friendly representation of the pipeline.

append(transform: SpanTextTransform) -> None

Append a transform to the end of the pipeline.

prepend(transform: SpanTextTransform) -> None

Insert a transform at the beginning of the pipeline.
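
The left-to-right semantics can be pictured with a small stand-in class. This is a sketch of the idea only, not bead's implementation: PipelineSketch and the three helper functions are hypothetical, and the context is passed as None because these transforms ignore it.

```python
from typing import Callable, Iterable

# The context type is left opaque in this sketch.
Transform = Callable[[str, object], str]


class PipelineSketch:
    """Ordered chain of transforms; each output feeds the next transform."""

    def __init__(self, transforms: Iterable[Transform] = ()) -> None:
        self._transforms = list(transforms)

    def __call__(self, text: str, context: object) -> str:
        for transform in self._transforms:
            text = transform(text, context)
        return text

    def __len__(self) -> int:
        return len(self._transforms)

    def append(self, transform: Transform) -> None:
        self._transforms.append(transform)

    def prepend(self, transform: Transform) -> None:
        self._transforms.insert(0, transform)


def lower(text: str, context: object) -> str:
    return text.lower()


def capitalize(text: str, context: object) -> str:
    return text.capitalize()


def exclaim(text: str, context: object) -> str:
    return text + "!"


pipe = PipelineSketch([lower, capitalize])
first = pipe("HELLO WORLD", None)

# Order matters: capitalizing first and lowercasing last undoes the capital.
swapped = PipelineSketch([capitalize, lower])("HELLO WORLD", None)

# prepend() puts the new transform in front of the existing ones.
pipe.prepend(exclaim)
second = pipe("HELLO WORLD", None)
print(first, swapped, second, len(pipe))
```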

TransformRegistry

Name-to-transform mapping with pipeline construction.

Transforms are registered under short string names (e.g. "gerund", "lower") and looked up when resolving [[label|name1|name2]] prompt references.

Examples:

>>> from bead.transforms.text import LowerTransform
>>> reg = TransformRegistry()
>>> reg.register("lower", LowerTransform())
>>> t = reg.get("lower")
>>> t("HELLO", TransformContext())
'hello'

register(name: str, transform: SpanTextTransform | Callable[[str, TransformContext], str]) -> None

Register transform under name (case-insensitive).

get(name: str) -> SpanTextTransform

Return the transform registered under name.

Raises:

KeyError: If no transform with that name exists.

resolve_pipeline(names: list[str]) -> TransformPipeline

Return a pipeline applying the named transforms left-to-right.

available() -> list[str]

Return the registered transform names, sorted.

__contains__(name: str) -> bool

Return whether name is registered.

__len__() -> int

Return the number of registered transforms.

__repr__() -> str

Return a debug-friendly representation of the registry.
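
To make the lookup flow concrete, here is a minimal sketch of a case-insensitive registry and of applying the names from a [[label|name1|name2]] reference in order. Everything in it is an assumption made for illustration: RegistrySketch, split_reference, and apply_named are hypothetical names, lowercasing the key is only one way to get case-insensitive matching, and bead's actual reference parser is not shown in this section.

```python
from typing import Callable

# The context type is left opaque in this sketch.
Transform = Callable[[str, object], str]


class RegistrySketch:
    """Name-to-transform mapping; keys are lowercased (an assumption)."""

    def __init__(self) -> None:
        self._transforms: dict[str, Transform] = {}

    def register(self, name: str, transform: Transform) -> None:
        self._transforms[name.lower()] = transform

    def get(self, name: str) -> Transform:
        try:
            return self._transforms[name.lower()]
        except KeyError:
            raise KeyError(f"no transform registered under {name!r}") from None

    def available(self) -> list[str]:
        return sorted(self._transforms)


def split_reference(reference: str) -> tuple[str, list[str]]:
    """Split '[[label|name1|name2]]' into the label and the transform names."""
    inner = reference.removeprefix("[[").removesuffix("]]")
    label, *names = inner.split("|")
    return label, names


def apply_named(registry: RegistrySketch, names: list[str], text: str) -> str:
    """Apply the named transforms left-to-right (context omitted here)."""
    for name in names:
        text = registry.get(name)(text, None)
    return text


registry = RegistrySketch()
registry.register("Lower", lambda text, ctx: text.lower())
registry.register("capitalize", lambda text, ctx: text.capitalize())

label, names = split_reference("[[verb|LOWER|capitalize]]")
rendered = apply_named(registry, names, "RUN TO THE STORE")
print(label, names, rendered, registry.available())
```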

Text Transforms

Pure surface-string transforms. In addition to case transforms (lower, upper, capitalize, title), this module provides MarkdownStripTransform and RedditCleanupTransform for cleaning web/markdown text into plain prose, and split_sentences for sentence segmentation (parser-backed when a spaCy/Stanza config is given, with a regular-expression fallback otherwise).

text

Pure text transforms that require no external resources.

These transforms operate on the surface string and ignore the TransformContext. They are always safe to register regardless of language.

LowerTransform

Convert text to lowercase.

Examples:

>>> LowerTransform()("Hello World", TransformContext())
'hello world'

__call__(text: str, context: TransformContext) -> str

Apply str.lower to text.

UpperTransform

Convert text to uppercase.

Examples:

>>> UpperTransform()("Hello World", TransformContext())
'HELLO WORLD'

__call__(text: str, context: TransformContext) -> str

Apply str.upper to text.

CapitalizeTransform

Capitalize the first character, lowercase the rest.

Examples:

>>> CapitalizeTransform()("hELLO WORLD", TransformContext())
'Hello world'

__call__(text: str, context: TransformContext) -> str

Apply str.capitalize to text.

TitleTransform

Title-case each word.

Examples:

>>> TitleTransform()("hello world", TransformContext())
'Hello World'

__call__(text: str, context: TransformContext) -> str

Apply str.title to text.
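
Since CapitalizeTransform and TitleTransform are described as applying str.capitalize and str.title, the difference between them comes down to those built-ins. The short example below uses only the standard library; note that str.title treats an apostrophe as a word boundary, which may matter for contractions.

```python
text = "they're HERE"

# str.capitalize: first character uppercased, everything else lowercased.
capitalized = text.capitalize()

# str.title: every run of letters starts with an uppercase letter,
# including the run that follows the apostrophe.
titled = text.title()

print(capitalized)
print(titled)
```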

MarkdownStripTransform

Strip common Markdown markup, keeping the human-readable text.

Removes link/image targets (keeping the visible text), emphasis markers, inline code backticks, heading markers, and blockquote markers.

Examples:

>>> MarkdownStripTransform()("**bold** and [a link](http://x)", TransformContext())
'bold and a link'

__call__(text: str, context: TransformContext) -> str

Strip Markdown markup from text.
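
One plausible way to implement this kind of stripping is an ordered list of regular-expression substitutions. The sketch below is an assumption about the approach, not the library's code: strip_markdown and the rule list are hypothetical, and the emphasis rule is deliberately naive (it would also strip underscores inside snake_case identifiers).

```python
import re

# (pattern, replacement) pairs, applied in order.
_RULES: list[tuple[re.Pattern[str], str]] = [
    # Links and images: keep the visible text, drop the target.
    (re.compile(r"!?\[([^\]]*)\]\([^)]*\)"), r"\1"),
    # Inline code: drop the backticks.
    (re.compile(r"`([^`]*)`"), r"\1"),
    # Emphasis: *x*, **x**, ***x***, and the underscore equivalents.
    (re.compile(r"(\*{1,3}|_{1,3})(.+?)\1"), r"\2"),
    # Heading markers at the start of a line.
    (re.compile(r"^[ ]{0,3}#{1,6}[ \t]+", re.MULTILINE), ""),
    # Blockquote markers at the start of a line.
    (re.compile(r"^[ ]{0,3}>[ ]?", re.MULTILINE), ""),
]


def strip_markdown(text: str) -> str:
    """Remove common Markdown markup, keeping the readable text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


print(strip_markdown("**bold** and [a link](http://x)"))
print(strip_markdown("# Title\n> quoted `code`"))
```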

RedditCleanupTransform

Clean Reddit comment text into plain prose.

Unescapes HTML entities, strips Markdown (reusing MarkdownStripTransform), removes URLs and [deleted]/[removed] markers, and collapses runs of intra-line whitespace.

Examples:

>>> RedditCleanupTransform()("see [here](http://x) &amp; more", TransformContext())
'see here & more'

__call__(text: str, context: TransformContext) -> str

Clean Reddit markup from text.
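
The order of the cleanup steps matters: if bare URLs were removed before Markdown links were unwrapped, the link target would be eaten and leave a dangling "[here](" behind. The following sketch shows one possible ordering using only the standard library. It is not the library's implementation; clean_comment is a hypothetical name, only the link rule stands in for full Markdown stripping, and trimming each line is an assumption.

```python
import html
import re

_LINK = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")
_URL = re.compile(r"https?://\S+")
_MARKER = re.compile(r"\[(?:deleted|removed)\]")
_SPACES = re.compile(r"[ \t]+")


def clean_comment(text: str) -> str:
    """Turn Reddit-style comment text into plain prose (sketch)."""
    text = html.unescape(text)       # "&amp;" -> "&"
    text = _LINK.sub(r"\1", text)    # unwrap links before touching URLs
    text = _URL.sub("", text)        # drop any remaining bare URLs
    text = _MARKER.sub("", text)     # drop [deleted] / [removed]
    # Collapse whitespace within each line; line breaks are preserved.
    lines = [_SPACES.sub(" ", line).strip() for line in text.split("\n")]
    return "\n".join(lines)


print(clean_comment("see [here](http://x) &amp; more"))
print(clean_comment("[deleted]   visit https://example.com now"))
```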

split_sentences(text: str, *, tokenizer_config: TokenizerConfig | None = None) -> tuple[str, ...]

Split text into sentences.

When tokenizer_config selects a spacy or stanza backend, sentence boundaries come from that parser's segmenter. Otherwise a regular-expression fallback splits on sentence-final punctuation followed by whitespace.

Parameters:

text (str, required): Text to split.

tokenizer_config (TokenizerConfig | None, default None): Backend selector. None or the whitespace backend uses the regex fallback.

Returns:

tuple[str, ...]: The sentences, with surrounding whitespace stripped and empty strings dropped.
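
The regular-expression fallback described above can be approximated in a few lines. This is a sketch under the assumption that "sentence-final punctuation" means ".", "!", or "?"; the pattern the library actually uses is not given here, and split_sentences_fallback is a hypothetical name.

```python
import re

# Split on whitespace that directly follows ., !, or ?
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences_fallback(text: str) -> tuple[str, ...]:
    """Regex-only sentence splitter: strip each piece and drop empties."""
    pieces = (piece.strip() for piece in _BOUNDARY.split(text))
    return tuple(piece for piece in pieces if piece)


print(split_sentences_fallback("Run to the store. Did it rain?  Yes!"))
print(split_sentences_fallback("Dr. Smith left."))
```

The second call shows the usual weakness of punctuation-based splitting: the abbreviation "Dr." is treated as a sentence end, which is the kind of case a parser-backed segmenter is better placed to handle.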

Morphological Transforms

morphology

Morphological transforms backed by UniMorph paradigms.

Each MorphologicalTransform targets a specific inflectional feature bundle (e.g. present participle) and applies the inflection to the head token of the span text. Non-head tokens are preserved as-is, producing natural multi-word results like "running to the store" from a span "run to the store" with a gerund transform.

The system is language-agnostic at the protocol level: the same MorphologicalTransform class works for any language supported by UniMorph, and the language is selected via language_code at construction time.

InflectionSpec dataclass

Specification for a target inflectional form.

Attributes:

name (str): Human-readable name (e.g. "gerund").

predicate (FeaturePredicate): A callable that returns True for a UniMorph feature dict matching the desired form.

description (str): Short description of the inflection.

MorphologicalTransform

Apply a morphological inflection to the head token of span text.

Given a span like "run to the store" and an inflection spec for the present participle, this transform produces "running to the store" by inflecting only the head token (defaulting to the first token when context.head_index is not set).

Parameters:

inflection_spec (InflectionSpec, required): Specifies which inflected form to target.

language_code (str, required): ISO 639 language code for UniMorph lookup.

lemmatize (bool, default True): If True and context.lemma is None, attempt to find the paradigm by trying the head token directly as a lemma.

Examples:

>>> spec = InflectionSpec(
...     name="gerund",
...     predicate=lambda f: (
...         f.get("verb_form") == "V.PTCP" and f.get("tense") == "PRS"
...     ),
... )
>>> t = MorphologicalTransform(spec, language_code="eng")
>>> ctx = TransformContext(
...     lemma="run", head_index=0, tokens=("run", "to", "the", "store")
... )
>>> t("run to the store", ctx)
'running to the store'

inflection_spec: InflectionSpec property

The inflection specification for this transform.

__call__(text: str, context: TransformContext) -> str

Apply the inflection to the span text.

Parameters:

text (str, required): The span text to transform.

context (TransformContext, required): Metadata about the span (lemma, head_index, etc.).

Returns:

str: Text with the head token inflected. Falls back to the original text if the inflection cannot be resolved.
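
The head-token behaviour, including the fallback to the original text, can be sketched with a toy paradigm table in place of UniMorph. Everything below is illustrative: PARADIGMS, inflect_head, and is_gerund are hypothetical, the two-entry table is made up for the example, and the real transform resolves paradigms through UniMorph rather than a dict.

```python
from typing import Callable, Optional

Features = dict[str, str]

# Toy stand-in for UniMorph data: lemma -> list of (form, features).
PARADIGMS: dict[str, list[tuple[str, Features]]] = {
    "run": [
        ("running", {"verb_form": "V.PTCP", "tense": "PRS"}),
        ("ran", {"tense": "PST"}),
    ],
}


def is_gerund(features: Features) -> bool:
    """Predicate matching the present participle, as in the example above."""
    return features.get("verb_form") == "V.PTCP" and features.get("tense") == "PRS"


def inflect_head(
    text: str,
    predicate: Callable[[Features], bool],
    lemma: Optional[str] = None,
    head_index: Optional[int] = None,
) -> str:
    """Inflect only the head token; return text unchanged if unresolved."""
    words = text.split()
    index = 0 if head_index is None else head_index  # default: first token
    if not 0 <= index < len(words):
        return text
    # Without a lemma, try the head token itself as the lemma.
    key = lemma if lemma is not None else words[index]
    for form, features in PARADIGMS.get(key, []):
        if predicate(features):
            words[index] = form
            return " ".join(words)
    return text  # no matching form: fall back to the original text


print(inflect_head("run to the store", is_gerund, lemma="run", head_index=0))
print(inflect_head("quickly run home", is_gerund, head_index=1))
print(inflect_head("blorp the thing", is_gerund))
```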

__repr__() -> str

Return a debug representation.

register_morphological_transforms(registry: TransformRegistry, language_code: str) -> None

Register standard morphological transforms for a language.

Adds gerund, past_tense, past_participle, present_3sg, and infinitive transforms backed by UniMorph.

Parameters:

registry (TransformRegistry, required): Registry to populate.

language_code (str, required): ISO 639 language code.