bead.transforms

Value-level text transforms (str -> str, parameterised by a TransformContext) used when rendering template slots and item prompts. Transforms are registered by name in a TransformRegistry; any callable conforming to the SpanTextTransform protocol can be registered.

Core Abstractions

`base`

Core abstractions for the span text transform system.

Defines the SpanTextTransform protocol, TransformContext for passing metadata to transforms, TransformPipeline for composing transforms, and TransformRegistry for name-based lookup.

The transforms operate at the value level (str -> str parameterised by a TransformContext). Use dx.Iso or dx.Lens directly when the transformation crosses schema boundaries.

`TransformContext`

Bases: BeadBaseModel

Metadata available to transforms at resolution time.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `language_code` | `str \| None` | ISO 639 code (e.g. `"eng"`, `"en"`). |
| `lemma` | `str \| None` | Lemma of the span head, if known. |
| `pos` | `str \| None` | Universal POS tag of the span head (e.g. `"VERB"`). |
| `head_index` | `int \| None` | Token index of the syntactic head within the span. |
| `tokens` | `tuple[str, ...]` | Individual tokens of the span text. Empty when unknown. |
| `metadata` | `dict[str, JsonValue]` | Arbitrary extra metadata. |

`SpanTextTransform`

Bases: Protocol

Protocol for a single text transform.

Any callable (str, TransformContext) -> str satisfies this protocol. Implementations may ignore the context when the transform is purely textual (e.g. lowercasing).
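
Because SpanTextTransform is a structural protocol, conformance needs no subclassing. The sketch below illustrates this with plain `typing.Protocol`. It defines its own two-field `TransformContext` stand-in so it runs without bead installed; `StripTransform` and `lemma_or_text` are hypothetical examples, not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class TransformContext:
    """Hypothetical minimal stand-in for bead's TransformContext."""

    language_code: str | None = None
    lemma: str | None = None


class SpanTextTransform(Protocol):
    """Structural protocol: any matching callable conforms."""

    def __call__(self, text: str, context: TransformContext) -> str: ...


class StripTransform:
    """Purely textual: ignores the context entirely."""

    def __call__(self, text: str, context: TransformContext) -> str:
        return text.strip()


def lemma_or_text(text: str, context: TransformContext) -> str:
    """Context-aware plain function: prefer the lemma when one is known."""
    return context.lemma if context.lemma is not None else text


# A class instance and a bare function both satisfy the protocol.
transforms: list[SpanTextTransform] = [StripTransform(), lemma_or_text]
results = [t("  runs  ", TransformContext(lemma="run")) for t in transforms]
# results == ["runs", "run"]
```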

`__call__(text: str, context: TransformContext) -> str`

Apply the transform to text.

`TransformPipeline`

An ordered chain of transforms applied left-to-right.

Examples:

>>> from bead.transforms.text import LowerTransform, CapitalizeTransform
>>> ctx = TransformContext()
>>> pipe = TransformPipeline([LowerTransform(), CapitalizeTransform()])
>>> pipe("HELLO WORLD", ctx)
'Hello world'

`__call__(text: str, context: TransformContext) -> str`

Apply each transform in sequence.

`__len__() -> int`

Return the number of transforms in the pipeline.

`__repr__() -> str`

Return a debug-friendly representation of the pipeline.

`append(transform: SpanTextTransform) -> None`

Append a transform to the end of the pipeline.

`prepend(transform: SpanTextTransform) -> None`

Insert a transform at the beginning of the pipeline.
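
The left-to-right semantics amount to a fold over the transform list, where each output becomes the next input. The following is a minimal sketch of that idea, not bead's implementation: `Pipeline`, `strip`, `lower`, and `capitalize` are hypothetical names, and the context is typed as a bare `object` to keep the example standalone.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable

# Hypothetical simplification: bead passes a TransformContext; any object works here.
Transform = Callable[[str, object], str]


class Pipeline:
    """Sketch of a left-to-right transform chain (not bead's TransformPipeline)."""

    def __init__(self, transforms: Iterable[Transform] = ()) -> None:
        self._transforms: list[Transform] = list(transforms)

    def __call__(self, text: str, context: object) -> str:
        # Each transform receives the previous transform's output.
        for transform in self._transforms:
            text = transform(text, context)
        return text

    def __len__(self) -> int:
        return len(self._transforms)

    def __repr__(self) -> str:
        names = ", ".join(getattr(t, "__name__", type(t).__name__) for t in self._transforms)
        return f"Pipeline([{names}])"

    def append(self, transform: Transform) -> None:
        self._transforms.append(transform)

    def prepend(self, transform: Transform) -> None:
        self._transforms.insert(0, transform)


def strip(text: str, context: object) -> str:
    return text.strip()


def lower(text: str, context: object) -> str:
    return text.lower()


def capitalize(text: str, context: object) -> str:
    return text.capitalize()


pipe = Pipeline([lower])
pipe.append(capitalize)
pipe.prepend(strip)
result = pipe("  HELLO WORLD  ", None)  # "Hello world"
```

Because `prepend` inserts at position 0, `strip` runs first even though it was added last.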

`TransformRegistry`

Name-to-transform mapping with pipeline construction.

Transforms are registered under short string names (e.g. "gerund", "lower") and looked up when resolving [[label|name1|name2]] prompt references.
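
To make the reference syntax concrete, here is a hypothetical parser for `[[label|name1|name2]]` markers. It assumes the first field is the slot label and the remaining fields are transform names in application order, which would then be handed to something like resolve_pipeline. The regular expression and the `parse_references` helper are illustrative assumptions, not bead's parser.

```python
import re

# Hypothetical pattern: a label, then zero or more "|name" segments.
REFERENCE = re.compile(r"\[\[([^\[\]|]+)((?:\|[^\[\]|]+)*)\]\]")


def parse_references(prompt: str) -> list[tuple[str, list[str]]]:
    """Return (label, transform_names) pairs in order of appearance."""
    refs = []
    for match in REFERENCE.finditer(prompt):
        label = match.group(1).strip()
        names = [n.strip() for n in match.group(2).split("|") if n.strip()]
        refs.append((label, names))
    return refs


print(parse_references("Was [[agent]] [[event|gerund|lower]]?"))
# [('agent', []), ('event', ['gerund', 'lower'])]
```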

Examples:

>>> from bead.transforms.text import LowerTransform
>>> reg = TransformRegistry()
>>> reg.register("lower", LowerTransform())
>>> t = reg.get("lower")
>>> t("HELLO", TransformContext())
'hello'

`register(name: str, transform: SpanTextTransform | Callable[[str, TransformContext], str]) -> None`

Register transform under name.

`get(name: str) -> SpanTextTransform`

Return the transform registered under name.

Raises:

| Type | Description |
| --- | --- |
| `KeyError` | If no transform with that name exists. |

`resolve_pipeline(names: list[str]) -> TransformPipeline`

Return a pipeline applying the named transforms left-to-right.

`available() -> list[str]`

Return the registered transform names, sorted.

`__contains__(name: str) -> bool`

Return whether name is registered.

`__len__() -> int`

Return the number of registered transforms.

`__repr__() -> str`

Return a debug-friendly representation of the registry.
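
A dictionary-backed sketch of the registry contract described above: unknown names raise `KeyError`, `available()` is sorted, and pipelines apply left-to-right. The `Registry` class and its eager lookup in `resolve_pipeline` are assumptions made for illustration; bead's TransformRegistry may behave differently, for example when a name is registered twice (this sketch silently overwrites).

```python
from __future__ import annotations

from collections.abc import Callable

Transform = Callable[[str, object], str]


class Registry:
    """Sketch of a name-to-transform registry (not bead's TransformRegistry)."""

    def __init__(self) -> None:
        self._transforms: dict[str, Transform] = {}

    def register(self, name: str, transform: Transform) -> None:
        self._transforms[name] = transform

    def get(self, name: str) -> Transform:
        # Plain dict lookup raises KeyError for unknown names.
        return self._transforms[name]

    def resolve_pipeline(self, names: list[str]) -> Transform:
        # Look every name up eagerly so a typo fails before any text is transformed.
        steps = [self.get(name) for name in names]

        def pipeline(text: str, context: object) -> str:
            for step in steps:
                text = step(text, context)
            return text

        return pipeline

    def available(self) -> list[str]:
        return sorted(self._transforms)

    def __contains__(self, name: str) -> bool:
        return name in self._transforms

    def __len__(self) -> int:
        return len(self._transforms)


reg = Registry()
reg.register("upper", lambda text, ctx: text.upper())
reg.register("lower", lambda text, ctx: text.lower())
reg.register("exclaim", lambda text, ctx: text + "!")

shout = reg.resolve_pipeline(["upper", "exclaim"])
print(shout("hello", None))  # HELLO!
print(reg.available())       # ['exclaim', 'lower', 'upper']
```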

Text Transforms

Pure surface-string transforms. In addition to case transforms (lower, upper, capitalize, title), this module provides MarkdownStripTransform and RedditCleanupTransform for cleaning web/markdown text into plain prose, and split_sentences for sentence segmentation (parser-backed when a spaCy/Stanza config is given, with a regular-expression fallback otherwise).

`text`

Pure text transforms that require no external resources.

These transforms operate on the surface string and ignore the TransformContext. They are always safe to register regardless of language.

`LowerTransform`

Convert text to lowercase.

Examples:

>>> LowerTransform()("Hello World", TransformContext())
'hello world'

`__call__(text: str, context: TransformContext) -> str`

Apply str.lower to text.

`UpperTransform`

Convert text to uppercase.

Examples:

>>> UpperTransform()("Hello World", TransformContext())
'HELLO WORLD'

`__call__(text: str, context: TransformContext) -> str`

Apply str.upper to text.

`CapitalizeTransform`

Capitalize the first character, lowercase the rest.

Examples:

>>> CapitalizeTransform()("hELLO WORLD", TransformContext())
'Hello world'

`__call__(text: str, context: TransformContext) -> str`

Apply str.capitalize to text.

`TitleTransform`

Title-case each word.

Examples:

>>> TitleTransform()("hello world", TransformContext())
'Hello World'

`__call__(text: str, context: TransformContext) -> str`

Apply str.title to text.
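
Since the four case transforms are documented as thin wrappers over `str.lower`, `str.upper`, `str.capitalize`, and `str.title`, they should inherit those methods' quirks. The plain-Python comparison below shows two worth knowing: `capitalize` lowercases everything after the first character, and `title` treats an apostrophe as a word boundary.

```python
# Plain str methods, which the case transforms are documented to apply.
text = "hELLO wORLD, it's ME"

lowered = text.lower()           # "hello world, it's me"
uppered = text.upper()           # "HELLO WORLD, IT'S ME"
capitalized = text.capitalize()  # "Hello world, it's me"
titled = text.title()            # "Hello World, It'S Me" (the apostrophe starts a new "word")
```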

`MarkdownStripTransform`

Strip common Markdown markup, keeping the human-readable text.

Removes link/image targets (keeping the visible text), emphasis markers, inline code backticks, heading markers, and blockquote markers.

Examples:

>>> MarkdownStripTransform()("**bold** and [a link](http://x)", TransformContext())
'bold and a link'

`__call__(text: str, context: TransformContext) -> str`

Strip Markdown markup from text.
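
A regex-based sketch of the documented behaviour follows. The patterns and the `strip_markdown` name are assumptions, not bead's own, and they skip harder cases such as nested emphasis, reference-style links, and fenced code. Underscore emphasis is only recognised at word boundaries so identifiers like `snake_case_name` survive.

```python
import re

# Hypothetical patterns approximating the documented behaviour.
_LINK_OR_IMAGE = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")          # [text](url), ![alt](url) -> text
_INLINE_CODE = re.compile(r"`([^`]*)`")                          # `code` -> code
_STAR_EMPHASIS = re.compile(r"(\*\*|\*)(.+?)\1")                 # **b**, *i* -> inner text
_UNDERSCORE_EMPHASIS = re.compile(r"(?<!\w)(__|_)(.+?)\1(?!\w)")  # __b__, _i_ at word boundaries
_HEADING = re.compile(r"^[ \t]{0,3}#{1,6}[ \t]+", re.MULTILINE)   # "## Title" -> "Title"
_BLOCKQUOTE = re.compile(r"^[ \t]{0,3}>[ \t]?", re.MULTILINE)     # "> quote" -> "quote"


def strip_markdown(text: str) -> str:
    """Remove common Markdown markup while keeping the readable text."""
    text = _LINK_OR_IMAGE.sub(r"\1", text)
    text = _INLINE_CODE.sub(r"\1", text)
    text = _STAR_EMPHASIS.sub(r"\2", text)
    text = _UNDERSCORE_EMPHASIS.sub(r"\2", text)
    text = _HEADING.sub("", text)
    text = _BLOCKQUOTE.sub("", text)
    return text


print(strip_markdown("**bold** and [a link](http://x)"))  # bold and a link
```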

`RedditCleanupTransform`

Clean Reddit comment text into plain prose.

Unescapes HTML entities, strips Markdown (reusing MarkdownStripTransform), removes URLs and `[deleted]`/`[removed]` markers, and collapses runs of intra-line whitespace.

Examples:

>>> RedditCleanupTransform()("see [here](http://x) &amp; more", TransformContext())
'see here & more'

`__call__(text: str, context: TransformContext) -> str`

Clean Reddit markup from text.
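
The cleanup order described above can be approximated as follows, under stated assumptions: the Markdown step is reduced to links and `*`/`**` emphasis to keep it short, the URL pattern only covers `http(s)://` and `www.` prefixes, the final per-line trim is an extra tidy-up, and `clean_reddit` is a hypothetical name. Links are stripped before bare URLs so a link's visible text survives while its target disappears.

```python
import html
import re

_LINK = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")     # Markdown link/image -> visible text
_EMPHASIS = re.compile(r"(\*\*|\*)(.+?)\1")         # reduced emphasis handling
_URL = re.compile(r"(?:https?://|www\.)\S+")       # bare URLs
_MARKERS = re.compile(r"\[(?:deleted|removed)\]")  # Reddit placeholders
_INLINE_WS = re.compile(r"[ \t]+")                 # spaces/tabs only; newlines kept


def clean_reddit(text: str) -> str:
    """Hypothetical approximation of the documented cleanup steps."""
    text = html.unescape(text)         # "&amp;" -> "&"
    text = _LINK.sub(r"\1", text)      # links first, so their targets go with them
    text = _EMPHASIS.sub(r"\2", text)
    text = _URL.sub("", text)          # then any URLs left in the prose
    text = _MARKERS.sub("", text)
    text = _INLINE_WS.sub(" ", text)   # collapse intra-line whitespace runs
    # Trim each line (an extra step assumed here, not documented).
    return "\n".join(line.strip() for line in text.split("\n"))


print(clean_reddit("see [here](http://x) &amp; more"))  # see here & more
```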

`split_sentences(text: str, *, tokenizer_config: TokenizerConfig | None = None) -> tuple[str, ...]`

Split text into sentences.

When tokenizer_config selects a spacy or stanza backend, sentence boundaries come from that parser's segmenter. Otherwise a regular-expression fallback splits on sentence-final punctuation followed by whitespace.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `text` | `str` | Text to split. | required |
| `tokenizer_config` | `TokenizerConfig \| None` | Backend selector. `None` or the `whitespace` backend uses the regex fallback. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `tuple[str, ...]` | The sentences, with surrounding whitespace stripped (empties dropped). |
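
The regular-expression fallback can be approximated in a few lines: split wherever `.`, `!`, or `?` is followed by whitespace, then strip and drop empty pieces. The pattern and the `split_sentences_regex` name are assumptions. The second call shows the kind of abbreviation error that makes a spaCy or Stanza backend preferable when one is available.

```python
import re

# Hypothetical fallback: break after ., ! or ? when whitespace follows.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def split_sentences_regex(text: str) -> tuple[str, ...]:
    """Split text into stripped, non-empty sentences."""
    return tuple(part.strip() for part in _SENTENCE_BREAK.split(text) if part.strip())


print(split_sentences_regex("It rained. Did it stop?  Yes!\n\nGood."))
# ('It rained.', 'Did it stop?', 'Yes!', 'Good.')
print(split_sentences_regex("Dr. Smith arrived."))
# ('Dr.', 'Smith arrived.')
```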

Morphological Transforms

`morphology`

Morphological transforms backed by UniMorph paradigms.

Each MorphologicalTransform targets a specific inflectional feature bundle (e.g. the present participle) and applies the inflection to the head token of the span text. Non-head tokens are preserved as-is, producing natural multi-word results such as "running to the store" from the span "run to the store" with a gerund transform.

The system is language-agnostic at the protocol level: the same MorphologicalTransform class works for any language supported by UniMorph, with the language selected via language_code at construction time.

`InflectionSpec` `dataclass`

Specification for a target inflectional form.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `name` | `str` | Human-readable name (e.g. `"gerund"`). |
| `predicate` | `FeaturePredicate` | A callable that returns `True` for a UniMorph feature dict matching the desired form. |
| `description` | `str` | Short description of the inflection. |
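
To show how a `predicate` picks a form out of a paradigm, the sketch below mirrors the three documented fields in a local dataclass and queries a toy paradigm for *run*. The `FeaturePredicate` alias, the feature keys (borrowed from the MorphologicalTransform example), the paradigm entries, and `select_form` are all hypothetical; real UniMorph-backed feature dicts may be shaped differently.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical alias: a predicate over a feature dict.
FeaturePredicate = Callable[[dict[str, str]], bool]


@dataclass
class InflectionSpec:
    """Local mirror of the documented fields (name, predicate, description)."""

    name: str
    predicate: FeaturePredicate
    description: str = ""


# Toy paradigm for "run" as (form, features) pairs; keys and values are assumptions.
PARADIGM_RUN: list[tuple[str, dict[str, str]]] = [
    ("run", {"verb_form": "NFIN"}),
    ("runs", {"tense": "PRS", "person": "3", "number": "SG"}),
    ("ran", {"tense": "PST"}),
    ("running", {"verb_form": "V.PTCP", "tense": "PRS"}),
    ("run", {"verb_form": "V.PTCP", "tense": "PST"}),
]


def select_form(paradigm: list[tuple[str, dict[str, str]]], spec: InflectionSpec) -> str | None:
    """Return the first form whose features satisfy the spec, or None."""
    for form, features in paradigm:
        if spec.predicate(features):
            return form
    return None


gerund = InflectionSpec(
    name="gerund",
    predicate=lambda f: f.get("verb_form") == "V.PTCP" and f.get("tense") == "PRS",
    description="present participle",
)
past_tense = InflectionSpec(
    name="past_tense",
    # Finite past only: excludes the past-participle entry, which has a verb_form.
    predicate=lambda f: f.get("tense") == "PST" and "verb_form" not in f,
)
```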

`MorphologicalTransform`

Apply a morphological inflection to the head token of span text.

Given a span like "run to the store" and an inflection spec for the present participle, this transform produces "running to the store" by inflecting only the head token (defaulting to the first token when context.head_index is not set).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `inflection_spec` | `InflectionSpec` | Specifies which inflected form to target. | required |
| `language_code` | `str` | ISO 639 language code for UniMorph lookup. | required |
| `lemmatize` | `bool` | If `True` and `context.lemma` is `None`, attempt to find the paradigm by trying the head token directly as a lemma. Defaults to `True`. | `True` |

Examples:

>>> spec = InflectionSpec(
...     name="gerund",
...     predicate=lambda f: (
...         f.get("verb_form") == "V.PTCP" and f.get("tense") == "PRS"
...     ),
... )
>>> t = MorphologicalTransform(spec, language_code="eng")
>>> ctx = TransformContext(
...     lemma="run", head_index=0, tokens=("run", "to", "the", "store")
... )
>>> t("run to the store", ctx)
'running to the store'

`inflection_spec: InflectionSpec` `property`

The inflection specification for this transform.

`__call__(text: str, context: TransformContext) -> str`

Apply the inflection to the span text.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `text` | `str` | The span text to transform. | required |
| `context` | `TransformContext` | Metadata about the span (lemma, head_index, etc.). | required |

Returns:

| Type | Description |
| --- | --- |
| `str` | Text with the head token inflected. Falls back to the original text if the inflection cannot be resolved. |
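
The head-only inflection and its fallback can be sketched without UniMorph by substituting a small lookup table. `INFLECTIONS` and `inflect_head` are hypothetical names, and the sketch rejoins tokens with single spaces, so the original spacing is not preserved.

```python
from __future__ import annotations

# Hypothetical table standing in for a UniMorph paradigm query:
# (lemma, target form name) -> inflected form.
INFLECTIONS = {("run", "gerund"): "running", ("eat", "gerund"): "eating"}


def inflect_head(
    text: str,
    target: str,
    *,
    lemma: str | None = None,
    head_index: int | None = None,
    tokens: tuple[str, ...] = (),
) -> str:
    """Inflect only the head token; return text unchanged when that fails."""
    toks = list(tokens) if tokens else text.split()
    head = head_index if head_index is not None else 0  # default head: first token
    if not 0 <= head < len(toks):
        return text
    # Like the documented lemmatize=True path: try the head token as the lemma.
    key = (lemma if lemma is not None else toks[head], target)
    form = INFLECTIONS.get(key)
    if form is None:
        return text  # fallback: original text
    toks[head] = form
    return " ".join(toks)


print(inflect_head("run to the store", "gerund", lemma="run", head_index=0))
# running to the store
```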

`__repr__() -> str`

Return a debug representation.

`register_morphological_transforms(registry: TransformRegistry, language_code: str) -> None`

Add gerund, past_tense, past_participle, present_3sg, and infinitive transforms, backed by UniMorph, to registry.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `registry` | `TransformRegistry` | Registry to populate. | required |
| `language_code` | `str` | ISO 639 language code. | required |
