bead.data¶

Core data models and utilities used throughout the bead pipeline.

Base Models¶

`base` ¶

Base Pydantic model for all bead objects.

This module provides BeadBaseModel, the foundational Pydantic v2 model that all bead data models should inherit from. It provides automatic ID generation, timestamp tracking, and versioning.

`BeadBaseModel` ¶

Bases: BaseModel

Base Pydantic model for all bead objects.

This model provides foundational fields and configuration that all bead data models inherit. It includes automatic ID generation using UUIDv7, timestamp tracking for creation and modification, versioning, and metadata.

Attributes:

Name	Type	Description
`id`	`UUID`	Unique identifier (UUIDv7) automatically generated on creation
`created_at`	`datetime`	UTC timestamp when object was created
`modified_at`	`datetime`	UTC timestamp when object was last modified
`version`	`str`	Version string for schema versioning (default: "1.0.0")
`metadata`	`dict[str, JsonValue]`	Optional metadata dictionary for arbitrary key-value pairs

Examples:

>>> class MyModel(BeadBaseModel):
...     name: str
...     value: int
>>> obj = MyModel(name="test", value=42)
>>> obj.id
UUID('...')
>>> obj.version
'1.0.0'
>>> obj.update_modified_time()
>>> obj.modified_at > obj.created_at
True

`update_modified_time() -> None` ¶

Update the modified_at timestamp to current UTC time.

This method should be called whenever the object is modified to maintain accurate modification tracking.

Examples:

>>> obj = BeadBaseModel()
>>> original_time = obj.modified_at
>>> import time
>>> time.sleep(0.01)  # Small delay to ensure different timestamp
>>> obj.update_modified_time()
>>> obj.modified_at > original_time
True
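
The fields and the `update_modified_time()` contract described above can be approximated without bead installed. The following is a minimal sketch, not bead's implementation: `SketchModel` is a hypothetical stand-in built on a standard-library dataclass, and `uuid4` substitutes for the UUIDv7 generator.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


def _utc_now() -> datetime:
    """Return a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


@dataclass
class SketchModel:
    """Hypothetical stand-in for BeadBaseModel (assumption, not the real class)."""

    # uuid4 is used only for illustration; bead generates UUIDv7 values.
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=_utc_now)
    modified_at: datetime = field(default_factory=_utc_now)
    version: str = "1.0.0"
    metadata: dict = field(default_factory=dict)

    def update_modified_time(self) -> None:
        """Refresh modified_at; callers invoke this after each mutation."""
        self.modified_at = _utc_now()


obj = SketchModel()
obj.metadata["source"] = "manual"  # mutate the object...
obj.update_modified_time()         # ...then record when it changed
```

The point of the sketch is the calling convention: the timestamp is not refreshed automatically, so code that mutates an object is responsible for calling `update_modified_time()` afterwards.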

Identifiers and Timestamps¶

`identifiers` ¶

UUIDv7 generation and utilities for bead package.

This module provides functions for generating time-ordered UUIDv7 identifiers, extracting timestamps from them, and validating UUID versions.

`generate_uuid() -> UUID` ¶

Generate a time-ordered UUIDv7.

UUIDv7 is a time-ordered UUID format that embeds a timestamp in the first 48 bits, making UUIDs sortable by creation time. This is useful for maintaining chronological ordering of database records.

Returns:

Type	Description
`UUID`	A newly generated UUIDv7 with embedded timestamp

Examples:

>>> uuid1 = generate_uuid()
>>> uuid2 = generate_uuid()
>>> uuid1 < uuid2  # uuids are time-ordered
True
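
To make the bit layout concrete, here is a hedged sketch that assembles a UUIDv7-shaped value by hand using only the standard library. `make_uuid7_sketch` is a hypothetical helper, not the code behind `generate_uuid()`; it follows the published UUIDv7 layout (48-bit millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits).

```python
import os
import time
import uuid
from typing import Optional


def make_uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Assemble a UUIDv7-shaped value (illustrative, not bead's generator)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = (rand >> 62) & 0xFFF                 # 12 bits after the version
    rand_b = rand & ((1 << 62) - 1)               # 62 bits after the variant

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 0-47: timestamp
    value |= 0x7 << 76                            # bits 48-51: version 7
    value |= rand_a << 64                         # bits 52-63: random
    value |= 0b10 << 62                           # bits 64-65: RFC variant
    value |= rand_b                               # bits 66-127: random
    return uuid.UUID(int=value)


earlier = make_uuid7_sketch(1_700_000_000_000)
later = make_uuid7_sketch(1_700_000_000_001)
```

Because the timestamp occupies the most significant bits, `earlier < later` holds whenever the millisecond values differ. In this sketch, two values created within the same millisecond are ordered by their random bits, so their relative order is not guaranteed.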

`extract_timestamp(uuid: UUID) -> int` ¶

Extract timestamp in milliseconds from a UUIDv7.

The timestamp is stored in the first 48 bits of the UUID and represents milliseconds since Unix epoch (January 1, 1970 00:00:00 UTC).

Parameters:

Name	Type	Description	Default
`uuid`	`UUID`	The UUIDv7 to extract timestamp from.	required

Returns:

Type	Description
`int`	Timestamp in milliseconds since Unix epoch

Examples:

>>> import time
>>> uuid = generate_uuid()
>>> timestamp = extract_timestamp(uuid)
>>> current_time = int(time.time() * 1000)
>>> abs(timestamp - current_time) < 1000  # within 1 second
True
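
Since the timestamp sits in the first 48 of 128 bits, extracting it amounts to a right shift by 80. The sketch below illustrates that arithmetic with a hand-written UUID; `extract_timestamp_sketch` is a hypothetical name, and the conversion to `datetime` is an addition for readability rather than part of the documented function.

```python
from datetime import datetime, timezone
from uuid import UUID


def extract_timestamp_sketch(value: UUID) -> int:
    """Return the top 48 bits of a 128-bit UUID as Unix milliseconds."""
    return value.int >> 80


# Hand-written UUIDv7-shaped value; its first 12 hex digits are the timestamp.
example = UUID("018b2f5c-7a00-7abc-8def-0123456789ab")
millis = extract_timestamp_sketch(example)

# Milliseconds since the epoch convert to an aware datetime like this.
created = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```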

`is_valid_uuid7(uuid: UUID) -> bool` ¶

Check if a UUID is a valid UUIDv7.

Validates that the UUID has version 7 by checking the version bits (bits 48-51) which should be 0111 (7).

Parameters:

Name	Type	Description	Default
`uuid`	`UUID`	The UUID to validate.	required

Returns:

Type	Description
`bool`	True if the UUID is version 7, False otherwise

Examples:

>>> uuid7 = generate_uuid()
>>> is_valid_uuid7(uuid7)
True
>>> from uuid import uuid4
>>> uuid4_val = uuid4()
>>> is_valid_uuid7(uuid4_val)
False
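
The version check described above reduces to masking one 4-bit field. The following sketch shows that test in isolation; `is_uuid7_sketch` is a hypothetical name, and it inspects only the version bits (it does not examine the variant field).

```python
from uuid import UUID, uuid4


def is_uuid7_sketch(value: UUID) -> bool:
    """Return True when bits 48-51 (counted from the top) equal 0b0111."""
    version_bits = (value.int >> 76) & 0xF
    return version_bits == 0b0111


# The third hex group starts with "7", which is the version nibble.
v7_like = UUID("018b2f5c-7a00-7abc-8def-0123456789ab")
v4_value = uuid4()
```

Here `is_uuid7_sketch(v7_like)` is true and `is_uuid7_sketch(v4_value)` is false, because `uuid4()` always sets the version nibble to 4.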

`timestamps` ¶

ISO 8601 timestamp utilities for bead package.

This module provides functions for creating, parsing, and formatting ISO 8601 timestamps with timezone information. All timestamps use UTC timezone.

`now_iso8601() -> datetime` ¶

Get current UTC datetime with timezone information.

Returns the current time in UTC with timezone info attached. This is preferred over datetime.utcnow(), which is deprecated since Python 3.12 and returns a naive datetime without timezone information.

Returns:

Type	Description
`datetime`	Current UTC datetime with timezone information

Examples:

>>> from datetime import UTC
>>> dt = now_iso8601()
>>> dt.tzinfo is not None
True
>>> dt.tzinfo == UTC
True

`parse_iso8601(timestamp: str) -> datetime` ¶

Parse ISO 8601 timestamp string to datetime.

Parses an ISO 8601 formatted string into a datetime object. The string should include timezone information.

Parameters:

Name	Type	Description	Default
`timestamp`	`str`	ISO 8601 formatted timestamp string (e.g., "2025-10-17T14:23:45.123456+00:00").	required

Returns:

Type	Description
`datetime`	Parsed datetime with timezone information

Examples:

>>> dt_str = "2025-10-17T14:23:45.123456+00:00"
>>> dt = parse_iso8601(dt_str)
>>> dt.year
2025
>>> dt.month
10

`format_iso8601(dt: datetime) -> str` ¶

Format datetime as ISO 8601 string.

Converts a datetime object to an ISO 8601 formatted string. A naive datetime (one without timezone information) is assumed to be UTC.

Parameters:

Name	Type	Description	Default
`dt`	`datetime`	Datetime to format.	required

Returns:

Type	Description
`str`	ISO 8601 formatted string

Examples:

>>> dt = now_iso8601()
>>> formatted = format_iso8601(dt)
>>> "+00:00" in formatted or "Z" in formatted
True
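
The "assume UTC when naive" rule can be sketched as follows. `format_iso8601_sketch` is a hypothetical name; how the real function treats aware datetimes in other timezones is not stated above, so this sketch leaves them untouched.

```python
from datetime import datetime, timezone


def format_iso8601_sketch(dt: datetime) -> str:
    """Format a datetime as ISO 8601, treating naive values as UTC."""
    if dt.tzinfo is None:
        # Attach UTC without shifting the clock time.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.isoformat()


naive = datetime(2025, 10, 17, 14, 23, 45)
aware = datetime(2025, 10, 17, 14, 23, 45, tzinfo=timezone.utc)

naive_text = format_iso8601_sketch(naive)
aware_text = format_iso8601_sketch(aware)
```

Both calls produce the same string, which is the practical consequence of the rule: a naive value and its UTC-labelled twin serialize identically.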

Metadata and Validation¶

`metadata` ¶

Metadata tracking models for provenance and processing history.

This module provides models for tracking provenance chains and processing history for all bead objects. This enables full traceability of data transformations.

`ProvenanceRecord` ¶

Bases: BeadBaseModel

Record of a provenance relationship between objects.

Tracks a single parent-child relationship in the provenance chain, including what the parent was, its type, and the nature of the relationship.

Attributes:

Name	Type	Description
`parent_id`	`UUID`	UUID of the parent object in the provenance chain
`parent_type`	`str`	Type name of the parent object (e.g., "LexicalItem", "Template")
`relationship`	`str`	Type of relationship (e.g., "derived_from", "filled_from", "generated_from")
`timestamp`	`datetime`	When this relationship was established (UTC with timezone)

Examples:

>>> from uuid import uuid4
>>> parent_id = uuid4()
>>> record = ProvenanceRecord(
...     parent_id=parent_id,
...     parent_type="Template",
...     relationship="filled_from"
... )
>>> record.parent_type
'Template'
>>> record.timestamp is not None
True

`ProcessingRecord` ¶

Bases: BeadBaseModel

Record of a processing operation applied to an object.

Tracks a single operation in the processing history, including the operation name, parameters used, when it was performed, and who/what performed it.

Attributes:

Name	Type	Description
`operation`	`str`	Name of the operation (e.g., "fill_template", "apply_constraint", "filter")
`parameters`	`dict[str, JsonValue]`	Parameters passed to the operation (default: empty dict)
`timestamp`	`datetime`	When the operation was performed (UTC with timezone)
`operator`	`str | None`	Who/what performed the operation (e.g., "TemplateFiller-v1.0" or a user ID) (default: None)

Examples:

>>> record = ProcessingRecord(
...     operation="fill_template",
...     parameters={"strategy": "exhaustive", "max_items": 100},
...     operator="TemplateFiller-v1.0"
... )
>>> record.operation
'fill_template'
>>> record.parameters["strategy"]
'exhaustive'
>>> record.timestamp is not None
True

`MetadataTracker` ¶

Bases: BeadBaseModel

Metadata tracking for provenance and processing history.

Tracks both provenance (where data came from) and processing history (what operations were applied) for complete data lineage.

Attributes:

Name	Type	Description
`provenance`	`list[ProvenanceRecord]`	Chain of provenance relationships (default: empty list)
`processing_history`	`list[ProcessingRecord]`	History of processing operations (default: empty list)
`custom_metadata`	`dict[str, JsonValue]`	Custom metadata fields (default: empty dict)

Examples:

>>> from uuid import uuid4
>>> tracker = MetadataTracker()
>>> parent_id = uuid4()
>>> tracker.add_provenance(parent_id, "Template", "filled_from")
>>> tracker.add_processing("fill_template", {"strategy": "exhaustive"})
>>> len(tracker.provenance)
1
>>> len(tracker.processing_history)
1
>>> chain = tracker.get_provenance_chain()
>>> len(chain)
1

`add_provenance(parent_id: UUID, parent_type: str, relationship: str) -> None` ¶

Add a provenance record to the chain.

Creates a new provenance record and adds it to the provenance list. The timestamp is automatically set to the current time.

Parameters:

Name	Type	Description	Default
`parent_id`	`UUID`	UUID of the parent object	required
`parent_type`	`str`	Type name of the parent object (e.g., "Template", "LexicalItem")	required
`relationship`	`str`	Type of relationship (e.g., "derived_from", "filled_from")	required

Examples:

>>> from uuid import uuid4
>>> tracker = MetadataTracker()
>>> parent_id = uuid4()
>>> tracker.add_provenance(parent_id, "Template", "filled_from")
>>> len(tracker.provenance)
1
>>> tracker.provenance[0].parent_type
'Template'

`add_processing(operation: str, parameters: dict[str, JsonValue] | None = None, operator: str | None = None) -> None` ¶

Add a processing record to the history.

Creates a new processing record and adds it to the processing history. The timestamp is automatically set to the current time.

Parameters:

Name	Type	Description	Default
`operation`	`str`	Name of the operation performed	required
`parameters`	`dict[str, JsonValue] | None`	Parameters passed to the operation (default: None, which creates an empty dict)	`None`
`operator`	`str | None`	Who/what performed the operation (default: None)	`None`

Examples:

>>> tracker = MetadataTracker()
>>> tracker.add_processing("fill_template", {"strategy": "exhaustive"})
>>> len(tracker.processing_history)
1
>>> tracker.processing_history[0].operation
'fill_template'
>>> tracker.add_processing("filter", operator="FilterSystem-v2.0")
>>> tracker.processing_history[1].operator
'FilterSystem-v2.0'

`get_provenance_chain() -> list[UUID]` ¶

Get the full provenance chain as a list of parent UUIDs.

Returns the parent UUIDs in the order they were added to the provenance list.

Returns:

Type	Description
`list[UUID]`	List of parent UUIDs in insertion order (oldest first)

Examples:

>>> from uuid import uuid4
>>> tracker = MetadataTracker()
>>> parent1 = uuid4()
>>> parent2 = uuid4()
>>> tracker.add_provenance(parent1, "Template", "filled_from")
>>> tracker.add_provenance(parent2, "LexicalItem", "derived_from")
>>> chain = tracker.get_provenance_chain()
>>> len(chain)
2
>>> chain[0] == parent1
True

`get_recent_processing(n: int = 5) -> list[ProcessingRecord]` ¶

Get the N most recent processing records.

Returns the most recent processing records, up to N records. If there are fewer than N records, returns all available records.

Parameters:

Name	Type	Description	Default
`n`	`int`	Number of recent records to return (default: 5)	`5`

Returns:

Type	Description
`list[ProcessingRecord]`	List of up to N most recent processing records, newest first

Examples:

>>> tracker = MetadataTracker()
>>> tracker.add_processing("operation1")
>>> tracker.add_processing("operation2")
>>> tracker.add_processing("operation3")
>>> recent = tracker.get_recent_processing(n=2)
>>> len(recent)
2
>>> recent[0].operation
'operation3'
>>> recent[1].operation
'operation2'
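
The two accessors order their results differently: the provenance chain is oldest first, while recent processing is newest first. The sketch below reproduces both orderings with plain lists. `TrackerSketch` is a hypothetical stand-in that stores tuples and strings instead of record models, and the `n <= 0` guard is an assumption about edge-case handling.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class TrackerSketch:
    """Hypothetical stand-in for MetadataTracker (not the real class)."""

    provenance: list = field(default_factory=list)
    processing_history: list = field(default_factory=list)

    def add_provenance(self, parent_id: UUID, parent_type: str, relationship: str) -> None:
        self.provenance.append((parent_id, parent_type, relationship))

    def add_processing(self, operation: str) -> None:
        self.processing_history.append(operation)

    def get_provenance_chain(self) -> list:
        # Insertion order: the first parent added comes first.
        return [parent_id for parent_id, _, _ in self.provenance]

    def get_recent_processing(self, n: int = 5) -> list:
        # Guard n <= 0 explicitly, because history[-0:] would return everything.
        if n <= 0:
            return []
        # Take the last n entries, then flip them so the newest comes first.
        return list(reversed(self.processing_history[-n:]))


tracker = TrackerSketch()
first_parent, second_parent = uuid4(), uuid4()
tracker.add_provenance(first_parent, "Template", "filled_from")
tracker.add_provenance(second_parent, "LexicalItem", "derived_from")
for name in ("operation1", "operation2", "operation3"):
    tracker.add_processing(name)

chain = tracker.get_provenance_chain()
recent = tracker.get_recent_processing(n=2)
```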

`validation` ¶

Validation utilities for data integrity checks.

This module provides validation functions beyond Pydantic's built-in validation, including file validation, reference validation, and provenance chain validation.

`ValidationReport` ¶

Bases: BaseModel

Report of validation results.

A lightweight model for collecting and reporting validation results, including errors, warnings, and statistics about validated objects.

Attributes:

Name	Type	Description
`valid`	`bool`	Overall validation status (False if any errors)
`errors`	`list[str]`	List of error messages (default: empty list)
`warnings`	`list[str]`	List of warning messages (default: empty list)
`object_count`	`int`	Number of objects validated (default: 0)

Examples:

>>> report = ValidationReport(valid=True)
>>> report.add_error("Invalid field")
>>> report.valid
False
>>> report.has_errors()
True
>>> len(report.errors)
1

`add_error(message: str) -> None` ¶

Add an error message and set valid to False.

Parameters:

Name	Type	Description	Default
`message`	`str`	Error message to add.	required

Examples:

>>> report = ValidationReport(valid=True)
>>> report.add_error("Something went wrong")
>>> report.valid
False
>>> "Something went wrong" in report.errors
True

`add_warning(message: str) -> None` ¶

Add a warning message.

Warnings do not affect the valid status.

Parameters:

Name	Type	Description	Default
`message`	`str`	Warning message to add.	required

Examples:

>>> report = ValidationReport(valid=True)
>>> report.add_warning("This might be an issue")
>>> report.valid
True
>>> report.has_warnings()
True

`has_errors() -> bool` ¶

Check if report has any errors.

Returns:

Type	Description
`bool`	True if errors list is non-empty

Examples:

>>> report = ValidationReport(valid=True)
>>> report.has_errors()
False
>>> report.add_error("error")
>>> report.has_errors()
True

`has_warnings() -> bool` ¶

Check if report has any warnings.

Returns:

Type	Description
`bool`	True if warnings list is non-empty

Examples:

>>> report = ValidationReport(valid=True)
>>> report.has_warnings()
False
>>> report.add_warning("warning")
>>> report.has_warnings()
True

`validate_jsonlines_file(path: Path, model_class: type[BaseModel], strict: bool = True) -> ValidationReport` ¶

Validate JSONLines file against Pydantic model schema.

Reads and validates each line in a JSONLines file against the provided model class. Empty lines are skipped.

Parameters:

Name	Type	Description	Default
`path`	`Path`	Path to JSONLines file to validate.	required
`model_class`	`type[BaseModel]`	Pydantic model class to validate against.	required
`strict`	`bool`	If True, stop at first error. If False, collect all errors (default: True).	`True`

Returns:

Type	Description
`ValidationReport`	Validation report with results

Examples:

>>> from pathlib import Path
>>> from bead.data.base import BeadBaseModel
>>> class TestModel(BeadBaseModel):
...     name: str
>>> # validate file
>>> report = validate_jsonlines_file(
...     Path("data.jsonl"), TestModel
... )
>>> report.valid
True
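
The difference between `strict=True` and `strict=False` is easiest to see on a file with more than one bad line. The following sketch uses only the standard library: `validate_jsonl_sketch` is a hypothetical helper, a required-keys check stands in for Pydantic validation, and a list of messages stands in for `ValidationReport`.

```python
import json
import tempfile
from pathlib import Path


def validate_jsonl_sketch(path: Path, required_keys: set, strict: bool = True) -> list:
    """Return error messages; an empty list means every line passed."""
    errors = []
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # empty lines are skipped
            try:
                record = json.loads(line)
                if not isinstance(record, dict):
                    raise ValueError("expected a JSON object")
                missing = required_keys - set(record)
                if missing:
                    raise ValueError(f"missing keys: {sorted(missing)}")
            except ValueError as exc:  # json.JSONDecodeError is a ValueError
                errors.append(f"line {line_number}: {exc}")
                if strict:
                    break  # strict mode stops at the first error
    return errors


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "data.jsonl"
    sample.write_text('{"name": "a"}\n\nnot json\n{"value": 1}\n', encoding="utf-8")
    strict_errors = validate_jsonl_sketch(sample, {"name"})
    all_errors = validate_jsonl_sketch(sample, {"name"}, strict=False)
```

With the sample file, strict mode reports only the malformed third line, while non-strict mode also reports the fourth line, which parses but lacks the required key.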

`validate_uuid_references(objects: list[BaseModel], reference_pool: dict[UUID, BaseModel]) -> ValidationReport` ¶

Validate that UUID references point to existing objects.

Checks all UUID fields in objects to ensure they reference valid objects in the reference pool. Supports both single UUID fields and list[UUID] fields.

Parameters:

Name	Type	Description	Default
`objects`	`list[BaseModel]`	List of objects to validate.	required
`reference_pool`	`dict[UUID, BaseModel]`	Dictionary of valid UUIDs to objects.	required

Returns:

Type	Description
`ValidationReport`	Validation report with results

Examples:

>>> from uuid import uuid4
>>> from bead.data.base import BeadBaseModel
>>> class Item(BeadBaseModel):
...     name: str
>>> items = [Item(name="test")]
>>> pool = {items[0].id: items[0]}
>>> report = validate_uuid_references(items, pool)
>>> report.valid
True
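
A reference check of this kind can be sketched by walking each object's fields and testing every UUID (or list of UUIDs) against the pool. This is an illustration under assumptions: `Node` and `find_dangling_references` are hypothetical, dataclasses stand in for Pydantic models, and the real function may discover UUID fields differently.

```python
from dataclasses import dataclass, field, fields
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class Node:
    """Hypothetical model with one single-UUID field and one list[UUID] field."""

    id: UUID = field(default_factory=uuid4)
    parent: Optional[UUID] = None
    related: list = field(default_factory=list)


def find_dangling_references(objects: list, pool: dict) -> list:
    """Return one message per UUID value that is missing from the pool."""
    errors = []
    for obj in objects:
        for spec in fields(obj):
            value = getattr(obj, spec.name)
            # Treat a single value as a one-element list to share the loop.
            candidates = value if isinstance(value, list) else [value]
            for candidate in candidates:
                if isinstance(candidate, UUID) and candidate not in pool:
                    errors.append(f"{type(obj).__name__}.{spec.name}: {candidate} not in pool")
    return errors


root = Node()
child = Node(parent=root.id, related=[root.id])
orphan = Node(parent=uuid4())  # points at an ID nobody owns
pool = {node.id: node for node in (root, child, orphan)}

ok = find_dangling_references([root, child], pool)
broken = find_dangling_references([orphan], pool)
```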

`validate_provenance_chain(metadata: MetadataTracker, repository: dict[UUID, BaseModel]) -> ValidationReport` ¶

Validate provenance chain references are valid.

Checks that all parent_id references in the provenance chain exist in the repository and that parent_type matches the actual type.

Parameters:

Name	Type	Description	Default
`metadata`	`MetadataTracker`	Metadata tracker with provenance chain to validate.	required
`repository`	`dict[UUID, BaseModel]`	Dictionary of valid UUIDs to objects.	required

Returns:

Type	Description
`ValidationReport`	Validation report with results

Examples:

>>> from uuid import uuid4
>>> from bead.data.base import BeadBaseModel
>>> from bead.data.metadata import MetadataTracker
>>> class Template(BeadBaseModel):
...     name: str
>>> template = Template(name="test")
>>> metadata = MetadataTracker()
>>> metadata.add_provenance(template.id, "Template", "filled_from")
>>> repo = {template.id: template}
>>> report = validate_provenance_chain(metadata, repo)
>>> report.valid
True
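
The two checks named above (the parent exists, and its type matches `parent_type`) can be sketched as follows. `check_provenance_sketch` is a hypothetical helper that takes `(parent_id, parent_type)` pairs; comparing against the class name is an assumption about how "actual type" is determined.

```python
from uuid import UUID, uuid4


class Template:
    """Hypothetical parent type used only for this illustration."""


class LexicalItem:
    """Second hypothetical parent type."""


def check_provenance_sketch(chain: list, repository: dict) -> list:
    """Return one message per missing parent or mismatched parent type."""
    errors = []
    for parent_id, parent_type in chain:
        parent = repository.get(parent_id)
        if parent is None:
            errors.append(f"{parent_id}: not found in repository")
        elif type(parent).__name__ != parent_type:
            # Assumption: the recorded type is the parent's class name.
            errors.append(f"{parent_id}: expected {parent_type}, found {type(parent).__name__}")
    return errors


template_id, item_id, missing_id = uuid4(), uuid4(), uuid4()
repository = {template_id: Template(), item_id: LexicalItem()}

good = check_provenance_sketch([(template_id, "Template")], repository)
bad = check_provenance_sketch(
    [(item_id, "Template"), (missing_id, "Template")], repository
)
```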

Serialization¶

`serialization` ¶

JSONLines serialization utilities for bead package.

This module provides functions for reading, writing, streaming, and appending Pydantic models to/from JSONLines format files. JSONLines is a convenient format for storing multiple JSON objects, with one object per line.

`SerializationError` ¶

Bases: Exception

Exception raised when serialization to JSONLines fails.

This exception is raised when writing Pydantic objects to JSONLines format encounters an error, such as file I/O issues or validation failures.

`DeserializationError` ¶

Bases: Exception

Exception raised when deserialization from JSONLines fails.

This exception is raised when reading JSONLines format into Pydantic objects encounters an error, such as file not found, invalid JSON, or validation failures.

`write_jsonlines(objects: Sequence[T], path: Path | str, validate: bool = True, append: bool = False) -> None` ¶

Write Pydantic objects to JSONLines file.

Serializes a sequence of Pydantic model instances to a JSONLines file, with one JSON object per line. Each object is validated before writing if validate=True.

Parameters:

Name	Type	Description	Default
`objects`	`Sequence[T]`	Sequence of Pydantic model instances to serialize.	required
`path`	`Path | str`	Path to the output file.	required
`validate`	`bool`	Whether to validate objects before writing (default: True).	`True`
`append`	`bool`	Whether to append to existing file or overwrite (default: False).	`False`

Raises:

Type	Description
`SerializationError`	If writing fails due to I/O error or validation failure

Examples:

>>> from pathlib import Path
>>> from bead.data.base import BeadBaseModel
>>> class TestModel(BeadBaseModel):
...     name: str
>>> objects = [TestModel(name="test1"), TestModel(name="test2")]
>>> write_jsonlines(objects, Path("output.jsonl"))
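
For readers unfamiliar with the format, the write path amounts to one `json.dumps` call per line, with the `append` flag selecting the file mode. The sketch below uses plain dicts and the standard library; `write_jsonl_sketch` is a hypothetical helper and omits the validation and `SerializationError` handling described above.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl_sketch(records: list, path: Path, append: bool = False) -> None:
    """Write one JSON object per line, overwriting unless append=True."""
    mode = "a" if append else "w"
    with path.open(mode, encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output.jsonl"
    write_jsonl_sketch([{"name": "test1"}, {"name": "test2"}], target)
    write_jsonl_sketch([{"name": "test3"}], target, append=True)
    lines = target.read_text(encoding="utf-8").splitlines()
```

After both calls the file holds three lines; repeating the first call without `append=True` would truncate it back to two.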

`read_jsonlines(path: Path | str, model_class: type[T], validate: bool = True, skip_errors: bool = False) -> list[T]` ¶

Read JSONLines file into list of Pydantic objects.

Deserializes a JSONLines file into a list of Pydantic model instances. Each line should contain a valid JSON object. Empty lines are skipped.

Parameters:

Name	Type	Description	Default
`path`	`Path | str`	Path to the input file.	required
`model_class`	`type[T]`	Pydantic model class to deserialize into.	required
`validate`	`bool`	Whether to validate objects during parsing (default: True).	`True`
`skip_errors`	`bool`	Whether to skip invalid lines or raise error (default: False).	`False`

Returns:

Type	Description
`list[T]`	List of deserialized Pydantic objects

Raises:

Type	Description
`DeserializationError`	If reading fails due to file not found, invalid JSON, or validation failure (unless skip_errors=True)

Examples:

>>> from pathlib import Path
>>> from bead.data.base import BeadBaseModel
>>> class TestModel(BeadBaseModel):
...     name: str
>>> objects = read_jsonlines(Path("input.jsonl"), TestModel)
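
The effect of `skip_errors` can be illustrated with a file containing one malformed line. In this standard-library sketch, `read_jsonl_sketch` is a hypothetical helper, plain dicts replace Pydantic models, and `ValueError` stands in for `DeserializationError`.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl_sketch(path: Path, skip_errors: bool = False) -> list:
    """Parse one JSON object per line, skipping blank lines."""
    records = []
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                if skip_errors:
                    continue  # drop the bad line and keep going
                # ValueError stands in for DeserializationError here.
                raise ValueError(f"line {line_number}: {exc}") from exc
    return records


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input.jsonl"
    source.write_text('{"name": "ok"}\n{broken\n{"name": "also ok"}\n', encoding="utf-8")
    tolerant = read_jsonl_sketch(source, skip_errors=True)
```

With `skip_errors=True` the malformed second line is dropped and the other two records are returned; with the default, the same file raises on line 2.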

`stream_jsonlines(path: Path | str, model_class: type[T], validate: bool = True) -> Iterator[T]` ¶

Stream JSONLines file as iterator of Pydantic objects.

Memory-efficient iterator that yields Pydantic model instances one at a time from a JSONLines file. Useful for processing large files without loading everything into memory.

Parameters:

Name	Type	Description	Default
`path`	`Path | str`	Path to the input file.	required
`model_class`	`type[T]`	Pydantic model class to deserialize into.	required
`validate`	`bool`	Whether to validate objects during parsing (default: True).	`True`

Yields:

Type	Description
`T`	Pydantic model instances one at a time.

Raises:

Type	Description
`DeserializationError`	If reading fails due to file not found, invalid JSON, or validation failure

Examples:

>>> from pathlib import Path
>>> from bead.data.base import BeadBaseModel
>>> class TestModel(BeadBaseModel):
...     name: str
>>> for obj in stream_jsonlines(Path("input.jsonl"), TestModel):
...     print(obj.name)
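
The memory saving comes from using a generator, so only one line is held at a time. A minimal sketch of that pattern follows; `stream_jsonl_sketch` is a hypothetical helper that yields plain dicts rather than validated model instances.

```python
import json
import tempfile
from collections.abc import Iterator
from pathlib import Path


def stream_jsonl_sketch(path: Path) -> Iterator[dict]:
    """Yield one parsed record at a time instead of building a list."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input.jsonl"
    source.write_text(
        "".join(json.dumps({"value": i}) + "\n" for i in range(1000)),
        encoding="utf-8",
    )
    # Aggregate without ever materializing all 1000 records.
    total = sum(record["value"] for record in stream_jsonl_sketch(source))
```

One consequence of this design is that the file stays open until the iterator is exhausted or closed, so errors on later lines surface only when iteration reaches them.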

`append_jsonlines(objects: Sequence[T], path: Path | str, validate: bool = True) -> None` ¶

Append Pydantic objects to existing JSONLines file.

Convenience wrapper around write_jsonlines with append=True. Adds objects to the end of an existing JSONLines file, or creates a new file if it doesn't exist.

Parameters:

Name	Type	Description	Default
`objects`	`Sequence[T]`	Sequence of Pydantic model instances to serialize.	required
`path`	`Path | str`	Path to the output file.	required
`validate`	`bool`	Whether to validate objects before writing (default: True).	`True`

Raises:

Type	Description
`SerializationError`	If appending fails due to I/O error or validation failure

Examples:

>>> from pathlib import Path
>>> from bead.data.base import BeadBaseModel
>>> class TestModel(BeadBaseModel):
...     name: str
>>> objects = [TestModel(name="test3"), TestModel(name="test4")]
>>> append_jsonlines(objects, Path("output.jsonl"))

Utilities¶

`language_codes` ¶

ISO 639 language code validation and utilities.

`validate_iso639_code(code: str | None) -> str | None` ¶

Validate language code against ISO 639-1 or ISO 639-3.

Parameters:

Name	Type	Description	Default
`code`	`str | None`	Language code to validate (e.g., "en", "eng", "ko", "kor").	required

Returns:

Type	Description
`str | None`	Normalized ISO 639-3 language code, or None if the input is None.

Raises:

Type	Description
`ValueError`	If code is not a valid ISO 639 language code.

Examples:

>>> validate_iso639_code("en")
'eng'
>>> validate_iso639_code("eng")
'eng'
>>> validate_iso639_code("ko")
'kor'
>>> validate_iso639_code(None) is None
True
>>> validate_iso639_code("invalid")
Traceback (most recent call last):
    ...
ValueError: Invalid language code: 'invalid'
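
The normalization rule (two-letter codes map to three-letter codes, three-letter codes pass through, `None` passes through) can be sketched with a small lookup table. The table below is a deliberately tiny, illustrative subset of ISO 639, and `validate_iso639_sketch` is a hypothetical name; the real function presumably consults a complete code list.

```python
from typing import Optional

# Illustrative subset only; the full standard has thousands of entries.
ISO639_1_TO_3 = {"en": "eng", "ko": "kor", "de": "deu"}
ISO639_3_CODES = set(ISO639_1_TO_3.values())


def validate_iso639_sketch(code: Optional[str]) -> Optional[str]:
    """Normalize a language code to its three-letter form."""
    if code is None:
        return None  # a missing language is allowed and passed through
    if code in ISO639_1_TO_3:
        return ISO639_1_TO_3[code]
    if code in ISO639_3_CODES:
        return code
    raise ValueError(f"Invalid language code: {code!r}")


normalized = [validate_iso639_sketch(code) for code in ("en", "eng", "ko", None)]
```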

`repository` ¶

Repository pattern for data access with optional caching.

This module provides a generic Repository class that implements CRUD operations for Pydantic models, with optional in-memory caching for efficient access.

`Repository` ¶

Generic repository for CRUD operations on Pydantic models.

Provides create, read, update, delete operations with JSONLines file storage and optional in-memory caching for efficient data access.

Class Type Parameters:

Name	Bound or Constraints	Description	Default
`T`	`BaseModel`	Pydantic model type this repository manages	required

Parameters:

Name	Type	Description	Default
`model_class`	`type[T]`	The Pydantic model class this repository manages	required
`storage_path`	`Path`	Path to the JSONLines file for persistent storage	required
`use_cache`	`bool`	Whether to use in-memory caching (default: True)	`True`

Attributes:

Name	Type	Description
`model_class`	`type[T]`	The Pydantic model class
`storage_path`	`Path`	Path to storage file
`use_cache`	`bool`	Whether caching is enabled
`cache`	`dict[UUID, T]`	In-memory cache of objects by ID

Examples:

>>> from pathlib import Path
>>> from bead.data.base import BeadBaseModel
>>> class MyModel(BeadBaseModel):
...     name: str
>>> repo = Repository[MyModel](
...     model_class=MyModel,
...     storage_path=Path("data/models.jsonl"),
...     use_cache=True
... )
>>> obj = MyModel(name="test")
>>> repo.add(obj)
>>> loaded = repo.get(obj.id)
>>> loaded.name
'test'
>>> repo.count()
1
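
The storage strategy described for this class (append on add, rewrite on update, optional in-memory cache) can be sketched with the standard library. `JsonlRepoSketch` is a hypothetical stand-in, not the real `Repository`: it stores plain dicts keyed by a string `"id"` and implements only a few of the operations.

```python
import json
import tempfile
from pathlib import Path


class JsonlRepoSketch:
    """Hypothetical stand-in for Repository; records are dicts with an 'id'."""

    def __init__(self, storage_path: Path, use_cache: bool = True) -> None:
        self.storage_path = storage_path
        self.use_cache = use_cache
        self.cache = {}
        if use_cache:
            self.rebuild_cache()

    def _load(self) -> dict:
        """Read every record from disk, keyed by ID."""
        if not self.storage_path.exists():
            return {}
        with self.storage_path.open(encoding="utf-8") as handle:
            rows = [json.loads(line) for line in handle if line.strip()]
        return {row["id"]: row for row in rows}

    def rebuild_cache(self) -> None:
        self.cache = self._load()

    def _records(self) -> dict:
        # With the cache off, every read goes back to the file.
        return self.cache if self.use_cache else self._load()

    def get(self, object_id: str):
        return self._records().get(object_id)

    def count(self) -> int:
        return len(self._records())

    def add(self, record: dict) -> None:
        # Adding is cheap: one line is appended to the file.
        with self.storage_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")
        if self.use_cache:
            self.cache[record["id"]] = record

    def update(self, record: dict) -> None:
        # Updating rewrites the whole file, so cost grows with record count.
        records = self._load()
        records[record["id"]] = record
        with self.storage_path.open("w", encoding="utf-8") as handle:
            for row in records.values():
                handle.write(json.dumps(row) + "\n")
        if self.use_cache:
            self.cache = records


with tempfile.TemporaryDirectory() as tmp:
    repo = JsonlRepoSketch(Path(tmp) / "models.jsonl")
    repo.add({"id": "a", "name": "test"})
    repo.update({"id": "a", "name": "updated"})
    loaded_name = repo.get("a")["name"]
    stored_lines = repo.storage_path.read_text(encoding="utf-8").splitlines()
```

The trade-off visible here is that `add` is a constant-time append while `update` (and, by the same logic, `delete`) rewrites the full file, which suits workloads dominated by inserts and reads.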

`get(object_id: UUID) -> T | None` ¶

Get object by ID.

Parameters:

Name	Type	Description	Default
`object_id`	`UUID`	ID of the object to retrieve.	required

Returns:

Type	Description
`T | None`	The object if found, None otherwise.

Examples:

>>> repo = Repository[MyModel](MyModel, Path("data.jsonl"))
>>> obj = MyModel(name="test")
>>> repo.add(obj)
>>> loaded = repo.get(obj.id)
>>> loaded is not None
True

`get_all() -> list[T]` ¶

Get all objects.

Returns:

Type	Description
`list[T]`	List of all objects in the repository

Examples:

>>> repo = Repository[MyModel](MyModel, Path("data.jsonl"))
>>> repo.add(MyModel(name="test1"))
>>> repo.add(MyModel(name="test2"))
>>> len(repo.get_all())
2

`add(obj: T) -> None` ¶

Add single object to repository.

Appends the object to the storage file and updates cache if enabled.

Parameters:

Name	Type	Description	Default
`obj`	`T`	Object to add.	required

Examples:

>>> repo = Repository[MyModel](MyModel, Path("data.jsonl"))
>>> obj = MyModel(name="test")
>>> repo.add(obj)
>>> repo.exists(obj.id)
True

`add_many(objects: list[T]) -> None` ¶

Add multiple objects to repository.

Appends all objects to the storage file and updates cache if enabled.

Parameters:

Name	Type	Description	Default
`objects`	`list[T]`	List of objects to add.	required

Examples:

>>> repo = Repository[MyModel](MyModel, Path("data.jsonl"))
>>> objs = [MyModel(name="test1"), MyModel(name="test2")]
>>> repo.add_many(objs)
>>> repo.count()
2

`update(obj: T) -> None` ¶

Update existing object.

Rewrites the entire storage file with the updated object.

Parameters:

Name	Type	Description	Default
`obj`	`T`	Object to update (must have existing ID).	required

Examples:

>>> repo = Repository[MyModel](MyModel, Path("data.jsonl"))
>>> obj = MyModel(name="test")
>>> repo.add(obj)
>>> obj.name = "updated"
>>> repo.update(obj)
>>> loaded = repo.get(obj.id)
>>> loaded.name
'updated'

`delete(object_id: UUID) -> None` ¶

Delete object by ID.

Rewrites the entire storage file without the deleted object.

Parameters:

Name	Type	Description	Default
`object_id`	`UUID`	ID of object to delete.	required

Examples:

>>> repo = Repository[MyModel](MyModel, Path("data.jsonl"))
>>> obj = MyModel(name="test")
>>> repo.add(obj)
>>> repo.delete(obj.id)
>>> repo.exists(obj.id)
False

`exists(object_id: UUID) -> bool` ¶

Check if object exists.

Parameters:

Name	Type	Description	Default
`object_id`	`UUID`	ID of object to check.	required

Returns:

Type	Description
`bool`	True if object exists, False otherwise.

Examples:

>>> repo = Repository[MyModel](MyModel, Path("data.jsonl"))
>>> obj = MyModel(name="test")
>>> repo.add(obj)
>>> repo.exists(obj.id)
True

`count() -> int` ¶

Count objects in repository.

Returns:

Type	Description
`int`	Number of objects

Examples:

>>> repo = Repository[MyModel](MyModel, Path("data.jsonl"))
>>> repo.count()
0
>>> repo.add(MyModel(name="test"))
>>> repo.count()
1

`clear() -> None` ¶

Clear all objects and delete storage file.

Examples:

>>> repo = Repository[MyModel](MyModel, Path("data.jsonl"))
>>> repo.add(MyModel(name="test"))
>>> repo.clear()
>>> repo.count()
0

`rebuild_cache() -> None` ¶

Rebuild cache from storage.

Reloads all objects from storage into the cache. Useful if the storage file was modified externally.

Examples:

>>> repo = Repository[MyModel](MyModel, Path("data.jsonl"), use_cache=True)
>>> repo.rebuild_cache()
