bead.config

Configuration system with Pydantic models for YAML-based pipeline orchestration.

All configuration modules are documented here. See the Configuration Guide for usage examples.

config

Configuration system for the bead pipeline.

Provides configuration models, default settings, and named profiles for development, testing, and production environments.

DEFAULT_CONFIG = BeadConfig(profile='default', paths=PathsConfig(), resources=ResourceConfig(), templates=TemplateConfig(), items=ItemConfig(), lists=ListConfig(), deployment=DeploymentConfig(), active_learning=ActiveLearningConfig(), logging=LoggingConfig()) module-attribute

Default configuration instance.

This configuration uses all default values from each config model. It's the base configuration used when no config file is provided.

Examples:

>>> from bead.config.defaults import DEFAULT_CONFIG
>>> DEFAULT_CONFIG.profile
'default'
>>> DEFAULT_CONFIG.paths.data_dir
PosixPath('data')

DEV_CONFIG = BeadConfig(profile='dev', paths=PathsConfig(data_dir=Path('data'), output_dir=Path('output'), cache_dir=Path('.cache'), temp_dir=Path(gettempdir()) / 'bead_dev', create_dirs=True), resources=ResourceConfig(cache_external=False), templates=TemplateConfig(filling_strategy='exhaustive', batch_size=100, stream_mode=False), items=ItemConfig(model=ModelConfig(provider='huggingface', model_name='gpt2', batch_size=4, device='cpu'), parallel_processing=False), lists=ListConfig(num_lists=1), deployment=DeploymentConfig(), active_learning=ActiveLearningConfig(forced_choice_model=ForcedChoiceModelConfig(num_epochs=1, batch_size=8, learning_rate=2e-05), trainer=TrainerConfig(epochs=1)), logging=LoggingConfig(level='DEBUG', console=True)) module-attribute

Development configuration profile.

Optimized for:

- Quick iteration and debugging
- Verbose logging (DEBUG level)
- Small batch sizes for fast feedback
- No caching for fresh data
- Simple single-threaded processing
- Temporary directories for easy cleanup

Examples:

>>> from bead.config.profiles import DEV_CONFIG
>>> DEV_CONFIG.logging.level
'DEBUG'
>>> DEV_CONFIG.templates.batch_size
100

PROD_CONFIG = BeadConfig(profile='prod', paths=PathsConfig(data_dir=Path('/var/bead/data').absolute(), output_dir=Path('/var/bead/output').absolute(), cache_dir=Path('/var/bead/cache').absolute(), temp_dir=Path('/var/bead/temp').absolute(), create_dirs=True), resources=ResourceConfig(cache_external=True), templates=TemplateConfig(filling_strategy='exhaustive', batch_size=10000, stream_mode=True), items=ItemConfig(model=ModelConfig(provider='huggingface', model_name='gpt2', batch_size=32, device='cuda'), parallel_processing=True, num_workers=8), lists=ListConfig(num_lists=1), deployment=DeploymentConfig(apply_material_design=True, include_demographics=True, include_attention_checks=True), active_learning=ActiveLearningConfig(forced_choice_model=ForcedChoiceModelConfig(num_epochs=10, batch_size=32, learning_rate=2e-05), trainer=TrainerConfig(epochs=10, use_wandb=True)), logging=LoggingConfig(level='WARNING', console=False, file=Path('/var/log/bead/app.log'))) module-attribute

Production configuration profile.

Optimized for:

- Maximum performance and throughput
- Large batch sizes for efficiency
- GPU acceleration (when available)
- Parallel processing
- External caching enabled
- Minimal logging (WARNING level)
- Absolute paths to production directories
- Metrics tracking with W&B

Examples:

>>> from bead.config.profiles import PROD_CONFIG
>>> PROD_CONFIG.logging.level
'WARNING'
>>> PROD_CONFIG.templates.batch_size
10000
>>> PROD_CONFIG.items.parallel_processing
True

PROFILES: dict[str, BeadConfig] = {'default': BeadConfig(), 'dev': DEV_CONFIG, 'prod': PROD_CONFIG, 'test': TEST_CONFIG} module-attribute

Registry of all available configuration profiles.

Maps profile names to their corresponding BeadConfig instances.

Examples:

>>> from bead.config.profiles import PROFILES
>>> list(PROFILES.keys())
['default', 'dev', 'prod', 'test']
>>> PROFILES["dev"].logging.level
'DEBUG'

TEST_CONFIG = BeadConfig(profile='test', paths=PathsConfig(data_dir=Path(gettempdir()) / 'bead_test' / 'data', output_dir=Path(gettempdir()) / 'bead_test' / 'output', cache_dir=Path(gettempdir()) / 'bead_test' / 'cache', temp_dir=Path(gettempdir()) / 'bead_test' / 'temp', create_dirs=True), resources=ResourceConfig(cache_external=False), templates=TemplateConfig(filling_strategy='exhaustive', batch_size=10, max_combinations=100, random_seed=42), items=ItemConfig(model=ModelConfig(provider='huggingface', model_name='gpt2', batch_size=1, device='cpu'), parallel_processing=False, num_workers=1), lists=ListConfig(num_lists=1, random_seed=42), deployment=DeploymentConfig(apply_material_design=False, include_demographics=False, include_attention_checks=False), active_learning=ActiveLearningConfig(forced_choice_model=ForcedChoiceModelConfig(num_epochs=1, batch_size=2, learning_rate=2e-05), trainer=TrainerConfig(epochs=1, use_wandb=False)), logging=LoggingConfig(level='CRITICAL', console=False)) module-attribute

Test configuration profile.

Optimized for:

- Fast test execution
- Reproducibility (fixed random seeds)
- Minimal resource usage
- Tiny batch sizes
- Temporary directories for isolation
- Minimal logging (CRITICAL level)
- No external dependencies
- CPU-only execution

Examples:

>>> from bead.config.profiles import TEST_CONFIG
>>> TEST_CONFIG.logging.level
'CRITICAL'
>>> TEST_CONFIG.templates.batch_size
10
>>> TEST_CONFIG.templates.random_seed
42

ActiveLearningConfig

Bases: BaseModel

Configuration for active learning infrastructure.

Reflects the bead/active_learning/ module structure:

- models: Active learning models (ForcedChoiceModel, etc.)
- trainers: Training infrastructure (HuggingFace, Lightning)
- loop: Active learning loop orchestration
- selection: Item selection strategies (uncertainty sampling, etc.)

Parameters:

- forced_choice_model (ForcedChoiceModelConfig, required): Configuration for forced choice models.
- trainer (TrainerConfig, required): Configuration for trainers (HuggingFace, Lightning).
- loop (ActiveLearningLoopConfig, required): Configuration for active learning loop.
- uncertainty_sampler (UncertaintySamplerConfig, required): Configuration for uncertainty sampling strategies.

Examples:

>>> config = ActiveLearningConfig()
>>> config.forced_choice_model.model_name
'bert-base-uncased'
>>> config.trainer.trainer_type
'huggingface'
>>> config.loop.max_iterations
10
>>> config.uncertainty_sampler.method
'entropy'

BeadConfig

Bases: BaseModel

Main configuration for the bead package.

Reflects the actual bead/ module structure:

- active_learning: Active learning (models, trainers, loop, selection)
- data_collection: Human data collection (JATOS, Prolific)
- deployment: Experiment deployment (jsPsych, JATOS)
- evaluation: Model evaluation and metrics
- items: Item generation and management
- lists: List construction and balancing
- resources: Linguistic resources (VerbNet, PropBank, UniMorph)
- templates: Template management

Parameters:

- profile (str, required): Configuration profile name.
- paths (PathsConfig, required): Paths configuration.
- resources (ResourceConfig, required): Resources configuration.
- templates (TemplateConfig, required): Templates configuration.
- items (ItemConfig, required): Items configuration.
- lists (ListConfig, required): Lists configuration.
- deployment (DeploymentConfig, required): Deployment configuration.
- active_learning (ActiveLearningConfig, required): Active learning configuration (models, trainers, loop, selection).
- logging (LoggingConfig, required): Logging configuration.

Examples:

>>> config = BeadConfig()
>>> config.profile
'default'
>>> config.paths.data_dir
PosixPath('data')
>>> config.active_learning.forced_choice_model.model_name
'bert-base-uncased'
>>> config.active_learning.trainer.trainer_type
'huggingface'
>>> config.active_learning.loop.max_iterations
10

to_dict() -> dict[str, Any]

Convert configuration to dictionary.

Returns:

- dict[str, Any]: Configuration as a dictionary.

Examples:

>>> config = BeadConfig()
>>> d = config.to_dict()
>>> d["profile"]
'default'

to_yaml() -> str

Convert configuration to YAML string.

Returns:

- str: Configuration as YAML string.

Examples:

>>> config = BeadConfig()
>>> yaml_str = config.to_yaml()
>>> 'profile: default' in yaml_str
True

validate_paths() -> list[str]

Validate all path fields exist.

Returns:

- list[str]: List of validation errors. Empty if all paths are valid.

Examples:

>>> config = BeadConfig()
>>> errors = config.validate_paths()
>>> len(errors)
0

DeploymentConfig

Bases: BaseModel

Configuration for experiment deployment.

Parameters:

- platform (str, required): Deployment platform.
- jspsych_version (str, required): jsPsych version to use.
- apply_material_design (bool, required): Whether to use Material Design.
- include_demographics (bool, required): Whether to include demographics survey.
- include_attention_checks (bool, required): Whether to include attention checks.
- jatos_export (bool, required): Whether to export to JATOS.
- distribution_strategy (ListDistributionStrategy, required): List distribution strategy for batch experiments. Defaults to balanced assignment.

Examples:

>>> config = DeploymentConfig()
>>> config.platform
'jspsych'
>>> config.jspsych_version
'7.3.0'
>>> config.distribution_strategy.strategy_type
<DistributionStrategyType.BALANCED: 'balanced'>

ItemConfig

Bases: BaseModel

Configuration for item generation.

Parameters:

- model (ModelConfig, required): Model configuration.
- apply_constraints (bool, required): Whether to apply model-based constraints.
- track_metadata (bool, required): Whether to track item metadata.
- parallel_processing (bool, required): Whether to use parallel processing.
- num_workers (int, required): Number of workers for parallel processing.

Examples:

>>> config = ItemConfig()
>>> config.apply_constraints
True
>>> config.num_workers
4

ListConfig

Bases: BaseModel

Configuration for list partitioning.

Parameters:

- partitioning_strategy (str, required): Strategy name for partitioning.
- num_lists (int, required): Number of lists to create.
- items_per_list (int | None, required): Items per list.
- balance_by (list[str], required): Fields to balance on.
- ensure_uniqueness (bool, required): Whether to ensure items are unique across lists.
- random_seed (int | None, required): Random seed for reproducibility.
- batch_constraints (list[BatchConstraintConfig] | None, required): Batch-level constraints to apply across all lists.

Examples:

>>> config = ListConfig()
>>> config.partitioning_strategy
'balanced'
>>> config.num_lists
1

validate_items_per_list(v: int | None) -> int | None classmethod

Validate items_per_list is positive.

Parameters:

- v (int | None, required): Items per list value.

Returns:

- int | None: Validated value.

Raises:

- ValueError: If value is not positive.

LoggingConfig

Bases: BaseModel

Configuration for logging.

Parameters:

- level (str, required): Log level.
- format (str, required): Log format string.
- file (Path | None, required): Log file path.
- console (bool, required): Whether to log to console.

Examples:

>>> config = LoggingConfig()
>>> config.level
'INFO'
>>> config.console
True

ModelConfig

Bases: BaseModel

Configuration for language models.

Parameters:

- provider (str, required): Model provider name.
- model_name (str, required): Model identifier.
- batch_size (int, required): Inference batch size.
- device (str, required): Device to use for computation.
- max_length (int, required): Maximum sequence length.
- temperature (float, required): Sampling temperature.
- cache_outputs (bool, required): Whether to cache model outputs.

Examples:

>>> config = ModelConfig()
>>> config.provider
'huggingface'
>>> config.device
'cpu'

PathsConfig

Bases: BaseModel

Configuration for file system paths.

Parameters:

- data_dir (Path, required): Base directory for data files.
- output_dir (Path, required): Base directory for outputs.
- cache_dir (Path, required): Cache directory.
- temp_dir (Path | None, required): Temporary directory. If None, uses system temp.
- create_dirs (bool, required): Whether to create directories if they don't exist.

Examples:

>>> config = PathsConfig()
>>> config.data_dir
PosixPath('data')
>>> config = PathsConfig(data_dir=Path("/absolute/path"))
>>> config.data_dir
PosixPath('/absolute/path')

ResourceConfig

Bases: BaseModel

Configuration for external resources.

Parameters:

- lexicon_path (Path | None, required): Path to lexicon file.
- templates_path (Path | None, required): Path to templates file.
- constraints_path (Path | None, required): Path to constraints file.
- external_adapters (list[str], required): List of external adapters to enable.
- cache_external (bool, required): Whether to cache external resource lookups.

Examples:

>>> config = ResourceConfig()
>>> config.cache_external
True
>>> config.external_adapters
[]

SlotStrategyConfig

Bases: BaseModel

Configuration for a single slot's filling strategy.

Parameters:

- strategy (required): Filling strategy for this slot. Must be one of "exhaustive", "random", "stratified", or "mlm".
- sample_size (required): Sample size for random or stratified strategies. Only used when strategy is "random" or "stratified".
- stratify_by (required): Feature name to stratify by. Only used when strategy is "stratified".
- beam_size (required): Beam size for MLM strategy. Only used when strategy is "mlm".

Examples:

>>> config = SlotStrategyConfig(strategy="exhaustive")
>>> config.strategy
'exhaustive'
>>> config_random = SlotStrategyConfig(strategy="random", sample_size=100)
>>> config_random.sample_size
100
>>> config_stratified = SlotStrategyConfig(
...     strategy="stratified", sample_size=50, stratify_by="pos"
... )
>>> config_stratified.stratify_by
'pos'
>>> config_mlm = SlotStrategyConfig(strategy="mlm", beam_size=10)
>>> config_mlm.beam_size
10

TemplateConfig

Bases: BaseModel

Configuration for template filling.

Parameters:

- filling_strategy (str, required): Strategy name for filling templates ("exhaustive", "random", "stratified", "mlm", "mixed").
- batch_size (int, required): Batch size for filling operations.
- max_combinations (int | None, required): Maximum combinations to generate.
- random_seed (int | None, required): Random seed for reproducibility.
- stream_mode (bool, required): Use streaming for large templates.
- use_csp_solver (bool, required): Use CSP solver for templates with multi-slot constraints.
- mlm_model_name (str | None, required): HuggingFace model name for MLM filling.
- mlm_beam_size (int, required): Beam search width for MLM strategy.
- mlm_fill_direction (str, required): Direction for filling slots in MLM strategy.
- mlm_custom_order (list[int] | None, required): Custom slot fill order for MLM strategy.
- mlm_top_k (int, required): Number of top candidates per slot in MLM.
- mlm_device (str, required): Device for MLM inference.
- mlm_cache_enabled (bool, required): Enable content-addressable caching for MLM predictions.
- mlm_cache_dir (Path | None, required): Directory for MLM prediction cache.
- slot_strategies (dict[str, SlotStrategyConfig] | None, required): Per-slot strategy configuration for mixed filling. Maps slot names to SlotStrategyConfig instances.

Examples:

>>> config = TemplateConfig()
>>> config.filling_strategy
'exhaustive'
>>> config.batch_size
1000
>>> # MLM configuration
>>> config_mlm = TemplateConfig(
...     filling_strategy="mlm", mlm_model_name="bert-base-uncased"
... )
>>> config_mlm.mlm_beam_size
5
>>> # Mixed strategy configuration
>>> config_mixed = TemplateConfig(
...     filling_strategy="mixed",
...     mlm_model_name="bert-base-uncased",
...     slot_strategies={
...         "noun": SlotStrategyConfig(strategy="exhaustive"),
...         "verb": SlotStrategyConfig(strategy="exhaustive"),
...         "adjective": SlotStrategyConfig(strategy="mlm", beam_size=10)
...     }
... )
>>> config_mixed.slot_strategies["noun"].strategy
'exhaustive'
>>> config_mixed.slot_strategies["adjective"].beam_size
10

validate_max_combinations(v: int | None) -> int | None classmethod

Validate max_combinations is positive.

Parameters:

- v (int | None, required): Max combinations value.

Returns:

- int | None: Validated value.

Raises:

- ValueError: If value is not positive.

validate_mlm_config() -> TemplateConfig

Validate MLM configuration is consistent.

Returns:

- TemplateConfig: Validated config.

Raises:

- ValueError: If MLM config is inconsistent.

get_default_config() -> BeadConfig

Get a copy of the default configuration.

Returns:

- BeadConfig: A deep copy of the default configuration.

Examples:

>>> from bead.config.defaults import get_default_config
>>> config = get_default_config()
>>> config.profile
'default'
>>> config.templates.batch_size
1000

Notes

Returns a deep copy to ensure modifications don't affect the original DEFAULT_CONFIG instance.

get_profile(name: str) -> BeadConfig

Get configuration profile by name.

Parameters:

- name (str, required): Profile name. Must be one of: 'default', 'dev', 'prod', 'test'.

Returns:

- BeadConfig: Configuration for the specified profile.

Raises:

- ValueError: If profile name is not found in the registry.

Examples:

>>> from bead.config.profiles import get_profile
>>> config = get_profile("dev")
>>> config.profile
'dev'
>>> config.logging.level
'DEBUG'
>>> try:
...     get_profile("invalid")
... except ValueError as e:
...     print(str(e))
Profile 'invalid' not found. Available profiles: default, dev, prod, test

list_profiles() -> list[str]

Return list of available profile names.

Returns:

- list[str]: List of available profile names, sorted alphabetically.

Examples:

>>> from bead.config.profiles import list_profiles
>>> profiles = list_profiles()
>>> "default" in profiles
True
>>> "dev" in profiles
True
>>> "prod" in profiles
True
>>> "test" in profiles
True

load_config(config_path: Path | str | None = None, profile: str = 'default', **overrides: Any) -> BeadConfig

Load configuration from YAML file with optional overrides.

Precedence (lowest to highest):

1. Profile defaults
2. YAML file values
3. Keyword overrides

Parameters:

- config_path (Path | str | None, default None): Path to YAML config file. If None, uses profile defaults.
- profile (str, default 'default'): Profile to use as base (default, dev, prod, test).
- **overrides (Any): Direct overrides for config values.

Returns:

- BeadConfig: Loaded and merged configuration.

Raises:

- FileNotFoundError: If config_path is specified but doesn't exist.
- YAMLError: If YAML file is malformed.
- ValidationError: If configuration is invalid.

Examples:

>>> config = load_config(profile="dev")
>>> config.profile
'dev'
>>> config = load_config(config_path="config.yaml", logging__level="DEBUG")
>>> config.logging.level
'DEBUG'
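The three-level precedence can be illustrated with plain dictionaries and a small deep merge (a sketch of the documented behavior, not the actual implementation):

```python
from typing import Any

def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    # Override values win; nested dicts are merged key by key.
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

profile_defaults = {"logging": {"level": "DEBUG", "console": True}}  # e.g. a profile's values
yaml_values = {"logging": {"level": "INFO"}}                         # from the config file
overrides = {"logging": {"console": False}}                          # keyword overrides

# Lowest to highest: profile defaults, then YAML, then keyword overrides.
config = deep_merge(deep_merge(profile_defaults, yaml_values), overrides)
print(config)  # {'logging': {'level': 'INFO', 'console': False}}
```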

load_from_env(prefix: str = 'BEAD_') -> dict[str, Any]

Load configuration values from environment variables.

Converts environment variables with the given prefix to a nested configuration dictionary.

Parameters:

- prefix (str, default 'BEAD_'): Environment variable prefix to filter on.

Returns:

- dict[str, Any]: Nested configuration dictionary from environment.

Examples:

>>> # With env var: BEAD_LOGGING__LEVEL=DEBUG
>>> load_from_env()
{'logging': {'level': 'DEBUG'}}
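Judging from the example above, nesting is expressed with double underscores in the variable name. A self-contained sketch of that conversion (the real function reads os.environ; here the environment is passed explicitly so the example is runnable):

```python
def env_to_nested(environ: dict[str, str], prefix: str = "BEAD_") -> dict:
    # BEAD_LOGGING__LEVEL=DEBUG -> {'logging': {'level': 'DEBUG'}}:
    # strip the prefix, lower-case, and split on '__' to build nesting.
    result: dict = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].lower().split("__")
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return result

print(env_to_nested({"BEAD_LOGGING__LEVEL": "DEBUG", "PATH": "/usr/bin"}))
# {'logging': {'level': 'DEBUG'}}
```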

load_yaml_file(path: Path | str) -> dict[str, Any]

Load YAML file and return as dictionary.

Parameters:

- path (Path | str, required): Path to YAML file.

Returns:

- dict[str, Any]: Parsed YAML content.

Raises:

- FileNotFoundError: If file doesn't exist.
- YAMLError: If YAML is malformed.

merge_configs(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]

Deep merge two configuration dictionaries.

Recursively merges override into base, with override values taking precedence.

Parameters:

- base (dict[str, Any], required): Base configuration dictionary.
- override (dict[str, Any], required): Override configuration dictionary.

Returns:

- dict[str, Any]: Merged configuration dictionary.

Examples:

>>> base = {"a": 1, "b": {"c": 2}}
>>> override = {"b": {"d": 3}}
>>> merge_configs(base, override)
{'a': 1, 'b': {'c': 2, 'd': 3}}

save_yaml(config: BeadConfig, path: Path | str, include_defaults: bool = False, create_dirs: bool = True) -> None

Save configuration to YAML file.

Parameters:

- config (BeadConfig, required): Configuration to save.
- path (Path | str, required): Path where YAML file should be saved.
- include_defaults (bool, default False): If True, include all fields even if they have default values.
- create_dirs (bool, default True): If True, create parent directories if they don't exist.

Raises:

- IOError: If file cannot be written.
- FileNotFoundError: If create_dirs is False and parent directory doesn't exist.

Examples:

>>> from pathlib import Path
>>> from bead.config import get_default_config
>>> config = get_default_config()
>>> save_yaml(config, Path("config.yaml"))
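The create_dirs behavior can be sketched with a plain text write standing in for the YAML serialization (the helper name here is ours, not the library's):

```python
import tempfile
from pathlib import Path

def save_text(content: str, path: Path, create_dirs: bool = True) -> None:
    # Mirrors the documented create_dirs semantics: make missing parent
    # directories first, so the write cannot fail on a missing folder.
    if create_dirs:
        path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "configs" / "config.yaml"  # parent does not exist yet
    save_text("profile: default\n", target)
    print(target.read_text())  # profile: default
```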

to_yaml(config: BeadConfig, include_defaults: bool = False) -> str

Serialize configuration to YAML string.

Parameters:

- config (BeadConfig, required): Configuration to serialize.
- include_defaults (bool, default False): If True, include all fields even if they have default values. If False, only include non-default values.

Returns:

- str: YAML representation of configuration.

Examples:

>>> from bead.config import get_default_config
>>> config = get_default_config()
>>> yaml_str = to_yaml(config)
>>> 'profile: default' in yaml_str
True

validate_config(config: BeadConfig) -> list[str]

Perform pre-flight validation on configuration.

Checks:

- All paths exist (if absolute paths are specified)
- Resource paths exist (if specified)
- Model configurations are compatible
- Training configurations are valid
- No conflicting settings

Parameters:

- config (BeadConfig, required): Configuration to validate.

Returns:

- list[str]: List of validation errors. Empty if valid.

Examples:

>>> from bead.config import get_default_config
>>> config = get_default_config()
>>> errors = validate_config(config)
>>> len(errors)
0
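The path portion of these checks can be sketched as follows, matching the collect-errors-rather-than-raise contract (only absolute paths are checked, per the description above; the error message format is ours):

```python
from pathlib import Path

def check_paths(paths: dict[str, Path]) -> list[str]:
    # Relative paths are resolved later, so only absolute paths
    # are required to exist at validation time.
    errors: list[str] = []
    for name, path in paths.items():
        if path.is_absolute() and not path.exists():
            errors.append(f"{name}: path does not exist: {path}")
    return errors

print(check_paths({"data_dir": Path("data")}))  # [] (relative, not checked)
```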