bead.config
Configuration system with Pydantic models for YAML-based pipeline orchestration.
All configuration modules are documented here. See the Configuration Guide for usage examples.
config
Configuration system for the bead pipeline.
Provides configuration models, default settings, and named profiles for development, testing, and production environments.
DEFAULT_CONFIG = BeadConfig(profile='default', paths=PathsConfig(), resources=ResourceConfig(), templates=TemplateConfig(), items=ItemConfig(), lists=ListConfig(), deployment=DeploymentConfig(), active_learning=ActiveLearningConfig(), logging=LoggingConfig())
module-attribute
Default configuration instance.
This configuration uses all default values from each config model. It's the base configuration used when no config file is provided.
DEV_CONFIG = BeadConfig(profile='dev', paths=PathsConfig(data_dir=Path('data'), output_dir=Path('output'), cache_dir=Path('.cache'), temp_dir=Path(gettempdir()) / 'bead_dev', create_dirs=True), resources=ResourceConfig(cache_external=False), templates=TemplateConfig(filling_strategy='exhaustive', batch_size=100, stream_mode=False), items=ItemConfig(model=ModelConfig(provider='huggingface', model_name='gpt2', batch_size=4, device='cpu'), parallel_processing=False), lists=ListConfig(num_lists=1), deployment=DeploymentConfig(), active_learning=ActiveLearningConfig(forced_choice_model=ForcedChoiceModelConfig(num_epochs=1, batch_size=8, learning_rate=2e-05), trainer=TrainerConfig(epochs=1)), logging=LoggingConfig(level='DEBUG', console=True))
module-attribute
Development configuration profile.
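Optimized for:
- Quick iteration and debugging
- Verbose logging (DEBUG level)
- Small batch sizes for fast feedback
- No caching of external resource lookups, so lookups always return fresh data
- Single-threaded processing (parallel processing disabled)
- A temporary directory under the system temp location (bead_dev) for easy cleanup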
PROD_CONFIG = BeadConfig(profile='prod', paths=PathsConfig(data_dir=Path('/var/bead/data').absolute(), output_dir=Path('/var/bead/output').absolute(), cache_dir=Path('/var/bead/cache').absolute(), temp_dir=Path('/var/bead/temp').absolute(), create_dirs=True), resources=ResourceConfig(cache_external=True), templates=TemplateConfig(filling_strategy='exhaustive', batch_size=10000, stream_mode=True), items=ItemConfig(model=ModelConfig(provider='huggingface', model_name='gpt2', batch_size=32, device='cuda'), parallel_processing=True, num_workers=8), lists=ListConfig(num_lists=1), deployment=DeploymentConfig(apply_material_design=True, include_demographics=True, include_attention_checks=True), active_learning=ActiveLearningConfig(forced_choice_model=ForcedChoiceModelConfig(num_epochs=10, batch_size=32, learning_rate=2e-05), trainer=TrainerConfig(epochs=10, use_wandb=True)), logging=LoggingConfig(level='WARNING', console=False, file=Path('/var/log/bead/app.log')))
module-attribute
Production configuration profile.
Optimized for:
- Maximum performance and throughput
- Large batch sizes for efficiency
- GPU acceleration (when available)
- Parallel processing
- External caching enabled
- Minimal logging (WARNING level)
- Absolute paths to production directories
- Metrics tracking with W&B
PROFILES: dict[str, BeadConfig] = {'default': BeadConfig(), 'dev': DEV_CONFIG, 'prod': PROD_CONFIG, 'test': TEST_CONFIG}
module-attribute
Registry of the named configuration profiles, keyed by profile name.
TEST_CONFIG = BeadConfig(profile='test', paths=PathsConfig(data_dir=Path(gettempdir()) / 'bead_test' / 'data', output_dir=Path(gettempdir()) / 'bead_test' / 'output', cache_dir=Path(gettempdir()) / 'bead_test' / 'cache', temp_dir=Path(gettempdir()) / 'bead_test' / 'temp', create_dirs=True), resources=ResourceConfig(cache_external=False), templates=TemplateConfig(filling_strategy='exhaustive', batch_size=10, max_combinations=100, random_seed=42), items=ItemConfig(model=ModelConfig(provider='huggingface', model_name='gpt2', batch_size=1, device='cpu'), parallel_processing=False, num_workers=1), lists=ListConfig(num_lists=1, random_seed=42), deployment=DeploymentConfig(apply_material_design=False, include_demographics=False, include_attention_checks=False), active_learning=ActiveLearningConfig(forced_choice_model=ForcedChoiceModelConfig(num_epochs=1, batch_size=2, learning_rate=2e-05), trainer=TrainerConfig(epochs=1, use_wandb=False)), logging=LoggingConfig(level='CRITICAL', console=False))
module-attribute
Test configuration profile.
Optimized for:
- Fast test execution
- Reproducibility (fixed random seeds)
- Minimal resource usage
- Tiny batch sizes
- Temporary directories for isolation
- Minimal logging (CRITICAL level)
- No external dependencies
- CPU-only execution
ActiveLearningConfig
Bases: BaseModel
Configuration for active learning infrastructure.
Reflects the bead/active_learning/ module structure:
- models: Active learning models (ForcedChoiceModel, etc.)
- trainers: Training infrastructure (HuggingFace, Lightning)
- loop: Active learning loop orchestration
- selection: Item selection strategies (uncertainty sampling, etc.)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| forced_choice_model | ForcedChoiceModelConfig | Configuration for forced choice models. | optional |
| trainer | TrainerConfig | Configuration for trainers (HuggingFace, Lightning). | optional |
| loop | ActiveLearningLoopConfig | Configuration for the active learning loop. | optional |
| uncertainty_sampler | UncertaintySamplerConfig | Configuration for uncertainty sampling strategies. | optional |
Examples:
>>> config = ActiveLearningConfig()
>>> config.forced_choice_model.model_name
'bert-base-uncased'
>>> config.trainer.trainer_type
'huggingface'
>>> config.loop.max_iterations
10
>>> config.uncertainty_sampler.method
'entropy'
BeadConfig
Bases: BaseModel
Main configuration for the bead package.
Reflects the actual bead/ module structure (data_collection and evaluation have no dedicated section among the parameters below):
- active_learning: Active learning (models, trainers, loop, selection)
- data_collection: Human data collection (JATOS, Prolific)
- deployment: Experiment deployment (jsPsych, JATOS)
- evaluation: Model evaluation and metrics
- items: Item generation and management
- lists: List construction and balancing
- resources: Linguistic resources (VerbNet, PropBank, UniMorph)
- templates: Template management
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| profile | str | Configuration profile name. | 'default' |
| paths | PathsConfig | Paths configuration. | optional |
| resources | ResourceConfig | Resources configuration. | optional |
| templates | TemplateConfig | Templates configuration. | optional |
| items | ItemConfig | Items configuration. | optional |
| lists | ListConfig | Lists configuration. | optional |
| deployment | DeploymentConfig | Deployment configuration. | optional |
| active_learning | ActiveLearningConfig | Active learning configuration (models, trainers, loop, selection). | optional |
| logging | LoggingConfig | Logging configuration. | optional |
Examples:
>>> config = BeadConfig()
>>> config.profile
'default'
>>> config.paths.data_dir
PosixPath('data')
>>> config.active_learning.forced_choice_model.model_name
'bert-base-uncased'
>>> config.active_learning.trainer.trainer_type
'huggingface'
>>> config.active_learning.loop.max_iterations
10
DeploymentConfig
Bases: BaseModel
Configuration for experiment deployment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| platform | str | Deployment platform. | 'jspsych' |
| jspsych_version | str | jsPsych version to use. | '7.3.0' |
| apply_material_design | bool | Whether to apply Material Design styling. | optional |
| include_demographics | bool | Whether to include a demographics survey. | optional |
| include_attention_checks | bool | Whether to include attention checks. | optional |
| jatos_export | bool | Whether to export to JATOS. | optional |
| distribution_strategy | ListDistributionStrategy | List distribution strategy for batch experiments. Defaults to balanced assignment. | optional |
Examples:
>>> config = DeploymentConfig()
>>> config.platform
'jspsych'
>>> config.jspsych_version
'7.3.0'
>>> config.distribution_strategy.strategy_type
<DistributionStrategyType.BALANCED: 'balanced'>
ItemConfig
Bases: BaseModel
Configuration for item generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | ModelConfig | Model configuration. | optional |
| apply_constraints | bool | Whether to apply model-based constraints. | optional |
| track_metadata | bool | Whether to track item metadata. | optional |
| parallel_processing | bool | Whether to use parallel processing. | optional |
| num_workers | int | Number of workers for parallel processing. | optional |
ListConfig
Bases: BaseModel
Configuration for list partitioning.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| partitioning_strategy | str | Strategy name for partitioning. | optional |
| num_lists | int | Number of lists to create. | optional |
| items_per_list | int \| None | Number of items per list; must be positive when set. | optional |
| balance_by | list[str] | Fields to balance on. | optional |
| ensure_uniqueness | bool | Whether to ensure items are unique across lists. | optional |
| random_seed | int \| None | Random seed for reproducibility. | optional |
| batch_constraints | list[BatchConstraintConfig] \| None | Batch-level constraints to apply across all lists. | optional |
validate_items_per_list(v: int | None) -> int | None
classmethod
Validate items_per_list is positive.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | int \| None | Items per list value. | required |
Returns:
| Type | Description |
|---|---|
| int \| None | Validated value. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the value is not positive. |
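The positivity check above can be sketched as a plain function. This is a minimal sketch, not the library's validator. In ListConfig the check is a Pydantic classmethod. The helper name check_positive_or_none, the pass-through of None, and the error message are all assumptions.

```python
from __future__ import annotations


def check_positive_or_none(v: int | None) -> int | None:
    """Hypothetical helper: return v if it is None or positive, else raise ValueError."""
    # Assumption: None means "not set" and is passed through unchanged.
    if v is not None and v <= 0:
        raise ValueError(f"items_per_list must be positive, got {v}")
    return v


print(check_positive_or_none(24))    # 24
print(check_positive_or_none(None))  # None
try:
    check_positive_or_none(0)
except ValueError as err:
    print(err)  # items_per_list must be positive, got 0
```

TemplateConfig.validate_max_combinations describes the same kind of check for max_combinations.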
LoggingConfig
Bases: BaseModel
Configuration for logging.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| level | str | Log level name (for example 'DEBUG', 'WARNING', or 'CRITICAL'). | optional |
| format | str | Log format string. | optional |
| file | Path \| None | Log file path. | optional |
| console | bool | Whether to log to the console. | optional |
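The sketch below shows how the four LoggingConfig fields could map onto Python's standard logging module. The helper configure_logging, its default level and format string, and the per-logger setup are assumptions for illustration, not bead's actual implementation.

```python
from __future__ import annotations

import logging
import sys
from pathlib import Path


def configure_logging(
    name: str,
    level: str = "INFO",  # assumed default, not bead's
    fmt: str = "%(asctime)s %(levelname)s %(name)s: %(message)s",  # assumed default
    file: Path | None = None,
    console: bool = True,
) -> logging.Logger:
    """Hypothetical helper: attach handlers to a named logger from LoggingConfig-style settings."""
    logger = logging.getLogger(name)
    logger.setLevel(level)  # level names such as "DEBUG" or "CRITICAL" are accepted
    for handler in list(logger.handlers):  # drop handlers left over from earlier calls
        logger.removeHandler(handler)
        handler.close()
    formatter = logging.Formatter(fmt)
    if console:
        console_handler = logging.StreamHandler(sys.stderr)
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    if file is not None:
        file.parent.mkdir(parents=True, exist_ok=True)  # make sure the log directory exists
        file_handler = logging.FileHandler(file, encoding="utf-8")
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger


log = configure_logging("bead.sketch", level="DEBUG", console=True)
log.debug("handlers attached: %d", len(log.handlers))
```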
ModelConfig
Bases: BaseModel
Configuration for language models.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | Model provider name (for example 'huggingface'). | required |
| model_name | str | Model identifier (for example 'gpt2'). | required |
| batch_size | int | Inference batch size. | required |
| device | str | Device to use for computation (for example 'cpu' or 'cuda'). | required |
| max_length | int | Maximum sequence length. | optional |
| temperature | float | Sampling temperature. | optional |
| cache_outputs | bool | Whether to cache model outputs. | optional |
PathsConfig
Bases: BaseModel
Configuration for file system paths.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_dir | Path | Base directory for data files. | Path('data') |
| output_dir | Path | Base directory for outputs. | optional |
| cache_dir | Path | Cache directory. | optional |
| temp_dir | Path \| None | Temporary directory. If None, the system temp directory is used. | optional |
| create_dirs | bool | Whether to create the directories if they don't exist. | optional |
Examples:
>>> config = PathsConfig()
>>> config.data_dir
PosixPath('data')
>>> config = PathsConfig(data_dir=Path("/absolute/path"))
>>> config.data_dir
PosixPath('/absolute/path')
ResourceConfig
Bases: BaseModel
Configuration for external resources.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lexicon_path | Path \| None | Path to the lexicon file. | optional |
| templates_path | Path \| None | Path to the templates file. | optional |
| constraints_path | Path \| None | Path to the constraints file. | optional |
| external_adapters | list[str] | External adapters to enable. | optional |
| cache_external | bool | Whether to cache external resource lookups. | optional |
SlotStrategyConfig
Bases: BaseModel
Configuration for a single slot's filling strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | | Filling strategy for this slot. Must be one of "exhaustive", "random", "stratified", or "mlm". | required |
| sample_size | | Sample size for the random or stratified strategies. Only used when strategy is "random" or "stratified". | optional |
| stratify_by | | Feature name to stratify by. Only used when strategy is "stratified". | optional |
| beam_size | | Beam size for the MLM strategy. Only used when strategy is "mlm". | optional |
Examples:
>>> config = SlotStrategyConfig(strategy="exhaustive")
>>> config.strategy
'exhaustive'
>>> config_random = SlotStrategyConfig(strategy="random", sample_size=100)
>>> config_random.sample_size
100
>>> config_stratified = SlotStrategyConfig(
... strategy="stratified", sample_size=50, stratify_by="pos"
... )
>>> config_stratified.stratify_by
'pos'
>>> config_mlm = SlotStrategyConfig(strategy="mlm", beam_size=10)
>>> config_mlm.beam_size
10
TemplateConfig
Bases: BaseModel
Configuration for template filling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filling_strategy | str | Strategy name for filling templates ("exhaustive", "random", "stratified", "mlm", or "mixed"). | 'exhaustive' |
| batch_size | int | Batch size for filling operations. | 1000 |
| max_combinations | int \| None | Maximum number of combinations to generate; must be positive when set. | optional |
| random_seed | int \| None | Random seed for reproducibility. | optional |
| stream_mode | bool | Whether to use streaming for large templates. | optional |
| use_csp_solver | bool | Whether to use a CSP solver for templates with multi-slot constraints. | optional |
| mlm_model_name | str \| None | HuggingFace model name for MLM filling. | optional |
| mlm_beam_size | int | Beam search width for the MLM strategy. | 5 |
| mlm_fill_direction | str | Direction in which the MLM strategy fills slots. | optional |
| mlm_custom_order | list[int] \| None | Custom slot fill order for the MLM strategy. | optional |
| mlm_top_k | int | Number of top candidates per slot in MLM filling. | optional |
| mlm_device | str | Device for MLM inference. | optional |
| mlm_cache_enabled | bool | Whether to enable content-addressable caching of MLM predictions. | optional |
| mlm_cache_dir | Path \| None | Directory for the MLM prediction cache. | optional |
| slot_strategies | dict[str, SlotStrategyConfig] \| None | Per-slot strategy configuration for mixed filling. Maps slot names to SlotStrategyConfig instances. | optional |
Examples:
>>> config = TemplateConfig()
>>> config.filling_strategy
'exhaustive'
>>> config.batch_size
1000
>>> # MLM configuration
>>> config_mlm = TemplateConfig(
... filling_strategy="mlm", mlm_model_name="bert-base-uncased"
... )
>>> config_mlm.mlm_beam_size
5
>>> # Mixed strategy configuration
>>> config_mixed = TemplateConfig(
... filling_strategy="mixed",
... mlm_model_name="bert-base-uncased",
... slot_strategies={
... "noun": SlotStrategyConfig(strategy="exhaustive"),
... "verb": SlotStrategyConfig(strategy="exhaustive"),
... "adjective": SlotStrategyConfig(strategy="mlm", beam_size=10)
... }
... )
>>> config_mixed.slot_strategies["noun"].strategy
'exhaustive'
>>> config_mixed.slot_strategies["adjective"].beam_size
10
validate_max_combinations(v: int | None) -> int | None
classmethod
Validate max_combinations is positive.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | int \| None | Maximum combinations value. | required |
Returns:
| Type | Description |
|---|---|
| int \| None | Validated value. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the value is not positive. |
validate_mlm_config() -> TemplateConfig
Validate MLM configuration is consistent.
Returns:
| Type | Description |
|---|---|
| TemplateConfig | Validated config. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the MLM configuration is inconsistent. |
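The source does not define what "consistent" means. The sketch below shows one plausible reading, and its rules are assumptions inferred from the TemplateConfig examples: "mixed" filling needs slot_strategies, and any use of MLM filling needs mlm_model_name. The hypothetical helper check_mlm_settings takes plain strategy names instead of SlotStrategyConfig objects.

```python
from __future__ import annotations


def check_mlm_settings(
    filling_strategy: str,
    mlm_model_name: str | None,
    slot_strategies: dict[str, str] | None = None,
) -> None:
    """Hypothetical check: raise ValueError if MLM filling lacks the settings it needs.

    slot_strategies maps slot names to strategy names, a simplification of
    dict[str, SlotStrategyConfig].
    """
    # Assumed rule 1: mixed filling must say how each slot is filled.
    if filling_strategy == "mixed" and not slot_strategies:
        raise ValueError("filling_strategy='mixed' requires slot_strategies")
    # Assumed rule 2: MLM filling, globally or for any slot, needs a model name.
    uses_mlm = filling_strategy == "mlm" or any(
        strategy == "mlm" for strategy in (slot_strategies or {}).values()
    )
    if uses_mlm and mlm_model_name is None:
        raise ValueError("mlm_model_name must be set when any slot uses MLM filling")


check_mlm_settings("mlm", "bert-base-uncased")  # passes
check_mlm_settings("mixed", "bert-base-uncased", {"noun": "exhaustive", "adjective": "mlm"})  # passes
```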
get_default_config() -> BeadConfig
Get a copy of the default configuration.
Returns:
| Type | Description |
|---|---|
| BeadConfig | A deep copy of the default configuration. |
Examples:
>>> from bead.config.defaults import get_default_config
>>> config = get_default_config()
>>> config.profile
'default'
>>> config.templates.batch_size
1000
Notes
Returns a deep copy to ensure modifications don't affect the original DEFAULT_CONFIG instance.
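The reason for the deep copy can be shown with stand-in dataclasses. MiniConfig and TemplatesSection below are hypothetical and are not bead types. A shallow copy shares nested sections with the original, so editing the copy also edits the original. For Pydantic v2 models, the analogous call is model_copy(deep=True). The source does not say whether get_default_config uses it.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TemplatesSection:
    batch_size: int = 1000


@dataclass
class MiniConfig:
    profile: str = "default"
    templates: TemplatesSection = field(default_factory=TemplatesSection)


DEFAULT = MiniConfig()

deep = copy.deepcopy(DEFAULT)  # nested sections are copied too
deep.templates.batch_size = 10
print(DEFAULT.templates.batch_size)  # 1000: the original is untouched

shallow = copy.copy(DEFAULT)  # top-level copy only; the nested section is shared
shallow.templates.batch_size = 10
print(DEFAULT.templates.batch_size)  # 10: the shared section was mutated
```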
get_profile(name: str) -> BeadConfig
Get configuration profile by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Profile name. Must be one of: 'default', 'dev', 'prod', 'test'. | required |
Returns:
| Type | Description |
|---|---|
| BeadConfig | Configuration for the specified profile. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the profile name is not found in the registry. |
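Below is a minimal sketch of the registry lookup. It assumes a plain dict registry, with small dicts standing in for BeadConfig instances. The name lookup_profile and the wording of the error message are hypothetical. The sketch does not show whether get_profile returns a copy of the stored configuration.

```python
from __future__ import annotations

from typing import Any

# Hypothetical stand-in for PROFILES, using plain dicts instead of BeadConfig.
PROFILES: dict[str, dict[str, Any]] = {
    "default": {"profile": "default"},
    "dev": {"profile": "dev", "logging": {"level": "DEBUG"}},
    "prod": {"profile": "prod", "logging": {"level": "WARNING"}},
    "test": {"profile": "test", "logging": {"level": "CRITICAL"}},
}


def lookup_profile(name: str) -> dict[str, Any]:
    """Return the named profile or raise ValueError listing the valid names."""
    try:
        return PROFILES[name]
    except KeyError:
        valid = ", ".join(sorted(PROFILES))
        raise ValueError(f"Unknown profile {name!r}; expected one of: {valid}") from None


print(lookup_profile("prod")["logging"]["level"])  # WARNING
```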
list_profiles() -> list[str]
Return the list of available profile names.
Returns:
| Type | Description |
|---|---|
| list[str] | Available profile names, sorted alphabetically. |
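Sorting the names makes the result independent of the order in which profiles were added to the registry. Here is a minimal sketch using a hypothetical helper and a stand-in registry:

```python
def sorted_profile_names(registry: dict[str, object]) -> list[str]:
    """Hypothetical helper: return registry keys in alphabetical order."""
    return sorted(registry)  # iterating a dict yields its keys


registry = {"test": object(), "prod": object(), "default": object(), "dev": object()}
print(sorted_profile_names(registry))  # ['default', 'dev', 'prod', 'test']
```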
load_config(config_path: Path | str | None = None, profile: str = 'default', **overrides: Any) -> BeadConfig
Load configuration from YAML file with optional overrides.
Precedence (lowest to highest):
1. Profile defaults
2. YAML file values
3. Keyword overrides
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_path | Path \| str \| None | Path to the YAML config file. If None, no file is read and the profile defaults are used. | None |
| profile | str | Profile to use as the base ('default', 'dev', 'prod', or 'test'). | 'default' |
| **overrides | Any | Direct overrides for config values. | {} |
Returns:
| Type | Description |
|---|---|
| BeadConfig | Loaded and merged configuration. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If config_path is specified but doesn't exist. |
| YAMLError | If the YAML file is malformed. |
| ValidationError | If the configuration is invalid. |
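The three-layer precedence can be sketched as successive deep merges of plain dictionaries, where each later layer wins. This sketch rests on assumptions: overrides are taken to be nested dicts keyed by section name, and the deep_merge and resolve helpers are hypothetical. The real function also validates the merged result into a BeadConfig. The profile values below mirror DEV_CONFIG.

```python
from __future__ import annotations

from functools import reduce
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with override merged into base, recursing into nested dicts.

    Only the top level is copied; nested dicts that are not overridden are shared with base.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(profile_defaults: dict, yaml_values: dict, overrides: dict) -> dict:
    # Later layers win: profile defaults < YAML file values < keyword overrides.
    return reduce(deep_merge, [profile_defaults, yaml_values, overrides])


profile_defaults = {"templates": {"batch_size": 100, "stream_mode": False}, "logging": {"level": "DEBUG"}}
yaml_values = {"templates": {"batch_size": 500}}
overrides = {"logging": {"level": "INFO"}}

print(resolve(profile_defaults, yaml_values, overrides))
# {'templates': {'batch_size': 500, 'stream_mode': False}, 'logging': {'level': 'INFO'}}
```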
load_from_env(prefix: str = 'BEAD_') -> dict[str, Any]
Load configuration values from environment variables.
Converts environment variables with the given prefix to a nested configuration dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prefix | str | Environment variable prefix to filter on. | 'BEAD_' |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Nested configuration dictionary built from the environment. |
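The variable-naming convention is not documented here, so the sketch below assumes one. The prefix is stripped, names are lowercased, a double underscore (__) separates nesting levels, and values stay strings with no type coercion. The helper env_to_nested is hypothetical. Its environ parameter lets the example avoid touching the real environment.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import Any


def env_to_nested(prefix: str = "BEAD_", environ: Mapping[str, str] | None = None) -> dict[str, Any]:
    """Hypothetical helper: turn PREFIX-ed variables into a nested dict (assumed "__" nesting)."""
    source = os.environ if environ is None else environ
    result: dict[str, Any] = {}
    for name, value in source.items():
        if not name.startswith(prefix):
            continue  # unrelated variables are ignored
        parts = name[len(prefix):].lower().split("__")
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create intermediate sections on demand
        node[parts[-1]] = value  # values are kept as strings
    return result


fake_env = {
    "BEAD_LOGGING__LEVEL": "DEBUG",
    "BEAD_TEMPLATES__BATCH_SIZE": "500",
    "HOME": "/home/user",  # ignored: no prefix
}
print(env_to_nested(environ=fake_env))
# {'logging': {'level': 'DEBUG'}, 'templates': {'batch_size': '500'}}
```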
load_yaml_file(path: Path | str) -> dict[str, Any]
Load YAML file and return as dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str | Path to the YAML file. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Parsed YAML content. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file doesn't exist. |
| YAMLError | If the YAML is malformed. |
merge_configs(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]
Deep merge two configuration dictionaries.
Recursively merges override into base, with override values taking precedence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base | dict[str, Any] | Base configuration dictionary. | required |
| override | dict[str, Any] | Override configuration dictionary. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Merged configuration dictionary. |
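Below is a minimal deep-merge sketch. It assumes three behaviors: base is left unmodified, nested dicts are merged key by key, and any other value (including a list) replaces the base value wholesale. The helper name deep_merge is hypothetical.

```python
from __future__ import annotations

import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)  # never mutate the caller's base dict
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into nested sections
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace wholesale
    return merged


base = {"paths": {"data_dir": "data", "create_dirs": True}, "lists": {"balance_by": ["condition"]}}
override = {"paths": {"data_dir": "/srv/data"}, "lists": {"balance_by": ["verb"]}}
print(deep_merge(base, override))
# {'paths': {'data_dir': '/srv/data', 'create_dirs': True}, 'lists': {'balance_by': ['verb']}}
```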
save_yaml(config: BeadConfig, path: Path | str, include_defaults: bool = False, create_dirs: bool = True) -> None
Save configuration to YAML file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | BeadConfig | Configuration to save. | required |
| path | Path \| str | Path where the YAML file should be saved. | required |
| include_defaults | bool | If True, include all fields even if they have default values. | False |
| create_dirs | bool | If True, create parent directories if they don't exist. | True |
Raises:
| Type | Description |
|---|---|
| IOError | If the file cannot be written. |
| FileNotFoundError | If create_dirs is False and the parent directory doesn't exist. |
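The create_dirs behavior can be sketched with pathlib alone. The hypothetical write_config_text helper takes text that has already been serialized, for example by to_yaml. It does not handle YAML itself, so it shows only the directory handling.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def write_config_text(text: str, path: Path | str, create_dirs: bool = True) -> Path:
    """Hypothetical helper: write serialized config text to path."""
    target = Path(path)
    if create_dirs:
        # Create any missing parent directories; no error if they already exist.
        target.parent.mkdir(parents=True, exist_ok=True)
    # With create_dirs=False and a missing parent, write_text raises FileNotFoundError.
    target.write_text(text, encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    written = write_config_text("profile: dev\n", Path(tmp) / "configs" / "dev.yaml")
    print(written.read_text(encoding="utf-8"), end="")  # profile: dev
```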
to_yaml(config: BeadConfig, include_defaults: bool = False) -> str
Serialize configuration to YAML string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | BeadConfig | Configuration to serialize. | required |
| include_defaults | bool | If True, include all fields even if they have default values. If False, only include non-default values. | False |
Returns:
| Type | Description |
|---|---|
| str | YAML representation of the configuration. |
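One way to implement include_defaults=False is to keep only the values that differ from the defaults. The filter recurses into nested sections and drops any section that ends up empty. The helper below is a hypothetical sketch that works on plain dicts. With Pydantic v2 models, model_dump(exclude_defaults=True) provides similar filtering; whether to_yaml uses it is an assumption. The values mirror TemplateConfig's documented defaults and DEV_CONFIG.

```python
from __future__ import annotations

from typing import Any


def non_default_values(values: dict[str, Any], defaults: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: keep only entries that differ from defaults, recursing into sections."""
    result: dict[str, Any] = {}
    for key, value in values.items():
        default = defaults.get(key)
        if isinstance(value, dict) and isinstance(default, dict):
            nested = non_default_values(value, default)
            if nested:  # drop sections where nothing changed
                result[key] = nested
        elif key not in defaults or value != default:
            result[key] = value
    return result


defaults = {"profile": "default", "templates": {"batch_size": 1000, "filling_strategy": "exhaustive"}}
current = {"profile": "dev", "templates": {"batch_size": 100, "filling_strategy": "exhaustive"}}
print(non_default_values(current, defaults))
# {'profile': 'dev', 'templates': {'batch_size': 100}}
```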
validate_config(config: BeadConfig) -> list[str]
Perform pre-flight validation on configuration.
Checks:
- All paths exist (if absolute paths are specified)
- Resource paths exist (if specified)
- Model configurations are compatible
- Training configurations are valid
- No conflicting settings
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | BeadConfig | Configuration to validate. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | List of validation errors. Empty if valid. |
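The first check in the list, that absolute paths exist, can be sketched as follows. The helper preflight_path_errors and its dict-of-paths input are hypothetical. Like validate_config, it returns a list of error strings, and an empty list means the check passed.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def preflight_path_errors(paths: dict[str, Path | None]) -> list[str]:
    """Hypothetical helper: report each absolute path that does not exist.

    Relative paths and None are skipped, following the "if absolute paths
    are specified" rule.
    """
    errors: list[str] = []
    for name, path in paths.items():
        if path is not None and path.is_absolute() and not path.exists():
            errors.append(f"{name}: {path} does not exist")
    return errors


with tempfile.TemporaryDirectory() as tmp:
    checks = {
        "data_dir": Path(tmp),                # exists: no error
        "output_dir": Path(tmp) / "missing",  # absolute and missing: one error
        "cache_dir": Path(".cache"),          # relative: skipped
        "temp_dir": None,                     # unset: skipped
    }
    print(preflight_path_errors(checks))
```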