bead.participants
Participant metadata models and collection management.
Models
models
Participant data models.
This module provides Participant and ParticipantIDMapping models for storing participant information with privacy-preserving external ID mapping.
Participant
Bases: BeadBaseModel
A study participant with demographic and session metadata.
Inherits UUID, timestamps, version, and metadata from BeadBaseModel.
The internal id (UUID) is used for all analysis; external IDs (e.g., Prolific IDs) are stored separately for privacy.
Attributes:
| Name | Type | Description |
|---|---|---|
| id | UUID | Internal unique identifier (UUIDv7, inherited from BeadBaseModel). |
| created_at | datetime | When participant record was created (inherited). |
| modified_at | datetime | When participant record was last modified (inherited). |
| participant_metadata | dict[str, JsonValue] | Demographic and other participant attributes (e.g., age, education). Keys should match a ParticipantMetadataSpec for validation. |
| study_id | str \| None | Optional study identifier this participant belongs to. |
| session_ids | list[str] | Session identifiers for this participant (for longitudinal studies). |
| consent_timestamp | datetime \| None | When participant provided consent. |
| notes | str \| None | Free-text notes about this participant. |
Examples:
>>> participant = Participant(
... participant_metadata={
... "age": 25,
... "education": "bachelors",
... "native_speaker": True,
... },
... study_id="study_001",
... )
>>> participant.participant_metadata["age"]
25
>>> str(participant.id)
'019...' # UUIDv7
validate_against_spec(spec: ParticipantMetadataSpec) -> tuple[bool, list[str]]
Validate participant_metadata against a specification.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spec | ParticipantMetadataSpec | Specification to validate against. | required |
Returns:
| Type | Description |
|---|---|
| tuple[bool, list[str]] | (is_valid, list of error messages) |
Examples:
>>> from bead.participants.metadata_spec import (
... FieldSpec, ParticipantMetadataSpec
... )
>>> spec = ParticipantMetadataSpec(
... name="test",
... fields=[FieldSpec(name="age", field_type="int", required=True)]
... )
>>> p = Participant(participant_metadata={"age": 25})
>>> p.validate_against_spec(spec)
(True, [])
get_attribute(key: str, default: JsonValue = None) -> JsonValue
Get a metadata attribute with optional default.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Attribute name. | required |
| default | JsonValue | Default value if attribute not found. | None |
Returns:
| Type | Description |
|---|---|
| JsonValue | Attribute value or default. |
Examples:
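A minimal sketch of the lookup described above, assuming get_attribute behaves like dict.get on participant_metadata (the reference only states that it returns the attribute value or the default). StandInParticipant is a hypothetical stand-in, not bead's Participant class.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StandInParticipant:
    """Hypothetical stand-in for Participant; not bead's implementation."""

    participant_metadata: dict[str, Any] = field(default_factory=dict)

    def get_attribute(self, key: str, default: Any = None) -> Any:
        # Assumed behaviour: return the stored value, or the default if absent.
        return self.participant_metadata.get(key, default)


p = StandInParticipant(participant_metadata={"age": 25})
age = p.get_attribute("age")
handedness = p.get_attribute("handedness", "unknown")
```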
set_attribute(key: str, value: JsonValue) -> None
add_session(session_id: str) -> None
ParticipantIDMapping
Bases: BeadBaseModel
Mapping between external participant IDs and internal UUIDs.
This model is stored SEPARATELY from participant data for IRB/privacy compliance. The external ID (e.g., Prolific PID) can be deleted while retaining the internal UUID for analysis.
Attributes:
| Name | Type | Description |
|---|---|---|
| id | UUID | Unique identifier for this mapping record (inherited). |
| external_id | str | External participant identifier (e.g., Prolific PID). |
| external_source | str | Source of the external ID (e.g., "prolific", "mturk", "sona"). |
| participant_id | UUID | Internal participant UUID (references Participant.id). |
| mapping_timestamp | datetime | When this mapping was created. |
| is_active | bool | Whether this mapping is active (for soft deletion). |
Examples:
>>> from uuid import UUID
>>> mapping = ParticipantIDMapping(
... external_id="PROLIFIC_ABC123",
... external_source="prolific",
... participant_id=UUID("01234567-89ab-cdef-0123-456789abcdef"),
... )
>>> mapping.external_source
'prolific'
validate_non_empty(v: str) -> str
classmethod
Validate string fields are non-empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | str | String value to validate. | required |
Returns:
| Type | Description |
|---|---|
| str | Validated string. |
Raises:
| Type | Description |
|---|---|
| ValueError | If string is empty or whitespace only. |
deactivate() -> None
Soft-delete this mapping (for privacy compliance).
Sets is_active to False without deleting the record. This allows the mapping to be retained for audit purposes while marking it as no longer valid.
Examples:
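The soft-delete behaviour described above can be sketched with a small stand-in class. StandInMapping is hypothetical and mirrors only the fields named in the attribute table; it is not bead's ParticipantIDMapping.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class StandInMapping:
    """Hypothetical stand-in for ParticipantIDMapping."""

    external_id: str
    external_source: str
    participant_id: UUID
    is_active: bool = True

    def deactivate(self) -> None:
        # Soft delete: flip the flag and keep the record for auditing.
        self.is_active = False


mapping = StandInMapping("PROLIFIC_ABC123", "prolific", uuid4())
mapping.deactivate()
```

The record itself survives the call; only the flag changes, which is what distinguishes this from removing the mapping.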
Collection
collection
Participant collection with JSONL I/O and DataFrame support.
This module provides ParticipantCollection and IDMappingCollection for managing multiple participants with JSONL serialization and pandas/polars DataFrame conversion for analysis.
ParticipantCollection
Bases: BeadBaseModel
Collection of participants with JSONL I/O and DataFrame support.
Provides methods for managing multiple participants, saving/loading from JSONL files, and converting to pandas/polars DataFrames for analysis.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Name of this collection. |
| participants | list[Participant] | List of participants. |
| metadata_spec_name | str \| None | Name of the metadata spec used (for documentation). |
Examples:
>>> collection = ParticipantCollection(name="study_001_participants")
>>> participant = Participant(
... participant_metadata={"age": 25, "education": "bachelors"}
... )
>>> collection.add_participant(participant)
>>> len(collection.participants)
1
>>> collection.to_jsonl("participants.jsonl")
validate_name(v: str) -> str
classmethod
Validate name is non-empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | str | Collection name to validate. | required |
Returns:
| Type | Description |
|---|---|
| str | Validated collection name. |
Raises:
| Type | Description |
|---|---|
| ValueError | If name is empty or whitespace only. |
__len__() -> int
Return number of participants.
Returns:
| Type | Description |
|---|---|
| int | Number of participants in the collection. |
add_participant(participant: Participant) -> None
Add a participant to the collection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| participant | Participant | Participant to add. | required |
Examples:
add_participants(participants: list[Participant]) -> None
Add multiple participants to the collection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| participants | list[Participant] | Participants to add. | required |
Examples:
get_by_id(participant_id: UUID) -> Participant | None
Get participant by UUID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| participant_id | UUID | Participant UUID to find. | required |
Returns:
| Type | Description |
|---|---|
| Participant \| None | Participant if found, None otherwise. |
Examples:
get_by_attribute(key: str, value: JsonValue) -> list[Participant]
Get participants by metadata attribute value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Attribute name. | required |
| value | JsonValue | Value to match. | required |
Returns:
| Type | Description |
|---|---|
| list[Participant] | Participants with matching attribute. |
Examples:
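A rough sketch of attribute filtering, assuming matching means exact equality on a metadata key (the reference does not spell out the comparison). Plain dicts stand in for participant_metadata, and the data is made up for illustration.

```python
from typing import Any

# Plain dicts stand in for Participant.participant_metadata (hypothetical data).
participants = [
    {"age": 25, "education": "bachelors"},
    {"age": 31, "education": "masters"},
    {"age": 42, "education": "bachelors"},
]


def get_by_attribute(
    records: list[dict[str, Any]], key: str, value: Any
) -> list[dict[str, Any]]:
    # Assumed semantics: keep records whose metadata has `key` equal to `value`.
    return [r for r in records if key in r and r[key] == value]


matches = get_by_attribute(participants, "education", "bachelors")
```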
validate_all(spec: ParticipantMetadataSpec) -> dict[UUID, list[str]]
Validate all participants against a specification.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spec | ParticipantMetadataSpec | Specification to validate against. | required |
Returns:
| Type | Description |
|---|---|
| dict[UUID, list[str]] | Mapping from participant ID to list of validation errors. Empty dict if all valid. |
Examples:
>>> from bead.participants.metadata_spec import (
... FieldSpec, ParticipantMetadataSpec
... )
>>> spec = ParticipantMetadataSpec(
... name="test",
... fields=[FieldSpec(name="age", field_type="int", required=True)]
... )
>>> collection = ParticipantCollection(name="test")
>>> p = Participant(participant_metadata={"age": 25})
>>> collection.add_participant(p)
>>> errors = collection.validate_all(spec)
>>> len(errors)
0
to_jsonl(path: Path | str) -> None
from_jsonl(path: Path | str, name: str = 'loaded_participants') -> ParticipantCollection
classmethod
Load participants from JSONL file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str | Path to JSONL file. | required |
| name | str | Name for the collection. | 'loaded_participants' |
Returns:
| Type | Description |
|---|---|
| ParticipantCollection | Collection with loaded participants. |
Examples:
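For readers unfamiliar with JSONL, the round trip can be sketched with the standard library alone. This assumes one JSON object per line; write_jsonl and read_jsonl are hypothetical helpers, and the record layout is illustrative rather than bead's actual on-disk schema.

```python
import json
import tempfile
from pathlib import Path

records = [
    {"id": "p1", "participant_metadata": {"age": 25}},
    {"id": "p2", "participant_metadata": {"age": 31}},
]


def write_jsonl(path: Path, rows: list[dict]) -> None:
    # One JSON object per line.
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    # Skip blank lines so a trailing newline does not raise an error.
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "participants.jsonl"
    write_jsonl(path, records)
    loaded = read_jsonl(path)
```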
to_dataframe(backend: Literal['pandas', 'polars'] = 'pandas', include_fields: list[str] | None = None, exclude_fields: list[str] | None = None, flatten_metadata: bool = True) -> DataFrame
Convert to pandas or polars DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| backend | Literal['pandas', 'polars'] | DataFrame backend to use (default: "pandas"). | 'pandas' |
| include_fields | list[str] \| None | If provided, only include these metadata fields. | None |
| exclude_fields | list[str] \| None | If provided, exclude these metadata fields. | None |
| flatten_metadata | bool | If True, flatten participant_metadata into top-level columns. | True |
Returns:
| Type | Description |
|---|---|
| DataFrame | pandas or polars DataFrame with participant data. Always includes 'participant_id' column (as string). |
Examples:
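The effect of flatten_metadata, include_fields, and exclude_fields can be illustrated without a DataFrame library by building the row dicts a DataFrame would be constructed from. to_rows is a hypothetical helper, and the order in which include and exclude filters apply is an assumption.

```python
from __future__ import annotations

from typing import Any
from uuid import uuid4


def to_rows(
    participants: list[dict[str, Any]],
    include_fields: list[str] | None = None,
    exclude_fields: list[str] | None = None,
    flatten_metadata: bool = True,
) -> list[dict[str, Any]]:
    rows = []
    for p in participants:
        meta = dict(p["participant_metadata"])
        if include_fields is not None:
            meta = {k: v for k, v in meta.items() if k in include_fields}
        if exclude_fields is not None:
            meta = {k: v for k, v in meta.items() if k not in exclude_fields}
        # participant_id is always present, as a string.
        row: dict[str, Any] = {"participant_id": str(p["id"])}
        if flatten_metadata:
            row.update(meta)  # one top-level column per metadata key
        else:
            row["participant_metadata"] = meta  # keep the nested dict
        rows.append(row)
    return rows


people = [{"id": uuid4(), "participant_metadata": {"age": 25, "education": "bachelors"}}]
flat = to_rows(people, exclude_fields=["education"])
nested = to_rows(people, flatten_metadata=False)
```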
from_dataframe(df: DataFrame, name: str, id_column: str = 'participant_id', metadata_columns: list[str] | None = None) -> ParticipantCollection
classmethod
Create collection from pandas or polars DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | pandas or polars DataFrame with participant data. | required |
| name | str | Name for the collection. | required |
| id_column | str | Column containing participant IDs (default: "participant_id"). If column exists, uses those UUIDs; otherwise generates new ones. | 'participant_id' |
| metadata_columns | list[str] \| None | Columns to include in participant_metadata. If None, includes all columns except id_column. | None |
Returns:
| Type | Description |
|---|---|
| ParticipantCollection | Collection with participants from DataFrame. |
Examples:
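The id_column and metadata_columns rules from the parameter table can be sketched over plain row dicts. rows_to_participants is a hypothetical helper, not part of bead, and it uses uuid4 for generated IDs purely for illustration.

```python
from __future__ import annotations

from typing import Any
from uuid import UUID, uuid4


def rows_to_participants(
    rows: list[dict[str, Any]],
    id_column: str = "participant_id",
    metadata_columns: list[str] | None = None,
) -> list[dict[str, Any]]:
    participants = []
    for row in rows:
        # Reuse the UUID when the ID column is present; otherwise generate one.
        pid = UUID(str(row[id_column])) if id_column in row else uuid4()
        if metadata_columns is None:
            keys = [k for k in row if k != id_column]
        else:
            keys = metadata_columns
        participants.append(
            {"id": pid, "participant_metadata": {k: row[k] for k in keys}}
        )
    return participants


with_ids = rows_to_participants(
    [{"participant_id": "01234567-89ab-cdef-0123-456789abcdef", "age": 25}]
)
without_ids = rows_to_participants(
    [{"age": 31, "education": "masters"}], metadata_columns=["age"]
)
```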
IDMappingCollection
Bases: BeadBaseModel
Collection of ID mappings (stored separately for privacy).
This collection should be stored in a SEPARATE file from participant data for IRB/privacy compliance.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Name of this mapping collection. |
| mappings | list[ParticipantIDMapping] | List of ID mappings. |
| source | str | Primary source of external IDs (e.g., "prolific"). |
Examples:
>>> from uuid import uuid4
>>> collection = IDMappingCollection(name="study_001", source="prolific")
>>> mapping = collection.add_mapping("PROLIFIC_ABC123", uuid4())
>>> collection.get_participant_id("PROLIFIC_ABC123") is not None
True
validate_non_empty(v: str) -> str
classmethod
Validate string fields are non-empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | str | String to validate. | required |
Returns:
| Type | Description |
|---|---|
| str | Validated string. |
Raises:
| Type | Description |
|---|---|
| ValueError | If string is empty or whitespace only. |
__len__() -> int
Return number of mappings.
Returns:
| Type | Description |
|---|---|
| int | Number of mappings in the collection. |
add_mapping(external_id: str, participant_id: UUID, external_source: str | None = None) -> ParticipantIDMapping
Create and add a new ID mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| external_id | str | External participant ID. | required |
| participant_id | UUID | Internal participant UUID. | required |
| external_source | str \| None | Source of external ID (defaults to collection's source). | None |
Returns:
| Type | Description |
|---|---|
| ParticipantIDMapping | The created mapping. |
Examples:
get_participant_id(external_id: str) -> UUID | None
Look up internal participant ID from external ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| external_id | str | External ID to look up. | required |
Returns:
| Type | Description |
|---|---|
| UUID \| None | Internal participant ID if found, None otherwise. |
Examples:
get_external_id(participant_id: UUID) -> str | None
Look up external ID from internal participant ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| participant_id | UUID | Internal participant ID to look up. | required |
Returns:
| Type | Description |
|---|---|
| str \| None | External ID if found, None otherwise. |
Examples:
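The two lookup directions can be sketched over a list of (external_id, participant_id) pairs. This is a stand-in, not bead's implementation; in particular, whether the real lookups skip inactive mappings is not stated in this reference, so the sketch ignores is_active.

```python
from __future__ import annotations

from uuid import UUID, uuid4

# (external_id, participant_id) pairs stand in for ParticipantIDMapping records.
known = uuid4()
mappings: list[tuple[str, UUID]] = [("PROLIFIC_ABC123", known)]


def get_participant_id(external_id: str) -> UUID | None:
    # First matching pair wins; None when nothing matches.
    return next((pid for ext, pid in mappings if ext == external_id), None)


def get_external_id(participant_id: UUID) -> str | None:
    return next((ext for ext, pid in mappings if pid == participant_id), None)


found = get_participant_id("PROLIFIC_ABC123")
missing = get_external_id(uuid4())
```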
deactivate_all() -> int
Deactivate all mappings (for bulk privacy removal).
Returns:
| Type | Description |
|---|---|
| int | Number of mappings deactivated. |
Examples:
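A sketch of the bulk operation, assuming the returned count covers only mappings that were still active before the call (the reference says only "number of mappings deactivated"). A list of booleans stands in for the is_active flags.

```python
flags = [True, True, False]  # hypothetical is_active flags of three mappings


def deactivate_all(active_flags: list[bool]) -> int:
    count = 0
    for i, active in enumerate(active_flags):
        if active:
            active_flags[i] = False  # soft delete: flip the flag in place
            count += 1
    return count


n = deactivate_all(flags)
```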
to_jsonl(path: Path | str) -> None
Write mappings to JSONL file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str | Path to output file. | required |
Examples:
from_jsonl(path: Path | str, name: str = 'loaded_mappings', source: str = 'unknown') -> IDMappingCollection
classmethod
Load mappings from JSONL file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str | Path to JSONL file. | required |
| name | str | Name for the collection. | 'loaded_mappings' |
| source | str | External ID source. | 'unknown' |
Returns:
| Type | Description |
|---|---|
| IDMappingCollection | Collection with loaded mappings. |
Examples:
Merging
merging
Utilities for merging participant metadata with judgment data.
This module provides functions for joining participant metadata with judgment DataFrames for analysis. All functions support both pandas and polars DataFrames, preserving the input type.
merge_participant_metadata(judgments_df: DataFrame, participants: ParticipantCollection, id_column: str = 'participant_id', metadata_columns: list[str] | None = None, how: str = 'left') -> DataFrame
Merge participant metadata into a judgments DataFrame.
Preserves input DataFrame type (pandas in -> pandas out, polars in -> polars out).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| judgments_df | DataFrame | DataFrame containing judgment data with participant IDs. | required |
| participants | ParticipantCollection | Collection of participants with metadata. | required |
| id_column | str | Column in judgments_df containing participant IDs (default: "participant_id"). | 'participant_id' |
| metadata_columns | list[str] \| None | Specific metadata columns to include. If None, includes all. | None |
| how | str | Merge type: "left", "inner", "outer" (default: "left"). | 'left' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Merged DataFrame with participant metadata columns added. |
Examples:
>>> import pandas as pd
>>> from bead.participants.collection import ParticipantCollection
>>> from bead.participants.models import Participant
>>> judgments = pd.DataFrame({
... "participant_id": ["uuid1", "uuid2"],
... "response": [5, 3],
... })
>>> collection = ParticipantCollection(name="test")
>>> # ... add participants ...
>>> # merged = merge_participant_metadata(judgments, collection)
resolve_external_ids(df: DataFrame, id_mappings: IDMappingCollection, external_id_column: str = 'PROLIFIC_PID', output_column: str = 'participant_id', drop_unresolved: bool = False) -> DataFrame
Resolve external IDs to internal participant UUIDs.
Preserves input DataFrame type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame with external participant IDs. | required |
| id_mappings | IDMappingCollection | Collection of ID mappings. | required |
| external_id_column | str | Column containing external IDs (default: "PROLIFIC_PID"). | 'PROLIFIC_PID' |
| output_column | str | Column name for resolved UUIDs (default: "participant_id"). | 'participant_id' |
| drop_unresolved | bool | If True, drop rows with unresolved IDs (default: False). | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with resolved participant UUIDs. |
Examples:
>>> import pandas as pd
>>> from uuid import uuid4
>>> from bead.participants.collection import IDMappingCollection
>>> from bead.participants.merging import resolve_external_ids
>>> raw_data = pd.DataFrame({
...     "PROLIFIC_PID": ["ABC123", "DEF456"],
...     "response": [5, 3],
... })
>>> mappings = IDMappingCollection(name="test", source="prolific")
>>> pid = uuid4()
>>> mapping = mappings.add_mapping("ABC123", pid)
>>> resolved = resolve_external_ids(raw_data, mappings)
>>> "participant_id" in resolved.columns
True
create_analysis_dataframe(judgments_df: DataFrame, participants: ParticipantCollection, id_mappings: IDMappingCollection | None = None, external_id_column: str | None = None, participant_id_column: str = 'participant_id', metadata_columns: list[str] | None = None) -> DataFrame
Create analysis-ready DataFrame with resolved IDs and metadata.
Convenience function that:
1. Resolves external IDs to internal UUIDs (if id_mappings provided)
2. Merges participant metadata
3. Returns a clean DataFrame ready for analysis
Preserves input DataFrame type.
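The resolve-then-merge sequence can be sketched over plain row dicts, without pandas or polars. build_analysis_rows, the ID strings, and the metadata are all hypothetical; the left-join behaviour and the use of None for unresolved IDs are assumptions, not confirmed details of bead.

```python
from __future__ import annotations

from typing import Any

# Hypothetical stand-in data: raw rows, external-to-internal ID map,
# and metadata keyed by internal ID.
raw_rows = [
    {"PROLIFIC_PID": "ABC123", "response": 5},
    {"PROLIFIC_PID": "ZZZ999", "response": 3},  # no mapping exists for this ID
]
id_map = {"ABC123": "uuid-1"}
metadata = {"uuid-1": {"age": 25}}


def build_analysis_rows(
    rows: list[dict[str, Any]],
    id_map: dict[str, str],
    metadata: dict[str, dict[str, Any]],
    external_id_column: str = "PROLIFIC_PID",
    participant_id_column: str = "participant_id",
) -> list[dict[str, Any]]:
    out = []
    for row in rows:
        merged = dict(row)
        # Step 1: resolve the external ID; unresolved IDs become None.
        pid = id_map.get(row[external_id_column])
        merged[participant_id_column] = pid
        # Step 2: left-join metadata; rows without metadata are kept unchanged.
        merged.update(metadata.get(pid, {}))
        out.append(merged)
    return out


analysis_rows = build_analysis_rows(raw_rows, id_map, metadata)
```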
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| judgments_df | DataFrame | Raw judgment data. | required |
| participants | ParticipantCollection | Participant collection with metadata. | required |
| id_mappings | IDMappingCollection \| None | ID mappings (required if external_id_column is provided). | None |
| external_id_column | str \| None | Column with external IDs to resolve. | None |
| participant_id_column | str | Column with participant IDs (after resolution). | 'participant_id' |
| metadata_columns | list[str] \| None | Metadata columns to include. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | Analysis-ready DataFrame. |
Examples:
>>> import pandas as pd
>>> from bead.participants.collection import (
... ParticipantCollection, IDMappingCollection
... )
>>> raw_judgments = pd.DataFrame({
... "PROLIFIC_PID": ["ABC123"],
... "response": [5],
... })
>>> participants = ParticipantCollection(name="test")
>>> mappings = IDMappingCollection(name="test", source="prolific")
>>> # analysis_df = create_analysis_dataframe(
>>> # raw_judgments,
>>> # participants,
>>> # id_mappings=mappings,
>>> # external_id_column="PROLIFIC_PID",
>>> # )
build_participant_lookup(participants: ParticipantCollection, key_field: str | None = None) -> dict[str, dict[str, str | int | float | bool | None]]
Build a lookup dictionary from participant collection.
Useful for manual merging or custom processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| participants | ParticipantCollection | Collection of participants. | required |
| key_field | str \| None | If provided, use this metadata field as the key instead of UUID. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, dict[str, str \| int \| float \| bool \| None]] | Lookup from participant ID (or key_field) to metadata dict. |
Examples:
>>> from bead.participants.collection import ParticipantCollection
>>> from bead.participants.merging import build_participant_lookup
>>> from bead.participants.models import Participant
>>> collection = ParticipantCollection(name="test")
>>> p = Participant(participant_metadata={"age": 25})
>>> collection.add_participant(p)
>>> lookup = build_participant_lookup(collection)
>>> str(p.id) in lookup
True
Metadata Specification
metadata_spec
Metadata specification for participant attributes.
This module provides FieldSpec and ParticipantMetadataSpec for defining configurable metadata fields with validation constraints (allowed values, ranges).
FieldSpec
Bases: BaseModel
Specification for a single metadata field.
Defines the constraints and display properties for a participant metadata field. Used for validation and for generating demographics forms.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Field name (e.g., "age", "education"). Must be valid Python identifier. |
| field_type | Literal['int', 'float', 'str', 'bool'] | Data type for the field. |
| required | bool | Whether this field is required (default: False). |
| allowed_values | list[str \| int \| float \| bool] \| None | Exhaustive list of allowed values (for categorical fields). If None, any value of the correct type is accepted. |
| range | Range[int] \| Range[float] \| None | Numeric range constraint (for int/float fields). |
| label | str \| None | Display label for forms. If None, uses name with underscores replaced. |
| description | str \| None | Help text / description for the field. |
Examples:
>>> age_spec = FieldSpec(
... name="age",
... field_type="int",
... required=True,
... range=Range[int](min=18, max=100),
... label="Age",
... description="Your age in years"
... )
>>> education_spec = FieldSpec(
... name="education",
... field_type="str",
... required=True,
... allowed_values=["high_school", "bachelors", "masters", "phd"],
... label="Highest Education Level"
... )
validate_name(v: str) -> str
classmethod
Validate field name is non-empty and valid identifier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | str | Field name to validate. | required |
Returns:
| Type | Description |
|---|---|
| str | Validated field name. |
Raises:
| Type | Description |
|---|---|
| ValueError | If field name is empty or not a valid Python identifier. |
validate_constraints() -> FieldSpec
Validate that constraints are consistent with field_type.
Returns:
| Type | Description |
|---|---|
| FieldSpec | The validated FieldSpec instance. |
Raises:
| Type | Description |
|---|---|
| ValueError | If constraints are inconsistent with field_type. |
validate_value(value: str | int | float | bool | None) -> bool
Check if a value satisfies this field's constraints.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str \| int \| float \| bool \| None | Value to validate. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if value is valid, False otherwise. |
Examples:
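One way such a check could work is sketched below as a free function, combining the type, allowed_values, and range constraints from the attribute table. The parameters min_value and max_value stand in for the Range bounds, and the strictness choices (rejecting bool for numeric fields, rejecting int for float fields) are assumptions rather than documented bead behaviour.

```python
from __future__ import annotations

TYPES = {"int": int, "float": float, "str": str, "bool": bool}


def validate_value(
    value,
    field_type: str,
    allowed_values: list | None = None,
    min_value: float | None = None,
    max_value: float | None = None,
) -> bool:
    # bool is a subclass of int in Python, so reject it for non-bool fields.
    if isinstance(value, bool) and field_type != "bool":
        return False
    if not isinstance(value, TYPES[field_type]):
        return False
    if allowed_values is not None and value not in allowed_values:
        return False
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


in_range = validate_value(25, "int", min_value=18, max_value=100)
too_young = validate_value(17, "int", min_value=18, max_value=100)
```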
get_display_label() -> str
Get display label for forms.
Returns:
| Type | Description |
|---|---|
| str | The label if set, otherwise name with underscores replaced by spaces and title-cased. |
Examples:
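The fallback rule in the Returns table maps directly onto two string methods. The free function below is a hypothetical stand-in for the method, taking name and label as arguments instead of reading them from a FieldSpec.

```python
from __future__ import annotations


def get_display_label(name: str, label: str | None = None) -> str:
    if label is not None:
        return label
    # Replace underscores with spaces, then title-case each word.
    return name.replace("_", " ").title()


auto = get_display_label("native_speaker")
explicit = get_display_label("education", "Highest Education Level")
```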
ParticipantMetadataSpec
Bases: BaseModel
Specification for participant metadata schema.
Defines the allowed fields and their constraints for participant metadata. Used to validate participant data on ingestion and to generate demographics forms for experiments.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Name of this specification (e.g., "prolific_demographics"). |
| version | str | Version string for this spec. |
| fields | list[FieldSpec] | List of field specifications. |
Examples:
>>> spec = ParticipantMetadataSpec(
... name="standard_demographics",
... version="1.0.0",
... fields=[
... FieldSpec(
... name="age",
... field_type="int",
... range=Range[int](min=18, max=100)
... ),
... FieldSpec(
... name="education",
... field_type="str",
... allowed_values=["high_school", "bachelors", "masters", "phd"]
... ),
... FieldSpec(name="native_speaker", field_type="bool", required=True),
... ]
... )
>>> spec.get_field("age").range.min
18
validate_name(v: str) -> str
classmethod
Validate spec name is non-empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | str | Spec name to validate. | required |
Returns:
| Type | Description |
|---|---|
| str | Validated spec name. |
Raises:
| Type | Description |
|---|---|
| ValueError | If name is empty. |
validate_unique_field_names(v: list[FieldSpec]) -> list[FieldSpec]
classmethod
get_field(name: str) -> FieldSpec | None
Get a field specification by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Field name to look up. | required |
Returns:
| Type | Description |
|---|---|
| FieldSpec \| None | The field spec if found, None otherwise. |
Examples:
get_required_fields() -> list[FieldSpec]
Get all required field specifications.
Returns:
| Type | Description |
|---|---|
| list[FieldSpec] | List of required fields. |
Examples:
validate_metadata(metadata: dict[str, str | int | float | bool | None]) -> tuple[bool, list[str]]
Validate metadata against this specification.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata | dict[str, str \| int \| float \| bool \| None] | Metadata dictionary to validate. | required |
Returns:
| Type | Description |
|---|---|
| tuple[bool, list[str]] | (is_valid, list of error messages). Empty list if valid. |
Examples:
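The (is_valid, errors) return shape can be illustrated with a reduced stand-in that checks only required fields and types. SPEC and the error message wording are hypothetical; bead's actual messages and its handling of unknown keys are not documented here.

```python
from __future__ import annotations

from typing import Any

# Hypothetical spec: field name -> (python type, required).
SPEC: dict[str, tuple[type, bool]] = {"age": (int, True), "education": (str, False)}


def validate_metadata(metadata: dict[str, Any]) -> tuple[bool, list[str]]:
    errors: list[str] = []
    for name, (py_type, required) in SPEC.items():
        if name not in metadata:
            if required:
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(metadata[name], py_type):
            errors.append(f"wrong type for field: {name}")
    # Valid exactly when no errors were collected.
    return (not errors, errors)


ok = validate_metadata({"age": 25})
bad = validate_metadata({"education": 3})
```

Returning the error list alongside the flag lets callers report every problem at once instead of stopping at the first failure.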
to_demographics_config() -> DemographicsConfig
Convert this spec to a DemographicsConfig for deployment.
Creates a demographics form configuration that can be used in experiment deployment to collect participant data.
Returns:
| Type | Description |
|---|---|
| DemographicsConfig | Demographics configuration for jsPsych deployment. |
Examples: