bead.participants
Participant metadata models and collection management.
Models
models
Participant data models.
Participant stores demographic and session metadata.
ParticipantIDMapping records the link between an external participant
identifier (e.g. Prolific PID) and an internal UUID. The mapping is stored
separately so the external id can be deleted for privacy compliance while
the internal UUID is retained for analysis.
Participant
Bases: BeadBaseModel
A study participant.
Attributes:
| Name | Type | Description |
|---|---|---|
| participant_metadata | dict[str, JsonValue] | Demographic and other participant attributes. |
| study_id | str \| None | Study identifier. |
| session_ids | tuple[str, ...] | Session identifiers (for longitudinal studies). |
| consent_timestamp | datetime \| None | When the participant provided consent. |
| notes | str \| None | Free-text notes. |
validate_against_spec(spec: ParticipantMetadataSpec) -> tuple[bool, list[str]]
Validate participant_metadata against spec.
Returns (is_valid, error_messages).
get_attribute(key: str, default: JsonValue = None) -> JsonValue
Return participant_metadata[key] if present, else default.
with_attribute(key: str, value: JsonValue) -> Self
Return a new participant with participant_metadata[key] = value.
with_session(session_id: str) -> Self
Return a new participant with session_id appended.
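The with_* methods return a new object instead of mutating the receiver. A minimal sketch of that pattern follows, using a frozen dataclass as a stand-in. SketchParticipant is hypothetical and is not bead's Participant (which derives from BeadBaseModel); only the method names and their documented behaviour are taken from the reference above.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class SketchParticipant:
    """Hypothetical stand-in for Participant; not the bead class."""

    participant_metadata: dict[str, Any] = field(default_factory=dict)
    session_ids: tuple[str, ...] = ()

    def get_attribute(self, key: str, default: Any = None) -> Any:
        # Return the stored value if present, else the supplied default.
        return self.participant_metadata.get(key, default)

    def with_attribute(self, key: str, value: Any) -> SketchParticipant:
        # Build a fresh dict so the original instance is left untouched.
        updated = {**self.participant_metadata, key: value}
        return replace(self, participant_metadata=updated)

    def with_session(self, session_id: str) -> SketchParticipant:
        # Tuples are immutable, so appending means building a new tuple.
        return replace(self, session_ids=(*self.session_ids, session_id))


p0 = SketchParticipant(participant_metadata={"age": 25})
p1 = p0.with_attribute("native_language", "en").with_session("session-1")
```

Because each call returns a new instance, the result has to be rebound (or chained as above); p0 is unchanged after the calls.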
ParticipantIDMapping
Bases: BeadBaseModel
Mapping between an external participant ID and an internal UUID.
Attributes:
| Name | Type | Description |
|---|---|---|
| external_id | str | External participant identifier (e.g. Prolific PID). |
| external_source | str | Source of the external id ( |
| participant_id | UUID | Internal participant UUID. |
| mapping_timestamp | datetime | When the mapping was created. |
| is_active | bool | Whether the mapping is active (used for soft-delete). |
deactivated() -> Self
Return a new mapping with is_active=False.
Collection
collection
Participant collection with JSONL I/O and DataFrame support.
ParticipantCollection and IDMappingCollection group multiple
Participant and ParticipantIDMapping instances respectively, with
JSONL serialization and pandas / polars DataFrame conversion.
ParticipantCollection
Bases: BeadBaseModel
Collection of participants.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Collection name. |
| participants | tuple[Participant, ...] | Member participants. |
| metadata_spec_name | str \| None | Name of the metadata spec applied (for documentation). |
__len__() -> int
Return the number of participants.
with_participant(participant: Participant) -> Self
Return a new collection with participant appended.
with_participants(participants: tuple[Participant, ...] | list[Participant]) -> Self
Return a new collection with each participant appended.
get_by_id(participant_id: UUID) -> Participant | None
Return the participant whose id matches, or None.
get_by_attribute(key: str, value: JsonValue) -> tuple[Participant, ...]
Return participants whose participant_metadata[key] == value.
validate_all(spec: ParticipantMetadataSpec) -> dict[UUID, list[str]]
Validate every participant against spec.
Returns a mapping from offending participant id to error messages.
to_jsonl(path: Path | str) -> None
Write each participant to path as a JSONL line.
from_jsonl(path: Path | str, name: str = 'loaded_participants') -> ParticipantCollection
classmethod
Load participants from path as JSONL.
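JSONL stores one JSON object per line. The round trip behind to_jsonl and from_jsonl can be sketched with the standard library alone. The helper names and the record fields below are illustrative assumptions; this reference does not show the exact layout bead writes.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records: list[dict], path: Path) -> None:
    # One JSON object per line, each terminated by a newline.
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    with path.open(encoding="utf-8") as handle:
        # Skip blank lines so a trailing newline is not parsed as a record.
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical records; the field names are for illustration only.
records = [
    {"participant_metadata": {"age": 25}, "study_id": "study-a"},
    {"participant_metadata": {"age": 31}, "study_id": None},
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "participants.jsonl"
    write_jsonl(records, target)
    loaded = read_jsonl(target)
```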
to_dataframe(backend: Literal['pandas', 'polars'] = 'pandas', include_fields: tuple[str, ...] | None = None, exclude_fields: tuple[str, ...] | None = None, flatten_metadata: bool = True) -> DataFrame
Render the collection as a DataFrame.
Always emits participant_id, created_at, and study_id
columns; participant_metadata is flattened by default.
from_dataframe(df: DataFrame, name: str, id_column: str = 'participant_id', metadata_columns: tuple[str, ...] | None = None) -> ParticipantCollection
classmethod
Build a collection from a DataFrame of participant rows.
Each row becomes a Participant. The id_column is consumed
as the participant UUID when present and parseable; otherwise a
new UUID is generated. metadata_columns (if given) restricts
which columns flow into participant_metadata.
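The id handling described for from_dataframe (reuse a parseable UUID, otherwise generate one) can be sketched per row as follows. parse_or_generate_id and split_row are hypothetical helpers, and plain dicts stand in for DataFrame rows; how bead treats columns excluded by metadata_columns is not stated here, so this sketch simply drops them.

```python
from uuid import UUID, uuid4


def parse_or_generate_id(raw: object) -> UUID:
    """Hypothetical helper: reuse a parseable UUID, otherwise mint a new one."""
    if isinstance(raw, UUID):
        return raw
    try:
        return UUID(str(raw))
    except ValueError:
        # Missing or malformed ids fall back to a freshly generated UUID.
        return uuid4()


def split_row(row, id_column="participant_id", metadata_columns=None):
    """Split one row (a dict) into (participant UUID, metadata dict)."""
    participant_id = parse_or_generate_id(row.get(id_column))
    metadata = {
        key: value
        for key, value in row.items()
        if key != id_column
        and (metadata_columns is None or key in metadata_columns)
    }
    return participant_id, metadata


known = "12345678-1234-5678-1234-567812345678"
kept_id, kept_meta = split_row(
    {"participant_id": known, "age": 25, "browser": "firefox"},
    metadata_columns=("age",),
)
new_id, new_meta = split_row({"participant_id": "not-a-uuid", "age": 30})
```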
IDMappingCollection
Bases: BeadBaseModel
Collection of external-to-internal participant ID mappings.
Stored separately from participant data for IRB / privacy compliance.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Collection name. |
| source | str | Primary external ID source (e.g. |
| mappings | tuple[ParticipantIDMapping, ...] | Member mappings. |
__len__() -> int
Return the number of mappings.
with_mapping(external_id: str, participant_id: UUID, external_source: str | None = None) -> tuple[Self, ParticipantIDMapping]
Return (new_collection, mapping) with one new mapping appended.
get_participant_id(external_id: str) -> UUID | None
Return the internal UUID for external_id if a live mapping exists.
get_external_id(participant_id: UUID) -> str | None
Return the external id for participant_id if a live mapping exists.
deactivated_all() -> tuple[Self, int]
Return (new_collection, count_deactivated) with all live mappings deactivated.
to_jsonl(path: Path | str) -> None
Write each mapping to path as a JSONL line.
from_jsonl(path: Path | str, name: str = 'loaded_mappings', source: str = 'unknown') -> IDMappingCollection
classmethod
Load mappings from path as JSONL.
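The interplay of with_mapping, get_participant_id, and deactivated_all can be illustrated with a small stand-in. SketchMapping and SketchMappings are hypothetical frozen dataclasses, not the bead classes; the sketch assumes that a "live" mapping is one with is_active=True, as the is_active attribute description suggests.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from uuid import UUID, uuid4


@dataclass(frozen=True)
class SketchMapping:
    """Hypothetical stand-in for ParticipantIDMapping."""

    external_id: str
    participant_id: UUID
    is_active: bool = True

    def deactivated(self) -> SketchMapping:
        # Soft delete: keep the record, flip the flag.
        return replace(self, is_active=False)


@dataclass(frozen=True)
class SketchMappings:
    """Hypothetical stand-in for IDMappingCollection."""

    mappings: tuple[SketchMapping, ...] = ()

    def with_mapping(self, external_id: str, participant_id: UUID):
        mapping = SketchMapping(external_id, participant_id)
        return replace(self, mappings=(*self.mappings, mapping)), mapping

    def get_participant_id(self, external_id: str) -> UUID | None:
        # Only active mappings count as live.
        for mapping in self.mappings:
            if mapping.is_active and mapping.external_id == external_id:
                return mapping.participant_id
        return None

    def deactivated_all(self):
        count = sum(1 for mapping in self.mappings if mapping.is_active)
        retired = tuple(
            m.deactivated() if m.is_active else m for m in self.mappings
        )
        return replace(self, mappings=retired), count


pid = uuid4()
empty = SketchMappings()
live, mapping = empty.with_mapping("ABC123", pid)
retired, n_off = live.deactivated_all()
```

After deactivated_all, lookups on the returned collection no longer resolve the external id, while the earlier collection object still does.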
Merging
merging
Utilities for merging participant metadata with judgment data.
This module provides functions for joining participant metadata with judgment DataFrames for analysis. All functions support both pandas and polars DataFrames, preserving the input type.
merge_participant_metadata(judgments_df: DataFrame, participants: ParticipantCollection, id_column: str = 'participant_id', metadata_columns: list[str] | None = None, how: str = 'left') -> DataFrame
Merge participant metadata into a judgments DataFrame.
Preserves input DataFrame type (pandas in -> pandas out, polars in -> polars out).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| judgments_df | DataFrame | DataFrame containing judgment data with participant IDs. | required |
| participants | ParticipantCollection | Collection of participants with metadata. | required |
| id_column | str | Column in judgments_df containing participant IDs (default: "participant_id"). | 'participant_id' |
| metadata_columns | list[str] \| None | Specific metadata columns to include. If None, includes all. | None |
| how | str | Merge type: "left", "inner", "outer" (default: "left"). | 'left' |

Returns:
| Type | Description |
|---|---|
| DataFrame | Merged DataFrame with participant metadata columns added. |
Examples:
>>> import pandas as pd
>>> from bead.participants.collection import ParticipantCollection
>>> from bead.participants.models import Participant
>>> judgments = pd.DataFrame({
... "participant_id": ["uuid1", "uuid2"],
... "response": [5, 3],
... })
>>> collection = ParticipantCollection(name="test")
>>> # ... add participants ...
>>> # merged = merge_participant_metadata(judgments, collection)
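The default how="left" keeps every judgment row even when no participant matches. The sketch below shows that join semantics on plain dict rows instead of a DataFrame, so it runs without pandas or polars. left_join_metadata is a hypothetical helper, not the bead function, and filling unmatched rows with None is an assumption standing in for the backend's null value.

```python
def left_join_metadata(rows, metadata_by_id, id_column="participant_id",
                       metadata_columns=None):
    """Hypothetical left join of per-participant metadata onto judgment rows."""
    if metadata_columns is None:
        # Default: every metadata key seen on any participant.
        metadata_columns = sorted(
            {key for meta in metadata_by_id.values() for key in meta}
        )
    merged = []
    for row in rows:
        metadata = metadata_by_id.get(row[id_column], {})
        # Unmatched rows are kept, with None in each metadata column.
        extra = {column: metadata.get(column) for column in metadata_columns}
        merged.append({**row, **extra})
    return merged


judgments = [
    {"participant_id": "uuid1", "response": 5},
    {"participant_id": "uuid2", "response": 3},
]
metadata_by_id = {"uuid1": {"age": 25, "native_language": "en"}}

merged = left_join_metadata(judgments, metadata_by_id)
ages_only = left_join_metadata(
    judgments, metadata_by_id, metadata_columns=["age"]
)
```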
resolve_external_ids(df: DataFrame, id_mappings: IDMappingCollection, external_id_column: str = 'PROLIFIC_PID', output_column: str = 'participant_id', drop_unresolved: bool = False) -> DataFrame
Resolve external IDs to internal participant UUIDs.
Preserves input DataFrame type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame with external participant IDs. | required |
| id_mappings | IDMappingCollection | Collection of ID mappings. | required |
| external_id_column | str | Column containing external IDs (default: "PROLIFIC_PID"). | 'PROLIFIC_PID' |
| output_column | str | Column name for resolved UUIDs (default: "participant_id"). | 'participant_id' |
| drop_unresolved | bool | If True, drop rows with unresolved IDs (default: False). | False |

Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with resolved participant UUIDs. |
Examples:
>>> import pandas as pd
>>> from uuid import uuid4
>>> from bead.participants.collection import IDMappingCollection
>>> from bead.participants.merging import resolve_external_ids
>>> raw_data = pd.DataFrame({
...     "PROLIFIC_PID": ["ABC123", "DEF456"],
...     "response": [5, 3],
... })
>>> mappings = IDMappingCollection(name="test", source="prolific")
>>> pid = uuid4()
>>> # with_mapping returns (new_collection, mapping); rebind the collection
>>> mappings, _ = mappings.with_mapping("ABC123", pid)
>>> resolved = resolve_external_ids(raw_data, mappings)
>>> "participant_id" in resolved.columns
True
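The effect of drop_unresolved can be sketched on plain dict rows. resolve_rows is a hypothetical helper, not the bead function; a dict stands in for the IDMappingCollection lookup, and writing None for unresolved ids when drop_unresolved=False is an assumption about the null value.

```python
from uuid import uuid4


def resolve_rows(rows, id_by_external, external_id_column="PROLIFIC_PID",
                 output_column="participant_id", drop_unresolved=False):
    """Hypothetical row-wise resolution of external ids to internal ids."""
    resolved = []
    for row in rows:
        internal = id_by_external.get(row[external_id_column])
        if internal is None and drop_unresolved:
            # No live mapping for this external id: drop the row.
            continue
        resolved.append({**row, output_column: internal})
    return resolved


pid = uuid4()
id_by_external = {"ABC123": pid}
raw_rows = [
    {"PROLIFIC_PID": "ABC123", "response": 5},
    {"PROLIFIC_PID": "DEF456", "response": 3},
]

kept = resolve_rows(raw_rows, id_by_external)
dropped = resolve_rows(raw_rows, id_by_external, drop_unresolved=True)
```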
create_analysis_dataframe(judgments_df: DataFrame, participants: ParticipantCollection, id_mappings: IDMappingCollection | None = None, external_id_column: str | None = None, participant_id_column: str = 'participant_id', metadata_columns: list[str] | None = None) -> DataFrame
Create analysis-ready DataFrame with resolved IDs and metadata.
Convenience function that:
1. Resolves external IDs to internal UUIDs (if id_mappings provided)
2. Merges participant metadata
3. Returns a clean DataFrame ready for analysis
Preserves input DataFrame type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| judgments_df | DataFrame | Raw judgment data. | required |
| participants | ParticipantCollection | Participant collection with metadata. | required |
| id_mappings | IDMappingCollection \| None | ID mappings (required if external_id_column is provided). | None |
| external_id_column | str \| None | Column with external IDs to resolve. | None |
| participant_id_column | str | Column with participant IDs (after resolution). | 'participant_id' |
| metadata_columns | list[str] \| None | Metadata columns to include. | None |

Returns:
| Type | Description |
|---|---|
| DataFrame | Analysis-ready DataFrame. |
Examples:
>>> import pandas as pd
>>> from bead.participants.collection import (
... ParticipantCollection, IDMappingCollection
... )
>>> raw_judgments = pd.DataFrame({
... "PROLIFIC_PID": ["ABC123"],
... "response": [5],
... })
>>> participants = ParticipantCollection(name="test")
>>> mappings = IDMappingCollection(name="test", source="prolific")
>>> # analysis_df = create_analysis_dataframe(
>>> # raw_judgments,
>>> # participants,
>>> # id_mappings=mappings,
>>> # external_id_column="PROLIFIC_PID",
>>> # )
build_participant_lookup(participants: ParticipantCollection, key_field: str | None = None) -> dict[str, dict[str, str | int | float | bool | None]]
Build a lookup dictionary from participant collection.
Useful for manual merging or custom processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| participants | ParticipantCollection | Collection of participants. | required |
| key_field | str \| None | If provided, use this metadata field as the key instead of UUID. | None |

Returns:
| Type | Description |
|---|---|
| dict[str, dict[str, str \| int \| float \| bool \| None]] | Lookup from participant ID (or key_field) to metadata dict. |
Examples:
>>> from bead.participants.collection import ParticipantCollection
>>> from bead.participants.merging import build_participant_lookup
>>> from bead.participants.models import Participant
>>> collection = ParticipantCollection(name="test")
>>> p = Participant(participant_metadata={"age": 25})
>>> collection = collection.with_participant(p)
>>> lookup = build_participant_lookup(collection)
>>> str(p.id) in lookup
True
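The key_field option can be sketched as below. build_lookup is a hypothetical helper operating on (UUID, metadata) pairs, not the bead function, and "panel_label" is an invented metadata field; skipping participants that lack the key field is an assumption, since this reference does not say how such cases are handled.

```python
from uuid import uuid4


def build_lookup(participants, key_field=None):
    """Hypothetical lookup builder over (participant_id, metadata) pairs."""
    lookup = {}
    for participant_id, metadata in participants:
        if key_field is None:
            key = str(participant_id)
        else:
            key = metadata.get(key_field)
            if key is None:
                # Assumption: participants without the key field are skipped.
                continue
        lookup[str(key)] = dict(metadata)
    return lookup


pid_a, pid_b = uuid4(), uuid4()
participants = [
    (pid_a, {"age": 25, "panel_label": "P01"}),
    (pid_b, {"age": 31}),
]

by_uuid = build_lookup(participants)
by_label = build_lookup(participants, key_field="panel_label")
```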
Metadata Specification
metadata_spec
Metadata specification for participant attributes.
FieldSpec defines the constraints and display properties for a single
participant metadata field. ParticipantMetadataSpec is the schema for
the full set of participant fields.
FieldSpec
Bases: BeadBaseModel
Specification for a single metadata field.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Field name. Must be a valid Python identifier. |
| field_type | Literal['int', 'float', 'str', 'bool'] | Data type for the field. |
| required | bool | Whether the field must be supplied. |
| allowed_values | tuple[str \| int \| float \| bool, ...] \| None | Exhaustive list of allowed values (categorical fields). |
| range | Range[float] \| None | Numeric range constraint (numeric fields). |
| label | str \| None | Display label for forms. |
| description | str \| None | Help text for the field. |
ParticipantMetadataSpec
Bases: BeadBaseModel
Schema for participant metadata.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Spec name (e.g. |
| version | str | Spec version. |
| fields | tuple[FieldSpec, ...] | Field specifications. |
get_field(name: str) -> FieldSpec | None
Return the field spec named name, or None.
get_required_fields() -> tuple[FieldSpec, ...]
Return the required field specs.
validate_metadata(metadata: dict[str, str | int | float | bool | None]) -> tuple[bool, list[str]]
Validate metadata against this spec.
Returns (is_valid, error_messages).
to_demographics_config() -> DemographicsConfig
Render the spec as a DemographicsConfig for jsPsych deployment.
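The (is_valid, error_messages) return shape of validate_metadata can be sketched as follows. validate_metadata_sketch is hypothetical: field specs are plain dicts here, and the particular checks (required, exact type match, allowed values) and message wording are assumptions, not bead's actual rules.

```python
_PYTHON_TYPES = {"int": int, "float": float, "str": str, "bool": bool}


def validate_metadata_sketch(fields, metadata):
    """Hypothetical validator returning (is_valid, error_messages)."""
    errors = []
    for spec in fields:
        name = spec["name"]
        value = metadata.get(name)
        if value is None:
            if spec.get("required", False):
                errors.append(f"missing required field: {name}")
            continue
        # Exact type match, so True is not accepted for an int field.
        if type(value) is not _PYTHON_TYPES[spec["field_type"]]:
            errors.append(
                f"{name}: expected {spec['field_type']}, "
                f"got {type(value).__name__}"
            )
            continue
        allowed = spec.get("allowed_values")
        if allowed is not None and value not in allowed:
            errors.append(f"{name}: {value!r} is not an allowed value")
    return (not errors, errors)


fields = [
    {"name": "age", "field_type": "int", "required": True},
    {"name": "handedness", "field_type": "str",
     "allowed_values": ("left", "right")},
]

ok, ok_errors = validate_metadata_sketch(
    fields, {"age": 25, "handedness": "left"}
)
bad, bad_errors = validate_metadata_sketch(fields, {"handedness": "both"})
```

Collecting every error before returning, instead of stopping at the first one, lets a caller report all problems with a participant in one pass.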
validate_field_spec(spec: FieldSpec) -> None
Raise ValueError if spec's constraints contradict its type.
Validates that range is only used with numeric types and that every
value in allowed_values matches field_type.
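The two checks described for validate_field_spec can be sketched as below. SketchFieldSpec and check_field_spec are hypothetical stand-ins; representing range as a (low, high) tuple and requiring an exact type match for allowed_values are assumptions made for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass

_PYTHON_TYPES = {"int": int, "float": float, "str": str, "bool": bool}
_NUMERIC_TYPES = ("int", "float")


@dataclass(frozen=True)
class SketchFieldSpec:
    """Hypothetical stand-in for FieldSpec."""

    name: str
    field_type: str
    allowed_values: tuple | None = None
    range: tuple[float, float] | None = None


def check_field_spec(spec: SketchFieldSpec) -> None:
    # A range constraint only makes sense for numeric fields.
    if spec.range is not None and spec.field_type not in _NUMERIC_TYPES:
        raise ValueError(f"{spec.name}: range requires a numeric field_type")
    if spec.allowed_values is not None:
        expected = _PYTHON_TYPES[spec.field_type]
        for value in spec.allowed_values:
            # Exact match: bool is a subclass of int, so isinstance would
            # wrongly accept True for an int field.
            if type(value) is not expected:
                raise ValueError(
                    f"{spec.name}: allowed value {value!r} "
                    f"is not of type {spec.field_type}"
                )


def is_consistent(spec: SketchFieldSpec) -> bool:
    """Convenience wrapper turning the ValueError into a boolean."""
    try:
        check_field_spec(spec)
    except ValueError:
        return False
    return True


age_ok = is_consistent(SketchFieldSpec("age", "int", range=(18, 99)))
range_on_str = is_consistent(SketchFieldSpec("gender", "str", range=(0, 1)))
mixed_values = is_consistent(
    SketchFieldSpec("handedness", "str", allowed_values=("left", 1))
)
```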