bead.data_collection - bead Documentation

`jatos` ¶

JATOS data collection for model training.

This module provides the `JATOSDataCollector` class for downloading experimental results from JATOS servers. It wraps the existing `JATOSClient` and adds functionality for:

- Downloading all results for a study
- Filtering by component and worker type
- Adding metadata (timestamps, etc.)
- Saving to JSONLines format

`JATOSDataCollector` ¶

Collects experimental data from JATOS API.

This class wraps the existing JATOSClient to provide data collection functionality specifically for model training. It downloads results, adds metadata, and saves in JSONLines format.

Parameters:

Name	Type	Description	Default
`base_url`	`str`	JATOS instance URL (e.g., https://jatos.example.com).	required
`api_token`	`str`	API authentication token.	required
`study_id`	`int`	JATOS study ID to collect data from.	required

Attributes:

Name	Type	Description
`study_id`	`int`	JATOS study ID to collect data from.
`client`	`JATOSClient`	Underlying JATOS API client.

Examples:

Create a collector and download results:

from pathlib import Path

from bead.data_collection.jatos import JATOSDataCollector

collector = JATOSDataCollector(
    base_url="https://jatos.example.com",
    api_token="my-token",
    study_id=123,
)
results = collector.download_results(Path("results.jsonl"))

`download_results(output_path: Path, component_id: int | None = None, worker_type: str | None = None) -> list[dict[str, JsonValue]]` ¶

Download all results for the study.

Downloads results from JATOS, optionally filtering by component ID and worker type. Each result is enriched with download timestamp metadata and saved to a JSONLines file (one result per line).

Parameters:

Name	Type	Description	Default
`output_path`	`Path`	Path to save results (JSONLines format).	required
`component_id`	`int \| None`	Filter by component ID (optional).	`None`
`worker_type`	`str \| None`	Filter by worker type (optional).	`None`

Returns:

Type	Description
`list[dict[str, JsonValue]]`	Downloaded results with metadata.

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

Download all results:

results = collector.download_results(Path("results.jsonl"))

Download with filters:

results = collector.download_results(
    Path("results.jsonl"),
    component_id=1,
    worker_type="Prolific"
)
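The enrichment-and-save step described above can be pictured with a minimal sketch. This is not bead's implementation: the helper names `save_results_as_jsonl` and `load_jsonl` and the `download_timestamp` metadata key are assumptions made for illustration. Only the one-result-per-line JSONLines layout comes from the description.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def save_results_as_jsonl(
    results: list[dict[str, Any]], output_path: Path
) -> list[dict[str, Any]]:
    """Stamp each result with a download time and write one JSON object per line.

    Hypothetical helper; the ``download_timestamp`` key is an assumed name.
    """
    timestamp = datetime.now(timezone.utc).isoformat()
    enriched = []
    for result in results:
        # Copy so the caller's dictionaries are left untouched.
        record = dict(result)
        metadata = dict(record.get("metadata") or {})
        metadata["download_timestamp"] = timestamp
        record["metadata"] = metadata
        enriched.append(record)

    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as handle:
        for record in enriched:
            handle.write(json.dumps(record) + "\n")
    return enriched


def load_jsonl(path: Path) -> list[dict[str, Any]]:
    """Read a JSONLines file back into a list of dictionaries."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because every line is an independent JSON document, a file written this way can be read back line by line without loading it into memory all at once.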

`get_study_info() -> dict[str, JsonValue]` ¶

Get study information.

Delegates to the underlying JATOSClient.

Returns:

Type	Description
`dict[str, JsonValue]`	Study details dictionary.

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

info = collector.get_study_info()
print(info["title"])

`get_result_count() -> int` ¶

Get count of results.

Returns:

Type	Description
`int`	Number of results available for the study.

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

count = collector.get_result_count()
print(f"Found {count} results")

`prolific` ¶

Prolific data collection for model training.

This module provides the `ProlificDataCollector` class for downloading participant metadata and submissions from Prolific. It supports:

- Downloading participant submissions with pagination
- Filtering by submission status
- Approving submissions
- Getting study metadata

`ProlificDataCollector` ¶

Collects participant data from Prolific API.

This class interfaces with the Prolific API v1 to download participant submissions, demographics, and metadata for model training.

Parameters:

Name	Type	Description	Default
`api_key`	`str`	Prolific API key for authentication.	required
`study_id`	`str`	Prolific study ID to collect data from.	required

Attributes:

Name	Type	Description
`api_key`	`str`	Prolific API key for authentication.
`study_id`	`str`	Prolific study ID to collect data from.
`base_url`	`str`	Prolific API base URL.
`session`	`Session`	HTTP session with authentication headers.

Examples:

Create a collector and download submissions:

from pathlib import Path

from bead.data_collection.prolific import ProlificDataCollector

collector = ProlificDataCollector(
    api_key="my-api-key",
    study_id="abc123",
)
submissions = collector.download_submissions(Path("submissions.json"))

`download_submissions(output_path: Path, status: str | None = None) -> list[dict[str, JsonValue]]` ¶

Download participant submissions.

Downloads all submissions for the study, handling pagination automatically. Each submission is enriched with a download timestamp.

Parameters:

Name	Type	Description	Default
`output_path`	`Path`	Path to save submissions (JSON format).	required
`status`	`str \| None`	Filter by status (e.g., "APPROVED", "AWAITING REVIEW").	`None`

Returns:

Type	Description
`list[dict[str, JsonValue]]`	Downloaded submissions with metadata.

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

Download all submissions:

submissions = collector.download_submissions(Path("submissions.json"))

Download with status filter:

submissions = collector.download_submissions(
    Path("approved.json"),
    status="APPROVED"
)
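The automatic pagination mentioned for `download_submissions` can be sketched as a loop that follows "next" links until none remain. This is an illustration only, not the collector's code: the function name `collect_all_pages`, the injected `fetch_page` callable, and the assumed page shape (`results` plus `_links.next.href`) are all hypothetical.

```python
from collections.abc import Callable
from typing import Any

Page = dict[str, Any]


def collect_all_pages(
    fetch_page: Callable[[str], Page], first_url: str
) -> list[dict[str, Any]]:
    """Follow "next" links until the listing is exhausted.

    Assumes each page looks like
    ``{"results": [...], "_links": {"next": {"href": <url or None>}}}``.
    """
    submissions: list[dict[str, Any]] = []
    url: str | None = first_url
    seen: set[str] = set()
    while url and url not in seen:
        seen.add(url)  # guard against a listing that links back to itself
        page = fetch_page(url)
        submissions.extend(page.get("results", []))
        next_link = (page.get("_links") or {}).get("next") or {}
        url = next_link.get("href")
    return submissions
```

In practice `fetch_page` would wrap an authenticated HTTP GET. It is injected here so the loop can be exercised without network access.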

`get_study_info() -> dict[str, JsonValue]` ¶

Get study information.

Returns:

Type	Description
`dict[str, JsonValue]`	Study details dictionary.

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

info = collector.get_study_info()
print(info["name"])

`approve_submissions(submission_ids: list[str]) -> None` ¶

Approve submissions.

Approves multiple submissions by transitioning their status to APPROVED.

Parameters:

Name	Type	Description	Default
`submission_ids`	`list[str]`	Submission IDs to approve.	required

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

collector.approve_submissions(["sub1", "sub2", "sub3"])
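One way to build the list of IDs is to filter previously downloaded submissions by status. The sketch below is hypothetical: the helper name `submission_ids_with_status` and the assumption that each submission dictionary carries `"id"` and `"status"` keys are not confirmed by this reference.

```python
from typing import Any


def submission_ids_with_status(
    submissions: list[dict[str, Any]], status: str = "AWAITING REVIEW"
) -> list[str]:
    """Return the IDs of submissions whose status matches ``status``.

    Assumes each submission carries ``"id"`` and ``"status"`` keys;
    entries without an ``"id"`` are ignored.
    """
    return [
        str(submission["id"])
        for submission in submissions
        if submission.get("status") == status and "id" in submission
    ]
```

The resulting list could then be passed to `approve_submissions`, ideally after a manual review of the selected entries.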

`merger` ¶

Data merger for JATOS and Prolific data.

This module provides the DataMerger class for merging experimental results from JATOS with participant metadata from Prolific. The merger matches records based on participant IDs and handles unmatched records gracefully.

`DataMerger` ¶

Merges JATOS results with Prolific metadata.

This class merges experimental data from JATOS with participant demographics and metadata from Prolific based on participant IDs.

Parameters:

Name	Type	Description	Default
`merge_key`	`str`	Key to merge on (e.g., "PROLIFIC_PID"). Default is "PROLIFIC_PID".	`'PROLIFIC_PID'`

Attributes:

Name	Type	Description
`merge_key`	`str`	Key to merge on (e.g., "PROLIFIC_PID").

Examples:

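Create a merger with an explicit merge key and merge previously downloaded data:

from bead.data_collection.merger import DataMerger

# jatos_results and prolific_submissions are the lists returned by the collectors.
merger = DataMerger(merge_key="PROLIFIC_PID")
merged_data = merger.merge(jatos_results, prolific_submissions)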

`merge(jatos_results: list[dict[str, JsonValue]], prolific_submissions: list[dict[str, JsonValue]]) -> list[dict[str, JsonValue]]` ¶

Merge JATOS and Prolific data.

Merges experimental results from JATOS with participant submissions from Prolific by matching on participant IDs. Returns merged records with both JATOS data and Prolific metadata.

Parameters:

Name	Type	Description	Default
`jatos_results`	`list[dict[str, JsonValue]]`	JATOS results from JATOSDataCollector.	required
`prolific_submissions`	`list[dict[str, JsonValue]]`	Prolific submissions from ProlificDataCollector.	required

Returns:

Type	Description
`list[dict[str, JsonValue]]`	Merged data with the structure `{"jatos_data": {...}, "prolific_metadata": {...} \| None, "merged": bool}`.

Examples:

jatos_results = [
    {"data": {"PROLIFIC_PID": "abc123"}, "metadata": {}}
]
prolific_submissions = [
    {"participant_id": "abc123", "status": "APPROVED"}
]
merged = merger.merge(jatos_results, prolific_submissions)
assert merged[0]["merged"] is True
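The matching behavior can be approximated with a dictionary lookup. The sketch below is an assumption-laden illustration rather than the `DataMerger` source: the function name `merge_by_participant` is hypothetical, and it follows the shapes used in the example above, reading the merge key from each JATOS result's `data` dictionary and the participant from the `participant_id` field on the Prolific side.

```python
from typing import Any


def merge_by_participant(
    jatos_results: list[dict[str, Any]],
    prolific_submissions: list[dict[str, Any]],
    merge_key: str = "PROLIFIC_PID",
) -> list[dict[str, Any]]:
    """Pair each JATOS result with the Prolific submission of the same participant."""
    # Index submissions once so each lookup is a constant-time dictionary access.
    by_participant = {
        submission["participant_id"]: submission
        for submission in prolific_submissions
        if "participant_id" in submission
    }
    merged = []
    for result in jatos_results:
        data = result.get("data")
        participant_id = data.get(merge_key) if isinstance(data, dict) else None
        # Only string IDs are looked up; anything else counts as unmatched.
        match = (
            by_participant.get(participant_id)
            if isinstance(participant_id, str)
            else None
        )
        merged.append(
            {
                "jatos_data": result,
                "prolific_metadata": match,
                "merged": match is not None,
            }
        )
    return merged
```

Unmatched JATOS results are kept in the output with `"prolific_metadata": None` and `"merged": False`, which mirrors the return structure documented above.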

`records` ¶

Bridge from JATOS results to bead annotation records.

JATOS returns experimental results as nested JSON: each study run contains a `data` array of trial objects, each carrying the metadata serialized by `bead.deployment.jspsych.trials._serialize_item_metadata` and a jsPsych `response` field.

This module is the single canonical conversion from that representation into `bead.evaluation.AnnotationRecord` instances, the input shape consumed by every reliability, inter-annotator-agreement, and conditional-observation check in bead. There is no other path from raw JATOS output to bead records.

`jatos_results_to_annotation_records(results: Iterable[Mapping[str, Any]], *, annotator_id_key: str = 'PROLIFIC_PID') -> tuple[AnnotationRecord, ...]` ¶

Convert a sequence of JATOS results to `AnnotationRecord` instances.

Each JATOS result is expected to be the dict shape returned by `bead.data_collection.JATOSDataCollector` (a study run with a `data` field carrying jsPsych trial dicts). Trials that lack `item_id` or `template_name` are silently skipped, since they correspond to non-question trials such as instructions or consent.

Parameters:

Name	Type	Description	Default
`results`	`Iterable[Mapping[str, Any]]`	JATOS result envelopes.	required
`annotator_id_key`	`str`	Query-parameter key carrying the annotator identifier. Defaults to `"PROLIFIC_PID"`.	`'PROLIFIC_PID'`

Returns:

Type	Description
`tuple[AnnotationRecord, ...]`	One record per (item, template_name) trial. Records appear in result-then-trial order. Trials missing required fields are skipped.
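The skipping rule and the result-then-trial ordering can be illustrated with a small generator. This is a sketch under stated assumptions, not the function's implementation: `iter_question_trials` and `REQUIRED_TRIAL_KEYS` are hypothetical names, and the construction of `AnnotationRecord` objects is left out because their fields are not described here.

```python
from collections.abc import Iterable, Iterator, Mapping
from typing import Any

# The two keys the description says a question trial must carry.
REQUIRED_TRIAL_KEYS = ("item_id", "template_name")


def iter_question_trials(
    results: Iterable[Mapping[str, Any]],
) -> Iterator[tuple[int, Mapping[str, Any]]]:
    """Yield ``(result_index, trial)`` for trials that carry every required key."""
    for result_index, result in enumerate(results):
        trials = result.get("data") or []
        for trial in trials:
            if not isinstance(trial, Mapping):
                continue  # ignore anything that is not a trial dictionary
            if all(key in trial for key in REQUIRED_TRIAL_KEYS):
                yield result_index, trial
```

Iterating results in the outer loop and trials in the inner loop yields the result-then-trial order described in the return value.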

