bead.data_collection

Data retrieval and integration from JATOS and Prolific platforms.

JATOS Integration

`jatos`

JATOS data collection for model training.

This module provides the JATOSDataCollector class for downloading experimental results from JATOS servers. It wraps the existing JATOSClient and adds functionality for:

- Downloading all results for a study
- Filtering by component and worker type
- Adding metadata (timestamps, etc.)
- Saving to JSONLines format

`JATOSDataCollector`

Collects experimental data from the JATOS API.

This class wraps the existing JATOSClient to provide data collection specifically for model training. It downloads results, adds metadata, and saves them in JSONLines format.

Parameters:

Name	Type	Description	Default
`base_url`	`str`	JATOS instance URL (e.g., https://jatos.example.com).	required
`api_token`	`str`	API authentication token.	required
`study_id`	`int`	JATOS study ID to collect data from.	required

Attributes:

Name	Type	Description
`study_id`	`int`	JATOS study ID to collect data from.
`client`	`JATOSClient`	Underlying JATOS API client.

Examples:

Create a collector and download results::

from pathlib import Path

from bead.data_collection.jatos import JATOSDataCollector

collector = JATOSDataCollector(
    base_url="https://jatos.example.com",
    api_token="my-token",
    study_id=123,
)
results = collector.download_results(Path("results.jsonl"))

`download_results(output_path: Path, component_id: int | None = None, worker_type: str | None = None) -> list[dict[str, JsonValue]]`

Download all results for the study.

Downloads results from JATOS, optionally filtering by component ID and worker type. Each result is enriched with download timestamp metadata and saved to a JSONLines file (one result per line).

Parameters:

Name	Type	Description	Default
`output_path`	`Path`	Path to save results (JSONLines format).	required
`component_id`	`int | None`	Filter by component ID (optional).	`None`
`worker_type`	`str | None`	Filter by worker type (optional).	`None`

Returns:

Type	Description
`list[dict[str, JsonValue]]`	Downloaded results with metadata.

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

Download all results::

results = collector.download_results(Path("results.jsonl"))

Download with filters::

results = collector.download_results(
    Path("results.jsonl"),
    component_id=1,
    worker_type="Prolific"
)
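The enrich-and-save step that `download_results` describes can be sketched with the standard library alone. This is a minimal sketch, not the collector's implementation: the helper names `save_results_jsonl` and `load_results_jsonl` and the metadata key `downloaded_at` are hypothetical, and the real collector may store its timestamp under a different key.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def save_results_jsonl(results: list[dict[str, Any]], output_path: Path) -> list[dict[str, Any]]:
    """Hypothetical helper: add a download timestamp, then write one JSON object per line."""
    downloaded_at = datetime.now(timezone.utc).isoformat()
    enriched: list[dict[str, Any]] = []
    for result in results:
        # Copy so the caller's dicts are not mutated; "downloaded_at" is an assumed key name.
        record = dict(result)
        metadata = dict(record.get("metadata") or {})
        metadata["downloaded_at"] = downloaded_at
        record["metadata"] = metadata
        enriched.append(record)

    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as f:
        for record in enriched:
            f.write(json.dumps(record) + "\n")
    return enriched


def load_results_jsonl(path: Path) -> list[dict[str, Any]]:
    """Read a JSONLines file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each line is an independent JSON document, a file like `results.jsonl` can be streamed or appended to without re-parsing the whole file.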

`get_study_info() -> dict[str, JsonValue]`

Get study information.

Delegates to the underlying JATOSClient.

Returns:

Type	Description
`dict[str, JsonValue]`	Study details dictionary.

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

::

info = collector.get_study_info()
print(info["title"])

`get_result_count() -> int`

Get the number of results for the study.

Returns:

Type	Description
`int`	Number of results available for the study.

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

::

count = collector.get_result_count()
print(f"Found {count} results")

Prolific Integration

`prolific`

Prolific data collection for model training.

This module provides the ProlificDataCollector class for downloading participant metadata and submissions from Prolific. It supports:

- Downloading participant submissions with pagination
- Filtering by submission status
- Approving submissions
- Getting study metadata

`ProlificDataCollector`

Collects participant data from the Prolific API.

This class interfaces with the Prolific API v1 to download participant submissions, demographics, and metadata for model training.

Parameters:

Name	Type	Description	Default
`api_key`	`str`	Prolific API key for authentication.	required
`study_id`	`str`	Prolific study ID to collect data from.	required

Attributes:

Name	Type	Description
`api_key`	`str`	Prolific API key for authentication.
`study_id`	`str`	Prolific study ID to collect data from.
`base_url`	`str`	Prolific API base URL.
`session`	`Session`	HTTP session with authentication headers.

Examples:

Create a collector and download submissions::

from pathlib import Path

from bead.data_collection.prolific import ProlificDataCollector

collector = ProlificDataCollector(
    api_key="my-api-key",
    study_id="abc123",
)
submissions = collector.download_submissions(Path("submissions.json"))
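The `base_url` and `session` attributes listed above imply that every call is an authenticated HTTP request. The sketch below builds such a request with the standard library without sending it. It assumes Prolific's token scheme (`Authorization: Token <key>`) and the base URL `https://api.prolific.com/api/v1/`, both taken from Prolific's public API documentation rather than from this module; the helper name `build_prolific_request` is hypothetical.

```python
import urllib.request

# Assumed value from Prolific's public API docs; compare with the collector's base_url.
PROLIFIC_BASE_URL = "https://api.prolific.com/api/v1/"


def build_prolific_request(api_key: str, path: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) an authenticated GET request."""
    url = PROLIFIC_BASE_URL + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {api_key}", "Accept": "application/json"},
        method="GET",
    )


# A study-details request; sending it would be urllib.request.urlopen(request),
# which raises urllib.error.HTTPError on 4xx/5xx responses.
request = build_prolific_request("my-api-key", "studies/abc123/")
```

A reusable session object, as the `session` attribute suggests, serves the same purpose: the authentication header is set once rather than on every call.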

`download_submissions(output_path: Path, status: str | None = None) -> list[dict[str, JsonValue]]`

Download participant submissions.

Downloads all submissions for the study, handling pagination automatically. Each submission is enriched with a download timestamp.

Parameters:

Name	Type	Description	Default
`output_path`	`Path`	Path to save submissions (JSON format).	required
`status`	`str | None`	Filter by submission status (e.g., "APPROVED", "AWAITING REVIEW").	`None`

Returns:

Type	Description
`list[dict[str, JsonValue]]`	Downloaded submissions with metadata.

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

Download all submissions::

submissions = collector.download_submissions(Path("submissions.json"))

Download with status filter::

submissions = collector.download_submissions(
    Path("approved.json"),
    status="APPROVED"
)

`get_study_info() -> dict[str, JsonValue]`

Get study information.

Returns:

Type	Description
`dict[str, JsonValue]`	Study details dictionary.

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

::

info = collector.get_study_info()
print(info["name"])

`approve_submissions(submission_ids: list[str]) -> None`

Approve submissions.

Approves multiple submissions by transitioning their status to APPROVED.

Parameters:

Name	Type	Description	Default
`submission_ids`	`list[str]`	Submission IDs to approve.	required

Raises:

Type	Description
`HTTPError`	If the API request fails.

Examples:

::

collector.approve_submissions(["sub1", "sub2", "sub3"])
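Approving several submissions can be sketched as one status transition per ID. The path `submissions/{id}/transition/` and the body `{"action": "APPROVE"}` follow Prolific's public API documentation as we understand it and are not confirmed by this module. The `post` callable and the helper name `approve_all` are hypothetical, injected so the sketch runs offline. Collecting failures instead of stopping at the first one is a design choice of this sketch; the documented method raises `HTTPError` instead.

```python
from collections.abc import Callable
from typing import Any

# post(path, json_body) -> None; expected to raise on HTTP errors (assumed contract).
PostFn = Callable[[str, dict[str, Any]], None]


def approve_all(post: PostFn, submission_ids: list[str]) -> dict[str, Exception]:
    """Send one APPROVE transition per submission; return failures keyed by ID."""
    failures: dict[str, Exception] = {}
    for submission_id in submission_ids:
        try:
            post(f"submissions/{submission_id}/transition/", {"action": "APPROVE"})
        except Exception as exc:  # keep going so one bad ID does not block the rest
            failures[submission_id] = exc
    return failures


# Fake transport that records calls and rejects one ID, for illustration only.
sent: list[tuple[str, dict[str, Any]]] = []


def fake_post(path: str, body: dict[str, Any]) -> None:
    if "sub2" in path:
        raise ValueError("simulated API error")
    sent.append((path, body))


failures = approve_all(fake_post, ["sub1", "sub2", "sub3"])
```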

Data Merging

`merger`

Data merger for JATOS and Prolific data.

This module provides the DataMerger class for merging experimental results from JATOS with participant metadata from Prolific. The merger matches records on participant IDs; JATOS results without a matching Prolific submission are kept and marked as unmatched rather than dropped.

`DataMerger`

Merges JATOS results with Prolific metadata.

This class merges experimental data from JATOS with participant demographics and metadata from Prolific based on participant IDs.

Parameters:

Name	Type	Description	Default
`merge_key`	`str`	Field in each JATOS result's data that holds the participant ID (e.g., "PROLIFIC_PID").	`'PROLIFIC_PID'`

Attributes:

Name	Type	Description
`merge_key`	`str`	Key to merge on (e.g., "PROLIFIC_PID").

Examples:

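Create a merger (passing the default key explicitly) and merge collected data::

from bead.data_collection.merger import DataMerger

# jatos_results and prolific_submissions are the outputs of the two collectors
merger = DataMerger(merge_key="PROLIFIC_PID")
merged_data = merger.merge(jatos_results, prolific_submissions)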

`merge(jatos_results: list[dict[str, JsonValue]], prolific_submissions: list[dict[str, JsonValue]]) -> list[dict[str, JsonValue]]`

Merge JATOS and Prolific data.

Merges experimental results from JATOS with participant submissions from Prolific by matching on participant IDs. Each returned record contains the JATOS data plus, when a match is found, the corresponding Prolific metadata.

Parameters:

Name	Type	Description	Default
`jatos_results`	`list[dict[str, JsonValue]]`	JATOS results from JATOSDataCollector.	required
`prolific_submissions`	`list[dict[str, JsonValue]]`	Prolific submissions from ProlificDataCollector.	required

Returns:

Type	Description
`list[dict[str, JsonValue]]`	Merged records, each with the structure { "jatos_data": {...}, "prolific_metadata": {...} | None, "merged": bool }.

Examples:

::

jatos_results = [
    {"data": {"PROLIFIC_PID": "abc123"}, "metadata": {}}
]
prolific_submissions = [
    {"participant_id": "abc123", "status": "APPROVED"}
]
merged = merger.merge(jatos_results, prolific_submissions)
assert merged[0]["merged"] is True
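The matching rule implied by the example above can be sketched as a hash join: index Prolific submissions by `participant_id`, then look up each JATOS result's `data[merge_key]`. This illustrates the technique under those assumptions and is not the library's code. In particular, this sketch stores the whole JATOS result under `jatos_data`, keeps the last submission when a participant ID repeats, and ignores Prolific submissions that have no JATOS result; the real `merge` may differ on all three points. The function name `merge_records` is hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def merge_records(
    jatos_results: list[Record],
    prolific_submissions: list[Record],
    merge_key: str = "PROLIFIC_PID",
) -> list[Record]:
    """Pair each JATOS result with the Prolific submission sharing its participant ID."""
    # Index submissions by participant ID; a later duplicate overwrites an earlier one.
    by_participant = {
        sub["participant_id"]: sub
        for sub in prolific_submissions
        if isinstance(sub.get("participant_id"), str)
    }
    merged: list[Record] = []
    for result in jatos_results:
        data = result.get("data")
        pid = data.get(merge_key) if isinstance(data, dict) else None
        # Non-string IDs (missing, null, or malformed) simply count as unmatched.
        submission = by_participant.get(pid) if isinstance(pid, str) else None
        merged.append(
            {
                "jatos_data": result,
                "prolific_metadata": submission,
                "merged": submission is not None,
            }
        )
    return merged
```

Building the index first makes the merge linear in the total number of records, instead of scanning every submission for each JATOS result.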
