bead.data_collection

Data retrieval and integration from JATOS and Prolific platforms.

JATOS Integration

jatos

JATOS data collection for model training.

This module provides the JATOSDataCollector class for downloading experimental results from JATOS servers. It wraps the existing JATOSClient and adds functionality for:

- Downloading all results for a study
- Filtering by component and worker type
- Adding metadata (timestamps, etc.)
- Saving to JSONLines format

JATOSDataCollector

Collects experimental data from JATOS API.

This class wraps the existing JATOSClient to collect data for model training. It downloads results, adds metadata, and saves them in JSONLines format.

Parameters:

Name Type Description Default
base_url str

JATOS instance URL (e.g., https://jatos.example.com).

required
api_token str

API authentication token.

required
study_id int

JATOS study ID to collect data from.

required

Attributes:

Name Type Description
study_id int

JATOS study ID to collect data from.

client JATOSClient

Underlying JATOS API client.

Examples:

Create a collector and download results:

from pathlib import Path

collector = JATOSDataCollector(
    base_url="https://jatos.example.com",
    api_token="my-token",
    study_id=123,
)
results = collector.download_results(Path("results.jsonl"))

download_results(output_path: Path, component_id: int | None = None, worker_type: str | None = None) -> list[dict[str, JsonValue]]

Download all results for the study.

Downloads results from JATOS, optionally filtering by component ID and worker type. Each result is enriched with download timestamp metadata and saved to a JSONLines file (one result per line).

Parameters:

Name Type Description Default
output_path Path

Path to save results (JSONLines format).

required
component_id int | None

Filter by component ID (optional).

None
worker_type str | None

Filter by worker type (optional).

None

Returns:

Type Description
list[dict[str, JsonValue]]

Downloaded results with metadata.

Raises:

Type Description
HTTPError

If the API request fails.

Examples:

Download all results:

results = collector.download_results(Path("results.jsonl"))

Download with filters:

results = collector.download_results(
    Path("results.jsonl"),
    component_id=1,
    worker_type="Prolific"
)
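
The enrichment and JSONLines steps described for download_results can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the helper names save_with_timestamp and load_jsonl and the metadata field download_timestamp are assumptions made for this example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def save_with_timestamp(
    results: list[dict[str, Any]], output_path: Path
) -> list[dict[str, Any]]:
    """Attach a download timestamp to each result and write one JSON object per line."""
    downloaded_at = datetime.now(timezone.utc).isoformat()
    enriched = []
    for result in results:
        record = dict(result)  # shallow copy so the caller's dicts are untouched
        metadata = dict(record.get("metadata") or {})
        metadata["download_timestamp"] = downloaded_at  # hypothetical field name
        record["metadata"] = metadata
        enriched.append(record)

    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as handle:
        for record in enriched:
            handle.write(json.dumps(record) + "\n")
    return enriched


def load_jsonl(path: Path) -> list[dict[str, Any]]:
    """Read a JSONLines file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because each line is an independent JSON document, a file written this way can be read back or streamed line by line without loading everything through a single json.load call.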

get_study_info() -> dict[str, JsonValue]

Get study information.

Delegates to the underlying JATOSClient.

Returns:

Type Description
dict[str, JsonValue]

Study details dictionary.

Raises:

Type Description
HTTPError

If the API request fails.

Examples:

info = collector.get_study_info()
print(info["title"])

get_result_count() -> int

Get the number of results.

Returns:

Type Description
int

Number of results available for the study.

Raises:

Type Description
HTTPError

If the API request fails.

Examples:

count = collector.get_result_count()
print(f"Found {count} results")

Prolific Integration

prolific

Prolific data collection for model training.

This module provides the ProlificDataCollector class for downloading participant metadata and submissions from Prolific. It supports:

- Downloading participant submissions with pagination
- Filtering by submission status
- Approving submissions
- Getting study metadata

ProlificDataCollector

Collects participant data from Prolific API.

This class interfaces with the Prolific API v1 to download participant submissions, demographics, and metadata for model training.

Parameters:

Name Type Description Default
api_key str

Prolific API key for authentication.

required
study_id str

Prolific study ID to collect data from.

required

Attributes:

Name Type Description
api_key str

Prolific API key for authentication.

study_id str

Prolific study ID to collect data from.

base_url str

Prolific API base URL.

session Session

HTTP session with authentication headers.

Examples:

Create a collector and download submissions:

from pathlib import Path

collector = ProlificDataCollector(
    api_key="my-api-key",
    study_id="abc123",
)
submissions = collector.download_submissions(Path("submissions.json"))

download_submissions(output_path: Path, status: str | None = None) -> list[dict[str, JsonValue]]

Download participant submissions.

Downloads all submissions for the study, handling pagination automatically. Each submission is enriched with a download timestamp.

Parameters:

Name Type Description Default
output_path Path

Path to save submissions (JSON format).

required
status str | None

Filter by status (e.g., "APPROVED", "AWAITING REVIEW").

None

Returns:

Type Description
list[dict[str, JsonValue]]

Downloaded submissions with metadata.

Raises:

Type Description
HTTPError

If the API request fails.

Examples:

Download all submissions:

submissions = collector.download_submissions(Path("submissions.json"))

Download with status filter:

submissions = collector.download_submissions(
    Path("approved.json"),
    status="APPROVED"
)
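
The automatic pagination mentioned above can be pictured as a loop that follows "next" links until none remain. The sketch below is illustrative only: collect_all_pages is a hypothetical helper, the page layout (a "results" list plus a "_links.next.href" pointer) is an assumption about the response shape, and an in-memory dictionary stands in for real HTTP calls.

```python
from typing import Any, Callable

Page = dict[str, Any]


def collect_all_pages(
    fetch_page: Callable[[str], Page],
    first_url: str,
    max_pages: int = 1000,
) -> list[dict[str, Any]]:
    """Follow "next" links page by page and gather every item in order."""
    items: list[dict[str, Any]] = []
    url: str | None = first_url
    for _ in range(max_pages):  # upper bound guards against an endless link cycle
        if url is None:
            break
        page = fetch_page(url)
        items.extend(page.get("results", []))
        # Assumed layout: {"_links": {"next": {"href": <url or None>}}}
        url = ((page.get("_links") or {}).get("next") or {}).get("href")
    return items


# In-memory stand-in for the remote API (hypothetical data).
FAKE_PAGES: dict[str, Page] = {
    "page-1": {
        "results": [{"id": "s1"}, {"id": "s2"}],
        "_links": {"next": {"href": "page-2"}},
    },
    "page-2": {
        "results": [{"id": "s3"}],
        "_links": {"next": {"href": None}},
    },
}

submissions = collect_all_pages(FAKE_PAGES.__getitem__, "page-1")
```

Passing the fetch function in as a parameter keeps the loop independent of the HTTP client, so the same logic can be exercised against fake pages in tests.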

get_study_info() -> dict[str, JsonValue]

Get study information.

Returns:

Type Description
dict[str, JsonValue]

Study details dictionary.

Raises:

Type Description
HTTPError

If the API request fails.

Examples:

info = collector.get_study_info()
print(info["name"])

approve_submissions(submission_ids: list[str]) -> None

Approve submissions.

Approves multiple submissions by transitioning their status to APPROVED.

Parameters:

Name Type Description Default
submission_ids list[str]

Submission IDs to approve.

required

Raises:

Type Description
HTTPError

If the API request fails.

Examples:

collector.approve_submissions(["sub1", "sub2", "sub3"])
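
Before approving, it is often useful to narrow the downloaded submissions to those that are still awaiting review and whose participant actually produced experimental data. The helper below is a hypothetical sketch of that selection step. It assumes each submission dictionary carries "id", "participant_id", and "status" keys; only the latter two appear in the examples on this page, so "id" is an assumption.

```python
from typing import Any


def ids_ready_for_approval(
    submissions: list[dict[str, Any]],
    completed_participants: set[str],
) -> list[str]:
    """Return IDs of submissions awaiting review from participants with results."""
    return [
        str(submission["id"])  # "id" key is assumed, not confirmed by the docs
        for submission in submissions
        if submission.get("status") == "AWAITING REVIEW"
        and submission.get("participant_id") in completed_participants
    ]


# Hypothetical data for illustration.
example_submissions = [
    {"id": "s1", "participant_id": "p1", "status": "AWAITING REVIEW"},
    {"id": "s2", "participant_id": "p2", "status": "AWAITING REVIEW"},
    {"id": "s3", "participant_id": "p3", "status": "APPROVED"},
]
to_approve = ids_ready_for_approval(example_submissions, {"p1", "p3"})
```

The resulting list has the shape that approve_submissions expects for its submission_ids argument.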

Data Merging

merger

Data merger for JATOS and Prolific data.

This module provides the DataMerger class for merging experimental results from JATOS with participant metadata from Prolific. The merger matches records based on participant IDs and handles unmatched records gracefully.

DataMerger

Merges JATOS results with Prolific metadata.

This class merges experimental data from JATOS with participant demographics and metadata from Prolific based on participant IDs.

Parameters:

Name Type Description Default
merge_key str

Key used to match records (e.g., "PROLIFIC_PID").

'PROLIFIC_PID'

Attributes:

Name Type Description
merge_key str

Key to merge on (e.g., "PROLIFIC_PID").

Examples:

Create a merger with an explicit merge key and merge the two datasets:

merger = DataMerger(merge_key="PROLIFIC_PID")
merged_data = merger.merge(jatos_results, prolific_submissions)

merge(jatos_results: list[dict[str, JsonValue]], prolific_submissions: list[dict[str, JsonValue]]) -> list[dict[str, JsonValue]]

Merge JATOS and Prolific data.

Merges experimental results from JATOS with participant submissions from Prolific by matching on participant IDs. Returns merged records with both JATOS data and Prolific metadata.

Parameters:

Name Type Description Default
jatos_results list[dict[str, JsonValue]]

JATOS results from JATOSDataCollector.

required
prolific_submissions list[dict[str, JsonValue]]

Prolific submissions from ProlificDataCollector.

required

Returns:

Type Description
list[dict[str, JsonValue]]

Merged data with structure:

{
    "jatos_data": {...},
    "prolific_metadata": {...} | None,
    "merged": bool
}

Examples:

jatos_results = [
    {"data": {"PROLIFIC_PID": "abc123"}, "metadata": {}}
]
prolific_submissions = [
    {"participant_id": "abc123", "status": "APPROVED"}
]
merged = merger.merge(jatos_results, prolific_submissions)
assert merged[0]["merged"] is True
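
The matching behaviour can be approximated with a dictionary lookup keyed on participant ID. The function below is a sketch under stated assumptions, not DataMerger's actual code: merge_by_participant is a hypothetical name, the participant ID is assumed to live under result["data"][merge_key] on the JATOS side and under "participant_id" on the Prolific side (as in the example above), and unmatched JATOS results are assumed to be kept with prolific_metadata set to None.

```python
from typing import Any


def merge_by_participant(
    jatos_results: list[dict[str, Any]],
    prolific_submissions: list[dict[str, Any]],
    merge_key: str = "PROLIFIC_PID",
) -> list[dict[str, Any]]:
    """Pair each JATOS result with the Prolific submission of the same participant."""
    # Index submissions once so each lookup is O(1) instead of a nested scan.
    by_participant = {
        submission["participant_id"]: submission
        for submission in prolific_submissions
        if isinstance(submission.get("participant_id"), str)
    }

    merged = []
    for result in jatos_results:
        data = result.get("data")
        participant_id = data.get(merge_key) if isinstance(data, dict) else None
        match = (
            by_participant.get(participant_id)
            if isinstance(participant_id, str)
            else None
        )
        merged.append(
            {
                "jatos_data": result,
                "prolific_metadata": match,  # None when no submission matches
                "merged": match is not None,
            }
        )
    return merged
```

Keeping unmatched results in the output, rather than dropping them, makes it easy to count how many participants lack Prolific metadata by filtering on the merged flag.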