bead.data_collection
Data retrieval and integration from JATOS and Prolific platforms.
JATOS Integration
jatos
JATOS data collection for model training.
This module provides the JATOSDataCollector class for downloading experimental results from JATOS servers. It wraps the existing JATOSClient and adds functionality for:
- Downloading all results for a study
- Filtering by component and worker type
- Adding metadata (timestamps, etc.)
- Saving to JSONLines format
JATOSDataCollector
Collects experimental data from JATOS API.
This class wraps the existing JATOSClient to provide data collection functionality specifically for model training. It downloads results, adds metadata to each one, and saves them in JSONLines format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_url | str | JATOS instance URL (e.g., https://jatos.example.com). | required |
| api_token | str | API authentication token. | required |
| study_id | int | JATOS study ID to collect data from. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| study_id | int | JATOS study ID to collect data from. |
| client | JATOSClient | Underlying JATOS API client. |
Examples:
Create a collector and download results::
from pathlib import Path
collector = JATOSDataCollector(
    base_url="https://jatos.example.com",
    api_token="my-token",
    study_id=123,
)
results = collector.download_results(Path("results.jsonl"))
download_results(output_path: Path, component_id: int | None = None, worker_type: str | None = None) -> list[dict[str, JsonValue]]
Download all results for the study.
Downloads results from JATOS, optionally filtering by component ID and worker type. Each result is enriched with download timestamp metadata and saved to a JSONLines file (one result per line).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_path | Path | Path to save results (JSONLines format). | required |
| component_id | int \| None | Filter by component ID (optional). | None |
| worker_type | str \| None | Filter by worker type (optional). | None |
Returns:
| Type | Description |
|---|---|
| list[dict[str, JsonValue]] | Downloaded results with metadata. |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
Download all results::
results = collector.download_results(Path("results.jsonl"))
Download with filters::
results = collector.download_results(
    Path("results.jsonl"),
    component_id=1,
    worker_type="Prolific",
)
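
Because the output is JSONLines (one JSON object per line), a saved file can be read back with the standard library alone. The following is a minimal sketch and not part of bead: `load_jsonl` is a hypothetical helper name, and it assumes only that each non-empty line of the file is a complete JSON object.

```python
import json
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    """Read a JSONLines file into a list of dicts (hypothetical helper)."""
    records = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records
```

Reading line by line keeps memory use proportional to a single record while parsing, which is one reason JSONLines suits large result sets.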
get_study_info() -> dict[str, JsonValue]
Get study information.
Delegates to the underlying JATOSClient.
Returns:
| Type | Description |
|---|---|
| dict[str, JsonValue] | Study details dictionary. |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
::
info = collector.get_study_info()
print(info["title"])
get_result_count() -> int
Get count of results.
Returns:
| Type | Description |
|---|---|
| int | Number of results available for the study. |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
::
count = collector.get_result_count()
print(f"Found {count} results")
Prolific Integration
prolific
Prolific data collection for model training.
This module provides the ProlificDataCollector class for downloading participant metadata and submissions from Prolific. It supports:
- Downloading participant submissions with pagination
- Filtering by submission status
- Approving submissions
- Getting study metadata
ProlificDataCollector
Collects participant data from Prolific API.
This class interfaces with the Prolific API v1 to download participant submissions, demographics, and metadata for model training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| api_key | str | Prolific API key for authentication. | required |
| study_id | str | Prolific study ID to collect data from. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| api_key | str | Prolific API key for authentication. |
| study_id | str | Prolific study ID to collect data from. |
| base_url | str | Prolific API base URL. |
| session | Session | HTTP session with authentication headers. |
Examples:
Create a collector and download submissions::
from pathlib import Path
collector = ProlificDataCollector(
    api_key="my-api-key",
    study_id="abc123",
)
submissions = collector.download_submissions(Path("submissions.json"))
download_submissions(output_path: Path, status: str | None = None) -> list[dict[str, JsonValue]]
Download participant submissions.
Downloads all submissions for the study, handling pagination automatically. Each submission is enriched with a download timestamp.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_path | Path | Path to save submissions (JSON format). | required |
| status | str \| None | Filter by status (e.g., "APPROVED", "AWAITING REVIEW"). | None |
Returns:
| Type | Description |
|---|---|
| list[dict[str, JsonValue]] | Downloaded submissions with metadata. |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
Download all submissions::
submissions = collector.download_submissions(Path("submissions.json"))
Download with status filter::
submissions = collector.download_submissions(
    Path("approved.json"),
    status="APPROVED",
)
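
Handling pagination automatically generally means following a next-page pointer until none remains. The sketch below illustrates that loop under stated assumptions: `fetch_page` is a hypothetical callable standing in for an HTTP GET, and the `results` and `next` keys are an assumed page shape. It does not describe the Prolific response format or how download_submissions is implemented.

```python
from collections.abc import Callable
from typing import Any

Page = dict[str, Any]


def collect_all(fetch_page: Callable[[str], Page], first_url: str) -> list[Any]:
    """Follow `next` links from page to page and gather every item."""
    items: list[Any] = []
    url: str | None = first_url
    while url is not None:
        page = fetch_page(url)
        items.extend(page["results"])
        url = page.get("next")  # None (or a missing key) ends the loop
    return items


# In-memory stand-in for a paginated API (illustrative data only).
FAKE_PAGES: dict[str, Page] = {
    "page-1": {"results": ["sub1", "sub2"], "next": "page-2"},
    "page-2": {"results": ["sub3"], "next": None},
}

all_items = collect_all(FAKE_PAGES.__getitem__, "page-1")
```

Passing the fetch function in as an argument keeps the loop testable without network access, as the in-memory pages show.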
get_study_info() -> dict[str, JsonValue]
Get study information.
Returns:
| Type | Description |
|---|---|
| dict[str, JsonValue] | Study details dictionary. |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
::
info = collector.get_study_info()
print(info["name"])
approve_submissions(submission_ids: list[str]) -> None
Approve submissions.
Approves multiple submissions by transitioning their status to APPROVED.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| submission_ids | list[str] | Submission IDs to approve. | required |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
::
collector.approve_submissions(["sub1", "sub2", "sub3"])
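
One plausible workflow is to download submissions, pick out those still awaiting review, and pass their IDs to approve_submissions. The helper below is a hypothetical sketch, not part of bead. It assumes each submission dict carries "id" and "status" keys, which this page does not confirm.

```python
def ids_with_status(submissions: list[dict], status: str) -> list[str]:
    """Return the IDs of submissions whose status equals `status`.

    Assumes each submission has "id" and "status" keys (an assumption,
    not something this reference guarantees).
    """
    return [sub["id"] for sub in submissions if sub.get("status") == status]


# Illustrative data only.
example_submissions = [
    {"id": "sub1", "status": "AWAITING REVIEW"},
    {"id": "sub2", "status": "APPROVED"},
    {"id": "sub3", "status": "AWAITING REVIEW"},
]
pending = ids_with_status(example_submissions, "AWAITING REVIEW")
```

The resulting list could then be passed along as collector.approve_submissions(pending).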
Data Merging
merger
Data merger for JATOS and Prolific data.
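This module provides the DataMerger class for merging experimental results from JATOS with participant metadata from Prolific. The merger matches records on participant ID. JATOS records with no matching Prolific submission are kept rather than dropped: merge returns them with prolific_metadata set to None and merged set to False.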
DataMerger
Merges JATOS results with Prolific metadata.
This class merges experimental data from JATOS with participant demographics and metadata from Prolific based on participant IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| merge_key | str | Key to merge on (e.g., "PROLIFIC_PID"). | 'PROLIFIC_PID' |
Attributes:
| Name | Type | Description |
|---|---|---|
| merge_key | str | Key to merge on (e.g., "PROLIFIC_PID"). |
Examples:
Create a merger with an explicit merge key (here, the default) and merge the two datasets::
merger = DataMerger(merge_key="PROLIFIC_PID")
merged_data = merger.merge(jatos_results, prolific_submissions)
merge(jatos_results: list[dict[str, JsonValue]], prolific_submissions: list[dict[str, JsonValue]]) -> list[dict[str, JsonValue]]
Merge JATOS and Prolific data.
Merges experimental results from JATOS with participant submissions from Prolific by matching on participant IDs. Returns merged records with both JATOS data and Prolific metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| jatos_results | list[dict[str, JsonValue]] | JATOS results from JATOSDataCollector. | required |
| prolific_submissions | list[dict[str, JsonValue]] | Prolific submissions from ProlificDataCollector. | required |
Returns:
| Type | Description |
|---|---|
| list[dict[str, JsonValue]] | Merged data with structure: { "jatos_data": {...}, "prolific_metadata": {...} \| None, "merged": bool } |
Examples:
::
jatos_results = [
    {"data": {"PROLIFIC_PID": "abc123"}, "metadata": {}}
]
prolific_submissions = [
    {"participant_id": "abc123", "status": "APPROVED"}
]
merged = merger.merge(jatos_results, prolific_submissions)
assert merged[0]["merged"] is True
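
The matching step can be pictured as a dictionary lookup. The sketch below is not the DataMerger implementation. It is a minimal illustration that assumes, following the example data above, that the merge key lives under each JATOS result's "data" dict and that Prolific submissions are identified by "participant_id". The name `merge_sketch` is hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def merge_sketch(
    jatos_results: list[Record],
    prolific_submissions: list[Record],
    merge_key: str = "PROLIFIC_PID",
) -> list[Record]:
    """Pair each JATOS result with the Prolific submission sharing its ID."""
    # Index submissions by participant ID for constant-time lookup.
    by_participant = {
        sub["participant_id"]: sub
        for sub in prolific_submissions
        if "participant_id" in sub
    }
    merged = []
    for result in jatos_results:
        data = result.get("data")
        pid = data.get(merge_key) if isinstance(data, dict) else None
        match = by_participant.get(pid) if pid is not None else None
        merged.append(
            {
                "jatos_data": result,
                "prolific_metadata": match,  # None when nothing matched
                "merged": match is not None,
            }
        )
    return merged


demo = merge_sketch(
    [
        {"data": {"PROLIFIC_PID": "abc123"}, "metadata": {}},
        {"data": {"PROLIFIC_PID": "zzz999"}, "metadata": {}},
    ],
    [{"participant_id": "abc123", "status": "APPROVED"}],
)
matched = sum(1 for record in demo if record["merged"])
```

Counting records by their merged flag, as in the last line, gives a quick check of how many participants were linked across the two platforms.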