bead.data_collection
Data retrieval and integration from JATOS and Prolific platforms.
JATOS Integration
jatos
JATOS data collection for model training.
This module provides the JATOSDataCollector class for downloading experimental results from JATOS servers. It wraps the existing JATOSClient and adds functionality for:

- Downloading all results for a study
- Filtering by component and worker type
- Adding metadata (timestamps, etc.)
- Saving to JSONLines format
JATOSDataCollector
Collects experimental data from JATOS API.
This class wraps the existing JATOSClient to provide data collection functionality specifically for model training. It downloads results, adds metadata, and saves in JSONLines format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_url | str | JATOS instance URL (e.g., https://jatos.example.com). | required |
| api_token | str | API authentication token. | required |
| study_id | int | JATOS study ID to collect data from. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| study_id | int | JATOS study ID to collect data from. |
| client | JATOSClient | Underlying JATOS API client. |
Examples:
Create a collector and download results::

    from pathlib import Path

    from bead.data_collection import JATOSDataCollector

    collector = JATOSDataCollector(
        base_url="https://jatos.example.com",
        api_token="my-token",
        study_id=123,
    )
    results = collector.download_results(Path("results.jsonl"))
download_results(output_path: Path, component_id: int | None = None, worker_type: str | None = None) -> list[dict[str, JsonValue]]
Download all results for the study.
Downloads results from JATOS, optionally filtering by component ID and worker type. Each result is enriched with download timestamp metadata and saved to a JSONLines file (one result per line).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_path | Path | Path to save results (JSONLines format). | required |
| component_id | int \| None | Filter by component ID (optional). | None |
| worker_type | str \| None | Filter by worker type (optional). | None |
Returns:
| Type | Description |
|---|---|
| list[dict[str, JsonValue]] | Downloaded results with metadata. |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
Download all results::

    results = collector.download_results(Path("results.jsonl"))

Download with filters::

    results = collector.download_results(
        Path("results.jsonl"),
        component_id=1,
        worker_type="Prolific",
    )
get_study_info() -> dict[str, JsonValue]
Get study information.
Delegates to the underlying JATOSClient.
Returns:
| Type | Description |
|---|---|
| dict[str, JsonValue] | Study details dictionary. |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
::

    info = collector.get_study_info()
    print(info["title"])
get_result_count() -> int
Get the number of results for the study.
Returns:
| Type | Description |
|---|---|
| int | Number of results available for the study. |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
::

    count = collector.get_result_count()
    print(f"Found {count} results")
Prolific Integration
prolific
Prolific data collection for model training.
This module provides the ProlificDataCollector class for downloading participant metadata and submissions from Prolific. It supports:

- Downloading participant submissions with pagination
- Filtering by submission status
- Approving submissions
- Getting study metadata
ProlificDataCollector
Collects participant data from Prolific API.
This class interfaces with the Prolific API v1 to download participant submissions, demographics, and metadata for model training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| api_key | str | Prolific API key for authentication. | required |
| study_id | str | Prolific study ID to collect data from. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| api_key | str | Prolific API key for authentication. |
| study_id | str | Prolific study ID to collect data from. |
| base_url | str | Prolific API base URL. |
| session | Session | HTTP session with authentication headers. |
Examples:
Create a collector and download submissions::

    from pathlib import Path

    from bead.data_collection.prolific import ProlificDataCollector

    collector = ProlificDataCollector(
        api_key="my-api-key",
        study_id="abc123",
    )
    submissions = collector.download_submissions(Path("submissions.json"))
download_submissions(output_path: Path, status: str | None = None) -> list[dict[str, JsonValue]]
Download participant submissions.
Downloads all submissions for the study, handling pagination automatically. Each submission is enriched with a download timestamp.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_path | Path | Path to save submissions (JSON format). | required |
| status | str \| None | Filter by status (e.g., "APPROVED", "AWAITING REVIEW"). | None |
Returns:
| Type | Description |
|---|---|
| list[dict[str, JsonValue]] | Downloaded submissions with metadata. |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
Download all submissions::

    submissions = collector.download_submissions(Path("submissions.json"))

Download with status filter::

    submissions = collector.download_submissions(
        Path("approved.json"),
        status="APPROVED",
    )
get_study_info() -> dict[str, JsonValue]
Get study information.
Returns:
| Type | Description |
|---|---|
| dict[str, JsonValue] | Study details dictionary. |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
::

    info = collector.get_study_info()
    print(info["name"])
approve_submissions(submission_ids: list[str]) -> None
Approve submissions.
Approves multiple submissions by transitioning their status to APPROVED.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| submission_ids | list[str] | Submission IDs to approve. | required |
Raises:
| Type | Description |
|---|---|
| HTTPError | If the API request fails. |
Examples:
::

    collector.approve_submissions(["sub1", "sub2", "sub3"])
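Bulk approval reduces to one status transition per submission ID. The sketch below only builds the requests and does not send them. The `/submissions/{id}/transition/` path and the `{"action": "APPROVE"}` payload are assumptions modeled on Prolific's public API and should be checked against the Prolific API reference; `build_approval_requests` is a hypothetical helper.

```python
from typing import Any


def build_approval_requests(
    base_url: str, submission_ids: list[str]
) -> list[tuple[str, dict[str, Any]]]:
    """Hypothetical sketch: one (url, json_payload) pair per submission to approve."""
    root = base_url.rstrip("/")
    # dict.fromkeys drops duplicate IDs while keeping their original order.
    unique_ids = list(dict.fromkeys(submission_ids))
    return [
        # Assumed endpoint and payload; verify against the Prolific API docs.
        (f"{root}/submissions/{sid}/transition/", {"action": "APPROVE"})
        for sid in unique_ids
    ]
```

Each pair could then be sent with `session.post(url, json=payload)` on an authenticated `requests.Session`. If approvals go out as one request per ID, a failure partway through may leave earlier submissions already approved, so de-duplicating the IDs first avoids asking to transition the same submission twice.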
Data Merging
merger
Data merger for JATOS and Prolific data.
This module provides the DataMerger class for merging experimental results from JATOS with participant metadata from Prolific. The merger matches records based on participant IDs and handles unmatched records gracefully.
DataMerger
Merges JATOS results with Prolific metadata.
This class merges experimental data from JATOS with participant demographics and metadata from Prolific based on participant IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| merge_key | str | Participant-ID key in the JATOS result data to merge on. | 'PROLIFIC_PID' |
Attributes:
| Name | Type | Description |
|---|---|---|
| merge_key | str | Key to merge on (e.g., "PROLIFIC_PID"). |
Examples:
Create a merger, passing the merge key explicitly::

    from bead.data_collection.merger import DataMerger

    merger = DataMerger(merge_key="PROLIFIC_PID")
    merged_data = merger.merge(jatos_results, prolific_submissions)
merge(jatos_results: list[dict[str, JsonValue]], prolific_submissions: list[dict[str, JsonValue]]) -> list[dict[str, JsonValue]]
Merge JATOS and Prolific data.
Merges experimental results from JATOS with participant submissions from Prolific by matching on participant IDs. Returns merged records with both JATOS data and Prolific metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| jatos_results | list[dict[str, JsonValue]] | JATOS results from JATOSDataCollector. | required |
| prolific_submissions | list[dict[str, JsonValue]] | Prolific submissions from ProlificDataCollector. | required |
Returns:
| Type | Description |
|---|---|
| list[dict[str, JsonValue]] | Merged records, each shaped as `{"jatos_data": {...}, "prolific_metadata": {...} or None, "merged": bool}`. |
Examples:
::

    jatos_results = [
        {"data": {"PROLIFIC_PID": "abc123"}, "metadata": {}}
    ]
    prolific_submissions = [
        {"participant_id": "abc123", "status": "APPROVED"}
    ]
    merged = merger.merge(jatos_results, prolific_submissions)
    assert merged[0]["merged"] is True
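Matching on participant IDs amounts to a hash join: index the Prolific submissions by `participant_id`, then look up each JATOS result's `data[merge_key]`. The sketch below uses the field names from the example above and the documented output shape; `merge_records` is a hypothetical stand-in for `DataMerger.merge`, and treating a missing or non-dict `data` field as unmatched is an assumption.

```python
from typing import Any


def merge_records(
    jatos_results: list[dict[str, Any]],
    prolific_submissions: list[dict[str, Any]],
    merge_key: str = "PROLIFIC_PID",
) -> list[dict[str, Any]]:
    """Hypothetical sketch of a participant-ID join that keeps unmatched results."""
    # Index submissions once; a later duplicate ID overwrites an earlier one.
    by_pid = {
        s["participant_id"]: s for s in prolific_submissions if "participant_id" in s
    }
    merged: list[dict[str, Any]] = []
    for result in jatos_results:
        data = result.get("data")
        pid = data.get(merge_key) if isinstance(data, dict) else None
        metadata = by_pid.get(pid) if isinstance(pid, str) else None
        merged.append(
            {
                "jatos_data": result,
                "prolific_metadata": metadata,
                "merged": metadata is not None,
            }
        )
    return merged
```

Building the index once keeps the merge linear in the number of records instead of scanning every submission for each result. Unmatched results are kept with `prolific_metadata` set to None rather than dropped, which is what the `merged` flag in the return shape distinguishes.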
Annotation-Record Bridge
Single canonical conversion from raw JATOS results to bead
`bead.evaluation.AnnotationRecord` instances, the input shape
consumed by every reliability and inter-annotator-agreement check.
records
Bridge from JATOS results to bead annotation records.
JATOS returns experimental results as nested JSON: each study run
contains a `data` array of trial objects, each carrying the
metadata serialized by
`bead.deployment.jspsych.trials._serialize_item_metadata` and a
jsPsych `response` field.

This module is the single canonical conversion from that
representation into `bead.evaluation.AnnotationRecord`
instances, the input shape consumed by every reliability,
inter-annotator-agreement, and conditional-observation check in bead.
There is no other path from raw JATOS output to bead records.
jatos_results_to_annotation_records(results: Iterable[Mapping[str, Any]], *, annotator_id_key: str = 'PROLIFIC_PID') -> tuple[AnnotationRecord, ...]
Convert a sequence of JATOS results to `AnnotationRecord` instances.
Each JATOS result is expected to have the dict shape returned by
`bead.data_collection.JATOSDataCollector` (a study run
with a `data` field carrying jsPsych trial dicts). Trials that
lack `item_id` or `template_name` are silently skipped, since
they correspond to non-question trials such as instructions or
consent.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results | Iterable[Mapping[str, Any]] | JATOS result envelopes. | required |
| annotator_id_key | str | Query-parameter key carrying the annotator identifier. Defaults to "PROLIFIC_PID". | 'PROLIFIC_PID' |
Returns:
| Type | Description |
|---|---|
| tuple[AnnotationRecord, ...] | One record per (item, template_name) trial. Records appear in result-then-trial order. Trials missing required fields are skipped. |