Active Learning Module¶
The active learning module handles Stage 6 of the pipeline: training models on collected data with optional active learning loops.
GLMM Models¶
Bead uses Generalized Linear Mixed Models (GLMMs) to account for participant and item variability. Three mixed-effects modes balance complexity and data requirements.
Fixed Effects Only¶
Train without random effects:
uv run bead models train-model \
--task-type forced_choice \
--items items/2afc_pairs.jsonl \
--labels responses/labels.jsonl \
--model-name bert-base-uncased \
--mixed-effects-mode fixed \
--output-dir models/fixed_model/
Use when participant and item variability is minimal or when sample sizes are small.
Random Intercepts¶
Model participant and item baseline differences:
uv run bead models train-model \
--task-type forced_choice \
--items items/2afc_pairs.jsonl \
--labels responses/labels.jsonl \
--participant-ids responses/participant_ids.txt \
--model-name bert-base-uncased \
--mixed-effects-mode random_intercepts \
--output-dir models/random_intercepts_model/
Random intercepts capture that some participants are consistently stricter/lenient and some items are consistently easier/harder.
Random Slopes (Full Mixed Effects)¶
Model interactions between participants and items:
uv run bead models train-model \
--task-type forced_choice \
--items items/2afc_pairs.jsonl \
--labels responses/labels.jsonl \
--participant-ids responses/participant_ids.txt \
--model-name bert-base-uncased \
--mixed-effects-mode random_slopes \
--output-dir models/random_slopes_model/
Random slopes capture that participants respond differently to different items (participant × item interactions).
Task-Type Models¶
All 8 task types support GLMM training:
forced_choice: 2AFC, 3AFC, N-way choiceordinal_scale: Likert scales, slidersbinary: yes/no, true/falsecategorical: NLI, semantic relationsmulti_select: checkbox tasksmagnitude: reading time, confidencefree_text: open-ended textcloze: fill-in-the-blank
Specify the task type with --task-type:
uv run bead models train-model \
--task-type ordinal_scale \
--items items/likert7.jsonl \
--labels responses/likert_labels.jsonl \
--participant-ids responses/participant_ids.txt \
--model-name bert-base-uncased \
--mixed-effects-mode random_intercepts \
--output-dir models/ordinal_model/
LoRA Training¶
Use Low-Rank Adaptation for parameter-efficient fine-tuning:
uv run bead models train-model \
--task-type free_text \
--items items/paraphrase.jsonl \
--labels responses/paraphrase_labels.jsonl \
--model-name gpt2 \
--mixed-effects-mode fixed \
--use-lora \
--lora-rank 8 \
--lora-alpha 16 \
--output-dir models/lora_model/
LoRA reduces trainable parameters, enabling fine-tuning of larger models with limited compute.
Prediction¶
Generate predictions from trained models:
uv run bead models predict \
--model-dir models/random_intercepts_model/ \
--items items/new_items.jsonl \
--participant-ids participant_ids.txt \
--output predictions/predictions.jsonl
The --participant-ids file contains one participant ID per line, matching the order of items.
Probability Predictions¶
Get class probability distributions:
uv run bead models predict-proba \
--model-dir models/random_intercepts_model/ \
--items items/new_items.jsonl \
--participant-ids participant_ids.txt \
--output predictions/probabilities.json
Output is a JSON array of probability vectors.
Convergence Detection¶
Check if model performance matches human inter-annotator agreement:
uv run bead active-learning check-convergence \
--predictions predictions/model_preds.jsonl \
--human-labels responses/human_responses.jsonl \
--metric krippendorff_alpha \
--threshold 0.80
Output:
Krippendorff's alpha: 0.823
✓ Convergence threshold met (0.80)
Model performance matches human agreement
Supported metrics:
- krippendorff_alpha: inter-annotator agreement (works with all task types)
- fleiss_kappa: multi-rater agreement (categorical data)
- cohens_kappa: two-rater agreement (pairwise)
Evaluation¶
Model Evaluation¶
Compute standard metrics:
uv run bead training evaluate \
--model-dir models/random_intercepts_model/ \
--test-items items/test_set.jsonl \
--test-labels responses/test_labels.jsonl \
--metrics accuracy,f1,precision,recall \
--output evaluation/metrics.json
Cross-Validation¶
k-fold cross-validation with stratification:
uv run bead training cross-validate \
--items items/all.jsonl \
--labels responses/all_labels.jsonl \
--model-config config/model_config.yaml \
--k-folds 5 \
--stratify-by participant_id \
--output evaluation/cv_results.json
Stratifying by participant_id ensures participants don't appear in both train and test sets.
Learning Curve¶
Plot performance vs training set size:
uv run bead training learning-curve \
--items items/all.jsonl \
--labels responses/all_labels.jsonl \
--model-config config/model_config.yaml \
--train-sizes 0.1,0.2,0.5,0.8,1.0 \
--output evaluation/learning_curve.json
Inter-Annotator Agreement¶
Compute agreement among human annotators:
# Krippendorff's alpha (works with all task types)
uv run bead training compute-agreement \
--annotations responses/multi_annotator.jsonl \
--metric krippendorff_alpha \
--data-type ordinal
# Fleiss' kappa (categorical data, multiple raters)
uv run bead training compute-agreement \
--annotations responses/multi_annotator.jsonl \
--metric fleiss_kappa
# Cohen's kappa (pairwise agreement)
uv run bead training compute-agreement \
--annotations responses/two_annotators.jsonl \
--metric cohens_kappa
Data Collection¶
After deployment, collect responses from JATOS:
uv run bead training collect-data responses/raw_responses.jsonl \
--jatos-url https://jatos.example.com \
--api-token your-api-token \
--study-id 123
The command downloads all responses and converts them to bead's JSONL format.
Workflow Example¶
Complete training and convergence detection workflow:
# 1. Collect data from JATOS
uv run bead training collect-data responses/collected_data.jsonl \
--jatos-url https://jatos.example.com \
--api-token your-api-token \
--study-id 123
# 2. Train GLMM model
uv run bead models train-model \
--task-type forced_choice \
--items items/2afc_pairs.jsonl \
--labels responses/labels.jsonl \
--participant-ids responses/participant_ids.txt \
--model-name bert-base-uncased \
--mixed-effects-mode random_intercepts \
--output-dir models/trained_model/
# 3. Generate predictions on test set
uv run bead models predict \
--model-dir models/trained_model/ \
--items items/test_set.jsonl \
--participant-ids test_participant_ids.txt \
--output predictions/model_predictions.jsonl
# 4. Check convergence to human agreement
uv run bead active-learning check-convergence \
--predictions predictions/model_predictions.jsonl \
--human-labels responses/gold_standard.jsonl \
--metric krippendorff_alpha \
--threshold 0.80
# 5. Evaluate model performance
uv run bead training evaluate \
--model-dir models/trained_model/ \
--test-items items/test_set.jsonl \
--test-labels responses/test_labels.jsonl \
--metrics accuracy,f1,precision,recall \
--output evaluation/metrics.json
Next Steps¶
After training:
- Evaluate models with cross-validation
- Deploy models for real-time prediction
- Analyze learned representations
For complete API documentation, see bead.active_learning API reference.