Calibration Run
What is a calibration run?
A calibration run lets you test how well the AI matches your screening judgment before committing to a full screen. Think of it as a trial run.
How it works
1. Go to the Criteria stage and click Start Calibration
2. Oryn randomly selects 20 papers from your project (see the sketch after this list)
3. You screen all 20 papers manually (Include/Exclude/Maybe)
4. The AI screens the same 20 papers blindly, without seeing your decisions
5. Results are compared side by side
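As a rough sketch of steps 2-5 in Python (the function names, decision labels, and `paper_ids` list are illustrative assumptions, not Oryn's actual API):

```python
import random

def draw_calibration_sample(paper_ids: list[str], sample_size: int = 20) -> list[str]:
    """Step 2: draw a random calibration sample from the project."""
    return random.sample(paper_ids, min(sample_size, len(paper_ids)))

def run_calibration(sample, human_screen, ai_screen):
    """Steps 3-5: collect both sets of decisions independently, then pair them.

    human_screen and ai_screen each map a paper ID to "Include", "Exclude",
    or "Maybe"; the AI never sees the human decisions.
    """
    human = {pid: human_screen(pid) for pid in sample}
    ai = {pid: ai_screen(pid) for pid in sample}  # screened blindly
    return {pid: (human[pid], ai[pid]) for pid in sample}
```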
What you'll see
- Agreement rate: How often you and the AI made the same decision
- Cohen's kappa: A measure of agreement that corrects for the agreement expected by chance (see the computation sketch after this list)
- Disagreement list: Each paper where you and the AI differed, with both explanations
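To make the two metrics concrete, here is a minimal sketch of the standard formulas. This is the textbook computation for two raters, not Oryn's internal code; the variable names and example decisions are illustrative:

```python
from collections import Counter

def agreement_rate(human: list[str], ai: list[str]) -> float:
    """Fraction of papers where both raters made the same decision."""
    return sum(h == a for h, a in zip(human, ai)) / len(human)

def cohens_kappa(human: list[str], ai: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(human)
    p_o = agreement_rate(human, ai)  # observed agreement
    h_counts, a_counts = Counter(human), Counter(ai)
    # Chance agreement: probability both raters pick the same label at
    # random, given each rater's own label frequencies.
    p_e = sum((h_counts[label] / n) * (a_counts.get(label, 0) / n)
              for label in h_counts)
    if p_e == 1.0:  # degenerate case: both raters always use one label
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Example with five papers:
human = ["Include", "Exclude", "Maybe", "Include", "Exclude"]
ai    = ["Include", "Exclude", "Include", "Include", "Exclude"]
print(agreement_rate(human, ai))  # 0.8
print(cohens_kappa(human, ai))    # ~0.67 (substantial agreement)
```

Note that kappa can be much lower than the raw agreement rate: if most papers in the sample are obvious excludes, two raters would agree often even by chance, and kappa discounts exactly that.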
Interpreting results
| Kappa | Interpretation |
|---|---|
| 0.81-1.00 | Almost perfect agreement |
| 0.61-0.80 | Substantial agreement |
| 0.41-0.60 | Moderate agreement |
| 0.40 or below | Consider refining your criteria |
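The bands in the table above reduce to a small lookup. Assuming kappa is reported to two decimals, strict `>` comparisons reproduce the table's ranges exactly:

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the interpretation bands in the table above."""
    if kappa > 0.80:
        return "Almost perfect agreement"
    if kappa > 0.60:
        return "Substantial agreement"
    if kappa > 0.40:
        return "Moderate agreement"
    return "Consider refining your criteria"
```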
What to do next
- If agreement is high (kappa above 0.60), you can confidently run the full AI screen
- If agreement is low, review the disagreements. Refine your criteria to be more specific, then run another calibration
- You can run as many calibration rounds as you need