Placebo Calibration Tests
Calibration Summary
Placebo Test Results (All Dates)
| Placebo Date | Day # | HRV (RMSSD) p | Lowest HR p | Average HR p | Sleep Efficiency p |
|---|---|---|---|---|---|
| 2026-01-23 | 15 | p<0.001 | p<0.001 | p<0.001 | p=0.066 |
| 2026-01-24 | 16 | p<0.001 | p<0.001 | p<0.001 | p=0.028 |
| 2026-01-26 | 18 | p<0.001 | p<0.001 | p<0.001 | p=0.023 |
| 2026-01-27 | 19 | p<0.001 | p<0.001 | p<0.001 | p=0.018 |
| 2026-02-01 | 24 | p<0.001 | p<0.001 | p<0.001 | p=0.039 |
| 2026-02-05 | 28 | p<0.001 | p<0.001 | p<0.001 | p=0.035 |
| 2026-02-06 | 29 | p<0.001 | p<0.001 | p<0.001 | p=0.027 |
| 2026-02-07 | 30 | p<0.001 | p<0.001 | p<0.001 | p=0.049 |
| 2026-02-08 | 31 | p<0.001 | p<0.001 | p<0.001 | p=0.084 |
| 2026-02-09 | 32 | p<0.001 | p<0.001 | p<0.001 | p=0.195 |
| 2026-02-12 | 35 | p<0.001 | p<0.001 | p<0.001 | p=0.101 |
| 2026-02-14 | 37 | p<0.001 | p<0.001 | p<0.001 | p=0.041 |
| 2026-02-15 | 38 | p<0.001 | p<0.001 | p<0.001 | p=0.017 |
| 2026-02-16 | 39 | p<0.001 | p<0.001 | p<0.001 | p=0.012 |
| 2026-02-18 | 41 | p<0.001 | p<0.001 | p<0.001 | p=0.044 |
| 2026-02-19 | 42 | p<0.001 | p<0.001 | p<0.001 | p=0.102 |
| 2026-02-21 | 44 | p<0.001 | p<0.001 | p<0.001 | p=0.026 |
| 2026-02-22 | 45 | p<0.001 | p<0.001 | p<0.001 | p=0.066 |
| 2026-02-26 | 49 | p<0.001 | p<0.001 | p<0.001 | p=0.076 |
| 2026-02-28 | 51 | p<0.001 | p<0.001 | p<0.001 | p=0.232 |
False Positive Rate Summary
| Metric | Significant | Total | FPR | Assessment |
|---|---|---|---|---|
| HRV (RMSSD) | 20 | 20 | 100% | Liberal (FPR=100%, binom p<0.001) |
| Lowest HR | 20 | 20 | 100% | Liberal (FPR=100%, binom p<0.001) |
| Average HR | 20 | 20 | 100% | Liberal (FPR=100%, binom p<0.001) |
| Sleep Efficiency | 12 | 20 | 60% | Liberal (FPR=60%, binom p<0.001) |
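The `binom p` column can be checked by hand. The sketch below is one plausible reading of it, not the report's actual code: it assumes a one-sided (upper-tail) binomial test against the nominal 5% rate, and it treats the 20 placebo tests as independent, which is only an approximation because they all split the same underlying series. The function name `binom_upper_tail` is hypothetical.

```python
from math import comb

def binom_upper_tail(k: int, n: int, p: float = 0.05) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    significant placebo tests if the true false positive rate were p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# (significant, total) counts copied from the summary table above.
counts = {
    "HRV (RMSSD)": (20, 20),
    "Lowest HR": (20, 20),
    "Average HR": (20, 20),
    "Sleep Efficiency": (12, 20),
}

for metric, (k, n) in counts.items():
    tail = binom_upper_tail(k, n)
    print(f"{metric}: FPR={k / n:.0%}, upper-tail binomial p={tail:.3g}")
```

Under these assumptions every metric's tail probability is far below 0.001, which is why the table reports it as a bound rather than an exact value.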
P-Value Distribution Under Null
Methodology
Purpose: Placebo (falsification) tests check whether the statistical methods used for treatment-effect estimation produce false positives at the expected nominal rate. If the observed false positive rate matches the nominal rate, the p-values from the real analysis can be trusted; if it does not, they need to be read with caution.
Method: 20 random dates were drawn (seed=42) from the pre-treatment period (2026-01-08 to 2026-03-15), each at least 14 days from the window edges. The pre-treatment data was split at each placebo date, and a two-sided Mann-Whitney U test compared the two segments for each metric. CausalImpact was not available in this environment.
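The procedure can be sketched with the standard library alone. This is an illustration under stated assumptions, not the report's implementation: the helper names (`draw_placebo_dates`, `mann_whitney_u`, `placebo_test`) are hypothetical, the Mann-Whitney p-value uses the normal approximation with tie and continuity corrections rather than whatever a statistics library computes, the placebo date itself is assigned to the "after" segment, and Python's `random` with seed 42 is not expected to reproduce the dates listed in the table. The series below is synthetic noise, not the report's data.

```python
import math
import random
from datetime import date, timedelta

def draw_placebo_dates(start, end, n=20, margin_days=14, seed=42):
    """Sample n distinct dates at least margin_days inside [start, end]."""
    first = start + timedelta(days=margin_days)
    last = end - timedelta(days=margin_days)
    eligible = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    return sorted(random.Random(seed).sample(eligible, n))

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i                           # size of this tie group
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = max((abs(u1 - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return u1, math.erfc(z / math.sqrt(2))  # two-sided tail of N(0, 1)

def placebo_test(series, placebo_date):
    """Split a {date: value} series at placebo_date and return the p-value."""
    before = [v for d, v in series.items() if d < placebo_date]
    after = [v for d, v in series.items() if d >= placebo_date]
    return mann_whitney_u(before, after)[1]

start, end = date(2026, 1, 8), date(2026, 3, 15)
dates = draw_placebo_dates(start, end)

# Synthetic stand-in for one metric (NOT the report's measurements).
rng = random.Random(0)
series = {start + timedelta(days=i): rng.gauss(50, 5)
          for i in range((end - start).days + 1)}

p_values = [placebo_test(series, d) for d in dates]
print(sum(p < 0.05 for p in p_values), "of", len(p_values), "placebo tests significant")
```

Because all 20 splits reuse the same series, their p-values are correlated, so the count of significant results varies more from run to run than 20 independent tests would.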
Expected result: At a significance level of 0.05, about 5% of placebo tests (1 of 20) should be significant by chance alone. If the observed FPR is much higher, the real treatment-effect p-values may be overconfident.
Interpretation:
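- Well-calibrated: FPR within ~2 percentage points of 5% (with 20 tests per metric, FPR moves in 5-point steps, so only 1 significant result out of 20 falls in this band)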
- Conservative: FPR near 0% (tests are too strict, may miss real effects)
- Liberal: FPR statistically significantly above 5% by a binomial test (tests find "effects" where none exist)
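The three categories above can be written as a small decision rule. The sketch below is one possible encoding, not the report's own logic: the function name `assess_calibration`, the 0.05 cutoff on the upper-tail binomial test, and the fallback label "Elevated (not significant)" for rates above the band that the binomial test does not flag are all assumptions introduced for illustration.

```python
from math import comb

def assess_calibration(significant: int, total: int,
                       nominal: float = 0.05, tolerance: float = 0.02) -> str:
    """Classify an observed false positive rate against the nominal rate."""
    fpr = significant / total
    if abs(fpr - nominal) <= tolerance:
        return "Well-calibrated"
    if fpr < nominal:
        return "Conservative"
    # Upper-tail binomial probability of this many or more false positives.
    tail = sum(comb(total, i) * nominal**i * (1 - nominal) ** (total - i)
               for i in range(significant, total + 1))
    # Hypothetical fallback label: above the band but not statistically so.
    return "Liberal" if tail < 0.05 else "Elevated (not significant)"

for k in (0, 1, 2, 12, 20):
    print(f"{k}/20 significant -> {assess_calibration(k, 20)}")
```

With 20 tests per metric, the rule returns "Liberal" for the 12/20 and 20/20 counts reported in the summary table.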