Oura Ring Gen 4 sensor data, not clinical measurements
N=1 case study, not validated for clinical decisions
HEV diagnosed Mar 18; Day 109 post-ruxolitinib
Consumer wearable data can support exploratory review only. The HEV diagnosis nearly coincides with treatment start, so its effects cannot be separated from the treatment response; it remains a material confounder.

Treatment Response Detection

Module 2: Comparative Changepoint Analysis

Executive Summary

Significant changes: 3 / 6 (Patient 1 post-Rux, Bonferroni-corrected)
Improved metrics: 6 / 6 (direction of change post-treatment)
Patient 2 events: 12 detected (high-confidence consensus changepoints)
Detection methods: 4 (PELT + CUSUM + BOCPD + Rolling Window)

Patient 1: Treatment Response Analysis

Metric | Change | Pre | Post | p (corrected) | Cohen's d
HRV (RMSSD) | +33.6% | 9.0 (n=64) | 12.1 (n=24) | 0.006 | -0.87 (large)
Lowest Heart Rate | -5.9% | 76.7 (n=64) | 72.2 (n=24) | 0.010 | 0.81 (large)
Average Heart Rate | -5.3% | 85.2 (n=64) | 80.6 (n=24) | 0.014 | 0.68 (medium)
Sleep Efficiency | +1.4% | 78.6 (n=64) | 79.8 (n=24) | 0.654 | -0.30 (small)
Deep Sleep | +2.3% | 1.1 (n=64) | 1.2 (n=24) | 1.000 | -0.07 (negligible)
Daily Steps | +60.0% | 2390.7 (n=67) | 3825.8 (n=28) | 1.000 | -0.63 (medium)

Metric | Pre-Acute (< 2026-02-09) | Post-Acute / Pre-Rux (2026-02-09 - 2026-03-16) | Post-Rux (≥ 2026-03-16)
HRV (RMSSD) | 7.6 (n=30) | 10.4 (n=34) | 12.1 (n=24)
Lowest Heart Rate | 79.6 (n=30) | 74.1 (n=34) | 72.2 (n=24)
Average Heart Rate | 88.7 (n=30) | 82.0 (n=34) | 80.6 (n=24)
Sleep Efficiency | 78.1 (n=30) | 79.1 (n=34) | 79.8 (n=24)
Deep Sleep | 1.2 (n=30) | 1.0 (n=34) | 1.2 (n=24)
Daily Steps | 2473.1 (n=32) | 2315.3 (n=35) | 3825.8 (n=28)

Patient 2: Discovered Changepoints

Date | Score | Confidence | Methods | Metrics
2021-05-31 | 8 | HIGH | cusum, rolling_window | deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2021-10-29 | 8 | HIGH | cusum, rolling_window | deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2022-01-23 | 8 | HIGH | cusum, rolling_window | deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2022-05-06 | 8 | HIGH | cusum, rolling_window | deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2023-07-06 | 8 | HIGH | cusum, rolling_window | deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2023-12-30 | 8 | HIGH | cusum, rolling_window | deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2022-08-08 | 7 | HIGH | cusum, rolling_window | deep_sleep_hours, efficiency, hr_average, hr_lowest, steps
2024-06-03 | 6 | HIGH | cusum, rolling_window | hr_average, hr_lowest, hrv_average, steps
2025-02-03 | 6 | HIGH | cusum, rolling_window | efficiency, hr_average, hr_lowest, hrv_average
2024-01-22 | 5 | HIGH | cusum, rolling_window | hr_average, hr_lowest, hrv_average
2025-02-11 | 5 | HIGH | cusum, rolling_window | efficiency, hr_average, hrv_average
2025-03-25 | 5 | HIGH | cusum, rolling_window | efficiency, hr_average, hrv_average
2021-02-20 | 2 | low | cusum | steps
2025-12-02 | 2 | low | cusum | hrv_average

Comparative Distributions


Multi-Metric Convergence

Patient 1: 4 days with 3+ metrics deviating beyond 1.5 SD (systemic shift events).

Patient 2: 28 days with 3+ metrics deviating beyond 1.5 SD (systemic shift events).

Patient 2: 2 days with 3+ metrics deviating beyond 1.5 SD (systemic shift events).
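
The convergence count above can be sketched as follows. This is a minimal illustration, not the report's implementation: the function name convergence_days, the use of each metric's whole-series mean and sample SD as the baseline, and the demo values are all assumptions. The 1.5 SD threshold and the 3-metric minimum come from the text.

```python
from statistics import mean, stdev


def convergence_days(series_by_metric, z_threshold=1.5, min_metrics=3):
    """Return day indices on which at least `min_metrics` metrics deviate
    more than `z_threshold` SDs from their own mean.

    series_by_metric maps a metric name to a list of daily values of equal
    length; None marks a missing day. The whole-series baseline is an
    assumption of this sketch.
    """
    n_days = len(next(iter(series_by_metric.values())))
    deviating = [0] * n_days
    for values in series_by_metric.values():
        observed = [v for v in values if v is not None]
        mu, sd = mean(observed), stdev(observed)
        if sd == 0:
            continue  # a constant series can never deviate
        for day, v in enumerate(values):
            if v is not None and abs(v - mu) / sd > z_threshold:
                deviating[day] += 1
    return [day for day, count in enumerate(deviating) if count >= min_metrics]


# Hypothetical demo data: three metrics shift together on day 7,
# while efficiency deviates alone on day 3.
demo = {
    "hrv_average": [10, 11, 10, 9, 10, 11, 10, 20, 10, 9],
    "hr_lowest": [72, 71, 73, 72, 72, 71, 72, 60, 73, 72],
    "steps": [3000, 3100, 2900, 3000, None, 3050, 2950, 8000, 3000, 3100],
    "efficiency": [80, 79, 81, 70, 80, 79, 81, 80, 80, 79],
}
print(convergence_days(demo))
```

A single-metric outlier (day 3 in the demo) is not counted; only days where three or more metrics move together qualify as a systemic shift event.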


Methods Appendix

Changepoint Detection Methods

PELT (Penalized Exact Linear Time): Uses the ruptures library with RBF kernel to detect optimal changepoints. Signals are interpolated (for NaN) and standardized before fitting. Penalty is derived from BIC: 2 * log(n) * variance.

CUSUM (Cumulative Sum): Computes the cumulative sum of deviations from the overall mean. Second-derivative sign changes identify inflection points. Filtered by magnitude threshold (0.5 SD).

BOCPD (Bayesian Online Change Point Detection): Implements Adams & MacKay (2007) with a Normal-Gamma conjugate prior. The hazard rate is set to 1/30 for Patient 1 (shorter observation window) and 1/50 for Patient 2 (longer data span). Changepoints are reported where the posterior changepoint probability exceeds 0.3.

Rolling Window Comparison: Adjacent 14-day windows compared via Welch's t-test and Cohen's d. Dates flagged where p < 0.01 AND |d| > 0.5, indicating both statistical significance and practical effect size.
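
The rolling window rule can be sketched with the standard library alone. This is an illustration under stated assumptions, not the report's code: all function names are hypothetical, the Welch p-value is obtained from a hand-rolled regularized incomplete beta function (in practice a statistics library would normally supply it), and flagged positions are reported as the index of the first day of the later window.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=200, eps=3e-12, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t with df degrees of freedom."""
    return _betainc(df / 2.0, 0.5, df / (df + t * t))


def welch_p(a, b):
    """Two-sided Welch's t-test p-value (unequal variances)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_two_sided_p(t, df)


def cohens_d(a, b):
    """Cohen's d with pooled SD, computed as mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b))
                       / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


def flag_shifts(values, window=14, alpha=0.01, min_d=0.5):
    """Indices i where the windows before and after i differ on both criteria."""
    flagged = []
    for i in range(window, len(values) - window + 1):
        before, after = values[i - window:i], values[i:i + window]
        if variance(before) + variance(after) == 0:
            continue  # no spread in either window, test is undefined
        if welch_p(before, after) < alpha and abs(cohens_d(before, after)) > min_d:
            flagged.append(i)
    return flagged
```

Requiring both p < 0.01 and |d| > 0.5 keeps the rule from flagging shifts that are statistically detectable but too small to matter. A step change typically flags several consecutive indices around the true boundary, which is one reason nearby detections are later merged into clusters.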

Statistical Tests

Pre/Post Comparison: Mann-Whitney U test (non-parametric, two-sided) with Bonferroni correction for 6 simultaneous comparisons. Effect size: Cohen's d with pooled standard deviation. In the reported values d is computed as pre minus post, so a negative d indicates a higher post-treatment mean. Confidence intervals: bootstrap with 1,000 iterations.
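
A minimal sketch of the pre/post test and the correction step follows. It assumes the normal approximation to the Mann-Whitney U distribution with tie and continuity corrections; the report does not say whether an exact or asymptotic p-value was used. The function names and the raw p-values in the demo are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(pre, post):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic of `pre`, p-value).
    """
    n1, n2 = len(pre), len(post)
    n = n1 + n2
    pooled = list(pre) + list(post)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u1, 1.0  # all values identical
    z = max(abs(u1 - mu) - 0.5, 0.0) / sigma  # continuity correction
    return u1, min(2 * (1 - NormalDist().cdf(z)), 1.0)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {name: min(p * m, 1.0) for name, p in p_values.items()}


# Hypothetical raw p-values for six metrics, for illustration only.
raw = {"hrv": 0.001, "hr_lowest": 0.002, "hr_average": 0.003,
       "efficiency": 0.1, "deep_sleep": 0.5, "steps": 0.3}
print(bonferroni(raw))
```

The cap at 1 is why a corrected value can appear as exactly 1.000: any raw p-value of about 0.167 or more reaches the ceiling once multiplied by 6.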

Consensus Scoring: For Patient 2, all (method × metric) detections are clustered within a 3-day tolerance window. The consensus score is the number of unique methods plus the number of unique metrics detecting each cluster (for example, 2 methods and 6 metrics give a score of 8). High confidence = score >= 3.
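
The scoring rule can be sketched as below. Several details are assumptions of this sketch rather than facts from the report: detections are chained into a cluster when each lies within the tolerance of the previous one, the cluster is labeled with its earliest date, and the function name and demo detections are hypothetical.

```python
from datetime import date


def consensus_clusters(detections, tolerance_days=3, high_threshold=3):
    """Cluster (date, method, metric) detections and score each cluster.

    Score = number of unique methods + number of unique metrics.
    """
    clusters = []
    for day, method, metric in sorted(detections):
        if clusters and (day - clusters[-1]["last"]).days <= tolerance_days:
            cluster = clusters[-1]  # close enough to the previous detection
        else:
            cluster = {"first": day, "last": day,
                       "methods": set(), "metrics": set()}
            clusters.append(cluster)
        cluster["last"] = day
        cluster["methods"].add(method)
        cluster["metrics"].add(metric)

    results = []
    for cluster in clusters:
        score = len(cluster["methods"]) + len(cluster["metrics"])
        results.append({
            "date": cluster["first"],
            "score": score,
            "confidence": "HIGH" if score >= high_threshold else "low",
            "methods": sorted(cluster["methods"]),
            "metrics": sorted(cluster["metrics"]),
        })
    return results


# Hypothetical detections, for illustration only.
demo_detections = [
    (date(2000, 1, 10), "cusum", "hr_average"),
    (date(2000, 1, 11), "rolling_window", "hr_lowest"),
    (date(2000, 1, 12), "cusum", "steps"),
    (date(2000, 3, 1), "cusum", "steps"),
]
for row in consensus_clusters(demo_detections):
    print(row["date"], row["score"], row["confidence"])
```

Under this rule a single method firing on a single metric scores 2 and stays below the high-confidence threshold, so an isolated detection is never promoted on its own.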