Treatment Response Detection | Oura Digital Twin

Executive Summary

SIGNIFICANT CHANGES

3 / 6

Patient 1 post-Rux (Bonferroni-corrected)

IMPROVED METRICS

6 / 6

Direction of change post-treatment

P2 EVENTS

12 detected

High-confidence consensus changepoints

DETECTION METHODS

4 methods

PELT + CUSUM + BOCPD + Rolling Window

HENRIK TREATMENT

Patient 1: Treatment Response Analysis

HRV (RMSSD)

+33.6% change

Pre: 9.0 (n=64) | Post: 12.1 (n=24) | p = 0.006 (corrected) | d = -0.87 (large)

LOWEST HEART RATE

-5.9% change

Pre: 76.7 (n=64) | Post: 72.2 (n=24) | p = 0.010 (corrected) | d = 0.81 (large)

AVERAGE HEART RATE

-5.3% change

Pre: 85.2 (n=64) | Post: 80.6 (n=24) | p = 0.014 (corrected) | d = 0.68 (medium)

SLEEP EFFICIENCY

+1.4% change

Pre: 78.6 (n=64) | Post: 79.8 (n=24) | p = 0.654 (corrected) | d = -0.30 (small)

DEEP SLEEP

+2.3% change

Pre: 1.1 (n=64) | Post: 1.2 (n=24) | p = 1.000 (corrected) | d = -0.07 (negligible)

DAILY STEPS

+60.0% change

Pre: 2390.7 (n=67) | Post: 3825.8 (n=28) | p = 1.000 (corrected) | d = -0.63 (medium)

Metric	Pre-Acute (< 2026-02-09)	Post-Acute / Pre-Rux (≥ 2026-02-09, < 2026-03-16)	Post-Rux (≥ 2026-03-16)
HRV (RMSSD, ms)	7.6 (n=30)	10.4 (n=34)	12.1 (n=24)
Lowest Heart Rate (bpm)	79.6 (n=30)	74.1 (n=34)	72.2 (n=24)
Average Heart Rate (bpm)	88.7 (n=30)	82.0 (n=34)	80.6 (n=24)
Sleep Efficiency (%)	78.1 (n=30)	79.1 (n=34)	79.8 (n=24)
Deep Sleep (hours)	1.2 (n=30)	1.0 (n=34)	1.2 (n=24)
Daily Steps	2473.1 (n=32)	2315.3 (n=35)	3825.8 (n=28)
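The n values suggest that each card's "Pre" figure pools the two pre-Rux phases in this table (30 + 34 = 64 days). Assuming an n-weighted mean, a worked check for Lowest Heart Rate reproduces both the card's Pre value and its percent change:

```latex
\bar{x}_{\text{pre}} = \frac{30 \times 79.6 + 34 \times 74.1}{30 + 34}
  = \frac{2388 + 2519.4}{64} \approx 76.7,
\qquad
\Delta = \frac{72.2 - 76.7}{76.7} \times 100\% \approx -5.9\%
```

The same weighting matches the other rows to within rounding, e.g. Daily Steps: (32 × 2473.1 + 35 × 2315.3) / 67 ≈ 2390.7.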

MITCHELL CHANGEPOINTS

Patient 2: Discovered Changepoints

Date	Score	Confidence	Methods	Metrics
2021-05-31	8	HIGH	cusum, rolling_window	deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2021-10-29	8	HIGH	cusum, rolling_window	deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2022-01-23	8	HIGH	cusum, rolling_window	deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2022-05-06	8	HIGH	cusum, rolling_window	deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2023-07-06	8	HIGH	cusum, rolling_window	deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2023-12-30	8	HIGH	cusum, rolling_window	deep_sleep_hours, efficiency, hr_average, hr_lowest, hrv_average, steps
2022-08-08	7	HIGH	cusum, rolling_window	deep_sleep_hours, efficiency, hr_average, hr_lowest, steps
2024-06-03	6	HIGH	cusum, rolling_window	hr_average, hr_lowest, hrv_average, steps
2025-02-03	6	HIGH	cusum, rolling_window	efficiency, hr_average, hr_lowest, hrv_average
2024-01-22	5	HIGH	cusum, rolling_window	hr_average, hr_lowest, hrv_average
2025-02-11	5	HIGH	cusum, rolling_window	efficiency, hr_average, hrv_average
2025-03-25	5	HIGH	cusum, rolling_window	efficiency, hr_average, hrv_average
2021-02-20	2	LOW	cusum	steps
2025-12-02	2	LOW	cusum	hrv_average

Comparative Distributions

Multi-Metric Convergence

Patient 1: 4 days with 3+ metrics deviating beyond 1.5 SD (systemic shift events).

Patient 2: 28 days with 3+ metrics deviating beyond 1.5 SD (systemic shift events).

Patient 2: 2 days with 3+ metrics deviating beyond 1.5 SD (systemic shift events).
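A convergence count of this kind can be sketched as below. The baseline is not specified here, so the sketch assumes each metric is z-scored against its own mean and sample SD over the full series, with NaN marking a missing day; the function and metric names are hypothetical.

```python
import math
import statistics


def systemic_shift_days(metrics, z_threshold=1.5, min_metrics=3):
    """Return day indices where at least `min_metrics` metrics sit more than
    `z_threshold` SDs from that metric's own mean (NaN = missing day)."""
    n_days = len(next(iter(metrics.values())))
    counts = [0] * n_days
    for values in metrics.values():
        observed = [v for v in values if not math.isnan(v)]
        if len(observed) < 2:
            continue  # not enough data to estimate a spread
        mu, sd = statistics.fmean(observed), statistics.stdev(observed)
        if sd == 0:
            continue  # a constant metric cannot deviate
        for day, v in enumerate(values):
            if not math.isnan(v) and abs(v - mu) / sd > z_threshold:
                counts[day] += 1
    return [day for day, c in enumerate(counts) if c >= min_metrics]


spike = [0.0] * 10
spike[5] = 10.0
example = {"hrv": spike, "hr_lowest": list(spike), "hr_average": list(spike),
           "steps": [1.0] * 10}
print(systemic_shift_days(example))
```

Here day 5 deviates in three metrics at once, so it counts as one systemic shift day; a single-metric spike would not.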

Methods Appendix

Changepoint Detection Methods

PELT (Penalized Exact Linear Time): Uses the ruptures library with an RBF kernel cost to find the optimal set of changepoints. Missing values (NaN) are interpolated and each signal is standardized before fitting. The penalty is BIC-derived: 2 * log(n) * variance, where n is the number of observations.
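The PELT step can be sketched without dependencies as follows. Two swaps keep it self-contained and should be read as assumptions: an L2 (mean-shift) segment cost stands in for the RBF kernel, and NaN gaps are filled by linear interpolation. With ruptures installed, the kernel version is along the lines of `rpt.Pelt(model="rbf").fit(signal).predict(pen=pen)`, whose result also includes the final index `len(signal)`; the helper names here are hypothetical.

```python
import math
import statistics


def interpolate_nans(x):
    """Linearly fill NaN gaps; edge NaNs copy the nearest observed value.
    Assumes at least one non-NaN value."""
    observed = [i for i, v in enumerate(x) if not math.isnan(v)]
    out = list(x)
    for i, v in enumerate(x):
        if not math.isnan(v):
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            out[i] = x[right]
        elif right is None:
            out[i] = x[left]
        else:
            w = (i - left) / (right - left)
            out[i] = x[left] + w * (x[right] - x[left])
    return out


def pelt_l2(x, pen, min_size=2):
    """PELT (Killick et al., 2012) with an L2 segment cost.
    Returns the start index of each new segment."""
    n = len(x)
    s1, s2 = [0.0], [0.0]  # prefix sums give O(1) segment costs
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(a, b):  # sum of squared deviations from the mean of x[a:b]
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / (b - a)

    F = [math.inf] * (n + 1)  # F[t] = optimal penalized cost of x[:t]
    F[0] = -pen
    last = [0] * (n + 1)      # last[t] = start of the final segment of x[:t]
    candidates = []
    for t in range(min_size, n + 1):
        tau_new = t - min_size  # split point that just became admissible
        if tau_new == 0 or tau_new >= min_size:
            candidates.append(tau_new)
        totals = [(F[tau] + cost(tau, t) + pen, tau) for tau in candidates]
        F[t], last[t] = min(totals)
        # Pruning: a split already worse than F[t] before paying another
        # penalty can never become optimal later.
        candidates = [tau for total, tau in totals if total - pen <= F[t]]

    changepoints, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)


def detect_pelt(raw):
    filled = interpolate_nans(raw)
    mu = statistics.fmean(filled)
    sd = statistics.pstdev(filled) or 1.0  # guard against constant signals
    z = [(v - mu) / sd for v in filled]
    pen = 2 * math.log(len(z)) * statistics.pvariance(z)  # penalty form from the text
    return pelt_l2(z, pen)


raw = [0.0] * 30 + [5.0] * 30
raw[10] = float("nan")
print(detect_pelt(raw))
```

Because the data are standardized first, the variance term equals 1 in this sketch and the penalty reduces to 2 log n.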

CUSUM (Cumulative Sum): Computes the cumulative sum of deviations from the overall mean and treats sign changes in its second derivative as candidate inflection points. Candidates are kept only if they pass a magnitude threshold of 0.5 SD.
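The text does not say how the cumulative sum is smoothed before differentiating, and on noisy daily data its raw second difference (which equals x[t+1] - x[t]) changes sign very often. This sketch therefore swaps in the classic CUSUM estimate instead: the split is placed where the cumulative sum of deviations peaks in magnitude, applied recursively (binary segmentation), and a split is kept only if the mean shift across it exceeds 0.5 SD of the whole series. That reading of the magnitude filter and the `min_size` parameter are assumptions.

```python
import statistics


def best_cusum_split(seg, min_size):
    """Split index k maximizing |S_k|, where S_k = sum(seg[:k]) - k * mean(seg)."""
    m = statistics.fmean(seg)
    s, best_k, best_val = 0.0, None, -1.0
    for k in range(1, len(seg)):
        s += seg[k - 1] - m  # running cumulative sum of deviations
        if min_size <= k <= len(seg) - min_size and abs(s) > best_val:
            best_k, best_val = k, abs(s)
    return best_k


def cusum_changepoints(x, threshold_sd=0.5, min_size=5):
    """Binary segmentation on the CUSUM extremum; keep a split only if the
    mean shift across it exceeds threshold_sd * SD of the whole series."""
    sd = statistics.pstdev(x)
    found = []

    def recurse(lo, hi):
        if hi - lo < 2 * min_size:
            return
        k = lo + best_cusum_split(x[lo:hi], min_size)
        shift = statistics.fmean(x[k:hi]) - statistics.fmean(x[lo:k])
        if abs(shift) > threshold_sd * sd:
            found.append(k)
            recurse(lo, k)
            recurse(k, hi)

    recurse(0, len(x))
    return sorted(found)


print(cusum_changepoints([0.0] * 20 + [3.0] * 20))
```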

BOCPD (Bayesian Online Change Point Detection): Implements Adams & MacKay (2007) with a Normal-Gamma conjugate prior. The hazard rate is set to 1/30 for Patient 1 (shorter observation window) and 1/50 for Patient 2 (longer data span). Changepoints are reported where the posterior probability exceeds 0.3.
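A minimal run-length filter in the spirit of Adams & MacKay is sketched below, assuming standardized input and hypothetical prior hyperparameters (`mu0`, `kappa0`, `alpha0`, `beta0`). Which posterior probability is compared with 0.3 is not stated. Under a constant hazard H, the normalized posterior mass on run length 0 always equals H, so this sketch instead uses a hypothetical lagged rule: index s is reported once P(run length = `lag`) exceeds the threshold, `lag` observations after s.

```python
import math


def student_t_logpdf(x, df, loc, scale):
    """Log density of a location-scale Student-t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))


def bocpd(x, hazard=1 / 50, lag=5, threshold=0.3,
          mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """probs[r] = P(current run holds the last r observations | data so far)."""
    probs = [1.0]
    mu, kappa, alpha, beta = [mu0], [kappa0], [alpha0], [beta0]
    changepoints = []
    for t, xt in enumerate(x):
        # Posterior predictive of x_t under each run length (Student-t).
        pred = [math.exp(student_t_logpdf(
                    xt, 2 * a, m, math.sqrt(b * (k + 1) / (a * k))))
                for m, k, a, b in zip(mu, kappa, alpha, beta)]
        joint = [p * q for p, q in zip(probs, pred)]
        growth = [j * (1 - hazard) for j in joint]  # current run continues
        reset = sum(joint) * hazard                   # a new run begins
        new = [reset] + growth
        total = sum(new)
        probs = [p / total for p in new]

        # Conjugate Normal-Gamma update; run length 0 restarts from the prior.
        beta = [beta0] + [b + k * (xt - m) ** 2 / (2 * (k + 1))
                          for m, k, b in zip(mu, kappa, beta)]
        mu = [mu0] + [(k * m + xt) / (k + 1) for m, k in zip(mu, kappa)]
        kappa = [kappa0] + [k + 1 for k in kappa]
        alpha = [alpha0] + [a + 0.5 for a in alpha]

        start = t - lag + 1  # first index of a run that is `lag` long at time t
        if start > 0 and len(probs) > lag and probs[lag] > threshold:
            changepoints.append(start)
    return changepoints


print(bocpd([0.0] * 40 + [5.0] * 40, hazard=1 / 50))
```

For Patient 1 the text's hazard would be passed as `hazard=1 / 30`. The run-length list grows by one entry per observation; long series are usually truncated to a maximum run length, which this sketch omits.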

Rolling Window Comparison: Adjacent 14-day windows are compared using Welch's t-test and Cohen's d. A date is flagged only when p < 0.01 and |d| > 0.5, so both statistical significance and a practically meaningful effect size are required.
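The rolling comparison can be sketched with only the standard library. The Student-t tail probability comes from the regularized incomplete beta function, evaluated by the usual continued fraction (Lentz's method); with SciPy available, `scipy.stats.ttest_ind(before, after, equal_var=False)` runs the same Welch test. Treating the flagged index as the first day of the later window is an assumption.

```python
import math
import statistics

TINY = 1e-300


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > TINY else TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > TINY else TINY
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    return betainc_reg(df / 2.0, 0.5, df / (df + t * t))


def welch_p(x, y):
    """Two-sided p-value of Welch's unequal-variance t-test."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    diff = statistics.fmean(y) - statistics.fmean(x)
    if vx + vy == 0:
        return 1.0 if diff == 0 else 0.0  # both windows constant
    t = diff / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t_two_sided_p(t, df)


def cohens_d(x, y):
    """(mean(y) - mean(x)) / pooled SD."""
    nx, ny = len(x), len(y)
    pooled = math.sqrt(((nx - 1) * statistics.variance(x)
                        + (ny - 1) * statistics.variance(y)) / (nx + ny - 2))
    return (statistics.fmean(y) - statistics.fmean(x)) / pooled if pooled else 0.0


def rolling_window_flags(series, window=14, alpha=0.01, min_d=0.5):
    flagged = []
    for i in range(window, len(series) - window + 1):
        before, after = series[i - window:i], series[i:i + window]
        if welch_p(before, after) < alpha and abs(cohens_d(before, after)) > min_d:
            flagged.append(i)
    return flagged
```

A single level shift typically produces a run of adjacent flagged dates around the true change, which is one reason detections are later clustered for consensus.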

Statistical Tests

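Pre/Post Comparison: Mann-Whitney U test (non-parametric, two-sided) with Bonferroni correction for 6 simultaneous comparisons; each p-value is multiplied by 6 and capped at 1.0, which is why some corrected values read 1.000. Effect size: Cohen's d with pooled standard deviation, computed as pre minus post, so an increase after treatment gives a negative d. Confidence intervals: bootstrap with 1,000 iterations.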
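The pre/post statistics can be sketched as follows. Assumptions: the Mann-Whitney p-value uses the normal approximation with tie and continuity corrections (SciPy's `scipy.stats.mannwhitneyu(pre, post, alternative="two-sided")` is a library alternative), and the bootstrap interval is a percentile interval for mean(post) - mean(pre); the text does not name the bootstrapped statistic.

```python
import math
import random
import statistics


def rank_with_ties(values):
    """1-based ranks, averaging over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based position
        i = j + 1
    return ranks


def mann_whitney_p(pre, post):
    """Two-sided p-value, normal approximation with tie/continuity corrections."""
    n1, n2 = len(pre), len(post)
    n = n1 + n2
    pooled = list(pre) + list(post)
    u1 = sum(rank_with_ties(pooled)[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return 1.0  # all values tied: no evidence of a shift
    z = max(abs(u1 - mu) - 0.5, 0.0) / sigma
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def bonferroni(p, m=6):
    return min(1.0, p * m)


def cohens_d_pre_minus_post(pre, post):
    n1, n2 = len(pre), len(post)
    pooled = math.sqrt(((n1 - 1) * statistics.variance(pre)
                        + (n2 - 1) * statistics.variance(post)) / (n1 + n2 - 2))
    return (statistics.fmean(pre) - statistics.fmean(post)) / pooled


def bootstrap_ci(pre, post, iterations=1000, level=0.95, seed=0):
    """Percentile interval for mean(post) - mean(pre)."""
    rng = random.Random(seed)
    diffs = sorted(
        statistics.fmean(rng.choices(post, k=len(post)))
        - statistics.fmean(rng.choices(pre, k=len(pre)))
        for _ in range(iterations))
    lo_idx = int((1 - level) / 2 * iterations)
    hi_idx = int((1 + level) / 2 * iterations) - 1
    return diffs[lo_idx], diffs[hi_idx]


pre, post = [1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0]
print(bonferroni(mann_whitney_p(pre, post)), cohens_d_pre_minus_post(pre, post))
```

The pre-minus-post convention for d matches the cards, where rising HRV pairs with a negative d and falling heart rate with a positive one.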

Consensus Scoring: For Patient 2, all (method × metric) detections are clustered using a 3-day tolerance window. Each cluster's consensus score is the number of unique methods plus the number of unique metrics that detect it (e.g. 2 methods across 6 metrics gives a score of 8). High confidence = score ≥ 3.
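The consensus scoring can be sketched as below. Assumptions: clusters are formed by single linkage (a detection joins the current cluster if it falls within 3 days of the previous detection), and the earliest date represents each cluster; the function name is hypothetical. The scoring rule itself reproduces the table, where 2 methods across 6 metrics score 8 and a lone CUSUM hit on one metric scores 2.

```python
from datetime import date, timedelta


def consensus_clusters(detections, tolerance_days=3, high_threshold=3):
    """detections: iterable of (date, method, metric) tuples."""
    tol = timedelta(days=tolerance_days)
    clusters = []
    for det in sorted(detections):
        # Chain onto the current cluster if close to its latest detection.
        if clusters and det[0] - clusters[-1][-1][0] <= tol:
            clusters[-1].append(det)
        else:
            clusters.append([det])
    results = []
    for members in clusters:
        methods = {m for _, m, _ in members}
        metrics = {k for _, _, k in members}
        score = len(methods) + len(metrics)
        results.append({
            "date": members[0][0],
            "score": score,
            "confidence": "HIGH" if score >= high_threshold else "LOW",
            "methods": sorted(methods),
            "metrics": sorted(metrics),
        })
    return results


names = ["deep_sleep_hours", "efficiency", "hr_average",
         "hr_lowest", "hrv_average", "steps"]
dets = [(date(2021, 5, 31), "cusum", m) for m in names]
dets += [(date(2021, 6, 2), "rolling_window", m) for m in names]
dets += [(date(2021, 2, 20), "cusum", "steps")]
for row in consensus_clusters(dets):
    print(row["date"], row["score"], row["confidence"])
```

Single linkage can chain detections spread over more than 3 days into one cluster; anchoring each cluster to its first date would be a stricter alternative.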