When we treat a patient, we are guided by the ethical principle of beneficence: to do good for them. How often do we achieve that goal? The best available data on interventions tested in randomized trials suggest that, even for medications that are efficacious on a population level, our interventions often make no difference on an individual level.
Number Needed to Treat: A Double-Edged Sword
This concept can best be understood using the number needed to treat (NNT).1 If a novel drug has an NNT of ten (a fairly robust effect by most standards), we are implying that we would need to treat ten patients to prevent one from experiencing the outcome of interest. In other words, 90% of individuals who received this highly efficacious drug would actually derive no benefit. Are we meeting our ethical obligation to this 90%? Do we, as physicians, owe our patients an explanation of this phenomenon when we prescribe a medication? Should we say, “This medication probably won’t make a difference in your life, but there is a chance that it may. I suggest you take it”?
The NNT is a particularly meaningful metric in situations where a substantial relative treatment effect exists, but the occurrence of outcomes is low. For example, the Systolic Blood Pressure Intervention Trial was stopped early due to a marked observed treatment effect, with a 5.2% occurrence of the composite cardiovascular end point in the intensive BP control group compared with 6.8% in the control group (P<0.001).2 Although the relative effect size [1−(5.2/6.8)=24%] was large, the absolute effect size (6.8−5.2=1.6%) led to an NNT of around 60, because the vast majority of patients do not experience any of the cardiovascular events in the composite outcome.
How do we reconcile these results? The study was “positive” by conventional statistical definitions; however, we realize that we will need to treat 60 people to prevent one cardiovascular outcome. Does the 1.6% chance of (substantial) benefit outweigh the costs of treating 60 individuals (where costs can be literal dollars or complications and adverse effects of treatment)?
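The arithmetic behind these figures can be reproduced in a few lines. The following is a minimal sketch that uses only the two event rates quoted above; the function name and dictionary keys are illustrative and do not come from the trial report.

```python
def treatment_effect_summary(control_rate: float, treated_rate: float) -> dict:
    """Summarize a two-arm trial result from the event rate in each arm."""
    arr = control_rate - treated_rate  # absolute risk reduction
    if arr <= 0:
        raise ValueError("NNT is only defined when the treated arm has fewer events")
    rrr = arr / control_rate           # relative risk reduction
    nnt = 1.0 / arr                    # number needed to treat
    return {"arr": arr, "rrr": rrr, "nnt": nnt}


# Event rates quoted in the text: 6.8% (control) versus 5.2% (intensive BP control).
sprint = treatment_effect_summary(control_rate=0.068, treated_rate=0.052)
print(f"ARR = {sprint['arr']:.1%}, RRR = {sprint['rrr']:.1%}, NNT = {sprint['nnt']:.1f}")
```

The unrounded NNT is about 62.5, which the text summarizes as "around 60." The same relative effect applied to a lower control event rate would yield a proportionally larger NNT.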
Broadly, the NNT of several common interventions for chronic diseases is depressing. However, if we broaden the perspective to the general population, treatments with high NNTs might still have significant benefits, particularly if the disease is common. Warfarin (versus aspirin) has an NNT of 80 for preventing stroke in atrial fibrillation.3 Yet, we rarely tell patients that, for all but 1.25% of people, this drug (which requires extensive monitoring) will perform no better than aspirin. In the United States, however, hospitalizations with atrial fibrillation as the primary diagnosis number roughly 470,000 per year. This suggests that universal warfarin treatment would lead to >5875 fewer strokes per year, a clear societal benefit, although most treated patients would not experience that benefit individually.
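The individual-level and population-level views of the same NNT can be put side by side. This is a rough sketch with two simplifying assumptions. First, each hospitalization is treated as one patient. Second, the NNT of 80 is assumed to apply uniformly to all of them. The function name is illustrative.

```python
def events_avoided(n_treated: int, nnt: float) -> float:
    """Expected number of events prevented when n_treated people receive the drug."""
    return n_treated / nnt


NNT_WARFARIN_VS_ASPIRIN = 80   # from the text
AF_HOSPITALIZATIONS = 470_000  # per year, from the text

# Individual view: chance that a given treated patient is the one who benefits.
fraction_benefiting = 1 / NNT_WARFARIN_VS_ASPIRIN

# Population view: expected strokes avoided if everyone in the group is treated.
strokes_avoided = events_avoided(AF_HOSPITALIZATIONS, NNT_WARFARIN_VS_ASPIRIN)

print(f"Individual chance of benefit: {fraction_benefiting:.2%}")
print(f"Strokes avoided per year: {strokes_avoided:.0f}")
```

Because hospitalizations undercount the total number of people living with atrial fibrillation, the figure of 5875 is a lower bound, hence the ">" in the text.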
Looking at the NNT shows how our obligations, as physicians, to our patients are potentially in conflict with our obligation to society at large. Can this issue be fixed? Broadly, this is the goal of personalized medicine—to identify the patients who will benefit the most from an intervention. To date, personalized medicine efforts have focused on genetic and biomarker measurement, but clinical data can be used as well, although there are significant hurdles to be crossed.
Risk-Based Targeting of Interventions
The first step is a recognition that the NNT is not uniform across all populations. It depends strongly on the baseline event rate, which may differ on a patient-to-patient level on the basis of a variety of other factors (such as severity of illness, family support, cointerventions, etc.). To use our warfarin example again, among those with a congestive heart failure, hypertension, age, diabetes, stroke (CHADS2) score of one, the NNT of warfarin versus aspirin to prevent stroke is 53. Among those with a CHADS2 score of six, the NNT is eight, suggesting that targeting warfarin to the subpopulation of patients with a high CHADS2 score (as is the common practice) may provide the most overall benefit.
Targeting a subgroup in this way relies on prognostic modeling, which is used frequently. Prognostic models, which are often refined into risk scores and calculators, estimate the probability of a given outcome. They have the advantage that they can be developed on observational data, but they have a significant weakness: not all interventions work better on those at higher risk for future events. This finding was driven home in a paper examining prognostic stratification in 32 large clinical trials. Although the absolute risk reduction was usually higher in the high-risk subgroups, the relative risk reduction was rarely different.4 In other words, prognostic targeting does not seem to identify “responders” (individuals who will disproportionately benefit from the intervention); rather, it identifies those at high baseline risk. Efficient targeting of therapies requires maximizing the absolute risk reduction by identifying both those at high baseline risk and those in whom the relative risk reduction is largest.
A case in point is the use of tamoxifen in women with breast cancer. The drug is uniquely effective among those with hormone receptor–positive breast cancer, a group that, in fact, is at lower risk of breast cancer mortality overall.5 Targeting the use of tamoxifen to women with breast cancer at high risk of death would be ineffective (those at highest risk are hormone receptor negative). This is a case where the relative risk of an outcome is not constant as baseline risk increases.
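The dependence of the NNT on both baseline risk and relative risk reduction can be sketched numerically. All risks and relative risk reductions below are hypothetical values chosen for illustration; they are not taken from the warfarin or tamoxifen literature.

```python
def nnt(baseline_risk: float, relative_risk_reduction: float) -> float:
    """NNT implied by a baseline event risk and a relative risk reduction."""
    arr = baseline_risk * relative_risk_reduction  # absolute risk reduction
    if arr <= 0:
        return float("inf")  # no benefit: no number of treated patients prevents an event
    return 1.0 / arr


# Case 1: the relative risk reduction is constant (hypothetical 25%).
# The NNT falls as baseline risk rises, so prognostic targeting works well.
CONSTANT_RRR = 0.25
constant_rrr_nnt = {risk: nnt(risk, CONSTANT_RRR) for risk in (0.02, 0.05, 0.10, 0.20)}

# Case 2: the relative risk reduction differs by subgroup (hypothetical values).
# The higher-risk group does not respond, so targeting by risk alone fails.
subgroups = {
    "lower risk, responsive": (0.10, 0.30),
    "higher risk, unresponsive": (0.25, 0.00),
}
subgroup_nnt = {name: nnt(risk, rrr) for name, (risk, rrr) in subgroups.items()}

for risk, value in constant_rrr_nnt.items():
    print(f"baseline risk {risk:.0%}: NNT = {value:.0f}")
for name, value in subgroup_nnt.items():
    print(f"{name}: NNT = {value:.0f}")
```

In the first case, a tenfold rise in baseline risk lowers the NNT tenfold. In the second, the subgroup at highest risk has an infinite NNT, which mirrors the tamoxifen example in which the highest-risk patients are the ones who do not respond.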
Individual Treatment Effect Targeting of Therapies
By contrast, individual treatment effect (ITE) modeling attempts to predict who will benefit from a drug or intervention by identifying characteristics associated with greater than average relative risk reduction at the individual level.6,7 Broadly speaking, this is the goal of personalized medicine—where individual patient characteristics are used to predict therapeutic response. Typically, this is presented in a genetic framework, but it is conceivable that any collection of unique patient-level data can be used to identify this subgroup.
A perfect ITE model would allow targeting of an intervention such that the NNT would be one—all patients so targeted would experience the outcome without the intervention, and none would experience it if they received the intervention. However, with current statistical approaches, ITE models cannot be created outside a randomized trial. A comparison of the two modeling strategies is in Table 1. There are multiple statistical approaches to ITE modeling that can be broadly classified into one- or two-equation models. In a typical two-equation model, the first equation models the risk of outcome under control conditions, and the second equation models the risk of outcome under treatment conditions. The difference in predicted outcome is the ITE, where higher values predict a better treatment response. Single-equation models include several machine-learning algorithms, such as the “uplift random forest,”8 as well as models that attempt to group trial participants into “good targets” versus “bad targets” (also known as a class variable transformation).9
Table 1. Prognostic versus individual treatment effect modeling
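The two-equation approach described above can be sketched with a toy randomized trial. This is a simplified illustration, not a recommended implementation: each "equation" here is just the observed event rate within a covariate stratum, whereas a real model would use regression or machine learning over many covariates and would need validation. The trial counts, the `marker_positive` and `marker_negative` strata, and all function names are hypothetical.

```python
from collections import defaultdict


def make_arm(counts):
    """Expand {stratum: (n_events, n_patients)} into one (stratum, event) record per patient."""
    records = []
    for stratum, (n_events, n_patients) in counts.items():
        records += [(stratum, 1)] * n_events + [(stratum, 0)] * (n_patients - n_events)
    return records


def fit_stratum_risk(records):
    """One 'equation': the observed event rate within each stratum of a single trial arm."""
    events = defaultdict(int)
    totals = defaultdict(int)
    for stratum, had_event in records:
        totals[stratum] += 1
        events[stratum] += had_event
    return {stratum: events[stratum] / totals[stratum] for stratum in totals}


def predict_ite(stratum, control_model, treated_model):
    """Predicted risk under control minus predicted risk under treatment."""
    return control_model[stratum] - treated_model[stratum]


# Hypothetical trial: both strata have the same risk under control,
# but only marker-positive patients respond to treatment.
control_arm = make_arm({"marker_positive": (30, 100), "marker_negative": (30, 100)})
treated_arm = make_arm({"marker_positive": (10, 100), "marker_negative": (30, 100)})

control_model = fit_stratum_risk(control_arm)  # equation 1: risk under control
treated_model = fit_stratum_risk(treated_arm)  # equation 2: risk under treatment

ite = {s: predict_ite(s, control_model, treated_model) for s in control_model}
for stratum, value in ite.items():
    print(f"{stratum}: predicted ITE = {value:.2f}")
```

In this example a prognostic model could not separate the two strata, because their risk under control is identical, yet the predicted ITE differs sharply between them. Both equations are fit within the randomized arms, which is why the approach depends on trial data.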
Saving the Most Lives May Entail Treating Fewer People
Our goal in identifying a subpopulation for targeted therapy is not to minimize the NNT. The goal is to maximize the total events avoided (or lives saved) in the entire population regardless of treatment status. The concept is illustrated by a series of curves (known as uplift curves) in Figure 1. Imagine that we have an intervention that, if given to everyone in a population, would save 1000 lives. We can then posit what might happen if we choose (or are forced) to limit who we treat (Figure 1). If we treat randomly, then our population benefit decreases as we give the intervention to fewer and fewer people. If, instead, we treat prognostically, allocating treatment to the sickest individuals, we may save more lives than expected by chance but may miss some individuals with low baseline risk who nevertheless may have been saved by the intervention. However, if we target with ITE modeling, we achieve our maximum population benefit and can even exceed 1000 lives saved by excluding from treatment those who would be killed by the intervention.
Figure 1. Uplift curves demonstrate improvement of outcomes above expected when narrower populations are targeted. The three uplift curves illustrate (using hypothetical data) the number of lives saved under various targeting paradigms. With random targeting of an intervention to a population, the number of lives saved decreases linearly as fewer people are treated. Prognostic targeting may improve on random targeting, but it may fail to treat the individuals with low baseline risk who, nevertheless, would die without the intervention. Individual treatment effect (ITE) models maximize the area under the uplift curve by targeting those most likely to benefit from an intervention. This graph posits a “perfect” ITE model, which is unlikely to ever be created, but nevertheless, it shows how this approach may be superior to prognostic modeling when it comes to population benefit. Dashed lines highlight outcomes when 90% and 10% of the eligible population are targeted. ITE models may improve on a “treat everyone” approach by excluding individuals who may be harmed by the intervention, and they may preserve performance over narrower target ranges by excluding individuals whose outcomes will not be affected by the intervention.
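The three targeting strategies can be compared numerically with an invented population. The sketch below assumes 10,000 eligible patients in whom treating everyone saves a net 1000 lives, as in the thought experiment above; every count is hypothetical. Each strategy is represented as an ordered list of tiers, treated from first to last, and patients within a tier are assumed to be chosen at random.

```python
def lives_saved(tiers, n_treated):
    """Expected net lives saved when the first n_treated patients, in tier order, are treated.

    Each tier is (n_patients, net_lives_saved_if_whole_tier_is_treated).
    """
    saved = 0.0
    remaining = n_treated
    for n_patients, net_lives in tiers:
        if remaining <= 0:
            break
        take = min(n_patients, remaining)
        saved += net_lives * take / n_patients  # a partly treated tier contributes pro rata
        remaining -= take
    return saved


# Hypothetical population: 1100 patients are saved by treatment, 100 are killed
# by it, and 8800 are unaffected, for a net 1000 lives saved if all are treated.
RANDOM = [(10_000, 1000)]                               # no ordering at all
PROGNOSTIC = [(2_000, 700), (8_000, 300)]               # high-risk tier first, then low-risk
PERFECT_ITE = [(1_100, 1100), (8_800, 0), (100, -100)]  # responders, unaffected, harmed

for share in (1.0, 0.9, 0.1):
    n = round(share * 10_000)
    print(
        f"treat {share:.0%}: random = {lives_saved(RANDOM, n):.1f}, "
        f"prognostic = {lives_saved(PROGNOSTIC, n):.1f}, "
        f"perfect ITE = {lives_saved(PERFECT_ITE, n):.1f}"
    )
```

With these invented numbers, treating only 10% of the population saves 100 lives under random targeting, 350 under prognostic targeting, and 1000 under a perfect ITE model. At 90% coverage the perfect ITE model reaches 1100, exceeding the treat-everyone total because the harmed patients are left untreated.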
Aside from the requirement for randomization, ITE models suffer from a low signal-to-noise ratio, because factors not related to the randomized intervention (like patient age and comorbidities) are often much more strongly related to the ultimate outcome than the intervention itself. The risk of “learning” noise is thus greater than what is seen in prognostic modeling. This means that proactive steps to avoid model overfitting are critical. Steps that are useful include ensuring adequate (and multiple) internal crossvalidations and mechanisms to combine sparse data into aggregate metrics (such as denoising autoencoders, a common machine-learning technique).10
ITE modeling can incorporate many covariates into a single score. Rather than noting that a treatment seems to uniquely benefit patients with diabetes, for example, one could include diabetes, congestive heart failure, or any other collected baseline variables (even gene variants) in the model. This avoids the multiple comparisons issue that can arise when multiple subgroup analyses are conducted. Beyond that, it would allow for true individualization of therapy, and it would allow a doctor to inform a patient of his or her individual chance of success with the drug, thereby balancing patient preferences with clinical benefits.
Finally, ITE modeling does not necessarily incorporate harms distinct from the outcome of interest. For example, an ITE model trained to identify which patients would benefit from an intervention in terms of avoiding initiation of dialysis would, by default, identify those for whom the intervention would hasten dialysis, but experiencing the primary outcome is rarely the only harm from an intervention. As such, ITE models should be targeted to outcomes (like death), where there is tolerance for potential harm provided that benefit is maximized.
The Future of ITE Modeling
ITE modeling will be most useful in situations where the costs of an intervention (in terms of potential patient harms or the broader cost to society) are significant or where resources are limited. In nephrology, ITE modeling could be used to improve allocation strategies to better select patients for kidney transplant, determine when (and if) dialysis should be initiated in AKI or CKD, or optimize BP treatment. ITE modeling seeks to minimize the population-level exposure to an intervention while maximizing the population benefit; most interventions would benefit from such precision targeting.
The future is personalized medicine. We now can capture wide-ranging genomic, metabolomic, proteomic, and phenomic data on individual patients. ITE modeling provides an opportunity to personalize treatment on the basis of routinely measured covariates. The critical next step is to validate these models prospectively—to show that ITE targeting can increase both the absolute risk reduction and the relative risk reduction of a specific intervention in the target population. Our group will be conducting such a trial, targeting electronic alerts for AKI on the basis of an ITE model, in the near future (clinicaltrials.gov NCT02786277). If ITE modeling proves to successfully target therapies—preserving population benefit while maximizing individual benefit—we are ethically obligated to use it.
Disclosures
None.
Acknowledgments
The authors would like to thank Melissa Martin for her editorial assistance.
F.P.W. is supported by National Institutes of Health (NIH) grant R01DK113191. C.R.P. is supported by NIH grants R01HL-085757 and R01-DK082185.
Footnotes
Published online ahead of print. Publication date available at www.jasn.org.
Copyright © 2018 by the American Society of Nephrology