Abstract
The association of hemodialysis dosage with patient survival is controversial. Here, we tested the hypothesis that methods for survival analysis may influence conclusions regarding dialysis dosage and mortality. We analyzed allcause mortality by proportional hazards and accelerated failure time regression models in a cohort of incident hemodialysis patients who were followed for 9 yr. Both models identified age, race, heart failure, physical functioning, and comorbidity scores as important predictors of patient survival. Using proportional hazards, there was no statistically significant association between mortality and Kt/V (hazard ratio 0.72; 95% confidence interval 0.45 to 1.14). In contrast, using accelerated failure time models, each 0.1U increment of Kt/V improved adjusted median patient survival by 3.50% (95% confidence interval 0.20 to 7.08%). Proportional hazard models also yielded less accurate estimates for median survival. These findings are consistent with an additive damage model for the survival of patients who are on hemodialysis. In this conceptual model, the assumptions of the proportional hazard model are violated, leading to underestimation of the importance of dialysis dosage. These results suggest that future studies of dialysis adequacy should consider this additive damage model when selecting methods for survival analysis. Accelerated failure time models may be useful adjuncts to the Cox model when studying outcomes of dialysis patients.
During the past 30 yr, observational and randomized trials^{1–3} of dialysis dosage have arrived at discrepant conclusions about the effect of dosage on mortality. In particular, observational studies^{4–6} suggested that a higher dosage of dialysis is associated with significant reduction in mortality; however, randomized trials in both peritoneal dialysis^{2,7} and hemodialysis^{3} have shown no relationship between the higher dosage of dialysis and mortality. In this article, we explore an alternative interpretation for these discrepancies as due to the limitations of standard survival methods.
To understand the relevance of survival methods to ESRD outcomes research, it is important to relate dialysis dosage to the assumptions of survival methods. The urea kinetics paradigm (Kt/V) for dialysis dosing is grounded in a conceptual model for the pathophysiology of uremia. According to this model, renal replacement therapies attenuate the damaging effects of uremic toxins^{8–10} by convective/diffusive removal. This model further suggests that the likelihood of a uremiarelated complication and death depends on the cumulative exposure to the uremic milieu (“additive damage model”). Survival in the proportional hazard model (PHM) is related to the instantaneous, not cumulative, exposure^{11} as a result of the assumption of proportional hazards. When this assumption is violated, one can anticipate a loss of statistical power unless there are a very large number of events. This phenomenon may partly explain the discrepant findings between the large observational studies and the much smaller randomized trials relating dialysis dosage to mortality. The epidemiologic studies, involving thousands of patients, may have had greater power than the randomized clinical trials, which followed hundreds of patients, with fewer events. Conversely, alternatives to the PHM, such as the accelerated failure time model (AFTM),^{12} can model cumulative exposures and in fact may be the preferred method to study survival in the context of additive damages.^{11,13} Recent theoretical research in biostatistics^{14–17} supports this position for a number of chronic diseases.^{14,18–21}
Because AFTM models have been infrequently applied to ESRD populations, we studied mortality in a cohort of incident dialysis patients using both standard approaches and AFTM. Second, we examined the apparent validity of AFTM and proportional hazard approaches by comparing model predicted against observed median patient survival. Third, we explored potential pitfalls in the application of the PHM to ESRD outcomes and discuss a potential role for the AFTM in clinical studies. Finally, we discuss the implications of these findings for the interpretation and analysis of dialysis clinical trials.
Results
Cohort Characteristics
Information regarding the following baseline characteristics was available in 766 of the 767 hemodialysis patients initially enrolled in the Choices for Healthy Outcomes in Caring for ESRD (CHOICE) study: Age, gender, race, cause of ESRD, diagnosis of heart failure, and comorbidities (Index of Coexistent Disease [ICED] score). Table 1 summarizes the characteristics of the patients in the study cohort, as well as the patients with and without missing covariate data for the adjusted survival analyses reported herein. There were no obvious differences in the two subgroups with respect to baseline characteristics (age, race, gender, cause of ESRD), known predictors of mortality (comorbidity scores, presence of heart failure), or duration of followup. Furthermore, multivariable logistic regressions did not identify any obvious missingness patterns with respect to baseline characteristics that could affect survival (age, gender, race, cause of ESRD, presence of heart failure, and worse comorbidity scores). Finally, the survival of the cohort with missing information did not differ from the survival of patients with complete covariate information (P = 0.53, twosided KolmogorovSmirnov test for crossing survival curves; Supplemental Figure 1). Because patients with missing values were different in neither the presence of potentially influential baseline characteristics nor their overall survival profiles, we restrict our analyses to the complete cases without further adjustment (e.g., multiple imputation). All subsequent analyses used the latter KaplanMeier (KM) curve to estimate the quantiles around the median and evaluate the different survival models.
Survival Analyses
During the observation period, there were 315 deaths in the cohort of patients with full covariate data, and only five of 491 patients were still alive and at risk by the beginning of the ninth year. The lognormal AFTM yielded parameter estimates that were comparable to the semiparametric AFTM (Supplemental Tables 1 and 2). Because the results of other AFTM distributions differed substantially from the semiparametric one, the lognormal AFTM was used in all subsequent analyses. The results of multivariable survival regressions with the PHM and the AFTM are shown in Table 2. Both models identified age, race, higher comorbidity, low physical functioning (PF) scores, presence of heart failure, and low albumin as significant predictors of mortality at the P < 0.05 level. Conversely, there was no statistically significant effect for body mass index (BMI), gender, and assigned cause of ESRD in either the hazard rate scale (HR) or the time scale (inverse acceleration factor [1/AF]). Despite the consensus about the significance of these covariates on survival, the interpretation of these results is different. Using albumin as an example, the estimated HR of 0.558 translates to a relative risk that decreases (improves) by 55.8% for each increase in the serum albumin concentration by 1 g/dl. Conversely, the corresponding 1/AF of 0.594 is equivalent to a median survival that is increased by 1/0.594 = 1.684 times (i.e., 68.4% increase for each 1g/dl increment in the concentration of serum albumin).
The two models reached different conclusions in the case of Kt/V with the AFTM estimating a significant effect (1/AF 0.70; 95% confidence interval [CI] 0.51 to 0.98; P = 0.039) and the PHM indicating a nonsignificant effect (HR 0.72; 95% confidence interval 0.45 to 1.14; P = 0.16). The result estimated by the AFTM suggests a statistically significant increase of median survivals as the Kt/V increases and is readily appreciated graphically (Figure 1). This figure shows the multivariableadjusted ratio (for age, gender, race, ICED score, heart failure, PF score, cause of ESRD, BMI, and albumin) of median survivals (RS_{0.5}) and its associated 95% CI over a range of increases of Kt/V over the baseline (ΔKt/V). The relationship between RS_{0.5} and ΔKt/V is an exponential one; each increment of ΔKt/V by 0.1 U is associated with an increase in median survival of 3.5%.
Validity of Survival Models
Overall, both models had similar discriminatory capacity in bootstrap analyses; their performance assessed by the area under the receiver operating characteristic curve value was almost identical (i.e., 0.6980 for the PHM versus 0.6983 for the AFTM). Despite the global predictive equivalence, the performance of the two models was not constant over time. Figure 2 depicts the overall fit between each of the regression models for the various quantiles of the KM survival estimator of the CHOICE cohort. Both models were somewhat accurate at the initial period (from initiation of hemodialysis until approximately 1.5 yr). The AFTM (thick dashed black line) yielded a fit that was closer to the survival of the cohort (black line) compared with the PHM (gray line) for the middle interval of the observation period (between 1.5 and 7.0 yr), whereas the PHM seems to fit the final and more uncertain (larger 95% CI) segment of the survival curve better.
Because the predicted median survival provides a global assessment of goodness of fit,^{22} we quantified the bias in the predictions made by the two models. The two models yielded different predictions for the median survival; the median survival times of the study cohort computed by the KM curve, the AFTM, and the PHM were 48.4, 49.3, and 52.6 mo, respectively. The predictions for the 40th and the 60th quantiles for the KM/AFTM/PHM were 61.0/59.7/66.0 and 37.9/39.8/44.7 mo, respectively. Hence, these survival time estimates were biased for the PHM (up to 17.9% for the 60th quantile, 8.7% for the median). In contrast, the estimates of the AFTM were considerably less biased (1.8% for the median, 5% for the worstcase scenario of the 40th quantile).
GoodnessofFit Analyses
The analyses of the previous section suggested that the proportionality assumption was violated by at least one (and possibly more) covariates during the observation period. Indeed, formal tests of nonproportionality indicate that the underlying hazard rate is not proportional for the following covariates: ICED score, Kt/V, and cause of ESRD (twosided P < 0.05; Supplemental Table 3). The smoothed hazard rate estimate for Kt/V is shown in Figure 3. The hazard rate is approximately constant until 1.5 yr after initiation of hemodialysis but displays strong nonmonotonic lognormal features^{23} (first increasing and then decreasing) during the subsequent years.
Sensitivity Analyses
There were no substantial differences in parameter and associated CI estimates when alternative algorithms were used to account for ties in proportional hazard regressions (Cox exact method, Breslow method). A variety of nonlinear relationships for Kt/V were also examined in semiparametric PHM (Cox) and parametric PHM (Weibull). These alternative, more complex models failed to improve goodnessoffit by the Akaike Information Criterion and even worsened it in the case of a binomial relation. The estimated relationships between the loghazard ratio and the Kt/V for the model of Table 2 and these complex models are shown in Supplemental Figure 2.
To rule out the possibility that a small number of patients with unusual characteristics biased the results of our analyses, we applied a bootstrap procedure to generate parameter estimates in both PHM and AFTM regressions. Bootstrap estimates for hazard ratios and acceleration factors were not substantially different from those reported in Table 2. When we carried out proportional hazard regression analyses stratified on categorical covariates with nonproportional effects (ICED score, cause of ESRD), there were no substantial differences in the parameter estimates of predictors of mortality. Similar effects were observed in the case of stratified lognormal regressions; furthermore, the parsimonious (nonstratified) lognormal survival model was the optimal description of the data set by virtue of its smallest Akaike information criterion.
Discussion
In this report, we hypothesized that the choice of a survival model would yield differential inferences about the effects of Kt/V on mortality. In agreement with our hypothesis, we found that Kt/V was a significant predictor of mortality when AFTMs but not PHMs were used. A careful review of existing statistical literature attributes this discrepant result to the violation of the proportionality assumption. Such a violation is hardly surprising. On the contrary, it is expected when the conceptual model of uremia implicit in the use of kinetic measures of dosage (Kt/V) is examined more closely. Other associations in this cohort were not sensitive to the survival model used; therefore, this report demonstrates that relating the role of dialysis dosage to mortality is uniquely sensitive to the selection of the survival model used to analyze the data.
The theoretical justification of a kinetic approach to dialysis dosing is that “uremia results from the retention of toxic solutes capable of removal by the dialyzer.”^{24} In other words, dialytic therapies allow patient survival by slowing the accumulation of uremic retention solutes and their associated toxic effects over time. This conceptual model of dialysis therapy is incompatible with standard survival models because of the mathematics underlying the PHM.^{11,13} When death is thought to arise from cumulativeovertime influences (“additive damage model”), the corresponding survival function defines an AFTM rather than a PHM. In the former but not the latter, survival at any given time depends on the history of the exposure up to that time. To gather empiric support for this model of uremia, we studied survival of hemodialysis patients during a long period of time. In our cohort of patients, we did observe the violation of proportional hazards that was anticipated by the additive damage model.
In light of these considerations, it is important to consider the implications of using a mismatched statistical model to assess ESRD outcomes. These consequences can be understood by referring to the “physiologic statistics” formula introduced by Sackett^{25} to aid interpretation of clinical trial results: When applied to the same body of data, misspecified survival models will yield different estimates for the noise and the signal in the data. In the case of the PHM applied to data from an additive damage process, the parameter estimates (“signal”) will be biased toward the null (“noeffect”) hypothesis and the “noise” estimates will be inflated.^{17,26} As a result, the associated P value (CI) will be larger (wider) than its actual value, and the null hypothesis will not be rejected. This effect reflects the loss of power in statistical tests of significance^{17} rather than the lack of importance of the clinical intervention examined, thereby providing a novel explanation for the discrepant finding between observational studies and randomized trials. In the first case, the large sample size overcomes the error from the nonrandomized nature of the data, leading to significant associations between dialysis dosage and outcomes. Conversely, the smaller sample size of the randomized studies might not be able to overcome the low power of the PHM.
Our parallel AFTM and PHM analyses demonstrate that older age, lower PF, higher comorbidity scores, heart failure, white race, and low albumin at baseline were important predictors of mortality. These associations, which were detected by both survival models, are in agreement with previous reports^{4,27–33} and clinical intuition. Conversely, Kt/V, a variable that failed the proportionality test, was significant when AFTMs were used. Our exploration of the deviation from proportionality points toward the lognormal distribution as a potential model for survival on hemodialysis. By using this model, we obtained estimates of the effects of Kt/V that were more precise (smaller CIs) and potentially less biased than the PHM, leading to a different assessment of the role of Kt/V on survival.
In our analyses, the hazard of lower Kt/V values did not increase until after the first 1.5 yr on dialysis. This could reflect the contribution of residual renal function to overall solute clearance, which has been shown to persist well after the first year in other cohorts.^{34} Viewed from that perspective, residual renal function may be as important for hemodialysis patients as for patients who are on peritoneal dialysis. Alternatively, the nonmonotonic hazard could represent a selection effect as a result of heterogeneous individual susceptibility (frailty).^{35} The frailest (most susceptible) individuals will tend to die first, leaving the more robust ones behind, imparting the population hazard with an increasingdecreasing pattern.
Our analysis supports the notion that kinetic estimates of dialysis dosage (Kt/V) are linearly associated with improved patient median survival. Shifting the focus from urea to other molecular biomarker(s) of dialysis dosage (pcresol,^{36,37} β_{2}microglobulin,^{9,38–43} guanidine compounds,^{41,44,45} highthroughput molecular fingerprinting assays^{46,47}) may increase the precision of the “additive damage” model of uremia. Consequently, AFTM survival models will be especially useful when such novel measures of dialysis adequacy are accepted for routine use.
Several limitations of this report should be noted when interpreting our findings. First, the CHOICE study was an observational one, not a randomized clinical trial designed to test the effects of dialysis dosage on survival. Second, approximately one third of enrolled patients had missing covariate data. Although we evaluated the plausibility of uninformative missing data mechanism, these assessments are approximate in nature. Third, our study cannot accurately assess the temporal effects of underdialysis without longitudinal measurements of Kt/V at multiple time points. These were not available for a substantial portion of the CHOICE patients after the first year of treatment. Consequently, the nonmonotonic form for the log hazard rate (Figure 3) of Kt/V can be determined only semiquantitatively. Last, our estimates about the linear relationship between median survival and Kt/V are limited to the range of Kt/V in CHOICE (1.26 ± 0.30); however, this range is likely generalizable to incident dialysis patients in the United States. CHOICE was a large study with age and racial distributions similar to those in previous national samples,^{48,49} and the effect sizes for Kt/V is compatible with previous assessments based on US Renal Data System data. Because these studies used much larger sample sizes, they were able to achieve statistical significance (smaller P values) with a standard survival model in line with the prediction of the Sakett formula.
In summary, this report demonstrates the sensitivity of mortality studies of hemodialysis patients to the survival model used to analyze the data. The most accurate interpretation of our findings is that non–timeupdated proportional hazards model may not be sensitive enough to detect survival benefits as a result of higher levels of Kt/V. Consequently, AFTMs should be considered as adjuncts to the PHM when analyzing the role of dialysis dosage on survival. Most statistical packages used in outcomes research provide estimation procedures for AFTM (e.g., PROC LIFEREG in SAS, streg in STATA, survreg/psm in R/SPlus), facilitating such analyses. Loss of precision and biased inferences may be expected if one adopts the “wrong” distribution for the AFTM model. Hence, one should always examine this possibility when reporting and interpreting such results. For example, one could use semiparametric forms of AFTM^{50–52} to explore sensitivity of the inferences to the parametric assumption provided that one has an adequate sample size to fit such models.
These findings should be validated in other hemodialysis study populations because our results are based on retrospective assessments by non–timeupdated Cox models time in a nonrandomized setting. Conversely, a reanalysis of the HEMO trial by timedependent PHMs^{53} revealed a markedly positive effect of Kt/V on survival. This discrepancy can be interpreted as a sign of robustness of the timeupdated Cox models to violations of the proportionality assumption; however, these findings were interpreted by the study investigators to be the results of the bias of perprotocol analyses that time updating implies.^{53} Moreover, it is crucial to explore survival modeling approaches in conjunction with biomarkers displaying definite uremic toxicity. This in turn will permit us to bridge the gap between kinetic descriptions of dosage and epidemiologic assessments of dialysis adequacy, potentially improving the care of patients with kidney failure.^{54}
Concise Methods
Study Population
The study participants were a subpopulation of the 767 hemodialysis patients who participated in CHOICE,^{55} a national, prospective cohort study of incident incenter hemodialysis and peritoneal dialysis patients. These patients were enrolled from 81 dialysis clinics associated with Dialysis Clinic, Inc. (Nashville, TN), New Haven CAPD (New Haven, CT), and the Hospital of St. Raphael (New Haven, CT) from October 1995 through June 1998. All study participants were ≥18 yr of age and spoke English or Spanish and were enrolled a median of 45 d from initiation of longterm dialysis (98% within 4 mo). Approximately two thirds of eligible patients were enrolled; these patients were similar to nonenrolled patients with regard to gender and age and were followed up for mortality until December 31, 2004. The protocol for this analysis was approved by the Johns Hopkins University School of Medicine and the University of Pittsburgh institutional review boards.
Baseline Data Collection
Demographic characteristics (age, gender, race), primary cause of kidney failure, and date of first longterm dialysis were ascertained from the Health Care Financing Administration Medical Evidence Form (Form 2728), which was completed at initiation of longterm dialysis. Comorbidity was measured using the ICED, a composite scoring system based on 19 medical (Index of Disease Severity) and 11 physical impairment categories (Index of Physical Impairment).^{56–58} On the basis of the peak Index of Disease Severity and Index of Physical Impairment scores, an ICED level is assigned on a 4point scale (0 to 3) with a higher score reflecting greater severity. The ICED score used in this report is essentially identical to the instrument used to adjust the findings of the HEMO study. Its performance is well documented in diverse cohorts of patients with ESRD. In comparative evaluations using shortterm followup data, ICED achieved a performance equal to or better that the Charlson Index in terms of the area under the receiver operator characteristic curve. A working copy of the instrument as well as scoring instructions are provided as an online supplement to this article (Supplemental Forms: ICED).
Healthrelated quality of life measures were assessed with the SF36 Version 1 instrument completed at baseline. In this report, Kt/V was calculated using the secondgeneration singlepool variable volume formula of Daugirdas^{59}: where R = BUN_{post}/BUN_{pre} is the ratio of blood urea nitrogen after and before dialysis, UF is the ultrafiltration volume and Wt_{post} is the postdialytic patient weight. For increasing robustness against measurement error, the multivariable regression models used the average Kt/V in the first 3 mo after enrollment rather than a single measurement.
Outcomes
The primary outcome was allcause mortality from the time of initiation of hemodialysis. Patients were censored at transplantation, switch to a different modality or the end of followup period. Overall, 40 (5%) of the 767 patients who underwent hemodialysis switched modality at least once during the CHOICE study; for these patients, survival was censored at the day of the last hemodialysis received before the switch.
Selection of Covariates for Survival Models
The selection of covariates was guided by factors that have been found to be associated with mortality in dialysis patients in previous studies.
Demographics.
Mortality has been associated with age, gender, and race^{60} in dialysis patients in previous studies as well as initial^{61} and subsequent analysis of the CHOICE cohort^{62}; therefore, these three variables were included in the models. To account for possible confounding between patient size and gender, we also evaluated the baseline BMI.
Comorbid Disease.
Patients who undergo dialysis often have substantial comorbidity as a result of advanced age and underlying cause of ESRD and associated therapy. To capture the effects of these multiple potentially independent sources of comorbidity, we adjusted for the following factors: (1) ICED score as a categorical variable with three levels (≤1, 2, and 3),^{56,57,63} (2) presence of heart failure^{64} (from Form 2728), and (3) assigned cause of ESRD (classified as the result of diabetes, hypertension, glomerulonephritis, or other causes).
Laboratory Tests.
A number of laboratory tests (hematocrit, serum albumin, serum creatinine, serum phosphate)^{65} have been shown to correlate with patient survival. In particular, serum albumin has shown a strong and consistent association with mortality outcomes in multiple independent cohorts^{66,67} and was thus used to adjust survival models.
HealthRelated Quality of Life.
The SF36 Version 1 PF scale, a component of the SF36 physical summary score, was scored and analyzed. For the SF36 PF scale, the minimum score is 0 and the maximum score is 100, with higher scores reflecting better PF.^{68} In previous studies, components of the SF36 have strongly been associated with patient survival on dialysis.^{32,33,69}
Statistical Analysis
Baseline demographic, socioeconomic, and laboratory factors were summarized as means (SD) for continuous variables and as frequency distributions for categorical variables. To evaluate possible biases introduced by missing values in the data, we applied the following methods. First, we examined the statistical significance of the differences between groups with missing and complete data using the Welch t test for continuous and Fisher exact test for categorical variables for which complete or nearcomplete data (frequency of missing values <1%) were available. These variables included age, gender, race, duration of followup, ICED score, presence or absence of heart failure, and assigned cause of ESRD. Second, we developed a logistic regression model to predict the probability of having missing data in any of the covariates with missing information (albumin, Kt/V, BMI, PF) on the basis of clinical predictors with complete information. Last, we evaluated the statistical significance of differences in the KM estimators of the survival of patients with missing or complete information. Because these curves crossed at multiple points, neither the logrank nor the Wilcoxon test is powerful enough to detect any differences, and we used the KolmogorovSmirnov test for arbitrarily censored data.^{70}
Multivariable survival analyses were carried out with the PHM and an AFTM using the lognormal distribution^{23}; in proportional hazard regressions, the Efron method and conventional (nonsandwich) SEs were used to handle ties in event times and compute parameter estimate P values, respectively. The effects of covariates on survival were quantified with the hazard ratio (HR) for PHM regressions and with the 1/AF for the lognormal AFTM. The AF is the measure of association in an AFTM and allows the investigator to obtain directly the effect of the predictor variable on the time until the event of interest. More formal, the AF is the ratio of the survival times corresponding to a given quantile of the survival curve. The 1/AF is a measure that is more directly comparable to the hazard ratio in PHM and is derived by inverting the AF from the AFTM regression model. For example, if the 1/AF is 0.5 (AF = 2.0) in a pairwise comparison, then the median survival time of the reference (comparison) group is half (twice) as long as that of the remaining one; analogous interpretations can be given for the relation of the other quantiles of the two survival distributions. An AF or “1/AF” equal to 1 (or an associated 95% CI that includes 1) signifies a lack of statistical significance for the comparison of quantile values, such as the median.
Models were compared by calculating the predictions that the lognormal AFTM and the PHM make at the following quantiles: 40, 45, 50 (median), 55, and 60%. Specifically, we computed the survival time at these quantile values relative to the predictions made by the unadjusted KM estimator in the CHOICE cohort. Subsequently, we evaluated the quantile predictions made by the two models at the points in time required to reach these quantiles in the KM estimator. To calculate these predictions, we evaluated both models at the average values of the regression coefficients estimated by the two models. Selection of these cutoff points is justified on recent theoretical investigations that show that both proportional models and AFTMs may yield severely biased results outside this range in the presence of missing important covariates.^{17} Consequently, the range of quantiles around the median is the region where predictions are more robust with respect to model misspecification. Furthermore, the median is an important end point for planning, analyzing, and interpreting in clinical trials; thus, biases in this quantity are of interest to the interpretation of research findings. Agreement with the KM estimate far from the median was assessed graphically by plotting the survival probability estimates of the “average” individual; these curves were superimposed to the KM estimate, and its associated 95% CI was computed at the log hazard scale. Formal evaluation of the internal validity and amount of overfitting by the two models was carried out by calculation of the concordance probability (Cindex) and shrinkage slope factors, respectively.^{71} The Cindex of survival model is numerically equal to the area under the receiver operating characteristic curve and is a measure of the capacity of the model to discriminate between patients who experienced event of interest and those who did not. For survival models, the Cindex is calculated from Somer Drank correlation coefficient (D) between predicted and observed outcomes as D = 2 × (C − 0.5). The shrinkage factor is the percentage of overfitting in the final model as a result of regression toward the mean. The shrinkage factor of a regression model is computed as the ratio (LR − p)/LR, where p is the degrees of freedom of the regression and LR is the corresponding likelihood ratio o. Both the shrinkage factor and the Cindex were calculated by a bootstrap procedure as detailed previously^{71}; briefly 5000 learning samples of 491 patients were drawn with replacement from the original cohort of patients with full covariate data for estimation of the PHM and AFTM. Subsequently, the estimated coefficients are used in the original cohort (testing sample) to correlate the predicted survival probability and the observed (KM) estimate.
Finally, a formal evaluation of the lack of fit in PHM was carried out with the use of scaled Schoenfeld residuals^{72} for each covariate under considerations as well as the total model. These residuals can be used to carry out a formal test for nonproportionality and explore the associated time varying contribution of each covariate to the overall hazard. Evidence for nonproportionality was defined as a P < 0.05 on the associated χ^{2} test.^{73} For covariates with nonproportional hazard effects, we also estimated the smoothed scatterplots of the scaled Schoenfeld residuals to explore the temporal nature of nonproportional effects. Sensitivity analyses were carried out by building and graphically evaluating survival (PHM and AFTM) models that were stratified with respect to covariates with nonproportional effects. All analyses were performed in R 2.6.0^{74} with the package Design^{75} for the regressions of the PHM and AFTM and the calculation of the corresponding survival probabilities. Formal tests of proportionality and the associated scatterplots were generated with the package survival.^{76}
Disclosures
This research was also supported by the Renal Discoveries–Baxter Extramural Grant Program (M.U., C.A., and C.C.H.C.).
Acknowledgments
CHOICE was supported by R01HS08365 (Agency for Healthcare Research and Quality) from June 1995 through May 2000, R01HL62985 (National Heart, Lung, and Blood Institute) from September 2000 through July 2005, and R01DK07024 (National Institute of Diabetes and Digestive and Kidney Diseases) from September 2000 through June 2007. M.U. was further supported by DK66006 and DK77785.
An early version of this article was presented in poster form at the annual meeting of the American Society of Nephrology; November 4 through 9, 2008; Philadelphia, PA (abstract SAPO2595).
We thank the patients, staff, and physicians who participated in the CHOICE study at Dialysis Clinic, Inc., and St. Raphael's Hospital. We thank Sir David Cox for the many helpful recommendations and suggestions about the sensitivity analyses and the interpretation of the results in a preliminary version of this manuscript. We thank Prof. Jane Hutton for providing extended versions of tables concerning the power of PHMs fitted to accelerated failure time data.
Footnotes

Published online ahead of print. Publication date available at www.jasn.org.

See related editorials, “Measuring Patient Survival on Hemodialysis,” on pages 1866–1867, and “Does a Statistical Method Suggest a New Pathobiology for Hemodialysis Patients?” on pages 1867–1869.

Supplemental information for this article is available online at http://www.jasn.org/.
 Copyright © 2009 by the American Society of Nephrology
References
 1.↵
 2.↵
 3.↵
 4.↵
 5.↵
 6.↵
 7.↵
 8.↵
 9.↵
 10.↵
 11.↵
 12.↵
 13.↵
 14.↵
 15.↵
 16.↵
 17.↵
 18.↵
 19.↵
 20.↵
 21.↵
 22.↵
 23.↵
 24.↵
 25.↵
 26.↵
 27.↵
 28.↵
 29.↵
 30.↵
 31.↵
 32.↵
 33.↵
 34.↵
 35.↵
 36.↵
 37.↵
 38.↵
 39.↵
 40.↵
 41.↵
 42.↵
 43.↵
 44.↵
 45.↵
 46.↵
 47.↵
 48.↵
 49.↵
 50.↵
 51.↵
 52.↵
 53.↵
 54.↵
 55.↵
 56.↵
 57.↵
 58.↵
 59.↵
 60.↵
 61.↵
 62.↵
 63.↵
 64.↵
 65.↵
 66.↵
 67.↵
 68.↵
 69.↵
 70.↵
 71.↵
 72.↵
 73.↵
 74.↵
 75.↵
 76.↵