Abstract
Diabetes is the leading cause of ESRD. Despite evidence for a substantial heritability of diabetic kidney disease, efforts to identify genetic susceptibility variants have had limited success. We extended previous efforts in three dimensions, examining a more comprehensive set of genetic variants in larger numbers of subjects with type 1 diabetes characterized for a wider range of cross-sectional diabetic kidney disease phenotypes. In 2843 subjects, we estimated that the heritability of diabetic kidney disease was 35% (P=6.4×10−3). Genome-wide association analysis and replication in 12,540 individuals identified no single variants reaching stringent levels of significance and, despite excellent power, provided little independent confirmation of previously published associated variants. Whole-exome sequencing in 997 subjects failed to identify any large-effect coding alleles of lower frequency influencing the risk of diabetic kidney disease. However, sets of alleles increasing body mass index (P=2.2×10−5) and the risk of type 2 diabetes (P=6.1×10−4) associated with the risk of diabetic kidney disease. We also found genome-wide genetic correlation between diabetic kidney disease and failure at smoking cessation (P=1.1×10−4). Pathway analysis implicated ascorbate and aldarate metabolism (P=9.0×10−6), and pentose and glucuronate interconversions (P=3.0×10−6) in pathogenesis of diabetic kidney disease. These data provide further evidence for the role of genetic factors influencing diabetic kidney disease in those with type 1 diabetes and highlight some key pathways that may be responsible. Altogether these results reveal important biology behind the major cause of kidney disease.
Diabetes is the leading cause of ESRD.1 Among the patients with type 1 diabetes (T1D), as many as one third develop serious renal complications,2 characterized by increasing urinary albumin excretion rates (AER) and decreasing kidney function, measured by eGFR, with 10%–20% of the subjects progressing to ESRD.3,4 In the vast majority of patients with T1D, renal complications reflect the condition of diabetic nephropathy,5 although we use the term diabetic kidney disease (DKD) here to reflect the fact that not all have histologic evidence of diabetic nephropathy. Whereas DKD is associated with high risk of cardiovascular disease6 and premature mortality,7 those who manage to avoid DKD have much better prognoses with survival rates comparable to subjects without diabetes.7 The treatment of DKD, primarily relying on the control of blood glucose levels and on the use of antihypertensive medication, can only retard disease progression rather than restore kidney function or efficiently prevent ESRD. This highlights the need for a better mechanistic understanding of DKD which would provide improved therapeutic targets and biomarkers of progression, both of which are expected to lead to improved personalization of care.
Family studies have shown familial clustering of DKD with a 2.1–2.3-fold increased risk of DKD in T1D siblings of probands with DKD.8–10 There is also evidence for shared genetic background between DKD and other diseases: the parents of the subjects with T1D and DKD have more type 2 diabetes (T2D) and cardiovascular disease.11–13 Despite evidence for genetic predisposition, few compelling signals have been identified for DKD. Positive reports of association with a range of candidate genes have generally failed to find support in larger analyses of independent samples,14 and only a few loci have been identified with genome-wide association studies (GWAS).15–17 In contrast, there have been a number of robust, replicated associations described for CKD in the general population.18–20 However, evidence that these variants influence predisposition to CKD in diabetes is mixed.15,20
The limited success of GWAS on DKD thus far may be due to multiple factors: relatively small sample sizes (in the thousands at the discovery stage); imprecise and variable diagnostics or phenotypic heterogeneity; and a focus on common variants (minor allele frequency [MAF] ≥5%), that neglects the possible contribution of lower frequency variants to DKD predisposition. In other complex traits, such as schizophrenia, initial challenges in identifying robust association signals have been overcome as sample sizes have increased.21,22
To overcome the limitations of previous studies in DKD, in this study we: increased sample size through GWAS meta-analysis; analyzed a range of phenotypic comparisons that encompass different stages and severity of DKD in subjects with T1D; and extended the genome-wide association screen from common variants to lower frequency coding variants through whole exome sequencing (WES).
Results
DKD Traits Are Heritable
Because no comprehensive evaluation of heritability exists for the various stages of DKD, we estimated the narrow-sense, ‘chip’ heritability (i.e., the proportion of phenotypic variance explained by genome-wide single nucleotide polymorphisms (SNPs), h2) for the seven applied DKD phenotypic comparisons (Figure 1) in the context of T1D using data from the largest included GWAS study, the Finnish Diabetic Nephropathy (FinnDiane) Study.23 Heritability estimates varied greatly across the comparisons (Figure 2). For the primary, ‘combined DKD’ phenotypic comparison (micro- or macroalbuminuria or ESRD versus normal AER), h2 was 35% (P=6×10−3; Supplemental Table 1). The highest value, h2=59%, was obtained for ‘CKD+DKD’ (those with both CKD and DKD assigned as cases, and those with no CKD and no DKD as controls; P=1×10−3). Other late-stage phenotype definitions also yielded high estimates of heritability, e.g., 47% for ‘ESRD versus no DKD’ (P=3×10−3). When we included gender, diabetes duration, and age at diabetes diagnosis as covariates, the proportion of the remaining phenotypic variance explained by the genotyped SNPs increased for each phenotypic comparison, e.g., from 35% to 50% for ‘combined DKD’ (P=2.5×10−4). The estimates represent lower limits for the total heritability of DKD traits, as the method considers only those causal variants captured by the variants genotyped on the array.
Schematic picture of the DKD phenotypic comparisons on the basis of measured AER and eGFR, encompassing different stages and severities of DKD. ‘Combined DKD’ and ‘Late DKD’ (conventionally used in many previous genetic studies of DKD) are expected to capture genetic factors affecting DKD in general; ‘Early DKD’ comparison targets the genetic factors affecting the initiation of DKD, or with milder effect on the phenotype, whereas the two ESRD-based case definitions are expected to capture factors related to the late progression of DKD such as fibrotic processes, or genetic factors with particularly strong effect on the phenotype. Although the ‘CKD’ phenotype may reveal genetic factors for reduced renal function irrespective of the presence of albuminuria, the ‘CKD+DKD’ phenotype is an extreme phenotype that requires that controls have no signs of renal complications (neither AER or eGFR based). AER thresholds are given in µg/min, eGFR thresholds in ml/min per 1.73 m2. Number of samples per subphenotype: Normo: 2593; miA: 800; maA: 944; ESRD: 813; no CKD: 2909; CKD: 1255; no CKD, no DKD: 2018; CKD+DKD: 1117. Normo, normal AER; miA, microalbuminuria; maA, macroalbuminuria.
The proportion of phenotypic variance explained by genotypes (V[G]/V[p_L]), i.e., the narrow-sense heritability, of the seven studied DKD phenotypic comparisons indicate high heritability especially for the more extreme phenotype definitions. Error bars represent the SEM.
GWAS
The GWAS discovery stage included 3135–5156 subjects with T1D, depending on phenotypic comparison (Supplemental Figure 1), from four studies: FinnDiane,23 the EURODIAB Family Study,24 the Scania Diabetes Registry (SDR),25 and the UK Nephropathy Family Study and Oxford Regional Prospective Study (NFS-ORPS/Cambridge)26,27 (Supplemental Tables 2 and 3, Table 1).
Clinical characteristics of the subjects, divided into cases and controls on the basis of the ‘combined DKD’ phenotype definition
With 2563 T1D DKD patients and 2593 T1D non-DKD controls for the ‘combined DKD’ phenotype in the discovery cohorts, we estimated >80% statistical power to detect variants with MAF≥10% and allelic odds ratios (OR) ≥1.55 at genome-wide significance (P<5×10−8; Supplemental Table 4). Results from meta-analyses were well calibrated (genomic control [λGC], 1.01–1.05; Supplemental Figure 2) but showed little deviation from the null with no SNP reaching P<5×10−8 (no adjustment for multiple phenotypes). Across seven DKD case-control comparisons, a total of 101 regional lead SNPs reached a suggestive P value of <5×10−6 in at least one analysis (Supplemental Table 5). These 101 associations were tested for all DKD phenotypes available in the various stage 2 samples (Supplemental Tables 2 and 3), although meta-analysis results were only compiled across data for equivalent phenotypes. The two-stage design provided ≥80% power for the primary ‘combined DKD’ comparison to reach P<5×10−8 for variants with an OR≥1.47 and MAF≥10% (Supplemental Table 6).
In the joint meta-analysis of stage 1 and 2 samples, no SNP reached P<5×10−8, although three achieved P<1×10−6, all for the ESRD-based case definitions: rs1989248 near CNTNAP2, rs61277444 in PTPN13, and rs7562121 in AFF3 (Supplemental Figure 3, Supplemental Table 5). We also identified rs72809865 near NRG3 for further replication on the basis of nominal replication (P=0.05) for ‘combined DKD’ in stage 2. These four SNPs were examined in 1087 additional subjects using de novo genotyping (stage 3), although direct genotyping was unsuccessful for rs1989248 (near CNTNAP2). After the joint analysis of stage 1–3 results, the association at rs7562121 in AFF3 remained suggestive (P<1×10−6; Table 2).
Association analysis results for the top SNPs from the GWAS discovery + in silico replication analysis, selected for additional de novo replication
Rs7562121 in AFF3 is in moderate linkage disequilibrium (LD) with rs7583877 (r2=0.67; D’= 0.94), the variant previously associated with ESRD in T1D.15 Conditional analysis showed that both SNPs represent the same signal (Supplemental Table 7). However, this study cannot be considered replication of previous reports because of substantial overlap of study samples. Analysis in nonoverlapping samples (222 ESRD cases, 1640 non-ESRD controls) provided no independent corroboration of the AFF3 association (OR=1.02; P=0.82).
The substantial heritability captured by SNPs on the genotyping array, and the failure to detect genome-wide significant signals, suggest that the causal variants are beyond detection in the present study, due to more modest effect size (e.g., OR<1.47 for MAF≥10%) and/or imperfect capture by common variant-focused GWAS analyses.
Re-Evaluation of Previous DKD Associations
The enlarged sample size provided an opportunity to investigate the previous, often contradictory reports of SNP associations.28 We selected signals from three sources: a literature-based meta-analysis of candidate gene associations28; GWASs on DKD in subjects with either T1D15,17,29–31 or T2D32,33; and GWASs on CKD irrespective of underlying pathology.20 Many of these previous studies included subjects overlapping with those in this analysis.
We estimate >80% statistical power to detect associations with allelic OR≥1.25 and MAF≥10% at nominal significance (α=0.05; Supplemental Table 8) for the ‘late DKD’ phenotype, which was most commonly used in previous publications. We observed only a modest deviation from the null P value distribution for the set of previously-associated variants, suggesting that many of these have been false positive findings (Supplemental Figure 4). Nevertheless, in addition to the AFF3 SNP associations described above, the associations between rs2838302 (in SIK1) and ‘ESRD versus non-ESRD’, and between rs17709344 (RGMA–MCTP2) and ‘CKD+DKD’ were directionally consistent and significant (P<1.1×10−3 to correct for 46 loci). The association between rs2838302 (SIK1) and ‘ESRD versus no DKD’ remained nominally significant (P=0.04) even after exclusion of the substantial overlap of the subjects (Supplemental Table 9).
We observed directionally consistent associations (less stringent P<0.05) for variants near WNT4-ZBTB40, SEMA6D-SLC24A5, ELMO1, ERBB4, 4p15.1, and 13q regions (Supplemental Figure 5). However, at four of these loci, the modest evidence of association was lost when overlapping samples were removed. Independent replication of previously reported signals (directionally consistent P<0.05) was therefore restricted to the signals at ELMO1, 13q, and SIK1 (Supplemental Table 9).
DKD Shows Shared Genetic Background with Obesity, T2D, and Smoking Cessation
Body mass index (BMI), systolic BP, serum lipids, and smoking have been associated with DKD in epidemiologic studies24,34–36 whereas T2D, obesity, hypertension, and lipid disorders have been reported to cluster in families with DKD,11,12,37 suggesting a shared genetic background or a close correlation among these phenotypes. We constructed weighted genetic risk scores (GRS) for 19 diabetes,38–45 obesity,46–48 hypertension,49 or lipid-related50 intermediate phenotypes on the basis of 10–96 established SNPs (P<5×10−8) for each phenotype from previously published GWAS. The GRS for increasing BMI46 was associated (P<2.6×10−3 to correct for 19 traits) with multiple DKD traits (P=2.2×10−5 for ‘late DKD’; Figure 3, Supplemental Table 10), implicating a causal role of BMI for DKD. Although observational studies have shown contradictory results,51,52 these findings are consistent with those from a recent Mendelian Randomization study including FinnDiane subjects.53 The GRS predisposing to T2D was associated with increased risk of multiple DKD traits (e.g., P=1.9×10−3 for ‘combined DKD’ excluding pleiotropic lipid and glycemic SNPs; P=6.1×10−4 for ‘combined DKD’ including pleiotropic lipid SNPs) in line with previous reports of parental T2D being associated with DKD in subjects with T1D.12 These data support a causal, mechanistic link between metabolic syndrome and DKD in T1D.23
The GRS for BMI and T2D were associated with DKD phenotypes in subjects with T1D. A GRS for MBI (z-transformed) was associated (P<2.6×10−3 to account for multiple testing; 19 GRS traits) with combined and late DKD, and CKD phenotypes; GRS for T2D was associated with late DKD.
We also applied the LD score regression method to estimate the genetic correlation between traits. In contrast to the GRS approach which uses only the most associated variants, this method examines the allelic effects of all SNPs in the meta-analysis.54,55 As expected, the seven DKD phenotypes were highly correlated with each other, at least in part due to overlapping phenotypic definitions, with the exception of ‘early DKD’ (Supplemental Figure 6). Smoking cessation was inversely correlated with ‘CKD’ (P=1.1×10−4) and other DKD traits (Figure 4). This is in line with a previous epidemiologic study showing higher risk of DKD for smokers, whereby exsmokers had similar risk of developing DKD as nonsmokers.36
Genome-wide comparison of DKD traits and other phenotypes, evaluated with LD score regression, shows negative correlation between DKD traits and smoking cessation. Significant correlations (P<0.003 to account for multiple testing; 17 related phenotypes) are indicated with *. Bars represent 95% confidence intervals.
Gene Set Enrichment Analyses Suggest Glucuronidation and Other Pathways Affecting DKD
We performed gene set enrichment analyses (GSEA) of the GWAS results to identify biologic pathways and processes enriched among the most significant GWAS signals. The ascorbate and aldarate metabolism, and pentose and glucuronate interconversions pathways showed significant enrichment (P<1.6×10−5 to correct for multiple tested pathways). Literature linking ascorbate metabolism to DKD is sparse, but cell studies have suggested a reduced uptake rate of ascorbic acid (vitamin C) in DKD,56 and vitamin C plus E supplementation was reported to improve glomerular function in T2D.57 In the latter pathway, the flagged genes overlapped especially the glucuronate subpathway (Supplemental Figure 7). Negatively charged glucuronic acid units on heparan sulfate side chains participate in the maintenance of the negative charge selectivity of the glomerular basement membrane in the kidneys, and their cleavage by heparanase (endo-β-D-glucuronidase) has been suggested as an underlying cause for DKD.58 Although knock-out of the HSPE (heparanase) gene protects mice from DKD,59 no specific genetic variants in or near HSPE have been associated with DKD. Other gene sets with less marked enrichment (false discovery rate <0.05; Supplemental Table 11) included the cholesterol biosynthesis pathway (P=8.0×10−4) flagging among others the HMGCR gene (lowest P=0.02 for rs11726245 for ‘CKD’), which encodes the main target of statins, commonly used for prevention of diabetic macrovascular complications. Cholesterol levels are reported to be related to eGFR decline in T1D, whereas statin use improved eGFR in subjects with T2D in some60 but not all studies.61
Single-Marker Association Tests of WES Show Suggestive Associations for ESRD
To identify low frequency (1%≤MAF<5%) and rare (MAF<1%) coding variants contributing to DKD, we performed WES of 997 subjects with T1D from the FinnDiane, SDR, and Steno Diabetes Center. To maximize power, subjects were ascertained from the tails of the liability distribution: we selected patients with onset of macroalbuminuria or ESRD relatively soon after diagnosis of T1D, and controls with normal AER despite prolonged duration of T1D (Supplemental Table 12). This setting allowed us to study the ‘late DKD’ and ‘ESRD versus no DKD’ case-control contrasts (Figure 1).
No associations were observed at exome-wide significance (P<5×10−7; Supplemental Figure 8). Variants in five loci reached a P value of <1×10−5 for association with ‘ESRD versus no DKD’ (Supplemental Tables 13 and 14) with the strongest associations obtained for an intronic SNP rs188427269 within NVL (MAF 0.2%; P=3.3×10−7) and for rs13003941 in the 3′UTR of ERBB4 (MAF 35%; P=3.5×10−6). Other variants in ERBB4 (not in LD with rs13003941) were previously suggestively associated with DKD,15 and ErbB4 was shown important for the development of kidneys in mouse models.62
Gene Aggregate Tests of WES
To improve power to detect low frequency and rare variant associations, we performed gene-level aggregate tests using three complementary approaches: a variable threshold burden test,63 a dispersion test (SKAT),64 and a hybrid test (SKAT-O).65 No gene achieved exome-wide significance (P<2.5×10−6, adjusted for 20,000 genes; Supplemental Figures 9 and 10). For the ‘late DKD’ phenotype, the lowest P value was for GGA1 (P=3.1×10−5; Supplemental Figure 11), showing rare (MAF≤0.13%)66 missense alleles in seven cases but no controls across five variant sites (Supplemental Tables 15–17). For the ‘ESRD versus no DKD’ phenotype, the strongest associations were found at CCDC77 (P=2.1×10−5), THADA (P=8.3×10−5), and CHPT1 (P=1.9×10−5; Supplemental Figure 12, Supplemental Tables 18–20).
Although it has been hypothesized that common disease diagnoses are partially comprised of rare monogenic forms of the disease, we did not see enrichment in genes causing rare human glomerular diseases in our WES results (Supplemental Table 21). In addition, there was no enrichment between the WES signals and those arising from our GWAS studies. Thus, the variants and genes generating suggestive evidence for association in WES seem independent of rare variants identified for monogenic forms of kidney disease, and of the common variant contribution to DKD in subjects with T1D.
Bivariate Analysis of GWAS and WES Data
As some genetic factors may affect disease outcome only in the presence of interacting genetic factors, we complemented the GWAS and WES analyses using a multivariate variant selection method (Algorithm based on a BivAriate CUmulative Statistic, ABACUS)67 applied to both GWAS and WES data. Although no variant reached simulation P value (PS<5×10−8), suggestive evidence of association was obtained for variants in OSGIN1 in two different data sets (PS<5×10−6 for ‘ESRD versus no DKD’ in EURODIAB GWAS and in Steno WES; Supplemental Tables 22 and 23). Functional clustering of the results pointed at processes previously linked with DKD such as cholesterol and sucrose/glucose metabolism and inflammatory response pathway (Supplemental Tables 24 and 25).
Discussion
This study represents a systematic evaluation of common genetic risk factors and rare coding variants for a broad range of kidney complication phenotypes in T1D. To complement the association analyses, we performed various enrichment and correlation analyses considering the genome-wide content to obtain insight of the genetic architecture, biologic background, and correlation with other traits.
This study confirmed that renal complications in T1D are highly heritable, as a substantial proportion of the phenotypic variation can be captured by common variants on the GWAS chip, with the strongest heritability estimates observed for the most extreme phenotypic comparisons. This captured heritability could consist of common variants with modest effects, and/or variants of lower frequency, some of them imperfectly tagged by common SNPs. The limited evidence for linkage to DKD68 and general experience regarding the architecture of other common complex traits, suggest that it is unlikely that DKD risk is dominated by rare variants of large effect, although ultimately sequence based studies will be required for definitive evaluation.
We found suggestive common variant signals for DKD in or near AFF3, CNTNAP2, NRG3, and PTPN13 genes, but additional data sets will be needed to confirm (or refute) these signals and to increase power to detect others. For example, the effective sample size required to implicate a risk allele of MAF 20% and allelic OR 1.2 at genome wide significance is approximately 15,000. We demonstrated that many previously claimed DKD associations signals are likely to be false positives, although modest, independent evidence was found for associations near ELMO1, 13q, and SIK1 (P<0.05). Although the three-stage analysis of this study brings together the largest studies with the largest number of participants with T1D for GWAS on DKD, the total sample size remains modest. Future efforts to identify novel genetic risk factors for DKD will require investment in additional sample ascertainment and genotyping.
Different genetic factors may affect the various stages of DKD progression. To address the phenotypic heterogeneity within DKD, we applied selection of subphenotypes for the evaluation of renal complications. As various phenotypic comparisons yielded significant findings for different tests, these comparisons may address diverse aspects of the disease progression. Future studies would likely benefit from improved phenotyping and, in particular, from the detailed longitudinal follow-up of research participants.
Even though this study concentrated on patients with T1D, some of the results may be extended to DKD in T2D as well. However, as the pathology behind renal complications in T2D is more heterogenous,5 with ageing, atherosclerosis, and hypertension contributing appreciably to the renal function, the genetic background of DKD in T2D may have more features of the kidney disease in the general population.
Although no novel susceptibility loci were identified in this study, some signals started to appear when we aggregated genetic data as in pathway analysis, GRS, and LD score regression analysis. The pathway analyses, for example, highlighted involvement of the glucuronate interconversions pathway and supported the role of cholesterol metabolism in DKD. Analysis of GRS suggested that high BMI, as well as the genetic risk factors behind T2D, causally increase the risk of DKD in T1D. Finally, the LD score regression identified genetic correlation between smoking cessation and reduced risk of DKD in subjects with T1D, supporting the previous research finding of smoking as a risk factor for DKD.36 Altogether, these results may provide valuable clues to the biologic processes relevant to the pathogenesis of DKD.
Concise Methods
Subjects with T1D in GWAS and WES
The GWAS discovery stage included subjects with T1D from the FinnDiane Study,23 the EURODIAB Family Study,24 SDR,25 and NFS-ORPS.26,27 WES included subjects from FinnDiane, SDR, and Steno Diabetes Center (characterized in Supplemental Table 12). All studies were approved by the local ethics committees and conducted according to the principles of the Declaration of Helsinki. Written consent was obtained from the participants in FinnDiane, EURODIAB, SDR, and Steno Studies. In the NFS-ORPS study, written consent was obtained from parents, and verbal assent was obtained from children.
Phenotype Definitions
All subjects had T1D as diagnosed by their attending physician, with age at diabetes onset ≤40 years and insulin treatment initiated within 1 year of diagnosis. The kidney status was classified on the basis of AER and eGFR (please see the Supplemental Material and Supplemental Table 26). Patients receiving dialysis treatment, with a kidney transplant, or with an eGFR≤15 ml/min per 1.73m2 were defined to have ESRD. On the basis of these definitions, we analyzed seven different case-control phenotypes (Figure 1).
Patient Selection for WES
Patients were selected from the extreme ends of the liability distribution of DKD from each participating study. Patients had rapid onset of macroalbuminuria (within 20/25 years of diabetes onset in FinnDiane and Steno, respectively; no threshold in SDR) or ESRD (onset within 25 years of diabetes onset in FinnDiane and Steno). Controls were subjects with normal AER despite prolonged duration of T1D (≥32, 30, or 27 years in FinnDiane, Steno, and SDR, respectively). In addition, the FinnDiane controls were enriched for higher glycated hemoglobin (HbA1C) values (excluding subjects with HbA1C<6.5%).
Genotypes
Genome-Wide Genotyping and Imputation of the Discovery Cohorts
The genome-wide genotyping of the subjects in the SDR, NFS-ORPS, and EURODIAB was performed with the Illumina OmniExpress assays (Illumina, San Diego, CA). In quality control, samples with a call rate <98%, gender discrepancy, extremely high or low heterozygosity, or excess of estimated relatedness, were removed. Common SNPs (MAF≥5%) with genotyping rate <95% or not in Hardy–Weinberg equilibrium (P≤5.7×10−7) were removed. For low-frequency SNPs (MAF 1%–5%), the thresholds were 99% and P<10−4, respectively. In the FinnDiane Study, genotyping was performed with the Illumina 610Quad assay and the quality control was similar to the other studies, as described previously in detail.15 Principal component analysis was performed in all cohorts with Eigensoft.69 Imputation was performed with IMPUTE270 using the 1000 Genomes project (phase 1 v.3, released March 2012) as the imputation reference panel.71 Variants were filtered postimputation to those with MAF≥1%, minor allele count ≥10 in both patients and controls, and SNPtest INFO estimate of imputation quality ≥0.4.
WES and Variant Calling
Samples were sequenced at two centers. Sequencing was performed on an Illumina HiSeq2000. We required an average 20× target capture above 80% coverage, otherwise additional DNA was requested to ‘top up’ the sample. This resulted in mean sequencing depth of 54.97 (FinnDiane) and 42.23 (SDR and Steno) bases per position. After additional sequencing 497 samples were included from FinnDiane and 500 from SDR and Steno. Samples were mapped with Burrows–Wheeler aligner v7.4. Genome analysis toolkit v2.1 was used to refine and recalibrate the sequences and to call variants. Polymorphic variants (MAF>0) with a mapping quality <250, Hardy–Weinberg equilibrium P value >1×10−10, and call rate ≥75% were retained in the analysis. Samples with ≥10% missingness or extreme heterozygosity were excluded. Population outliers, duplicates, and related samples were removed. Variants were annotated using CHAos (http://www.well.ox.ac.uk/~kgaulton/chaos.shtml), snpEff,72 and VEP73 for functional class and transcript.
With 530,565 variants (491,553 SNPs and 39,012 indels) across 479 controls and 481 patients after the quality control, each individual carried a mean of 7566 synonymous, 6452 missense, and 103 protein truncating variants. The lower number of total variant sites compared with other, more outbred populations74 is in line with fewer variable sites seen in founder populations such as the Finns.75
Statistical Methods
Heritability Estimates
The narrow-sense heritability of the kidney phenotypes was estimated as the proportion of the phenotypic variance explained by the additive effects of the genotyped SNPs on the basis of the FinnDiane GWAS data using the GCTA v. 0.93.9, excluding samples with estimated relatedness ≥0.025.76 The observed variance explained was transformed to the underlying population scale on the basis of rough prevalence estimates as given in Supplemental Table 1. The heritability was estimated without covariates, and adjusting for sex, duration of T1D, and age at T1D onset.
GWAS Analysis
The genome-wide association analyses in stage 1 studies were performed with two methods in parallel. To obtain stable effect size estimates, we performed additive tests for association using SNPtest with the score method adjusted for sex, diabetes duration, age at diabetes onset, and the two first principal components. P values were obtained with a variance component based mixed model method, efficient mixed-model association expedited (EMMAX), which accounts for the sample structure, allowing to include relatives in the analysis.77 Models were adjusted for sex, diabetes duration, and age at diabetes onset, and the kinship matrix was calculated with EMMAX. EMMAX algorithm was implemented with the EPACTS software (www.sph.umich.edu/csg/kang/epacts/). Meta-analyses of the effect sizes were performed with the fixed-effect inverse variance method (GWAMA78 and METAL79), whereas P values were combined on the basis of the study-wise P values, sample sizes, and effect directions (METAL79). Meta-analysis results were further filtered to those with valid results from at least two studies. P values <5×10−8 were considered genome-wide significant, not correcting for multiple testing due to seven phenotypic comparisons, as the patient and control groups were overlapping and the traits were correlated with each other.
Power calculations were performed with the genetic power calculator (pngu.mgh.harvard.edu/∼purcell/gpc/) for simple case-control setting,80 and with Power Calculator for Two Stage Association Studies (CaTS; http://csg.sph.umich.edu//abecasis/cats/).81
Replication
Independent variants with P<5×10−6 were selected for in silico replication in six additional studies: the All Ireland–Warren 3–Genetics of Kidneys in Diabetes UK collection15 and the Genetics of Kidneys in Diabetes United States Study15 from the GENIE Consortium, the Diabetes Control and Complications Trial, and the Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) study,82,83 1073 subjects from the Joslin Diabetes Center T1D nephropathy collection, and the French, Belgian, and Danish subjects (the Steno Diabetes Center) from the French–Danish Effort84 (Supplemental Tables 2 and 3). After in silico replication, variants that replicated with a P<0.05 or had a combined P value <1×10−7 after meta-analysis were selected for de novo genotyping with TaqMan in 1095 additional FinnDiane patients, not part of the GWAS (Supplemental Table 2). rs1989248 was not successfully genotyped. Association testing was performed with logistic regression (implemented in PLINK or SNPtest, depending on the study), adjusted for sex, duration of diabetes, age at diabetes onset, and study specific covariates.
GRS Analysis
SNPs associated (P<5×10−8) in previous studies with waist/hip ratio (adjusted for BMI),47 BMI (untransformed48 and z-transformed46), systolic BP,49 low-density lipoprotein cholesterol, triglycerides, high-density lipoprotein cholesterol,50 T1D,39 T2D38 (including all SNPs, and without any other effects other than on T2D or lipids50 and T2D or glycemic traits41,85), 2-hour glucose (adjusted for BMI),40 fasting glucose (adjusted for BMI),41 HbA1C,42 fasting insulin (natural log transformed and adjusted for BMI),41 fasting proinsulin (adjusted for BMI and fasting glucose),43 HOMA-B, HOMA-IR,44 and insulin resistance45 were included in a GRS for each trait respectively. The GRS was weighted by the allelic effect of each variant on the DKD risk factor and associated with the DKD phenotypes using the stage 1 GWAS meta-analysis data.49 The lipid GRS were restricted to variants that predicted that specific trait and removed those that had effects on other lipid traits. We did not include a GRS for smoking behaviors as there were too few genome-wide significant associations to form a sufficient instrument.
LD Score Regression to Estimate Genetic Correlation
To estimate the genetic correlation between the GWAS stage 1 meta-analysis results for DKD phenotypes, and the related traits, we assembled the genome-wide summary statistics from all of the studies used to calculate GRS except for systolic BP and T1D as the full summary statistics were not available. We additionally computed genetic correlation with smoking behavior phenotypes (cigarettes per day, smoking addiction, smoking cessation, and age at smoking onset).86
Gene Set Enrichment Analyses
Gene set enrichment analysis was performed in the GWAS stage 1 meta-analysis results with the MAGENTA(vs.2)87 applied on 10,992 partially overlapping gene sets from GO, PANTHER, INGENUITY, KEGG, REACTOME, and BIOCARTA data bases; 3126 gene sets with ≥10 genes were analyzed. The 95 percentile cut-off for the gene scores was used to define the significant results.
Correction for Multiple Testing
The significance threshold for the results of the evaluation of previous loci, GRS, LD score regression, and pathway enrichment analyses were Bonferroni corrected for multiple testing with α=0.05 significance level, accounting for the number of tested loci, traits, or pathways. The results were not corrected for the seven phenotypic comparisons due to a considerable overlap of the patient and control groups.
WES Single Variant Analysis
Single variants were tested for association with DKD (patients, n=481; controls, n=479) and ESRD (patients, n=168; controls, n=479) using the logistic score test88 implemented in Epacts, with sex and two principal components as covariates. Related individuals, monomorphic SNPs, and those with SEM>10 were excluded from the analysis. Although the study setting provided low statistical power to detect rare variants (MAF<1%) with exome-wide significance (P<9×10−8 to correct for 530,776 tested variants) in line with previous reports on the statistical power to detect rare variants,89 we had sufficient power (80%) to detect a low frequency variant (MAF=5%) with a large OR of 5.65 (Supplemental Figure 13).
WES Gene-Based Analysis
We applied three series of gene based tests: a burden test (variable threshold)63 that assumes the direction of effect of grouped variants is the same, a dispersion test (SKAT)64 that performs well when the direction of variant effect differs, and a hybrid (SKAT-O)65 that uses multiple methods in a single test. Analyses were adjusted for sex and principal components. For all three tests we grouped variants into four categories: protein truncating variants (PTV; e.g., nonsense, frameshift, essential splice site), deleterious protein altering variants subdivided into ‘strict’ and ‘broad’ grouping if predicted deleterious by all five/at least one annotation algorithm, and any protein altering variants (e.g., missense, in-frame indel, nonessential splice site). These four groups are referred to as (1) PTV-only, (2) PTV+strict, (3) PTV+broad, and (4) PTV+missense, from the strictest to the most permissive class. A MAF threshold of 1% was applied to the more permissive masks PTV+missense and PTV+broad to exclude common variants from the WES analysis.
WES Gene Set Enrichment Analysis
A total of 43 gene sets were created, on the basis of top findings from the GWAS analyses, kidney-related functional terms in public databases, text mining approaches, and kidney gene expression (Supplemental Table 21). These gene sets were analyzed for enrichment in ‘Late DKD’ WES association results (SKAT-O, all four masks) using the GSEA method with SKAT-O’s P value as the ranking statistic.90 To guard against potential miscalibration of GSEA’s reported P values we conducted 100 permutations of case/control status and subsequently reran burden tests and GSEA analysis for each permutation. Significance of enrichment was validated with permutation. For validation of enrichment of GWAS result, the samples in WES were removed from the GWAS data and the analyses were repeated.
Bivariate Analysis of GWAS and WES Data
We applied ABACUS67 to the individual GWAS discovery cohorts on each of the seven different case-control phenotypes. In addition, ABACUS was applied to the WES cohorts (FinnDiane, Steno, and SDR) on the two phenotypic comparisons. SNP-sets were defined on the basis of REACTOME, KEGG, and GO Biologic Processes. To analyze nonannotated and intergenic SNPs, we also defined SNP-sets of continuous 3000 SNPs. Functional clustering of the ABACUS results was performed with DAVID software.91,92
Disclosures
P.-H.G. has received lecture honoraria from AstraZeneca (London, UK), Boehringer Ingelheim (Ingelheim am Rhein, Germany), Eli Lilly (Indianapolis, IN), Genzyme (Cambridge, MA), MSD (Kenilworth, NJ), Novartis (Basel, Switzerland), Novo Nordisk (Gladsaxe, Denmark), and Sanofi (Paris, France), and research grants from Eli Lilly and Roche (Basel, Switzerland). P.-H.G. is also an advisory board member for AbbVie (North Chicago, IL), Boehringer Ingelheim, Eli Lilly, Janssen (Beerse, Belgium), Medscape (New York, NY), MSD, Novartis (Basel, Switzerland), and Sanofi.
H.M.C. has acted as consultant/advisory panel member for Pfizer (New York, NY), Sanofi Aventis (Paris, France), Regeneron (Tarrytown, NY), and Eli Lilly, and on the data safety panel for Novartis; has received research support from Roche, Pfizer, Eli Lilly, Boehringer Ingelheim, AstraZeneca, and Sanofi; has participated in a lecturers’/speakers’ bureau and received honorarium from Pfizer; and is a shareholder in Roche.
J.T. has acted as consultant/ advisory panel member for Bayer Health Care (Leverkusen, Germany), Sanofi Aventis, Eli Lilly, Roche, Impeto Medical (Paris, France), and Novo Nordisk, and has received research support from Roche, Pfizer, Eli Lilly, Boehringer Ingelheim, AstraZeneca, and Sanofi; and is a shareholder in Orion Pharma (Espoo, Finland).
PR has acted as consultant/advisory panel member for Eli Lilly, Novo Nordisk, AbbVie, Boehringer Ingelheim, Astra Zeneca, Janssen, Astellas (Chuo-ku, Tokyo, Japan), BMS (New York, NY), and MSD (all honoraria to institution); has received research grants from Novo Nordisk, Astra Zeneca, and Novartis; has shares in Novo Nordisk AS.
J.C.F. has acted as consultant for Sanofi.
D.Z., M.J.B., and E.F. are employees and stockholders of Pfizer.
Acknowledgments
We thank A. Sandelin, A.-R. Salonen, T. Soppela, and J. Tuomikangas for skillful laboratory assistance. We also thank all of the subjects who participated in the FinnDiane study and gratefully acknowledge all of the physicians and nurses at each center involved in the recruitment of participants (Supplemental Table 28). We thank the participants in the EURODIAB Family Study and gratefully acknowledge the physicians and nurses at each center involved in subject recruitment. A complete list of participants in the DCCT/EDIC research group can be found in the New England Journal of Medicine, 2011, 365: 2366–2376.82
The research was supported by the European Union’s Seventh Framework Program (FP7/2007–2013) for the Innovative Medicine Initiative under grant agreement IMI/115006 (the SUMMIT consortium). The FinnDiane Study was supported by grants from the Folkhälsan Research Foundation, the Wilhelm and Else Stockmann Foundation, the Liv och Hälsa Foundation, Helsinki University Central Hospital Research Funds (EVO), the Signe and Ane Gyllenberg Foundation, Finska Läkaresällskapet, the Novo Nordisk Foundation (NNF14SA0003), and the Academy of Finland (134379 and 275614). The EURODIAB Family Study was funded by the European Union's Biomedicine and Health Programme. The NFS-ORPS Study was supported by NIHR/Wellcome Trust Cambridge Clinical Research Facility. The GENIE consortium was funded by the National Institutes of Health/National Institute of Diabetes and Digestive and Kidney Diseases (R01-DK081923) to J.N.H., J.C.F., and P.-H.G., the Northern Ireland Research and Development Office (A.P.M.) and Science Foundation Ireland. The DCCT/EDIC has been supported by cooperative agreement grants (1982–1993, 2012–2017), and contracts (1982–2012) with the Division of Diabetes Endocrinology and Metabolic Diseases of the National Institute of Diabetes and Digestive and Kidney Disease (current grant numbers U01 DK094176 and U01 DK094157), and through support by the National Eye Institute, the National Institute of Neurologic Disorders and Stroke, the General Clinical Research Centers Program (1993–2007), and Clinical Translational Science Center Program (2006 to present), Bethesda, MD. S.S.R. was supported by Juvenile Diabetes Research Foundation (grant 17-2012-542).
Parts of this study have been presented in abstract form at the 49th European Association for the Study of Diabetes (EASD) Annual Meeting September 23–27, 2013 in Barcelona, Spain; at the 74th American Diabetes Association (ADA) Scientific Session June 13–17, 2014 in San Francisco, CA; at the 75th ADA Scientific Session June 5–9, 2015 in Boston, MA; and at the 51st EASD Annual Meeting September 14–18, 2015 in Stockholm, Sweden.
Industry contributors have had no role in the DCCT/EDIC study but have provided free or discounted supplies or equipment to support participants’ adherence to the study: Abbott Diabetes Care (Alameda, CA), Animas (Westchester, PA), Bayer Diabetes Care (North America Headquarters, Tarrytown, NY), Becton Dickinson (Franklin Lakes, NJ), Eli Lilly (Indianapolis, IN), Extend Nutrition (St. Louis, MO), Insulet Corporation (Bedford, MA), Lifescan (Milpitas, CA), Medtronic Diabetes (Minneapolis, MN), Nipro Home Diagnostics (Ft. Lauderdale, FL), Nova Diabetes Care (Billerica, MA), Omron (Shelton, CT), Perrigo Diabetes Care (Allegan, MI), Roche Diabetes Care (Indianapolis, IN), and Sanofi-Aventis (Bridgewater NJ).
Footnotes
Published online ahead of print. Publication date available at www.jasn.org.
See related editorial, “Genetics of Diabetic Kidney Disease—From the Worst of Nightmares to the Light of Dawn?,” on pages 389–393.
This article contains supplemental material online at http://jasn.asnjournals.org/lookup/suppl/doi:10.1681/ASN.2016020231/-/DCSupplemental.
- Copyright © 2017 by the American Society of Nephrology