Abstract
Steroid-sensitive nephrotic syndrome (SSNS) accounts for >80% of cases of nephrotic syndrome in childhood. However, the etiology and pathogenesis of SSNS remain obscure. Hypothesizing that coding variation may underlie SSNS risk, we conducted an exome array association study of SSNS. We enrolled a discovery set of 363 persons (214 South Asian children with SSNS and 149 controls) and genotyped them using the Illumina HumanExome Beadchip. Four common single nucleotide polymorphisms (SNPs) in HLA-DQA1 and HLA-DQB1 (rs1129740, rs9273349, rs1071630, and rs1140343) were significantly associated with SSNS at or near the Bonferroni-adjusted P value for the number of single variants that were tested (odds ratio, 2.11; 95% confidence interval, 1.56 to 2.86; P=1.68×10−6 [Fisher exact test]). Two of these SNPs, the missense variants C34Y (rs1129740) and F41S (rs1071630) in HLA-DQA1, were replicated in an independent cohort of children of white European ancestry with SSNS (100 cases and ≤589 controls; P=1.42×10−17). In the rare variant gene set–based analysis, the best signal was found in PLCG2 (P=7.825×10−5). In conclusion, this exome array study identified HLA-DQA1 and PLCG2 missense coding variants as candidate loci for SSNS. The finding of an MHC class II locus underlying SSNS risk suggests a major role for the immune response in the pathogenesis of SSNS.
Nephrotic syndrome is characterized by massive proteinuria, hypoalbuminemia, edema, and hyperlipidemia. Its prevalence is estimated at 16/100,000, making it the most common glomerular disorder of childhood.1 Nephrotic syndrome is classified into steroid-sensitive nephrotic syndrome (SSNS) and steroid-resistant nephrotic syndrome on the basis of the initial response to corticosteroid therapy.1 The SSNS variant is the most common in childhood and is responsible for 80% of cases. The etiology and pathogenesis of SSNS have not been completely elucidated. Early studies suggested that SSNS may be due to T-lymphocyte dysfunction.2 However, clinical and epidemiologic data suggest that both genetic and environmental factors are important susceptibility factors. First, children who share common backgrounds (e.g., siblings) have different risks of developing SSNS, implying that affected children share a common factor that makes them prone to SSNS. Genetic factors could explain, at least in part, such differential risk. Second, evidence indicates that ethnicity or ancestry may play a role in susceptibility to idiopathic nephrotic syndrome.3 Although ethnic groups differ in environments, lifestyles, and culture, they also differ in many genomic characteristics, including allele frequencies, linkage disequilibrium (LD), and signatures of selection. It is therefore possible that SSNS risk is mediated by specific genetic risk variants, either alone or in response to yet unidentified environmental triggers. A reasonable approach toward identifying risk loci is through genome-wide association studies in well characterized case-control series. SSNS is characterized by low disease prevalence in the general population and a striking phenotype that is usually responsive to a narrow class of pharmacologic agents.
We used an exome array to conduct an association study of coding variants in a South Asian cohort of children with SSNS (n=214) and identified suggestive loci on chromosome 6p. Four variants in HLA-DQA1 and HLA-DQB1 achieved Bonferroni-adjusted significance; the top two were missense variants C34Y (rs1129740) and F41S (rs1071630) in the MHC class II gene HLA-DQA1. We replicated these two HLA-DQA1 missense variants in a separate cohort of 100 children with SSNS from a population of white European descent and achieved a significance of P=1.83×10−17. Structural modeling suggests that the two HLA-DQA1 variants perturb the secondary structure of the protein and may affect antigen presentation. Rare variant analysis showed that PLCG2 has the strongest association with SSNS. The protein encoded by PLCG2 is involved in adaptive immunity. The study suggests a major role for adaptive immunity and autoimmunity in the pathogenesis of SSNS.
Results
Discovery Cohort
Characteristics of Study Participants
The discovery cohort comprised 363 participants: 214 cases and 149 controls (Table 1) of South Asian ancestry recruited from a single university hospital in Sri Lanka. The male-to-female ratio was 2:1. The median age at diagnosis of the cases was 3 years (range, 1.0–9.5 years). All the study patients responded to 8–12 weeks of oral corticosteroid therapy and were classified by the treating physicians as having SSNS. The samples in the study clustered closely together (Supplemental Figure 1) with one outlier that clustered closest to the European ancestry populations from the 1000 Genomes Project (Supplemental Figure 2). Analysis with and without this outlier yielded the same findings. The whole SSNS study sample clustered between the East Asian and European ancestry populations on the Principal Component 1 versus Principal Component 2 plot (Supplemental Figure 2). This observation is consistent with previous findings in studies of South Asians (e.g., the Gujarati Indians in the HapMap Project).4 Plots of other Principal Components showed that they clustered separately from the East Asian, European, American, and African ancestry populations.
Clinical characteristics of the discovery and replication cohorts
Association Analysis in Discovery Sample
A Manhattan plot of the association tests and the QQ plot are shown in Figure 1 and Supplemental Figure 3. Four single nucleotide polymorphisms (SNPs) on chromosome 6 (Figure 2) reached the Bonferroni-adjusted P value for the single variants for this study (P<1.87×10−6; i.e., Bonferroni correction of α=0.05 for 26,761 SNPs with minor allele frequency ≥0.05) (Table 2). Two of these SNPs are missense SNPs in HLA-DQA1, each of which has an odds ratio of 2.11 (95% confidence interval [95% CI], 1.56 to 2.86; P=1.684×10−6 [Fisher exact test]). Of the remaining two top SNPs, one is a missense SNP in HLA-DQB1, while the fourth is an intergenic variant near the same gene (Table 2). All four SNPs lie within a 20-kb region and are perfectly correlated (r2=1, D’=1 for all pairwise LD comparisons). This explains the identical test statistics and effect sizes in the association analysis. However, a closer examination of the region around these four top SNPs revealed that while the two HLA-DQA1 SNPs are contiguous, the other two SNPs are about 20 kb away. These other SNPs are separated from the two HLA-DQA1 SNPs by intervening SNPs that are not in LD with the former or the latter (Figure 2). The chromosome 6p variants explained only about 4.6% of the risk for SSNS in this cohort. The top 25 signals on single variant analysis are shown in Supplemental Table 1. Because SSNS is a complex trait of unclear genetic architecture, we performed association scans using both recessive and dominant models; in addition, we examined X chromosome markers. None of the markers showed stronger association compared with the additive model (Supplemental Tables 2–4).
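The pairwise LD statistics quoted in this paragraph (r2 and D’) can be illustrated with a minimal Python sketch. The function name and the haplotype counts in the usage note are hypothetical and are not the study data; the sketch assumes both SNPs are biallelic and polymorphic, and that phased haplotype counts are available.

```python
def ld_stats(n_ab, n_aB, n_Ab, n_AB):
    """Return (r2, D') for two biallelic SNPs from four haplotype counts.

    n_AB counts haplotypes carrying allele A at SNP 1 and allele B at SNP 2;
    lower-case letters denote the alternative allele at each SNP.
    Assumes both SNPs are polymorphic (otherwise r2 is undefined).
    """
    total = n_ab + n_aB + n_Ab + n_AB
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B          # raw disequilibrium coefficient D
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return r2, abs(d) / d_max


# Hypothetical counts: the two alleles always occur together on a haplotype.
perfect = ld_stats(n_ab=60, n_aB=0, n_Ab=0, n_AB=40)
# Hypothetical counts: only three of four haplotypes are seen, frequencies differ.
partial = ld_stats(n_ab=50, n_aB=30, n_Ab=0, n_AB=20)
```

With the first set of counts both statistics equal 1, which is the situation described for the four top SNPs; two such SNPs carry the same information and therefore give identical association statistics. The second set gives D’=1 but r2=0.25, showing why r2 rather than D’ is the relevant measure when asking whether two markers are interchangeable in an association test.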
Manhattan plot of single marker association P values for SSNS showing significant markers in HLA-DQA1 and HLA-DQB1 genes. Observed P values are plotted on the y-axis, and chromosome locations are on the x-axis. Red line indicates the exome-wide significance threshold. Four SNPs (rs1129740, rs9273349, rs1071630, and rs1140343) in HLA-DQA1 and HLA-DQB1 reached the exome-wide significance threshold with a P value of 1.68×10−6 (Fisher exact test; P=1.187×10−6 allelic test).
(A) Regional plot of the chromosome 6 region showing three of the four exome-wide significant variants on one haplotype and the fourth on a different haplotype. Dots are colored by LD (r2) in the 1000 Genomes East Asian (EAS) super population as indicated in the legend. (B) LD plot of the region. The exome-wide significant SNPs in the discovery samples are indicated in green.
Most significant markers on association analysis
One or Multiple Loci on Chromosome 6
Because four SNPs yielded identical exome-wide P values, we conducted several analyses to determine whether the chromosome 6 signal is only one locus or, in fact, represents multiple loci. First, we conducted a set of conditional analyses. Conditioning on each of the four SNPs in turn and evaluating association in the rest of the dataset showed residual evidence of association; 22 SNPs had a P value <5×10−4, including three SNPs with a P value between 5×10−5 and 8×10−5 (Supplemental Table 5). However, conditioning on any two or more of the four SNPs abolished all evidence of association at other loci (best P=1). Unsurprisingly, haplotype analysis showed that any haplotype containing two or more of the four SNPs yielded the same haplotype association statistics (identical to the single-SNP statistic for the first four SNPs in Table 2).
Second, we conducted fine mapping analysis using the “proxy association” procedures in PLINK.5 This approach attempts to refine a single SNP association by finding flanking markers and haplotypes that are in strong LD with the index SNP and testing these proxies for association using a haplotype based framework. However, this approach was unable to refine the locus further.
Third, we used LD-based clumping of the association results to identify the best clump of SNPs with the strongest association with disease. LD clumping is used to report the top single SNP results from a dense genotype study or genome-wide scan in terms of a smaller number of clumps of correlated SNPs. We used an LD cutoff of 0.5 to define a clump with a 1-Mb physical threshold for clumping, a P value of 10−6 as the significance threshold for index SNPs, and a P value of 10−5 as the secondary threshold for clumped SNPs. Only one clump (consisting of the top four SNPs) emerged from this exome-wide analysis. This rigorous subanalysis was unable to refine the 20-kb chromosome 6 region defined by these four SNPs (chr6: 32609105–32629137) into separate signals using statistical methods.
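The clumping procedure described above can be sketched as a simple greedy algorithm. This is an illustrative Python sketch of the general technique, not the implementation used in the study: the function `clump`, the SNP names, positions, P values, and pairwise LD values are all hypothetical, and the sketch assumes that the LD cutoff of 0.5 refers to r2.

```python
def clump(snps, r2, p1=1e-6, p2=1e-5, r2_min=0.5, max_kb=1000):
    """Greedy LD clumping (illustrative sketch).

    snps: dict mapping SNP name -> (chromosome, position_bp, p_value)
    r2:   function (name_a, name_b) -> pairwise r2
    Returns a list of (index_snp, [clumped_snps]), most significant index first.
    """
    remaining = set(snps)
    clumps = []
    for index in sorted(snps, key=lambda s: snps[s][2]):
        if index not in remaining:
            continue  # already absorbed into an earlier clump
        chrom, pos, p = snps[index]
        if p > p1:
            break  # SNPs are sorted by P value, so no later SNP can be an index SNP
        remaining.discard(index)
        members = []
        for other in sorted(remaining):
            o_chrom, o_pos, o_p = snps[other]
            if (o_chrom == chrom
                    and abs(o_pos - pos) <= max_kb * 1000
                    and o_p <= p2
                    and r2(index, other) >= r2_min):
                members.append(other)
        remaining.difference_update(members)
        clumps.append((index, members))
    return clumps


# Hypothetical input: five SNPs and a sparse table of pairwise r2 values.
example_snps = {
    "s1": (6, 1_000_000, 4e-7),
    "s2": (6, 1_000_020, 6e-7),
    "s3": (6, 1_020_000, 8e-6),
    "s4": (6, 1_500_000, 9e-6),
    "s5": (7, 2_000_000, 3e-4),
}
example_r2 = {
    frozenset(("s1", "s2")): 1.0,
    frozenset(("s1", "s3")): 0.9,
    frozenset(("s1", "s4")): 0.1,
}


def lookup_r2(a, b):
    # Pairs missing from the table are treated as unlinked (r2 = 0).
    return example_r2.get(frozenset((a, b)), 0.0)


result = clump(example_snps, lookup_r2)
```

In this hypothetical example a single clump emerges, with `s1` as the index SNP and `s2` and `s3` as its members; `s4` is excluded because its LD with the index SNP is below the cutoff, and `s5` lies on another chromosome. A single clump containing all top SNPs is the pattern reported for the chromosome 6 region.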
Replication in an Independent Cohort
We evaluated replication of our main findings in an independent sample of 305 persons: 100 children with SSNS of self-reported white European descent and 205 controls of self-reported white European descent from the Duke CATHGEN repository.6 The characteristics of patients in the replication cohort are shown in Table 1. The top four SNPs were genotyped by direct sequencing of the replication samples. Genotyping failed for two SNPs (rs9273349, rs1140343) despite multiple attempts. Similar to the discovery sample, the top two HLA-DQA1 SNPs were in perfect LD in the replication sample. The association for the two HLA-DQA1 SNPs was confirmed in the replication sample (i.e., P values showing significance, with the same direction of effect) (Table 3). Notably, a stronger effect (larger odds ratio [OR] and smaller P values) was observed in the replication sample: OR of 5.442 (95% CI, 3.581 to 8.119) and P=1.418×10−17. This was confirmed by a significant test of heterogeneity of effect (P=3.16×10−4). As a sensitivity test, we repeated the replication analysis using the same cases from a population of white European descent but with European ancestry controls (n=379) from the 1000 Genomes European ancestry dataset. The associations were confirmed for the two HLA-DQA1 SNPs, rs1129740 and rs1071630. The association findings were as follows: for rs1129740, A allele frequency of 0.528, P=7.868×10−14, and OR of 4.077 (SEM, 0.198); for rs1071630, G allele frequency of 0.578, P=2.9×10−10, and OR of 3.328 (SEM, 0.198).
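For readers who wish to compare effect sizes reported as an OR with a standard error against those reported with a confidence interval, the following Python sketch converts between the two. It assumes, and this is an assumption rather than something stated in the text, that the reported SEM is the standard error of log(OR) and that a Wald (normal approximation) interval is appropriate; the function names are illustrative.

```python
import math


def wald_ci(odds_ratio, se_log_or, z=1.96):
    """Wald confidence interval for an OR, given the SE of log(OR).

    z=1.96 corresponds to a 95% interval.
    """
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


def se_from_ci(lower, upper, z=1.96):
    """SE of log(OR) implied by a reported Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Sensitivity analysis for rs1129740: OR 4.077 with a reported SEM of 0.198.
ci_low, ci_high = wald_ci(4.077, 0.198)

# Discovery cohort: OR 2.11 with 95% CI 1.56 to 2.86.
discovery_se = se_from_ci(1.56, 2.86)
```

Under these assumptions the sensitivity-analysis interval for rs1129740 runs from roughly 2.8 to 6.0, and the discovery interval implies a standard error of log(OR) of about 0.15. Intervals obtained by exact methods need not be symmetric on the log scale, so small discrepancies from published limits are expected.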
Replication and combined analysis with white European ancestry samples
Annotation and Structural Modeling of the HLA-DQA1 Variants
The two replicated HLA-DQA1 SNPs are missense SNPs. Both SNPs overlap promoter histone marks, DNase hypersensitivity sites, or protein binding sites (binding POL2 and POL24h8) (Supplemental Table 6). Both variants were predicted to be benign by three in silico prediction tools: PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), SIFT (http://sift.jcvi.org/), and MutationTaster (http://www.mutationtaster.org/) (Supplemental Table 7). Three-dimensional modeling of the effect of the variants on the protein structure of HLA-DQA1 (Figure 3) shows that the amino acid substitutions resulted in substantial perturbation of protein secondary structure. The affected residues are near the dimer interface, and the substitutions may therefore disrupt the assembly of the antigen recognition domain.
Locations of HLA-DQA1 variants in in silico prediction models of HLA-DQA1 showing that the mutations lead to perturbation of the secondary structure of HLA-DQA1. I-TASSER was used to model the three-dimensional structures of wild type (WT) and missense variants at residues 34 and 41. Substitution of the native cysteine (C) residue at position 34 for tyrosine (Y) and native phenylalanine (F) residue at position 41 for serine (S) resulted in perturbation of the secondary structure of HLA-DQA1.
Relationship with Other Kidney Disease Loci
Two kidney-related traits are associated with variants in or near HLA-DQA1: IgA nephropathy and membranous nephropathy.7,8 However, the reported leading SNPs in these studies are not in LD with any of the top SNPs in this study, with the IgA nephropathy SNP (rs9275596) and the membranous nephropathy SNP (rs2187668) showing r2=0.007 (physical distance from rs1129740, 72.5 kb) and r2=0.051 (physical distance from rs1129740, 4 kb), respectively. We also investigated the association between SSNS and all the variable SNPs observed in our study in the established kidney disease genes GPC5, APOL1, and MYH9.9–12 No SNP in these genes was significant.
Gene Set–Based Analysis
The gene set analysis of rare variants was carried out in the discovery sample. The most strongly associated gene set was PLCG2 (P=7.825×10−5) at a rare-variant minor allele frequency threshold of 0.05 (Supplemental Table 8). A Manhattan plot and the QQ plot for the gene set–based analysis are shown in Supplemental Figures 4 and 5. The PLCG2 variants tested comprised seven missense SNPs with individual allele frequencies between 0.002 and 0.03 (Table 4, Supplemental Table 9). The minor allele frequencies of these variants in populations represented in the 1000 Genomes Project are shown in Supplemental Table 10. Some of the variants in the tested gene set are monomorphic in each of the 1000 Genomes Project phase 1 populations. Joint common and rare variant analysis showed that the genes with the strongest evidence for association are HLA-DQA1 (P=2.66×10−6), DDHD2 (P=4.59×10−5), HLA-DRB1 (P=5.91×10−5), and PLCG2 (P=6.91×10−5) (Supplemental Table 11). This finding essentially recapitulates the separate analyses done for single marker common variant association (Table 2) and the rare variant analysis.
Characteristics of markers in PLCG2, the most significant gene set in rare variant analysis
Discussion
Idiopathic SSNS remains one of the most common kidney conditions in childhood, and its prevalence varies widely among different populations. However, the basis for this wide variation has not been established. We used knowledge of the epidemiology of the disease to identify a well defined, homogeneous cohort by focusing on children with onset of disease in the first 10 years of life and compared these children with adult controls. For diseases that are common in children but rare in adults, this strategy minimizes phenotype misclassification because the adult controls have passed the age range of highest disease susceptibility but have remained disease-free.13 We identified a disease locus on chromosome 6p in a modest cohort and replicated this locus in another cohort of a different ancestry. This is the first exome array association study in childhood SSNS and the first study to identify missense variants as risk factors for SSNS.
Smaller studies from different populations and ethnicities have previously reported association between variants in HLA-DQ3, HLA-DQ8, HLA-DR, HLA-DQW2, HLA-DQA1, and HLA-DQB1 and nephrotic syndrome.14–20 All of these studies were characterized by small sample sizes and were limited to a few markers in the HLA locus. The findings of the present exome-wide study confirm the observation from these studies that variation in HLA antigens is an important risk factor for SSNS. The HLA-DQA1 region seems to be a pleiotropic locus, with significant genome-wide associations reported for many immune diseases and malignancies.21
Two glomerular diseases, IgA nephropathy and membranous nephropathy, are associated with variants in or near HLA-DQA1.7,8 The lack of correlation between the top SNPs in this study and the reported IgA and membranous nephropathy loci suggests that the loci found in this study may be independent, although more data are needed to confirm this.
On the other hand, the finding of HLA loci associated with the three proteinuric nephropathies (IgA nephropathy, membranous nephropathy, and SSNS) suggests that they all share an underlying immune-mediated mechanism in disease pathogenesis. Our ability to replicate this locus in a different ethnic group strengthens the notion that the risk factor that we identified may not be population specific.
Formal estimates of the heritability of SSNS are not available; however, familial SSNS is rare, and it is estimated that only 3% of affected children have an affected sibling.22 Our finding that the locus explains only about 4.6% of the risk for SSNS is consistent with the expectation that other SSNS risk loci remain to be detected in studies with larger sample sizes and/or different populations. Although it remains to be shown whether the association is causal, possible mechanisms by which variants in HLA-DQA1 may predispose to SSNS include antigenic stimulation by infectious agents or yet-to-be-identified autoantigens. Defective antigen presentation by the HLA-DQA1 variants could lead to abnormal T-cell response and reversible injury to the podocyte.
In this regard, we hypothesize that the locus with the best signal on rare variant analysis (PLCG2) may play a role in SSNS risk. Although this locus did not reach Bonferroni-adjusted significance (9.13×10−6 for the gene set analysis), it has biologic annotations potentially relevant to SSNS. PLCG2 is a transmembrane signaling enzyme that catalyzes the conversion of 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate to 1D-myo-inositol 1,4,5-trisphosphate (IP3) and diacylglycerol (DAG).23–25 IP3 and DAG are second messenger molecules important for transmitting signals from growth factor and immune system receptors across the cell membrane. The protein PLCG2 is a signaling molecule that is important in the complex regulation of the immune system.23–25 PLCG2 domains include pleckstrin homology; EF-hand motifs; catalytic, multiple src homology; and autoinhibition region within the catalytic linker region (Supplemental Figure 6).23–25
Of note, most of the variants associated with SSNS in this study are located within the autoregulatory region, suggesting that the variants may affect the function of the gene. Mutations in PLCG2 have been associated with inherited autoinflammatory disease with immunodeficiency in humans.23,24 In addition, a gain-of-function mutation in murine Plcg2 has been associated with inflammatory diseases including proliferative glomerulonephritis.25 Furthermore, mutations in PLCE1, a member of the PLC family of genes, are a cause of both SSNS and steroid-resistant nephrotic syndrome.26 On the basis of the findings in this modest cohort and existing data, resequencing of PLCG2 in a large cohort of children with SSNS may provide additional insights on the role of this gene in the pathogenesis of the disease.
Corticosteroid therapy remains the primary treatment of SSNS, further supporting the central role of immune regulatory mechanisms in the pathogenesis of SSNS. However, the treatment is nonspecific, its mechanism of action is poorly understood, and it is associated with multiple adverse effects. Further characterization of risk loci for SSNS is essential to improve our understanding of its pathogenesis and of variability in response to treatment, and it has the potential to lead to more targeted therapy. In this regard, more studies with larger sample sizes that rely on whole-genome or whole-exome sequencing approaches are needed.
A potential limitation of the present study is the possibility of residual population stratification in the replication sample. The replication cohort comprised self-identified Americans of European ancestry, who may be more heterogeneous than the discovery sample of South Asians. We attempted to mitigate this issue using two different sets of controls, reasoning that confounding is less likely if we find consistency with two sets of controls that are ascertained (CATHGEN) and/or genotyped (1000 Genomes) differently from the cases. There was no allelic differentiation between the two sets of controls used, as shown by the similar minor allele frequencies in CATHGEN versus 1000 Genomes European cohort. Label swapping permutation for the replication tests shows that both SNPs have empirical P values of 1×10−8 after 10 million permutations. Nonetheless, in the absence of dense genotyping data on the replication cohort, we are unable to definitively adjust for population stratification in this group using genotypes in the present study. Another potential limitation is the limited sample size, which could only reliably detect relatively large effect sizes (genotypic risk ratio, 2.5–3.0 or higher). A larger sample size could improve the detectable effect size and the power to detect more (especially non-HLA) loci.
Future studies will address the mechanisms by which the identified variants may predispose to disease, the role of the locus in other ethnic groups, and other putative disease-associated loci.
In conclusion, we identified missense variants in HLA-DQA1 and PLCG2 as genetic risk factors for childhood SSNS. These findings support the importance of immune dysregulation in the pathogenesis of SSNS.
Concise Methods
Discovery Study
Case Ascertainment
Children with nephrotic syndrome with age at onset between 1 and 10 years were eligible to be enrolled in the study. After written informed consent, patients underwent a medical history, clinical examination, and review of laboratory studies by a pediatric nephrologist. Nephrotic syndrome was defined as proteinuria (>40 mg/m2 per hour or spot urine protein (g)/creatinine (g) ratio >2), hypoalbuminemia and edema. Patients with nephrotic syndrome secondary to infectious agents, malignancies, medications, and other conditions associated with nephrotic syndrome (such as lupus nephritis and IgA nephropathy) were excluded. The following clinical data were obtained from participants in the study: basic demographic data (including age, sex, race, and ethnicity), family history of renal disease, age at onset of symptoms, and laboratory data (including urinalysis, 24-hour urine protein excretion or spot urine protein-to-creatinine ratio, and serum creatinine). Treatment received and pattern of response to therapy were recorded. We obtained blood or saliva from all participants for DNA extraction.
The discovery study comprised 214 South Asian children, 210 (98.1%) of whom were recruited from a single university hospital in Sri Lanka. All 149 controls were adults recruited from the same center. The primary self-reported ethnicity of patients in this hospital is Sinhalese (A. Abeyagunawardena, personal communication). The cases developed disease between 1 and 10 years of age and all responded to standard steroid therapy (that is, they all had SSNS) as defined by the International Study of Kidney Disease in Children and Arbeitsgemeinschaft für Pädiatrische Nephrologie guidelines.27,28 To minimize the risk of misclassification, we selected adults (age ≥19 years) as controls, reasoning that their risk of developing childhood-onset SSNS is essentially zero. Notably, >80% of children who develop nephrotic syndrome do so before the age of 5 years. Criteria for enrollment as a control included no history of nephrotic syndrome, normal urinalysis, and no edema. Institutional review boards of Duke University Medical Center (Durham, NC) and all collaborating institutions approved the study. The sample size of the study has ≥83% power to detect a genotypic risk ratio of ≥3 for a locus with a minor allele frequency of ≥0.1 and ≥73% power to detect a genotypic risk ratio of 2.5 for a locus with a minor allele frequency of ≥0.15 (Supplemental Figure 7).
Genotyping
Genotyping was performed at the Duke University Center for Human Genetics on the Illumina HumanExome-12v1_A BeadChip Array, an array of approximately 220,000 coding SNPs identified from multiple studies covering >12,000 exomes and supplemented with additional content from various other sources (http://genome.sph.umich.edu/wiki/Exome_Chip_Design). Genotypes were called and initial quality control was done using Illumina GenomeStudio following the manufacturer’s recommendations. Annotation of the markers was updated with the latest Illumina annotation (v1–1A) for the chip and dbSNP 137 on hg build 37.3.
Statistical Analyses
Several quality control measures were applied to the data. First, monomorphic SNPs were filtered out, resulting in 54,240 SNPs. Next, filters were applied for a genotype locus success rate <90% (363 SNPs dropped) and for Hardy-Weinberg equilibrium P<10−6 (57 SNPs dropped). After exclusion of sex chromosome SNPs, 52,370 autosomal variable SNPs remained; these variants were used for analysis. All samples had an overall success rate >95% (mean individual sample success rate, 99.94%) and the mean success rate of the final set of markers was 99.774%. A subset of approximately 8000 markers was selected by pruning for LD and used to compute the principal components of the genotypes. LD pruning was done using the variance inflation factor to recursively remove SNPs within a sliding window. The specific parameters we used were as follows: variance inflation factor of 1.1, window size of 100 SNPs, and number of SNPs to shift the window at each step of 10. The same set of SNPs was extracted from the 1000 Genomes Project Resource (phase I v3.20101123 integrated release). This facilitated the evaluation of the relationships between the study sample and the 1000 Genomes Project populations by enabling the projection of the principal components of the genotypes to the same axes. Potential population stratification in the discovery sample was investigated by examining principal component plots (which showed one cluster [Supplemental Figures 1 and 2]) and the scree plot of the first 100 principal components (which showed a nearly flat surface, suggesting that no principal component is significant [Supplemental Figure 8]). In addition, we conducted a formal test of significant principal components34 showing that no principal component was significant. Single SNP-trait association analysis was done for SNPs with minor allele frequency ≥0.05 (n=26,761) using PLINK 1.07.5 Association tests were done under an additive genetic model using a Fisher exact test as well as the allelic test. The Bonferroni-adjusted significance level was P<1.87×10−6 (Bonferroni correction of α=0.05 for 26,761 SNPs). ORs and 95% CIs are presented for the most significant SNPs.
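The Bonferroni thresholds used in this study follow from dividing the significance level by the number of tests. The short Python sketch below reproduces the arithmetic; the function and variable names are illustrative, while the numbers of tests (26,761 common SNPs and 5476 gene sets) are taken from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Single-variant analysis: 26,761 SNPs with minor allele frequency >= 0.05.
single_variant_threshold = bonferroni_threshold(0.05, 26_761)

# Rare variant gene set analysis: 5476 gene sets with two or more rare variants.
gene_set_threshold = bonferroni_threshold(0.05, 5_476)

print(f"{single_variant_threshold:.2e}")  # 1.87e-06
print(f"{gene_set_threshold:.2e}")        # 9.13e-06
```

Both values agree with the thresholds quoted in the text. Because many exome array markers are correlated, the Bonferroni correction is conservative for this kind of data.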
Replication Study
Case Ascertainment
We evaluated replication of our main findings in an independent sample of 305 persons: 100 children with SSNS who were of self-reported white European ancestry and 205 controls of self-reported white European ancestry from the Duke CATHGEN repository.6 The children in the replication phase of the study were enrolled at Duke University Medical Center (26%) and the divisions of pediatric nephrology in major academic medical centers across the United States. The centers participating in the study and the number of persons recruited for this phase of the study are listed in Supplemental Table 12. All the participants were enrolled by their local pediatric nephrologist during a clinic visit or during admission. The Duke CATHGEN study cohort is a biorepository of consenting adults who have undergone cardiac catheterization at Duke University Medical Center since 2001 for different symptoms referable to the cardiovascular system. The samples used as controls in this study are from adults who were classified as not having kidney disease after screening.6 Among the initial samples selected for genotyping for this study, two were from individuals who subsequently developed chronic kidney disease (etiology not nephrotic syndrome) and were excluded. The second group of controls consisted of European ancestry participants in the 1000 Genomes Project (n=379) phase 1, comprising British in England and Scotland (n=89), Finnish in Finland (n=93), Iberian populations in Spain (n=14), Toscani in Italy (n=98), and Utah residents with Northern and Western European ancestry (n=85). While the 1000 Genomes Project did not collect any phenotype data, the risk of one of these individuals having cryptic SSNS is extremely low (at a population prevalence of 16/100,000, we expect less than one case [0.06 cases] in the 1000 Genomes European sample). Therefore, we considered this set a suitable group of controls because of the negligible risk of misclassification.
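The expected number of unrecognized SSNS cases among the 1000 Genomes controls can be checked with a few lines of Python. The sketch uses the prevalence and sample sizes given in the text; the variable names are illustrative, and the binomial calculation of the chance of at least one case is an added illustration that assumes individuals are independent.

```python
prevalence = 16 / 100_000

# European ancestry populations in the 1000 Genomes Project phase 1 control set.
population_sizes = {
    "British in England and Scotland": 89,
    "Finnish in Finland": 93,
    "Iberian populations in Spain": 14,
    "Toscani in Italy": 98,
    "Utah residents with Northern and Western European ancestry": 85,
}
n_controls = sum(population_sizes.values())

# Expected number of cryptic cases among the controls.
expected_cases = n_controls * prevalence

# Probability that at least one control is a cryptic case (binomial model).
p_at_least_one = 1 - (1 - prevalence) ** n_controls

print(n_controls, round(expected_cases, 2), round(p_at_least_one, 3))
```

The population counts sum to 379, and the expected number of cryptic cases is about 0.06, in agreement with the text. Under the independence assumption, the chance that the control set contains even one such case is a little under 6%.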
Genotyping
SNPs that reached the Bonferroni-adjusted P value for the number of single variants tested were genotyped in the replication study. Genotypes for the replication cases and CATHGEN controls were obtained by direct sequencing at Duke University Center for Human Genetics, while genotypes from the 1000 Genomes Project were obtained from the most recent (version 1 phase III) release (http://browser.1000genomes.org/index.html). We considered replication to be P<0.05 with the same allele and the effect in the same direction as the discovery finding. A random effects model that is optimized to detect associations under conditions of study heterogeneity29,30 was used to combine the association statistics of the SNPs from the discovery and replication studies.
Gene Set–Based Analysis
Given that studies of coding variation using exome arrays or whole-exome sequencing give rise to both common and rare variants, we undertook gene set rare variant analysis in the discovery cohort using sequence kernel association tests (SKAT) as implemented in the SKAT package.31–33 Such tests are necessary because standard single-marker methods used to test common variants (as in genome-wide association studies) are usually underpowered for rare variants. We selected sequence kernel association tests because they are flexible and computationally efficient and can easily be applied to genome-wide or exome-wide data. An added advantage is that they are robust to situations where rare variants in a gene/region may influence the trait in different directions and with differing magnitude of effect. A gene set comprised SNPs mapping to the region within the coordinates of the gene in the human genome build hg19. We conducted a rare variant analysis on our exome array dataset using minor allele frequency thresholds of 0.05 and 1/√(2n) (0.037 in our dataset) for the definition of “rare variant.” The latter sample size–dependent frequency threshold is recommended in some situations because it may work better in modest samples to correctly classify rare variants while being able to identify in larger samples those variants that would have been classified as rare but could be tested individually in such samples.31 We conducted rare variant analysis using the SKAT combined-sum test on the 5476 gene sets with two or more rare variants (out of the 12,747 gene sets represented by the 52,370 variable SNPs in our dataset). Next, we conducted joint common and rare variant analysis using the combined sum with SKAT for both common and rare variants and evaluating all 12,747 gene sets in our dataset. This allowed us to evaluate the strength of evidence for association between SSNS and both common and rare variants within the same framework.
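The sample size–dependent frequency threshold described above can be reproduced with a short Python sketch. It assumes that n is the total number of genotyped individuals in the discovery cohort (363); the function names and the example frequencies are illustrative only.

```python
import math


def rare_variant_threshold(n_samples):
    """Sample size-dependent minor allele frequency cutoff, 1 / sqrt(2n)."""
    return 1 / math.sqrt(2 * n_samples)


def is_rare(maf, n_samples):
    """Classify a variant as rare if its minor allele frequency is below the cutoff."""
    return maf < rare_variant_threshold(n_samples)


# Discovery cohort: 214 cases + 149 controls = 363 individuals.
threshold = rare_variant_threshold(363)
print(round(threshold, 3))  # 0.037
```

With n=363 the cutoff is about 0.037, matching the value quoted in the text. Under this rule a variant with a frequency of 0.03 would be treated as rare and one with a frequency of 0.05 as common; the cutoff falls as the sample grows, so fewer variants are classified as rare in larger studies.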
In Silico Modeling
The variants identified in the HLA-DQA1 gene were scored using three in silico software packages (PolyPhen-2, SIFT, and MutationTaster) to examine the predicted damaging effect of the amino acid substitution on the function of HLA-DQA1. The effect of the amino acid change on the secondary structure of the protein was assessed with the I-TASSER server (http://zhanglab.ccmb.med.umich.edu/I-TASSER/).
Disclosures
None.
Acknowledgments
We would like to thank the personnel of the Center for Human Genetics core facilities, Duke University, and, most importantly, the family members of the Duke Genetics of Kidney Disease study.
This work was supported by the Doris Duke Charitable Foundation. R.G. is the recipient of a Doris Duke Clinical Scientist Development Award and National Institutes of Health (NIH), National Institute of Diabetes and Digestive and Kidney Disease (NIDDK) K08-DK082495-05 award. R.G. is also the recipient of a P&F grant from Duke O’Brien Center for Kidney research supported by NIH NIDDK P30-DK096493 award. A.A. is supported by the Intramural Research Program of the Center for Research on Genomics and Global Health (CRGGH). The CRGGH is supported by funds from the Office of the Director, NIDDK and National Human Genome Research Institute at the NIH (Z01-HG200362). The funding sources have no role in the writing of the manuscript or the decision to submit it for publication.
Footnotes
R.A.G. and A.A. contributed equally to this work.
Published online ahead of print. Publication date available at www.jasn.org.
This article contains supplemental material online at http://jasn.asnjournals.org/lookup/suppl/doi:10.1681/ASN.2014030247/-/DCSupplemental.
Copyright © 2015 by the American Society of Nephrology