Plasma Proteomics of Renal Function: A Transethnic Meta-Analysis and Mendelian Randomization Study

Visual Abstract Significance Statement Studies on the plasma proteome of renal function have identified several biomarkers, but have lacked replication, were limited to European populations, and/or did not investigate causality with eGFR. Among four cohorts in a transethnic cross-sectional study, 57 plasma proteins were associated with eGFR, 23 of them also with CKD. Furthermore, Mendelian randomization and gene expression analyses in kidney tissue highlighted testican-2 as a physiological marker of kidney disease progression with potential clinical relevance, and identified a few additional proteins warranting further investigation. Background Studies on the relationship between renal function and the human plasma proteome have identified several potential biomarkers. However, investigations have been conducted largely in European populations, and causality of the associations between plasma proteins and kidney function has never been addressed. Methods A cross-sectional study of 993 plasma proteins among 2882 participants in four studies of European and admixed ancestries (KORA, INTERVAL, HUNT, QMDiab) identified transethnic associations between eGFR/CKD and proteomic biomarkers. For the replicated associations, two-sample bidirectional Mendelian randomization (MR) was used to investigate potential causal relationships. Publicly available datasets and transcriptomic data from independent studies were used to examine the association between gene expression in kidney tissue and eGFR. Results In total, 57 plasma proteins were associated with eGFR, including one novel protein. Of these, 23 were additionally associated with CKD. The strongest inferred causal effect was the positive effect of eGFR on testican-2, in line with the known biological role of this protein and the expression of its protein-coding gene (SPOCK2) in renal tissue. We also observed suggestive evidence of an effect of melanoma inhibitory activity (MIA), carbonic anhydrase III, and cystatin-M on eGFR. Conclusions In a discovery-replication setting, we identified 57 proteins transethnically associated with eGFR. The revealed causal relationships are an important stepping stone in establishing testican-2 as a clinically relevant physiological marker of kidney disease progression, and point to additional proteins warranting further investigation.

The kidneys' ability to filter blood and maintain homeostasis is reflected in the GFR. 2 Blood levels of serum creatinine, a filtration marker, can be used for eGFR. 3,4 CKD, characterized by reduced eGFR (,60 ml/min per m 2 ) and proteinuria, has a global prevalence of 10% to 16% 5,6 and is expected to be increasingly common in aging populations. 2 Increased serum creatinine is not evident until approximately 50% of the renal filtration function is lost, 7 making CKD a silent disease and creating a blind spot for early kidney disease detection. 8 Its rising prevalence, in addition to the lack of therapeutic options, 8 imposes a significant burden on health systems and individuals worldwide. 2,6 A number of biomarker research studies have been conducted in regard to early detection, diagnosis, and/or progression prediction of kidney diseases. 7,8 Early efforts in proteome research focused on urine biomarkers, where combining multiple urinary biomarkers was successful (e.g., CKD273 classifier). 7,9 More recent studies have focused on blood, an easily accessible tissue mirroring the metabolic status of multiple organs, with a complex profile requiring sensitive techniques for its study. SOMAscan, a platform using DNA aptamers to measure hundreds of plasma proteomic biomarkers, 10 has been successfully used in different epidemiological settings. [11][12][13][14][15] However, kidney disease has not been sufficiently investigated: prior studies have tested a limited number of proteins, 16 relied on small samples without replication, 10,17 or have not investigated causality. 18 Mendelian randomization (MR), a causal inference method relying on the random allocation of alleles at conception to estimate causal effects on outcomes, is an increasingly popular method used in genetic epidemiology studies to address causality. 19, 20 We present a cross-sectional study using a multiplexed aptamer-based proteomics platform to investigate associations between 1095 plasma proteins and eGFR/CKD, and other renal parameters in a discovery cohort (Cooperative Health Research in the Region of Augsburg S4 prospective cohort follow-up; KORA F4), with replication in three independent studies of European and admixed ancestry (INTER-VAL, Nord-Trøndelag Health Study [HUNT], and Qatar Metabolomics Study on Diabetes [QMDiab]). To better understand the biological significance of the identified proteins, we conducted enrichment, protein, and transcriptome analyses across tissues, and investigated their interconnection using protein-protein interaction (PPI) network analysis. We also investigated causal effects between eGFR and the replicated proteins using two-sample bidirectional MR. We further examined the correlation between their gene expression in kidney tissue, eGFR, and histological parameters using both publicly available datasets and transcriptomic data from a kidney resource.

Study Populations
The KORA study is a population-based sample from the general population living in the region of Augsburg, Southern Germany. The KORA F4 survey, a follow-up of the KORA S4 prospective cohort (1999)(2000)(2001), was conducted from 2006 to 2008, and included a total of 3080 participants. Clinical and demographic information, and peripheral blood for "omics" analyses, were collected; details on the standardized examinations, interviews, and tests conducted in the KORA study have been previously described. 21,22 This study acted as discovery cohort in the cross-sectional association study of plasma proteins and renal function ( Figure 1A).
Included in the replication phase were HUNT, namely the third survey (HUNT3) from this population-based study from Norway with data on participants of European descent; 23 the INTERVAL study, a randomized trial assessing blood donation practices across the United Kingdom with extensive phenotyping available for 50,000 participants of European descent; 24 and the QMDiab, a cross-sectional case-control study on type 2 diabetes from participants of Arab, South Asian, and Filipino descent in Qatar. 25 Population characteristics from the four studies are shown in Table 1. Information on data availability is given in Supplemental Note 1.
Sample Collection and Proteomic Profiling EDTA plasma samples collected by the studies following standardized procedures were centrifuged, aliquoted, and stored at -80 C. [26][27][28] Samples for proteomic profiling and GFR estimation were taken at the same time.
Proteomic profiling in all participating studies was done using SOMAscan (SomaLogic, Inc.), an aptamer-based, affinity proteomics platform. 10,[29][30][31] Plasma samples from KORA F4, HUNT3, and INTERVAL were shipped on dry ice to SomaLogic (Boulder, CO), and proteomic profiling was performed using a SOMAscan panel with 1129 protein-specific SOMAmer probes for KORA, 26 3622 for INTERVAL, 27 and 5000 for HUNT3. 28 In the QMDiab cohort, the kit-based SOMAscan platform was run by the Weill Cornell Medicine Qatar proteomics core following protocols and instrumentation provided by SomaLogic Inc., under supervision of SomaLogic personnel, to measure 1129 proteins in plasma samples. 26 The samples were all measured by individuals blinded to the identities corresponding to the samples. In summary, fluorescently labeled single-stranded synthetic nucleotides (Slow Off-rate Modified Aptamers, SOMAmers) immobilized on streptavidin-coated beads are incubated with plasma samples to capture proteins and generate SOMAmerprotein complexes. Washing steps eliminate unbound SOMAmers and unbound/nonspecifically bound proteins. The next steps are biotin labeling and photocleavage to liberate SOMAmer-protein complexes from the beads. This is followed by incubation in a buffer disrupting nonspecific interactions, recapturing the biotin-labeled protein/aptamer complexes in streptavidin-coated beads, and additional washing steps to remove nonspecific SOMAmers. These are then eluted from the target proteins and quantified on custom DNA microarrays using deposited SOMAmer-complementary oligonucleotides, which produces measurements in relative fluorescence units as proxies to protein concentrations. Quality control (QC) at the sample and SOMAmer levels using control aptamers and calibrator samples was conducted by the manufacturer. In brief, based on standard samples included on each plate, the resulting raw intensities were processed using a workflow including hybridization normalization, median signal normalization, and signal calibration to control for interplate differences. In the discovery cohort (KORA F4), QC resulted in  Figure 1. (A) Cross-sectional association study. Data from 995 participants and 1129 proteins from KORA F4 was used in the discovery phase of a proteome-wide association study of renal function using confounder-adjusted regression models. The replication studies were INTERVAL, HUNT, and QMDiab. Three rounds of replication are shown: R1, replication based on the meta-analysis of P values from the linear regression results of the studies with European ancestry; R2, replication based on the results of linear regression models performed in the Arab, South Asian, and Filipino descent sample QMDiab; R3, identification of proteins consistently associated with eGFR across samples and ethnicities. The set of proteins identified in R3 was then functionally annotated and brought forward to the causal analysis phase. (B) Causal analysis. Two-sample bidirectional MR using data on participants from European ancestry studies in the Chronic Kidney Disease (CKDGen) Consortium to instrument the forward analysis (eGFR causal to protein level) and data from INTERVAL and AGES-Reykjavik to instrument the reverse analysis (protein level causal to eGFR). Details on the data processing workflow for MR are shown. the exclusion of 29 proteins and one sample, and five proteins were further excluded due to crossreactivity (Assay Change Log SSM-064_Rev_0_DCN_16-263 issued by SomaLogic, available at the integrated web-server at http://proteomics. gwas.eu), 26 producing data on k51095 proteins in 999 participants (Supplemental Table 1). The same QC conducted in each study resulted in the inclusion of 3301 participants in INTERVAL, 27 2432 individuals in HUNT3, 12 and 352 participants in QMDiab, 26 and a set of 993 proteins passing QC in all studies (column "R" in Supplemental Table 1); further details on the proteomics profiling from the samples included in this study are described elsewhere. 12,26,27 Protein mapping to several identifiers was provided by the manufacturer (Supplemental Table 1).

Outcome Definitions
Our first analysis is a proteome wide association study: we investigated associations between proteins and renal traits as outcomes, using linear regression models with adjustment for potential confounders. The primary outcomes studied in this analysis were eGFR from serum creatinine and CKD, given their availability in all included studies. eGFR was calculated using the CKD Epidemiology Collaboration equation with serum creatinine 4 with the R package nephro v1.2. 32 Serum creatinine was measured using the modified kinetic Jaff e reaction in KORA, HUNT, and QMDiab (and calibrated by multiplying by 0.95), 33 and a nuclear magnetic resonance platform (Nightingale Health) in the INTER-VAL study. Pearson's correlation between serum creatinine-based eGFR and this nuclear magnetic resonance-based eGFR variable was estimated in KORA (Supplemental Figure 1). CKD was defined as eGFR ,60 ml/min per 1.73 m 2 . 34 We performed some analyses for outcomes available only in the discovery study. Urinary albumin and urinary creatinine were used to calculate urinary albumin-creatinine ratio (uACR) and its derived parameter microalbuminuria (MA, defined as uACR .30 mg/g). eGFR decline was defined as log(eGFR) follow-uplog(eGFR) baseline divided by the followup time, where KORA F4 (2006-2008) was used as baseline and KORA FF4 (2013-2014) as its follow-up survey. Sensitivity analyses were also run using eGFR cys (derived from the CKD Epidemiology Collaboration equation using cystatin C). 13

Definition of Covariates
Covariates used in the regression analyses were age at the time of examination, sex, body mass index (BMI), smoking status, diabetes (yes/no), hypertension (yes/no), log-transformed triglycerides, HDL, and intake of lipid-lowering drugs (yes/no). See Supplemental Note 2 for precise cohort-specific definitions of covariates used.

Statistical Analysis
Data preprocessing and statistical analyses were conducted using the R language for statistical computing v.3.6.0. 35 Before statistical analysis, proteomic data were log transformed and standardized. Linear regression was used to examine the association between protein levels and continuous kidney traits (log-transformed eGFR, uACR, eGFR change), whereas logistic regression was used for binary kidney traits (CKD, MA). Multiple testing was accounted for using a Bonferroni correction considering the total number of investigated proteins at each stage (k51095 in discovery).
Sensitivity analyses in the discovery sample included regression models with serum creatinine-based eGFR as an outcome and no adjustment for BMI or diabetes, and models including cystatin C-based eGFR as outcome and the same set of covariates from the main model. Pearson's correlations between the regression coefficients resulting from the sensitivity and the main analyses were calculated. Interaction analyses were also conducted for the proteins identified at discovery by adding an interaction term (each of age, sex, and smoking status individually) to the fully adjusted model (Supplemental Note 3). For those protein-outcome pairs significantly associated in the discovery, two replications were conducted: a European replication (R1) and a replication in an admixed population (R2) ( Figure 1A). Replication was defined as P,0.05 and consistent direction of effect as in the discovery study. The European replication for eGFR consisted of the meta-analysis of results from HUNT and INTERVAL using Stouffer's method, a P value combination method especially useful when raw data cannot be pooled across studies, which is the case with aptamer-based measurements, where data in relative fluorescence units is not directly comparable across studies. Also known as "inverse normal" or weighted Z-test, this method takes the P values for the i-th study (pi), transforms them by using the inverse normal transformation, and weights them according to the square root of the sample sizes (wi). The sum is then computed, and the combined P value is obtained using the distribution of the resulting statistic, T5SwiH(pi). 36 For CKD, only the HUNT study was used in the European replication (INTERVAL had only one patient), and the admixed population replication was based on the results of QMDiab. Our final set of transethnic associations (R3) were those pairs of proteins-outcomes that were replicated in both R1 and R2. Replicated eGFR-associated proteins were taken to the next stages of the analysis: proteomic target validation, enrichment analyses, and MR.

Validation of Proteomic Targets
We examined the plasma levels of proteins measured using Proximity Extension Assay (PEA) technology (Olink) in a subgroup of randomly selected participants from the KORA F4 study (n5173). 37,38 In brief, protein abundance was quantified using real-time PCR in the PEA proteomic technology (Olink), producing relative quantification data in NPX units (normalized protein expression levels, on log2 scale); NPX values were intensity normalized with the plate median for each assay as the normalization factor, and samples and proteins that did not pass QC were excluded. 38 Eight of the most relevant proteins (cystatin C, RELT, IGFBP-6, myoglobin, TNF sR-I, RGMB, FSTL3, contactin-4), and three of the proteins identified in the causal inference analysis (carbonic anhydrase 3, melanoma inhibitory activity [MIA], cystatin M) were available in this subset of proteomic measurements. Of note, testican-2 was not available for measurement using this technology. Scatterplots of the aptamer-based and PEA measurements, annotated with their Pearson's correlations and P values, are shown in Supplemental Figure 2. The lack of immunoassays fully validated for specificity, linearity, and possible interference for most of the measured analytes in SOMAScan (including testican-2) limits the investigation into the concordance between these two methods. 39 Information on specificity and crossreactivity of the aptamers was available from three independent studies 27,40,41 for 54 of the 57 proteins identified to be transethnically associated with eGFR. Target specificity issues (i.e., comparable binding observed to a target that is not the product of the same gene) were observed in four proteins (ephrin-A5, IGFBP-5, hemojuvelin, and cystatin SA) 27,40 (Supplemental  Tables 2-4). Moreover, in previous studies, 23 of the 57 proteins were directly validated via mass spectrometry in blood plasma/serum, and other biological matrices 41 (Supplemental Table 2), and 49 using solution affinity measurements 27,40 (Supplemental Tables 3-4).

Functional Annotation, Enrichment, and Expression Analyses
Annotation was conducted using the R package InterMineR v1.6.1, 42 a tool facilitating access to data from the HumanMine release 6.0 (May 2019). DAVID v.6.8 43 was used to look for annotations for Gene Ontology Terms (molecular function, biological process) and pathway information, and to identify publications relevant to the set of 57 replicated proteins. Gene information was retrieved from the human assembly GRCh37 (hg19) using BioMart v.4, 44 and shown in Supplemental Table 5.
To investigate the expression patterns of the 57 eGFRassociated proteins and their corresponding protein coding genes across tissues, we used proteomics and RNA-seq expression data from the ProteomicsDB 45,46 and the Genotype-Tissue Expression (GTEx) database. 47 The data presented and described in this manuscript were generated on October 2, 2020 through a multigene query on the ProteomicsDB Analytics Toolbox portal from: https://www.proteomicsdb.org/ proteomicsdb/#analytics/expressionHeatmap and GTEx portal https://www.gtexportal.org/home/multiGeneQueryPage.

PPI Network Analysis
We queried STRING, 48 the PPI server, to examine the relationship between the proteins that were identified as robustly associated with eGFR across studies and ethnicities (k557 transethnically eGFR-associated proteins). We used the set of SOMAscan proteins available across studies as background (k5993), adding no additional interactors (proteins) to the network during the analyses, and considered a minimum required interaction score for a medium confidence (0.400) (Supplemental Note 4).

MR
MR, an instrumental variable method used to infer causality, leverages the natural randomization inherent in the (random) assortment of genes during gamete formation to assess the effect of lifelong exposures on health outcomes. 49 Singlenucleotide polymorphisms (SNPs) are used as instrumental variables (IV; or instruments), given their alleles are randomly assigned to individuals before any exposures/outcome and they are nonmodifiable, thus minimizing the risk of reverse causation and confounding. 49 The idea behind MR is that if genetic variation produces differences mirroring the biological effects of environmental exposures that alter disease risk, then genetic variation itself should be related to disease risk by having an influence on the exposure. 49,50 MR uses SNPs as surrogates for an exposure of interest, allowing the estimation of the effects of life-long, genetically determined "exposures" on health outcomes. 49 MR produces robust causal inference estimates if the SNPs used are valid instrumentsthat is, if they meet the three assumptions on which MR relies: SNPs must be strongly associated with the exposure, and not associated with either (measured or unmeasured) confounders or with the outcome except potentially through the exposure. 51 Causality in MR is thus defined as the modification of an exposure leading to a change in the outcome, where the inferred causal effects by MR do not necessarily imply the existence of a straightforward interpretation with respect to direct causal factors, 50 nor do they offer information on the time interval (e.g., during development) or target tissue in which such modification of the exposure or intervention would need to be delivered. 52 To investigate whether a genetic liability to lower or higher eGFR causally alters plasma protein levels or vice versa, MR was conducted in the set of 57 proteins whose associations with eGFR showed transethnic replication. Two-sample bidirectional MR 19 was used to infer the causal effect of renal function (eGFR as proxy thereof) on plasma protein levels (forward MR) and vice versa (reverse MR, Figure 1B). Results from publicly available genome-wide association studies (GWAS) for (1) eGFR from the CKDGen consortium (meta-analysis of European-ancestry populations), 53 and (2) plasma proteins from INTERVAL 27 and Age, Gene/Environment Susceptibility-Reykjavik Study (AGES-Reykjavik) 41 were used to perform MR using MRBase. 54 A detailed account on the MR methods, data sources, and analyses conducted is available in Supplemental Note 5.

Instrument Selection
In the forward MR (i.e., assessing the effect of renal filtration on protein levels), 256 SNPs associated with eGFR at genome-wide significance in the CKDGen results were selected as candidate IV. These SNPs were then filtered based on their relevance to renal function (associated with BUN, a complementary renal trait, with an opposite direction of effect, N IV 547) and clumped based on linkage disequilibrium (r250.01 and Kb510,000) to identify independent variants (N IV 541). Summary statistics on 41 SNP-eGFR associations were extracted from the CKDGen results, and corresponding SNP-protein associations were extracted from the INTERVAL results for 47 proteins. For investigating the causal effects of eGFR on proteins, 47 eGFR-protein relationships were instrumented by 41 SNPs.
For the reverse MR (i.e., interrogating the causal effect of proteins on renal filtration), gene positions (GChr37, Supplemental Table 5) were used to identify genome-wide significant cis-SNPs for 28 proteins in the INTERVAL results as candidate IV and LD clumped (same criteria as forward MR). Summary statistics on SNP-protein associations for 28 proteins were extracted from the INTERVAL results, and its corresponding SNP-eGFR associations were extracted from the CKDGen results. The same strategy was followed to identify instruments in the AGES-Reykjavik results; SNP-protein results were extracted from this dataset for 29 proteins, and SNP-eGFR results were extracted for 26 proteins from the CKDGen data. Further details of the genetic instrument selection and data harmonization process are shown in Supplemental Figure 3 and Supplemental Table 6. Thus, for investigating the causal effect of proteins on eGFR, 35 protein-eGFR relationships were instrumented by 1-5 SNPs, of which 17 proteins were examined using data from both INTERVAL and AGES-Reykjavik (Supplemental Figure 4).

Data Harmonization, Phenotypic Variance Explained, and Instrument Specificity
Details on data harmonization, the handling of palindromic SNPs, and calculating the phenotypic variance explained by the SNPs are given in Supplemental Note 5. Harmonized datasets used in the MR analyses are available in Supplemental Table 7.
To look for further evidence of horizontal pleiotropy, association between our SNPs and other traits were searched for in the GWAS Catalog 55 (Supplemental Table 8).

MR and Sensitivity Analyses
The primary MR analysis used inverse variance weighted (IVW) regression. In this method, the coefficient of the gene-outcome association is regressed on the coefficient of the gene-exposure association with the intercept constrained to zero, assuming no directional pleiotropy. 56,57 Because IVW requires two or more SNPs, in patients where only one SNP instrumented the analysis, Wald's ratio (coefficient of the gene-outcome association divided by the gene-exposure association) was calculated whenever only one SNP instrumented the analyses, as IVW MR requires two or more SNPs. 57 For MR analyses instrumented by more than two SNPs, three further MR methods were used as sensitivity analyses. 58 MR-Egger regression was used to assess pleiotropy, because this method allows for horizontal pleiotropy and provides an estimate of the unbalanced horizontal pleiotropic effects in its intercept. 59 Weighted median 60 and weighted mode MR, 61 methods less sensitive to the presence of invalid instruments and to pleiotropic SNPs behaving as outliers, were also used. A number of additional analyses were run to check for outliers, directional pleiotropy and heterogeneity, as recommended. 58,62 Details are given in Supplemental Note 5.
Causal estimates were assessed at a Bonferroni-corrected significance level, namely, 0.05 divided by the number of proteins assessed in each MR direction (47 in forward and 51 in reverse MR). Causal effects were considered robust if they were significant at Bonferroni P,0.05 in the IVW or Wald estimator, and results from the pleiotropy-robust sensitivity MR analyses examined to test for violations to MR assumptions.

Expression Analyses in Human Kidney Tissue
The correlation between expression of SPOCK2, one of the genes coding for proteins showing evidence for a causal relationship with eGFR, was calculated with data from microdissected tubulointerstitial components of human renal biopsies from 26 individuals with CKD at different disease stages (I-IV) 63 (GEO accession: GSE69438). Gene expression of the protein-coding genes identified in MR (SPOCK2, CA3, CST6, MIA) and renal traits was further assessed in (1) data from Nephroseq v5 (n5458), a platform of comprehensive kidney disease gene expression datasets, 64 and (2) a human kidney tissue resource based on RNA sequencing (n5427, see Supplemental Note 6). 1 Within Nephroseq, univariate correlation analyses between eGFR and gene expression were conducted separately in study-defined histological compartments of the human kidney (i.e., glomerular and tubulointerstitial) in 458 available kidney samples from three datasets of patients with kidney disease ( . 67 The correlations were meta-analyzed using inverse variance weighted random effects models, 68 and heterogeneity was assessed using Cochran's Q test. Multivariable regression analyses were conducted in the human kidney resource (n5427). 1 In brief, we constructed linear regression models with renal expression of each candidate as the response variable; whereas eGFR and histologically confirmed measures of structural kidney damage were used as independent variables together with age, sex, BMI, three genetic principal components, diabetes, and a variable number of surrogate variables (29 for eGFR and 26 for all histology phenotypes). 1,69 eGFR was based on circulating levels of creatinine, as reported before. 1 Histologic measures of structural integrity (glomerular sclerosis, glomerular Bowman's capsule thickening, tubular atrophy, interstitial fibrosis, interstitial inflammation, and vascular lesions) were assessed microscopically and scored on a semiquantitative scale (whereby 0 indicates no or minimal damage and 3 is consistent with the highest degree of structural injury), as reported previously. 70 Figure 1 illustrates the design of this study. First, a crosssectional association study was performed to identify proteins associated with renal function parameters in a discoveryreplication setting: KORA F4 acted as the discovery, and INTERVAL, HUNT3, and QMDiab as replication studies ( Figure 1A). Replicated transethnic protein associations were then assessed for causality using two-sample MR, using data from the largest GWAS available for the traits of interest (CKDGen, INTERVAL, and AGES-Reykjavik) ( Figure 1B).

CROSS-SECTIONAL ASSOCIATION OF PLASMA PROTEINS AND RENAL FUNCTION
Population characteristics of the four cohorts included in the cross-sectional association study are shown in Table 1.
Sensitivity analyses showed that 71 of the 80 eGFR-associated-proteins identified in the main analysis were consistently associated with cystatin C-based eGFR, with a high correlation between regression coefficients (r50.841, P,0.001). Models with no adjustment for BMI or diabetes produced highly similar estimates to those obtained in the main analysis (r50.99, P,0.001 for both; Supplemental Table 9). Likewise, the exclusion of individuals with CKD (n538) did not significantly affect the correlation between the plasma levels of proteins and eGFR (Supplemental Figures 6 and 7). Interaction analyses suggested associations with five plasma proteins were accentuated with age (Supplemental Table 10).
Additional renal outcomes were assessed in the discovery cohort: eGFR change was associated with five proteins, three proteins were associated with uACR, and no proteins were associated with MA (Supplemental Table 11).

Results from Replication Studies
Serum creatinine was the only available trait across all replication cohorts ( Figure 1A), thus only associations with eGFR/ CKD were further explored.
The European replication (R1) was conducted using HUNT3 and INTERVAL; results from this analysis confirmed the association of 62 of the 76 proteins available across studies (Supplemental Figure 5). The second replication round (R2) was performed in QMDiab, a population of admixed ancestry; www.jasn.org CLINICAL EPIDEMIOLOGY this confirmed 65 eGFR-protein associations (Supplemental Figure 5). High correlations between z values from the discovery study and replication studies were observed (correlations ranging from 0.66 with INTERVAL to 0.93 with HUNT3, Supplemental Figure 8). The overlap of the proteins replicated in R1 and R2 produced the set of 57 robustly replicated transethnic eGFR-protein associations (Supplemental Table 12). Figure 2 shows the cross-sectional effect estimates for the top 10 protein-eGFR associations across the four cohorts; INTERVAL, a largely healthy and younger population, showed the smallest effect sizes, whereas the strongest effects were observed in HUNT3, a cohort of older individuals with lower mean eGFR and higher CKD prevalence. One novel protein, contactin-4, was identified. All 57 proteins were replicated in the cystatin C-based eGFR sensitivity analysis (Supplemental Table 9).
All 34 CKD-protein associations from the discovery phase were replicated in HUNT3 and 23 in QMDiab; these 23 were thus considered transethnically robust (Supplemental Table  13). Figure 3 shows the overlap between proteins associated with eGFR/CKD.

Functional Annotation, Enrichment and Expression Analyses
Several pathways, biological processes and molecular functions were represented in the set of replicated proteins (Supplemental Table 14). No enrichment was observed, perhaps due to the coverage of the analytical platform. 10,30,71 Peptides for most of the replicated proteins were detected in multiple tissues and body fluids, including kidney tissue (Supplemental Figure 9). Most genes showed ubiquitous expression across the human tissues represented in the Pro-teomeDB (Supplemental Figure 10) and GTEx datasets (Supplemental Figure 11).

PPI Network Analysis
We queried STRING to examine PPIs in the set of 57 replicated proteins. More interactions than expected were observed in the resulting network (P51.12E-04, Supplemental Figure 12). Because the inclusion of proteins in the main PPI analysis was conditional on the platform's coverage and study design, less stringent sensitivity analyses allowing for the inclusion of additional proteins connected additional nodes (e.g., SPOCK2 or MIA) not connected in the main analysis to the network (Supplemental Note 4).

MR
To assess whether genetically determined higher or lower plasma levels of the 57 proteins identified as transethnically associated with eGFR may affect this renal trait, and whether genetically determined eGFR causally alters circulating levels of plasma proteins, two-sample bidirectional MR was conducted ( Figure 1B).

Forward MR: eGFR Has an Effect on Testican-2
In the forward direction of the MR (i.e., inferring the effect of eGFR on levels of 47 proteins), 40 SNPs explaining 1.59% of the variance of eGFR were used as instruments (Supplemental Table 15). Plasma levels of seven proteins were identified as causally affected by eGFR according to the IVW MR model (Supplemental Table 16 and Supplemental Figures 13-19). Although no evidence of directional pleiotropy, influential SNPs, or instrument heterogeneity was observed (with the exception of IGFB6, Supplemental Table 17-19), pleiotropyrobust sensitivity MR analyses did not provide further evidence of causality for six of them, suggesting IVW findings may be driven by undetected horizontal pleiotropy. 72 In contrast, a positive causal effect of eGFR on testican-2 was identified by multiple MR methods (weighted median P52.84E-04, Figure 4). Assuming this robust evidence reflects a true relationship, the results suggest that if eGFR is altered by an intervention mimicking the effect of the SNP on eGFR, plasma levels of testican-2 will also increase.
In total, 11 of the SNPs instrumenting the forward MR analysis were identified as potentially pleiotropic (associations at P,5E-08 with other traits, Supplemental Table 8). Restrictive MR was conducted excluding these SNPS; although not significant, perhaps due to reduced statistical power derived from using fewer SNPs as instruments and/or the exclusion of SNPs that might be on the actual causal pathway of interest, these results were in agreement with those from the main analysis (same direction and size of effect, Supplemental Table 20).

Reverse MR: MIA, Cystatin M, and Carbonic Anhydrase III Affect eGFR
In the reverse direction of the MR (i.e., assessing the effect of 35 proteins on eGFR), one to five cis-SNPs explaining 0.91%-29.33% of phenotypic variance were used as genetic instruments (Supplemental Table 15).
A negative effect of MIA on eGFR was identified by multiple MR models ( Table 2), results that suggest if plasma protein levels are lowered by means of an intervention mimicking the effect of the SNP on MIA, eGFR will increase. No evidence of influential SNPs was observed (Supplemental Tables 17-19), yet the funnel plot suggested directional pleiotropy (Supplemental Figure 20). One SNP instrumenting this analysis was identified as potentially pleiotropic (Supplemental Table 8).
Positive effects of carbonic anhydrase III and cystatin M on eGFR were also identified (Wald's ratio P55.04E-04 and 8.41E-05, respectively) ( Table 2). Although further sensitivity analyses could not be conducted given the availability of one . Results from the transethnic discovery-replication observational study. Depicted in the left circle are the 57 proteins associated with eGFR, the continuous measurement of renal function; the 34 eGFR-specific proteins reflect associations along the full range of renal function, whereas the 23 proteins also associated with CKD reflect a direct association with a clinically relevant low eGFR (,60 ml/min per 1.73m 2 ). Shown in bold is contactin-4, a novel protein identified by this study, and underlined are the four proteins for which evidence on causal effects was identified by MR.
www.jasn.org CLINICAL EPIDEMIOLOGY cis-SNP for each protein, no gene-trait associations were found for these SNPs in the GWAS Catalog, 55 indicating a lack of evidence for pleiotropic effects. MR estimates obtained with different pGWAS data (see Methods, Supplemental Figure 4) had the same direction of effect (Supplemental Table 16). Of note, no statistically significant effect of testican-2 on eGFR was identified (Supplemental Table 16).

DISCUSSION
We conducted a cross-sectional association study of plasma proteomics and eGFR/CKD in four independent cohorts, identifying known and potential novel biomarkers. Twosample bidirectional MR suggested the existence of causal effects in four eGFR-protein associations.
A total of 80 proteins were associated with eGFR in our discovery analysis, with transethnic replication confirming 57 of these; 23 were also found to be associated with CKD. Although our analyses use serum creatinine-based eGFR due to its availability across studies and its utility in clinical practice, models using cystatin C-based eGFR show the results are robust to the GFR estimation method. Likewise, further sensitivity analyses indicate the associations are largely independent of adjustment for BMI or diabetes.
Additional analyses in the discovery cohort produced associations between eGFR decline and DAN, TNF sR-1 and FSTL3, in line with those previously reported. 18 No overlap between the set of proteins associated with uACR and the proteins previously reported 73 was observed, which may be explained by the different time points of the eGFR and albuminuria measurements. 73 No proteins were significantly associated with MA, possibly due to its low prevalence (5.9%) in KORA.
We identify several well-known biomarkers of renal function, 10,16,17,74,75 supporting the validity of our eGFR analyses. Our results are also in line with previous proteomic studies on eGFR, 10,[16][17][18] inflammation in ESKD, 75 and a "standalone" renal health test 40 (Supplemental Table 12). Contactin-4, involved in neuronal network development, was identified as a novel eGFR-associated protein. Its plasma (but not urine) detection suggests it is either not filtered at the glomerular capillaries, or filtered but later reabsorbed into blood from the tubules, so that variations in its plasma levels may reflect changes in glomerular and tubule function. 76,77 The ageinteraction effects identified for four proteins may be a consequence of their age-varying trajectories, 78 and require future investigation. The ubiquitous expression of our proteins across tissues, and our PPI network results, suggests the proteins are involved in cellular functions relevant to multiple tissues (Supplemental Table 14), 79 potentially mirroring the systemic nature of kidney disease. 80 Interestingly, podocyteexosome enrichment in urine identified 23 of our proteins, pointing to their participation in processes underlying glomerular filter permeability. 81 To investigate whether genetically determined renal function (using eGFR as a proxy thereof) or plasma protein levels may have a causal effect on the other, bidirectional MR with publicly available GWAS data was conducted. Our findings in the forward MR direction, supported by pleiotropyrobust sensitivity MR methods, identified a causal effect of eGFR on plasma levels of testican-2. Considering the MR definition of causality, these results suggest lifestyle or pharmacological interventions designed to improve eGFR (as a proxy of renal function) have the potential to increase plasma testican-2 levels.
Testican-2 is a secreted protein of the SPARC family, 82 a group of matricellular proteins regulating extracellular matrix-cell interactions and extracellular matrix processing, 83 and is involved in a number of biological processes (Supplemental Table 24). In line with our results, higher testican-2 plasma levels have also been associated with less eGFR loss over time and reduced odds of incident CKD. 18 Its protein-coding gene, SPOCK2, is associated with both normal maintenance of organ and tissue integrity, and with wound healing and other responses to injury. 85 Despite the ubiquitous expression of this gene, its enriched expression in human glomeruli in comparison to other nonrenal tissues 18,86-89 and its renal downregulation in diabetic kidney disease 86 suggest it has a particularly relevant role in renal cellular mechanisms. Furthermore, recent evidence from arteriovenous sampling demonstrated renal release of this protein into the bloodstream, 18 which in addition to its urine detection, 84 suggests changes in its plasma levels may be indicative of kidney function. 18,76 However, the contribution other organs might have in its plasma levels cannot be ruled out.
Although the association between eGFR and SPOCK2 renal expression did not reach statistical significance in some of our analyses (which may be partly explained by low statistical power in the dataset with 26 patients with CKD), the directionality of the coefficient (i.e., positive association) was consistent across datasets. Moreover, multivariable analyses showed higher scores of histologic measures of renal structural damage to be negatively associated with SPOCK2 renal expression, consistent with prior evidence. 86 All in all, the agreement between the cross-sectional results reported by us and by others, 17,18,40 and our MR findings and the associations with histologic measures, indicate testican-2 and its protein-coding gene SPOCK2 may have an important role in kidney function. Low plasma levels of testican-2 may be indicative of poor renal function, meaning this protein may be a physiological biomarker of kidney health and disease progression. 17,18 Although the reverse direction of this causal association did not reach statistical significance in our MR analyses, a reverse causal effect (i.e., testican-2 on renal function) is biologically plausible given the role matricellular proteins play in extracellular matrix repair 83,90 and its in vitro effects on human glomerular endothelial cells. 18 The utility of testican-2 as a biomarker with regard to its potential functional effects, tissue of origin, or the mechanisms influencing its blood levels, requires further study.
Three proteins (MIA, cystatin M, and carbonic anhydrase III) were identified as potentially having a causal effect on eGFR, effects biologically plausible given their known roles (Supplemental Table 24). Nevertheless, the precise mechanisms through which these proteins could be exerting effects on eGFR remain to be elucidated. Discordant directions of effect from the observational and the causal estimates (cystatin M, carbonic anhydrase III) could be explained due to differences in sample size/characteristics, reverse causation, or confounding in the case of the observational estimates, or due to limitations inherent to the MR methods. 91,92 A further explanation might be that they represent different effects: MR examines lifelong exposures to higher or lower protein levels, whereas results from observational studies could be reflective of acute or short-term effects. 92 The strengths of our study include the use of a multiplex proteomics platform and large sample size. This is the first report of eGFR-protein associations adjusted for multiple potential confounders, replicated in independent samples of diverse ancestries, and assessed using causal inference. MR was conducted with the largest available GWAS results from nonoverlapping European ancestry populations, thus avoiding issues derived from population stratification and sample overlap. We reduced the possibility of horizontal pleiotropy by using GWAS summary statistics from a complementary renal trait (blood urea nitrogen) to improve the specificity of the genetic instruments for eGFR, by focusing on cis-SNPs, and by using multiple pleiotropy-robust sensitivity analyses.
Our study also has several limitations. Aptamer-based proteomic methods may be affected by probe cross-reactivity and nonspecific binding, 79 although the aptamer-based measurements of most of the reported proteins have been validated in multiple independent studies. 27,40,41 This platform does not produce absolute plasma concentrations or cover posttranslational modifications, limiting the interpretability of the regression coefficients and the scope of the studied plasma proteome. 79 Our findings are based on cross-sectional data, so studies examining their longitudinal changes are warranted. Despite the multiethnic nature of our study, our results may not extend to ethnic groups not represented in our analyses. We avoided weak instrument bias in MR, but cannot discount the possibility of having incurred selection bias in the case of the SNP-protein data. The sample size in which genetic associations with protein levels were calculated was significantly smaller than the sample used to identify genetic associations with eGFR, which likely resulted in differences in power. Finally, knowledge of the biological role of the proteins identified is insufficient for our findings to suggest mechanistic insights. A follow-up of our findings in appropriate experimental models would provide additional evidence on the inferred causal associations reported here and help to unravel the molecular mechanisms underlying our findings. Likewise, future validation studies using validated absolute quantitative assays with increased sensitivity for the detection of testican-2, and other proteins identified here, are warranted to establish reference ranges and to explore their suitability as prognostic and diagnostic biomarkers in clinical settings.
In summary, our transethnic population-based study of plasma proteomics and renal function identified multiple markers of kidney function. Our MR findings are a stepping stone in establishing testican-2 as a physiological marker of kidney disease progression, and further identify proteins warranting additional investigation. Our results may serve as the starting point for future translational work on the utility of these proteins as diagnostic or prognostic biomarkers of disease, and for research on mechanistic insights at the tissue and single-cell levels.

SUPPLEMENTAL MATERIAL
This article contains the following supplemental material online at http://jasn. asnjournals.org/lookup/suppl/doi: 10 Supplemental Table 5. Gene information from proteins included in MR. Supplemental Table 6. Genetic instrument selection and data harmonization (INTERVAL and AGES-Reykjavik). Supplemental Table 7. Harmonized summary statistics used in MR. Supplemental Table 8. Gene-trait information retrieved from GWAS Catalog for pleiotropic SNPs.
Supplemental Table 9. Sensitivity analyses with CysC-based eGFR, no adjustment for BMI and no adjustment for T2D.
Supplemental Table 10. Sensitivity analyses (interaction with age, sex, and smoking) in KORA F4.
Supplemental Table 11. Results from observational analysis of supplementary renal phenotypes (eGFR decline, log(uACR) and MA) in KORA F4.
Supplemental Table 18. Sensitivity analyses: Pleiotropy in MR-Egger. Supplemental Table 19. Sensitivity analyses: Leave-one-out analyses. Supplemental Table 20. Sensitivity analyses: Results from restrictive MR. Supplemental Table 21. Results from correlation analyses between gene expression and eGFR from Nephroseq datasets. Supplemental Table 22. Clinical characteristics of studies included in gene expression analyses.
Supplemental Table 23. Results from multivariate regression analyses on gene expression, eGFR and histological characteristic scoring from human kidney resource.
Supplemental Table 24. Description and biological roles of selected proteins.
Supplemental Figure 1. Correlation of serum creatinine variables in KORA F4.
Supplemental Figure 2. Correlation between aptamer-based and other measurements for proteins in KORA F4.
Supplemental Figure 3. Genetic instrument selection and data harmonization.
Supplemental Figure 4. Protein overlap in pGWAS datasets used in reverse direction of MR.
Supplemental Figure 5. Proteins and log(eGFR) distribution in discovery dataset.
Supplemental Figure 6. Cross sectional results for eGFR-protein associations across studies. Supplemental Figure 7. Proteins and log(eGFR) distribution in discovery dataset after CKD exclusion.
Supplemental Figure 8. Correlation between Z-values for eGFR-protein associations across studies.
Supplemental Figures 13-19. Forward MR results for effects of eGFR on proteins.
Supplemental Figure 20. Reverse MR analysis for MIA-eGFR. Supplemental Figure 21. SPOCK2 gene expression in renal tissue from 26 patients with CKD.