Introduction

Systemic onset juvenile idiopathic arthritis (SJIA) is a chronic inflammatory disease of childhood characterized by a combination of systemic features [fever, rash, serositis (e.g., pericarditis, pleuritis)] and arthritis. Current diagnosis of SJIA is based solely on clinical findings [1] and requires arthritis, daily fever for at least 2 weeks, and at least one of the following: evanescent erythematous rash, generalized lymph node enlargement, hepatomegaly and/or splenomegaly, or serositis. Because these clinical manifestations resemble those of other diseases, including malignancy, infection, Kawasaki disease (KD), and other autoimmune or inflammatory disorders, early diagnosis of SJIA is challenging. Long-term disease outcome in SJIA is variable: about 50% of patients experience a single episode that resolves, whereas the other half experience either polycyclic or non-remitting disease.

Sensitive and specific diagnostic biomarkers for SJIA would allow its differentiation from other febrile illnesses, such as KD or acute infections (hereafter, febrile illness [FI]), and would facilitate prompt initiation of appropriate treatment at disease onset. Early treatment may reduce the risk of long-term complications and subsequent disability. In addition, biomarkers that distinguish an intercurrent SJIA flare from infection in patients with known SJIA would be clinically useful, as would markers that predict an impending flare or responder status for particular therapies, or that provide an early indication of treatment response. Finally, biomarkers may provide clues to unanswered questions about SJIA pathogenesis.

There have been several previous biomarker discovery efforts in SJIA. Initial studies, including ours [2], attempted to identify early clinical variables that predict long-term outcomes, such as joint damage or functional disability ≥2 years after disease onset [3–6]. Studies of serum found elevated cytokines, chemokines, and acute-phase reactants in active SJIA [7–10]. More recently, transcriptional profiling of peripheral blood mononuclear cells from SJIA subjects with active disease revealed a signature of active SJIA that normalized in association with clinical response to treatment [11, 12]. A single SELDI-based analysis of plasma identified serum amyloid A as a plasma biomarker of disease activity [13]. However, none of these efforts has yielded robust diagnostic or prognostic biomarkers with practical clinical utility.

We sought to explore urine as a source of biomarkers. Because urine can be collected noninvasively, urine markers would permit frequent testing, which is especially valuable in children with a chronic disease that follows a polycyclic course. A normal adult excretes 30–130 mg of protein and 22 mg of peptides per day in urine [14, 15]. Urine proteomic analysis has identified more than 1,500 proteins, including a large proportion of membrane proteins [16], and urine peptidomic analysis has revealed over 100,000 different peptides [17]. Our own in-depth two-dimensional (2D) mass spectrometry (MS) and tandem MS (MS/MS) analysis identified 11,988 different urine peptide sequences from 8,519 unique protein precursors in normal human urine [18]. Recent reviews have indicated that analysis of the urinary proteome/peptidome can be highly informative for both urogenital and systemic diseases and can be used for disease classification [18, 19]. Naturally processed urine peptides have certain advantages over proteins as biomarkers. The roughly equal mass of protein and peptide in urine, combined with the much smaller size of peptides, translates into at least a tenfold greater molar abundance of peptides. Moreover, whereas the urine proteome contains a number of abundant proteins that obscure the lower-abundance proteins more likely to be biomarkers, this problem does not complicate the analysis of urine peptides. A one-dimensional HPLC separation is sufficient to analyze more than 25,000 distinct urine peptides.
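The molar argument above can be checked with simple arithmetic. The sketch below assumes equal daily masses of peptide and protein and uses hypothetical average molecular weights (66 kDa, roughly albumin-sized, for a urine protein and 2 kDa for a naturally occurring peptide, within the m/z 800–4,000 window profiled in this study); neither average is reported here.

```python
def molar_ratio(peptide_mg, protein_mg, peptide_mw_da, protein_mw_da):
    """Moles of peptide per mole of protein.

    Masses in mg and molecular weights in Da (g/mol); the unit
    conversion factors cancel in the ratio.
    """
    peptide_mol = peptide_mg / peptide_mw_da
    protein_mol = protein_mg / protein_mw_da
    return peptide_mol / protein_mol


# Hypothetical averages (assumptions, not values measured in this study).
AVG_PROTEIN_MW = 66_000.0  # Da, roughly albumin-sized
AVG_PEPTIDE_MW = 2_000.0   # Da, within the m/z 800-4,000 window

# Equal daily masses (22 mg each) isolate the effect of molecular weight.
ratio = molar_ratio(22.0, 22.0, AVG_PEPTIDE_MW, AVG_PROTEIN_MW)
print(f"peptides outnumber proteins about {ratio:.0f}-fold")
```

With equal masses, the ratio reduces to the ratio of the assumed average molecular weights, so any protein average at least ten times the peptide average yields the tenfold figure.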

Among the emerging quantitative proteomics technologies, isobaric tags for relative and absolute quantification (iTRAQ) allows concurrent protein sequence identification and relative quantification of those peptides with known protein sequences in up to eight different biological samples in a single experiment [20]. However, due to its limited throughput and current cost, iTRAQ is not feasible for simultaneous comparison of the large number of disease subjects needed to achieve the discovery of differential features with sufficient statistical power. As an alternative, a label-free liquid chromatography-mass spectrometry (LC-MS)-based approach has been applied as a quantitative biomarker discovery method. The label-free LC-MS approach can compare and quantify peptides with precision and accuracy comparable to those based on isotope labeling [21]. Although LC/ESI mass spectrometry is typically used in label-free quantitative proteomics, matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) mass spectrometry is increasingly being used and demonstrates low average coefficients of variation for all peptide signals across the entire intensity range in all technical replicates [22, 23]. Using the label-free LC/MALDI-TOF profiling approach, we previously discovered candidate urine peptide biomarkers of renal transplant rejection [19, 24]. Subsequent urine peptide biomarker validation [19] by multiple reaction monitoring (MRM) [25, 26] showed significant correlation between the urine peptide measurements obtained from label-free MALDI-TOF and from MRM using stable isotope-labeled synthetic marker analogues to derive absolute quantification.
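The coefficient of variation (CV) quoted for MALDI-TOF is computed per peptide signal as the standard deviation across technical replicates divided by the mean, then averaged over peptides. A minimal sketch, using hypothetical feature labels and made-up intensities:

```python
from statistics import mean, stdev


def peptide_cvs(replicates):
    """Per-peptide coefficient of variation (sample SD / mean).

    `replicates` maps a peptide feature ID (hypothetical labels here)
    to its intensities in two or more technical replicate runs.
    """
    cvs = {}
    for peptide, values in replicates.items():
        mu = mean(values)
        cvs[peptide] = stdev(values) / mu if mu > 0 else float("nan")
    return cvs


# Illustrative intensities only, not data from this study.
demo = {
    "feature_A": [1000.0, 1100.0, 900.0],
    "feature_B": [200.0, 210.0, 190.0],
}
cvs = peptide_cvs(demo)
average_cv = mean(cvs.values())
print(cvs, average_cv)
```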

The label-free LC-MALDI-TOF approach compares the urine peptidomes of different samples and therefore requires comparing multiple LC-MS spectra. This comparison is computationally intensive, demanding robust detection of LC-MS peaks, alignment of all LC-MS peaks, and determination of common peak indices across all assayed samples. The output of data processing is essentially a P × N table in which each of the P indexed peptides has been quantified across the N studied samples. This table, reduced from the LC-MS spectra of all samples, can be subjected to downstream statistical learning, including transformation, normalization, and unsupervised or supervised analyses suited to the experimental design, to mine for a differential subset of the P peptides. That subset is then subjected to MS/MS protein sequence identification and, in future work, to quantitative prospective MRM [25, 26] or antibody-based validation.
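The reduction to a P × N table is essentially a pivot from per-sample lists of aligned peaks to a feature-by-sample matrix. The sketch below is illustrative rather than the authors' pipeline: it assumes each peak has already been assigned a consensus bin (an (m/z, LC fraction) pair), fills absent peptide-sample combinations with zero, and applies a log2(x + 1) transform as one common choice for the downstream transformation step. The record layout, bins, and intensities are hypothetical.

```python
import math


def build_feature_table(records, samples):
    """Pivot (sample, bin, intensity) records into a P x N table.

    Returns (bins, table) where table[p][n] is the intensity of
    consensus bin bins[p] in samples[n]; absent peaks are 0.
    """
    bins = sorted({b for _, b, _ in records})
    row = {b: i for i, b in enumerate(bins)}
    col = {s: j for j, s in enumerate(samples)}
    table = [[0.0] * len(samples) for _ in bins]
    for sample, b, intensity in records:
        table[row[b]][col[sample]] += intensity  # sum if a bin repeats
    return bins, table


def log2_transform(table):
    """log2(x + 1) transform (an assumed, not stated, choice)."""
    return [[math.log2(x + 1.0) for x in r] for r in table]


# Bins are (m/z, LC fraction) pairs; values are illustrative.
records = [
    ("S1", (1500.0, 12), 3.0),
    ("S2", (1500.0, 12), 7.0),
    ("S1", (1200.0, 8), 1.0),
]
bins, table = build_feature_table(records, ["S1", "S2"])
print(bins)                   # [(1200.0, 8), (1500.0, 12)]
print(table)                  # [[1.0, 0.0], [3.0, 7.0]]
print(log2_transform(table))  # [[1.0, 0.0], [2.0, 3.0]]
```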

We identified naturally occurring urine peptides with specificity for active systemic SJIA compared with other sources of fever. We hypothesized that SJIA flare is associated with increased levels of circulating mediators of inflammation that activate catabolic pathways leading to the generation of novel peptide biomarkers that are found in urine. We tested this hypothesis through global LC-MS analysis of urine and plasma peptides as well as targeted analysis of plasma proteins using antibody arrays.

Materials and Methods

Materials

The following reagents were used for the proteomics sample analysis: nanopure or Milli-Q quality water (~18 megohm·cm or better); Amicon Ultra centrifugal filtration tubes from Millipore (Bedford, MA, USA); ammonium bicarbonate, ammonium formate, and formic acid from Fluka (St. Louis, MO, USA); Tris–HCl, urea, thiourea, DTT, iodoacetamide, calcium chloride, and TFA from Sigma–Aldrich (St. Louis, MO, USA); HPLC-grade methanol (MeOH) and HPLC-grade ACN from Fisher Scientific (Fair Lawn, NJ, USA); 2,2,2-trifluoroethanol from Aldrich Chemical (Milwaukee, WI, USA); and sequencing-grade modified trypsin from Promega (Madison, WI, USA). Sodium tetraborate, glycine, and picrylsulfonic acid were obtained from Sigma–Aldrich (St. Louis, MO, USA).

Samples

Informed consent was obtained from the parents of all patients, and assent was obtained from all patients >6 years of age. This study was approved by the human subject protection programs at UCSD, UCSF, and Stanford University. Urine samples were obtained at Stanford University Medical Center and UCSF from patients with new-onset SJIA (ND; n = 2), active systemic disease plus arthritis (SAF; n = 18), SJIA with active arthritis (AF; n = 9), quiescent SJIA on medication (QOM; n = 18), and SJIA in remission off medication (RD; n = 9), and from healthy controls (HC; n = 10). In addition, urine samples were obtained from 23 KD patients and 23 age-similar FI control patients evaluated for fever at Rady Children’s Hospital San Diego. All KD patients had fever and ≥4 of the five principal clinical criteria for KD (rash, conjunctival injection, cervical lymphadenopathy, changes in the oral mucosa, and changes in the extremities) or three criteria plus coronary artery abnormalities documented by echocardiography [27]. All FI control patients had naso- or oropharyngeal and stool viral cultures. Urine sample patient demographics are described in Tables 1 (SJIA) and 2 (KD and FI). Plasma samples included 25 SJIA flare (F) and 14 SJIA quiescent (Q) samples for the training analysis, and 41 SJIA F and 11 Q samples for the “bootstrapping” testing analysis. Instead of an in silico bootstrapping simulation, samples from different visits of the same patient, and even repeated assays of the same samples, were used for testing; that is, the data were “bootstrapped” experimentally. For the bootstrapping test, a total of 52 SJIA samples were analyzed by antibody array: 41 samples from 19 patients at the time of SJIA flare and 11 samples from eight patients at the time of SJIA quiescence. Plasma sample patient demographics are described in Table 4.

Table 1 Patient characteristics—SJIA patient
Table 2 Patient characteristics—KD and FI patients

Urine Peptidome Preparation for MALDI Analysis

Urine samples (5–10 mL) were collected in sterile tubes and held at 4°C for up to 48 h before centrifugation (2,000×g for 20 min at room temperature) and freezing of the supernatant at −70°C. Urine processing, peptide preparation, extraction, and fractionation were performed as previously described [18, 19, 24]. Urine samples were processed by centrifugal filtration at 3,000×g for 20 min at 10°C through Amicon Ultra centrifugal filtration devices (10 kDa cutoff; Millipore, Bedford, MA) pre-equilibrated with 10 mL Milli-Q water. The filtrate (urine peptidome), containing the low-MW naturally occurring peptides, was processed with Waters Oasis HLB extraction cartridges (Waters Corporation, Milford, MA) and extracted with ethyl acetate. The resulting urine peptide samples were quantified by the 2,4,6-trinitrobenzenesulfonic acid (TNBS) assay, as previously described [28]. Three nanomoles of peptides were injected onto a 100 μm × 15 cm C18 reverse-phase column (Michrom) and eluted with a 5% to 55% acetonitrile gradient over 50 min using a Michrom MS4 HPLC. Twenty-second fractions were collected onto MALDI targets with a Probot fraction collector (LC Packings). A total of 100 fractions were collected and analyzed on a 4700 MALDI-TOF/TOF (Applied Biosystems) in MS mode. One microliter of matrix solution containing 4.8 mg/mL α-cyano-4-hydroxycinnamic acid (Agilent Technologies, Palo Alto, CA, USA) and 30 fmol/μL glu-fibrinopeptide (Sigma–Aldrich, St. Louis, MO, USA) was automatically deposited by the Probot on each spot.
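Loading a fixed 3 nmol per run requires converting each sample's TNBS reading into a concentration. A minimal sketch, assuming a linear standard curve built from glycine standards (glycine is among the listed reagents, but the standard concentrations, absorbances, and function names here are hypothetical) and treating one TNBS-reactive amine as one peptide. That is a simplification, because lysine side-chain amines also react with TNBS.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def load_volume_ul(absorbance, slope, intercept, target_nmol=3.0):
    """Volume (uL) that holds target_nmol, read off the standard curve.

    Assumes the standards are expressed in nmol/uL and that one
    reactive amine corresponds to one peptide (a simplification).
    """
    conc = (absorbance - intercept) / slope  # nmol per uL
    if conc <= 0:
        raise ValueError("absorbance is at or below the curve intercept")
    return target_nmol / conc


# Hypothetical glycine standards: nmol/uL vs blank-corrected absorbance.
std_conc = [0.0, 0.5, 1.0, 2.0]
std_abs = [0.00, 0.10, 0.20, 0.40]
slope, intercept = fit_line(std_conc, std_abs)
print(round(load_volume_ul(0.15, slope, intercept), 2))  # uL to inject
```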

Plasma Peptidome Preparation for LTQ-FTICR Analysis

The plasma peptidome preparation protocol was adapted from the urine peptidome protocol. Plasma samples were centrifuged at 3,000×g for 20 min at 10°C through Amicon Ultra centrifugal filtration devices (10 kDa cutoff; Millipore, Bedford, MA) pre-equilibrated with 10 mL Milli-Q water. The retentate (plasma proteome) was washed twice, brought to a final volume of 400 μL with 20 mM Tris–HCl (pH 7.5), and quantified by the BCA protein assay (Pierce, Rockford, IL). The filtrate (plasma peptidome), containing the low-MW naturally occurring peptides, was processed with Waters Oasis HLB extraction cartridges (Waters Corporation, Milford, MA) and extracted with ethyl acetate. The resulting plasma peptide samples were quantified by the TNBS assay, as previously described [28]. Three nanomoles of peptides were fractionated by two-dimensional chromatography, with a strong cation exchange (SCX) column as the first dimension and a reversed-phase (RP) column as the second, and then subjected to extensive MS/MS sequence identification on a Thermo Finnigan LTQ-FTICR spectrometer.

Urine Peptidomic MS Label-Free Data Analysis

MS spectra from the ABI 4700 Oracle database were exported as raw data points via ABI 4700 Explorer software version 2.0 for subsequent data analysis. Spectra covered m/z 800 to 4,000, with a maximum peak density of 30 peaks per 200 Da, a minimum S/N ratio of 5, a minimum area of 10, a minimum intensity of 150, and a maximum of 200 peaks per spot. We previously developed an informatics platform [18], comprising an integrated set of algorithms, statistical methods, and computer applications, for MS data processing and statistical analysis of LC-MS-based urine peptide profiling. MS peaks were located in the raw MALDI spectra by an algorithm that identifies sites (m/z values) whose intensities exceed both the estimated average background and the ~100 surrounding sites, with peak widths of ~0.5% of the corresponding m/z value. To align peaks across the spectra of all assayed samples, we applied linkage hierarchical clustering to the collection of all peaks from the individual spectra. The clustering, computed on a 24-node Linux cluster, is two-dimensional, using both the distance along the m/z axis and the HPLC fractionation time, on the premise that tight clusters represent the same biological peak slightly shifted in different spectra. We then took the centroid (mean position) of each cluster as the “consensus” position, that is, the peak index (bin) across all spectra. The MALDI-TOF signal intensity of each peptide feature was normalized in two steps: (1) within each LC fraction (MALDI plate spot), all peptide peak signal intensities were normalized to the externally spiked reference peptide (30 fmol/μL glu-fibrinopeptide) on that spot; (2) each clustered peptide, with a unique m/z and LC fraction time, was normalized to the total signal intensity of all clustered peptides within the same sample.
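The detection and alignment steps can be sketched as follows. This is not the authors' platform [18]. It assumes a peak is a local maximum within ±`half_window` points (50 on each side, matching the ~100 surrounding sites) that also exceeds the spectrum's mean intensity, a crude stand-in for the estimated background. It then uses single-linkage clustering with hypothetical tolerances in m/z and LC fraction as one concrete form of linkage hierarchical clustering. Cutting a single-linkage tree at a fixed height is equivalent to taking the connected components of the graph that joins every pair of peaks within tolerance, which is what the union-find in `align_peaks` computes.

```python
def detect_peaks(intensities, half_window=50):
    """Indices of local maxima that exceed the mean spectrum intensity.

    The mean is a crude background estimate (an assumption). A point
    qualifies if no point within +/- half_window is higher; a flat
    top of equal values is reported at every tied index.
    """
    background = sum(intensities) / len(intensities)
    peaks = []
    for i, y in enumerate(intensities):
        lo = max(0, i - half_window)
        hi = min(len(intensities), i + half_window + 1)
        if y > background and y == max(intensities[lo:hi]):
            peaks.append(i)
    return peaks


def align_peaks(peaks, mz_tol=0.5, frac_tol=1.0):
    """Group (m/z, LC fraction) peaks pooled from all spectra into bins.

    Two peaks are linked when |delta m/z| <= mz_tol and
    |delta fraction| <= frac_tol (hypothetical tolerances). The
    connected components of that graph are the single-linkage
    clusters cut at that height; each cluster's mean position is
    returned as a consensus bin.
    """
    parent = list(range(len(peaks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            (mz_i, f_i), (mz_j, f_j) = peaks[i], peaks[j]
            if abs(mz_i - mz_j) <= mz_tol and abs(f_i - f_j) <= frac_tol:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, peak in enumerate(peaks):
        clusters.setdefault(find(i), []).append(peak)
    return sorted(
        (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
        for c in clusters.values()
    )


print(detect_peaks([0, 1, 5, 1, 0, 0, 0, 2, 8, 2, 0], half_window=2))  # [2, 8]
pooled = [(1500.1, 10), (1500.3, 11), (1500.2, 10), (2000.0, 30), (2000.4, 31)]
print(align_peaks(pooled))  # two consensus bins
```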

MS/MS Analysis for Peptide Biomarkers

For urine peptidome analysis, we used the approach of ion mapping [29, 30], whereby biomarker candidate MS peaks were selected on the basis of discriminant analysis, and then targeted for MS/MS sequencing analysis to obtain protein identification.

Extensive MALDI-TOF/TOF and LTQ Orbitrap MS/MS analyses coupled with database searches [29, 30] were performed to sequence and identify these peptide biomarkers. The identity of a subset of the detected peptides was determined by searching MS/MS spectra against the Swiss-Prot database (June 10, 2008) restricted to human entries (15,720 sequences) using the Mascot search engine (version 1.9.05). Mass tolerances were set to 50 ppm for parent ions and 100 ppm for fragment ions. No enzyme restriction was selected. Because we were focusing on naturally occurring peptides, hits were considered significant when they exceeded the statistical significance threshold returned by Mascot. Selected MS/MS spectra were also searched by SEQUEST (BioWorks™ rev. 3.3.1 SP1) against the International Protein Index (IPI) human database version 3.5.7 (76,541 sequences). mMASS, an open-source mass spectrometry tool (http://mmass.biographics.cz/), was used for manual review of the protein identifications and MS/MS ion pattern analysis for additional validation.
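Because the 50 ppm (parent) and 100 ppm (fragment) tolerances are relative, the absolute mass window widens with m/z. A short sketch of the standard ppm conversion, with arbitrary example masses:

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm):
    """True if the absolute ppm error is within tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def window_half_width_da(theoretical_mz, tol_ppm):
    """Absolute half-width (Da) of a ppm tolerance at a given m/z."""
    return theoretical_mz * tol_ppm / 1e6


# Parent-ion tolerance of 50 ppm, as in the Mascot searches above.
print(window_half_width_da(2000.0, 50))       # 0.1
print(within_tolerance(2000.08, 2000.0, 50))  # True (about 40 ppm)
print(within_tolerance(2000.12, 2000.0, 50))  # False (about 60 ppm)
```

At m/z 4,000 the same 50 ppm tolerance corresponds to ±0.2 Da, twice the window at m/z 2,000.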

Customized Antibody Analysis

Customized antibody arrays, consisting of pairs of capture and detection antibodies against 43 proteins, were used to profile SJIA plasma samples. These 43 proteins fall into three functional groups: (1) chemokines and cytokines: CCL2 (MCP-1), CCL5 (RANTES), CCL7 (MCP-3), CCL8 (MCP-2), CCL11 (Eotaxin), CX3CL1 (Fractalkine), CXCL12 (SDF-1), IGF1, IFNG, IGFBP3, IGFBP4, IGFBP6, IL-1A, IL-1B, IL-1R1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-8, IL-10, IL-12A, IL-12B, IL-13, IL-15, IL-17, IL-18, MIP-1α, TNF, TNFRSF11B; (2) protein catabolism regulators: TIMP1, TIMP2, MMP2, MMP2/TIMP2, MMP9, MMP10; (3) cell surface molecules involved in leukocyte adhesion: E-Selectin, L-Selectin, P-Selectin, ICAM1, and VCAM1. Antibody array fabrication, processing, data extraction, and analysis were performed as previously described [31].

Statistical Analysis

Patient demographic data were analyzed using the “Epidemiological calculator” (R epicalc package, version 2.10.0.0). The binned LC-MALDI MS peak data from all urine peptidome samples were analyzed to discover discriminant biomarkers, using algorithms [32] for nearest shrunken centroid (NSC) feature selection, tenfold cross-validation, and Gaussian linear discriminant analysis (LDA) classification. To control the number of falsely significant features found during NSC mining, we permuted the data set 500 times to calculate the GFDR [33]. To quantify the difference between classes for the identified peptide biomarkers, Student’s t test and the Mann–Whitney U test were used for hypothesis testing, and the local false discovery rate (FDR) tool [34] was used to correct for multiple hypothesis testing. To test whether the selected discriminative features could serve as a diagnostic biomarker panel, a logistic regression model was used to find a linear combination of the biomarkers that minimizes the total classification error.
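For readers less familiar with the rank-based test, the Mann–Whitney U statistic and a two-sided P value from its normal approximation (with a correction for ties but no continuity correction) can be sketched as follows. The text does not state which variant was used; exact or continuity-corrected implementations in statistical packages give somewhat different P values for small groups, and the example data are made up.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U for sample x, approximate two-sided P value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of a tie block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


u, p = mann_whitney_u([5, 6, 7, 8], [1, 2, 3, 4])
print(u, round(p, 4))
```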

To avoid bias arising from any particular composition of the data set, we generated 500 bootstrap resamples and evaluated the impact of data set composition on the overall classification performance of the biomarker panel. For each bootstrap set, we used the LDA-derived prediction scores for each sample to construct receiver operating characteristic (ROC) curves [35, 36]. To summarize the results, the vertical average of the 500 ROC curves was plotted, and box-and-whisker plots were used to show the vertical spread around the average.
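The bootstrap evaluation can be outlined in a few lines. In this sketch, which is an illustration rather than the authors' code, the AUC is computed as the probability that a randomly chosen positive sample outscores a randomly chosen negative one (ties count one half), and each of the 500 resamples draws with replacement separately within each class. The stratification, the fixed seed, and the percentile interval are assumptions. Any per-sample score can be passed in, including LDA-derived prediction scores.

```python
import random


def auc(pos_scores, neg_scores):
    """ROC AUC: probability a positive outscores a negative (ties = 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def bootstrap_auc(pos_scores, neg_scores, n_boot=500, seed=0):
    """Stratified bootstrap of the AUC.

    Each resample draws, with replacement, as many positives and as
    many negatives as the original data. Returns the mean AUC and a
    simple 95% percentile interval.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_boot):
        pos = [rng.choice(pos_scores) for _ in pos_scores]
        neg = [rng.choice(neg_scores) for _ in neg_scores]
        aucs.append(auc(pos, neg))
    aucs.sort()
    lo = aucs[int(0.025 * (n_boot - 1))]
    hi = aucs[int(0.975 * (n_boot - 1))]
    return sum(aucs) / n_boot, lo, hi


# Made-up prediction scores for two classes.
flare = [0.9, 0.8, 0.7, 0.4]
other = [0.1, 0.2, 0.3, 0.5]
print(auc(flare, other))  # 0.9375
print(bootstrap_auc(flare, other))
```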

Results

SJIA, KD, and FI Sample Collection and Patient Characteristics

We collected 56 intraday urine samples from pediatric SJIA patients at two sites, Stanford and UCSF (Tables 1, 2, and 3). These included patients with ND (n = 2), SAF (n = 18), AF (n = 9), SJIA quiescence (inactive disease on medication; QOM, n = 18), and SJIA remission (inactive disease off all medications; RD, n = 9). For comparison, samples were also collected from subjects with KD (n = 23) and acute FI (n = 23) (Table 2; collected at UCSD), and from healthy, age-matched controls (HC, n = 10; collected at Stanford).

Table 3 Patient characteristics—Student’s t test significance analysis (P value)
Table 4 SJIA patient (flare and quiescent plasma samples) characteristics—patient demographics
Table 5 SJIA patient (flare and quiescent plasma samples) characteristics—Student’s t test significance analysis (P value)

We also collected 66 plasma samples from pediatric SJIA patients (Tables 4 and 5). These included samples from patients with SAF (n = 25) and with quiescent SJIA (Q; n = 14). When available, plasma samples from multiple visits of the same SJIA patient at different disease states, considered experimentally “bootstrapped” samples, were also collected for confirmatory bootstrapping analyses. Thirteen patients provided both urine and plasma samples.

As expected from known differences in demographics [37], the age and gender distributions of our SJIA urine subjects differed from those of the KD and FI subjects. Except for ND patients (median age, 3 years; range, 1–5 years), the SJIA patients (SAF: median age, 12.5 years; range, 3–17 years; AF: median age, 13 years; range, 11–16 years; QOM: median age, 13 years; range, 5–17 years; RD: median age, 14 years; range, 6–21 years) were older than the KD (median age, 3 years; range, 1–10 years) and FI (median age, 2 years; range, 1–10 years) patients. Except for ND patients (100% male), the SJIA groups included a lower proportion of males (SAF, 33%; AF, 33%; QOM, 39%; RD, 22%) than the KD (82%) and FI (61%) groups. Also as expected, active SJIA patients differed significantly from inactive patients in variables reflecting active inflammation. Clinical parameters indicative of systemic disease activity (erythrocyte sedimentation rate [ESR], C-reactive protein [CRP], white blood cell count [WBC], and platelet count [PLT]) were increased in the ND and SAF groups compared with the AF, QOM, and RD groups (Tables 1, 2, and 3). Among patient groups with systemic inflammatory conditions, SJIA (ND and SAF) and KD patients had higher mean ESR than FI patients, and SJIA patients (ND and SAF) had higher mean CRP than either the KD or FI group (ESR: ND, 54; SAF, 52; KD, 51; FI, 31; CRP: ND, 29.4; SAF, 28.05; KD, 7.4; FI, 3.39). However, active SJIA, KD, and FI patients did not differ significantly in WBC, ESR, CRP, or PLT (Student’s t test, P > 0.01). Demographic analyses were also performed for the SJIA F and Q plasma samples: there were no significant differences in age or gender (Student’s t test, P > 0.01), but there were significant differences in WBC, ESR, and PLT (Student’s t test, P < 0.01; CRP data were insufficient for analysis) between F and Q in both the training and the bootstrapping testing patient samples (Table 5).

Discovery of a Biomarker Panel of 17-Urine Peptides Indicative of SJIA Systemic Flare

Mass spectrometry-based urine peptidomic analyses are subject to two major sources of variance [18]: analytical issues, including mass spectrometric ion suppression, and biological issues, including dilution of urine by the differing hydration states of the donors. To standardize the amount of urine peptide for comparative analysis, we quantified each extracted urine peptidome by the TNBS assay [28] and subjected 3 nmol of peptides to downstream LC-MALDI-TOF profiling. The initial step in our biomarker discovery effort was to collect urine peptide spectra by LC-MALDI-TOF profiling from the 56 urine samples. The MS spectrum of each HPLC fraction was analyzed by “MASS-Conductor” software (Ling, unpublished), which extracts peaks from raw MALDI spectra, aligns common peaks, generates consensus representative peaks across all spectra via two-dimensional hierarchical clustering on both m/z and HPLC fraction, and normalizes peak signal measurements.
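The two normalization steps described in the Methods (per-spot scaling to the spiked glu-fibrinopeptide, then scaling to the sample's total clustered-peptide signal) can be sketched as follows. The dictionary layout and the choice to express step 2 as a fraction of the total, rather than rescaling to a common constant, are assumptions, and the intensities are hypothetical.

```python
def normalize_two_step(peaks, reference_intensity):
    """Two-step intensity normalization for one sample.

    peaks: dict mapping (m/z bin, LC fraction) -> raw intensity.
    reference_intensity: dict mapping LC fraction (MALDI spot) ->
        intensity of the spiked reference peptide on that spot.
    Step 1 divides each peak by its spot's reference signal; step 2
    divides by the sample's total reference-normalized signal.
    """
    step1 = {
        key: raw / reference_intensity[key[1]] for key, raw in peaks.items()
    }
    total = sum(step1.values())
    return {key: value / total for key, value in step1.items()}


peaks = {(1500.2, 10): 400.0, (2000.2, 30): 300.0}
reference = {10: 200.0, 30: 100.0}
print(normalize_two_step(peaks, reference))
# {(1500.2, 10): 0.4, (2000.2, 30): 0.6}
```

Step 1 corrects spot-to-spot differences in ionization and deposition; step 2 removes the overall amount of signal per sample, complementing the fixed 3 nmol load.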

To discover an SJIA systemic flare signature, the urine spectra of the subjects with systemically active disease, SJIA ND (n = 2), and SAF (n = 18) patients, were compared simultaneously to the non-systemic group of SJIA AF (n = 9) and the QOM (n = 18) patients and the other systemic inflammation groups: KD (n = 23) and FI (n = 23). The data mining process included selection of the discriminative urine peptides, supervised classification, bootstrapping, and ROC analysis, as outlined in Fig. 1.

Fig. 1

Schematic of the experimental design to discover an SJIA systemic flare urine peptide signature. Long-term goals: Aim #1, identification of a diagnostic urine peptide profile that distinguishes new-onset SJIA patients from other systemic inflammatory states, including Kawasaki disease (KD) and febrile illness (FI); Aim #2, prediction of impending flare during quiescent periods of SJIA

Classifier discovery and feature selection by the NSC algorithm [32] were performed using all features in the data set. The NSC algorithm iteratively shrinks the standardized class mean abundance of each peptide toward the overall mean, and all urine peptides were then ranked by the difference between their shrunken class means. Tenfold internal cross-validation and LDA led to the discovery of a panel of 17 urine peptide biomarkers that effectively differentiated SJIA flare (ND and SAF) from the contrasting group of AF, QOM, RD, KD, and FI samples. Extensive MALDI-TOF/TOF and LTQ Orbitrap MS/MS analyses coupled with database searches [29, 30] were then performed to identify these peptide biomarkers.
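A compact sketch of the NSC statistics, following the published formulation of Tibshirani and colleagues rather than the MASS-Conductor code: each class centroid is standardized against the overall centroid, soft-thresholded by a shrinkage amount `delta`, and features whose shrunken differences reach zero for every class drop out. Ranking features by their largest unshrunken standardized difference orders them by the `delta` at which they are eliminated. Cross-validation over `delta` and the LDA classifier are omitted, and the data are toy values.

```python
import math
from statistics import median


def shrunken_centroid_scores(X, labels, delta):
    """Nearest-shrunken-centroid statistics (Tibshirani et al. style).

    X: samples as lists of P feature values; labels: class per sample.
    Returns (d, d_shrunk): dicts mapping class -> list of P standardized
    centroid differences, before and after soft-thresholding by delta.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(labels))
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    members = {k: [r for r, lab in zip(X, labels) if lab == k] for k in classes}
    means = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
             for k, rows in members.items()}
    # Pooled within-class standard deviation of each feature.
    s = []
    for i in range(p):
        ss = sum((r[i] - means[k][i]) ** 2
                 for k, rows in members.items() for r in rows)
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)  # offset that damps features with tiny variance
    d, d_shrunk = {}, {}
    for k in classes:
        m_k = math.sqrt(1 / len(members[k]) - 1 / n)
        d[k] = [(means[k][i] - overall[i]) / (m_k * (s[i] + s0))
                for i in range(p)]
        # Soft-threshold: shrink toward zero by delta, clipping at zero.
        d_shrunk[k] = [math.copysign(max(abs(v) - delta, 0.0), v) for v in d[k]]
    return d, d_shrunk


def rank_features(d):
    """Feature indices ordered by the delta at which they drop out."""
    p = len(next(iter(d.values())))
    score = [max(abs(d[k][i]) for k in d) for i in range(p)]
    return sorted(range(p), key=lambda i: -score[i])


X = [
    [5.0, 1.0, 2.0], [6.0, 1.2, 2.1], [5.5, 0.9, 1.9],  # flare (toy)
    [1.0, 1.1, 2.0], [1.5, 1.0, 2.2], [0.5, 0.8, 1.8],  # comparison (toy)
]
labels = ["F", "F", "F", "O", "O", "O"]
d, d_shrunk = shrunken_centroid_scores(X, labels, delta=1.0)
print(rank_features(d))  # [0, 1, 2]
print([round(v, 2) for v in d_shrunk["F"]])
```

In this framing, taking the first 17 indices from `rank_features` would yield a 17-feature panel, although the study chose the panel through cross-validated NSC and LDA.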

As shown in Tables 6 and 7, the 17 peptide biomarkers were found to be degradation products of eight different proteins: alpha-1 antitrypsin (A1AT; two peptides with overlapping sequences), collagen type I alpha 1 (COL1A1; five peptides, three with overlapping sequences), collagen type I alpha 2 (COL1A2; one peptide), collagen type III alpha 1 (COL3A1; one peptide), collagen type IX alpha 2 (COL9A2; one peptide), fibrinogen alpha (FGA; two peptides with overlapping sequences), fibrinogen beta (FGB; two peptides with overlapping sequences), and uromodulin (UMOD; three peptides with overlapping sequences). Sequence alignment of these peptide biomarkers revealed tight sequence clusters for the A1AT-, COL1A1-, FGA-, FGB-, and UMOD-derived biomarkers. Mann–Whitney U tests (Table 6) were performed to evaluate the significance of discrimination between the indicated disease classes, and between active systemic disease (ND/SAF) and each of the contrasting AF, QOM, RD, and HC groups. Two nested peptides from A1AT (A1AT1796 and A1AT1945), as well as COL1A1-1580 and FGB1794, differentiated ND/SAF from all other inflammatory classes (KD/FI/AF) and non-inflammatory classes (QOM/RD/HC), with P values <0.05 or ~0.05. Of the remaining 13 peptides, COL1A1-11734, COL9A2-1126, FGB1631, and UMOD1755 differentiated ND/SAF from all other inflammatory classes (KD/FI/AF); the other nine peptides did not show an obvious pattern. Analysis of the Student’s t test statistic (Table 7) showed that all but one urine peptide (COL9A2-1126) in the 17-peptide signature were present at higher levels in urine from systemic SJIA (ND/SAF) than in urine from the other inflammatory (KD/FI/AF) and non-inflammatory (QOM/RD/HC) classes.

Table 6 The 17-urine-peptide SJIA biomarkers were found to be degradation products of eight different proteins
Table 7 Protein sources of urine peptides and standardized differential mean expression in indicated disease groups based on Student’s t test

Seventeen-Urine-Peptide Biomarker Panel Effectively Discriminates SJIA Flare from KD and FI

To test whether the 17 peptide biomarkers could collectively serve as a diagnostic panel, a logistic regression model was used to find the linear combination of the 17 peptides that minimizes the total classification error in discriminating SJIA systemic ND and SAF patients from KD and FI patients. Figure 2a plots the linear discriminant probabilities of the peptide biomarker panel. Samples showed good separation between the highest and next-highest class probabilities. Seventeen of the 20 SJIA flare patients and all 46 non-SJIA (KD and FI) patients were correctly classified. The maximum estimated probabilities of the misclassified samples are marked with arrows. A modified 2 × 2 contingency table (Fig. 2b) shows the percentage of classifications that agreed with the clinical diagnosis. Overall, the 17-peptide panel classified SJIA systemic flare samples with 85% positive agreement with the clinical diagnosis and the other systemic disease samples with 100% agreement (P = 4.53 × 10⁻¹³).
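The agreement percentages and P values for this comparison (Fig. 2b), and for the flare versus quiescence/remission comparison in the next section (Fig. 3b), can be recomputed from the classification counts. The text does not name the test. The sketch below assumes a one-sided Fisher's exact test on the 2 × 2 table of clinical diagnosis versus panel call, which for these counts reproduces the reported values. Because each observed table is the most extreme possible given its margins, the P value equals the hypergeometric probability of the observed table itself.

```python
from math import comb


def fisher_upper_tail(a, b, c, d):
    """One-sided Fisher's exact P for the 2 x 2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same
    margins whose top-left cell is >= a (at least as much agreement).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    hits = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return hits / comb(n, col1)


def agreement(a, b, c, d):
    """Positive and negative percent agreement with clinical diagnosis.

    Rows are clinical classes (SJIA flare, comparison group); columns
    are panel calls (flare, not flare).
    """
    return 100 * a / (a + b), 100 * d / (c + d)


# Fig. 2b: 17/20 flare and 46/46 KD/FI correctly classified.
print(agreement(17, 3, 0, 46), f"{fisher_upper_tail(17, 3, 0, 46):.3g}")
# (85.0, 100.0) 4.53e-13
# Fig. 3b: 18/20 flare and 27/27 quiescent/remission correctly classified.
print(agreement(18, 2, 0, 27), f"{fisher_upper_tail(18, 2, 0, 27):.3g}")
# (90.0, 100.0) 4.16e-11
```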

Fig. 2

Evaluation of the 17-urine-peptide biomarker panel as a classifier of SJIA versus systemic inflammation from Kawasaki disease or acute febrile illness. a A logistic regression model was used to find a panel-based algorithm that minimizes the total classification error in discriminating SJIA systemic disease from inflammation due to KD/FI. The maximum estimated probabilities of the misclassified samples are marked with arrows. b A modified 2 × 2 contingency table shows the percentage of classifications that agreed with the clinical diagnosis. c The discriminant analysis-derived prediction scores for each sample were used to construct a receiver operating characteristic (ROC) curve; 500 test data sets generated by bootstrapping from the SJIA systemic flare, KD, and FI data were used to estimate standard errors and confidence intervals for the ROC analysis. The plotted ROC curve is the vertical average of the 500 bootstrap runs, and the box-and-whisker plots show the vertical spread around the average. d Distribution of the standardized ROC AUC values of the 500 falsely discovered panels obtained from 500 class-label-permuted data sets of the SJIA F and KD/FI urine peptidome cohort. The false discovery rate (FDR) was estimated as the number of falsely discovered panels of the same size with ROC AUC values greater than that of the original urine biomarker panel (red vertical line), divided by the total number of falsely discovered panels.

To evaluate how well our peptide panel separated SJIA flare from KD and FI, we used the discriminant analysis-derived prediction scores for each sample to construct ROC curves [35, 36]. In addition, we used bootstrapping, a resampling technique, to construct multiple test data sets and further evaluate the classification performance of the 17-urine-peptide biomarker panel. Figure 2c summarizes the 500 bootstrap runs of the SJIA systemic flare, KD, and FI samples used to estimate standard errors and confidence intervals for the ROC analysis; the plotted ROC curve is the vertical average of the 500 runs, and the boxes and whiskers show the vertical spread around the average. The ROC analysis yielded an averaged area under the curve (AUC) of 0.999, indicating near-perfect discrimination in this cohort.
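Vertical averaging fixes a grid of false-positive rates (FPRs) and, at each one, summarizes the true-positive rates (TPRs) of all bootstrap ROC curves. A minimal sketch, assuming step interpolation (the highest TPR reached at or below each FPR) and reporting the mean together with the minimum and maximum as a simplified stand-in for the box-and-whisker spread. Function names and scores are illustrative.

```python
def roc_points(pos_scores, neg_scores):
    """Empirical ROC curve as (FPR, TPR) points, thresholds high to low."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def tpr_at(points, fpr):
    """Highest TPR reached at or below a given FPR (step interpolation)."""
    return max(t for f, t in points if f <= fpr)


def vertical_average(curves, grid):
    """(FPR, mean TPR, min TPR, max TPR) across curves at each grid FPR."""
    summary = []
    for f in grid:
        tprs = [tpr_at(c, f) for c in curves]
        summary.append((f, sum(tprs) / len(tprs), min(tprs), max(tprs)))
    return summary


curve_a = roc_points([0.9, 0.8], [0.1, 0.2])
curve_b = roc_points([0.9, 0.3], [0.5, 0.1])
for row in vertical_average([curve_a, curve_b], [0.0, 0.5, 1.0]):
    print(row)
```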

Seventeen-Urine-Peptide Biomarker Panel Effectively Discriminates SJIA Flare from Quiescence and Remission

We next asked whether the 17-urine-peptide panel could serve as a flare signature that discriminates SJIA flare samples from samples of patients with QOM and RD. A logistic regression model was used to find a linear combination of the 17 urine peptides that minimizes the total classification error in classifying SJIA systemic flare versus QOM and RD. Figure 3a plots the linear discriminant probabilities of the peptide biomarker panel. Samples showed good separation between the highest and next-highest class probabilities. Eighteen of the 20 SJIA flare patients (90%) and all 27 SJIA quiescent or remission patients were correctly classified. The maximum estimated probabilities of the misclassified samples are marked with arrows. A modified 2 × 2 contingency table (Fig. 3b) shows the percentage of classifications that agreed with the clinical diagnosis. Overall, the panel classified SJIA flare samples with 90% positive agreement with the clinical diagnosis and quiescent or remission samples with 100% agreement (P = 4.16 × 10⁻¹¹). Figure 3c summarizes the 500 bootstrap runs of the SJIA systemic flare, quiescent, and remission samples used to estimate standard errors and confidence intervals for the ROC analysis; the plotted ROC curve is the vertical average of the 500 runs, and the boxes and whiskers show the vertical spread around the average. The ROC analysis yielded an AUC of 0.998.

Fig. 3

Evaluation of the 17-peptide biomarker panel as a classifier of active SJIA versus inactive (quiescent or remitted) SJIA. a A logistic regression model was used to find a panel-based algorithm that minimizes the total classification error in discriminating active systemic SJIA from inactive SJIA. The maximum estimated probabilities of the misclassified samples are marked with arrows. b A modified 2 × 2 contingency table shows the percentage of classifications that agreed with the clinical diagnosis. c The discriminant analysis-derived prediction scores for each sample were used to construct a receiver operating characteristic (ROC) curve; 500 test data sets generated by bootstrapping from the SJIA systemic flare and inactive SJIA data were used to estimate standard errors and confidence intervals for the ROC analysis. The plotted ROC curve is the vertical average of the 500 bootstrap runs, and the box-and-whisker plots show the vertical spread around the average. d Distribution of the standardized ROC AUC values of the 500 falsely discovered panels obtained from 500 class-label-permuted data sets of the SJIA F and QOM/RD urine peptidome cohort. The false discovery rate (FDR) was estimated as the number of falsely discovered panels of the same size with ROC AUC values greater than that of the original urine biomarker panel (red vertical line), divided by the total number of falsely discovered panels.

Unbiased Significance Analysis and Multiple Hypothesis Testing

In the ROC analyses of the 17-urine-peptide biomarker panel for discriminating SJIA F versus QOM/RD and SJIA F versus KD/FI, bootstrapping (a resampling technique) was used to avoid bias due to outliers among the assayed samples. In both comparisons, the ROC analyses (Figs. 2 and 3c) yielded significant AUC values, indicating that the ROC curves were not substantially affected by the bootstrapping process and demonstrating the robustness of the 17-urine-peptide biomarker panel in discriminative analyses.
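The bootstrap ROC summaries described for Figs. 2, 3c, and 4d can be sketched as below. The sketch assumes class-stratified resampling with replacement, AUC computed as the Mann-Whitney statistic, vertical averaging of true positive rates on a fixed false positive rate grid, and percentile confidence intervals; the exact resampling and interval methods used in this study are not specified here, and the function names and scores are hypothetical.

```python
import random

def auc(pos, neg):
    """ROC AUC as the Mann-Whitney probability that a randomly chosen positive
    (flare) sample scores higher than a randomly chosen negative one; ties count 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def tpr_at_fpr(pos, neg, fpr_grid):
    """Empirical ROC sampled vertically: for each target FPR, the highest TPR
    reached by any score threshold whose FPR does not exceed the target."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        fpr = sum(s >= t for s in neg) / len(neg)
        tpr = sum(s >= t for s in pos) / len(pos)
        points.append((fpr, tpr))
    return [max(tpr for fpr, tpr in points if fpr <= f) for f in fpr_grid]

def bootstrap_roc(pos, neg, n_boot=500, n_grid=21, seed=0):
    """Resample each class with replacement n_boot times; return the FPR grid,
    the vertically averaged ROC, the mean AUC, and a 95% percentile interval."""
    rng = random.Random(seed)
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    aucs, curves = [], []
    for _ in range(n_boot):
        bp = [rng.choice(pos) for _ in pos]  # class sizes stay fixed
        bn = [rng.choice(neg) for _ in neg]
        aucs.append(auc(bp, bn))
        curves.append(tpr_at_fpr(bp, bn, grid))
    # Vertical average: mean TPR at each fixed FPR across bootstrap runs
    mean_curve = [sum(c[i] for c in curves) / n_boot for i in range(n_grid)]
    aucs.sort()
    lo, hi = aucs[int(0.025 * n_boot)], aucs[int(0.975 * n_boot) - 1]
    return grid, mean_curve, sum(aucs) / n_boot, (lo, hi)

# Hypothetical discriminant scores for five flare and five inactive samples
flare = [0.97, 0.91, 0.88, 0.42, 0.95]
inactive = [0.05, 0.12, 0.30, 0.08, 0.51]
grid, curve, mean_auc, ci = bootstrap_roc(flare, inactive)
print(round(auc(flare, inactive), 2))  # 0.96
```

The spread of `curves` at each grid point is what the box and whisker plots summarize; the sketch returns only the mean curve and the AUC interval.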

As in other high-throughput analyses, e.g., microarray expression profiling, where the number of profiled features greatly exceeds the number of assayed samples, concurrent analysis of MALDI-TOF spectral peaks to evaluate null hypotheses for differential urine peptide biomarkers leads to the multiple-testing problem. To address this problem, we estimated the FDR of peptide panels of the same size as our biomarker panel using multiple permuted “random” training data sets. The class labels of our training samples in either the SJIA F and QOM/RD cohorts or the SJIA F and KD/FI cohorts were permuted 500 times, such that each time every sample was randomly assigned a new class label (SJIA F or QOM/RD in the SJIA F versus QOM/RD discrimination; SJIA F or KD/FI in the SJIA F versus KD/FI discrimination). For each of the 500 simulated “training” sets, the NSC algorithm was applied to rank all MALDI-TOF spectral peak features by their ability to discriminate the binary classes (SJIA F versus QOM/RD and SJIA F versus KD/FI, respectively). The top 17 NSC-selected peak features were then designated as the “panel” for LDA. ROC analysis was subsequently used to calculate the AUC for this “falsely discovered panel”. The AUC values of the 500 falsely discovered panels were standardized, and their density distributions are plotted in Figs. 2 and 3d. FDR was calculated as the number of AUC values greater than that of our 17-urine-peptide panel divided by the total number of AUC values of the “falsely” discovered panels. As shown in Figs. 2 and 3d, the FDR of our urine peptide biomarker panel was estimated as 0.2% in both the SJIA F versus QOM/RD and the SJIA F versus KD/FI discriminations. These results support the notion that the discovery of our peptide biomarker panel is unlikely to be the outcome of chance in multiple hypothesis testing.
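A compact version of the label-permutation procedure is sketched below. To keep it self-contained, it substitutes a standardized mean difference for NSC feature ranking and a signed feature sum for LDA scoring; both substitutions, the helper names, and the synthetic data are assumptions for illustration. With 500 permutations, the smallest nonzero FDR estimate under this definition is 1/500 = 0.2%, so the reported value corresponds to a single permuted panel exceeding the observed AUC.

```python
import random
import statistics

def auc(pos, neg):
    # Mann-Whitney estimate of ROC AUC; ties count one half
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def panel_auc(X, y, k):
    """Rank features, keep the top k as a 'panel', score samples with it, and
    return the panel's AUC. Ranking uses a standardized mean difference as a
    simple stand-in for NSC; scoring uses a signed feature sum as a stand-in for LDA."""
    n_feat = len(X[0])
    effect = []
    for j in range(n_feat):
        a = [x[j] for x, c in zip(X, y) if c == 1]
        b = [x[j] for x, c in zip(X, y) if c == 0]
        sd = statistics.pstdev(a + b) or 1.0  # guard against constant features
        effect.append((statistics.mean(a) - statistics.mean(b)) / sd)
    top = sorted(range(n_feat), key=lambda j: abs(effect[j]), reverse=True)[:k]
    scores = [sum(x[j] if effect[j] > 0 else -x[j] for j in top) for x in X]
    pos = [s for s, c in zip(scores, y) if c == 1]
    neg = [s for s, c in zip(scores, y) if c == 0]
    return auc(pos, neg)

def permutation_fdr(X, y, k, n_perm=500, seed=0):
    """Compare the real panel's AUC with AUCs of panels 'discovered' after
    randomly permuting class labels; FDR = fraction of permuted panels whose
    AUC exceeds the real one."""
    rng = random.Random(seed)
    observed = panel_auc(X, y, k)
    null_aucs = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # every sample receives a random class label
        null_aucs.append(panel_auc(X, y_perm, k))
    return observed, sum(a > observed for a in null_aucs) / n_perm

# Hypothetical data: 40 noise features, the first 3 shifted upward in class 1
rng = random.Random(1)
labels = [1] * 10 + [0] * 10
X = [[rng.gauss(0, 1) + (5.0 if (c == 1 and j < 3) else 0.0) for j in range(40)]
     for c in labels]
observed, fdr = permutation_fdr(X, labels, k=3, n_perm=200)
print(observed, fdr)
```

Because feature selection and scoring reuse the same samples, even permuted panels reach optimistic AUCs; comparing against that null distribution is what makes the estimate meaningful. Standardizing the permuted AUCs, as in Figs. 2 and 3d, changes only the plotting scale: applying the same transformation to the observed AUC leaves the count, and therefore the FDR, unchanged.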

Direct Sequencing and Cataloging of Naturally Occurring, Normal Plasma Peptides Revealed Nested COL1A1, FGA, and FGB Peptides That Are Related to SJIA Urine Peptide Biomarkers

We reasoned that at least some of the SJIA urine peptide biomarkers, such as the FGA peptide biomarkers, are likely filtered from the circulation into urine. To explore this, we fractionated normal plasma by two-dimensional chromatography to extract naturally occurring peptides for MS/MS sequencing. Similar to other direct plasma peptide sequencing efforts [38], we observed FGA peptide clusters. We found seven FGA peptide clusters in plasma (Electronic Supplementary Table 1), and our urine peptide biomarkers FGA (20–38) 1,536.69, FGA (605–628) 2,560.2, and FGA (605–629) 2,659.24 were observed in plasma FGA peptide clusters I and VII. We did not detect FGA (605–621) 1,826.80, although this peptide was found in a published plasma peptidome sequencing effort [38]. The urinary peptides FGA (607–622) 1,639.77 and FGA (605–622) 1,883.80 were not detected in either our analysis or the previously published one [38]. The urinary FGA peptides found in active SJIA samples (Table 8) lack one or two C-terminal residues compared with a related plasma peptide and thus appear to derive from exopeptidase activity. The urinary COL1A1 peptide that is most differentially expressed in active SJIA (Table 8) extends beyond the C terminus of a closely related peptide found in normal plasma, suggesting that it may be generated by inhibition of normal protease activity during SJIA. A urinary FGB peptide found in SJIA (at similar levels in KD, but not in comparison groups with less systemic inflammation) is identical to a peptide found in normal plasma, suggesting that its precursor protein is increased during inflammation. Our data indicate that at least some SJIA urine peptide biomarkers likely originate in the circulation and are filtered into the urine.

Table 8 Identification of peptides found in normal plasma that are related to SJIA urine peptide biomarkers

Identification of TIMP1, IL-18, RANTES, P-Selectin, MMP9, and L-Selectin as SJIA Plasma Flare Biomarkers

Fibrinogen is degraded by MMP9 [39–41], and fibrinogen degradation fragments have been shown to be biologically important molecules with numerous pro-inflammatory actions [42]. We reasoned that the generation of the SJIA urine biomarker peptides, including those derived from fibrinogen, may result from the actions of inflammatory mediators on protease expression and regulation, generating a disease-specific degradation pattern of source proteins such as fibrinogen.

To explore this hypothesis, we used an antibody array consisting of pairs of capture and detection antibodies against 43 proteins, including chemokines and cytokines, protein catabolism regulators, and cell surface molecules involved in leukocyte adhesion, to profile and compare the F and Q plasma samples (demographics shown in Tables 4 and 5). Our training data set was derived from plasma samples from 25 patients at the time of SJIA systemic flare and 14 patients at the time of quiescence. Classifier discovery and feature selection were performed with all 43 proteins using a nearest shrunken centroid (NSC) algorithm [32]. Tenfold internal cross-validation led to the discovery of a candidate flare signature consisting of six proteins: TIMP1, IL-18, RANTES, P-Selectin, MMP9, and L-Selectin (Fig. 4a).
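The NSC step can be sketched following the nearest shrunken centroid formulation of Tibshirani and colleagues (2002): class centroids are standardized by a pooled within-class standard deviation plus an offset s0 (here the median of the per-feature SDs), soft-thresholded toward the overall centroid by an amount `delta`, and features whose shrunken centroids coincide with the overall mean in every class drop out of the panel. In practice `delta` would be chosen by cross-validation (tenfold in this study); the value used below, the helper names, and the toy data are hypothetical.

```python
import math
import statistics

def nsc_fit(X, y, delta):
    """Nearest shrunken centroid fit. X: samples x features; y: class labels;
    delta: soft-threshold amount applied to the standardized centroid offsets."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    overall = [statistics.mean(x[i] for x in X) for i in range(p)]
    members = {k: [x for x, c in zip(X, y) if c == k] for k in classes}
    cmean = {k: [statistics.mean(x[i] for x in members[k]) for i in range(p)]
             for k in classes}
    # Pooled within-class standard deviation of each feature
    s = []
    for i in range(p):
        ss = sum((x[i] - cmean[k][i]) ** 2 for k in classes for x in members[k])
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = statistics.median(s)
    shrunk = {}
    for k in classes:
        m_k = math.sqrt(1 / len(members[k]) - 1 / n)
        row = []
        for i in range(p):
            d = (cmean[k][i] - overall[i]) / (m_k * (s[i] + s0))
            d_shr = math.copysign(max(abs(d) - delta, 0.0), d)  # soft threshold
            row.append(overall[i] + m_k * (s[i] + s0) * d_shr)
        shrunk[k] = row
    return {"centroids": shrunk, "scale": [si + s0 for si in s],
            "priors": {k: len(members[k]) / n for k in classes}, "overall": overall}

def surviving_features(model):
    """Features whose shrunken centroid differs from the overall mean in some class."""
    return [i for i, mu in enumerate(model["overall"])
            if any(abs(c[i] - mu) > 1e-12 for c in model["centroids"].values())]

def nsc_predict(model, x):
    """Assign x to the class with the smallest standardized distance to its
    shrunken centroid, corrected for the class prior."""
    def score(k):
        c = model["centroids"][k]
        dist = sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, c, model["scale"]))
        return dist - 2 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)

# Hypothetical toy data: feature 0 separates flare (F) from quiescence (Q); feature 1 is noise
X = [[5.0, 0.1], [5.2, -0.2], [4.8, 0.0], [5.1, 0.2],
     [0.0, 0.1], [0.2, -0.1], [-0.1, 0.2], [0.1, -0.2]]
y = ["F", "F", "F", "F", "Q", "Q", "Q", "Q"]
model = nsc_fit(X, y, delta=1.0)
print(surviving_features(model), nsc_predict(model, [4.9, 0.0]))
```

The difference between each shrunken class centroid and the overall mean is also a natural per-class measure of relative abundance, which is how shrunken centroid values are used in Fig. 4a.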

Fig. 4

Identification of six plasma proteins as an SJIA plasma flare panel. a All six plasma biomarker proteins are more abundant in SJIA flare. Relative abundance: the nearest shrunken centroid values [32] were used to represent the relative abundance of biomarkers in the SJIA F and Q patient classes. b A logistic regression model was used to find a panel-based algorithm that minimizes the total classification error in discriminating SJIA F from Q. The maximum estimated probabilities for each of the wrongly classified samples are labeled with arrows. c A modified 2 × 2 contingency table shows the percentage of classifications that agreed with clinical diagnosis. d The discriminant analysis-derived prediction scores for each sample were used to construct a receiver operating characteristic (ROC) curve; 500 testing data sets, generated by in silico bootstrapping from both the training and the experimentally bootstrapped SJIA F and Q data, were used to derive estimates of standard errors and confidence intervals for the ROC analysis. The plotted ROC curve is the vertical average of the 500 bootstrapping runs, and the box and whisker plots show the vertical spread around the average

We used the NSC algorithm to derive shrunken class means of biomarker protein abundance and gauged the relative quantity of each plasma protein in the SAF and QOM samples to assess the relative resolving power of each biomarker (Fig. 4a). To validate the antibody array observations, TIMP1 and MMP9 concentrations in SJIA plasma were also determined using enzyme immunometric assay kits from RayBiotech, Inc (Norcross, GA; data not shown). All of the SJIA flare biomarker proteins were present at higher levels in plasma during SJIA flare. The LDA classification results were used to calculate the percentage of classifications that agreed with clinical diagnosis, as shown in a modified 2 × 2 contingency table (Fig. 4c, left panel). The six-protein biomarker panel classified the SJIA flare samples with 92% positive agreement and the non-flare samples with 71.4% agreement (P = 7.9 × 10⁻⁵).
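The section does not name the test behind the contingency-table P values. As a consistency check, a one-sided Fisher's exact test (the hypergeometric upper tail with table margins fixed), applied to the counts implied by the reported percentages (18/20 and 27/27 for Fig. 3b; 23/25 and 10/14 for the plasma training set; 36/41 and 9/11 for the plasma bootstrapping set), reproduces the three reported values of about 4.16 × 10⁻¹¹, 7.9 × 10⁻⁵, and 2.4 × 10⁻⁵. The sketch assumes that test; the helper names are hypothetical.

```python
from math import comb

def agreement_table(n_pos, pos_correct, n_neg, neg_correct):
    """2 x 2 table [[TP, FN], [FP, TN]] from class sizes and the number of
    correctly classified samples in each class."""
    return [[pos_correct, n_pos - pos_correct], [n_neg - neg_correct, neg_correct]]

def fisher_one_sided(table):
    """One-sided Fisher's exact test: probability, with all margins fixed, of a
    table with at least as many true positives as observed."""
    (a, b), (c, d) = table
    n_pos, n_neg = a + b, c + d
    called_pos = a + c  # samples the classifier called positive (flare)
    total = comb(n_pos + n_neg, called_pos)
    hi = min(n_pos, called_pos)
    tail = sum(comb(n_pos, x) * comb(n_neg, called_pos - x) for x in range(a, hi + 1))
    return tail / total

# Counts implied by the reported agreement percentages
p_urine = fisher_one_sided(agreement_table(20, 18, 27, 27))      # Fig. 3b
p_plasma_train = fisher_one_sided(agreement_table(25, 23, 14, 10))  # Fig. 4c, left
p_plasma_boot = fisher_one_sided(agreement_table(41, 36, 11, 9))    # Fig. 4c, right
print(p_urine, p_plasma_train, p_plasma_boot)
```

A one-sided test fits the question being asked here, namely whether the panel agrees with clinical diagnosis more often than chance, not whether it merely differs from chance in either direction.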

To assess the performance of the six-protein biomarker panel in classifying “unknown” samples, we carried out an experimental bootstrapping approach. Instead of an in silico bootstrapping simulation, samples from different visits of the same patient, and even the same samples, were assayed, i.e., “bootstrapped” experimentally, for testing. For the bootstrapping test, a total of 52 SJIA samples (demographics shown in Tables 4 and 5) were analyzed by the antibody array: 41 samples from 19 patients at the time of SJIA flare and 11 samples from eight patients at the time of SJIA quiescence. Figure 4b plots the linear discriminant probabilities of the protein biomarker panel for the training (left) and bootstrapping data (right); in both cases, samples showed good separation between the highest and next-highest probability for the classification.

Our six-biomarker panel blindly classified the bootstrapping samples with 87.8% agreement with the clinical diagnosis for the flare samples and 81.8% agreement for the quiescent samples (Fig. 4c, right panel) (P = 2.4 × 10⁻⁵ for the bootstrapping test). Based upon the discriminant analysis-derived prediction scores for each sample, we constructed ROC curves [35, 36] to evaluate the performance of our plasma protein panel in distinguishing flare from quiescent samples. Figure 4d summarizes the 500 bootstrapping runs of the assayed SJIA flare and quiescent samples used to derive estimates of standard errors and confidence intervals for the ROC analysis. The ROC analysis yielded AUC values of 0.922 for the training set (Fig. 4d, left panel) and 0.907 for the bootstrapping test (Fig. 4d, right panel).

Discussion

Urine-based proteomic profiling is a novel approach that may lead to the discovery of non-invasive biomarkers for diagnosing patients with different diseases, with the ultimate aim of improving clinical outcomes [18]. Given new and emerging analytical technologies and data mining algorithms, the urine peptidome has become a rich resource for the discovery of naturally occurring peptide biomarkers. For pediatric diseases, urine is expected to become one of the most useful body fluids in clinical proteomics for diagnosis and risk stratification. Mass spectrometry-based urinary protein and peptide profiling has led to the discovery of highly informative biomarkers for both urogenital and non-urogenital diseases [43].

At present, urine proteomics has been applied primarily to diseases affecting the kidney and urinary tract. Our focus on changes in urine that reflect systemic inflammation is novel and potentially of broad use [18]. One of our long-term goals is to use urine biomarkers to develop clinical tests that are non-invasive and feasible for frequent sampling and measurement. With this in mind, we profiled urine peptidomes from SJIA patients to identify naturally processed urine peptide biomarkers, and 17 urine peptides emerged as a candidate SJIA flare panel. Statistical analyses indicated that this panel is robust. Nonetheless, the panel requires validation in a new sample set of sufficient size, guided by power analysis.

The panel discovered in this study appears capable of discriminating patients with active SJIA from those with quiescent or remitted disease. Similar to other molecular changes, such as plasma protein profiles [44], this urine peptide panel may detect incipient SJIA disease activity prior to clinical evidence of disease. In order to offer a significant clinical advantage to justify routine monitoring of urine biomarkers, a test would have to be more sensitive than the history and physical exam at predicting impending flare, and would need to predict those disease flares which do not self-resolve and therefore require escalation of medical therapy. Serial evaluation of urine samples from SJIA subjects using MRM analysis will be performed to test this hypothesis.

The urine peptide panel also identifies subjects with active SJIA when compared with those with KD and FI. Our ability to discriminate between SJIA patients and patients with other acute systemic inflammatory conditions is promising for the future development of diagnostic tools. However, the SJIA patients in this study, except for the two new-onset patients, are older than the KD and FI patients, and collagen synthesis, bone growth, and other aspects of connective tissue production may differ substantially with age. Therefore, development of diagnostic markers that discriminate new-onset SJIA from confounding acute inflammatory diseases, e.g., KD and FI, requires age-matched subjects. To continue the discovery efforts and to validate the current biomarker panel, we plan to assemble a larger cohort of patients with new-onset SJIA. Another potential use of the SJIA urine biomarker panel is to distinguish SJIA flare from infection in a patient with known SJIA, as these scenarios require different therapeutic responses. Urine samples from cohorts of SJIA patients with flare and SJIA patients with known infection will be assembled to validate the urine peptide biomarker panel identified in this study.

The parent proteins of the urine peptide biomarkers can be found in the circulation or kidney. For example, A1AT, FGA, and FGB are all acute-phase plasma proteins, which are synthesized by hepatocytes and megakaryocytes [45] and are found at high levels in the circulation in the setting of acute or chronic inflammation. Increased fibrinogen and fibrin deposition within joints are prominent indicators of active SJIA flare [46] and arthritis [47]. We hypothesize that the SJIA urine peptide biomarkers may result from changes in the concentrations of inflammatory mediators and protein catabolism regulators, altering the levels of peptides that are ultimately filtered into urine or generated in the urinary tract by local protease activity. The consequence is a disease-specific molecular phenotype in SJIA urine. In support of this model, we and others [38] have found urine FGA peptide biomarkers in plasma, suggesting that the FGA urine peptide biomarkers are likely to be present in the circulation. Future prospective studies are needed to determine whether the plasma A1AT, FGA, and FGB peptides have diagnostic or prognostic value in SJIA disease management. In contrast, UMOD is not derived from blood but is produced by the thick ascending limb of the loop of Henle in the kidney. Our plasma analysis did not detect any UMOD peptides, suggesting that the UMOD peptide biomarkers originate in the kidney. The differential abundance of UMOD urine peptide biomarkers in SJIA suggests that SJIA is likely to affect normal kidney function. Renal disorders in SJIA patients are not well characterized. One 9-year-old SJIA patient with an aggressive disease course developed renal amyloidosis just 2 years after disease onset [48]. A variety of renal disorders can occur in patients with rheumatoid arthritis (RA), which may be due to the underlying disease. 
The most common disorders associated with RA are membranous nephropathy, secondary amyloidosis, a focal, mesangial proliferative glomerulonephritis, rheumatoid vasculitis, and analgesic nephropathy [49, 50]. It is unclear whether SJIA directly affects renal function or indirectly causes renal inflammation.

It would have been of interest to assess changes in total protein or peptide excretion in our tested SJIA and contrasting KD/FI samples. Amounts of inflammatory proteins excreted in urine can change dramatically, and febrile states have often been associated with increased protein excretion [51, 52]. Future characterization of the SJIA and KD/FI urine proteomes can help determine whether disease-related changes in total protein excretion explain the changes in urine peptide profiles we observe in SJIA. Disease-specific alterations of gene transcription in the affected tissue and changes in the balance of proteolytic and anti-proteolytic activities in urine, as we have proposed previously [18], may also contribute to the altered pattern of urine peptides in SJIA. To further explore possible mechanisms of urine peptide generation, we used an antibody array with 43 proteins known to be involved in the inflammation of SJIA, including certain proteases and their regulators. Plasma profiling of SJIA flare and quiescent samples using this antibody array identified a biomarker panel of TIMP1, IL-18, RANTES, P-Selectin, MMP9, and L-Selectin, all of which are more abundant in SJIA flare than in quiescence. Given that fibrinogen is a substrate of MMP9 [39–41], upregulation of MMP9 and TIMP1 in the circulation may be directly associated with the generation of FGA peptide biomarkers that are ultimately enriched in urine during active SJIA. Treatment of active rheumatoid arthritis with golimumab (a human monoclonal antibody to TNF-α) plus methotrexate has been shown to significantly decrease serum IL-18, E-selectin, TIMP1, and MMP9 levels [53]. IL-18 has also been proposed as a key cytokine in the pathogenesis of SJIA [54]. Notably, IL-18 synthesis is increased in SJIA but not in KD [55], indicating that the inflammatory milieu differs between these (sometimes clinically similar) diseases. 
Such differences may explain the differing levels of the FGA peptide biomarkers in urine between SJIA flare and Kawasaki disease. The presence of P- and L-Selectin in the plasma biomarker panel suggests that P/L-Selectin-mediated leukocyte migration might be important in SJIA pathogenesis, possibly by mediating the recruitment and/or trafficking of specific leukocyte subtypes into inflammatory foci. A previous analysis [56] of rheumatoid arthritis (RA)-specific collagen breakdown products indicated that these fragments are formed locally in synovial fluid during the disease process and then released into the circulation. It is likely that the SJIA urine peptide biomarkers arise by a similar mechanism: they originate from local inflammation, are released into the circulation, and are ultimately enriched in urine.

Together, urine peptidomics and targeted plasma profiling revealed a panel of 17 urine peptides and a panel of six plasma proteins as SJIA flare signatures. As shown in Fig. 5, our integrated analyses suggest that the differential abundance of peptides in SJIA urine may result both from pathophysiological changes initiated by IL-18, RANTES, and P/L-Selectin-mediated inflammatory responses and from the activity of leukocyte-derived TIMP1/MMP9, which would influence protein catabolism in SJIA. The inflammatory cytokines may also directly affect the expression of substrate proteins and thereby the levels of their peptide derivatives.

Fig. 5

Current model: SJIA urine peptide biomarkers reflect changes in expression of inflammatory mediators and proteolytic and anti-proteolytic activities during active SJIA

Evaluation of urine peptide profiles in future prospective studies will test the robustness and diagnostic/prognostic values of these urine peptide biomarkers and may provide new insights into SJIA pathogenesis.