
ARTICLE pubs.acs.org/ac

Automated Iterative MS/MS Acquisition: A Tool for Improving Efficiency of Protein Identification Using a LC-MALDI MS Workflow
Haichuan Liu, Lee Yang, Nikita Khainovski, Ming Dong, Steven C. Hall, Susan J. Fisher, Mark D. Biggin, Jian Jin, and H. Ewa Witkowska*

UCSF Sandler-Moore Mass Spectrometry Core Facility and Department of Obstetrics, Gynecology and Reproductive Sciences, University of California, San Francisco, California 94143, United States
Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
Consultant, Framingham, Massachusetts, United States
Supporting Information

ABSTRACT: We have developed an information-dependent, iterative MS/MS acquisition (IMMA) tool for improving MS/MS efficiency, increasing proteome coverage, and shortening analysis time for high-throughput proteomics applications based on the LC-MALDI MS/MS platform. The underlying principle of IMMA is to limit MS/MS analyses to a subset of molecular ions that are likely to identify a maximum number of proteins. IMMA reduces redundancy of MS/MS analyses by excluding from the precursor ion peak lists proteotypic peptides derived from the already identified proteins and uses a retention time prediction algorithm to limit the degree of false exclusions. It also increases the utilization rate of MS/MS spectra by removing low-value unidentifiable targets like nonpeptides and peptides carrying large loads of modifications, which are flagged by their nonpeptide excess-to-nominal mass ratios. For some samples, IMMA increases the number of identified proteins by 20-40% when compared to the data dependent methods. IMMA terminates an MS/MS run at the operator-defined point when costs (e.g., time of analysis) start to overrun benefits (e.g., number of identified proteins), without prior knowledge of sample contents and complexity. To facilitate analysis of closely related samples, IMMA's inclusion list functionality is currently under development.

The effectiveness of protein identification in untargeted shotgun1 MS workflows suffers from several limitations. First, the data dependent MS/MS acquisition process is strongly biased toward abundant proteins.2 Second, the stochastic nature of precursor ion selection in shotgun analyses adversely affects repeatability and reproducibility.3 Third, the complexity of the proteome under study and the bandwidth of a MS instrument are often grossly mismatched, resulting in undersampling.4 Parallel benchmarking experiments which used targeted analysis as a reference provided a measure of these limitations.5 Faced with the need for untargeted analysis of thousands of samples in the course of protein complex identification in Desulfovibrio vulgaris Hildenborough (DvH) using the tagless strategy,6 we have sought to improve the efficiency of the reversed phase (RP) liquid chromatography (LC)-matrix assisted laser desorption ionization (MALDI) MS workflow that we employ for this study. We focused on the precursor ion selection step because of its high impact on the number of identified proteins. The most commonly used precursor ion selection criterion is intensity because of its relatively strong, albeit not universal,
© 2011 American Chemical Society

direct relationship to the information content of the product ion spectrum. In this scenario, peptides derived from abundant proteins are likely to dominate the MS spectrum and thus will be preferentially selected for MS/MS, generating redundant information. Of note, an approach for excluding redundant precursors, i.e., peptides derived from the already identified proteins, from further MS/MS analysis was introduced for LC-electrospray ionization (ESI) MS by Wallace et al.7 Unlike LC-ESI MS, the LC-MALDI MS workflow has an inherent advantage of decoupling peptide fractionation from MS and decoupling MS from MS/MS. Once samples are spotted onto the MALDI plate, they are frozen in time,8 and MS and MS/MS steps can be planned and executed in a highly controlled manner. However, beyond implementation of dynamic exclusion, a standard feature for most commercial MALDI instruments today, significant improvements in precursor ion selection in
Received: April 27, 2011 Accepted: June 28, 2011 Published: July 15, 2011
dx.doi.org/10.1021/ac200911v | Anal. Chem. 2011, 83, 6286-6293

LC-MALDI MS workflows have been slow to come. Important enhancements include limiting MS/MS to a subset of the elution time aligned m/z features of interest,9 exclusion of the already analyzed precursors from the replicate runs,10,11 and optimization of precursor ion selection in 2D LC workflows,12,13 the latter work employing a sophisticated combination of exclusion and inclusion lists. Advantages of eliminating redundant precursors were demonstrated for the MALDI MS analysis of unseparated digests of proteins fractionated by gels.14 To the best of our knowledge, redundancy in LC-MALDI MS workflows was not systematically addressed until 2009 when Zerck et al.15 introduced iterative precursor ion selection (IPS) and we presented our work on iterative MS/MS acquisition (IMMA) at the 57th ASMS conference.16 Both IPS and IMMA strategies utilize iterative MS/MS acquisition as a fundamental operational design. Zerck et al.15 introduced the concept of scoring potential precursors for the likelihood that they would produce novel identifications and applied these scores to affect precursor ion selection at the next iteration of analysis. Our approach is based on excluding undesirable molecular ions from the precursor ion list. The outcomes of the IPS and IMMA strategies show similar trends, and performance of both algorithms depends on sample complexity. IMMA software employs precursor ion filters based on the concepts of (i) peptide-specific nuclear mass defects17 that are encrypted in peptide monoisotopic masses,18-20 (ii) proteotypic peptide character21-24 (i.e., the likelihood of generating molecular ions detectable in MALDI MS), and (iii) amino acid sequence dependence of RP HPLC peptide retention times (RT)25-28 to generate exclusion lists. Detailed descriptions of these filters are presented in the Supporting Information.
IMMA was developed to circumvent the two major limitations of data dependent protocols, which are (1) redundant analysis of abundant peptides at the expense of the less-abundant ones and (2) long MS/MS acquisition times. In this work we describe the development of the IMMA strategy, which was tested and deployed on 4800 and 5800 (AB Sciex, Foster City, CA) MALDI time-of-flight (TOF)/TOF platforms. To measure the success of the algorithm, IMMA results were compared to those delivered by a commercially available data dependent protocol, which we refer to as brute force MS/MS acquisition or BruMMA. BruMMA performs MS/MS analysis on a preselected number of the highest quality precursors per spot in a predetermined, relative intensity-based order. While the IMMA software reported here was created to support our specific high-throughput project, we trust that it will perform well in other LC-MALDI-based proteomics applications.


presented at the 56th and 58th ASMS conferences.33,34 The detailed description of the LC protocol and MS and database settings is provided in the Supporting Information.

LC Peptide Fractionation. The proteolytic peptides were fractionated by RP LC using a 17 min gradient elution with acetonitrile in 0.1% trifluoroacetic acid (Ultimate 3000 dual-column HPLC system, Dionex, Sunnyvale, CA). Eluates were mixed with the matrix and spotted on the MALDI target using a SunChrom fraction collector/spotter (SunChrom, Friedrichsdorf, Germany). Each sample was separated into 129 fractions over an 8 min collection time, with a frequency of 3.66 s per spot.

Mass Spectrometry. All analyses were performed using either a 5800 or 4800 MALDI TOF/TOF mass spectrometer (AB Sciex). To trigger generation of the LC-MALDI peak list, a mock MS/MS job was submitted by selecting a single precursor for MS/MS (e.g., m/z of 1570.677, gluFib) and imposing a high precursor S/N threshold to minimize the number of actual acquisitions. During the IMMA and BruMMA tests, the laser power and number of laser shots were gradually increased for successive iterations. The MS/MS acquisition was performed in a stepwise manner, selecting one precursor per spot at a time. For BruMMA, the MS/MS precursor list was generated using the manufacturer's interpretation method algorithm and manually submitted to start the MS/MS analysis. Each IMMA job was automatically submitted using in-house derived software.

Database Search. ProteinPilot Software 3.0 (revision 114732; AB Sciex) with the Paragon method35 was employed to search MS/MS data.

Generation of Theoretical DvH Protein Peptide Lists. Theoretical peptide lists for all applications were generated in silico from 3 525 DvH proteins using the following parameters: trypsin as the enzyme, no cleavages before Pro; maximum 1 missed cleavage; peptide mass range of 800-6000 Da; fixed modification: carbamidomethyl at Cys.
In addition, fixed iTRAQ modifications at N-termini and Lys side chains, using the average monoisotopic masses of all iTRAQ 8-plex reagents, were included for modeling false exclusion rates. Variable modifications of sulfoxide at Met and deamidation at Asn were considered for development of the fractional mass filter and peptide retention time predictions.
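The in silico digestion rules listed above can be illustrated with a minimal Python sketch. It is not the software used in this work: the function names are hypothetical, the mass window is assumed to apply to the neutral monoisotopic peptide mass, and the iTRAQ and variable modifications are omitted for brevity. Residue masses are the standard monoisotopic values.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
CARBAMIDOMETHYL = 57.02146  # fixed modification at Cys, as stated in the text
WATER = 18.01056


def cleavage_fragments(protein):
    """Cleave after K or R, except when the next residue is P."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


def peptide_mass(sequence):
    """Neutral monoisotopic mass with carbamidomethylated Cys."""
    mass = WATER + sum(RESIDUE_MASS[aa] for aa in sequence)
    return mass + CARBAMIDOMETHYL * sequence.count("C")


def digest(protein, max_missed=1, min_mass=800.0, max_mass=6000.0):
    """Return (sequence, mass) pairs within the mass window (assumed neutral mass)."""
    fragments = cleavage_fragments(protein)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            sequence = "".join(fragments[i:j + 1])
            mass = peptide_mass(sequence)
            if min_mass <= mass <= max_mass:
                peptides.append((sequence, round(mass, 4)))
    return peptides
```

For example, `cleavage_fragments("MKRPGK")` yields `["MK", "RPGK"]`, because the bond after Arg is followed by Pro and is therefore left intact.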

RESULTS AND DISCUSSION


Overview of IMMA. IMMA software supports an automated and integrated pipeline for data acquisition, processing, analysis, and storage. IMMA was designed to improve sample throughput and proteome coverage for shotgun proteomics analyses that utilize a LC-MALDI MS/MS platform. To realize these goals, we developed tools for restricting MS/MS acquisition to a subset of nonredundant precursor ions that are likely to provide identifiable spectra and for terminating the analysis once costs (e.g., overall time) outweigh the benefits (e.g., the number of new protein identifications). The former tool is an information-based exclusion list builder aimed at avoiding repeated analyses of ions derived from the already identified species (i.e., redundant ions) and unproductive analyses of nonpeptide or otherwise unidentifiable ions. The latter tool provides the means for performing cost-benefit analysis on the fly and terminating the run once the parameters of productivity fall below their preset threshold values. The current version of IMMA was built and installed on 4800 and 5800 MALDI TOF/TOF (AB Sciex) mass spectrometers.

EXPERIMENTAL SECTION
Sample Preparation. Two types of samples were used: commercially available Universal Proteomics Standard 1 (UPS1) (Sigma-Aldrich, St. Louis, MO) and fractions of DvH proteins generated by the tagless protein complex identification strategy.6 UPS1 contained an equimolar mixture of 48 proteins. The DvH samples comprised 7 or 8 size exclusion column fractions, each labeled with a distinct iTRAQ reagent, and contained 70-100 DvH proteins of dissimilar concentrations. All samples were digested with trypsin and labeled with iTRAQ 8-plex reagents (AB Sciex)29,30 on 96-well PVDF membrane plates (Millipore), according to the protocol that we developed on the basis of the work of Papac et al.31 and Basa et al.32 and


Figure 1. Schematic diagram of the IMMA data and process flow.

An overview of the process is graphically presented in Figure 1. The raw MS peak list, stored in the Oracle database, is generated from the MS scans of all LC-MALDI spots. The Interpretation algorithm of the 4000 Series Explorer software creates a LC-MALDI MS peak list by assigning apices of elution, i.e., the best MALDI spots, for all m/z features that meet the required criteria of intensity, S/N, and resolution. The IMMA process begins by retrieving the LC-MALDI MS peak list, removing trypsin autolytic peptides, and using the peptide fractional mass (PFM) filter to remove m/z features that fall outside the allowable peptide-specific values. All of the IMMA jobs are based on this filtered peak list. The single most intense ion per spot is selected, and a MS/MS job is triggered by the IMMA software. Once a job completion signal is received, a database search of all the MS/MS data already acquired is initiated using ProteinPilot 3.0 software (AB Sciex). The search results are parsed into the confident and nonconfident identification categories, and the numbers of confident protein/peptide IDs that accumulated up to the current point of analysis are counted. The status of analysis is compared to the run termination settings that were specified at the start of the run, and the run is stopped if the gains in protein/peptide identifications are not sufficient to warrant further iterations. Otherwise, IMMA software proceeds to generate, in a MySQL database, an exclusion list of proteotypic peptides representing confidently identified proteins. A calibrated peptide RT prediction algorithm assigns the theoretical RT values to these peptides, and an exclusion list is constructed by specifying m/z and RT tolerances, in this work set at ±25 and ±50 ppm for internally and externally calibrated spectra, respectively, and ±0.9 min for RT.
At the same time, satellite peaks, i.e., lower abundance ions that cannot be isolated from their higher abundance neighbors because of the timed ion selector resolution limits, are excluded.
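To make the exclusion step concrete, the following Python sketch shows one way an m/z and RT tolerance window could be applied when choosing the most intense eligible ion per spot. It is an illustration only, not the IMMA implementation (which was written in C#): the names `ExclusionEntry`, `is_excluded`, and `pick_precursors`, and the tuple layout of the peaks, are assumptions. The default tolerances are the values quoted above for internally calibrated spectra.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExclusionEntry:
    mz: float      # theoretical m/z of a proteotypic peptide
    rt_min: float  # predicted retention time in minutes


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def is_excluded(mz, rt_min, exclusion_list, ppm_tol=25.0, rt_tol_min=0.9):
    """True if the feature falls inside both the m/z and the RT window of any entry."""
    return any(
        abs(ppm_error(mz, entry.mz)) <= ppm_tol
        and abs(rt_min - entry.rt_min) <= rt_tol_min
        for entry in exclusion_list
    )


def pick_precursors(peaks_by_spot, exclusion_list, analyzed,
                    ppm_tol=25.0, rt_tol_min=0.9):
    """Select the most intense eligible peak per spot.

    peaks_by_spot maps a spot id to a list of (mz, rt_min, intensity) tuples;
    analyzed is a set of (spot id, mz) pairs already submitted to MS/MS.
    """
    picks = {}
    for spot, peaks in peaks_by_spot.items():
        candidates = [
            peak for peak in peaks
            if (spot, peak[0]) not in analyzed
            and not is_excluded(peak[0], peak[1], exclusion_list,
                                ppm_tol, rt_tol_min)
        ]
        if candidates:
            picks[spot] = max(candidates, key=lambda peak: peak[2])
    return picks
```

Requiring both conditions to hold is what makes RT act as an orthogonal filter: a feature with a matching m/z but a distant elution time remains available for MS/MS.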

The exclusion lists are updated after each iteration, and the process continues until the run termination criteria are met. We stress that inherently IMMA does not limit the ability to acquire multiple peptides per protein but rather it allows the operator to control this number. For example, the stage of analysis at which the exclusion list functionality is activated is fully adjustable by customizing the settings of the required number of confidently identified peptides and/or total protein score. Likewise, the run termination criteria can be set to any level depending on the goal of analysis, i.e., an exhaustive examination of the sample vs facile generation of a minimal data set that conforms to the current recommendations in the proteomics field.36,37 The biggest challenge in optimization of IMMA and any method that relies on trimming the precursor ion selection list is reaching a healthy balance between over- and under-exclusion of contenders deemed to be undesirable. IMMA utilizes both general and sample-specific filtering to remove peaks of low value for the purpose of identifying the maximum number of proteins, as briefly described below.

Peptide Fractional Mass Filter. The PFM filter categorizes the observed m/z features as peptides and nonpeptides on the basis of their excess38 to nominal38 (E/N) mass ratios. (We refer here to fractional mass in a generic sense, as this term has been widely used in the MS field to denote a manifestation of the nuclear mass defect at the level of atomic and molecular masses; see discussion in the Supporting Information.) Fundamentally, there is an intrinsic ambiguity in the values of nominal and excess masses for an m/z feature of unknown composition. Specifically, the contribution of excess mass to the integer portion of the monoisotopic mass can be difficult to predict. Therefore, the PFM filter calculates the ratios for a series of excess and nominal

Figure 2. Development of the calibrated peptide retention time prediction algorithm. Panel a: Distribution of errors in retention time prediction follows a Gaussian fit. Errors are expressed as a distance, in MALDI spots, between the predicted spot position and the actual spot where the intensity of the peptide m/z signal was the highest. The accuracy of the measurement of the apex of peak elution is limited by the time of spot deposition, here 3.66 s. Panel b: A typical least-squares fit for the theoretical and experimental retention times within a given LC separation calculated for peptides identified during the first IMMA iteration. The resulting linear equation (top left corner) is used to predict retention times of proteotypic peptides for all subsequent IMMA iterations.

mass pairs that are possible for a given m/z feature and selects the ratio that comes closest to the average peptide-specific value, i.e., 0.05%. This approach is biased toward peptide-like values, and therefore it minimizes the risk of excluding true biological peptides from the precursor ion list. The E/N mass ratios of theoretical DvH peptides (see the Experimental Section) occupy a range of 0.038-0.072%. To increase throughput, we have limited the acceptable peptide-specific E/N mass ratio range to 0.046-0.062% as a compromise between the losses in protein identification due to the false exclusion of peptides and the gains in shortening analysis time by removing from further analysis nonpeptide precursors, i.e., matrix clusters, contaminants, or peptides carrying extensive modifications (examples shown in Figure S1 in the Supporting Information). The PFM filter is an adjustable parameter in IMMA.

Proteotypic Peptide Prediction. We have applied the PeptideSieve software of Mallick et al.,39 which was developed using a PAGE-MALDI training data set, to assign proteotypic probabilities to theoretical DvH peptides. A brief overview of features of this algorithm is provided in the Supporting Information. Out of 179 598 theoretical peptides, 118 970 (66%) had proteotypic probabilities of ≥0.3. To date, we have identified over 15 676 distinct DvH peptides, and more than 50% (8 679) were predicted to be highly proteotypic (probability ≥0.90), indicating that the PAGE-MALDI PeptideSieve algorithm performed reasonably well for our system, despite significant differences between

our workflow (digestion on PVDF, RP LC) and model workflow (in gel digestion, no LC). We selected a conservative value of the required proteotypic probability ≥0.30 as a reasonable compromise between under-exclusion of target peptides and overexclusion of nonrelated peptides with overlapping m/z and RT values. Of note, current predictions ignore the presence of iTRAQ 8-plex modifications, a hallmark feature in our study, since the training set on which the PeptideSieve algorithm was based did not contain this label. While developing the IMMA software, we needed to utilize a generic set of classifiers based on data collected by other laboratories. Now we have our own extensive data set that will allow us to improve the robustness of proteotypic peptide predictions for our specific analytical platform.40

Calibrated Peptide Retention Time Prediction. IMMA employs a calibrated peptide retention time prediction algorithm and alignment tools to aid the generation of exclusion lists. The RT prediction algorithm assumes a linear relationship between the contributions of peptide building blocks, i.e., amino acid and modification retention indices (AAMRIs), and the peptide elution characteristics under specified conditions of RP LC. To generate AAMRIs, a training set of RP LC-MALDI MS data (88 662 peptides, 524 runs) was analyzed by the singular value decomposition (SVD) algorithm.41 The resulting AAMRIs (Figure S2 and Table S1 in the Supporting Information) are treated as constants for a given analytical platform and are used to predict peptide RT in future runs. The errors of the SVD fit, shown in the form of the differences between the experimental RTs and the fitted RTs, exhibit a Gaussian distribution, with an average error of 1.1 spots and a standard deviation of 6.2 spots (Figure 2a).
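The additive retention model and the per-run least-squares calibration mentioned in the Figure 2 caption can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the retention index values used in any example are made-up toy numbers rather than the AAMRIs of Table S1, the function names are hypothetical, and the fitting of the indices themselves by SVD is not shown.

```python
SECONDS_PER_SPOT = 3.66  # spotting frequency given in the Experimental Section


def predict_rt(sequence, indices):
    """Additive model: predicted RT is the sum of per-residue retention indices."""
    return sum(indices[aa] for aa in sequence)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + shift."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate_run(identified, indices):
    """Fit slope and shift for one LC run.

    identified is a list of (sequence, observed RT in minutes) pairs for
    peptides confidently identified in the first MS/MS iteration.
    """
    predicted = [predict_rt(sequence, indices) for sequence, _ in identified]
    observed = [rt for _, rt in identified]
    return fit_line(predicted, observed)


def minutes_to_spots(minutes):
    """Convert a time span in minutes to a number of MALDI spots."""
    return minutes * 60.0 / SECONDS_PER_SPOT
```

With this conversion, a 1.8 min span corresponds to about 29.5 spots at 3.66 s per spot.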
For each new IMMA run, the experimental RTs of peptides confidently identified at the first iteration of MS/MS are used to align this run to a zero-shift data set (i.e., virtual spotted sample plate) representing an ideal LC run where peptide RTs match the values calculated using AAMRIs (Figure 2b). Through the application of SVD, the slope and shift parameters are calculated and applied to RT prediction for proteotypic peptides in this run. The values of slope are usually fairly close to unity and in practice the prediction algorithm utilizes only shift values derived from the first MS/MS iteration and applies them to all further iterations. However, IMMA software enables further refinements in slope and shift parameter values at later iterations, if deemed necessary. Following a series of trial IMMA experiments (data not shown), the current accuracy window for RT prediction was set conservatively to 1.8 min, which is equivalent to 30 MALDI spots (4.8 standard deviations of the RT error distribution shown in Figure 2a).

IMMA Software. The IMMA software was developed in Microsoft Visual Studio 2005 and later in Visual Studio 2008, using C# language and ASP.NET technology. TCP/IP communications between the IMMA software and the 4800/5800 MALDI TOF/TOF instruments were established with the software library, 4000 Series Instrument Control Application Program Interface, provided by AB Sciex. In particular, we have adapted the key communications module, CSocket, from C++ into C# language for compatibility purposes with our development infrastructure. The detailed description of the software design and its graphic user interface (Figure S3) is provided in the Supporting Information.

Estimating False Exclusion Rates. We have examined the theoretical and experimental false exclusion rates associated with PFM filtering and theoretical false exclusion rates related to the

exclusion of redundant precursors. The experimental data set included 343 933 MS/MS spectra (367 LC runs) that were acquired on iTRAQ-labeled DvH peptides using the BruMMA algorithm. The experimental false exclusion rate associated with the PFM filter was estimated at 20%. The theoretical false exclusion rate was 14.1%. Discrepancy between the experimental data and a theoretical model is most likely a consequence of nonconsidered modifications, multiple missed cleavages, and mass measurement errors. To establish the theoretical false exclusion rates associated with removal of redundant precursors, we considered iTRAQ-modified proteotypic peptides (≥0.3) within the 800-6000 mass range. This theoretical set contained 179 592 peptides, out of which 65.2% were represented by distinct m/z values. The exact masses of the remaining 62 506 peptides were represented by 18 880 m/z values: 36% of these m/z values had a cardinality of 2 and 84% of less than 11, but many m/z bins contained several more isomeric species (up to 70); see Figure S4a in the Supporting Information. To limit further analysis to species that theoretically are resolvable by MS, we reduced the degenerate mass peptide set to a single, randomly selected peptide representative per m/z bin. In total, a distribution of 135 966 distinct peptide m/z values within the 135 966 exclusion tolerance windows that were centered at each of these species was derived, and the numbers of theoretical peptides close in mass to the potential target redundant species were calculated for each exclusion window. Despite the small size of the DvH proteome (less than 3 600 open reading frames), the danger of false exclusions of nonintended precursors is very high: 30% and 75% of peptides selected for exclusion would be accompanied by 0-4 and 0-13 unrelated peptides, respectively (Figure S4b in the Supporting Information).
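The neighbor count per exclusion window described above can be reproduced in outline with the following Python sketch. It is an assumed, simplified formulation rather than the code used for Figure S4: the function name, the (m/z, RT) tuple layout, and the default 25 ppm tolerance are illustrative choices, and passing `rt_tol_min=None` switches the RT criterion off so that the effect of the RT filter can be compared.

```python
from bisect import bisect_left, bisect_right


def count_neighbors(peptides, ppm_tol=25.0, rt_tol_min=None):
    """Count, for each peptide, the other peptides inside its exclusion window.

    peptides is a list of (mz, rt_min) tuples. The m/z window is +/- ppm_tol
    around each peptide; if rt_tol_min is given, a neighbor must also elute
    within +/- rt_tol_min minutes.
    """
    order = sorted(range(len(peptides)), key=lambda i: peptides[i][0])
    sorted_mz = [peptides[i][0] for i in order]
    counts = [0] * len(peptides)
    for i, (mz, rt) in enumerate(peptides):
        delta = mz * ppm_tol * 1e-6
        lo = bisect_left(sorted_mz, mz - delta)
        hi = bisect_right(sorted_mz, mz + delta)
        for k in range(lo, hi):
            j = order[k]
            if j == i:
                continue  # a peptide is not its own neighbor
            if rt_tol_min is None or abs(peptides[j][1] - rt) <= rt_tol_min:
                counts[i] += 1
    return counts
```

Sorting the m/z values once and locating each window by binary search keeps the scan practical for lists of the size discussed here (on the order of 10^5 peptides).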
The importance of including RT as an orthogonal parameter of peptide identity is clearly demonstrated by this analysis: the number of close neighbors increases 2-3 times when the RT filter is not used. The calculations presented above illustrate the worst-case scenario for analysis of the DvH proteome, i.e., the absence of protein separation prior to LC-MALDI MS analysis. In our application, for which IMMA was developed, a typical sample contains 70-100 proteins. Hence, the extent of mass redundancy will be significantly smaller than discussed above. However, it is to be expected that additional peptides that fall outside the modeled categories will add to the sample complexity. With these challenges and limitations in mind, we present the data that illustrate the performance of the IMMA version 1.0 deployed on the current experimental platform.

Evaluation of Performance of IMMA. Initially, we used RP LC technical replicates to compare the performance of the information-dependent IMMA and data-dependent BruMMA algorithms. However, subtle run-to-run variations between LC-MS replicates often rendered a fair comparison difficult due to the dissimilarities among the raw MS peak lists. Zerck et al.15 have also recognized this problem and turned to using simulation strategies to circumvent consequences of replicates being nonidentical. We have also utilized simulation approaches by mimicking application of the IMMA algorithm to the existing MS/MS data generated by BruMMA and obtained results generally favoring IMMA (data not shown). Ultimately, we adopted an experimental procedure, described below, to directly compare IMMA to the industry standard. In this procedure, both algorithms are executed in a stepwise ping-pong fashion while employing the same sample spotted onto a MALDI target. As


described above, IMMA is executed in discrete steps while in a typical BruMMA scenario, all precursors are analyzed in a single run. In our comparative protocol, the execution of the precursor ion list generated by BruMMA is also split into multiple iterations, each selecting one precursor per spot, but in contrast to IMMA, no adjustments to this list are made as the run progresses. The MS/MS acquisition starts with either the IMMA or BruMMA algorithm. Once the first iteration is completed, the run stops and the first iteration of the alternative method is activated. The process is repeated until the requisite number of iterations is performed. In this experimental design, while consecutive steps of BruMMA and IMMA intertwine, both algorithms refer to the same MS peak list and hence differences between the same iterations for each method are minimized. Two types of samples were used for this analysis: a standard (UPS1) and real life DvH protein fractions. All sample sets generated similar numbers of LC-MALDI peaks (2000-3000) out of which 25% were interrogated by MS/MS. Spectra utilization (i.e., a fraction of all acquired MS/MS spectra that were confidently matched to a peptide) was 65% and 45% for UPS1 and DvH samples, respectively. Typically, 6-8 productive MS/MS acquisitions were generated from a single spot, albeit this number would vary widely (4-15) depending on spot composition. The summary of the representative results of IMMA and BruMMA is shown in Figure 3 and in Tables S2 and S3 in the Supporting Information. The potential of IMMA for streamlining MS/MS analysis in LC-MALDI workflows lies in two areas: (i) identifying more proteins and (ii) shortening analysis time by limiting the number of MS/MS acquisitions. In terms of the number of nonredundant protein identifications, IMMA significantly outperformed BruMMA in analysis of DvH samples (Figure 3a).
Accordingly, the efficiency of MS/MS analysis, defined as the number of MS/MS acquisitions required to identify a single protein, was significantly better for DvH samples when using IMMA, as compared to BruMMA: pairwise t-test p values of 7.5 × 10^-3, 4.5 × 10^-4, and 1.9 × 10^-3 for three samples across all MS/MS iterations (Figure 3b). In contrast, no improvements were demonstrated for UPS1 samples (data not shown), for which very limited gains in new protein identifications were seen after completion of the third iteration, regardless of the algorithm used. Apparently, the relatively low complexity of UPS1 did not warrant the high number of iterations that was originally set for analysis of this sample. We conclude that the potential benefits of IMMA are highly sample dependent, in accordance with the observations of Zerck et al., who used UPS1 and a comparable real life sample to evaluate their IPS approach.15 The key to increasing time efficiency of analysis is to terminate the MS/MS data acquisition at the stage when an output, however defined, outweighs the effort required for its production. IMMA software provides tools for making this decision by comparing, on the fly, the current status of analysis to the predetermined parameters of success. For example, the new ID accrual rate that measures the productivity of MS/MS analysis can be utilized as a criterion of run termination. New ID accrual rates are calculated as ratios between the number of new protein IDs at iteration [N + 1] and the number of all IDs accumulated over iterations [1] through [N], plotted in Figure 3c,d. If we had set the new ID accrual rate threshold above 5% as a run termination parameter when analyzing the UPS1 sample, the MS/MS acquisition would have been halted after 3 iterations, missing few IDs but avoiding 180 MS/MS acquisitions of very low productivity (Figure 3c). In contrast,

Figure 3. Efficiency of protein identification using IMMA vs BruMMA algorithms. IMMA and BruMMA results are shown with dark and light gray lines, respectively. The top panels compare IMMA vs BruMMA results for DvH fractions: (a) benefits, i.e., gains in new protein identifications at consecutive steps of MS/MS acquisition, and (b) associated costs expressed as a number of MS/MS acquisitions required for an identification of a single protein. The dynamics of changes in accrual rates of new protein identifications are shown in panels c and d for UPS1 and DvH samples, respectively. Accrual rates were calculated as ratios between the number of new protein IDs at iteration [N + 1] and the number of all IDs accumulated over iterations [1] through [N]. (c) The example threshold value of accrual rate lower than 0.05 that would lead to run termination was reached at the 3rd iteration for UPS1 samples. (d) In contrast, none of DvH samples reached this threshold after 5-6 iterations. The order of experiments shown in each panel corresponds to the order of experiments detailed in Tables S2 (UPS1) and S3 (DvH) in the Supporting Information.

DvH samples would still qualify for further analysis at the stage of 5 or 6 iterations (Figure 3d). The major advantage of IMMA in increasing the number of identified proteins is seen at the early stages of analysis. After a sample-specific number of iterations, the results obtained by both strategies start to converge. Why? The exclusion of low-value candidates allows for the early selection and successful identification of precursors of lower intensity (Figure S5 in the Supporting Information). As the analysis progresses, the number of redundant ions derived from low-abundance proteins will be progressively smaller. Hence the positive impact of their exclusion will eventually lose practical significance. An increase in the number of identified proteins with IMMA is achieved in part at the expense of lowering protein sequence coverage to the extent that is determined by the operator-set stringency of protein ID acceptance criteria. In practice, ID scores of abundant proteins often exceed these limits because exclusion lists are limited to a subset of theoretically possible peptides. Hence unpredicted peptides representing the already excluded proteins are still selected for MS/MS in subsequent iterations, leading to increases in their ID scores (Tables S2 and S3 in the Supporting Information). At the same time, IMMA is likely to produce a significant number of identifications in the one-hit wonder category37 that are less likely to be observed by BruMMA at the same stage of analysis. On occasion, a few proteins identified by BruMMA would escape detection by IMMA, even though the total number of protein IDs in this particular analysis was significantly higher than in BruMMA (Figure S6 in the Supporting Information). Out of various possible reasons for this failure, the primary culprit was exclusion caused by the closeness of the m/z value of the missed precursor to that of redundant precursor(s) placed on the exclusion list.
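The run termination criterion based on the new ID accrual rate, as defined earlier in this section, can be written out as a short Python sketch. The function names are hypothetical, the handling of a first iteration with zero IDs is an assumption, and any per-iteration counts used with it are illustrative inputs, not results from this study.

```python
def accrual_rates(new_ids_per_iteration):
    """Rate at iteration N + 1 = new IDs at N + 1 / all IDs over iterations 1..N.

    Returns one rate per iteration starting with iteration 2.
    """
    rates = []
    cumulative = 0
    for index, new_ids in enumerate(new_ids_per_iteration):
        if index > 0:
            # Assumption: with no prior IDs the rate is treated as unbounded.
            rates.append(new_ids / cumulative if cumulative else float("inf"))
        cumulative += new_ids
    return rates


def termination_iteration(new_ids_per_iteration, threshold=0.05):
    """Return the 1-based iteration at which the rate first drops below threshold.

    Returns None if the rate never falls below the threshold.
    """
    for offset, rate in enumerate(accrual_rates(new_ids_per_iteration)):
        if rate < threshold:
            return offset + 2  # rates[0] belongs to iteration 2
    return None
```

For a hypothetical run yielding 30, 8, 1, and 1 new protein IDs in successive iterations, the rate at iteration 3 is 1/38, which is below 0.05, so the run would stop after the third iteration.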

In summary, the extent of improvements in the number of identified proteins (e.g., 17–36% for DvH) and in time savings (e.g., 3 vs 5–6 iterations) provided by the IMMA strategy is in agreement with the results reported by Zerck et al. (ref 15). While modest, the gains provided by IMMA quickly accumulate in a high-throughput setting such as that used in the tagless strategy for protein complex identification. More extensive peptide separation and higher stringency of tolerances for m/z and RT values are likely to improve the overall outcome of analysis. While further optimization was not attempted here, we note that all the parameters governing MS/MS acquisition and exclusion list generation are adjustable in the IMMA software. The natural extension of IMMA is the addition of an inclusion list function to enhance the performance of the software for analysis of a series of similar samples, as is the case in the tagless strategy. Application of inclusion lists will reliably differentiate between the absence of a protein in a given column fraction and a failure to detect it. It will also provide the consistency of using the same set of peptides to quantify relative concentrations of proteins and the flexibility of selecting the best performing peptides. In initial experiments, a significant number of peptides that were placed on an inclusion list created from the analysis of a set of size exclusion column fractions were successfully identified in neighboring fractions (data not shown). The IMMA algorithm, with its capability to generate inclusion lists, can be used in any application where the similarity of samples is expected to be significant and high reproducibility of analysis is required.

CONCLUSIONS
We have developed an information-dependent, iterative data acquisition tool for improving MS/MS efficiency and proteome coverage for applications based on the LC–MALDI MS/MS platform. IMMA reduces redundancy of MS/MS analyses by excluding from the precursor ion peak lists peptides derived from previously identified proteins. It also increases the utilization rate of MS/MS spectra by removing low-value unidentifiable targets. An increase in the number of proteins identified by IMMA vs the data-dependent algorithm is sample type dependent, and it appears to be higher for more complex samples. A major advantage of IMMA is the ability to perform cost–benefit analysis on the fly and to terminate MS/MS analysis at the point of decreasing returns, e.g., low productivity, without prior knowledge of sample contents and complexity. The inclusion list functionality is currently being developed to support the tagless strategy of protein complex identification, where large numbers of similar samples are interrogated to profile protein elution through a multidimensional separation space, or any other application that requires analysis of samples of similar composition.

dx.doi.org/10.1021/ac200911v | Anal. Chem. 2011, 83, 6286–6293


ASSOCIATED CONTENT
Supporting Information. (1) Overview of filters used by IMMA software; (2) detailed experimental methods; (3) extended discussion on Calibrated Peptide Retention Time Prediction and Design of IMMA software; (4) amino acid and modification retention time indices (Figure S2 and Table S1); (5) IMMA and BruMMA results for 2 UPS1 sample replicates and 3 DvH samples (Tables S2 a,b and Tables S3 a–c); (6) examples of MS/MS spectra of nonpeptide ions (Figure S1); a screenshot of the IMMA GUI (Figure S3); (7) a graphic representation of the theoretical false exclusion rates (Figure S4); (8) an example order of MS peak selection in IMMA and BruMMA (Figure S5); and (9) an example of distinct sets of proteins identified only by either IMMA or BruMMA (Figure S6). This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION
Corresponding Author

*E-mail: witkowsk@cgl.ucsf.edu.

ACKNOWLEDGMENT
H.L. and L.Y. contributed equally to this work. This work, conducted by ENIGMA (Ecosystems and Networks Integrated with Genes and Molecular Assemblies), was supported by the Office of Science, Office of Biological and Environmental Research of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The authors thank Dr. Terry Hazen and his co-workers from LBNL for providing DvH biomass and Dr. Sean L. Seymour from AB Sciex for technical support regarding the ProteinPilot search engine and Oracle database. We acknowledge the support of Dr. Fadi Abdi and AB Sciex for providing a 5800 TOF/TOF mass spectrometer for this study. We thank Dr. O. David Sparkman for helpful discussion. We are indebted to the members of the Fisher lab and the UCSF Sandler-Moore Mass Spectrometry Core Facility and especially to Drs. Rich Niles, Simon Allen, Katherine Williams, and Evelin D. Szakal for their support and discussion.

REFERENCES
(1) Wolters, D. A.; Washburn, M. P.; Yates, J. R., 3rd Anal. Chem. 2001, 73, 5683–5690.
(2) Veenstra, T. D.; Conrads, T. P.; Issaq, H. J. Electrophoresis 2004, 25, 1278–1279.
(3) Tabb, D. L.; Vega-Montoto, L.; Rudnick, P. A.; Variyath, A. M.; Ham, A. J.; Bunk, D. M.; Kilpatrick, L. E.; Billheimer, D. D.; Blackman, R. K.; Cardasis, H. L.; Carr, S. A.; Clauser, K. R.; Jaffe, J. D.; Kowalski, K. A.; Neubert, T. A.; Regnier, F. E.; Schilling, B.; Tegeler, T. J.; Wang, M.; Wang, P.; Whiteaker, J. R.; Zimmerman, L. J.; Fisher, S. J.; Gibson, B. W.; Kinsinger, C. R.; Mesri, M.; Rodriguez, H.; Stein, S. E.; Tempst, P.; Paulovich, A. G.; Liebler, D. C.; Spiegelman, C. J. Proteome Res. 2010, 9, 761–776.
(4) Frahm, J. L.; Howard, B. E.; Heber, S.; Muddiman, D. C. J. Mass Spectrom. 2006, 41, 281–288.
(5) Sandhu, C.; Hewel, J. A.; Badis, G.; Talukder, S.; Liu, J.; Hughes, T. R.; Emili, A. J. Proteome Res. 2008, 7, 1529–1541.
(6) Dong, M.; Yang, L. L.; Williams, K.; Fisher, S. J.; Hall, S. C.; Biggin, M. D.; Jin, J.; Witkowska, H. E. J. Proteome Res. 2008, 7, 1836–1849.
(7) Wallace, A.; Ritchie, M. A.; Jones, C.; Leicester, S.; Langridge, J. I. J. Biomol. Techn. 2003, 14, 80; Poster 105-W presented at the ABRF Meeting; Denver, CO, 2003.
(8) Benkali, K.; Marquet, P.; Rerolle, J.; Le Meur, Y.; Gastinel, L. BMC Genomics 2008, 9, 541.
(9) Hattan, S. J.; Parker, K. C. Anal. Chem. 2006, 78, 7986–7996.
(10) Chen, H. S.; Rejtar, T.; Andreev, V.; Moskovets, E.; Karger, B. L. Anal. Chem. 2005, 77, 7816–7825.
(11) Graber, A.; Juhasz, P. S.; Khainovski, N.; Parker, K. C.; Patterson, D. H.; Martin, S. A. Proteomics 2004, 4, 474–489.
(12) Gandhi, T.; Fusetti, F.; Wiederhold, E.; Breitling, R.; Poolman, B.; Permentier, H. P. J. Proteome Res. 2010, 9, 5922–5928.
(13) Juhasz, P.; Lynch, M.; Sethuraman, M.; Campbell, J.; Hines, W.; Paniagua, M.; Song, L.; Kulkarni, M.; Adourian, A.; Guo, Y.; Li, X.; Martin, S.; Gordon, N. J. Proteome Res. 2011, 10, 34–45.
(14) Scherl, A.; Francois, P.; Converset, V.; Bento, M.; Burgess, J. A.; Sanchez, J. C.; Hochstrasser, D. F.; Schrenzel, J.; Corthals, G. L. Proteomics 2004, 4, 917–927.
(15) Zerck, A.; Nordhoff, E.; Resemann, A.; Mirgorodskaya, E.; Suckau, D.; Reinert, K.; Lehrach, H.; Gobom, J. J. Proteome Res. 2009, 8, 3239–3251.
(16) Liu, H. C.; Yang, L.; Khainovski, N.; Dong, M.; Allen, S.; Szakal, D.; Hall, S.; Fisher, S.; Biggin, M. D.; Witkowska, H. E.; Jin, J. Proceedings of the 57th ASMS Conference on Mass Spectrometry and Allied Topics, Philadelphia, PA, May 31–June 4, 2009.
(17) Inczédy, J.; Lengyel, T.; Ure, A. M. IUPAC Compendium of Analytical Nomenclature: Definitive Rules 1997 (IUPAC Chemical Nomenclature), 3rd ed.; Blackwell Science: Osney Mead, Oxford; Malden, MA, 1998.
(18) Kirchner, M.; Timm, W.; Fong, P.; Wangemann, P.; Steen, H. Bioinformatics 2010, 26, 791–797.
(19) Lehmann, W. D.; Bohne, A.; von Der Lieth, C. W. J. Mass Spectrom. 2000, 35, 1335–1341.
(20) Mann, M. Proceedings of the 43rd ASMS Conference on Mass Spectrometry and Allied Topics, Atlanta, GA, May 21–26, 1995; p 639.
(21) Aebersold, R. Nature 2003, 422, 115–116.
(22) Craig, R.; Cortens, J. P.; Beavis, R. C. Rapid Commun. Mass Spectrom. 2005, 19, 1844–1850.
(23) Kuster, B.; Schirle, M.; Mallick, P.; Aebersold, R. Nat. Rev. Mol. Cell Biol. 2005, 6, 577–583.
(24) Le Bihan, T.; Robinson, M. D.; Stewart, I. I.; Figeys, D. J. Proteome Res. 2004, 3, 1138–1148.
(25) Meek, J. L. Proc. Natl. Acad. Sci. U.S.A. 1980, 77, 1632–1636.
(26) Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Pasa-Tolic, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E. F.; Shen, Y.; Zhao, R.; Smith, R. D. Anal. Chem. 2003, 75, 1039–1048.
(27) Krokhin, O. V.; Craig, R.; Spicer, V.; Ens, W.; Standing, K. G.; Beavis, R. C.; Wilkins, J. A. Mol. Cell. Proteomics 2004, 3, 908–919.
(28) Kaliszan, R.; Baczek, T.; Cimochowska, A.; Juszczyk, P.; Wisniewska, K.; Grzonka, Z. Proteomics 2005, 5, 409–415.
(29) Ross, P. L.; Huang, Y. N.; Marchese, J. N.; Williamson, B.; Parker, K.; Hattan, S.; Khainovski, N.; Pillai, S.; Dey, S.; Daniels, S.; Purkayastha, S.; Juhasz, P.; Martin, S.; Bartlet-Jones, M.; He, F.; Jacobson, A.; Pappin, D. J. Mol. Cell. Proteomics 2004, 3, 1154–1169.
(30) Ow, S. Y.; Cardona, T.; Taton, A.; Magnuson, A.; Lindblad, P.; Stensjo, K.; Wright, P. C. J. Proteome Res. 2008, 7, 1615–1628.
(31) Papac, D. I.; Briggs, J. B.; Chin, E. T.; Jones, A. J. Glycobiology 1998, 8, 445–454.
(32) Basa, L. J.; Katta, V.; Haskins, W. E.; Cochran, P. K. Proceedings of the 53rd ASMS Conference on Mass Spectrometry and Allied Topics, San Antonio, TX, June 5–9, 2005.
(33) Liu, H. C.; Dong, M.; Yang, L.; Allen, S.; Johansen, E.; Hall, S.; Fisher, S.; Hazen, T. C.; Geller, G. T.; Singer, M. E.; Jin, J.; Biggin, M. D.; Witkowska, H. E. Proceedings of the 56th ASMS Conference on Mass Spectrometry and Allied Topics, Denver, CO, June 1–5, 2008.
(34) Liu, H. C.; Dong, M.; Yang, L.; Allen, S.; Szakal, E. D.; Hall, S.; Fisher, S.; Hazen, T. C.; Jin, J.; Biggin, M. D.; Witkowska, H. E. Proceedings of the 58th ASMS Conference on Mass Spectrometry and Allied Topics, Salt Lake City, UT, May 23–27, 2010.
(35) Shilov, I. V.; Seymour, S. L.; Patel, A. A.; Loboda, A.; Tang, W. H.; Keating, S. P.; Hunter, C. L.; Nuwaysir, L. M.; Schaeffer, D. A. Mol. Cell. Proteomics 2007, 6, 1638–1655.
(36) Bradshaw, R. A.; Burlingame, A. L.; Carr, S.; Aebersold, R. Mol. Cell. Proteomics 2006, 5, 787–788.
(37) Gupta, N.; Pevzner, P. A. J. Proteome Res. 2009, 8, 4173–4181.
(38) Murray, K. K.; Boyd, R. K.; Eberlin, M. N.; Langley, G. J.; Li, L.; Naito, Y. www.iupac.org/reports/provisional/abstract06/murray_prs.pdf, 2006.
(39) Mallick, P.; Schirle, M.; Chen, S. S.; Flory, M. R.; Lee, H.; Martin, D.; Ranish, J.; Raught, B.; Schmitt, R.; Werner, T.; Kuster, B.; Aebersold, R. Nat. Biotechnol. 2007, 25, 125–131.
(40) Sanders, W. S.; Bridges, S. M.; McCarthy, F. M.; Nanduri, B.; Burgess, S. C. BMC Bioinf. 2007, 8 (Suppl. 7), S23.
(41) Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P. Numerical Recipes in C, 2nd ed.; Cambridge University Press: New York, 1988.


NOTE ADDED AFTER ASAP PUBLICATION
This paper was published on the Web on July 15, 2011 with the tables and figures missing in the Supporting Information. The corrected version of the Supporting Information was reposted on July 18, 2011.
