Tutorial review. Future prospects for the analysis of complex biological systems using micro-column liquid chromatography–electrospray tandem mass spectrometry

 

Author: John R. Yates,

 

Journal: Analyst (RSC, available online 1996)
Volume/issue: Volume 121, issue 7

Pages: 65R-76R

 

ISSN:0003-2654

 

Year: 1996

 

DOI:10.1039/AN996210065R

 

Publisher: RSC

 

Data source: RSC

 

Abstract:

Analyst, July 1996, Vol. 121 (65R-76R)

Tutorial Review

Future Prospects for the Analysis of Complex Biological Systems Using Micro-column Liquid Chromatography-Electrospray Tandem Mass Spectrometry

John R. Yates, III, Ashley L. McCormack, Andrew J. Link, David Schieltz, Jimmy Eng and Lara Hays

Department of Molecular Biotechnology, School of Medicine, University of Washington, Seattle, WA 98195-7730, USA. E-mail: jyates@u.washington.edu

An overview is provided of methods for the study of complex biological processes by using micro-column liquid chromatography-electrospray ionization tandem mass spectrometry. Procedures discussed include electrospray ionization, micro-column liquid chromatography, tandem mass spectrometry, tandem mass spectra data interpretation for peptides, and database searching with mass spectral data. Several problems in immunology are discussed to illustrate this approach.

Keywords: Tandem mass spectrometry; database; peptide; protein; electrospray; review

Introduction

Recent advances in MS have created new technological capabilities with application to the study of peptides and proteins involved in biological processes. These advances have allowed new approaches to be created for the study of complex biological processes such as antigen presentation in cells of the immune system. The study of these processes by MS is greatly facilitated by the information produced through whole genome sequence analysis and large-scale DNA sequencing. Complete genomic analysis of organisms is creating sequence infrastructures that make the study of biochemical and physiological processes more straightforward. In the near future the sequence for the human genome2 and for several complex model organisms (C. elegans, D. melanogaster, S. cerevisiae, E. coli, etc.) will be available.
In fact, a large number of partial gene sequences (ESTs) are already available.3-6 Complete genome sequences for the bacteria Haemophilus influenzae and Mycoplasma genitalium have recently been completed.7,8 These model organisms will serve as experimental systems to study prokaryotic and eukaryotic cell biology, cell differentiation and developmental biology. The information derived from genome sequences will not alter the need for protein biochemistry and analysis, but will shift the emphasis from the arduous task of primary sequence determination to identifying the functions, structure and regulation of each gene product. Whole genome sequencing efforts will have a major impact on our understanding of biology and physiology and will increase the importance of protein analytical chemistry.

Protein Structure

The synthesis of proteins in organisms starts by transcription of the DNA sequences to messenger RNA (mRNA). In eukaryotic organisms mRNA is processed to remove introns and is then translated on the ribosomal complex to synthesize the protein. Proteins consist of various numbers of some 20 different amino acids covalently linked by amide bonds to form linear heteropolymers. The polypeptide backbone has a repeating mass of 56 u, with the side chain of each amino acid contributing a mass of from 1 to 130 u (Table 1). A variety of molecular entities constitute the side chain groups and each contributes to the rich chemical diversity inherent in protein structure. After translation, proteins can undergo a wide variety of modifications ranging from proteolytic processing to covalent modification before becoming functional.9 An active enzyme may consist of multiple sub-units derived from different genes or be derived from one gene with subsequent processing into two or more protein sub-units. Key elements of control and regulation of biological processes are post-translational covalent modifications.
Which amino acid residues in a protein sequence are actually modified is not directly encoded in the genome. Furthermore, it is estimated that at least 1000 kinases exist in the human genome, indicating that phosphorylation, one form of covalent modification, is a common mechanism for the transmission of signals and control of enzyme activation.10

In a biological study the goal is to identify the functional elements involved in a process. In the near future this will involve correlating the sequence of proteins observed in a process to genomic sequence information contained in the databases. Following identification, the covalent modifications involved in regulation of the pathway would be determined. Mass spectrometry is poised to play a large role in both the identification of gene products and covalent modifications.

Quadrupole MS

One of the most common types of mass spectrometer is based on the quadrupole mass filter. Mass separation is achieved by establishing an electric field in which ions of a certain mass-to-charge ratio (m/z) have stable trajectories through the field. The electric fields are created by placing a dc voltage and an oscillating voltage (ac voltage at rf frequencies) on the four metal quadrupole rods, with adjacent rods having opposite polarity. Ions introduced into the mass analyser spiral down the centre of the quadrupoles. By increasing the magnitude of the dc and rf voltages while maintaining an appropriate dc-to-rf ratio, stable trajectories are created for ions of increasing m/z. Mass resolution is dependent on the number of rf cycles an ion spends in the field. The more cycles an ion undergoes, however, the lower the ion transmission and the greater the loss of ions at the selected m/z. The mass filtering effect of quadrupoles can be viewed as a separation process. By coupling quadrupole mass filters together, a powerful approach for structural analysis can be created.
By placing a reaction region such as a collision cell filled with a neutral, inert gas between the two quadrupoles, ions can be dissociated to obtain structural information. Typically, an enclosed quadrupole mass filter operated with a dc voltage on the rods functions as a collision cell. Under these conditions all ions above a set mass value are focused through the collision cell. By raising the gas pressure to a level that permits multiple collisions in the Elab range of 10-40 eV, ions undergo many low-energy collisions in a short time-frame and become sufficiently activated to fragment. The principal benefit of a quadrupole collision cell is the ability to re-focus ions scattered by collision with the neutral gas. The m/z values of dissociation products are then measured in the second mass analyser. Collision-induced dissociation (CID) experiments allow the structure of ions to be determined by fragmenting bonds within the ion.

Table 1 Single and three-letter designations are given for amino acid residues along with the average chemical and monoisotopic mass. The sites of cleavage along the peptide backbone to form b- and y-type ions are given at the bottom of the table

Amino acid               Single letter   Three-letter code   Average mass   Monoisotopic mass
Glycine                  G               Gly                  57.051         57.021
Alanine                  A               Ala                  71.078         71.037
Serine                   S               Ser                  87.078         87.032
Proline                  P               Pro                  97.116         97.052
Valine                   V               Val                  99.133         99.068
Threonine                T               Thr                 101.105        101.048
Cysteine                 C               Cys                 103.139        103.009
Leucine                  L               Leu                 113.159        113.084
Isoleucine               I               Ile                 113.159        113.084
Asparagine               N               Asn                 114.147        114.079
Aspartic acid            D               Asp                 115.089        115.027
Glutamine                Q               Gln                 128.131        128.059
Lysine                   K               Lys                 128.174        128.095
Glutamic acid            E               Glu                 129.116        129.043
Methionine               M               Met                 131.193        131.041
Histidine                H               His                 137.141        137.059
Phenylalanine            F               Phe                 147.177        147.068
Arginine                 R               Arg                 156.188        156.101
Carboxymethyl cysteine   C               Cmc                 161.176        161.015
Tyrosine                 Y               Tyr                 163.176        163.063
Tryptophan               W               Trp                 186.213        186.079
For the study of peptides, the CID process cleaves primarily at amide bonds to produce sequence-specific fragmentation.11

Electrospray Ionization and Micro-column LC-MS

Mass spectrometers measure m/z values of ions. For this process to occur molecules must be ionized and in the gas phase. The creation of gas phase ions from polar or charged solution phase molecules such as peptides and proteins requires overcoming significant energy barriers without fragmentation or pyrolysis of the ions. A significant advance in MS has been the development of atmospheric pressure ionization techniques (API; e.g., electrospray, ion spray, pneumatically assisted electrospray) to create gas phase ions from biomolecules. More importantly, the work of Wong et al.12 showed that the ions produced in the electrospray process were multiply charged. A practical consequence of multiple charging is the ability to measure the m/z values of large ions using mass spectrometers of limited mass range, making the technique well suited for the analysis of peptides and proteins. Studies employing peptides and proteins sparked immediate interest in the technique, leading to the development of new approaches for the analysis of biomolecules. Excellent reviews on the subject of electrospray ionization and its use for the analysis of peptides and proteins have appeared.13-16

Electrospray ionization results when a potential is applied to a liquid flowing from a sample capillary. Shown in Fig. 1 is a generalized apparatus for electrospray ionization. As the liquid leaves the sample capillary, a fine mist of small droplets containing the analyte is created. The droplets migrate towards an opening in the mass spectrometer where they must be desolvated to produce ions for mass analysis. This desolvation process usually requires some form of heat or energy. In Fig. 1 the entrance to the mass spectrometer is a heated capillary that desolvates ions as they traverse the length of the capillary (≈10 cm).
As the ions exit the heated capillary into a lower pressure region they are focused by a tube lens to a skimmer and then into the mass analyser.

Fig. 1 Generalized apparatus for electrospray ionization. A fused silica micro-column or transfer line is inserted into the central tube to deliver the analyte to the tip of the apparatus. Sheath liquid, when required, flows around the fused silica column. The sheath gas flows concentrically around the exit tube and can be used to stabilize and direct the electrospray.

Refinement of the electrospray process achieved the long sought after goal of a robust integration of LC with MS, making possible the use of chromatographic columns of many different diameters, and thus flow rates.17 For most studies of peptides and proteins the low flow rates produced by using smaller diameter columns (<1 mm) offer the best sensitivity of analysis. Low flow rates decrease the elution volume of the peptides and hence increase the concentration of the analyte exiting the column to the detector. Griffin and co-workers18,19 and Wahl et al.20 noted an increase in sensitivity associated with a decrease in flow rate with separations on small diameter columns. Fig. 2 shows the analysis of 10 fmol of rat cytochrome c on a 50 µm packed capillary column at a flow rate of approximately 50 nl min⁻¹. Infusion experiments can also benefit from a decrease in flow rate, but can suffer from dynamic range limitations in the acquisition of full scan mass spectra, particularly if numerous components are present in different amounts.21-24 Thus, for complex mixtures of molecules, fractionation of the mixture is desirable.

Fig. 2 A, Reconstructed total ion chromatogram showing the ion current intensity of the most abundant ion at each scan for reversed-phase, packed, micro-column chromatography with a 50 µm × 15 cm column. The peak marked protein in the figure represents the signal obtained from 10 fmol of cytochrome c. B, Mass spectrum for the 10 fmol of cytochrome c. Inset is the mass calculated by deconvolution of the multiply charged mass spectrum.

In order to achieve the low flow rates that produce the best sensitivity of analysis, micro-column LC offers a favourable trade-off between performance, flow rate and ease of column construction. Micro-columns can be easily prepared from fused-silica capillary tubing in the id range of 50-320 µm. Methods for the construction of micro-columns are described by Shelley et al.25 and Kennedy and Jorgenson.26 Interfacing micro-column LC to electrospray ionization is straightforward. The column is directly inserted into the metal electrospray needle (Fig. 1). For low flow rate chromatography a sheath liquid is used to stabilize the electrospray over the course of the LC solvent gradient and to make electrical contact with the liquid exiting the non-conducting fused-silica column. A voltage of 2-4.5 kV is placed on the electrospray needle. A lower voltage is required for lower flow rates, and the needle is positioned closer (0.5-1.5 cm) to the entrance to the mass spectrometer, which also serves as a ground. A sheath gas is also used to stabilize both the electrospray and the formation of ions, producing a stable ion current in the mass spectrometer. Electrospray ionization is compatible with 0.1% trifluoroacetate buffers, the most commonly used reversed-phase HPLC gradient, but better sensitivities are observed with peptides by utilizing solvents containing 0.5% acetic acid. Typically, peptides are eluted from the reversed-phase column by using a linear gradient of 5 to 90% solvent B (80 + 20 acetonitrile-0.5% acetic acid) over a 30-50 min period.
In order to achieve a flow rate suitable for micro-column LC, the solvent from the HPLC pumps is split pre-column to produce a final flow rate through the column of 0.3-2 µl min⁻¹. When working with small bore columns and small amounts of peptide, conventional sample injection loops can lead to unacceptable losses. In order to circumvent this problem, samples can be pneumatically injected directly onto the column by using a high-pressure device (Fig. 3). By collecting the effluent displaced from the column with a 5 µl graduated glass capillary, the amount of solution injected onto the column is measured. These methods for HPLC are relatively generic and provide a basis for tailoring methods to specific biological problems.

Fig. 3 High-pressure apparatus for packing micro-columns. The progress of packing can be observed by placing the column under the microscope during the packing process.

For example, some biological studies may require collection of a portion of material separated on the HPLC column for other analyses apart from MS. A method developed by Cox et al.27 allows the correlation of peptide m/z values with activity in biological assays during low flow rate micro-column LC-MS. As shown in Fig. 4, a microsplitter is used to divert nanolitre volumes of the column effluent to a microtitre plate and the remaining sample to the mass spectrometer. A balance between the sensitivity of the biological assay and the mass spectrometer is achieved by adjusting the split ratio. By depositing aliquots from the split into wells at specific time intervals the biological activity can be correlated with m/z value. In the experiments of Cox et al.
the ability of the peptides present in each well to cause lysis of T-2 cells by cytotoxic T-lymphocytes was determined by measuring the release of 51Cr. By calibrating the time difference between when a peptide appears in the microtitre plate and when it enters the mass spectrometer, accurate correlation of m/z values with activity can be achieved. This method has been effective for the identification of peptides bound to class I major histocompatibility molecules (MHC) with specific antigenic activity such as tumour antigens.27

LC-Tandem Mass Spectrometry

Tandem mass spectrometry (MS-MS) permits the selection of ions for CID experiments to generate structurally important fragment ions. An approach based on this method has been developed for the sequence analysis of peptides. In conjunction with LC, mixtures of molecules can be separated and then one or many components characterized on-line by MS-MS.28 Two approaches are typically used to acquire tandem mass spectra of peptides. The first involves pre-measurement of the m/z values for all the components in a mixture. These data are then analysed to identify candidates for MS-MS. A list is made of these values and the order in which they elute from the chromatographic column. The instrument is then configured to select each m/z value in the first mass analyser as it comes off the column. After a signal has been recorded for a particular m/z value, the MS-MS parameters are changed for the next ion. The first disadvantage of this approach is that the ions are expected in some order; if this changes or an ion fails to elute, this may confuse the order and all other ions could be missed. Secondly, the ability to acquire tandem mass spectra for large numbers of m/z values is limited by the ability to change the MS-MS parameters manually. If changes in parameters depend on the operator entering these into the computer during the analysis, this limits the ability to acquire tandem mass spectra on closely eluting peaks.
A second method, made possible by computer control of the instrument, allows both MS and MS-MS data to be acquired in a single analysis.

Automated LC-MS-MS

Mass spectrometers can be operated through instrument control languages, similar in concept to programming languages, that allow precise and rapid control of the instrument. By combining computer control with feedback from data acquisition, commands can be combined in a computer program to control the acquisition of data and the response of the instrument to various conditions.29-31 This improves the flexibility and efficiency of data acquisition, allowing, for example, large numbers of tandem mass spectra to be acquired. A typical computer program for the acquisition of tandem mass spectra would perform the following.

Step 1: a scan of mass analyser two to record the m/z values of all ions.
Step 2: search the mass spectrum for an m/z value using some criteria (e.g., the most abundant ion, second most abundant ion, m/z value present in a pre-defined list, m/z value not present in a pre-defined list, etc.).
Step 3: MS-MS conditions for acquisition are set based on the m/z value selected in step 2 (collision energy, mass range of second mass analyser, etc.). The instrument switches to the precursor ion MS-MS mode of scanning, setting the m/z value to pass through the first mass analyser and calculating the collision energy. Any other tune parameters that need to be changed would be changed at this point.
Step 4: four or five product ion MS-MS scans are acquired.
Step 5: the instrument is reset to scan the second mass analyser to record m/z values.

A tandem mass spectrum will be acquired whenever an ion meeting the criteria of step 2 is identified and is above a preset threshold. The collision cell is filled with gas during the entire analysis to minimize the time needed to fill and evacuate the cell when the instrument switches between modes.
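The five acquisition steps can be sketched in a few lines of Python. This is only an illustration over already-recorded survey scans, not an instrument control program: the function names, the peak representation, the exclusion of previously fragmented ions and, in particular, the linear collision-energy rule are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_precursor(scan: Iterable[Peak], threshold: float,
                     exclude: Iterable[int] = ()) -> Optional[float]:
    """Step 2: m/z of the most abundant ion above the threshold,
    skipping ions whose nominal m/z is in the exclusion list."""
    excluded = set(exclude)
    best = None
    for mz, intensity in scan:
        if intensity < threshold or round(mz) in excluded:
            continue
        if best is None or intensity > best[1]:
            best = (mz, intensity)
    return None if best is None else best[0]


def msms_settings(precursor_mz: float) -> Dict[str, float]:
    """Step 3: derive MS-MS conditions from the selected m/z.
    The numbers below are placeholders, not instrument values."""
    return {
        "q1_mz": precursor_mz,
        "collision_energy_ev": 10.0 + 0.02 * precursor_mz,  # hypothetical rule
        "q3_scan_low": 50.0,
        "q3_scan_high": 2.0 * precursor_mz,
    }


def plan_acquisition(scans: List[List[Peak]], threshold: float,
                     n_product_scans: int = 5):
    """Steps 1-5 applied to a list of survey scans; returns one action per scan."""
    actions = []
    already_fragmented = set()
    for index, scan in enumerate(scans):          # step 1: survey scan
        mz = select_precursor(scan, threshold, already_fragmented)
        if mz is None:
            actions.append((index, "survey only", None))
            continue
        settings = msms_settings(mz)              # step 3
        actions.append((index, f"{n_product_scans} product ion scans", settings))
        already_fragmented.add(round(mz))         # step 5: back to survey mode
    return actions
```

Passing a pre-defined list of nominal m/z values as `exclude` corresponds to the "m/z value not present in a pre-defined list" criterion of step 2.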
This results in a slight decrease in the ion current recorded for peptides transmitted through the collision cell, but does not result in any undue fragmentation since the collision energy is low. This method allows MS-MS data to be acquired at a faster rate and permits more efficient analysis of complex mixtures.

Fig. 4 Micro-column HPLC using a micro-splitter to correlate biological activity with m/z information. A blow up of the conceptual depiction of a micro-splitter to transfer a portion of the material to a 96-well microtitre plate is shown in the inset.

Fragmentation of Peptides in Low Energy CID Processes

CID of peptides using a triple quadrupole mass spectrometer has been studied extensively.11 Under multiple, low energy collision conditions (10-50 eV), peptides fragment primarily at the amide bonds to produce a ladder of sequence ions.11 Depending on the gas-phase basicity of the amino acids within the sequence, the charge can be retained on the amino terminus of the ion to form an acylium ion (type b-ion, NH2-CHR1-CO...NH-CHRn-CO+) or, by H-rearrangement, on the carboxy terminus (type y-ion, NH2-CHRn-CO...NH-CHR1-CO2H + H+) of the ion (Table 1). The value of R depends on the amino acid and ranges from 1 to 130 u for Gly and Trp, respectively. A complete series of one type of fragment ion, or a complete series made of ions from both ion types, allows the amino acid sequence to be determined by subtraction of the masses of adjacent sequence ions. The major fragment ions likely to be produced in the collision activation process can be predicted for a known amino acid sequence.32-34

Data Interpretation

Peptides created by trypsin proteolysis and ionized by electrospray generally form ions that are doubly charged. This stems from the presence of basic sites at the N-terminus, the α-amino group, and the basic amino acids at the C-terminus, Lys or Arg.
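Predicting the b- and y-type ions for a known sequence is a running sum over residue masses. The sketch below assumes singly charged fragments and uses monoisotopic masses for a subset of the residues in Table 1; the function names and the rounded proton and water masses are choices made for this example.

```python
# Monoisotopic residue masses (u) for a subset of the residues in Table 1.
MONO = {
    "G": 57.021, "A": 71.037, "S": 87.032, "P": 97.052, "V": 99.068,
    "T": 101.048, "L": 113.084, "I": 113.084, "D": 115.027, "K": 128.095,
    "E": 129.043, "F": 147.068, "R": 156.101, "Y": 163.063,
}
PROTON = 1.007   # mass of the charge-carrying proton (rounded, assumed)
WATER = 18.011   # H2O carried by the C-terminal fragment (rounded, assumed)


def b_ions(sequence):
    """Singly charged b-ion m/z values, b1 .. b(n-1), read from the N-terminus."""
    total, ions = PROTON, []
    for residue in sequence[:-1]:
        total += MONO[residue]
        ions.append(round(total, 3))
    return ions


def y_ions(sequence):
    """Singly charged y-ion m/z values, y1 .. y(n-1), read from the C-terminus."""
    total, ions = WATER + PROTON, []
    for residue in reversed(sequence[1:]):
        total += MONO[residue]
        ions.append(round(total, 3))
    return ions


def mh_plus(sequence):
    """(M + H)+ of the intact peptide."""
    return round(sum(MONO[r] for r in sequence) + WATER + PROTON, 3)


if __name__ == "__main__":
    peptide = "TLAR"  # illustrative short tryptic-style peptide
    print("b:", b_ions(peptide))
    print("y:", y_ions(peptide))
    print("(M+H)+:", mh_plus(peptide))
```

With these masses the y1 ion of a peptide ending in Lys or Arg comes out near m/z 147 or 175, and each b-ion and its complementary y-ion sum to (M + H)+ plus one proton.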
Most of the fragment ions produced by CID of doubly charged ions are singly charged. In general, a ladder of sequence ions is produced where the difference between consecutive ions indicates the mass of the amino acid at that position in the sequence. However, a variety of ion types can be produced in the fragmentation process, such that ions that retain charge on the C-terminus represent a different portion of the amino acid sequence than ions which retain the charge on the N-terminus. Neutral losses (-H2O, -NH3) from fragment ions result in lower intensity ions that accompany the major fragment ions. Successful interpretation involves determining which ions originate from the N- or C-terminus so that mass differences between consecutive ions of the same type can be calculated. Thus, a set of sequence ions, from low to high mass, will define the amino acid sequence from the C- to the N-terminus.

As an example of the approach used to obtain sequence information using MS-MS, interpretation of a tandem mass spectrum obtained from a tryptic digest of the immunoaffinity isolated Ras protein from S. cerevisiae will be illustrated. Fig. 5 shows the tandem mass spectrum for the peptide and the sequence deduced from the spectrum. The precursor ion at m/z 813.9 is assumed to be a doubly charged ion because the peptide is derived from a tryptic digest and because a fragment ion, (M + 2H - H2O)2+, is observed in the spectrum 9 u below the precursor ion. On the basis of this information the (M + H)+ value is calculated to be 1626.8. MS-MS analyses of peptides derived from a tryptic digest of proteins generally present a prominent y-type ion series in the high mass end of the spectrum and Lys or Arg as the C-terminal amino acid. These amino acids can be recognized by y1-type ions at m/z 147 or 175. An ion at m/z 175 is observed in this tandem mass spectrum.
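The arithmetic behind the charge-state assignment can be written out explicitly. The block below is a worked restatement using the values quoted in the text, with the proton mass rounded to 1.0 u and water to 18 u; the symbols z and m_H are introduced here for the charge and the proton mass.

```latex
\begin{align*}
(M+H)^{+} &= z\,(m/z) - (z-1)\,m_{\mathrm{H}}
           = 2 \times 813.9 - 1.0 = 1626.8 \\[4pt]
\Delta(m/z)_{-\mathrm{H_2O}} &= \frac{18}{z} = \frac{18}{2} = 9
\end{align*}
```

A water loss that shows up 9 u, rather than 18 u, below the precursor is therefore consistent with a charge of two.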
Subtraction of 174 from the (M + H)+ ion, 1626.8, leaves a value of 1452.8. A very weak ion is observed at this m/z value and this constitutes the highest b-ion value. The adjacent m/z value in the spectrum is 1426. The mass difference between this ion and the ion at m/z 1452 is too small (Δm = 26 u) to be associated with the b-ion series. A logical assumption is that the ion at m/z 1426 derives from the y-ion series. By calculating Δm for all the abundant m/z values in the high mass end of the spectrum, a sequence of Lxx-Asn-Val can be determined. Two ions are observed at m/z 986 and 971 and both have mass differences corresponding to amino acids (Asn and Glu, respectively) when subtracted from the ion at m/z 1100. The correct choice can be determined by considering extension of both sequences to the next possible sequence ion, the ion at m/z 843 (Fig. 6A). Subtraction of 843 from 986 produces a difference of 143, which matches no amino acid residue mass (see Table 1), while the difference between 843 and 971 corresponds to Glu (Δ 129). A sequence of Lxx-Asn-Val-Glu-Glu can now be deduced. The next fragment ion at m/z 771 gives an amino acid Ala. Another cluster of ions exists at m/z 624, 639 and 652 (Fig. 6B). Subtraction of each of these ions from the ion at m/z 771 produces values of 147, 132 and 119. A difference of 147 corresponds to Phe and extends the sequence to Lxx-Asn-Val-Glu-Glu-Ala-Phe. In this low m/z region of the tandem mass spectrum the number of ions present increases and the clarity of the sequence ion series can be clouded. The next likely sequence ion exists in a set of ions beginning at m/z 525 and ending at m/z 460. Subtraction of the ions at m/z 525, 510 and 460 from 624 produces mass differences of 99, 114 and 163 (Fig. 6B). All of these differences correspond to masses of amino acids. In order to determine the correct sequence, all of these possibilities should be considered and carried through until a sequence is found that matches the observed relative molecular mass of the peptide.
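The repeated step in this walk-through, turning the difference between two sequence ions into candidate residues, is a table lookup. The sketch below uses average residue masses rounded to one decimal place and a tolerance of 0.5 u; both the tolerance and the function name are assumptions for illustration, and Lxx stands for Leu or Ile as in the text.

```python
# Average residue masses (u), rounded to one decimal place.
AVERAGE = {
    "Gly": 57.1, "Ala": 71.1, "Ser": 87.1, "Pro": 97.1, "Val": 99.1,
    "Thr": 101.1, "Cys": 103.1, "Lxx": 113.2, "Asn": 114.1, "Asp": 115.1,
    "Gln": 128.1, "Lys": 128.2, "Glu": 129.1, "Met": 131.2, "His": 137.1,
    "Phe": 147.2, "Arg": 156.2, "Tyr": 163.2, "Trp": 186.2,
}


def residues_for_difference(high_mz, low_mz, tolerance=0.5):
    """Residues whose mass matches the gap between two singly charged ions."""
    delta = high_mz - low_mz
    return [name for name, mass in AVERAGE.items()
            if abs(mass - delta) <= tolerance]


if __name__ == "__main__":
    # Nominal m/z values quoted in the text.
    for high, low in [(1100, 986), (1100, 971), (986, 843), (771, 624)]:
        print(high, "-", low, "->", residues_for_difference(high, low))
```

An empty result, as for 986 and 843, is what rules out a branch of the ladder. With nominal m/z values and a tight tolerance some true matches can be missed, so in practice the measured m/z values to one decimal place would be used.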
In order to simplify the process, the ion at m/z 460, corresponding to Tyr, will be used to extend the sequence. The next ion must exist in a window of m/z values from 403 to 274 [differences for Gly (57) to Trp (186)]. In this window there are numerous possibilities and each should be considered. The ion at m/z 359 corresponds to a difference of 101, or Thr, and leaves a difference of 184 when the C-terminal fragment ion of 175 is subtracted. This corresponds to several amino acid combinations, and ions for the sequence Lxx-Ala are observed in the tandem mass spectrum. The sequence of Lxx-Asn-Val-Glu-Glu-Ala-Phe-Tyr-Thr-Leu-Ala-Arg has an (M + H)+ value of 1426 and this leaves a difference of 200 u from the observed (M + H)+ value. Of the amino acid possibilities that add to 200, none can be unambiguously assigned to ions remaining in the spectrum.

Fig. 5 Collision-induced dissociation mass spectrum recorded on the (M + 2H)2+ ions at m/z 813.9 of a peptide derived from the Ras protein of S. cerevisiae. Fragment ions of type b and y, having the general formulae H(NHCHRCO)n+ and H2(NHCHRCO)nOH+, respectively, are shown above and below the amino acid sequence (Gln-Ala-Ile-Asn-Val-Glu-Glu-Ala-Phe-Tyr-Thr-Leu-Ala-Arg).

Several methods are often used to finish or confirm the unknown sequence of a peptide.11,35,36 One technique is to derivatize the peptide and obtain a tandem mass spectrum of the derivatized peptide. For a given amino acid sequence, the shift in m/z value for the precursor ion as well as the fragment ions should be predictable.
In addition, a shift in the mass of an ion or a series of ions due to derivatization can provide information about whether an ion or ions originate from the N- or C-terminus. This can assist in the interpretation process since the m/z values should shift in a predictable manner if the amino acid assignments are correct. Two convenient methods are esterification of carboxylic acids and N-acetylation of amines. Methyl esterification adds 14 u to every carboxylic acid in the peptide. If there are no acidic amino acids in the sequence the peptide m/z value should shift by 14 u, corresponding to esterification of the C-terminal carboxylic acid. Because of this shift in mass at the C-terminus all the y-type ions will also shift by 14 u. Additional 14 u increases in mass indicate the presence of the amino acids Glu, Asp or S-carboxymethyl Cys. Acetylation of an unblocked N-terminus shifts the m/z values of the peptide and the b-ion series by 42 u. Any additional 42 u increments indicate the presence of Lys in the peptide.

Fig. 6 A, Expansion of the m/z region from 750-1500 of the collision-induced dissociation mass spectrum recorded on the (M + 2H)2+ ions at m/z 813.9 of a peptide derived from the Ras protein of S. cerevisiae. B, Expansion of the m/z region from 75-675. The amino acids represented by mass differences between ions are shown in the figure. The correct series of amino acids are shown in bold face type.

If the sequence assignments still contain ambiguities, a cycle of subtractive Edman degradation can be performed to determine the N-terminal amino acid residue. String searches or basic local alignment searches of proteins in databases can also assist in the interpretation of tandem mass spectra by aligning the sequence with known sequences.
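The derivatization rules lend themselves to a quick prediction of the expected shifts for a proposed sequence. The sketch below assumes singly charged ions, a free C-terminus and an unblocked N-terminus, and it does not model S-carboxymethyl Cys; the function names are made up for this example.

```python
ACIDIC = "DE"  # Asp and Glu; S-carboxymethyl Cys is not modelled here


def methyl_ester_shift(sequence):
    """Mass added to the peptide: 14 u for the C-terminus plus 14 u per Asp/Glu."""
    return 14 * (1 + sum(sequence.count(a) for a in ACIDIC))


def acetylation_shift(sequence):
    """Mass added to the peptide: 42 u for the N-terminus plus 42 u per Lys."""
    return 42 * (1 + sequence.count("K"))


def y_ion_ester_shifts(sequence):
    """Expected shift of y1 .. y(n-1) after methyl esterification."""
    shifts = []
    for i in range(1, len(sequence)):
        tail = sequence[-i:]  # residues contained in the y-ion
        shifts.append(14 * (1 + sum(tail.count(a) for a in ACIDIC)))
    return shifts


if __name__ == "__main__":
    peptide = "QAINVEEAFYTLAR"  # sequence shown with Fig. 5
    print(methyl_ester_shift(peptide))
    print(acetylation_shift(peptide))
    print(y_ion_ester_shifts(peptide))
```

The point at which the y-ion shift steps from 14 to 28 to 42 u marks where the acidic residues sit in the ladder, which is the kind of check the text describes for confirming an assignment.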
The missing sequence can sometimes be assigned by correspondence to a similar sequence if it is conserved in the missing region. Alternatively, if the peptide possesses biological activity, the different possibilities can be synthesized and the activity tested. The last two amino acids in this peptide were assigned by correspondence to the sequence of the Ras protein from S. cerevisiae.

Analysis of MS Data by Using Known Sequences

Sequencing of the human genome and the genomes of various organisms is proceeding at a rapid rate.37-42 As the collection of data continues to grow it is increasingly important to screen new data through the database to prevent duplication of sequence analysis efforts. In addition, the sequence of a protein may have been previously determined, but the experimental context in which the information is re-discovered may be relevant to the biological process under study. The growing sequence infrastructure also provides a convenient resource to analyse or interpret MS data. Two different types of MS information can be correlated to sequence databases to assist in protein identification and spectral interpretation.

Correlation by Peptide Mass Maps From Proteins

The DNA sequencing methods of Sanger et al.43 and Maxam and Gilbert44 quickly supplanted protein sequencing as the principal method for the determination of protein sequences. By sequencing the gene for a protein and translating the nucleotide sequence, a putative sequence for a protein can be obtained. Frequently, the gene product undergoes processing and modification before becoming fully active. Early efforts at gene sequencing were often accompanied by characterization of the protein, usually by MS.45,46 By digesting the protein with site-specific enzymes, the calculated masses of the predicted products from the gene sequence can be compared with the observed masses.45,46 Modified peptides and sequence or translation errors can be located with this method.
This method of peptide mass mapping was later extended to create an approach for the identification of 'unknown' proteins.47-5? Mass values obtained from a digested 'unknown' protein can be used to find other protein sequences that would produce the same set of masses under the same digestion conditions. Matching the peptide map from a protein of 'unknown' amino acid sequence to a known protein sequence can indicate a high probability of protein identification. Computer programs designed to perform searches of databases using peptide mass data generated by electrospray ionization MS and matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOFMS) have been described.48-52 Accurate protein identifications can be made using mass tolerances as large as 5 u and as little as 7% of the total protein mass.48 The sensitivity of this technique is sufficient to distinguish among members of highly similar protein families, such as cytochrome c, where small differences in amino acid sequences exist.48 The peptide (M + H)+ values observed for cytochrome c proteins derived from different species are shown in Table 2. Inspection of the values shows the diversity of these values even though these proteins have highly similar sequences. The presence of post-translational modifications changes the mass of only those peptides containing modifications. Henzel et al.49 combined peptide mapping and database searching for the identification of proteins isolated from two-dimensional gel electrophoresis. If no match or a poor match is found, additional sequencing experiments must be performed to characterize the peptides more fully.
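A peptide mass map search can be sketched as an in-silico digest followed by a count of matching masses. The example below is a minimal illustration, not any of the published programs: it cleaves after every Lys or Arg (no proline rule, no missed cleavages), uses rounded average residue masses, and searches a two-entry toy database whose names and sequences are hypothetical.

```python
# Rounded average residue masses (u).
AVERAGE = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13, "T": 101.10,
    "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10, "D": 115.09,
    "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19, "H": 137.14,
    "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER, PROTON = 18.015, 1.008


def tryptic_peptides(protein):
    """Cleave after every K or R (simplified trypsin specificity)."""
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def mh_average(peptide):
    """Average-mass (M + H)+ of a peptide."""
    return sum(AVERAGE[r] for r in peptide) + WATER + PROTON


def count_matches(observed, protein, tolerance=1.0):
    """Number of observed masses explained by the protein's tryptic peptides."""
    predicted = [mh_average(p) for p in tryptic_peptides(protein)]
    return sum(1 for mass in observed
               if any(abs(mass - p) <= tolerance for p in predicted))


def rank_database(observed, database, tolerance=1.0):
    """Database entries sorted by the number of matching peptide masses."""
    scores = {name: count_matches(observed, seq, tolerance)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_db = {"toy_A": "GDVEKGKKIFVQK", "toy_B": "GDVEKGKKIFIMK"}  # hypothetical
    print(rank_database([547.6, 634.8], toy_db))
```

The two toy entries differ in a single tryptic peptide, which is enough to separate them, mirroring the point made about highly similar protein families.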
This approach is restricted to the digestion products of a fairly homogeneous protein sample and cannot be readily used with the mass generated by a single peptide without additional sequence information.

Correlation With MS-MS Data From Peptides

Just as a mass map produces a fingerprint for the amino acid sequence of a protein, a tandem mass spectrum produces a highly specific representation of a peptide’s amino acid sequence.31,53,54 As observed above, the complexity of a tandem mass spectrum can make interpretation a time-consuming process. Eng et al.53 have developed an approach to search sequence databases directly by using tandem mass spectra of peptides. This method is based on the predictable nature of peptide fragmentation under CID conditions and takes the form of a pseudo-mass spectral library search (Fig. 7). A search can be performed by considering all sequences, or just sequences defined by other constraints such as the proteolytic specificity of an enzyme, or a partial amino acid sequence or composition. The fragment ions for sequences fitting the above criteria are then predicted and compared with those present in the tandem mass spectrum. A preliminary score is generated for each sequence and the top 500 candidate peptide sequences are ranked and stored. A final analysis of the 500 amino acid sequences is performed using a correlation function. Using this function, a reconstructed tandem mass spectrum for each candidate amino acid sequence is compared with the modified experimental tandem mass spectrum. The final results are ranked on the basis of a normalized correlation coefficient. No interaction is required with this software; thus analyses can be fully automated.
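The predict-and-compare idea can be illustrated with a short Python sketch. It is a simplified stand-in and not the scoring of Eng et al.: singly charged b- and y-type ions are predicted from average residue masses, both spectra are reduced to unit-width m/z bins, and candidates are ranked by a plain normalized dot product rather than by the published cross-correlation function. The function names and the candidate peptides are hypothetical.

```python
import math

# Average residue masses in u (standard values, rounded to 3 decimals).
RESIDUE_MASS = {
    "G": 57.052, "A": 71.079, "S": 87.078, "P": 97.117, "V": 99.133,
    "T": 101.105, "C": 103.139, "L": 113.159, "I": 113.159, "N": 114.104,
    "D": 115.089, "Q": 128.131, "K": 128.174, "E": 129.116, "M": 131.193,
    "H": 137.141, "F": 147.177, "R": 156.188, "Y": 163.176, "W": 186.213,
}
WATER = 18.015
PROTON = 1.008


def fragment_ions(peptide):
    """Return (b_ions, y_ions): m/z values of singly charged sequence ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # N-terminal fragments b1 .. b(n-1)
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # C-terminal fragments y1 .. y(n-1)
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def binned(peaks, width=1.0):
    """Reduce (m/z, intensity) pairs to {bin index: largest intensity in bin}."""
    bins = {}
    for mz, intensity in peaks:
        key = int(round(mz / width))
        bins[key] = max(bins.get(key, 0.0), intensity)
    return bins


def normalized_correlation(a, b):
    """Normalized dot product of two binned spectra (1.0 means identical shape)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(spectrum, candidates):
    """Score each candidate sequence against the observed spectrum, best first."""
    observed = binned(spectrum)
    scored = []
    for peptide in candidates:
        b_ions, y_ions = fragment_ions(peptide)
        predicted = binned([(mz, 1.0) for mz in b_ions + y_ions])
        scored.append((normalized_correlation(observed, predicted), peptide))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Simulated spectrum of a hypothetical peptide, all peaks at unit intensity.
    b_ions, y_ions = fragment_ions("SAMPLER")
    spectrum = [(mz, 1.0) for mz in b_ions + y_ions]
    for score, peptide in rank_candidates(spectrum, ["PEPTIDE", "SAMPELR", "SAMPLER"]):
        print(f"{peptide}  {score:.3f}")
```

In practice the candidate list would come from the database, filtered by precursor mass and any enzyme constraint, before this kind of comparison is applied.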
A search of a database with roughly 100 000 protein sequences requires approximately 2-3 min per spectrum on a DecStation 3000/900 computer.

Table 2 The (M + H)+ values predicted for trypsin digestion of cytochrome c proteins from 12 different species. The numbers in the table headings indicate the residue numbers for each of the tryptic peptides
Species 9-13 14-22 28-38 40-53 56-72 92-99
Arabian camel, Chimpanzee, Gray whale, Hippopotamus, Honey bee, Pacific lamprey, House mouse, Ostrich, Domestic rabbit, Spider monkey, Skipjack tuna, Dog
634 1019 1169 1459 2011 651 1035 1169 1429 2009 634 1019 1169 1457 2011 634 1019 1169 1459 2011 634 1019 1169 1473 2011 634 1234 1195 1473 1383 620 1035 1119 1457 2051 634 1019 1169 1431 1997 634 1035 1147 1489 1997 634 1019 1169 1459 1997 651 1035 1169 1475 2009 622 1247 1216 1505 2051
907 808 907 907 907 322 836 907 907 907 907 950

De novo Computer Interpretation of Low Energy CID Spectra

The enormous throughput of tandem mass spectrometers can create a huge data interpretation backlog. By searching databases with tandem mass spectra these data can be compared with all known sequences and if the sequence is found no further effort is required. If a similar sequence exists in the database, then that sequence could be used as a guide for interpretation. Those tandem mass spectra that do not match can be interrogated manually or with de novo computer interpretation algorithms. Many of the de novo interpretation algorithms utilize a combinatorial approach to analyse the fragmentation patterns contained in a tandem mass spectrum. The first algorithms designed for de novo interpretation were written to analyse fast atom bombardment mass spectral data.55 These mass spectra usually contain low abundance fragment ions. An approach considering all possible sequences that match the fragmentation pattern was used. The total number of possibilities is limited by stepping through the spectra one or two amino acids at a time.
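A stepwise, pruned enumeration of this kind can be sketched as follows. The code is an illustration under stated assumptions rather than any of the published algorithms: partial sequences are grown from the C-terminus one residue at a time, each extension is scored by the summed intensity of the matching y-type ion and its complementary b-type ion, extensions with a zero gain are discarded, and a sequence is accepted once its mass reaches the measured (M + H)+ value. Leu and Ile are merged because they have the same mass; all names, the tolerance and the `keep` limit are hypothetical choices.

```python
# Average residue masses in u; Ile is omitted because it is isobaric with Leu.
RESIDUE_MASS = {
    "G": 57.052, "A": 71.079, "S": 87.078, "P": 97.117, "V": 99.133,
    "T": 101.105, "C": 103.139, "L": 113.159, "N": 114.104, "D": 115.089,
    "Q": 128.131, "K": 128.174, "E": 129.116, "M": 131.193, "H": 137.141,
    "F": 147.177, "R": 156.188, "Y": 163.176, "W": 186.213,
}
WATER = 18.015
PROTON = 1.008


def ions(peptide):
    """b- and y-type m/z values of a known peptide (used to simulate a spectrum)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def intensity_at(spectrum, mz, tol):
    """Largest intensity among peaks within tol of mz, or 0.0 if none."""
    return max((inten for m, inten in spectrum if abs(m - mz) <= tol), default=0.0)


def de_novo(spectrum, precursor_mh, tol=0.3, keep=50):
    """Return (score, sequence) pairs whose mass matches precursor_mh, best first."""
    partial = [("", 0.0, 0.0)]   # (C-terminal partial sequence, residue mass sum, score)
    finished = []
    while partial:
        extended = []
        for suffix, mass, score in partial:
            for aa, residue in RESIDUE_MASS.items():
                new_mass = mass + residue
                y_mz = new_mass + WATER + PROTON      # y ion of the extended suffix
                if y_mz > precursor_mh + tol:
                    continue                          # heavier than the whole peptide
                if abs(y_mz - precursor_mh) <= tol:
                    finished.append((score, aa + suffix))
                    continue
                b_mz = precursor_mh + PROTON - y_mz   # complementary b ion
                gain = (intensity_at(spectrum, y_mz, tol)
                        + intensity_at(spectrum, b_mz, tol))
                if gain > 0:                          # prune zero-score extensions
                    extended.append((aa + suffix, new_mass, score + gain))
        extended.sort(key=lambda cand: cand[2], reverse=True)
        partial = extended[:keep]
    return sorted(finished, reverse=True)


if __name__ == "__main__":
    peptide = "SAMPLER"   # hypothetical test peptide
    spectrum = [(mz, 1.0) for mz in ions(peptide)]
    precursor = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
    for score, sequence in de_novo(spectrum, precursor):
        print(sequence, score)
```

More than one sequence can survive to the end when different residue combinations have nearly the same mass, which is why a tighter mass tolerance reduces the final list.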
Starting at the C-terminus, amino acid masses are subtracted from the (M + H)+ ion to calculate the y- and b-type ions that should be present if that amino acid exists at that position in the sequence. A score is calculated based, in part, on the abundances of sequence ions corresponding to the presence of that amino acid. After the first cycle a ranked list of 20 amino acids will be obtained. After a second iteration, in which each amino acid in the list is extended by an additional amino acid, the fragment ions are calculated and the mass spectrum is searched, a list of 400 partial sequences is obtained. This list can be ranked and only those sequences with a non-zero score, or that satisfy some other cut-off criterion, are used in the next iteration. The final sequences must match the measured (M + H)+ value, and good mass accuracy helps to limit the number of final possibilities. This approach has been extended to CID spectra produced under high and low energy conditions and can be useful with high quality data.56-58

Applications of LC-MS-MS Sequencing to Mixtures of Proteins and Peptides

The ability to acquire and then analyse large numbers of tandem mass spectra rapidly by using the growing sequence databases allows the analysis of increasingly complex problems. Combining a chromatographic separation with MS creates a highly resolving analytical method for complex mixtures of molecules. This allows consecutive separations based first on chemical properties and second on m/z. Additionally, if the analysis is performed on a tandem mass spectrometer, then structural information can be obtained on components without the necessity of purifying each individually. By using automated MS-MS a large number of spectra can be efficiently acquired, providing a more complete analysis of the components in the mixture. If the mixture consists of proteolytically digested proteins, the identities of the components can be determined by comparing tandem mass spectra with sequences in the database and identifying the proteins from which the amino acid sequences originated. Furthermore, the separation and mass measurement process produces a highly specific fingerprint for the mixture of molecules. Thus, complex mixtures can be compared to identify differences.

Fig. 7 A depiction of automated tandem mass spectrometry in conjunction with computer automated searches of sequence databases. Tandem mass spectra are predicted from the sequences retrieved from the database and compared with the observed tandem mass spectrum by using a cross-correlation analysis.

As the limit of detection improves in mass spectrometers, increasingly smaller amounts of peptides and proteins can be used. Currently, the sample amounts reported for obtaining sequence information on peptides by MS-MS (5-30 fmol) are well below those reported for conventional sequencing methods such as Edman degradation (0.5-10 pmol).59 As the amount of sample is decreased, methods for manipulating the material have become inadequate, resulting in losses of sample to nonspecific interactions with plastic and metal surfaces. Rather than attempting to improve sample handling for small amounts of purified peptides or proteins, analysing mixtures enriched for a common property may increase the chances of acquiring sequence information. By directly analysing complex mixtures employing LC-MS and LC-MS-MS, the number of sample handling and manipulation steps is reduced. All sample transfers or manipulations are performed at stages where losses may be minimal because the large amount of peptide present acts as a carrier.
The examples described below illustrate applications of this approach.

Sequence Analysis of Complex Mixtures of Peptides

Sequences of peptides bound to class II MHC molecules associated with rheumatoid arthritis

Rheumatoid arthritis is a chronic inflammatory disease of the bone joints. One current model of etiology for the disease presupposes exposure of a genetically susceptible individual to an environmental stimulus that results in an unregulated autoimmune response to tissue in the joints. A genetic linkage is believed to exist since many individuals with the disease have inherited certain MHC class II DR4 alleles, notably HLA DRB1*0401 and HLA DRB1*0404. The two gene products differ by a conservative substitution of K to R at position 71, which is located in the binding groove of the MHC molecule. Although binding motifs have not been fully defined for all class II alleles, each appears to have a specific core structural motif. By sequencing the peptides naturally bound to these different alleles, information about the unique sequence motifs involved in allele-specific binding is obtained. The MHC molecules can be isolated by using immunoaffinity chromatography with antibodies reactive against the sequence of the MHC molecule. During this process antigenic peptides remain bound in the binding cleft of the MHC molecules. As a consequence, the antigenic peptides are enriched relative to the other components of the cell. Cells containing HLA-DRB*0401 or HLA-DRB*0404 alleles were prepared and treated as described by Yates et al.60 Peptides were collected from the isolated MHC molecules and fractionated by subjecting the mixture to reversed-phase HPLC. Fractions were then analysed by LC-MS-MS as previously described.60 Mass spectrometric analysis of the peptides contained in the HPLC fractions revealed that there were as many as 10-50 peptides present in some of the fractions. Many of the peptides exist as nested sequences around a core structure.
Thus, purification of each peptide to homogeneity would be difficult and would lead to substantial losses of material. Shown in Fig. 8A and B are the LC-MS trace and mass spectrum for one of the reversed-phase HPLC fractions. MS-MS was then used to obtain sequence information from peptides present in the mixture without subsequent isolation. Sequences for some of the peptides represented in the tandem mass spectra were determined by using the database searching algorithm of Eng et al.53 A listing of the peptide sequences identified from two different cell lines is shown in Table 3. Most of the peptides identified from HLA DRB*0404 and HLA DRB*0401 are nested sets of similar endogenous peptides originating from class I HLA A and B alleles. Not all searches of the protein database have led to successful matches. The tandem mass spectrum generated from an (M + 2H)2+ ion at m/z 817 (Fig. 9) was used to search the Owl non-redundant protein database. No acceptable sequence was found. A search of nucleotide databases using the approach described by Yates et al.54 yielded a match to a translation of a nucleotide sequence, labelled human mRNA from ORF (open reading frame), whose gene product has not been isolated or characterized. An important point is that a large percentage of new sequences are being generated by DNA sequencing, particularly of expressed genes, and there may be a lag between the time when the nucleotide sequence is deposited in the database and the time when a translation appears in a protein database. Thus, it is necessary to search all the possible sequence information before attempting to interpret a tandem mass spectrum manually. A second point is that by employing LC-MS-MS in conjunction with database searching a large number of sequences can be quickly identified, a feature that is particularly powerful for the analysis of peptides bound to class II MHC molecules.

Fig. 8 A, Reconstructed total ion chromatogram showing the ion current intensity of the most abundant ion at each scan for a micro-column LC-MS analysis of class II MHC peptides obtained from a single peak by analytical reversed-phase HPLC. B, A mass spectrum created by adding scans between 540 and 560. Approximately 30 peptide ions are observed in this mass spectrum.

Table 3 Peptides identified by micro-column LC-MS-MS isolated from human class II DR4 molecules
Allele: DRB1*0401, DRB1*0404
Sequence:
DTQFVRFDSFAASQRMEP
DTQFVRFDSFAASQRM*EP
DDTQFVRFDSFAASQRM*EPR
VDDTQFVRFDSFAASQRMEPR
VDDTQFVRFDSFAASQRM*EPR
DXRSWTAADTAAQXTQ
DXRSWTAADTAAQXTQR
DXRSWTAADTAAQXSQ
DXRSWTAADTAAQXSQR
LPSYEEALSLPSKTP
LPSYEEALSLPSKTPE
VLPSYEEALSLPSKTPE
SHSMRYFHTAMSRP
SHSMRYFHTAMSRPG
GSHSMRYFHTAMSRPG
GSHSMYYFHTAMSRPGRG
SHSMRYFYTAVSRP
SHSMRYFYTAVSRPG
SHSMRYFYTAVSRPGRG
GSHSMRYFYTAVSRPG
SHSMRYFYTAVSRPGRG
Protein: MHC class I (HLA-A); MHC class I (HLA-B); MHC class I (HLA-B); Human mRNA for ORF; MHC class I (HLA-B); MHC class I (HLA-B)

Comparative Analysis of Complex Mixtures of Peptides

MHC class II antigen presentation mutants

Studies of pathways are complicated by a large number of participants and an often complex series of events. The processing of class I and class II antigens for display on the cell surface proceeds through two separate pathways. Detailed knowledge of these processing pathways would improve the understanding of the response to antigens as well as the prerequisites for antigenicity. The class II antigen processing pathway can be divided into separate processes. The first is the proteolytic degradation of the antigen to the appropriate size and the second is loading of the antigen into the MHC molecule and transportation to the cell surface.
Conventional approaches to the study of processing pathways often involve perturbing the pathway by mutation to disrupt normal processing.61 The resulting phenotype can derive from a gross functional change or something more subtle. Mass spectrometry can then be used to examine the molecular implications of mutation rigorously. In order to study the presentation of class II MHC peptides, mutations have been created in B lymphoblastoid cells displaying class II DR3 molecules to find new phenotypes.62,63 One in particular exhibits reduced binding to exogenously provided peptides and complete dissociation of dimers in sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE).62 This phenotype is derived from a single point mutation in the HLA-DMB gene, which maps within the MHC class II region, and disrupts the formation of class II-peptide complexes. Cells containing this mutation exhibit DR3 molecules complexed primarily with peptides from invariant chains rather than a mixture of peptides derived from many proteins.64 This phenotype can be corrected by incubating cells at low pH with cognate peptide. By comparing cells with perturbations of the normal processing mechanism and characterizing the antigens presented on the cell surface, information relevant to the steps and requirements for processing can be obtained. These studies benefit from rapid methods to survey populations of antigen at various stages of the processing pathway as well as from methods for the rapid interpretation of the data. By using micro-column LC-MS-MS, peptides isolated from two presentation mutants have been sequenced.62 Normal cells exhibiting DR3 molecules were incubated at pH 7.4 and 4. Peptides were eluted from the isolated DR3 molecules and analysed by LC-MS. The distribution of m/z values of the observed peptides from these cells was consistent with a complex mixture of numerous low abundance peptides, the normal phenotype (Fig. 10A and B).
When mutant cells are incubated at pH 7.4 and 4 a different set of peptide m/z values is observed. The cells incubated at pH 7.4 present only a few very abundant peptides (Fig. 10C). These peptides were sequenced by MS-MS and found to be a nested set of sequences from residues 82-104 of the invariant chain. Mutant cells incubated at pH 4 presented DR3 molecules with virtually empty binding pockets (Fig. 10D). The role of the invariant chain is to prevent binding of endogenous peptides to class II molecules in the endoplasmic reticulum and to guide the MHC class II molecules to the endocytic pathway for complexation with peptides. These data suggest that the defective DMB gene product is unable to remove invariant chain peptides from the DR3 molecules and, as a consequence, does not produce binding sites on the DR3 molecules for antigenic peptides, leading to unstable DR3 dimers.

Fig. 9 Collision-induced dissociation mass spectrum recorded on the (M + 2H)2+ ions at m/z 817 of a peptide presented by class II major histocompatibility molecules on the surface of EBV cells homozygous for HLA-DRB*0401. Fragment ions of type b and y, having the general formulae H(NHCHRCO)n+ and H2(NHCHRCO)nOH+, respectively, are shown above and below the amino acid sequence at the top of the figure. Fragment ions of the type bnym are labelled with an asterisk. Ions observed in the spectrum are underlined. Leu and Ile were assigned by correspondence to the sequence derived from the database.
Fig. 10 Micro-column liquid chromatography electrospray ionization mass spectrometry of peptides isolated from DR3 molecules. A, Mass spectrum (scans 250-575 summed) and reconstructed ion chromatogram (inset) of peptide mixture eluted from DR molecules of pH 7.4 treated 8.1.6 cells. B, Mass spectrum (scans 250-575 summed) and reconstructed ion chromatogram (inset) of peptide mixture eluted from DR molecules of pH 4 treated 8.1.6 cells. C, Mass spectrum (scans 535-725 summed) and reconstructed ion chromatogram (inset) of peptide mixture eluted from DR molecules of pH 7.4 treated 9.5.3 cells. Ions correspond to +3 and +4 charge states of the following invariant chain fragments: KPPKPVSKMRMATPLLMQALP, PKPPKPVSKMRMATPLLMQALP, PKPPKPVSKMRMATPLLMQALPM. D, Mass spectrum (scans 535-725 summed) and reconstructed ion chromatogram (inset) of peptide mixture eluted from DR molecules of pH 4 treated 9.5.3 cells. LC-MS was performed using 1.0 and 1.5 µl aliquots of material from 8.1.6 cells and 9.5.3 cells, respectively. The peptides were eluted using a linear gradient of 0-90% B over 30 min. (Reproduced with permission from ref. 62. Copyright 1994 The American Association of Immunologists. All rights reserved.)

Enrichment and Analysis of Components in Sub-cellular Vesicles

Peptides isolated from sub-cellular fractions of the class II antigen processing pathway

The dissection of cellular events frequently requires localization of proteins to sub-cellular compartments. By characterizing or identifying the components within these sub-cellular compartments, information concerning the biochemical events occurring within the compartments can be discovered.
For example, class II antigen processing occurs in vesicles within antigen presenting cells. The processes that exist within these vesicles can be dissected by enriching the vesicles and characterizing the components. Other sub-cellular compartments exist within cells, and proteins present in these compartments can be identified with a variation on these approaches. In order to study the events occurring in the processing of class II antigens, Rudensky et al.65 have developed procedures to isolate sub-cellular components of the antigen processing pathway. Previous research has suggested that MHC class II molecules move from the Golgi complex to late endosomes where proteolytic dissociation of the invariant chain takes place.66 The class II molecules accumulate in a dense endosomal compartment, positioned between late endosomes and dense lysosomes, termed MIIC. Endogenous peptides bind to class II molecules in MIIC compartments.66 Finally, the complex moves through the endosomal pathway to the cell surface. Exogenous peptides have also been found to accumulate in MIIC vesicles, where binding to class II molecules may occur.66 In order to understand better the events occurring in each of these vesicles, studies have focused on characterizing the peptide components in sub-cellular fractions. By determining the relative molecular masses of polypeptides and peptides in the various fractions, the level of processing at each stage of the pathway can be determined. In addition, MS-MS experiments on peptides in conjunction with database searching can help determine the protein origins of the peptides as well as the cleavage sites in the protein and thus the putative proteases involved in the processing. The vesicles of the antigen processing pathway can be fractionated by density on a sucrose gradient.65 By using protein marker assays, fractions from the sucrose gradient can be pooled by type of vesicle.
Vesicles are then ruptured under acidic conditions and the contents fractionated by using a 10 000 Da cut-off membrane filter. An aliquot of this material was then subjected to LC-MS-MS analysis and database searching. Complex mixtures of peptides were found in all fractions, except those enriched in plasma membrane. A broad range of relative molecular masses was observed in both the late endosomes and endoplasmic reticulum fraction and the MIIC, late endosomes and endoplasmic reticulum fraction. Polypeptides were present in the range 800-11 700 Da. Fig. 11 shows a mass spectrum for one of the polypeptides in the 6500 Da mass range. An additional fractionation step was then included to separate the peptides into two populations. By filtering through a 3000 Da cut-off membrane, the low relative molecular mass peptides were collected and analysed by LC-MS-MS. Results from database searches with tandem mass spectra obtained from the late endosome fraction are shown in Table 4a. Additionally, 30 tandem mass spectra of peptides were obtained from the fraction containing late endosomes and endoplasmic reticulum, and the fraction containing MIIC, late endosomes and endoplasmic reticulum. Seven of the tandem mass spectra gave high scoring matches to sequences in a Mus musculus sub-set of the Owl protein database (Table 4b). By searching the database without regard to the proteolytic specificity that may have been used to create the peptides, it is possible to obtain information about how the identified peptides were formed. The cleavage sites of the peptides identified in these experiments are consistent with an acid protease such as cathepsin D or E. Peptides from protein contaminants, common in samples handled by humans, were also found. The processing of these contaminants is consistent with that of the endogenous proteins.
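Once a peptide has been matched without an enzyme constraint, the residues flanking it in the parent protein show where cleavage occurred. The following Python sketch illustrates that bookkeeping only; the function name `cleavage_sites`, the output layout and the toy protein and peptide sequences are hypothetical and are not taken from the experiments described here.

```python
def cleavage_sites(protein, peptide, flank=1):
    """Locate every occurrence of peptide in protein and report, for each,
    the residues on either side of the N-terminal and C-terminal cleavage points."""
    sites = []
    start = protein.find(peptide)
    while start != -1:
        end = start + len(peptide)
        sites.append({
            "start": start + 1,   # 1-based position of the first peptide residue
            "end": end,           # 1-based position of the last peptide residue
            # (residues before the cut, residues after the cut)
            "n_term": (protein[max(0, start - flank):start], peptide[:flank]),
            "c_term": (peptide[-flank:], protein[end:end + flank]),
        })
        start = protein.find(peptide, start + 1)
    return sites


if __name__ == "__main__":
    # Invented sequences used purely for illustration.
    toy_protein = "MAGTKLEDFWQVSTPLAIRK"
    toy_peptide = "DFWQVSTPL"
    for site in cleavage_sites(toy_protein, toy_peptide):
        print(site)
```

Tabulating the residue pairs across many identified peptides is one simple way to see whether the cuts are consistent with the preference of a particular protease.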
The direct analysis of the peptides contained in the sub-cellular vesicles by MS-MS revealed information relevant to the processing environment. This was accomplished with a minimal amount of fractionation of the peptides, thus circumventing sample losses that may have occurred and greatly decreasing analysis time.

Fig. 11 A, Mass spectrum showing m/z values of the higher relative molecular mass peptides derived from sub-cellular vesicles of the class II MHC antigen processing pathway. B, The derived relative molecular masses for the polypeptides following deconvolution of the mass spectrum are shown.

Table 4 Peptides identified by micro-column LC-MS-MS of murine sub-cellular fractions containing: (a) late endosomes; (b) MIIC vesicles, late endosomes and endoplasmic reticulum (ER)
Sequence              Protein
(a) Late endosomes:
WQVKSGTIFDNF          Calreticulin precursor
LGLLPHTFTPTTQL        ATP synthase A chain
FDITADDEPLGRVSFEL     Peptidyl-prolyl cis-trans-isomerase
AGFGGGFAGGDGLL        Human keratin
(b) MIIC vesicles, late endosomes, ER:
TLDDTWAKAHFAIMF       Cytochrome c oxidase polypeptide I
MTFFPQHFLGL           Cytochrome c oxidase polypeptide I
WQVKSGTIFDNF          Calreticulin precursor
AVLGLDL               Calreticulin precursor
LGLLPHTFTPTTQL        ATP synthase A chain
FDITADDEPLGRVSFEL     Peptidyl-prolyl cis-trans-isomerase
VVENLQDDFDFN          MMU06922 NCBI gi: 458706

Conclusion and Perspective

Biological experimentation can be crudely divided into three activities: experiment (activation/perturbation/discovery), measurement and data analysis. In general, rigorous measurement of the experimental outcome at the molecular level or molecular discovery can be a time-consuming process. For example, amino acid sequence analysis of a large collection of peptides residing in a sub-cellular vesicle could require months of effort to purify and sequence each individual peptide by conventional peptide analytical methods such as Edman degradation. For this reason the enormous throughput potential of MS-MS has always presented an attractive alternative to more conventional methods of peptide and protein sequence analysis. A major hurdle to tapping this throughput potential has been the lack of fast and robust methods to interpret MS-MS data. By utilizing the sequence infrastructure produced through genome sequencing, tandem mass spectra can now be quickly interpreted in an automated manner. The ability to analyse large numbers of tandem mass spectra efficiently should stimulate the development of methods to increase the rate of tandem mass spectra data acquisition. By developing better computer control of data acquisition or building faster scanning mass analysers such as ion traps or time-of-flight hybrids, it will be possible to study complex biological processes more thoroughly with less effort.

Support for this work was obtained from the National Science Foundation, Science and Technology Center (BIR 8809710), NIH (GM52095), and the University of Washington's Research Royalty Fund.
References

1 Hunkapiller, T., Kaiser, R. J., Koop, B. F., and Hood, L. E., Science, 1991, 254, 59.
2 Olson, M., Proc. Natl. Acad. Sci. USA, 1993, 90, 4338.
3 Marshall, E., Science, 1994, 266, 1800.
4 Marshall, E., Science, 1994, 266, 25.
5 Adams, M. D., Kelley, J. M., Gocayne, J. D., Dubnick, M., Polymeropoulos, M. H., Xiao, H., Merril, C. R., Wu, A., Olde, B., Moreno, R. F., Kerlavage, A. R., McCombie, W. R., and Venter, J. C., Science, 1991, 252, 1651.
6 Adams, M. D., et al., Nature (London), 1995, 377, suppl., 3-1743.
7 Fleischmann, R. D., Adams, M. D., White, O., et al., Science, 1995, 269, 496.
8 Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., et al., Science, 1995, 270, 397.
9 Krishna, R. G., and Wold, F., Adv. Enzymol. Relat. Areas Mol. Biol., 1993, 67, 265.
10 Hunter, T., Cell, 1987, 50, 823.
11 Hunt, D. F., Yates, J. R., III, Shabanowitz, J., Winston, S., and Hauer, C. R., Proc. Natl. Acad. Sci. USA, 1986, 84, 620.
12 Wong, S. F., Meng, C. K., and Fenn, J. B., J. Phys. Chem., 1988, 92, 546.
13 Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F., and Whitehouse, C. M., Science, 1989, 246, 64.
14 Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F., and Whitehouse, C. M., Mass Spectrom. Rev., 1990, 9, 37.
15 Smith, R. D., Loo, J. A., Edmonds, C. G., Barinaga, C. J., and Udseth, H. R., Anal. Chem., 1990, 62, 882.
16 Smith, R. D., Loo, J. A., Ogorzalek Loo, R. R., Busman, M., and Udseth, H. R., Mass Spectrom. Rev., 1991, 10, 359.
17 Covey, T., Huang, E., and Henion, J., Anal. Chem., 1991, 63, 1193.
18 Griffin, P. R., Coffman, J. A., Hood, L. E., and Yates, J. R., III, Int. J. Mass Spectrom. Ion Processes, 1991, 111, 131.
19 Griffin, P. R., Hood, L. E., and Yates, J. R., III, Proceedings of the 39th ASMS Conference on Mass Spectrometry and Allied Topics, Nashville, TN, 1991, pp. 1157-1158.
20 Wahl, J. H., Goodlett, D. R., Udseth, H. R., and Smith, R. D., Electrophoresis, 1993, 14, 448.
21 Griffin, P. R., Furer-Jonshur, K., Hood, L. E., Yates, J. R., III, Schwartz, J., and Jardine, I., Techniques in Protein Chemistry III, ed. Angeletti, R. A., Academic Press, New York, 1992, pp. 467-476.
22 Gale, D. C., and Smith, R. D., Rapid Commun. Mass Spectrom., 1993, 7, 1017.
23 Emmett, M. R., and Caprioli, R. M., J. Am. Soc. Mass Spectrom., 1994, 5, 605.
24 Wilm, M. S., and Mann, M., Int. J. Mass Spectrom. Ion Processes, 1994, 136, 167.
25 Shelley, D. C., Gluckman, J. C., and Novotny, M. V., Anal. Chem., 1984, 56, 2990.
26 Kennedy, R. T., and Jorgenson, J. W., Anal. Chem., 1989, 61, 1128.
27 Cox, A. L., Skipper, J., Chen, Y., Henderson, R. A., Darrow, T. L., Shabanowitz, J., Engelhard, V. H., Hunt, D. F., and Slingluff, T. L., Jr., Science, 1994, 264, 716.
28 Huang, E. C., and Henion, J. D., J. Am. Soc. Mass Spectrom., 1990, 1, 158.
29 Stahl, D. C., Martino, P. A., Swiderek, K. M., Davis, M. T., and Lee, T. D., Proceedings of the 39th ASMS Conference on Mass Spectrometry and Allied Topics, Washington, DC, 1992, pp. 1801-1802.
30 Yates, J. R., III, McCormack, A. L., and Eng, J., Identification of Individual Proteins in Mixtures using Micro-Column HPLC Tandem Mass Spectrometry and Automated Database Analysis, presented at the Methods in Protein Structure Analysis Meeting, Snowbird, UT, September 9-13, 1994.
31 Yates, J. R., III, Eng, J., McCormack, A. L., and Schieltz, D., Anal. Chem., 1995, 67, 1426.
32 Lee, T. D., and Vemuri, S., Biomed. Environ. Mass Spectrom., 1990, 19, 639.
33 Papayannopoulos, I. A., and Biemann, K., J. Am. Soc. Mass Spectrom., 1991, 2, 174.
34 Watkins, P. J., Jardine, I., and Zhou, J. X., Biochem. Soc. Trans., 1991, 19, 957.
35 Hunt, D. F., Griffin, P. R., Yates, J. R., III, Shabanowitz, J., Fox, J. W., and Beggerly, L. K., in Techniques in Protein Chemistry, ed. Hugli, T. E., Academic Press, San Diego, CA, 1989, pp. 580-588.
36 Hunt, D. F., Alexander, J. E., McCormack, A. L., Martino, P. A., Michel, H., and Shabanowitz, J., in Techniques in Protein Chemistry II, ed. Villafranca, J. J., Academic Press, San Diego, CA, 1991, pp. 455-465.
37 Waterston, R., Ainscough, R., Anderson, K., Berks, M., Blair, D., Connell, M., Cooper, J., Coulson, A., Craxton, M., Dear, S., et al., Cold Spring Harb. Symp. Quant. Biol., 1993, 58, 367.
38 Olson, M., Curr. Opinion Biol., 1992, 2, 221.
39 Feldman, H., Aigle, M., Aljinovic, G., André, B., Baclet, M. C., Barthe, C., Baur, A., Bécam, A. M., Biteau, N., Boles, E., et al., EMBO J., 1994, 13, 5795.
40 Dujon, B., Alexandraki, D., André, B., Ansorge, W., Baladron, V., Ballesta, J. P., Banrevi, A., Bolle, P. A., Bolotin-Fukuhara, M., Bossier, P., et al., Nature (London), 1994, 369, 371.
41 Oliver, S. G., van der Aart, Q. J., Agostoni-Carbone, M. L., Aigle, M., Alberghina, L., Alexandraki, D., Antoine, G., Anwar, R., Ballesta, J. P., Benit, P., et al., Nature (London), 1992, 357, 38.
42 Sofia, H. J., Burland, V., Daniels, D. L., Plunkett, G., III, and Blattner, F. R., Nucleic Acids Res., 1994, 22, 2576.
43 Sanger, F., Nicklen, S., and Coulson, A. R., Proc. Natl. Acad. Sci. USA, 1977, 74, 5463.
44 Maxam, A. M., and Gilbert, W., Proc. Natl. Acad. Sci. USA, 1977, 74, 560.
45 Morris, H. R., Panico, M., and Taylor, G. W., Biochem. Biophys. Res. Commun., 1983, 117, 299.
46 Gibson, B. W., and Biemann, K., Proc. Natl. Acad. Sci. USA, 1984, 81, 1956.
47 Yates, J. R., III, Intelligenetics, 1988, 4, 1.
48 Yates, J. R., III, Griffin, P. R., Speicher, S., and Hunkapiller, T., Anal. Biochem., 1993, 214, 397.
49 Henzel, W., Billeci, T., Stults, J., Wong, S., Grimley, C., and Watanabe, C., Proc. Natl. Acad. Sci. USA, 1993, 90, 5011.
50 James, P., Quadroni, M., Carafoli, E., and Gonnet, G., Biochem. Biophys. Res. Commun., 1993, 195, 58.
51 Pappin, D., Hojrup, P., and Bleasby, A., Curr. Biol., 1993, 3, 327.
52 Mann, M., Hojrup, P., and Roepstorff, P., Biol. Mass Spectrom., 1993, 22, 338.
53 Eng, J., McCormack, A. L., and Yates, J. R., III, J. Am. Soc. Mass Spectrom., 1994, 5, 976.
54 Yates, J. R., III, Eng, J., and McCormack, A. L., Anal. Chem., 1995, 67, 3202.
55 Ishikawa, K., and Niwa, Y., Biomed. Environ. Mass Spectrom., 1986, 13, 373.
56 Johnson, R. S., and Biemann, K., Biomed. Environ. Mass Spectrom., 1989, 18, 945.
57 Yates, J. R., III, Zhou, J., Griffin, P. R., and Hood, L. E., in Techniques in Protein Chemistry II, ed. Villafranca, J., Academic Press, New York, 1990, p. 477.
58 Hines, W. M., Falick, A. M., Burlingame, A. L., and Gibson, B. W., J. Am. Soc. Mass Spectrom., 1992, 3, 326.
59 Hunt, D. F., Henderson, R. A., Shabanowitz, J., Sakaguchi, K., Michel, H., Sevilir, N., Cox, A. L., Appella, E., and Engelhard, V. H., Science, 1992, 255, 1261.
60 Yates, J. R., III, McCormack, A. L., Hayden, J. B., and Davey, M. P., in Cell Biology: A Laboratory Handbook, ed. Celis, J. E., Academic Press, San Diego, CA, 1994, pp. 380-388.
61 Mellins, E., Smith, L., Arp, B., Cotner, T., Celis, E., and Pious, D., Nature (London), 1990, 343, 71.
62 Monji, T., McCormack, A. L., Yates, J. R., III, and Pious, D., J. Immunol., 1994, 153, 4468.
63 Sette, A., Ceman, S., Kubo, R. T., Sakaguchi, K., Appella, E., Hunt, D. F., Davis, T. A., Michel, H., Shabanowitz, J., Rudersdorf, R., Grey, H. M., and Demars, R., Science, 1992, 258, 1801.
64 Riberdy, J. M., Newcomb, J. R., Surman, M. J., Barbosa, J. A., and Cresswell, P., Nature (London), 1992, 360, 474.
65 Rudensky, A. Y., Marie, M., Eastman, S., Shoemaker, L., DeRoos, P. C., and Blum, J. S., Immunity, 1994, 1, 585.
66 Marie, M. A., Taylor, M. D., and Blum, J. S., Proc. Natl. Acad. Sci. USA, 1994, 91, 2171.

Paper 5/07370G
Received November 9, 1995
Accepted January 16, 1996

 
