

W.M. Keck Facility
 Yale University
 300 George Street

Yale University School of Medicine


More Information on
Matrix Assisted Laser Desorption Ionization (MALDI)
Mass Spectrometry

What is MALDI-MS?

This MS approach uses a nitrogen UV laser (337 nm) to generate ions from high mass, non-volatile samples such as peptides and proteins. The key to this technique, which was discovered several years ago, is that in the presence of a matrix such as alpha-cyano-4-hydroxy cinnamic acid (CHCA), large molecules like peptides ionize instead of decomposing. Although the mechanism remains uncertain, it may involve absorption of UV light by the matrix followed by transfer of this energy to the peptide, which then ionizes into the gas phase as a result of the relatively large amount of energy absorbed. To move the resulting ions down the flight tube of the mass spectrometer, they are accelerated in an electric field (e.g., 25 kV).

MALDI-MS can be done in either positive or negative ion mode, but the Keck Laboratory usually uses positive ion mode because at acid pH (where COOH groups are not ionized and hence carry no charge) tryptic peptides usually carry a charge of +2 or higher. That is, they will typically have a +1 charge on the NH2-terminal amino group and another +1 charge on either the epsilon NH2- group of the C-terminal lysine or on the C-terminal arginine. In addition, each histidine in a peptide contributes a further +1 charge at acid pH. At a pH of about 2 we have seen a reasonably direct correlation between how well a peptide ionizes (how large a signal it gives) and its overall predicted net positive charge over the range extending from neutral to +4 (Williams et al (1996)). Thus we believe it is better, in terms of MALDI-MS "screening" of peptides for sequencing (see below), to use trypsin instead of endopeptidase Lys-C, which cleaves after lysine residues only. With endopeptidase Lys-C many peptides will have +3 and perhaps even higher charges (due to their content of arginine), and these peptides will give more intense MALDI-MS signals than peptides with lower net charge, making it more difficult to judge relative purity.
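The charge-counting rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text (one +1 for the free NH2-terminus and one +1 for each lysine, arginine, or histidine at about pH 2); the function name and the example sequences are hypothetical and do not come from the Keck Laboratory.

```python
def predicted_net_charge(peptide: str) -> int:
    """Count the groups the text treats as carrying +1 at acid pH (about pH 2)."""
    basic_residues = "KRH"  # lysine, arginine, histidine side chains
    n_terminus = 1          # free NH2-terminal amino group
    return n_terminus + sum(peptide.upper().count(r) for r in basic_residues)


# Hypothetical sequences, chosen only to illustrate the counting rule.
for seq in ["AGFLEK", "HAGFLR", "AGKLHR"]:
    print(seq, predicted_net_charge(seq))
```

A tryptic peptide ending in lysine or arginine therefore scores at least +2 under this rule, whereas a Lys-C peptide with internal arginines scores higher.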

Although MALDI-MS screening to identify peptides suitable for sequencing is certainly not quantitative (because different peptides often differ greatly in how well they ionize), we have nonetheless found it useful. Previously, when we relied on HPLC absorbance peak shape as the sole criterion of peptide purity, our success rate at obtaining usable sequences from Edman degradation was about 65%. In contrast, if we use both HPLC peak shape and relative purity as judged by MALDI-MS, with a minimum MALDI-MS criterion of 10:1 (the ratio of the observed intensity of the major ion to that of the next most intense ion), our success rate at sequencing individual tryptic peptides increases to above 85% (Williams et al (1996)). Hence, by restricting ourselves to tryptic peptides, which generally all carry similar charge (see above), and by using the 10:1 MALDI-MS criterion, we have been able to partially compensate for the inherent lack of quantitation of MALDI-MS. However, it should be noted that we have previously documented more than a 10,000 fold range in MALDI-MS sensitivity of detection when comparing neutral peptides with those carrying an overall charge of +2 or greater (Williams et al (1996)). This range may cause significant problems when trying to use MALDI-MS to judge the purity of peptides that do not share relatively uniform charge (e.g., MHC derived peptides).
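As a rough illustration of the 10:1 criterion, the check below compares the most intense ion in a spectrum with the next most intense one. The function name, the treatment of single-peak spectra, and the example intensities are assumptions made for this sketch; the text does not describe how the laboratory actually records or compares intensities.

```python
def passes_purity_screen(intensities, min_ratio=10.0):
    """Return True if the major ion is at least min_ratio times the next major ion."""
    ranked = sorted(intensities, reverse=True)
    if not ranked:
        return False                      # no ions observed, nothing to judge
    if len(ranked) == 1 or ranked[1] == 0:
        return True                       # assumption: a lone peak counts as passing
    return ranked[0] / ranked[1] >= min_ratio


# Hypothetical peak intensities (arbitrary units).
print(passes_purity_screen([1000, 80, 20]))   # ratio 12.5
print(passes_purity_screen([1000, 250, 40]))  # ratio 4.0
```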

What is the difference between "linear" and "reflectron" MALDI-MS?

The Micromass M@LDI-L/R instrument in use in the Keck Laboratory can be used in either linear or reflectron modes of operation. In linear mode the ions travel down a linear flight path and their mass/charge (m/z) ratio (see below for an explanation of the difference between mass and mass/charge ratio) is determined by the time it takes for them to reach the detector. Hence, this instrument is called a time of flight instrument (TOF). The relationship that allows the m/z ratio to be determined is $E = \frac{1}{2}(m/z)v^2$. In this equation E is the energy imparted on the charged ions as a result of the voltage that is applied by the instrument and v is the velocity of the ions down the flight path. Because all of the ions are exposed to the same electric field, all similarly charged ions will have similar energies. Therefore, based on the above equation, ions that have larger mass must have lower velocities and hence will require longer times to reach the detector, thus forming the basis for m/z determination by a mass spectrometer equipped with a time of flight detector.
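To make the time of flight relationship concrete, the sketch below computes drift times for a few m/z values from the energy balance z·e·V = ½·m·v². The 25 kV accelerating voltage is the example value given earlier in the text; the 1.0 m flight tube length is a hypothetical figure chosen only for illustration, and the calculation ignores the time spent in the acceleration region.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms per dalton


def flight_time_us(mz, voltage=25_000.0, tube_length_m=1.0):
    """Drift time in microseconds for an ion of the given m/z (Da per unit charge)."""
    # Energy balance z*e*V = 1/2 * m * v^2, so v = sqrt(2*e*V / (m/z)).
    velocity = math.sqrt(2 * ELEMENTARY_CHARGE * voltage / (mz * DALTON))
    return tube_length_m / velocity * 1e6


for mz in (1000.0, 2000.0, 4000.0):
    print(f"m/z {mz:6.0f}: {flight_time_us(mz):.1f} us")
```

Because the flight time scales with the square root of m/z, an ion with four times the m/z takes twice as long to reach the detector.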

A reflectron MALDI-MS has an ion mirror at its end which reflects the ions back (at a slight angle) to a detector. One major advantage of a reflectron is that it permits limited mass spectrometric sequencing to be carried out via a process called post source decay (PSD). The "source" in this case is the target containing the sample and what happens is that as the excited ions travel down the flight path many will tend to literally fall apart due to bond breakage induced by the absorbed energy. Suppose we take a very simple case where we have a peptide ion with mass 900 that breaks into two pieces, with mass 300 and 600 respectively, as it travels down the flight tube. Since each of these pieces will tend to retain a correspondingly similar fraction of the total energy that was originally imparted to the ion (i.e., one-third and two-thirds respectively), the energy and mass of each will be decreased in the same proportion and the E/(m/z) ratio will remain constant. Hence, based on rearranging the above equation to:

$2E/(m/z) = v^2$

both these ions will have the exact same velocity and both will arrive at the linear detector at the same time - hence they will together only result in one m/z peak. A good feature of this observation is that it allows us to use the linear mode to screen peptides for purity - that is we do not "see" mass peaks due to post source decay (where a single peptide mass ion might decay to give several m/z peaks and thus falsely indicate that a pure peptide is a mixture of species). A limitation of a linear instrument is that the sequencing information present as a result of post source decay is lost and the mass accuracy (see below) is less than on a reflectron equipped instrument. In order to separate the fragment ions that reach the linear detector simultaneously, a reflectron instrument has an ion mirror, which is essentially a potential field, at the end of the flight tube. Those ions that are larger and that have more energy penetrate farther into the field and hence are slowed down relative to lower mass ions that do not penetrate as far into the ion mirror. Hence, all of the fragment ions are separated and the resulting differences in mass between these species sometimes can be used to infer partial sequences of the parent peptide. In practice, interpretation of MS/MS post source decay sequencing data is not routine and its quality is much less than the MS/MS fragmentation data obtained from a conventional mass spectrometer designed for MS/MS sequencing like the Q-Tof. In this latter case collision induced dissociation (CID) with a "heavy" gas such as argon is used to more completely fragment the peptide ions. However, post source decay data has proven useful particularly for identifying sites of posttranslational modification - such as phosphorylation sites that are very difficult to identify via direct Edman sequencing.
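The 900 → 300 + 600 example can be checked numerically. The sketch below uses the rearranged relationship 2E/(m/z) = v² in arbitrary units and assumes, as the text does, that each singly charged fragment keeps a share of the parent's energy equal to its share of the parent's mass.

```python
import math


def velocity(energy, mz):
    """Velocity from the text's relationship 2E/(m/z) = v^2 (arbitrary units)."""
    return math.sqrt(2 * energy / mz)


parent_energy = 1.0                                  # arbitrary energy unit
parent = velocity(parent_energy, 900)
fragment_300 = velocity(parent_energy * 300 / 900, 300)
fragment_600 = velocity(parent_energy * 600 / 900, 600)

# All three values agree (to rounding), so the ions reach a linear detector together.
print(parent, fragment_300, fragment_600)
```

The fragments differ in energy (one-third and two-thirds of the parent's), which is the property the ion mirror exploits to separate them.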

Another major advantage of the reflectron is that it permits higher mass accuracy. The reason is that during high voltage extraction of the peptide ions produced by exposure to UV light, there are slight differences in the amount of energy that is actually acquired by similarly charged ions. In a linear instrument these differences result in slight differences in times of flight, which in turn result in broader peaks and lower mass accuracy. As described previously in terms of resolving fragment ions, a reflectron also compensates for similarly charged ions having slightly different overall energies (again, the more energetic ions that have slightly faster velocities will penetrate further into the ion mirror and hence be slightly delayed relative to less energetic ions, so that both will tend to reach the detector at the same time). As a result, a reflectron improves both resolution and mass accuracy. Although there is always the possibility of observing fragmentation ions when using the reflectron (and mistaking these for contaminating peptide ions), the instrument settings can be adjusted to minimize the chance of seeing peptide fragmentation in reflectron mode. Perhaps the best approach, however, is to first screen for purity in the linear mode and then obtain a more accurate mass by shooting the target again in the reflectron mode. Typically, we observe average mass accuracies (with external calibration and when operating at very high sensitivity) of about ±0.1% in linear mode and ±0.05% (or lower) in reflectron mode for tryptic peptides, which often range in mass between about 800 and 3,000 daltons. For an average 1,500 dalton peptide this means an expected error of about ±1.5 amu in linear mode and about ±0.75 amu in reflectron mode. While either of these mass accuracies should be sufficient to detect oxidation of methionine (+16), neither would be sufficiently accurate to detect deamidation of an asparagine (+1).

In addition to making it easier to "fill in holes" left from conventional Edman sequencing, increased mass accuracy is also essential for identifying known proteins based on peptide mass searching (see below). Using this approach, several peptide masses obtained via "off-line" MALDI-MS on an aliquot of a digest of the protein can be used to identify the parent protein if its sequence is in the database.
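The arithmetic behind these error estimates is shown below. The conversion from percent accuracy to amu follows the text directly; the rule used to decide whether a mass shift can be detected (the shift must exceed twice the expected error, so that the ± windows around the two masses do not overlap) is an assumption introduced for this sketch, not a criterion stated by the laboratory.

```python
def mass_error_amu(mass, percent_accuracy):
    """Expected absolute error (amu) for a given mass and percent accuracy."""
    return mass * percent_accuracy / 100.0


def shift_is_detectable(shift_amu, error_amu):
    """Assumed rule: error windows around the two masses must not overlap."""
    return abs(shift_amu) > 2 * error_amu


peptide_mass = 1500.0
for mode, accuracy in (("linear", 0.1), ("reflectron", 0.05)):
    error = mass_error_amu(peptide_mass, accuracy)
    print(mode, round(error, 2),
          "Met oxidation (+16):", shift_is_detectable(16, error),
          "Asn deamidation (+1):", shift_is_detectable(1, error))
```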

A very recent improvement in MALDI-MS that can be used on both linear and reflectron equipped instruments is a technique called delayed extraction. In this case the high voltage field that moves the ions down the flight path is turned on several microseconds after, rather than simultaneously with, the UV laser pulse hitting the target.

What is the difference between monoisotopic and average peptide mass?

As shown in the table below, the atoms that make up the naturally occurring amino acids found in proteins are not isotopically pure.

Natural Abundance of Isotopes Commonly Found in Proteins

Atom        Most Abundant Isotope     Next Most Abundant Isotope
Carbon      12C    98.9%              13C    1.11%
Nitrogen    14N    99.6%              15N    0.366%
Oxygen      16O    99.8%              18O    0.204%
Sulfur      32S    95.0%              34S    4.22%


As a result, peptide mass spectra acquired at low resolution provide only an average mass, which reflects the relative abundances of these isotopes. In contrast, peptide mass spectra acquired at high resolution (e.g., MALDI-MS data acquired up to about 3,000 Da with a reflectron) will show a cluster of peaks that differ from one another by one mass unit. The lowest mass peak in each cluster results from peptide molecules composed entirely of the most abundant isotopes listed above. The next peak, which appears at +1 amu, consists mainly of peptide molecules that contain a single 13C or 15N atom; the 18O and 34S isotopes listed above each add about 2 amu and so contribute to the peak at +2 amu. Typically, an average 15-mer tryptic peptide might appear in a high resolution MALDI spectrum as a cluster of 4-5 peaks, with the first two peaks, which might be approximately equal in size, being the most intense. Peptide mass database searching requires that the monoisotopic peak (resulting from peptide molecules which contain only the most abundant isotopes listed above) be accurately identified. Great care must be taken to identify the monoisotopic peptide ion peaks when the signal to noise ratio is low, as it often is when <5% aliquots of Coomassie Blue stained gel bands containing substantially less than picomole amounts are analyzed, and when there is substantial overlap between the isotope envelopes of different peptide ions, which becomes much more likely as the MW of the protein increases substantially above 50,000. The expertise and labor required to complete this task account for a significant fraction of the cost of the Manual MALDI-MS Protein Identification Service.
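The difference between the two kinds of mass can be illustrated with bradykinin, one of the calibrants used in the Keck Laboratory. The sketch below sums residue masses for the bradykinin sequence (RPPGFSPFR, taken from standard references rather than from this page) and adds water and a proton to give (M+H). The residue mass tables are standard rounded values and cover only the five amino acids that occur in bradykinin; the function name is hypothetical.

```python
# Residue masses (Da) for only the amino acids that occur in bradykinin.
MONOISOTOPIC = {"R": 156.10111, "P": 97.05276, "G": 57.02146,
                "F": 147.06841, "S": 87.03203}
AVERAGE = {"R": 156.1875, "P": 97.1167, "G": 57.0519,
           "F": 147.1766, "S": 87.0782}
WATER_MONO, WATER_AVG = 18.01056, 18.0153
PROTON = 1.00728


def mh_mass(sequence, residue_masses, water):
    """(M+H) mass: sum of residue masses plus one water plus one proton."""
    return sum(residue_masses[aa] for aa in sequence) + water + PROTON


BRADYKININ = "RPPGFSPFR"
mono = mh_mass(BRADYKININ, MONOISOTOPIC, WATER_MONO)
avg = mh_mass(BRADYKININ, AVERAGE, WATER_AVG)
print(f"monoisotopic (M+H) = {mono:.3f}, average (M+H) = {avg:.2f}")
```

The two results differ by roughly 0.66 Da, which is why a search must know which kind of mass it has been given.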

MALDI-MS in the Keck Laboratory

Matrix assisted laser desorption ionization mass spectrometry is routinely carried out on 1.0 µl (usually <5%) of in gel digests (to enable peptide mass database searching - see below) and on aliquots of peptides isolated via reverse phase HPLC. As noted, the MALDI mass spectrum provides an additional criterion to judge peptide purity and the mass of the peptide is often helpful as a further confirmation of the sequence determined by Edman degradation. In each case, the sample is mixed with 1.0 µl of alpha-cyano-4-hydroxy cinnamic acid (CHCA) matrix solution and then spotted onto a new target. The samples are then allowed to air dry at room temperature. To avoid any possible cross-contamination, all targets are used only once. The alpha-CHCA matrix solution is prepared at a concentration of 4.5 mg/ml in 50% CH3CN, 0.05% TFA and is used after vortexing and standing for a few minutes. Matrix solutions are prepared fresh daily. The calibrants used for external calibration of peptides are bradykinin (average (M+H) is 1061.23) and ACTH Clip (average (M+H) is 2466.70). Both calibrants are stored at -20 degrees C as 500 fmol/µl stocks in 50% CH3CN, 0.05% TFA. The same calibrants (100 fmol each) are used for internal calibration of in gel digests except that under the conditions used for this mass spectrometry we are able to calibrate on the monoisotopic (M+H) mass of bradykinin which is 1060.57. Generally, we observe a mean mass error of about 0.1% with external calibration. Since the standard deviation for the mass error with external calibration is 0.1%, there is a 68.3% probability the observed mass will be within ± 0.1%, a 95.4% probability it will be within ±0.2% and a 99.7% probability it will be within ±0.3% (i.e., within 3 standard deviations) of the theoretical mass.
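The probabilities quoted above are those of a normal distribution, and can be reproduced with the error function from the Python standard library. This sketch assumes, as those figures imply, that mass errors are normally distributed about the theoretical mass; the function name is hypothetical.

```python
import math


def probability_within(k_sigma):
    """Probability that a normally distributed error lies within +/- k standard deviations."""
    return math.erf(k_sigma / math.sqrt(2))


sigma_percent = 0.1  # standard deviation of the mass error with external calibration
for k in (1, 2, 3):
    window = k * sigma_percent
    print(f"within +/-{window:.1f}%: {100 * probability_within(k):.1f}%")
```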

The Micromass M@LDI-L/R is a linear / reflectron time-of-flight mass spectrometer designed for high throughput protein identification. This system can be used to analyze intact protein/peptide masses (normally done in linear mode up to about m/z = 400,000) or protein digests (reflectron mode). Often, samples destined for intact mass analysis contain salts that have to be removed prior to analysis using a C18 or C4 ZipTip (Millipore). Trypsin digests do not typically require de-salting, so 1.8 µl of the digest mixture can be applied directly to the MALDI target after mixing with matrix plus internal standards (i.e., 50 fmol bradykinin and 150 fmol ACTH clip, which have protonated monoisotopic masses of 1060.569 and 2465.199 respectively). The target plate has clusters of four sample spots arranged around a central spot which contains the external calibration standards. The target plate can hold 96 samples plus the external calibration standards. The M@LDI-L/R system uses a nitrogen laser (337 nm) and a "fuzzy logic" algorithm to locate the best laser position on the target sample in an automated mode of operation. Spectra are acquired in a data dependent mode, with the data system deciding when and if an acceptable spectrum has been acquired. It also decides when to abort further acquisition and to move on to the next sample. The collected spectra for each sample are averaged together, and the calibration is adjusted using the "lockmass" internal standard. Mass lists for database searching are made by hand and searched using ProFound and Mascot against the NCBInr protein database. A printout of the search results is then produced which lists the comparative score for possible protein identifications for each sample from which acceptable spectra were acquired. For additional information on the database searching algorithms, please visit the ProFound and Mascot web sites.
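One simple way to picture internal calibration with two standards is a two-point fit. Since flight time is proportional to the square root of m/z, the sketch below assumes sqrt(m/z) = a·t + b and solves for a and b from the two internal standards. This is an illustrative assumption only: the page does not describe the algorithm the instrument software actually uses for its "lockmass" adjustment, and the flight times below are hypothetical.

```python
import math


def two_point_calibration(t1, mz1, t2, mz2):
    """Return a function mapping flight time to m/z, fitted to two standards."""
    # Assumed model: sqrt(m/z) = a * t + b
    a = (math.sqrt(mz2) - math.sqrt(mz1)) / (t2 - t1)
    b = math.sqrt(mz1) - a * t1
    return lambda t: (a * t + b) ** 2


# Monoisotopic (M+H) masses of the internal standards, as given in the text.
BRADYKININ_MH = 1060.569
ACTH_CLIP_MH = 2465.199

# Hypothetical flight times (microseconds) for the two standards.
to_mz = two_point_calibration(14.20, BRADYKININ_MH, 21.60, ACTH_CLIP_MH)
print(round(to_mz(14.20), 3), round(to_mz(21.60), 3), round(to_mz(17.00), 3))
```

By construction the fitted function returns the known masses at the two standards' flight times; peaks in between are interpolated.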

It is important to understand that the output from both the Mascot and ProFound searching programs is dependent not only upon the mass spectrometry data but also upon several input parameters. The following provides a brief discussion of each of these parameters.

  • Taxonomic Category: The ProFound program allows the taxonomic category to be specified and since this is always known beforehand the first inclination might be to always take advantage of this information to quickly narrow the searches. However, if the complete genome for the species from which the protein derives is not known (e.g., human), it is possible the identification might be missed if the search is limited to human proteins. That is, if a highly homologous protein has been sequenced in another organism, a search that is not limited to human proteins might succeed whereas a search of the human database would fail. For this reason the Keck Laboratory does not specify a particular taxonomic category but rather searches the entire sequence database. 
  • Protein Mass Range: Since virtually all proteins submitted for protein identification are purified via SDS PAGE, this parameter is almost always available and again, the first inclination would be to make maximum use of this information to narrow the searches. Two problems, however, stand in the way of using a narrow MW range to enable more facile searches. The first problem is that molecular weights derived from SDS PAGE are often significantly in error. For instance, although the ARPP-21 cAMP regulated phosphoprotein has an estimated MW of 21,000 as determined by SDS PAGE, the actual molecular weight of this protein is only 9,561 (as predicted from the sequence of ARPP-21 that was determined by conventional Edman degradation (Williams et al (1989))). The second problem that occurs when a very limited protein mass range is used in the search is that identifications may be missed if the form of the protein that is in the database is not the form that is actually isolated. Since many database entries are based on translations of nucleic acid sequences, they will contain, for instance, signal peptides that are not found in the circulating form of the protein and, conversely, they will not contain post-translational modifications that may be present. Hence, as noted by Cottrell (1994), the circulating form of insulin has a molecular weight of about 5,734 whereas the Swiss Protein entry (INS_BOVIN) is actually the sequence of the precursor protein, which has a molecular weight of about 11,394. Keeping these factors in mind, we generally carry out the first peptide mass search with a mass range that extends from about one-half to twice the SDS PAGE predicted MW. If the search fails to identify the protein, then we normally will carry out a second search without any mass range limitation.
     
  • Peptide Mass Tolerance: The higher the mass accuracy, the fewer peptides from proteins other than the one actually digested will have predicted masses that match by chance and are scored as false positives, and hence the higher the specificity of the search. Currently, we use a mass accuracy tolerance of 70 ppm. To achieve this very high mass accuracy routinely requires the use of an instrument (like the Micromass TofSpec SE in the Keck Laboratory) that is equipped with delayed extraction. In addition, the sample must be calibrated by adding internal standards (see above) and the resolution must be sufficiently high that unit mass resolution is achieved (i.e., the spectrum must be acquired under conditions where an m/z of 1,500 is resolved from 1,501) so that the monoisotopic mass of the peptide (corresponding to the peptide containing only 12C isotopes - see above) is determined. If unit mass resolution is not achieved, the resulting peptide mass will be an average mass, that is, an average over the 12C version of the peptide and the versions of the same peptide that contain various numbers of 13C, 15N, 18O, and 34S atoms.
     
  • Cysteine and Methionine Modification: Since the Keck Laboratory does not modify either of these residues prior to manual in gel trypsin digestion, initial searches are carried out assuming that both these amino acids are in their native form. Two common modifications that can occur (and that would prevent peptides that contained these modifications from being identified) are methionine oxidation and modification of cysteine by acrylamide free radicals during SDS PAGE. Although it is difficult to prevent methionine oxidation, a reasonable approach to limiting the extent of modification of free cysteine residues is to limit the concentration of acrylamide free radicals in the SDS gel by letting the gel stand overnight prior to use.
     
  • Number of Missed Cleavages: Incomplete tryptic cleavage at every lysine and arginine may be caused by an adjacent proline residue, by nearby acidic residues, and/or by a generally poor digest that may result from a very low amount of protein, from the presence of a significant fraction of carbohydrate and, perhaps, from higher order protein structure that remains after SDS PAGE. For these reasons, our initial peptide mass searches allow one missed cleavage, which we believe is generally sufficient to reasonably account for this problem. In the case of the Peptide Search program, the second pass search will miss as many cleavage sites as necessary to match a measured mass within the specified tolerance.
     
  • Criteria for Protein Identification: In contrast to the Peptide Search program, the ProFound Program gives a probability score for each protein that is found that meets the minimum input criteria. The probability score, which is described in more detail in the ASMS abstract that may be found in the ProFound Web site, takes into account a number of factors including the fraction of predicted peptides that are matched and the actual error between the peptide masses predicted for the "identified" protein and the observed peptide masses. Currently, the minimum ProFound search criteria we require for a protein identification are a probability score of 1.0 (i.e., 100%) and that observed peptide masses must be matched to at least 25% of the predicted protein sequence. Although most top scoring proteins that have met our requirements for a Peptide Search identification have also met our requirements for a ProFound identification (i.e., ProFound score of 1.0), the Keck Laboratory will only identify a protein through its Manual MALDI-MS Protein Identification Service if it meets our requirements for an identification for both the Mascot and ProFound Search algorithms.
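Two of the search parameters discussed above reduce to simple arithmetic, sketched here in Python. The helper names are hypothetical, and the code illustrates only the 70 ppm tolerance and the one-half to two-fold first-pass mass range; it is not the logic of ProFound, Mascot, or Peptide Search. The ARPP-21 and insulin values are the ones quoted in the text.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=70.0):
    """True if the observed mass matches the theoretical mass within tol_ppm."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def first_pass_mass_range(sds_page_mw):
    """Protein mass range from one-half to twice the SDS PAGE estimate."""
    return 0.5 * sds_page_mw, 2.0 * sds_page_mw


def in_range(mw, mass_range):
    low, high = mass_range
    return low <= mw <= high


# 70 ppm at m/z 1,500 corresponds to about 0.105 Da.
print(within_tolerance(1500.10, 1500.00), within_tolerance(1500.12, 1500.00))

# ARPP-21: SDS PAGE suggests 21,000 but the sequence predicts 9,561.
print(in_range(9561, first_pass_mass_range(21000)))

# Insulin: circulating form about 5,734, database precursor about 11,394.
print(in_range(11394, first_pass_mass_range(5734)))
```

The ARPP-21 case falls outside the first-pass window, which illustrates why a failed first search is followed by a second search without any mass range limitation.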

Confirmation of Manual and Automated MALDI-MS Protein Identifications Based on Peptide Mass Database Searching

Several additional factors can be brought into play to either confirm or weaken an identification based on peptide mass searching. Firstly, particularly in the case of an organism like Haemophilus, whose genome has been sequenced and which encodes a relatively modest number of proteins (about 1,700 as compared to 80,000 in the human genome), the identified protein should be from the correct species. Indeed, this is one of the required criteria for the high throughput service. Secondly, the MW observed by SDS PAGE generally will be within a one-half to two-fold "window" around the predicted MW or there will be a good explanation for it falling outside this window (e.g., if the observed mass is less than one-half the predicted mass the protein may be a limited cleavage fragment, a hypothesis that can be confirmed by noting the location in the sequence of the matching peptides). Thirdly, if the sample was isolated by 2D gel electrophoresis, the observed versus predicted pI provides an additional criterion. Fourthly, once a tentative identification has been made, other approaches (such as Western Blotting/antibody detection) and the known properties of the identified protein (e.g., ability to bind nucleic acids) may be used to quickly bolster or cast doubt on the identification based on peptide mass searching. Regardless, the Keck Laboratory strongly recommends that additional studies (MS/MS protein identification or "conventional" internal Edman sequencing) be carried out prior to drawing firm conclusions. Finally, it must always be kept in mind that if the sample has not been purified sufficiently, the major protein in the Coomassie Blue stained band may not be the protein that is responsible for the activity being followed. In this regard it is interesting that both peptide mass and MS/MS data indicated that an "unknown" 2D gel spot that was analyzed in the Keck Laboratory actually contained at least 4 different proteins.

 


Copyright © 2003, Yale University, New Haven, Connecticut, USA. All rights reserved.

Last modified: 23-Oct-2006 (GB)