More Information on Matrix Assisted Laser Desorption Ionization (MALDI) Mass Spectrometry
What is MALDI-MS?
This MS approach uses a nitrogen UV laser (337 nm) to
generate ions from high mass, non-volatile samples such as
peptides and proteins. The key to this technique, which was
discovered several years ago, is that in the presence of a
matrix like alpha-cyano-4-hydroxy cinnamic acid (CHCA),
large molecules like peptides ionize instead of decomposing.
Although the mechanism remains uncertain, it may involve
absorption of UV light by the matrix followed by transfer of
this energy to the peptide - which then ionizes into the gas
phase as a result of the relatively large amount of energy
absorbed. To get the resulting ions to move down the flight
tube in the mass spectrometer they are accelerated in an
electric field (e.g., 25 kV). MALDI-MS can be done
either in positive or negative ion mode but the Keck
Laboratory usually uses positive ion mode because at acid pH
(where COOH groups are not ionized/not charged) tryptic
peptides usually carry a charge of +2 or higher. That is,
they will typically have a +1 charge on the
NH2-terminal amino group and another +1 charge on
either the epsilon NH2- group of the
C-terminal lysine or on the C-terminal arginine. In
addition, peptides containing histidine will have an
additional +1 charge at acid pH. At a pH of about 2 we have
seen reasonably direct correlation between how well a
peptide ionizes (how large a signal it gives) and its
overall predicted net positive charge over the range
extending from neutral to +4 (Williams
et al (1996)). Thus we believe it is better in
terms of MALDI-MS "screening" of peptides for sequencing
(see below) to use trypsin instead of endopeptidase Lys-C,
which cleaves after lysine residues only. With endopeptidase
Lys-C many peptides will have +3 and perhaps even higher
charges (due to their content of arginine) and hence, these
peptides will give more intense MALDI-MS signals than
peptides with lower net charge - making it more difficult to
judge relative purity.
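The charge-counting rule described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example sequences are hypothetical. The rule adds +1 for the NH2-terminal amino group and +1 for each lysine, arginine and histidine, assuming acid pH where COOH groups are uncharged.

```python
def predicted_net_charge(peptide: str) -> int:
    """Predicted net positive charge of a peptide at acid pH (about pH 2).

    Counts +1 for the NH2-terminal amino group and +1 for every
    lysine (K), arginine (R) and histidine (H) side chain.
    """
    sequence = peptide.upper()
    basic_side_chains = sum(sequence.count(residue) for residue in "KRH")
    return 1 + basic_side_chains


# Hypothetical sequences, chosen only to illustrate the rule.
for seq in ("AGLDVEK", "AGHLDVER", "AGRLDHVEK"):
    print(seq, predicted_net_charge(seq))
```

Under this rule a tryptic peptide ending in lysine or arginine with no other basic residues is predicted to be +2, whereas an internal arginine or histidine (as is common in endopeptidase Lys-C digests) raises the predicted charge further.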
Although MALDI-MS screening of peptides for the purpose
of identifying peptides suitable for sequencing is certainly
not quantitative (due to often large differences in how well
different peptides ionize), we have found it to be
nonetheless useful. Previously, when we relied on
HPLC absorbance peak shape as the sole criterion of peptide
purity, our success rate at obtaining useable sequences from
Edman degradation was about 65%. In contrast, if we use both
HPLC peak shape and relative purity as judged by MALDI-MS,
coupled with a minimum MALDI-MS criterion of 10:1 (in terms
of the observed intensities of the major to next major ions
observed), we find that our success rate at sequencing
individual tryptic peptides increases to above 85%
(Williams et al (1996)).
Hence, by restricting ourselves to tryptic peptides, which
generally all carry similar charge (see above) and by using
the 10:1 MALDI-MS criterion we have been able to partially
compensate for the inherent lack of quantitation of
MALDI-MS. However, it should be noted that we have
previously documented more than a 10,000 fold range in
MALDI-MS sensitivity of detection when comparing neutral
peptides with those containing an overall +2 or greater
charge (Williams et al
(1996)). So this may cause significant problems
when trying to use MALDI-MS to judge the purity of peptides
that do not share relatively uniform charge (e.g., MHC
derived peptides).
What is the difference between "linear" and "reflectron"
MALDI-MS?
The Micromass M@LDI-L/R instrument in
use in the Keck Laboratory can be used in either linear or reflectron
modes of operation. In linear
mode the ions travel down a linear flight path and their
mass/charge (m/z) ratio (see below for an explanation
of the difference between mass and mass/charge ratio) is
determined by the time it takes for them to reach the
detector. Hence, this instrument is called a time of flight
instrument (TOF). The relationship that allows the
m/z ratio to be determined is E = ½
(m/z)v². In this equation E is the
energy imparted on the charged ions as a result of the
voltage that is applied by the instrument and v is the
velocity of the ions down the flight path. Because all of
the ions are exposed to the same electric field, all
similarly charged ions will have similar energies.
Therefore, based on the above equation, ions that have
larger mass must have lower velocities and hence will
require longer times to reach the detector, thus forming the
basis for m/z determination by a mass spectrometer equipped
with a time of flight detector.
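As a rough illustration of the time of flight relationship, the sketch below computes flight times for singly charged ions. It uses the standard form of the energy balance, zeV = ½mv², which matches the equation above when E is read as the energy acquired per unit charge. The 25 kV accelerating voltage comes from the text. The 1.0 m field-free flight path is an assumed, hypothetical length and not a specification of the Keck Laboratory instrument, and the time spent in the acceleration region is ignored.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def flight_time_us(mass_da: float, charge: int = 1,
                   accel_volts: float = 25_000.0,
                   path_m: float = 1.0) -> float:
    """Field-free flight time in microseconds for an ion of given mass and charge.

    path_m is an assumed flight path length (hypothetical value).
    """
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_volts  # joules
    velocity = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))  # m/s
    return path_m / velocity * 1e6


for mass in (1000.0, 2000.0, 4000.0):
    print(f"{mass:7.0f} Da -> {flight_time_us(mass):6.2f} microseconds")
```

Because the flight time scales with the square root of m/z, an ion with four times the mass takes twice as long to reach the detector.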
A reflectron MALDI-MS has an ion mirror at its end which
reflects the ions back (at a slight angle) to a detector.
One major advantage of a reflectron is that it permits
limited mass spectrometric sequencing to be carried out via
a process called post source decay (PSD). The "source" in
this case is the target containing the sample and what
happens is that as the excited ions travel down the flight
path many will tend to literally fall apart due to bond
breakage induced by the absorbed energy. Suppose we take a
very simple case where we have a peptide ion with mass 900
that breaks into two pieces, with mass 300 and 600
respectively, as it travels down the flight tube. Since each
of these pieces will tend to retain a correspondingly
similar fraction of the total energy that was originally
imparted to the ion (i.e., one-third and two-thirds
respectively), the energy and mass of each will be decreased
in the same proportion and the E/(m/z) ratio will
remain constant. Hence, based on rearranging the above
equation to:
2E/(m/z) = v²
both these ions will have the exact same velocity and
both will arrive at the linear detector at the same time -
hence they will together only result in one m/z
peak. A good feature of this observation is that it allows
us to use the linear mode to screen peptides for purity -
that is we do not "see" mass peaks due to post source decay
(where a single peptide mass ion might decay to give several
m/z peaks and thus falsely indicate that a pure
peptide is a mixture of species). A limitation of a linear
instrument is that the sequencing information present as a
result of post source decay is lost and the mass accuracy
(see below) is less than on a reflectron equipped
instrument. In order to separate the fragment ions that
reach the linear detector simultaneously, a reflectron
instrument has an ion mirror, which is essentially a
potential field, at the end of the flight tube. Those ions
that are larger and that have more energy penetrate farther
into the field and hence are slowed down relative to lower
mass ions that do not penetrate as far into the ion mirror.
Hence, all of the fragment ions are separated and the
resulting differences in mass between these species
sometimes can be used to infer partial sequences of the
parent peptide. In practice, interpretation of MS/MS post
source decay sequencing data is not routine and its quality
is much lower than that of the MS/MS fragmentation data obtained from
a conventional mass spectrometer designed for MS/MS
sequencing like the Q-Tof. In this latter case collision
induced dissociation (CID) with a "heavy" gas such as argon
is used to more completely fragment the peptide ions.
However, post source decay data has proven useful
particularly for identifying sites of posttranslational
modification - such as phosphorylation sites that are very
difficult to identify via direct Edman sequencing.
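The mass 900 example above can be checked numerically. The sketch below works in arbitrary but consistent units and makes two simplifying assumptions that are not taken from the text: each charged fragment keeps a share of the parent's kinetic energy exactly proportional to its mass, and the ion mirror is an idealized single stage with a uniform retarding force. The function names are hypothetical.

```python
import math


def velocity(energy: float, mass: float) -> float:
    """Speed from kinetic energy, E = 1/2 m v^2 (arbitrary consistent units)."""
    return math.sqrt(2.0 * energy / mass)


def fragment_energy(parent_energy: float, parent_mass: float,
                    fragment_mass: float) -> float:
    """Kinetic energy retained by a fragment, proportional to its mass."""
    return parent_energy * fragment_mass / parent_mass


def mirror_turnaround_time(energy: float, mass: float,
                           retarding_force: float = 1.0) -> float:
    """Time to stop and re-accelerate in a uniform retarding field: t = 2 m v / F."""
    return 2.0 * mass * velocity(energy, mass) / retarding_force


PARENT_MASS = 900.0
PARENT_ENERGY = 900.0  # arbitrary units

for mass in (900.0, 600.0, 300.0):
    energy = fragment_energy(PARENT_ENERGY, PARENT_MASS, mass)
    print(mass, velocity(energy, mass), mirror_turnaround_time(energy, mass))
```

All three ions have the same velocity, so they are indistinguishable at the linear detector, but the time each spends in the idealized mirror grows with its mass, which is what allows the reflectron to separate them.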
Another major advantage of the reflectron is that it
permits higher mass accuracy. The reason is that during high
voltage extraction of the peptide ions produced by exposure
to UV light, there are slight differences in the amount of
energy that is actually acquired by similarly charged ions.
In a linear instrument these differences result in slight
differences in times of flight which results in broader
peaks and lower mass accuracy. As described previously in
terms of resolving fragment ions, a reflectron also
compensates for similarly charged ions having slightly
different overall energies (again, the more energetic ions
that have slightly faster velocities will penetrate further
into the ion mirror and hence be slightly delayed relative
to less energetic ions - thus both will tend to reach the
detector at the same time). As a result, a reflectron
improves both resolution and mass accuracy. Although there
is always the possibility of observing fragmentation ions
when using the reflectron (and mistaking these for
contaminating peptide ions), by adjusting the settings on
the instrument it is possible to minimize the possibility of
seeing peptide fragmentation in the reflectron mode. Perhaps
the best approach, however, is to first screen for purity in
the linear mode and then obtain a more accurate mass by
shooting the target again in the reflectron mode. Typically,
we observe average mass accuracies (with external
calibration and when operating at very high sensitivity) of
about ±0.1% in the linear and ±0.05% (or lower) in
reflectron mode for tryptic peptides that often range in
mass between about 800 and 3,000 daltons. For an average
1,500 dalton peptide this means an expected error of about
±1.5 amu in linear and about ±0.75 in reflectron
mode. While either of these mass accuracies should be
sufficient to detect oxidation of methionine (+16) neither
would be sufficiently accurate to detect deamidation of an
asparagine (+1). In addition to making it easier to "fill in
holes" left from conventional Edman sequencing, increased
mass accuracy is also essential for identifying known
proteins based on peptide mass searching (see below). Using
this approach several peptide masses that we may obtain via
"off-line" MALDI-MS on an aliquot of a digest of the protein
can be used to identify the parent protein if its sequence
is in the database.
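The mass accuracy arithmetic above can be reproduced with a short sketch. The percentage accuracies and the 1,500 dalton example come from the text. The decision rule, that a mass shift is only resolvable when it exceeds twice the expected error so that the two error windows do not overlap, is an assumption made here for illustration, and the function names are hypothetical.

```python
def mass_error_da(mass_da: float, accuracy_percent: float) -> float:
    """Expected absolute mass error in daltons for a given percent accuracy."""
    return mass_da * accuracy_percent / 100.0


def shift_is_resolvable(shift_da: float, mass_da: float,
                        accuracy_percent: float) -> bool:
    """True when the error windows of the shifted and unshifted masses do not overlap.

    Assumed criterion: |shift| must exceed twice the expected error.
    """
    return abs(shift_da) > 2.0 * mass_error_da(mass_da, accuracy_percent)


PEPTIDE_MASS = 1500.0
for mode, accuracy in (("linear", 0.1), ("reflectron", 0.05)):
    error = mass_error_da(PEPTIDE_MASS, accuracy)
    print(mode, f"+/-{error:.2f} Da",
          "Met oxidation (+16):", shift_is_resolvable(16.0, PEPTIDE_MASS, accuracy),
          "Asn deamidation (+1):", shift_is_resolvable(1.0, PEPTIDE_MASS, accuracy))
```

With these numbers the +16 shift is resolvable in both modes while the +1 shift is resolvable in neither, in line with the conclusion stated above.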
A very recent improvement in MALDI-MS that can be used on
both linear and reflectron equipped instruments is a technique called
delayed extraction. In this case the high voltage field that results in
movement of the ions down the flight path is turned on several
microseconds after the target is hit with the UV laser, rather than
simultaneously with the laser pulse.
What is the difference between monoisotopic and average
peptide mass?
As shown in the table below, the atoms that make up the
naturally occurring amino acids found in proteins are not
isotopically pure.
Natural Abundance of Isotopes Commonly Found in Proteins

| Atom     | Most Abundant Isotope | Abundance | Next Most Abundant Isotope | Abundance |
|----------|-----------------------|-----------|----------------------------|-----------|
| Carbon   | 12C                   | 98.9%     | 13C                        | 1.11%     |
| Nitrogen | 14N                   | 99.6%     | 15N                        | 0.366%    |
| Oxygen   | 16O                   | 99.8%     | 18O                        | 0.204%    |
| Sulfur   | 32S                   | 95.0%     | 34S                        | 4.22%     |

As a result,
peptide mass spectra acquired at low resolution provide only
an average mass which reflects the relative abundances of
these isotopes. In contrast, peptide mass spectra acquired
at high resolution (e.g., MALDI-MS data acquired up
to about 3,000 Da with a reflectron) will show a cluster of
peaks spaced one mass unit apart. The lowest
mass peak in each cluster results from peptide molecules
which are composed entirely of atoms of
the most abundant isotopes listed above. The next
highest mass peak, which will appear at +1 amu, consists
of peptide molecules which contain a single
13C or 15N atom (a single 18O or 34S atom instead
shifts the mass by +2 amu). Typically, an average 15-mer tryptic
peptide might appear in a high resolution MALDI spectrum as
a cluster of 4-5 peaks with the first two peaks, which might
be approximately equal in size, being the most intense.
Peptide mass database searching requires that the
monoisotopic peak (resulting from peptide molecules which
contain only the most abundant isotopes listed above) be
accurately identified. Great care must be taken to identify
the monoisotopic peptide ion peaks when the signal to noise
ratio is low, as it often is when <5% aliquots of Coomassie Blue
stained gel bands that contain substantially
less than picomole amounts are analyzed, and when there is
substantial overlap between the isotope envelopes of different
peptide ions, which becomes much more likely as the MW of
the protein increases substantially above 50,000. The
expertise and labor required to complete this
task account for a significant fraction of the cost of
the Manual
MALDI-MS Protein Identification Service.
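The shape of an isotope cluster can be estimated from the abundances in the table. The sketch below is deliberately simplified: it considers 13C only (ignoring 15N, 18O and 34S), treats each carbon as independently 13C with probability 1.11%, and assumes a 15-mer contains about 75 carbon atoms, roughly five per residue. That carbon count and the function name are assumptions made for illustration.

```python
from math import comb

P_13C = 0.0111  # natural abundance of 13C, from the table above


def carbon_isotope_pattern(n_carbons: int, max_extra: int = 4) -> list[float]:
    """Binomial probability that a molecule holds k 13C atoms, for k = 0..max_extra.

    Index 0 is the monoisotopic peak, index 1 the +1 amu peak, and so on.
    Only carbon is considered; other elements are ignored in this sketch.
    """
    return [
        comb(n_carbons, k) * P_13C**k * (1.0 - P_13C) ** (n_carbons - k)
        for k in range(max_extra + 1)
    ]


N_CARBONS_15MER = 75  # assumed carbon count for an average 15-mer
for extra, fraction in enumerate(carbon_isotope_pattern(N_CARBONS_15MER)):
    print(f"+{extra} amu: {fraction:.3f}")
```

On this carbon-only estimate the monoisotopic peak and the +1 peak are of comparable size and together dominate the cluster. Including 15N would raise the +1 peak slightly further, which is consistent with the first two peaks being approximately equal as described above.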
MALDI-MS in the Keck Laboratory
Matrix assisted laser desorption ionization mass
spectrometry is routinely carried out on 1.0 µl
(usually <5%) of in gel digests (to enable peptide mass
database searching - see below) and on aliquots of peptides
isolated via reverse phase HPLC. As noted, the MALDI mass
spectrum provides an additional criterion to judge peptide
purity and the mass of the peptide is often helpful as a
further confirmation of the sequence determined by Edman
degradation. In each case, the sample is mixed with 1.0
µl of alpha-cyano-4-hydroxy cinnamic acid (CHCA) matrix
solution and then spotted onto a new target. The samples are
then allowed to air dry at room temperature. To avoid any
possible cross-contamination, all targets are used only
once. The alpha-CHCA matrix solution is prepared at a
concentration of 4.5 mg/ml in 50% CH3CN, 0.05%
TFA and is used after vortexing and standing for a few
minutes. Matrix solutions are prepared fresh daily. The
calibrants used for external calibration of peptides are
bradykinin (average (M+H) is 1061.23) and ACTH Clip (average
(M+H) is 2466.70). Both calibrants are stored at -20 degrees
C as 500 fmol/µl stocks in 50% CH3CN, 0.05%
TFA. The same calibrants (100 fmol each) are used for
internal calibration of in gel digests except that under the
conditions used for this mass spectrometry we are able to
calibrate on the monoisotopic (M+H) mass of bradykinin which
is 1060.57. Generally, we observe a mean mass error of about
0.1% with external calibration. Since the standard deviation
for the mass error with external calibration is 0.1%, there
is a 68.3% probability the observed mass will be within
± 0.1%, a 95.4% probability it will be within
±0.2% and a 99.7% probability it will be within
±0.3% (i.e., within 3 standard deviations) of the
theoretical mass.
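The probabilities quoted above are the usual one, two and three standard deviation coverages of a normal distribution. The sketch below reproduces them, assuming (as the text implies but does not state) that the mass errors are normally distributed about the theoretical mass with a standard deviation of 0.1%. The function name is hypothetical.

```python
import math


def probability_within(k_sigma: float) -> float:
    """Probability that a normally distributed error lies within +/- k standard deviations."""
    return math.erf(k_sigma / math.sqrt(2.0))


SIGMA_PERCENT = 0.1  # standard deviation of the mass error, from the text
for k in (1, 2, 3):
    print(f"within +/-{k * SIGMA_PERCENT:.1f}%: {100.0 * probability_within(k):.1f}%")
```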
The Micromass M@LDI-L/R is a linear /
reflectron time-of-flight mass spectrometer designed for high
throughput protein identification. This system can be used to analyze
intact protein/peptide masses (normally done in linear mode up to about
m/z = 400,000) or protein digests (reflectron mode). Often, samples
destined for an intact mass contain salts that have to be removed prior
to analysis using a C18 or C4 ZipTip (Millipore). Trypsin digests do
not typically require de-salting, so 1.8µl of the digest mixture can be
applied directly to the MALDI target after mixing with matrix plus
internal standards (i.e., 50 fmol bradykinin and 150 fmol ACTH clip
which have protonated monoisotopic masses of 1060.569 and 2465.199
respectively). The target plate has clusters of four sample spots
centered about a central spot which contains the external calibration
standards. The target plate can hold 96 samples plus the external
calibration standards. The M@LDI-L/R system uses a nitrogen laser (337
nm) and a "fuzzy logic" algorithm to locate the best laser position on
the target sample in an automated mode of operation. Spectra are
acquired in a data dependent mode with the data system deciding when
and if an acceptable spectrum has been acquired. It also decides when
to abort further acquisition and to move on to the next sample. The
collected spectra for each sample are averaged together, and the
calibration is adjusted using the "lockmass" internal standard. Mass
lists for database searching are made by hand and searched using
ProFound and Mascot against the NCBInr protein database. A printout of the
search results is then produced which lists the comparative score for
possible protein identifications for each sample from which acceptable
spectra were acquired. For additional information on the database
searching algorithms, please visit the above web sites for each.
It is important to understand that the output from both
the Mascot and ProFound
searching programs is dependent not only upon the mass
spectrometry data but also upon several input parameters.
The following provides a brief discussion of each of these
parameters.
- Taxonomic Category: The ProFound
program allows the taxonomic category to be specified and
since this is always known beforehand the first
inclination might be to always take advantage of this
information to quickly narrow the searches. However, if
the complete genome for the species from which the
protein derives is not known (e.g., human), it
is possible the identification might be missed if the
search is limited to human proteins. That is, if a highly
homologous protein has been sequenced in another
organism, a search that is not limited to human
proteins might succeed whereas a search of the human
database would fail. For this reason the Keck Laboratory
does not specify a particular taxonomic category but
rather searches the entire sequence database.
- Protein Mass Range: Since virtually
all proteins submitted for protein identification are
purified via SDS PAGE, this parameter is almost always
available and again, the first inclination would be to
make maximum use of this information to narrow the
searches. Two problems, however, stand in the way of
using a narrow MW range to enable more facile searches.
The first problem is that molecular weights derived from
SDS PAGE are often significantly in error. For instance,
although the ARPP-21 cAMP regulated phosphoprotein has an
estimated MW of 21,000 as determined by SDS PAGE, the
actual molecular weight of this protein is only 9,561, as
predicted from the sequence of ARPP-21 that was
determined by conventional Edman degradation (Williams
et al (1989)). The second problem that
occurs when a very limited protein mass range is used in
the search is that identifications may be missed if the
form of the protein that is in the database is not the
form that is actually isolated. Since many database
entries are based on translations of nucleic acid
sequences, they will contain, for instance, signal
peptides that are not found in the circulating form of
the protein and conversely, they will not contain
post-translational modifications that may be present.
For example, as noted by Cottrell
(1994), the circulating form of insulin has a
molecular weight of about 5,734 whereas the Swiss Protein
entry (INS_BOVIN) is actually the sequence of the
precursor protein which has a molecular weight of about
11,394. Keeping these factors in mind, we generally carry
out the first peptide mass search with a mass range that
extends from about one-half to twice the SDS PAGE
predicted MW. If the search fails to identify the
protein, then we normally will carry out a second search
without any mass range limitation.
- Peptide Mass Tolerance: The higher the mass accuracy,
the fewer predicted peptides from proteins other than
the one actually digested will match the observed masses
by chance and be scored as false positives, and hence the
higher the specificity of the
search. Currently, we use a mass accuracy tolerance of 70 ppm. To achieve this very
high mass accuracy routinely requires the use of an
instrument (like the Micromass
TofSpec SE in the Keck Laboratory) that is equipped
with delayed extraction. In addition, the sample must be
calibrated by adding internal standards (see above) and
the resolution must be sufficiently high that unit mass
resolution is achieved (i.e., the spectrum must
be acquired under conditions where an m/z of
1,500 is resolved from 1,501) so that the monoisotopic
mass of the peptide (corresponding to the peptide
containing only 12C isotopes - see above) is
determined. If unit mass resolution is not achieved, the
resulting peptide mass will be an average mass, that is,
an average over the 12C
version of the peptide and the same peptide
containing various numbers of 13C,
15N, 18O, and 34S
atoms.
- Cysteine and Methionine
Modification: Since the Keck Laboratory does not
modify either of these residues prior to manual
in gel trypsin digestion, initial searches are
carried out assuming that both these amino acids are in
their native form. Two common modifications that can
occur (and that would prevent peptides that contained
these modifications from being identified) are methionine
oxidation and modification of cysteine by acrylamide free
radicals during SDS PAGE. Although it is difficult to
prevent methionine oxidation, a reasonable approach to
limiting the extent of modification of free cysteine
residues is to limit the concentration of acrylamide free
radicals in the SDS gel by letting the gel stand overnight prior to
use.
- Number of Missed Cleavages:
Incomplete tryptic cleavage (that is, failure to cleave at
every lysine and arginine) may be caused by an adjacent proline residue,
nearby acidic residues, and/or a generally poor digest
that may result from a very low amount of protein, from
the presence of a significant fraction of carbohydrate
and perhaps from higher order protein structure that
remains after SDS PAGE. For these reasons, our initial
peptide mass searches allow one missed cleavage - which
we believe is generally sufficient to reasonably account
for this problem. In the case of the Peptide Search
program, the second pass search will allow as many
missed cleavage sites as necessary to match a measured mass
within the specified tolerance.
- Criteria for Protein Identification:
In contrast
to the Peptide Search program, the ProFound Program gives
a probability score for each protein that is found that
meets the minimum input criteria. The probability score,
which is described in more detail in the ASMS
abstract that may be found on the ProFound
Web site, takes into account a number of factors
including the fraction of predicted peptides that are
matched and the actual error between the peptide masses
predicted for the "identified" protein and the observed
peptide masses. Currently, the minimum ProFound search
criteria we require for a protein identification are a
probability score of 1.0 (i.e., 100%) and that
observed peptide masses must be matched to at least 25%
of the predicted protein sequence. Although most top
scoring proteins that have met our requirements for a
Peptide Search identification have also met our
requirements for a ProFound identification (i.e.,
ProFound score of 1.0), the Keck Laboratory will
only identify a protein through its Manual MALDI-MS
Protein Identification Service if it meets our
requirements for an identification for both the Mascot and ProFound Search algorithms.
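Several of the numeric search settings described in the list above can be expressed as simple checks. The sketch below is not the interface of ProFound, Mascot or Peptide Search; all function names are hypothetical, and only the numbers (the one-half to two-fold mass window, the 70 ppm tolerance, and the ProFound score of 1.0 with at least 25% sequence coverage) are taken from the text.

```python
def first_pass_mass_window(sds_page_mw: float) -> tuple[float, float]:
    """Protein mass range for the first search: one-half to twice the SDS PAGE MW."""
    return (0.5 * sds_page_mw, 2.0 * sds_page_mw)


def in_window(actual_mw: float, sds_page_mw: float) -> bool:
    """True when a protein's actual MW falls inside the first-pass window."""
    low, high = first_pass_mass_window(sds_page_mw)
    return low <= actual_mw <= high


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tolerance_ppm: float = 70.0) -> bool:
    """True when the observed mass matches the theoretical mass within the tolerance."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm


def meets_profound_criteria(probability_score: float,
                            sequence_coverage: float) -> bool:
    """Minimum ProFound criteria from the text: score of 1.0 and at least 25% coverage."""
    return probability_score >= 1.0 and sequence_coverage >= 0.25


# ARPP-21: estimated MW 21,000 by SDS PAGE, actual MW 9,561 (values from the text).
print(first_pass_mass_window(21_000.0), in_window(9_561.0, 21_000.0))
# Monoisotopic (M+H) of bradykinin is 1060.569; the observed values are hypothetical.
print(within_tolerance(1060.60, 1060.569), within_tolerance(1060.70, 1060.569))
```

The ARPP-21 case falls just outside the first-pass window, which illustrates why a second search without any mass range limitation is carried out when the first search fails.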
Confirmation of Manual and Automated MALDI-MS
Protein Identifications Based on Peptide Mass Database
Searching
Several additional factors can be brought into play to
either confirm or weaken an identification based on peptide
mass searching. Firstly, particularly in the case of an
organism like Hemophilus, whose genome has been
sequenced and which encodes a relatively modest number of
proteins (about 1,700 as compared to 80,000 in the human
genome), the identified protein should be from the correct
species. Indeed, this is one of the required criteria for
the high throughput service. Secondly, the MW observed by
SDS PAGE generally will be within a one-half to
two-fold "window" around the predicted MW or there will be a
good explanation for it falling outside this window
(e.g., if the observed mass is less than one-half
the predicted mass the protein may be a limited cleavage
fragment - a hypothesis that can be confirmed by noting the
location in the sequence of the matching peptides). Thirdly,
if the sample was isolated by 2D gel electrophoresis, the
observed versus predicted pI provides an additional
criterion. Fourthly, once a tentative identification has
been made, other approaches (such as Western
Blotting/antibody detection) and the known properties of the
identified protein (e.g., ability to bind nucleic
acids) may be used to quickly bolster or cast doubt on the
identification based on peptide mass searching. Regardless,
the Keck Laboratory strongly recommends that additional
studies (MS/MS protein
identification or "conventional" internal
Edman sequencing) be carried out prior to
drawing firm conclusions. Finally, it must always be kept
in mind that if the sample has not been purified
sufficiently, the major protein in the Coomassie Blue
stained band may not be the protein that is responsible for
the activity being followed. In this regard it is interesting that both
peptide mass and MS/MS data indicated that an "unknown" 2D gel spot
that was analyzed in the Keck Laboratory actually contained at least 4
different proteins.