Interpretation of the Four-Color Plot of the Fluorescent Electrophoretic Data
On average, using good template and primer, Taq/dye-terminator
cycle sequencing will provide 500-600 bases of sequence with 98-99%
accuracy (exceptional template-primer combinations will yield
650-750 bases with 98% accuracy). After 600-650 bases, the resolution
between peaks decreases and the software has difficulty determining
the exact number of bases in runs of the same base. Consequently,
the error rate usually increases dramatically and may be as much as
10% at 550-650 bases. Furthermore, because the software assumes
uniform peak spacing when calling bases, it is slightly biased
toward inserting extra bases. Thus, one should be somewhat
conservative in data interpretation, particularly when designing
primers for primer walking. In general, for the best chance of
synthesizing a primer with the correct sequence and to provide
sufficient overlap between successive sequencing reads, design the
primer (see guidelines on the back of the sample sheet) in the
region between bases 450 and 550.
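When picking the primer region from a base-called text file, it is easy to be off by one. A minimal Python sketch, assuming the read is a plain string of base calls numbered from 1 as on the printout; `primer_design_window` and `expected_errors` are hypothetical helpers for illustration, not part of any sequencing software:

```python
def primer_design_window(read: str, start: int = 450, end: int = 550) -> str:
    """Return bases start..end (1-based, inclusive) of a base-called read.

    The default window follows the 450-550 guideline in the text.
    """
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    if len(read) < end:
        raise ValueError(f"read has only {len(read)} bases; window ends at {end}")
    # Python slices are 0-based and end-exclusive, so shift only the start.
    return read[start - 1:end].upper()


def expected_errors(n_bases: int, accuracy: float) -> float:
    """Average number of miscalled bases at a given per-base accuracy."""
    return n_bases * (1.0 - accuracy)
```

For a 600-base read, `expected_errors(600, 0.98)` is about 12 and `expected_errors(600, 0.99)` about 6. These are averages over the whole read; as noted above, the errors concentrate toward the 3' end.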
Unlike AmpliTaq polymerase, Taq FS polymerase shows greatly reduced
discrimination among the four fluorescent dideoxynucleotide
terminators, so peak heights are relatively uniform. Despite this
more uniform terminator incorporation, Taq FS data do exhibit some
recognizable patterns that are useful for sequence interpretation:
- G's following A's are weak and may be so weak that the G peak
drops out entirely.
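To know where to look for this pattern, one can list every called G that immediately follows an A. A minimal Python sketch, assuming 1-based positions as on the printout; the helper name `weak_g_positions` is hypothetical:

```python
def weak_g_positions(read: str) -> list[int]:
    """1-based positions of called G's that immediately follow an A.

    In Taq FS dye-terminator data such G peaks tend to be weak,
    so these positions are worth checking against the trace.
    """
    seq = read.upper()
    return [i + 1 for i in range(1, len(seq)) if seq[i] == "G" and seq[i - 1] == "A"]
```

For example, `weak_g_positions("AGTAGG")` gives `[2, 5]`. The function only marks G's that were actually called; a G that dropped out entirely leaves no base to flag, so the trace after each A should still be inspected wherever a G is expected.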
Editing the Sequence File
Sequence data are provided in a computer-readable format via an
FTP server. Usually the sequence is in GCG format, ready for use by the University of
Wisconsin Genetics Computer Group (UWGCG) programs, which are available on the
VAX computer in the Yale Biomedical Computing Unit. Before analyzing the raw sequence data or aligning it
with previously determined sequence, use a sequence editing program
with the electropherogram as a guide to truncate the sequence by
removing:
- unreliable data at the beginning of the sequence (usually the
first 10-20 bases) which is due to the analysis software starting
base calling before a uniform stream of fluorescent peaks is
present in the electrophoretic data.
- any relevant vector sequences.
- unreliable data at the 3' end of the sequence (beginning in
the region of 550-650 bases for ds plasmid DNA and large PCR
fragments) which is due to the decreasing resolution of large DNA
fragments (broadening and overlap of fluorescent peaks).
- for PCR products, data past the physical end of the PCR
fragment.
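The truncation steps above can be sketched as a single Python function. This is an illustration under stated assumptions: `trim_read` and its parameter names are hypothetical, the defaults are taken from the ranges in the text (first 20 bases dropped, 3' cutoff at base 550), and vector removal is simplified to an exact match of a known vector flank on the 5' side, whereas real vector screening tolerates mismatches and may need to check both ends.

```python
from typing import Optional


def trim_read(
    read: str,
    leading: int = 20,
    max_length: int = 550,
    vector_flank: Optional[str] = None,
    product_end: Optional[int] = None,
) -> str:
    """Truncate a base-called read following the steps in the text (a sketch).

    leading: number of unreliable bases to drop from the 5' end (text: 10-20).
    max_length: 1-based position of the last base kept at the 3' end.
    vector_flank: hypothetical known vector sequence next to the insert; if an
        exact match is found, everything up to and including it is removed.
    product_end: for PCR products, 1-based read position of the fragment's
        last base; data past it are discarded.
    """
    seq = read.upper()
    start = leading  # 0-based index of the first base kept
    end = min(len(seq), max_length)  # 0-based, exclusive
    if product_end is not None:
        end = min(end, product_end)
    if vector_flank:
        # Simplification: exact, case-insensitive match only.
        hit = seq.find(vector_flank.upper())
        if hit != -1:
            start = max(start, hit + len(vector_flank))
    return seq[start:end] if start < end else ""
```

The cut points should still be confirmed against the electropherogram, since the function only applies fixed positions.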
If you need assistance in interpreting your
sequence data, please call us at 737-2566 or email dnasequencing@yale.edu.