W.M. Keck Facility
Yale University
300 George Street

Yale University School of Medicine

MICROARRAY DATA ANALYSIS OVERVIEW

Microarray technology is a powerful tool for medical and biological research because it allows the expression levels of thousands of genes to be monitored simultaneously. Running a microarray experiment and obtaining the results is not the end of the work but only the beginning: microarray experiments generate very large amounts of data, and making sense of that data requires sophisticated statistical software and tools. Many software packages for analyzing microarray data are available from various sources, and we have also developed some advanced statistical analysis methods in house. Careful analysis of these data can provide in-depth insight into gene regulation.

Microarray data can be analyzed in several ways, depending on the research goals. The most basic approach is to identify the genes that are differentially expressed. Cluster analysis is widely used to find groups of genes with correlated expression patterns. Classification methods have proven useful both for identifying expression patterns that correlate with a diagnostic class and for classifying genes according to their functional role. Clustering results, combined with other information, can then be used to interpret the data in terms of biological pathways. Other analyses, such as functional annotation, can also be carried out, depending on the experimental design and the scientific questions being addressed.

Identification of Differentially Expressed Genes

A natural first step in extracting information from microarray data is to look for genes whose expression differs significantly between individual samples or experimental conditions. This simple technique is very effective, for example, in screens for drug targets. Because a microarray chip carries thousands of genes, it is neither possible nor necessary to follow up every one. Statistical tests such as the t-test and ANOVA can identify which genes show good evidence of being differentially expressed. The genes can first be ranked by the strength of that evidence, from strongest to weakest, and a cut-off can then be chosen to select a subset according to a given criterion, either statistical or biological. This simple differential analysis typically reduces the number of genes to follow from several thousand to hundreds or dozens; these are the candidate genes for confirmation and further study.
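
The rank-then-cut-off procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the facility's in-house method: it assumes a two-condition design with expression values that are already normalized (for example, on a log2 scale), and it ranks genes by the absolute value of Welch's t-statistic rather than by p-value so that only the standard library is needed. With Welch's test the two orderings can differ slightly, because the degrees of freedom vary from gene to gene. The gene names, sample values and the cut-off of 4.0 are hypothetical. For p-values, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test, and `scipy.stats.f_oneway` performs a one-way ANOVA when there are more than two conditions.

```python
from statistics import fmean, variance


def welch_t(a, b):
    """Welch's t-statistic (unequal variances) comparing two groups of samples for one gene."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        return 0.0  # no within-group variability: treated here as uninformative
    return (fmean(a) - fmean(b)) / se


def rank_genes(expr, group_a, group_b, cutoff):
    """Rank genes from strongest to weakest evidence (largest |t| first), then keep |t| >= cutoff.

    expr maps a gene name to its expression values, one per sample;
    group_a and group_b are the sample indices of the two conditions.
    """
    scores = {}
    for gene, values in expr.items():
        a = [values[i] for i in group_a]
        b = [values[i] for i in group_b]
        scores[gene] = welch_t(a, b)
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(gene, t) for gene, t in ranked if abs(t) >= cutoff]


# Hypothetical log2 expression values: samples 0-2 are condition A, samples 3-5 condition B.
expr = {
    "geneA": [5.1, 5.3, 4.9, 8.0, 8.2, 7.9],
    "geneB": [6.0, 6.2, 5.8, 6.1, 5.9, 6.0],
    "geneC": [7.5, 7.7, 7.4, 4.2, 4.0, 4.5],
}
candidates = rank_genes(expr, group_a=[0, 1, 2], group_b=[3, 4, 5], cutoff=4.0)
for gene, t in candidates:
    print(f"{gene}\tt = {t:.1f}")  # geneA (lower in A) and geneC (higher in A) pass; geneB does not
```

The cut-off here is a statistical criterion; a biological one, such as a minimum fold change, could be applied to the same ranked list instead.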

Cluster Analysis

Beyond identifying differentially expressed genes, grouping genes from multiple experiments into clusters with similar expression patterns is needed for further functional annotation and diagnostic classification. Genes in the same cluster share a similar expression profile, which suggests that a gene of unknown function may take part in the same functions or pathways as the known genes it clusters with. Methods used to group genes with similar expression patterns include hierarchical clustering, nonhierarchical algorithms such as k-means and self-organizing maps, and related techniques such as principal component analysis.
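
As one concrete illustration of the nonhierarchical methods named above, here is a minimal k-means sketch in pure Python. It is a toy under stated assumptions, not the facility's implementation: each gene's profile is z-scored first, so the squared Euclidean distance between two genes equals 2n(1 - r), where n is the number of experiments and r is their Pearson correlation. Genes therefore cluster by the shape of their expression pattern rather than by absolute level. The deterministic farthest-first initialization, the helper names and the six example genes are hypothetical choices made for this sketch.

```python
from statistics import fmean, pstdev


def zscore(profile):
    """Center a gene's profile and scale it to unit (population) standard deviation."""
    mu, sd = fmean(profile), pstdev(profile)
    if sd == 0:
        return [0.0] * len(profile)  # flat profile: no pattern to compare
    return [(x - mu) / sd for x in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100):
    """Lloyd's k-means on z-scored gene profiles; returns one cluster label per gene."""
    data = [zscore(p) for p in profiles]
    # Deterministic farthest-first initialization: start from the first gene, then
    # repeatedly add the gene farthest from every centroid chosen so far.
    centroids = [list(data[0])]
    while len(centroids) < k:
        farthest = max(data, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member profiles.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [fmean(col) for col in zip(*members)]
    return labels


# Hypothetical profiles of six genes across four experiments.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
profiles = [
    [1, 2, 3, 4], [2, 4, 6, 8], [10, 11, 12, 13.5],  # rising across experiments
    [4, 3, 2, 1], [8, 6, 4, 2], [9, 7, 6, 5],        # falling across experiments
]
labels = kmeans(profiles, k=2)
print(dict(zip(genes, labels)))  # {'g1': 0, 'g2': 0, 'g3': 0, 'g4': 1, 'g5': 1, 'g6': 1}
```

For real data sets, library implementations such as `sklearn.cluster.KMeans` or, for hierarchical clustering, `scipy.cluster.hierarchy.linkage` are the usual choice.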

Classification

Traditional disease classification is based mainly on morphology, pathology and biochemistry. Classification at the level of gene expression, however, may prove more accurate and more useful for diagnosis and treatment.

DNA microarray experiments measure expression profiles for thousands of genes in tissue and cell samples. These profiles can be used to discriminate between different types of tissue, and also to identify new subclasses within an existing phenotype class. Microarray techniques may therefore lead to a more complete understanding of molecular variation among individuals at the gene level. A central challenge in disease treatment is to target specific therapies to genetically distinct disease types, maximizing efficacy and minimizing toxicity, which is why improvements in classification have been crucial to advances in treatment.
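
To make discrimination between sample types concrete, the sketch below implements a nearest-centroid classifier, one of the simplest supervised methods for expression data. It is offered as an illustration, not as the classification method used by the facility. Each class (for example, a tissue or disease type) is summarized by the average expression profile of its training samples, and a new sample is assigned to the class whose average is nearest. The class names and expression values are hypothetical, and the sketch assumes the profiles are already normalized and restricted to informative genes, for example those selected by a differential-expression screen.

```python
from statistics import fmean


def train_centroids(samples, labels):
    """Average the expression profiles of the training samples in each class."""
    by_class = {}
    for profile, label in zip(samples, labels):
        by_class.setdefault(label, []).append(profile)
    return {label: [fmean(gene_values) for gene_values in zip(*members)]
            for label, members in by_class.items()}


def classify(profile, centroids):
    """Assign a sample to the class whose centroid is nearest in squared Euclidean distance."""
    def distance(label):
        return sum((x - c) ** 2 for x, c in zip(profile, centroids[label]))
    return min(centroids, key=distance)


# Hypothetical training data: expression of three genes in six labelled samples.
train = [
    [8.1, 2.0, 5.0], [7.9, 2.2, 5.1], [8.3, 1.9, 4.9],  # type_A samples
    [2.1, 7.8, 5.0], [1.8, 8.1, 5.2], [2.2, 8.0, 4.8],  # type_B samples
]
labels = ["type_A"] * 3 + ["type_B"] * 3
centroids = train_centroids(train, labels)

print(classify([7.5, 2.5, 5.0], centroids))  # type_A
print(classify([2.5, 7.0, 5.1], centroids))  # type_B
```

The third gene barely differs between the classes and so contributes little to either decision. scikit-learn offers the same idea as `sklearn.neighbors.NearestCentroid`, whose `shrink_threshold` parameter shrinks the class centroids toward the overall centroid and reduces the influence of such genes.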

Copyright © 2003, Yale University, New Haven, Connecticut, USA. All rights reserved.

Last modified: 30-Aug-2005 (GB)