Statistical Analysis of Low-level High Density Oligonucleotide Array Data

Benjamin M. Bolstad
University of California, Berkeley
Berkeley CA USA
bolstad@stat.berkeley.edu

When running an experiment using the Affymetrix GeneChip(R) system, it is important to have good quality gene expression estimates. At the low level, we work with Perfect Match (PM) probe intensity data. A GeneChip(R) carries multiple PM probes for each gene, all interrogating that same gene, and the information from these probes may be combined to compute an expression estimate. I will discuss the three major steps used when computing the Robust Multi-chip Average (RMA) expression measure: background correction, normalization and summarization. Background correction is the process of removing noise, particularly in the low intensity range. The goal of normalization is to remove unwanted non-biological sources of variation. To normalize data from multiple arrays, RMA makes use of a simple non-parametric, non-linear algorithm called quantile normalization. Summarization is the process by which information from multiple probes is combined to compute an expression measure. In the context of RMA, this is done by fitting a robust form of linear model to probe information from multiple chips. I will then compare RMA with other established expression measures using two key statistical criteria: bias and variance.
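
As an illustration of the quantile normalization step mentioned above, the following is a minimal sketch in plain Python, not the RMA implementation itself. It assumes every array has the same number of probes and no missing values; the function name quantile_normalize and the toy intensities are hypothetical, and tied intensities are not given any special treatment. Each array's k-th smallest intensity is replaced by the mean of the k-th smallest intensities across all arrays, so that every array ends up with the same empirical distribution.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of probe intensities per array).

    Assumes all arrays have the same length and contain no missing values.
    Ties are broken by position rather than averaged (a simplification).
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # For each array, the probe indices ordered from smallest to largest intensity.
    orders = [sorted(range(n_probes), key=a.__getitem__) for a in arrays]

    # Reference distribution: mean of the k-th smallest intensity across arrays.
    reference = [
        sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
        for k in range(n_probes)
    ]

    # Give each probe the reference value that matches its rank within its array.
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Toy example: three arrays, four probes each (made-up intensities).
toy_arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 3.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
toy_normalized = quantile_normalize(toy_arrays)
```

After normalization, the sorted values of every array are identical, while the rank order of probes within each array is unchanged.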
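
The summarization step can be sketched in a similar way. The abstract only says that a robust linear model is fitted to probe data from multiple chips; the sketch below assumes the additive model log2(PM) = chip effect + probe effect + error for a single probe set and fits it with Tukey's median polish, one common robust fitting procedure. The function name, the iteration limits, and the toy data are hypothetical, and the input is assumed to be already background corrected and normalized.

```python
from math import log2
from statistics import median


def median_polish_summarize(pm, max_iter=10, tol=1e-6):
    """Summarize one probe set into one log2 expression value per chip.

    pm: list of rows, one row per probe, one column per chip (positive intensities).
    Fits log2(pm[i][j]) = overall + probe_effect[i] + chip_effect[j] + residual
    by alternately sweeping out row (probe) and column (chip) medians.
    """
    resid = [[log2(v) for v in row] for row in pm]
    n_probes, n_chips = len(resid), len(resid[0])
    overall = 0.0
    probe_eff = [0.0] * n_probes
    chip_eff = [0.0] * n_chips

    for _ in range(max_iter):
        # Sweep probe (row) medians out of the residuals.
        row_med = [median(row) for row in resid]
        for i, m in enumerate(row_med):
            probe_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        shift = median(chip_eff)
        chip_eff = [c - shift for c in chip_eff]
        overall += shift

        # Sweep chip (column) medians out of the residuals.
        col_med = [median(resid[i][j] for i in range(n_probes)) for j in range(n_chips)]
        for j, m in enumerate(col_med):
            chip_eff[j] += m
            for i in range(n_probes):
                resid[i][j] -= m
        shift = median(probe_eff)
        probe_eff = [p - shift for p in probe_eff]
        overall += shift

        # Stop once a full sweep no longer changes the fit.
        if max(abs(m) for m in row_med + col_med) < tol:
            break

    # Expression measure per chip: overall level plus that chip's effect.
    return [overall + c for c in chip_eff]


# Toy probe set: 3 probes (rows) on 3 chips (columns), made-up intensities.
toy_pm = [
    [128.0, 256.0, 512.0],
    [256.0, 512.0, 1024.0],
    [512.0, 1024.0, 2048.0],
]
toy_expression = median_polish_summarize(toy_pm)
```

Because medians rather than means are swept out, a single aberrant probe on one chip has limited influence on that chip's summarized value, which is the sense in which the fit is robust.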