Some FAQ about computing the RMA expression measure 

What is RMA?
RMA is the Robust Multichip Average. It consists of three steps: a background adjustment, quantile normalization (see the Bolstad et al. reference) and finally summarization. Some references (currently published) for the RMA methodology are:


Which paper should I cite for RMA?
Ideally you will cite all three. However, if you wish to cite only a single paper then my recommendation is that you cite
You can download bibtex citation information for all three papers by clicking here. 

How does RMA compare to other expression measures?
The Irizarry et al. (2003) Nucleic Acids Research paper compares RMA to the dChip and MAS 5.0 algorithms. Another place to look is the affycomp website, which provides a framework and competition for comparing expression measures.

What software is available for computing RMA expression measures?
There are several freely available options for computing the RMA expression measure.


I want to use software X to compute RMA expression values, is this ok?
Your guess is as good as mine as to whether it does the right thing or not. Unless it is my own software I cannot vouch for its performance one way or the other. Note that in the past I have seen a lot of people call things RMA which I would not necessarily call RMA. If you find, for instance, that the RMA expression values you get from software X differ from those you get from software Y, you should take the issue up with the provider of that software, not with the RMA authors.

How do I use the Bioconductor software to compute RMA expression measures?


I am using Bioconductor, can I use expresso or justRMA instead of rma?
Absolutely. If you prefer to use these functions you are free to do so. The function rma() should be considered the canonical implementation of RMA, but both expresso() and justRMA() should give you the same expression estimates (as of this writing, which refers to version 1.2.27 of the affy package). If you use versions of the package prior to the Bioconductor 1.2 release the values might not agree. To use expresso() to compute RMA expression measures
The other option is to use the justRMA() command. This command was contributed to the affy package by James MacDonald (jmacdon@med.umich.edu). In the underlying code it calls the same C routines used by rma(), but it differs from the rma() and expresso() commands in that it does not require you to use ReadAffy(). Instead, you specify the filenames of your CEL files in the call to the function and it produces an exprSet with the RMA expression measures. The easiest way to use justRMA() is to use


I really don't want to use R, can I get RMA expression measures another way?
If you use a machine with the Windows operating system you might want to consider using RMAExpress, which is a stand-alone program for computing RMA values. This program is open source and can also be compiled on Linux/Unix systems.

I have a really huge dataset with hundreds (or thousands) of arrays that I want to process together. What should I do?
In this situation you are likely to struggle with memory problems if you just try using rma(). As mentioned above, justRMA() is another option which is more memory efficient, but it may still run into memory problems. A third option is to try justRMAlite() in AffyExtensions. If you want to go non-R based, then RMAExpress is another option with lower memory overhead and the ability to scale to large datasets.

Can I get standard error estimates since rma() does not seem to return any?
Yes. If you use expresso() then you will get ad hoc estimates of the standard errors. Another option is to use the affyPLM package, which provides other methods of robust linear model fitting (beyond the median polish, for which there is no specific way of computing a standard error estimate) together with standard error estimates.

Should I compute RMA expression measures separately for different treatment groups?
We generally recommend that you compute your expression measure for all chips, irrespective of treatment (condition), together as one batch. This is because we have observed that non-biological variation is not as easily reduced if chips are analysed in groups.

I have MGU74A and MGU74Av2 chips which I want to combine in an analysis. What should I do?
One option is to work with the two sets separately and then somehow combine the two datasets. The other option is to remove the probesets which are not on both versions of the array and work only with the common probesets. You can read more about that option here: Mixture CDFenv.

I am using affy/Bioconductor. How much memory do I need? What is the maximum number of arrays I can process to get RMA estimates?
The general rule is the more memory you can afford the better. I would recommend that if you want to do a moderate amount of low-level analysis you have at least one GB of RAM. The number of arrays that you can process (and how long it might take) depends on a number of things including


Can you please explain the RMA background?
The traditional RMA background adjustment assumes that the observed signal is the convolution of a Normal background (N) with mean mu and variance sigma^2 and an exponential signal (S) with mean alpha. That is, we observe O = S + N. Under these conditions the background-corrected intensity values are given by E(S | O). You can see the exact formula in this document. You can find a derivation of this formula with greater discussion on pages 17-21 of this Dissertation.
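
As an illustration only, the conditional expectation can be worked out directly from the stated model. With N Normal(mu, sigma^2) and S exponential with mean alpha, the distribution of S given O = o is a Normal with mean a = o - mu - sigma^2/alpha and standard deviation sigma, truncated to S >= 0, so that E(S | O = o) = a + sigma * phi(a/sigma) / Phi(a/sigma), where phi and Phi are the standard Normal density and distribution function. The Python sketch below evaluates that expression. It is not the affy code; the parameter values are hypothetical, the way RMA estimates mu, sigma and alpha from the data is not shown, and the exact formula in the linked document may differ in detail. Note also that some descriptions parametrize the exponential by its rate rather than its mean, in which case sigma^2/alpha becomes sigma^2 * alpha.

```python
import math


def normal_pdf(z):
    """Standard Normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard Normal distribution function Phi(z), via erfc for tail accuracy."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def background_adjust(o, mu, sigma, alpha):
    """Return E(S | O = o) for O = S + N.

    Assumptions (from the model described above):
      N ~ Normal(mu, sigma^2), S ~ Exponential with MEAN alpha, S and N independent.
    Limitation: for observations very far below mu, Phi(a/sigma) underflows to
    zero and the division fails; a production implementation would guard this.
    """
    a = o - mu - sigma * sigma / alpha
    return a + sigma * normal_pdf(a / sigma) / normal_cdf(a / sigma)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the shape of the adjustment.
    mu, sigma, alpha = 100.0, 20.0, 500.0
    for observed in (60.0, 100.0, 200.0, 1000.0):
        print(observed, background_adjust(observed, mu, sigma, alpha))
```

Two properties are worth noticing: the adjusted value is always positive, even when the observed intensity is below the background mean, and for bright probes it is essentially o - mu - sigma^2/alpha.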

I just want to run the background and/or normalization steps of the RMA method (no summarization). How might I do it?
If you are interested in applying the RMA preprocessing to just the PM probes of an AffyBatch called Data, you might find the following code slightly more efficient than applying the standard operations bg.correct() and normalize() to an AffyBatch:
my.PM <- apply(pm(Data), 2, bg.adjust)
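
For readers who want to see what the normalization half of this preprocessing does to a probes-by-arrays matrix of PM intensities, here is a minimal Python sketch of quantile normalization. It is not the routine used by the affy package; the function name quantile_normalize and the example matrix are made up for illustration, and ties are handled in a simplified way (by row order).

```python
def quantile_normalize(matrix):
    """Quantile-normalize the columns of a probes-by-arrays matrix.

    matrix: list of rows, each row holding one intensity per array.
    Returns a new matrix in which every column contains the same set of values.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Reference distribution: the mean, across arrays, of the k-th smallest value.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(sc[k] for sc in sorted_cols) / n_cols for k in range(n_rows)]

    # Replace each value by the reference value that has the same rank in its column.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result


if __name__ == "__main__":
    # Hypothetical intensities: 4 probes (rows) on 3 arrays (columns).
    pm = [
        [5.0, 4.0, 3.0],
        [2.0, 1.0, 4.0],
        [3.0, 4.0, 6.0],
        [4.0, 2.0, 8.0],
    ]
    for row in quantile_normalize(pm):
        print(row)
```

After this step each array has exactly the same empirical distribution of probe intensities, and only the assignment of values to probes differs between arrays.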


Are there any papers where the RMA expression measure has been used?
Yes. There have been a number; a subsample is below. If you would like to add your paper to this list please email me.


Is it proper to say "I normalized my data using RMA"?
It is not really precise enough to say this. Normalization is one stage in the process of computing the RMA expression measure, but certainly not the only one. It is more correct to talk about having "computed RMA expression values" or to say that you used "RMA preprocessing". More details on this issue are addressed in this BioC mailing list posting. All that said, the term "normalization" has become pretty much synonymous with preprocessing in the microarray analysis context, and I am not the one to fight against this tidal wave of behavior, so you are free to make your own choice.

How come the distribution of expression values across arrays is not identical? I thought quantile normalization was meant to ensure that, wasn't it?
Answering the second question first: yes, quantile normalization as it is implemented in the RMA procedure does ensure identical distributions on all the arrays normalized together. However, in RMA the quantile normalization step is carried out at the probe level rather than the probeset level. This means that the distribution of probe intensities is identical across arrays before the median polish summarization. After summarization you will typically find that the distributions are close, but not identical. Arrays of really poor data quality may have significantly different distributions after summarization.
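
A toy example may make this concrete. The Python sketch below uses a textbook-style median polish (an assumption for illustration, not the C code used by rma()) to summarize two made-up probesets measured on two arrays. The probe-level values on the two arrays are permutations of the same six numbers, as they would be after quantile normalization, yet the probeset summaries on the two arrays do not form the same set of values.

```python
from statistics import median


def median_polish_summary(block, max_iter=10, tol=1e-8):
    """Summarize one probeset by median polish.

    block: probes-by-arrays values for a single probeset (log scale assumed).
    Returns one summary value per array: overall effect plus column effect.
    """
    n_rows, n_cols = len(block), len(block[0])
    resid = [list(row) for row in block]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep out row (probe) medians.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep out column (array) medians.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
        # Stop once a full sweep no longer changes anything.
        if max(abs(x) for x in row_med + col_med) < tol:
            break
    return [overall + c for c in col_eff]


if __name__ == "__main__":
    # Hypothetical, already quantile-normalized data: each array (column)
    # carries the values 1..6, assigned to different probes.
    probeset_1 = [[1.0, 1.0], [2.0, 2.0], [3.0, 6.0]]
    probeset_2 = [[4.0, 3.0], [5.0, 4.0], [6.0, 5.0]]
    summaries = [median_polish_summary(p) for p in (probeset_1, probeset_2)]
    for array_index in range(2):
        print("array", array_index, sorted(s[array_index] for s in summaries))
```

Because the summary is computed within each probeset, it depends on which probes received which values, not just on the overall distribution of values on the array.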

DEFUNCT QUESTION: The RMA values I computed using Bioconductor version 1.1 differ from those I compute in version 1.2, how do I duplicate 1.1 results using 1.2?
If you are using the affy package from the 1.2 release, you may duplicate results by using the parameter bgversion=1 in your rma() function call. The two versions differ only in the background correction step.

