Why do my MAS 5.0 values differ? 

Updated June 8, 2006. The previous results are to be found here. The recent changes have improved agreement even more.
What am I talking about?
I am talking about the MAS 5.0 values that you might get when using the Bioconductor package affy, and in particular how these values compare with those from the Affymetrix MAS software. Please note that this document refers to results from version 1.11.1 of affy, but they should apply to all later versions. The results may not hold true for older versions of the affy package.

How were the MAS methods implemented in affy?
The methods were implemented based upon the available documentation. A particularly useful reference is the Statistical Algorithms Description Document by Affymetrix. Our implementation follows what is written in that documentation and, as you might appreciate, there are places where the documentation is less than clear. All source code of our implementation is available. You are free to read it and suggest fixes.

Using public data to compare implementations
I will use data from 30 liver arrays from the Genelogic dilution study, which may be obtained here. I will ignore normalization and just look at unscaled expression values.

Computing MAS numbers using "affy"
The expression values were computed using the expresso command. This document refers to results from version 1.11.1 of affy, but other versions should behave similarly. The exact command used was

Comparing values
First make an M vs A plot comparing the MAS values from "affy" (we will refer to these as values from E*) to the MAS values from Affymetrix (we will refer to these as values from A*). Such a plot is shown for 4 arrays below.
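
For readers who want to reproduce this kind of plot, here is a minimal sketch of how M and A could be computed for one array. The function name ma_values and the simulated Estar and Astar vectors are hypothetical stand-ins, and the sign convention (E* minus A*) is an assumption. This is not the code used to produce the original figures.

```r
# Compute M (log2 ratio) and A (average log2 expression) for one array.
# Sign convention (E* minus A*) is an assumption made for this sketch.
ma_values <- function(estar, astar) {
  stopifnot(length(estar) == length(astar))
  data.frame(
    M = log2(estar) - log2(astar),
    A = (log2(estar) + log2(astar)) / 2
  )
}

# Simulated stand-ins: real values would come from affy (E*) and from
# the Affymetrix software (A*).
set.seed(1)
Astar <- round(2^runif(1000, min = 4, max = 14), 1)
Estar <- Astar * 2^rnorm(1000, mean = 0, sd = 0.002)

ma <- ma_values(Estar, Astar)
plot(ma$A, ma$M, pch = ".", xlab = "A", ylab = "M")
abline(h = 0, col = "red")  # perfect agreement lies on this line
```

Points on the horizontal line M = 0 are probesets where the two implementations agree exactly.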

Rounding has an effect
MAS 5 values from A* are typically rounded to 1 decimal place. Applying the same rounding to our values from E* and then repeating the same plots, we get far fewer disagreements. My intuition is that the funnel-like shapes are also rounding issues (just at the edges of the rounding region).
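
The effect of rounding can be illustrated with a small sketch. The six values below are made up for illustration and are not taken from the dilution data. The last one is chosen to sit just on the wrong side of a rounding boundary, which is the kind of case suspected of producing the funnel shapes.

```r
# Hypothetical values for six probesets on one array.
Astar <- c(101.3, 58.7, 1203.4, 12.1, 340.0, 75.5)      # rounded, as MAS 5.0 reports them
Estar <- c(101.3412, 58.6871, 1203.449, 12.0502,
           340.0499, 75.44996)                          # unrounded, as affy returns them

M_raw     <- log2(Estar) - log2(Astar)            # compare as-is
M_rounded <- log2(round(Estar, 1)) - log2(Astar)  # round E* first, then compare

n_raw     <- sum(M_raw != 0)      # probesets differing before rounding
n_rounded <- sum(M_rounded != 0)  # probesets differing after rounding
cat(n_raw, "differ before rounding,", n_rounded, "after\n")
```

The one remaining disagreement is the last probeset: 75.44996 rounds down to 75.4, while the reported value is 75.5.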

Is it the same probesets that disagree each time?
To answer this question, plot the M values from one chip comparison against those from another. If the same probesets disagree on both chips, we should see points in the diagonal areas. If they do not, a probeset that disagrees on one chip has M equal to zero on the other, so the points fall along vertical and horizontal lines. What we actually see is that it is not the same probesets disagreeing each time. Possibly something magical is happening; it is not yet clear.
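
The same question can also be answered numerically rather than by eye. The sketch below assumes a matrix M of M values with probesets in rows and arrays in columns. The function name count_overlap and the toy matrix are hypothetical.

```r
# Count probesets that disagree on both arrays i and j, or on only one of them.
# M: matrix of M values (rows = probesets, columns = arrays).
count_overlap <- function(M, i, j, tol = 0) {
  di <- abs(M[, i]) > tol
  dj <- abs(M[, j]) > tol
  c(both   = sum(di & dj),    # off both axes in the M vs M plot
    only_i = sum(di & !dj),   # on the horizontal line
    only_j = sum(!di & dj))   # on the vertical line
}

# Toy example with four probesets and two arrays.
M <- cbind(c(0, 0.1, 0, 0.2),
           c(0, 0.1, 0.3, 0))
overlap <- count_overlap(M, 1, 2)
print(overlap)
```

A large "both" count relative to the other two would suggest that the same probesets disagree each time.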

How many probesets are disagreeing?
The table below uses the most stringent criterion to determine disagreement in columns 2 and 3,
    sum(abs(log2(Astar[,i]) - log2(round(Estar[,i],1))) > 0)
and a slightly less stringent criterion for columns 4 and 5,
    sum(abs(log2(Astar[,i]) - log2(round(Estar[,i],1))) > 0.005)
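
As a sketch of how such a table could be assembled, the function below applies both criteria to every array at once. It assumes Astar and Estar are matrices with probesets in rows and arrays in columns. The function name, the output column names and the choice to report a count plus a percentage are illustrative assumptions, and the toy values are made up.

```r
# Count disagreeing probesets per array for each threshold.
count_disagreements <- function(Astar, Estar, thresholds = c(0, 0.005)) {
  stopifnot(identical(dim(Astar), dim(Estar)))
  abs_m <- abs(log2(Astar) - log2(round(Estar, 1)))
  result <- data.frame(array = seq_len(ncol(abs_m)))
  for (t in thresholds) {
    n_diff <- colSums(abs_m > t)
    result[[paste0("n_gt_", t)]]   <- n_diff                      # count
    result[[paste0("pct_gt_", t)]] <- 100 * n_diff / nrow(abs_m)  # percentage
  }
  result
}

# Toy data: three probesets on two arrays.
Astar <- cbind(c(100.0, 50.5, 20.0), c(10.0, 1000.0, 30.3))
Estar <- cbind(c(100.04, 50.46, 20.26), c(10.0, 1000.14, 30.26))
res <- count_disagreements(Astar, Estar)
print(res)
```

In the toy data the second array has one probeset that differs by 0.1 at a value of 1000, which counts under the strict criterion but not under the 0.005 threshold.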
 
Concluding Statements
There are differences between our implementation and that of Affymetrix. Most of the differences are small, and usually only a small fraction of probesets differ in value. I think a lot of these differences can also be traced back to rounding issues.
Differences between implementations will likely always exist, but we have seen that they are not particularly drastic. When comparing values from A* (Affymetrix) with values from E* (affy/Bioconductor), you should also keep in mind whether you are comparing normalized or unnormalized data. Also remember the possibility of masks and Affymetrix outliers affecting your analysis. If you look in the [MASKS] and [OUTLIERS] sections of your CEL files, you will be able to see how many probes may be excluded by the MAS 5.0 software. Currently "affy" does not handle excluding probes in this manner particularly well. If there is a large number of such probes, then it is likely that excluding them will have quite a drastic effect on your analysis.
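
One way to check how many probes are flagged is sketched below. It assumes a text-format CEL file in which the [MASKS] and [OUTLIERS] sections each carry a NumberCells= line. Binary CEL files would need a different approach. The function name count_excluded_cells is hypothetical, and the demo file is a made-up fragment rather than a real CEL file.

```r
# Read the NumberCells= entry from the [MASKS] and [OUTLIERS] sections
# of a text-format CEL file (the file layout is an assumption of this sketch).
count_excluded_cells <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  headers <- grep("^\\[.*\\]$", lines)  # positions of all [SECTION] lines
  count_in <- function(section) {
    start <- match(section, lines)
    if (is.na(start)) return(NA_integer_)
    later <- headers[headers > start]
    end <- if (length(later) > 0) later[1] - 1 else length(lines)
    hit <- grep("^NumberCells=", lines[seq(start, end)], value = TRUE)
    if (length(hit) == 0) return(NA_integer_)
    as.integer(sub("^NumberCells=", "", hit[1]))
  }
  c(masks = count_in("[MASKS]"), outliers = count_in("[OUTLIERS]"))
}

# Demo on a made-up fragment written to a temporary file.
demo <- tempfile(fileext = ".CEL")
writeLines(c("[MASKS]", "NumberCells=0", "CellHeader=X\tY", "",
             "[OUTLIERS]", "NumberCells=2", "CellHeader=X\tY",
             "10\t12", "11\t12"), demo)
excluded <- count_excluded_cells(demo)
print(excluded)
```

A section that is absent from the file is reported as NA rather than zero, so the two cases are not confused.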

Questions/Comments?
Send me an email at bmb@bmbolstad.com

