Why do my MAS 5.0 values differ?

Written by Ben Bolstad
Updated June 8, 2006. The previous results are to be found here. The recent changes have improved agreement even more.

What am I talking about?

I am talking about the MAS 5.0 values that you might get when using the Bioconductor package affy, and in particular how these values compare with those from the Affymetrix MAS software. Please note that this document refers to results from version 1.11.1 of affy, but should apply to all later versions. The results may not hold true for older versions of the affy package.

How were the MAS methods implemented in affy?

The methods were implemented based upon available documentation. A particularly useful reference is the Statistical Algorithms Description Document by Affymetrix. Our implementation is based on what is written in the documentation and, as you might appreciate, there are places where the documentation is less than clear.
All source code of our implementation is available. You are free to read it and suggest fixes.

Using public data to compare implementations

I will use data from 30 liver arrays from the Genelogic dilution study which may be obtained here. I will ignore normalization and just look at unscaled expression values.

Computing MAS numbers using "affy"

The expression values were computed using the expresso command. This document refers to results from version 1.11.1 of affy, but other versions should behave similarly. The exact command used was:
eset <- expresso(Data, bgcorrect.method = "mas", normalize = FALSE, pmcorrect.method = "mas", summary.method = "mas")

Comparing values

First, make an M vs A plot comparing the MAS values from "affy" (we will refer to these as values from E*) to the MAS values from Affymetrix (we will refer to these as values from A*). Such a plot is shown for 4 arrays below.

Comparing without rounding
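
As a rough illustration of what goes into such a plot, the sketch below computes M and A for one array using the usual definitions (M is the difference of the log2 values, A is their average). The helper names ma_values and plot_ma and the toy numbers are assumptions made for this example; they are not part of affy and are not taken from the dilution data.

```r
# Hypothetical helper: M and A values for one array.
# estar, astar: unlogged expression values for the same probesets,
# in the same order (E* from affy, A* from Affymetrix).
ma_values <- function(estar, astar) {
  stopifnot(length(estar) == length(astar))
  le <- log2(estar)
  la <- log2(astar)
  data.frame(M = le - la, A = (le + la) / 2)
}

# Hypothetical helper: draw the M vs A plot with a reference line at M = 0.
plot_ma <- function(ma, main = "E* vs A*") {
  plot(ma$A, ma$M, xlab = "A", ylab = "M", main = main)
  abline(h = 0, lty = 2)
}

# Toy values standing in for real data (assumed, for illustration only).
estar <- c(100.04, 250.26, 1024.0, 15.33)
astar <- c(100.0, 250.3, 1024.0, 15.3)

ma <- ma_values(estar, astar)
print(ma)

# Only open a graphics device when run interactively.
if (interactive()) plot_ma(ma)
```

Probesets on which the two implementations agree exactly sit on the line M = 0; everything else shows up as scatter around it.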

Rounding has an effect

MAS 5 values A* are typically rounded to 1 decimal place. Applying this same rounding to our values from E* and then repeating the same plots, we get far fewer disagreements. My intuition is that the funnel-like shapes are also rounding issues (just at the edges of the rounding region).

Comparing with rounding
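
A small sketch of the rounding effect, using made-up numbers rather than values from the dilution arrays: differences that come purely from the extra decimal places in E* vanish once E* is rounded to 1 decimal place, while a value that rounds to a different tenth still disagrees.

```r
# Toy values (assumed for illustration; not from the dilution study).
astar <- c(100.0, 250.3, 1024.0, 15.3)     # A*: already rounded to 1 decimal
estar <- c(100.04, 250.26, 1024.0, 15.36)  # E*: full precision

# M values before and after applying the same rounding to E*.
m_raw     <- log2(estar) - log2(astar)
m_rounded <- log2(round(estar, 1)) - log2(astar)

# Number of probesets that disagree in each case.
n_raw     <- sum(abs(m_raw) > 0)
n_rounded <- sum(abs(m_rounded) > 0)

print(data.frame(astar, estar, rounded = round(estar, 1), m_raw, m_rounded))
print(c(before = n_raw, after = n_rounded))
```

The last toy value (15.36 against 15.3) is the kind of case that sits at the edge of the rounding region: a small difference before rounding is enough to land it on a different tenth.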

Is it the same probesets that disagree each time?


To answer the question of whether it is the same probesets that are disagreeing each time, plot the M values from one chip comparison against those from another. If these are the same probesets we should see points in the diagonal areas; if they are not, then we will see vertical and horizontal lines.

Compare Across

So we actually see that it is not the same probesets disagreeing each time. Possibly something magical is happening; it is not yet clear.
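
The same check can be made numerically by cross-tabulating which probesets disagree on one array against which disagree on another. This is only a sketch: m_matrix is a hypothetical helper, and the toy matrices are constructed so that each disagreeing probeset differs on one array only, which is the "vertical and horizontal lines" pattern.

```r
# Hypothetical helper: M values for every array at once.
# Astar, Estar: probesets-by-arrays matrices of unlogged expression values.
m_matrix <- function(Astar, Estar) {
  log2(round(Estar, 1)) - log2(Astar)
}

# Toy matrices (assumed): 5 probesets, 2 arrays.
Astar <- cbind(c(10.0, 20.0, 30.0, 40.0, 50.0),
               c(11.0, 21.0, 31.0, 41.0, 51.0))
Estar <- Astar
Estar[2, 1] <- 20.1  # probeset 2 disagrees on array 1 only
Estar[4, 2] <- 41.2  # probeset 4 disagrees on array 2 only

M <- m_matrix(Astar, Estar)
disagree <- M != 0

# Cross-tabulate disagreement on array 1 against array 2.
print(table(array1 = disagree[, 1], array2 = disagree[, 2]))

# Probesets that disagree on both arrays would fall off the axes in a plot
# of M[, 1] against M[, 2]; count them directly.
both <- sum(disagree[, 1] & disagree[, 2])
print(both)
```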

How many probesets are disagreeing?

The table below uses the most stringent criterion to determine disagreement in columns 2 and 3:
sum(abs(log2(Astar[,i])-log2(round(Estar[,i],1))) > 0)
and a slightly less stringent criterion for columns 4 and 5:
sum(abs(log2(Astar[,i])-log2(round(Estar[,i],1))) > 0.005)

Array  Number disagreeing  Fraction of probesets  Number disagreeing  Fraction of probesets
       (strict)            (strict)               (> 0.005)           (> 0.005)
1      210                 0.017                  63                  0.005
2      227                 0.018                  106                 0.008
3      216                 0.017                  88                  0.007
4      188                 0.015                  75                  0.006
5      231                 0.018                  107                 0.008
6      211                 0.017                  85                  0.007
7      233                 0.018                  100                 0.008
8      159                 0.013                  74                  0.006
9      108                 0.009                  49                  0.004
10     225                 0.018                  123                 0.010
11     263                 0.021                  104                 0.008
12     196                 0.016                  100                 0.008
13     92                  0.007                  42                  0.003
14     123                 0.010                  60                  0.005
15     209                 0.017                  142                 0.011
16     190                 0.015                  102                 0.008
17     120                 0.010                  74                  0.006
18     109                 0.009                  55                  0.004
19     129                 0.010                  76                  0.006
20     186                 0.015                  127                 0.010
21     106                 0.008                  61                  0.005
22     123                 0.010                  90                  0.007
23     108                 0.009                  67                  0.005
24     73                  0.006                  52                  0.004
25     136                 0.011                  106                 0.008
26     118                 0.009                  94                  0.007
27     89                  0.007                  70                  0.006
28     103                 0.008                  73                  0.006
29     98                  0.008                  69                  0.005
30     75                  0.006                  70                  0.006
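
A table of this form could be built by applying the two criteria to every column at once. The sketch below is one way to do it; disagreement_table, its tol argument and the column names are invented for this example, and the toy matrices are not the dilution data.

```r
# Hypothetical helper: per-array disagreement counts and fractions.
# Astar, Estar: probesets-by-arrays matrices of unlogged expression values.
# tol: threshold for the less stringent criterion.
disagreement_table <- function(Astar, Estar, tol = 0.005) {
  stopifnot(all(dim(Astar) == dim(Estar)))
  d <- abs(log2(Astar) - log2(round(Estar, 1)))
  n_strict <- colSums(d > 0)
  n_tol    <- colSums(d > tol)
  data.frame(
    Array          = seq_len(ncol(d)),
    NumberStrict   = n_strict,
    FractionStrict = round(n_strict / nrow(d), 3),
    Number         = n_tol,
    Fraction       = round(n_tol / nrow(d), 3)
  )
}

# Toy matrices (assumed): 4 probesets, 2 arrays.
Astar <- matrix(c(100, 200, 400, 800,
                  100, 200, 400, 800), ncol = 2)
Estar <- Astar
Estar[1, 1] <- 100.1  # tiny difference: counted by the strict criterion only
Estar[3, 2] <- 404    # larger difference: counted by both criteria

tab <- disagreement_table(Astar, Estar)
print(tab)
```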

Concluding Statements

There are differences between our implementation and that of Affymetrix. Most of the differences are small and usually only a small fraction of probesets differ in value. I think a lot of these differences can also be traced back to rounding issues.

Differences between implementations will likely always exist but we have seen that they are not particularly drastic.

When comparing values from A* (Affymetrix) with values from E* (affy/Bioconductor) you should also keep in mind whether you are comparing normalized or unnormalized data. Also remember the possibility of masks/Affymetrix outliers affecting your analysis. If you look in the [MASKS] and [OUTLIERS] sections of your CEL files you will be able to see how many probes may be excluded using the MAS 5.0 software. Currently "affy" does not handle excluding probes in this manner particularly well. If there is a large number of such probes, then it is likely that these will have quite a drastic effect if you exclude them from your analysis.
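
One quick way to see how many probes are flagged is to read the counts straight out of the file. The sketch below assumes a text-format CEL file in which each section header is followed by a NumberCells= line; that layout is an assumption about the file format, and binary CEL files cannot be read this way. count_cel_section is a hypothetical helper, and the lines used here are a made-up fragment rather than a real file.

```r
# Hypothetical helper: read the NumberCells value of one section
# (e.g. "MASKS" or "OUTLIERS") from the lines of a text-format CEL file.
count_cel_section <- function(lines, section) {
  start <- which(trimws(lines) == paste0("[", section, "]"))
  if (length(start) != 1) return(NA_integer_)
  rest <- lines[-seq_len(start)]
  # Keep only the lines up to the next section header.
  next_header <- grep("^\\[", rest)
  if (length(next_header) > 0) rest <- rest[seq_len(next_header[1] - 1)]
  hit <- grep("^NumberCells=", rest, value = TRUE)
  if (length(hit) == 0) return(NA_integer_)
  as.integer(trimws(sub("^NumberCells=", "", hit[1])))
}

# Made-up fragment standing in for readLines("your_file.CEL").
cel_lines <- c(
  "[CEL]", "Version=3", "",
  "[MASKS]", "NumberCells=0", "CellHeader=X\tY", "",
  "[OUTLIERS]", "NumberCells=2", "CellHeader=X\tY", "10\t12", "11\t12", "",
  "[MODIFIED]", "NumberCells=0"
)

n_masks    <- count_cel_section(cel_lines, "MASKS")
n_outliers <- count_cel_section(cel_lines, "OUTLIERS")
print(c(masks = n_masks, outliers = n_outliers))
```

If either count is large relative to the number of probes on the array, that is worth keeping in mind before reading too much into differences between A* and E*.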

Questions/Comments?

Send me an email at bmb@bmbolstad.com