Why do my MAS 5.0 values differ?

Written by Ben Bolstad

What am I talking about?

I am talking about the MAS 5.0 values that you might get when using the Bioconductor package affy, and in particular about how these values compare with those from the Affymetrix MAS 5.0 software. Please note that this document refers to results from version 1.2.7 of affy, but should apply to all later versions. The results may not hold true for older versions of the affy package, i.e. the Bioconductor 1.1 release.

How were the MAS methods implemented in affy?

The methods were implemented based upon available documentation. A particularly useful reference is the Statistical Algorithms Description Document by Affymetrix. Our implementation is based on what is written in the documentation and, as you might appreciate, there are places where the documentation is less than clear.
All source code of our implementation is available. You are free to read it and suggest fixes.

Using public data to compare implementations

I will use data from 30 liver arrays from the Gene Logic dilution study, which may be obtained here. I will ignore normalization and just look at unscaled expression values.

Computing MAS numbers using "affy"

The expression values were computed using the expresso command. This document refers to results from version 1.2.7 of affy, but other versions should behave similarly. The exact command used was
eset <- expresso(Data, bgcorrect.method="mas",normalize=FALSE,pmcorrect.method="mas",summary.method="mas")

Comparing values

First, make an M vs A plot comparing the MAS values from "affy" (we will refer to these as values from E*) with the MAS values from Affymetrix (we will refer to these as values from A*). Such a plot is shown for 4 arrays below.

Comparing without rounding
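
For readers who want to reproduce this kind of plot, here is a minimal sketch of how M and A values could be computed for one array. The helper name ma_values and the toy numbers are hypothetical, and the sign convention for M (A* minus E* on the log2 scale) is an assumption chosen to match the disagreement formula used later in this document.

```r
# Hypothetical helper: M and A values for one array.
# astar, estar: positive expression values for the same probesets,
# in the same order (A* = Affymetrix, E* = affy).
ma_values <- function(astar, estar) {
  stopifnot(length(astar) == length(estar))
  la <- log2(astar)
  le <- log2(estar)
  # M is the log2 difference, A is the average log2 intensity.
  data.frame(M = la - le, A = (la + le) / 2)
}

# Toy numbers for illustration only, not taken from the dilution data.
astar <- c(100.0, 250.3, 32.0)
estar <- c(100.04, 250.26, 31.96)
ma <- ma_values(astar, estar)
print(ma)

# To draw the plot itself:
# plot(ma$A, ma$M, xlab = "A", ylab = "M"); abline(h = 0)
```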

Rounding has an effect

MAS 5 values from A* are typically rounded to 1 decimal place. Applying the same rounding to our values from E* and then repeating the same plots, we get far fewer disagreements. My intuition is that the funnel-like shapes are also rounding issues (just at the edges of the rounding region).

Comparing with rounding
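
The effect of rounding can be illustrated with a small sketch. The values below are made up for illustration; the last one is chosen to sit just under a rounding boundary, which is the kind of case I suspect is behind the remaining disagreements.

```r
# Toy values, not real data. A* is already rounded to 1 decimal place.
astar <- c(100.0, 250.3, 32.0, 75.5)
estar <- c(100.04, 250.26, 31.96, 75.449)

# Log2 differences before and after rounding E* to 1 decimal place.
m_raw     <- log2(astar) - log2(estar)
m_rounded <- log2(astar) - log2(round(estar, 1))

# Number of probesets that disagree in each case.
n_raw     <- sum(abs(m_raw) > 0)
n_rounded <- sum(abs(m_rounded) > 0)
print(c(raw = n_raw, rounded = n_rounded))
```

In this toy example the last value, 75.449, rounds to 75.4 rather than 75.5, so it still disagrees after rounding even though the unrounded values are very close.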

Is it the same probesets that disagree each time?

To answer the question of whether it is the same probesets that disagree each time, plot the M values from one chip comparison against those from another. If the same probesets disagree, we should see points in the diagonal areas; if not, we will see vertical and horizontal lines.

Compare Across

So we actually see that it is not the same probesets disagreeing each time. Possibly something magical is happening; it is not yet clear.
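
The same question can also be checked numerically. The following is a minimal sketch, assuming a matrix m of M values with probesets in rows and arrays in columns; the function name overlap_counts and the toy matrix are hypothetical.

```r
# Hypothetical helper: for arrays i and j, count probesets that disagree
# on both arrays, only on array i, or only on array j.
# m: matrix of M values (rows = probesets, columns = arrays).
overlap_counts <- function(m, i, j, tol = 0) {
  di <- abs(m[, i]) > tol
  dj <- abs(m[, j]) > tol
  c(both = sum(di & dj), only_i = sum(di & !dj), only_j = sum(!di & dj))
}

# Toy M values for 5 probesets on 2 arrays (illustration only).
m <- cbind(c(0, 0.01, 0, -0.02, 0),
           c(0.03, 0, 0, -0.01, 0))
counts <- overlap_counts(m, 1, 2)
print(counts)
```

A large "both" count relative to the other two would correspond to points in the diagonal areas of the plot; large "only" counts correspond to the vertical and horizontal lines.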

How many probesets are disagreeing?

The table below uses the most stringent criterion to determine disagreement in columns 2 and 3:
sum(abs(log2(Astar[,i])-log2(round(Estar[,i],1))) > 0)
and a slightly less stringent criterion for columns 4 and 5:
sum(abs(log2(Astar[,i])-log2(round(Estar[,i],1))) > 0.005)

Array  Number disagreeing (Strict)  Fraction of probesets (Strict)  Number disagreeing  Fraction of probesets
1      452                          0.036                           124                 0.010
2      433                          0.034                           196                 0.015
3      432                          0.034                           158                 0.013
4      342                          0.027                           127                 0.010
5      432                          0.034                           191                 0.015
6      358                          0.028                           138                 0.010
7      470                          0.037                           217                 0.017
8      294                          0.023                           139                 0.011
9      214                          0.017                           98                  0.008
10     450                          0.036                           259                 0.020
11     509                          0.040                           214                 0.017
12     362                          0.028                           187                 0.015
13     179                          0.014                           87                  0.007
14     267                          0.021                           133                 0.010
15     425                          0.034                           283                 0.022
16     394                          0.031                           210                 0.017
17     234                          0.019                           151                 0.012
18     231                          0.017                           112                 0.009
19     220                          0.018                           134                 0.011
20     418                          0.033                           283                 0.022
21     240                          0.019                           134                 0.011
22     232                          0.018                           167                 0.013
23     238                          0.019                           157                 0.012
24     146                          0.012                           103                 0.008
25     249                          0.020                           196                 0.015
26     216                          0.017                           164                 0.012
27     168                          0.013                           136                 0.011
28     215                          0.017                           162                 0.012
29     201                          0.016                           145                 0.011
30     145                          0.011                           133                 0.010
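
A table of this kind could be built by applying the two criteria above to every array at once. This is only a sketch: it assumes Astar and Estar are numeric matrices with probesets in rows and arrays in columns, as the indexing in the formulas suggests, and the function name count_disagreements and the toy matrices are hypothetical.

```r
# Hypothetical helper: per-array number and fraction of disagreeing probesets.
# Astar, Estar: matrices of expression values (rows = probesets, cols = arrays).
count_disagreements <- function(Astar, Estar, threshold = 0) {
  stopifnot(all(dim(Astar) == dim(Estar)))
  d <- abs(log2(Astar) - log2(round(Estar, 1)))
  n <- colSums(d > threshold)
  data.frame(array = seq_len(ncol(Astar)),
             number = n,
             fraction = n / nrow(Astar))
}

# Toy matrices: 4 probesets on 2 arrays (illustration only).
Astar <- cbind(c(100.0, 250.3, 32.0, 75.5),
               c(10.0, 20.1, 30.0, 40.0))
Estar <- cbind(c(100.04, 250.26, 31.96, 75.449),
               c(10.02, 20.06, 30.2, 40.01))

strict <- count_disagreements(Astar, Estar, threshold = 0)
loose  <- count_disagreements(Astar, Estar, threshold = 0.005)
print(strict)
print(loose)
```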

Concluding Statements

There are differences between our implementation and that of Affymetrix. Most of the differences are small, and usually only a small fraction of probesets differ in value. I think a lot of these differences can also be traced back to rounding issues.

Differences between implementations will likely always exist but we have seen that they are not particularly drastic.

When comparing values from A* (Affymetrix) with values from E* (affy/Bioconductor) you should also keep in mind whether you are comparing normalized or unnormalized data. Also remember the possibility of masks and Affymetrix outliers affecting your analysis. If you look in the [MASKS] and [OUTLIERS] sections of your CEL files you will be able to see how many probes may be excluded by the MAS 5.0 software. Currently "affy" does not handle excluding probes in this manner particularly well. If there is a large number of such probes, then excluding them from your analysis is likely to have quite a drastic effect.
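
As a rough way to check how many probes are flagged, the sketch below reads the NumberCells entry of a section from the lines of a CEL file. It assumes a text-format CEL file in which each section header such as [MASKS] is followed by a NumberCells=N line; it will not work on binary CEL files. The function name cel_section_count and the toy lines are hypothetical.

```r
# Hypothetical helper: number of cells listed in one section of a
# text-format CEL file, taken from its NumberCells= line.
# lines: character vector of file lines, e.g. from readLines().
cel_section_count <- function(lines, section) {
  lines <- trimws(lines)
  start <- which(lines == paste0("[", section, "]"))
  if (length(start) == 0 || start[1] >= length(lines)) return(NA_integer_)
  rest <- lines[seq(start[1] + 1, length(lines))]
  # Stop at the next section header, if there is one.
  next_hdr <- which(grepl("^\\[.*\\]$", rest))
  if (length(next_hdr) > 0) rest <- rest[seq_len(next_hdr[1] - 1)]
  hit <- grep("^NumberCells=", rest, value = TRUE)
  if (length(hit) == 0) return(NA_integer_)
  as.integer(sub("^NumberCells=", "", hit[1]))
}

# Toy lines mimicking part of a text CEL file (illustration only).
toy <- c("[CEL]", "Version=3",
         "[MASKS]", "NumberCells=2", "CellHeader=X\tY", "10\t12", "10\t13",
         "[OUTLIERS]", "NumberCells=1", "CellHeader=X\tY", "5\t7")
n_masks    <- cel_section_count(toy, "MASKS")
n_outliers <- cel_section_count(toy, "OUTLIERS")
print(c(masks = n_masks, outliers = n_outliers))

# With a real file (the path here is a placeholder):
# cel_section_count(readLines("your_file.CEL"), "OUTLIERS")
```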


Send me an email at bmb@bmbolstad.com