Mixture CDF environments

Written by Ben Bolstad
email bmb@bmbolstad.com

What is a mixture CDF environment?

A mixture CDF environment is an R package that can be used with the Bioconductor package affy. It should be used when you have chips of both version 1 and version 2 of a particular chip type. The environment contains only the probesets that appear on both chip versions, so it is neither one version nor the other but a mixture of the two. For the MGU74A and MGU74Av2 chips there are 10043 probesets common to both arrays, out of 12654 probesets in total on the MGU74A and 12488 on the MGU74Av2. Using the mixture CDF environment ignores the non-overlapping probesets. For the HGU95A and HGU95Av2 there are 12600 common probesets, out of 12625 in total on the HGU95A and 12626 on the HGU95Av2.
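As a quick check on these numbers, the fraction of each chip's probesets that a mixture CDF environment leaves out can be computed directly from the counts quoted above. The dictionary below only restates those counts; nothing else is assumed.

```python
# Probeset counts as quoted in the text: common to both versions, and total per version.
counts = {
    "MGU74A / MGU74Av2": {"common": 10043, "v1": 12654, "v2": 12488},
    "HGU95A / HGU95Av2": {"common": 12600, "v1": 12625, "v2": 12626},
}


def fraction_discarded(common: int, total: int) -> float:
    """Fraction of a chip's probesets that are not in the mixture CDF environment."""
    return (total - common) / total


for pair, c in counts.items():
    for version in ("v1", "v2"):
        dropped = c[version] - c["common"]
        share = fraction_discarded(c["common"], c[version])
        print(f"{pair}, {version}: {dropped} of {c[version]} probesets dropped ({share:.1%})")
```

For the MGU74A pair this comes to roughly a fifth of each chip's probesets (20.6% of the MGU74A, 19.6% of the MGU74Av2), while for the HGU95A pair only about 0.2% are dropped.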

What exactly is in the packages? Which probesets are included?

The "mix" packages are restricted to probesets that have the same name and the same location on both versions of the chip. The "restrict" packages are further restricted to probesets that have the same name, the same location and known sequence information. By known sequence information we mean information from the *_probe_tab files as downloaded from the Affymetrix download center on June 21, 2003. There is no restrict package for MGU74A because we have sequence information for all the probes on those chips.

We do not consider probesets whose probes share the same sequences apart from a few changed probes (and which therefore have different probeset IDs and different locations). In other words, we do not create new probesets restricted to the common probes.
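The selection rules for the two kinds of package can be sketched as simple set operations. This is a minimal illustration under stated assumptions, not the code used to build the packages: the layout model (a dict from probeset ID to an ordered tuple of probe (x, y) locations), the `has_sequence` set standing in for the *_probe_tab information, and all probeset IDs and coordinates are hypothetical. It also shows the point about changed probes: a probeset whose locations differ is dropped as a whole rather than trimmed to its common probes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout model: probeset ID -> ordered tuple of (x, y) probe cell locations.
Layout = Dict[str, Tuple[Tuple[int, int], ...]]


def mix_probesets(v1: Layout, v2: Layout) -> List[str]:
    """Probesets with the same name and the same probe locations on both versions.

    A probeset whose locations differ is dropped entirely; it is never
    trimmed down to the probes the two versions happen to share.
    """
    return sorted(ps for ps in v1.keys() & v2.keys() if v1[ps] == v2[ps])


def restrict_probesets(v1: Layout, v2: Layout, has_sequence: Set[str]) -> List[str]:
    """Like mix_probesets, but also requiring known sequence information."""
    return [ps for ps in mix_probesets(v1, v2) if ps in has_sequence]


# Toy example with made-up probeset IDs and locations.
v1 = {
    "ps_a": ((1, 1), (1, 2)),
    "ps_b": ((5, 5), (5, 6)),
    "ps_c": ((9, 9), (9, 10)),  # only on version 1
}
v2 = {
    "ps_a": ((1, 1), (1, 2)),   # same name, same locations: kept
    "ps_b": ((5, 5), (7, 7)),   # same name, one probe moved: dropped entirely
    "ps_d": ((3, 3), (3, 4)),   # only on version 2
}
print(mix_probesets(v1, v2))                  # ['ps_a']
print(restrict_probesets(v1, v2, {"ps_b"}))   # []
```

Comparing ordered tuples treats a reordering of the same probes as a change of location; that is a simplifying assumption of the sketch.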

Downloads

"mix" packages
Package | Unix/Linux source package | Windows binary package
mgu74av12mix | mgu74av12mixcdf_1.0.tar.gz | mgu74av12mixcdf_1.0.zip
hgu95av12mix | hgu95av12mixcdf_1.0.tar.gz | hgu95av12mixcdf_1.0.zip
mgu74bv12mix | mgu74bv12mixcdf_1.0.tar.gz | mgu74bv12mixcdf_1.0.zip
mgu74cv12mix | mgu74cv12mixcdf_1.0.tar.gz | mgu74cv12mixcdf_1.0.zip

Special note: The Windows binary packages are currently built for R-2.2.1.

Probesets in the "mix" package
Package | Probesets common to both versions | Probesets only on Version 1 | Probesets only on Version 2
mgu74av12mix | Overlapping probesets | Only on V1 | Only on V2
hgu95av12mix | Overlapping probesets | Only on V1 | Only on V2

"restrict" packages
Package | Unix/Linux source package | Windows binary package
hgu95av12restrict | hgu95av12restrictcdf_1.0.tar.gz | hgu95av12restrictcdf_1.0.zip

Probesets in the "restrict" package
Package | Probesets common to both versions | Probesets only on Version 1 | Probesets only on Version 2 | Probesets with no sequence information
hgu95av12restrict | Overlapping probesets | Only on V1 | Only on V2 | No sequence probesets

An example

You will need to take two AffyBatch objects, one for each chip version, and merge them together. The CDF name of the merged object must then be set to the name of the appropriate mixture CDF environment so that the matching package is used.

This sample code demonstrates what you should do.
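The merge itself is done in R with affy, which provides `merge.AffyBatch` for combining two AffyBatch objects. Purely to show what the two steps (merge, then relabel the CDF name) accomplish, here is a language-neutral sketch in Python. `Batch` and `merge_batches` are hypothetical stand-ins, not part of affy, and the sample names and intensity values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    """Hypothetical stand-in for an AffyBatch: one intensity column per array."""
    cdf_name: str
    samples: List[str]
    intensities: List[List[float]]  # intensities[probe_cell][array]


def merge_batches(a: Batch, b: Batch, mixture_cdf: str) -> Batch:
    """Combine the arrays of two chip versions and label them with a mixture CDF name."""
    if len(a.intensities) != len(b.intensities):
        raise ValueError("batches must have the same number of probe cells")
    # Append b's array columns to a's, cell by cell.
    rows = [ra + rb for ra, rb in zip(a.intensities, b.intensities)]
    # Replace the per-version CDF names so that probeset lookups
    # go through the mixture environment instead.
    return Batch(cdf_name=mixture_cdf, samples=a.samples + b.samples, intensities=rows)


v1 = Batch("MGU74A", ["s1", "s2"], [[10.0, 11.0], [20.0, 21.0]])
v2 = Batch("MGU74Av2", ["s3"], [[12.0], [22.0]])
merged = merge_batches(v1, v2, "mgu74av12mix")
print(merged.cdf_name, merged.samples)  # mgu74av12mix ['s1', 's2', 's3']
```

The relabelling is the important step: without it, the merged data would still be interpreted through one chip version's own CDF rather than through the mixture environment.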

What disadvantages might there be?

You will be throwing away some probesets, possibly a sizable proportion. For the MGU74A/MGU74Av2 this will be about 1/5 of the probesets. You might not wish to throw out this much data. You should also think carefully about whether it is wise to mix data from different chip versions at all: sources of non-biological variability may be overwhelming in this case. Finally, filtering out probesets may itself introduce biases.

Questions/Comments/Problems?

Send me an email at bmb@bmbolstad.com.