Mixture CDF environments
Written by Ben Bolstad
email bmb@bmbolstad.com
|
What is a mixture CDF environment?
A mixture CDF environment is an R package for use with the Bioconductor package affy. Use it when your data set contains chips of both version 1 and version 2 of a particular chip type. The environment contains only probesets that appear on both chip versions, so it is neither one version nor the other but a mixture of the two. For the MGU74A and MGU74Av2 chips there are 10043 common probesets, out of 12654 probesets on the MGU74A chips and 12488 on the MGU74Av2 chips. For the HGU95A and HGU95Av2 chips there are 12600 common probesets, out of 12625 probesets on the HGU95A and 12626 on the HGU95Av2. Using the mixture CDF environment ignores the non-overlapping probesets.
|
What exactly is in the packages, which probesets are included?
For the mix packages we restrict to probesets which have the same name and same location on both versions of the chip. For the restrict packages we restrict to probesets which have the same name, same location and known sequence information. By known sequence information we mean information from the *_probe_tab files as downloaded from the Affymetrix download center on Jun 21, 2003. There is no restrict package for MGU74A because we have sequence information for all the probes on those chips.
We do not consider probesets whose probes have the same sequences except for a few changed probes (and which therefore have a different probeset id and different locations). In other words, we do not create probesets that are restrictions to common probes.
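As a rough illustration of the selection rule for the mix packages, the R sketch below keeps only probesets whose name and probe locations agree on both chip versions. The two probe tables, their column names and the helper location_signature are hypothetical toy constructs made up for this example; this is not the code used to build the packages.

```r
# Toy probe tables (hypothetical data): one row per probe, giving the
# probeset name and the (x, y) location of the probe on the chip.
v1 <- data.frame(
  probeset = c("A_at", "A_at", "B_at", "B_at", "C_at", "C_at"),
  x = c(1, 2, 3, 4, 5, 6),
  y = c(1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
v2 <- data.frame(
  probeset = c("A_at", "A_at", "B_at", "B_at", "D_at", "D_at"),
  x = c(1, 2, 3, 9, 5, 6),
  y = c(1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse each probeset's probe locations into one
# sorted string such as "1,1;2,1" so that two probesets can be compared.
location_signature <- function(tab) {
  key <- paste(tab$x, tab$y, sep = ",")
  vapply(split(key, tab$probeset),
         function(k) paste(sort(k), collapse = ";"),
         character(1))
}

sig1 <- location_signature(v1)
sig2 <- location_signature(v2)

# Keep probesets that exist under the same name on both versions and
# whose probes sit at identical locations.
shared_names <- intersect(names(sig1), names(sig2))
common <- shared_names[sig1[shared_names] == sig2[shared_names]]
print(common)
```

In this toy data only A_at survives: B_at has one probe at a different location, and C_at and D_at each exist on only one version. B_at is dropped as a whole rather than reduced to its unchanged probe, which matches the statement above that no probesets are created as restrictions to common probes.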
|
Downloads
Special note: The Windows binary packages are currently built for R-2.2.1.
|
An example
You will need to take two AffyBatch objects, one per chip version, and merge them together. The CDF name must be set to something suitable so that the appropriate mixture CDF environment package is used.
This sample code will demonstrate what you should do.
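A minimal sketch of that workflow in R, under stated assumptions: the affy package is installed, ReadAffy and merge.AffyBatch are available from affy, and the CDF name is stored in the cdfName slot of an AffyBatch. The function name merge_chip_versions and its arguments are hypothetical, and mix_cdf_name is a placeholder for whatever name resolves to the mixture CDF environment package you have installed.

```r
# Sketch only. Nothing from affy is called until the function is invoked,
# so defining it does not require the package or any CEL files.
# merge_chip_versions, files_v1, files_v2 and mix_cdf_name are
# hypothetical names chosen for this example.
merge_chip_versions <- function(files_v1, files_v2, mix_cdf_name) {
  # Read the CEL files of each chip version into its own AffyBatch.
  batch_v1 <- affy::ReadAffy(filenames = files_v1)
  batch_v2 <- affy::ReadAffy(filenames = files_v2)

  # Point both batches at the same mixture CDF environment before merging
  # (assumes the CDF name lives in the cdfName slot).
  batch_v1@cdfName <- mix_cdf_name
  batch_v2@cdfName <- mix_cdf_name

  # Combine the two batches into a single AffyBatch.
  merged <- affy::merge.AffyBatch(batch_v1, batch_v2)

  # Set the name again on the result in case the merge does not carry it over.
  merged@cdfName <- mix_cdf_name
  merged
}
```

A call would pass two character vectors of CEL file paths and the CDF name, and the returned AffyBatch can then go through the usual affy preprocessing. Check the name against the mixture package you actually installed, since affy looks up the CDF environment from it.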
|
What disadvantages might there be?
You will be throwing away some probesets, possibly a sizable proportion. For the MGU74A/MGU74Av2 this will be about 1/5 of the probesets. You might not wish to throw out this much data. You should also think carefully about whether it is wise to mix data from different chip versions at all, since non-biological sources of variability may be overwhelming in this case. Filtering out probesets might also introduce biases.
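The size of the loss can be checked from the probeset counts quoted earlier on this page. The short R sketch below only redoes that arithmetic; the data frame and its column names are made up for illustration.

```r
# Probeset counts as quoted on this page; column names are illustrative.
counts <- data.frame(
  chip   = c("MGU74A", "MGU74Av2", "HGU95A", "HGU95Av2"),
  total  = c(12654, 12488, 12625, 12626),
  common = c(10043, 10043, 12600, 12600),
  stringsAsFactors = FALSE
)

# Probesets ignored when the mixture CDF environment is used.
counts$dropped <- counts$total - counts$common

# The same loss as a percentage of each chip's probesets, to one decimal.
counts$pct_dropped <- round(100 * counts$dropped / counts$total, 1)

print(counts)
```

With these counts about 20.6% of the MGU74A probesets and 19.6% of the MGU74Av2 probesets are dropped, against roughly 0.2% for either HGU95 version.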
|
Questions/Comments/Problems?
Send me an email at bmb@bmbolstad.com.
|