AffyExtensions - An R package

Written by Ben Bolstad
Special Note: The vast majority of people should look at affyPLM for the functionality previously found in AffyExtensions.


The package AffyExtensions contains a set of routines that can be called from within R. It extends the functionality of the R packages affy and affyPLM, which are themselves components of the Bioconductor project.


You might ask why I don't just seek to get these functions included in the base affy package. The reason is that one of the main goals of that package is to be general and flexible, allowing one to add new methods using the framework of the package. The code in AffyExtensions will in general be less flexible than that in the main package, but I trade that off for speed of execution. Some code that makes it into this package will eventually be integrated into the main packages. A fast RMA based upon the code in this package was integrated into the main affy package as of Bioconductor release 1.1. Another purpose of this code is to serve as a testing ground for stability purposes. Because of this, things may occasionally change drastically; e.g., the 0.3 and 0.4 series are completely different. In other words, things change rapidly in AffyExtensions, with much speculative work.

How does AffyExtensions relate to Bioconductor, affy and affyPLM?

For the most part AffyExtensions has been Bioconductor compliant, meaning that it works satisfactorily in combination with Bioconductor packages. AffyExtensions has also served as a place for some experimental and sometimes highly speculative work. Some routines have eventually become stable or well developed enough that they have transitioned into the base affy package. Examples include RMA.C() of AffyExtensions 0.3.X, which became the rma() routine in affy at the 1.1 release, and the parsing routines developed in the 0.5 series of AffyExtensions, which have become the default routines in the affy 1.3.X series.

affyPLM is a Bioconductor package that started with a major portion of AffyExtensions 0.5-14. It is a direct port of that version of the package to R-1.8 with a few routines removed. This package concentrates on Probe Level Modelling procedures. Later on, further developments from AffyExtensions were integrated into affyPLM. At the conclusion of the AffyExtensions 0.7-X series, most of the major functionality had been integrated into affyPLM, and subsequent development was carried out on affyPLM rather than AffyExtensions.

The history and roadmap of the packages are perhaps best understood from this diagram, which clarifies some of the interdependencies.

Short history of changes.

Approximate Dates           Version      Short Description
Jun - Sep, 2002             pre 0.3      A collection of loose C and R code
Oct - Dec, 2002             0.3 Series   A package supplying RMA.C(), which later became rma() in the affy package.
Jan - Mar 2003              0.4 Series   Three Step methods. Robust linear modeling at the probe level.
Late Mar 2003 - Sep, 2003   0.5 Series   New backgrounds, dependence on R-1.7.0 and later, more general probe level modeling
Sep 2003 - May 2004         0.6 Series   Modified PLMset object, many additional features
Jun 2004 - Apr 2005         0.7 Series   Never released outside UCB. Lots of work, particularly in how the model fitting algorithm was implemented and the range of possible models that could be fit. Became the base for affyPLM at 1.3.X
Jun 2006 - Present          0.8 Series   Implement the few things that never quite made it into affy or affyPLM. Large dataset support

How do I get it, and what do I need?

For most people, affyPLM will provide all the functionality they are expecting, and if you have that, there is no need to install AffyExtensions.
The 0.8 series works with BioC 1.8. The available downloads are:
Linux/Unix and the like       Windows                   Date            Comments
AffyExtensions_0.8-0.tar.gz   Not currently available   June 16, 2006   Restarting the package with a few things that did not make it into other packages. Compatible with BioC 1.8 and R 2.3.X
AffyExtensions_0.8-1.tar.gz   Not currently available   Aug 10, 2006    Extremely large dataset support for computing RMA expression values. Built on top of the BufferedMatrix package.
AffyExtensions_0.8-2.tar.gz   Not currently available   Jun 19, 2007    Update justRMALite() for compatibility with BioC 2.0
If you want access to the large dataset functionality, you will need the BufferedMatrix package. Currently you must download and build the sources yourself from the BioC svn archive: Bioconductor svn repository (user: readonly, pass: readonly).

What is this "large data support" all about?

As of AffyExtensions 0.8-1, a function justRMALite has been introduced that can process extremely large numbers of CEL files to produce RMA expression values. While justRMA will also process large numbers of CEL files, it is extremely memory hungry. justRMALite uses temporary files to store interim data and calculations, so it has a lower memory overhead. The following plot illustrates the running time differences between the functions:

This benchmark was established based on a machine with the following hardware/software configuration:
Processor          AMD X64 3800+
Operating System   Linux (Fedora Core 5 with 2.6.17-1.2157_FC5 kernel)
Free disk space    A lot !!!
R                  2.3.1

All the U133A CEL files used were text format (sometimes called version 3), although both functions will also handle the binary format. Because the test machine has a large amount of RAM, we see better performance from justRMA up to about 1500 CEL files, at which point justRMALite becomes faster. Based on the shape of the curve for justRMA, we can estimate the point at which the program starts seriously dealing with swap memory rather than RAM (approximately 1250 CEL files). On machines with less RAM, the point at which justRMALite becomes more speed efficient will be lower.
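
The trade-off described above (temporary files in exchange for lower memory use) can be illustrated with a small base R sketch. This is not the code inside justRMALite, which is built on the BufferedMatrix package; it is only a toy version of one RMA step, quantile normalization, written so that a single array's intensities are in memory at any time. The simulated intensities, the use of .rds temporary files, and all variable names are assumptions made for the illustration, and ties are broken by order of appearance rather than averaged.

```r
# Illustrative only: file-backed quantile normalization in base R.
# Each "array" lives in its own temporary file, so only one column of
# intensities is held in memory at a time.

set.seed(1)
n.probes <- 1000   # hypothetical number of probes per array
n.arrays <- 5      # hypothetical number of arrays (CEL files)

# Simulate writing each array's intensities to a temporary file
# (a stand-in for intensities parsed from CEL files).
tmp.files <- character(n.arrays)
for (j in seq_len(n.arrays)) {
  tmp.files[j] <- tempfile(fileext = ".rds")
  saveRDS(rexp(n.probes, rate = 1 / (100 * j)), tmp.files[j])
}

# Pass 1: accumulate the mean of the sorted intensities (the target
# distribution), reading one array at a time.
target <- numeric(n.probes)
for (f in tmp.files) {
  target <- target + sort(readRDS(f))
}
target <- target / n.arrays

# Pass 2: map each array onto the target distribution by rank and
# write the normalized values back to the same temporary file.
for (f in tmp.files) {
  x <- readRDS(f)
  saveRDS(target[rank(x, ties.method = "first")], f)
}

# Check: after normalization every array has the same sorted values.
normalized.sorted <- sapply(tmp.files, function(f) sort(readRDS(f)))

# Clean up the temporary files.
unlink(tmp.files)
```

Each pass touches the disk once per array, which is why an approach of this kind can be slower than an in-memory one while everything still fits in RAM, and faster once the in-memory version starts to swap.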

What documentation is there?

Documentation for the 0.4 and later series is in the vignettes. Type openVignette() to read the vignettes.
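
If openVignette() is not available in your session, the vignettes of an installed package can also be listed with the vignette() function from base R. The sketch below is only a convenience wrapper; the helper name list_vignettes is hypothetical, and "affyPLM" is used as an example package name on the assumption that it is the package you have installed.

```r
# Hypothetical helper: list the vignettes of an installed package using
# only base R. Returns NULL (with a message) if the package is missing.
list_vignettes <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; nothing to list.")
    return(NULL)
  }
  # vignette(package = ) returns an object whose $results matrix has
  # one row per vignette, with "Item" and "Title" columns.
  vignette(package = pkg)$results[, c("Item", "Title"), drop = FALSE]
}

# Example package name; substitute whichever package you have installed.
vig <- list_vignettes("affyPLM")
if (!is.null(vig)) print(vig)
```

A single vignette can then be opened with vignette("name", package = "affyPLM"), where "name" is one of the entries in the Item column.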

Any other questions?

Send me an email at