2. Swirl Experiment
experiment was carried out using
zebrafish as a model
organism to study the early development in vertebrates. Swirl is a point mutant
in the BMP2 gene that affects the dorsal/ventral body axis. The main goal of the
Swirl experiment is to identify genes with altered expression in the Swirl
mutant compared to wild-type zebrafish.
The hybridizations. Two sets of dye-swap experiments were performed
making a total of four replicate hybridizations. Each of
the arrays compares RNA from swirl fish with RNA from normal ("wild type") fish.
The experimenters have prepared a tab-delimited targets file called "SwirlSample.txt"
which describes the four hybridizations:
On slides 81 and 93, swirl RNA was labeled with green (Cy3) dye and wild type RNA
was labeled with red (Cy5) dye. On slides 82 and 94, the labeling was the
Each of the four hybridized arrays was scanned on an Axon scanner to produce
a TIFF image, which was then processed using the image analysis software SPOT.
The data from the arrays are stored in the four output files listed under FileName.
The arrays. The microarrays used in this experiment were printed with
8448 probes (spots), including 768 control spots. The array printer uses a print
head with a 4x4 arrangement of print-tips and so the microarrays are partitioned
into a 4x4 grid of tip groups. Each grid consists of 22x24 spots that were
printed with a single print-tip. The gene name associated with each spot is
recorded in a GenePix Array List (GAL) file named "fish.gal".
For this example we assume that these files, along with a Targets file ("SwirlSample.txt")
and STF ("SpotTypes.txt") are available in the same directory.
2.1 Reading the data using limmaGUI
limmaGUI, be advised that if you are a Windows user, it is
best to run
Rgui in Single Document Interface (SDI) mode, otherwise,
Rgui often "steals" the focus from
This can be done by selecting "GUI preferences" from the "Edit" menu,
selecting "SDI" saving preferences and restarting R. (Or alternatively, you
can edit the file
limmaGUI library needs to be loaded by typing
and selecting Yes. The main GUI screen should be displayed. If the GUI window is closed,
will re-open it.
Reading the data
From the File menu, select "New".
You will be asked to choose a working directory. Select the directory containing
the Swirl dataset (if you used the installer by default this will be
c:\microarrays\swirl) and click OK.
Now you can open a GAL (GenePix Array List) file, an RNA Targets file (listing the hybridizations),
and a Spot Types file.
Clicking on the "Select GAL File" button gives the following dialog. Open "fish.gal".
Now click on the "Select Targets" file button to open the RNA Targets file.
Finally, the Spot Types file, which for this experiment is called "SpotTypes.txt"
and has the following format
Once the GAL, Targets and Spot Types files have been selected, click OK.
Now select the type of image-processing file listed in the RNA Targets
file ("Spot") and click OK.
For Spot files, using background correction is highly recommended (choose Yes).
A number of background correction methods are available. Subtract will be used for this dataset.
For the Swirl data set, we will not use any spot quality weighting, so click No.
When prompted for a name for this data set, type in "Swirl".
Once the data set has been loaded, its name is displayed on top of the left status
window. The status window shows that Red and Green background-corrected intensities (R and G)
have been loaded, and that there is no spot-quality weighting. The data set name can be
later modified with the "Data Set Name" button. The data set name is not the same as the file
name, displayed in the title bar. For example, the same data set (Swirl) could be saved at
two different stages of the analysis, e.g. SwirlArraysLoaded.lma and
Now we can check that the Targets have been read in correctly. From the RNA Targets
menu, click on "RNA Targets" to display the information in a table. An
Edit menu is provided to allow the user to copy the Targets table to the clipboard.
The Spot Types table can viewed from within
limmaGUI in a similar manner. Unlike the
RNA Targets table, the Spot Types table is actually editable within
limmaGUI. You can
change the default colors associated with each spot type and you can even create new
spot types and save the table to a tab-delimited text file. The arrow keys can be used
to select the active cell in the table, while holding down Control and using the arrow
keys will move the cursor within the text in one cell. The Rows menu can be used to
add or delete spot types.
The layout should also be checked by selecting the "Layout Parameters" menu item from the Layout menu.
A dialog box with the number of rows and columns of blocks on an array and rows and columns of spots in each block will be displayed.
If these values are appropriate, click OK.
2.2. Diagnostic plots and normalization using limmaGUI
Once the data has been loaded, various diagnostic plots can be generated by choosing an appropriate option
from the Plot menu.
Image plots. It can be interesting to look at the variation of foreground and background
values over an array. Consider an image plot of the red foreground for the first array, by selecting:
Plot > Image Array Plot > Choose a slide: "Slide81" > OK > Choose which variable to plot: "R" > OK >
Plot title: "Image array plot of R for Slide 81" > OK.
An image plot should be displayed in another window. The top left of the array is on the bottom left of the plot, which represents a counter-clockwise
rotation of 90 degrees. We can see a bright streak across the middle two grids of the 3rd row caused by a scratch or dust on
the array. Spots which are affected by this artifact will have suspect M-values. The streak also shows up as brighter regions in the image plots for the background.
Clicking on a particular spot on this image will bring up a window displaying its ID information retrieved from the GAL file.
Other variables which may be plotted in this way include "Mraw" and "Araw", for un-normalized M and A values,
"G" for green foreground and "Rb" and "Gb" for red and green background.
MA-plots. An MA-plot plots the log-ratio of R vs G against the overall
intensity of each spot. The log-ratio is represented by the M-value, M =
log2(R)-log2(G), and the overall intensity by the A-value, A = (log2(R)+log2(G))/2.
To get an MA-plot of the un-normalized values for the first array, try the following:
Plot > M A Plot with lowess curves (for one slide) > Choose a slide: "Slide81" > OK > Normalization Within A Single Array: No >
Lowess Curve(s) Options: "Print-Tip Group Lowess Curves" > OK > Plot title: "M A Plot for slide 81 with no normalization" > OK.
A different colored curve is displayed for each print-tip group. To see individual MA-plots for each of the print-tip groups on this array, with lowess curves, try the following:
Plot > Print-Tip Group M A Plot (for one slide) > Choose a slide: "Slide81" > OK.
You should be able to notice the points which make up the red streak (Note: this plot is not rotated). The affected spots are in grids 10 and 11, and have very large positive M values at high intensities.
Normalization. Several normalization options are available in the Normalization menu. By choosing the "Select Within-Array Normalization Method" item from the menu,
a dialog box with various normalization options is displayed, as below.
Print-tip group loess normalization is the default method, and will be used for this data. Click OK.
Next choose the "Normalize / Update M and A" item from the Normalization menu
Dialog boxes which ask whether you'd like to "Normalize Within Arrays" (choose Yes) and "Normalize Between Arrays" (choose No) should follow. Now check the status window to see that "Within-Array Normalized" M and A values are available.
Try generating an MA plot of the normalized values, with the control spots highlighted (Hint: Plot > Color-Coded M A Plot (for one slide)).
2.3 Reading the data using command line functions (for reference only)
You may prefer to load your data using the command line functions available in
limma. Ensure that the R working directory is set to the directory containing the Swirl files (using
setwd(), or from the R Console in Windows, select File > Change dir... > Change working directory to: "<directory>"). To load the limma library, type
at the R command prompt. Next read the targets file using the command:
targets <- readTargets("SwirlSample.txt")
To read in the intensity data, the function
read.maimages is used.
RG <- read.maimages(targets$FileName, source="spot")
The default for SPOT output is that
Gmean columns of each file
are used as foreground intensities and
morphG are used as
background intensities. The object
RG is an
RGList object which contains
a foreground and background intensity for each of the red and green channels for every gene (spot)
on every array. To see a summary of the contents of this object, simply type its name and press enter.
To read in the .gal file, infer the slide layout, and assign this information to the
use the commands
RG$genes <- readGAL("fish.gal")
RG$printer <- getLayout(RG$genes)
2.4 Diagnostic plots and normalization using command line functions (for reference only)
Image plots. Consider image plots of the red and green background for the first array:
imageplot(log2(RG$Rb[,1]), RG$printer, low="white", high="red")
imageplot(log2(RG$Gb[,1]), RG$printer, low="white", high="green")
MA-plots. The M and A values can be calculated using the function
method="none" calculates raw (un-normalized) M and A values.
MA <- normalizeWithinArrays(RG, method="none")
MA <- MA.RG(RG)
Note that 'subtraction' of the background from the foreground is the default background correction method
used to construct these log-ratios. Other options are possible using the
function on the
RGList. For example, if you don't want to background correct the data, you
would use the following
RGnobg <- backgroundCorrect(RG, method="none")
and then proceed as before to calculate the M and A values, using
RGnobg instead of
To plot the raw M and A values for the first array, use the following command
By incrementing the
array argument (eg
array=2), MA plots for other slides can be generated.
Now plot the individual MA-plots for each of the print-tip groups on this array,
together with the loess curves which will be used for normalization:
Normalization. For print-tip loess normalization, use the command:
MA <- normalizeWithinArrays(RG)
Print-tip loess is the default normalization method in
other options are possible, and are specified by the
method argument (for example use
method="loess" for global intensity based loess normalization of each slide). To plot the normalized M versus A values by print-tip group, type:
3. Cell line comparison data (for reference only)
In this exercise we consider an experiment where two RNA sources are compared directly on 6 replicate arrays.
Background. This data is taken from a set of quality control hybridizations, which were performed to
assess how well the spots were printed on the arrays, rather than a question of biological interest.
The RNA compared was from two very different cell lines, which ensured many of the genes were differentially expressed
at varying degrees. Data for each slide is available from two image analysis packages (GenePix and SPOT),
and will be used to highlight some of the differences between the programs, particularly in terms of the background
levels measured for each spot.
The hybridizations. Six microarrays were each hybridized with RNA from cell line A (Cy5) and cell line B (Cy3).
The arrays. A total of 10944 spots were printed on each array. They were arranged in 6x4 print-tip grids, each containing 19x24 rows and columns of spots.
The Lucidea Universal ScoreCard controls (Samartzidou et al, 2001) were printed on the arrays, and spiked into the RNA mixes of each experiment.
This control series is made up of artificial genes, some of which are added in equal quantities,
or at 3 and 10 times as much in one sample compared to the other to give known fold-changes for particular spots.
For this example, you will need the GAL file called "genelist.gal" and the raw SPOT (.spot) and GenePix (.gpr) output
files in the current working directory.(if you used the installer default you will find these in
3.1 Defining Targets files (for reference only)
For the cell line comparison arrays, a targets file would look like
for the Spot data, and
for the GenePix data. Files named "TargetsSpot.txt" and "TargetsGenePix.txt"
are included in the zip file for the cell line data.
3.2 Defining a Spot Types file (for reference only)
An appropriate STF for these arrays which contain Lucidea Universal ScoreCard controls is:
A STF named "SpotTypes.txt" is included in the zip file for the cell line data.
3.3 Loading the data using limma (for reference only)
To load the
limma library, type
To read in the STF and targets information, use the following commands
SpotTypes <- readSpotTypes()
TargetsSpot <- readTargets("TargetsSpot.txt")
The file names stored in the targets file can be used to read in the intensity information contained
in each of the image analysis output files with the command
RGspot <- read.maimages(TargetsSpot$FileName, source="spot")
RGspot is an RGList object which contains five components:
targets. You can verify
this by typing:
The foreground intensities are stored in
RGspot$R (Cy5) and
RGspot$G (Cy3) and the
backgrounds are stored in
RGspot$Rb (Cy5) and
Other useful information can also be added to the RGList, such as the probe ID information
RGspot$genes <- readGAL()
This information is read in from a file in the current working directory with an extension .gal.
If no such file, or multiple files with this extension exist, an error message will be generated.
SpotTypes information can now be used to determine the status of each spot (ie control or gene), using the command
RGspot$genes$Status <- controlStatus(SpotTypes, RGspot$genes)
This information will be used later to highlight the control spots on an MA plot. The printer
information is also needed if intensity based print-tip normalization is to be applied to the data.
The easiest way to acquire this information is from the GAL file, ie
RGspot$printer <- getLayout(RGspot$genes)
Now all of the necessary information is stored in the RGList object,
View a summary of the contents of
RGspot by typing its name and pressing enter.
This process can be repeated for the GenePix data:
TargetsGenePix <- readTargets("TargetsGenePix.txt")
RGgpr <- read.maimages(TargetsGenePix$FileName, source="genepix")
RGgpr$genes <- readGAL()
RGgpr$printer <- getLayout(RGgpr$genes)
RGgpr$genes$Status <- controlStatus(SpotTypes, RGgpr$genes)
The default for GenePix output is that the
F635 Mean (Cy5) and
(Cy3) columns of each .gpr file are used as foreground intensities and
B635 Median (Cy5) and
B532 Median (Cy3) are used as background intensities.
An alternative way of reading data from the image analysis output files without using
the Targets file, is
spotFiles <- dir(pattern="\\.spot$")
RGspot <- read.maimages(spotFiles, source="spot")
for SPOT data and
gprFiles <- dir(pattern="\\.gpr$")
RGgpr <- read.maimages(gprFiles, source="genepix")
for GenePix data. In this case, the order of the files in the working directory (which depends on their names) will determine the order they are read in. A benefit of using a targets file is that the order is controlled by the user. The targets file can also be used later in the linear modeling analysis to set up the design matrix. Assigning the gene ID's, control status of the spots and the printer layout will need to be repeated as before.
pattern argument of the
dir function accepts a regular expression. A detailed understanding of
regular expressions is certainly not required to perform basic microarray analysis. The double backslash (
\\) tells R that the dot immediately after
it should be treated literally as a dot rather than as a special character (as it normally would be in regular expressions). The dollar sign (
R that not only must the file names contain the pattern ".spot" (or ".gpr"), but they must in fact end with ".spot" (or ".gpr").
3.4 Normalization using limma (for reference only)
Un-normalized M and A values for each spot can be constructed using the function
MAspot <- MA.RG(RGspot)
MAgpr <- MA.RG(RGgpr)
MA-plots. The MA-plot of the un-normalized values for the first array is obtained using:
# Now open a new plot window (system-dependent)
windows() # For Windows users
quartz() # For Mac users
x11() # For Linux users
Notice the automatic highlighting of the control spots specified in the STF.
To do the same plot for other arrays, the
array argument can be changed.
Normalization. For print-tip group normalization, try
MAspot <- normalizeWithinArrays(RGspot)
MAgpr <- normalizeWithinArrays(RGgpr)
To do MA plots, 6 to a page and save the output in .png format, the commands
can be used. Comparing the MA plots for a given slide, you should notice that the dynamic range of the A values is less for SPOT data compared to GenePix. There is also some fanning of the M values at lower intensities for the GenePix plots. This is a result of the different background estimates used in the two packages. The morphological background in SPOT tends to give a low, stable estimate of slide background, whereas the median of the background pixels used in GenePix is an over-estimate (see Yang et al (2002)), resulting in numerical instability for many of the low intensity log-ratios. Not background correcting GenePix data may be a good option in some instances, ie
RGgprnobg <- backgroundCorrect(RGgpr, method="none")
then proceed as before to normalize the data, using
RGgprnobg instead of
3.5 Diagnostic plots using arrayQuality (for reference only)
For some other diagnostic plots, try the following functions from the
controlCode <- as.matrix(read.table("controlCode.txt", header=TRUE,sep="\t"))
rawdata <- gpQuality(dir(pattern="\\.gpr$"), compBoxplot=FALSE,
For each slide, a summary diagnostic plot "diagPlot.<slidename>.png" will be saved in the working directory. Each slide summary shows MA plots, image plots of M (before and after normalization) and A values, dot plots of the M and A values for each class of control spot (provided
controlCode is set correctly), as well as single channel intensity plots. These plots allow the overall quality of a slide to be assessed visually. The
arrayQuality package can also provide a more quantitative measurement of slide quality by comparing each slide to a set of reference slides, where available (not applicable for this example).