Timing simulations for BufferedMatrixMethods 1.3.5 (Simulation 1)

Written by Ben Bolstad
email bmb@bmbolstad.com

Background

The goal here was to investigate the speed of BufferedMatrix.justRMA() as a function of the number of arrays processed, from 10 to 2500 HG-U133 Plus 2.0 arrays, and to compare it with the performance of RMAExpressConsole. To make the comparison fair, both were used to read in the same set of arrays, process them with RMA, and write the output to text files.

pick_datasets.R: Code used to select the subset of arrays used in each analysis.
RMAExpressTiming.sh: Script for running and timing RMAExpressConsole.
R_Timing.sh: Script for running and timing BufferedMatrix.justRMA().
output_10.settings: Settings file for RMAExpressConsole (using 10 arrays as an example).
Process10.R: R script that used BufferedMatrix.justRMA() (using 10 arrays as an example).
compareoutput.R: Code used to read all the output and compare it.

Results

RMAExpressConsole_log.txt

R_log.txt

RMAExpressTimes.txt

R_Times.txt

Discussion

The runtimes of the two implementations were very consistent with each other.

There seem to be some inconsistencies in the expression values generated in some situations. RMAExpressConsole writes expression values to 8 significant figures, while from R we write to 16 decimal places. RMA expression values are on the log2 scale, so a value of 10 or more has two digits before the decimal point and keeps only six decimal places at 8 significant figures. Rounding can then change it by up to half a unit in the last place, so we expect the maximum observed difference to be around 5*10^-7. For the 10 to 100 array sets this was true. However, for the 200 and 250 array sets, larger differences were observed in the expression values for all arrays. At 500 and 750 arrays the differences were again negligible. Larger differences were also observed for the 2000 array set. Note that because the differences are fairly consistent between arrays within a dataset, the differences between the two implementations are not expected to have any significant effect on downstream analysis.
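
The 5*10^-7 figure can be checked with a short calculation. The Python sketch below is illustrative only and is not taken from compareoutput.R; the helper names round_sig, rounding_bound and max_abs_diff and the sample values are assumptions made up for the example. It rounds a few log2-scale values to 8 significant figures and compares the largest change against the half-unit-in-the-last-place bound.

```python
import math


def round_sig(x: float, sig: int = 8) -> float:
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0.0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)


def rounding_bound(x: float, sig: int = 8) -> float:
    """Largest change that rounding x to `sig` significant figures can cause."""
    exponent = math.floor(math.log10(abs(x)))
    return 0.5 * 10.0 ** (exponent - sig + 1)


def max_abs_diff(a, b):
    """Maximum absolute elementwise difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up log2-scale expression values, not taken from the real output files.
full_precision = [7.123456789012345, 12.34567849, 13.999999949]
eight_sig_figs = [round_sig(v) for v in full_precision]

# Values below 10 keep seven decimals; values of 10 or more keep only six.
bound_small = rounding_bound(7.5)    # half a unit in the 7th decimal place
bound_large = rounding_bound(12.3)   # half a unit in the 6th decimal place
observed = max_abs_diff(full_precision, eight_sig_figs)
```

With these inputs, observed stays below bound_large, which is the 5*10^-7 quoted above. A comparison that exceeds this bound, as reported for some of the larger array sets, cannot be explained by output rounding alone.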