bmb splus notes for week 1 Feb, 3 Feb
Splus
-----
Object Oriented Statistical Programming Language
- apply procedures to objects
- flexible/extensible
- interpreted, so explicit loops can be slow (avoid loops wherever an operation on a whole vector will do)
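A minimal sketch of the point about loops, written in plain S so that it should also run in R. The vector x and the names sq.loop and sq.vec are made up for illustration. Both versions compute the same squares, but the second applies the arithmetic to the whole vector in one step.

```r
x <- c(1, 2, 3, 4, 5)

# Loop version: squares one element per pass through the loop
sq.loop <- numeric(length(x))
for (i in 1:length(x)) {
  sq.loop[i] <- x[i]^2
}

# Vectorized version: the operator is applied to every element at once
sq.vec <- x^2
```

On a vector this small the difference is invisible; the loop only becomes costly as the data grows.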
Main Data Object types
----------------------
Data frame - used to store related data.
- matrix like
- usually rows are observations and columns are measurements of various variables
- different columns can hold different scalar types, e.g. numbers, strings and logical values (each single column holds one type)
- usually created by reading in data with the read.table() command
Matrix - stores information all of the same type in a 2 dimensional array
- cannot mix types in the same matrix, i.e. must be all numeric or all logical, etc.
- matrix algebra is supported
- created with the matrix() command
Vector - one dimensional version of the matrix
List - collection of items of different types that are stored together because they are somewhat related, e.g. a student record
Scalar - a plain old number, or character string or single logical value
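As a rough illustration of the object types listed above, the sketch below builds one of each. All object names and values are made up for the example. The data frame is built directly with data.frame() rather than read from a file.

```r
# Scalar: a single value (stored by S as a vector of length 1)
x <- 3.5

# Vector: several values of one type
v <- c(1.5, 2.0, 4.25)

# Matrix: 2 rows and 3 columns, all numeric, filled column by column
m <- matrix(1:6, 2, 3)

# List: related items of different types
rec <- list(name = "john", age = 23, courses = c(200, 248))

# Data frame: one row per observation, columns of different types
df <- data.frame(age = c(22, 35), gender = c("M", "F"))
```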
Indexing
--------
We can index information stored in these data structures in a number of different ways. Assume we have the following datafile which we will read into Splus
eg
name age height weight gender
Fred 22 185 80 M
Bob 35 172 78 M
Jane 27 167 63 F
Mary 19 163 57 F
this can be read in using the following command
> data <- read.table("data",header=T)
the first parameter is the name of the datafile, the second parameter specifies that the first row of the file contains the column names.
To index a single cell we can use data[i,j] to return the jth element of the ith row. for example
> data[3,4]
returns 63
You can however also index an entire row with data[i,] to get the ith row. so
> data[2,]
would return
Bob 35 172 78 M
or index an entire column, so data[,j] would return the jth column of data. so
> data[,4]
would return
weight
80
78
63
57
we can also refer to columns by name so
data$gender refers to the column
M
M
F
F
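The cell, row, column and name indexing shown so far can be tried without a data file. The sketch below assumes the same four records but builds the data frame in code with data.frame(), which is a substitute for the read.table() call in these notes.

```r
# Same records as the example datafile, built in code instead of read from disk
data <- data.frame(
  name   = c("Fred", "Bob", "Jane", "Mary"),
  age    = c(22, 35, 27, 19),
  height = c(185, 172, 167, 163),
  weight = c(80, 78, 63, 57),
  gender = c("M", "M", "F", "F")
)

cell   <- data[3, 4]    # 4th column of the 3rd row: Jane's weight, 63
row2   <- data[2, ]     # whole 2nd row (Bob), still a data frame
col4   <- data[, 4]     # whole 4th column: 80 78 63 57
gender <- data$gender   # the gender column, selected by name
```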
Even more powerfully, we may use logical indexing. So for example
> data[data$age > 25,] would return
name age ....
Bob 35 ....
Jane 27 ....
or
> data[data$gender == "M",] would return
name .....
Fred .....
Bob .....
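Logical indexing works by first building a vector of true/false values, one per row, and then keeping the rows marked true. A small sketch, again assuming the data frame is built in code rather than read from the datafile; the names older, males and older.f are made up for the example.

```r
data <- data.frame(
  name   = c("Fred", "Bob", "Jane", "Mary"),
  age    = c(22, 35, 27, 19),
  height = c(185, 172, 167, 163),
  weight = c(80, 78, 63, 57),
  gender = c("M", "M", "F", "F")
)

# The comparison alone gives one logical value per row: F T T F
older <- data$age > 25

# Using that vector as the row index keeps only Bob and Jane
data[older, ]

# Note the double == for comparison; a single = is not a comparison
males <- data[data$gender == "M", ]

# Conditions can be combined with & (and) and | (or)
older.f <- data[data$age > 25 & data$gender == "F", ]
```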
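Note that the cell, row, column and logical indexing methods above work equally well with matrices (the $ form is for data frames and lists). We can create a matrix with the call below, where data is a vector of values that is filled in column by column
> matrix(data, nrow, ncol)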
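A short sketch of matrix creation and indexing, using a made-up 2 by 3 matrix m. Note that matrix() fills the values column by column.

```r
# 2 rows, 3 columns; the values 1..6 go down column 1, then column 2, ...
m <- matrix(1:6, 2, 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

m[2, 3]     # single cell: 6
m[1, ]      # first row: 1 3 5
m[, 2]      # second column: 3 4
m[m > 4]    # logical indexing: 5 6
```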
Vectors have only 1 dimension so we may use only one index. one simple way to create a vector is to use the concatenate function c(). For example
> avec <- c(3.1,8.0,5.8,-4.2)
> avec[1]
returns
3.1
The colon (:) is the range operator, used in the format lowerlimit:upperlimit. It returns a vector containing the ordered elements (lowerlimit, lowerlimit+1, ..., upperlimit)
> avec[2:3]
would thus return
8.0 5.8
or you could use
> avec[c(1,4)]
to get
3.1 -4.2
Logical indexing also works with vectors, e.g. avec[avec > 4] returns 8.0 5.8. Note that you can also use the range operator (or any other suitable vector) to index data frames and matrices.
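The vector indexing forms above, gathered into one runnable sketch. The threshold 4 in the logical example is an arbitrary choice for illustration.

```r
avec <- c(3.1, 8.0, 5.8, -4.2)

2:5               # the range operator builds the vector 2 3 4 5
avec[1]           # single element: 3.1
avec[2:3]         # a range of positions: 8.0 5.8
avec[c(1, 4)]     # any vector of positions: 3.1 -4.2
avec[avec > 4]    # logical index, keeps elements greater than 4: 8.0 5.8
```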
To create a list that is, say, a student record
> alist <- list(name="john", major="Statistics",age=23,courses=c(200,248))
List indexing works slightly differently from the above. Name indexing works as might be expected, so
> alist$name
would return
john
or we could use
> alist[[1]]
to do the same thing.
Note that the last item of our list is a vector. This introduces slightly more complexity to the situation:
alist$courses or alist[[4]]
would return
200 248
to index say the first element of the vector we would use
alist$courses[1] or alist[[4]][1]
to get 200
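The list indexing above as a runnable sketch. The last line goes slightly beyond these notes: a single bracket on a list returns a shorter list rather than the item itself, which is why the double bracket is used to pull an item out.

```r
alist <- list(name = "john", major = "Statistics", age = 23,
              courses = c(200, 248))

alist$name          # by name: "john"
alist[[1]]          # by position, same item: "john"
alist$courses       # the whole vector: 200 248
alist[[4]][1]       # first element of the 4th item: 200
alist$courses[2]    # second element, by name: 248

sub <- alist[1]     # single bracket: a list of length 1, not the string
```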
Some useful functions for dealing with data objects
---------------------------------------------------
dim() - get dimensions
names() - column names
row.names() - row names
length() - length of a vector
cbind(), rbind() - join dataframes or matrices together by columns or rows
nrow(), ncol() - number of rows or columns
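A quick sketch of these functions on a made-up matrix m and data frame df. Note that the row and column counters are spelled nrow() and ncol().

```r
m  <- matrix(1:6, 2, 3)
df <- data.frame(age = c(22, 35), weight = c(80, 78))

dim(m)                        # 2 3
nrow(m)                       # 2
ncol(m)                       # 3
length(c(3.1, 8.0, 5.8))      # 3
names(df)                     # "age" "weight"
row.names(df)                 # default row labels of the data frame

wide <- cbind(m, c(7, 8))     # add a column: now 2 x 4
tall <- rbind(m, c(7, 8, 9))  # add a row: now 3 x 3
```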
Operators
---------
+,-,*,/ arithmetic
<- assignment operator
<,>,<=,>= logical comparison
== equal to
!= not equal
! logical not
& logical and
| logical or
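These operators all work element by element on vectors, which is what makes logical indexing possible. A small sketch using two made-up vectors, age and gender.

```r
age    <- c(22, 35, 27, 19)
gender <- c("M", "M", "F", "F")

older <- age + 1                        # arithmetic: 23 36 28 20
ge27  <- age >= 27                      # comparison: F T T F
isF   <- gender == "F"                  # equal to: F F T T
notF  <- gender != "F"                  # not equal: T T F F
lt27  <- !ge27                          # logical not: T F F T
both  <- age > 20 & gender == "F"       # logical and: F F T F
either <- age > 30 | gender == "F"      # logical or: F T T T
```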
Special data items
------------------
F = 0 (false)
T = 1 (true)
NA missing data marker
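Because T and F behave as 1 and 0 in arithmetic, summing a logical vector counts the true values. The sketch below also shows how NA propagates through a calculation. The is.na() function and the na.rm argument are not covered in these notes; they are included here as the usual ways of detecting and skipping missing values.

```r
flags <- c(T, F, T, T)
n.true <- sum(flags)              # 3: each T counts as 1, each F as 0

w <- c(80, NA, 63)                # a weight vector with one missing value
miss  <- is.na(w)                 # F T F
m.all <- mean(w)                  # NA: a missing value makes the result missing
m.obs <- mean(w, na.rm = T)       # 71.5: mean of the observed values only
```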
Matrix operations
-----------------
t(X) transpose matrix X
solve(X) invert a square matrix X
solve(A,B) solve linear system AX = B for X
%*% - matrix multiplication
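A small worked sketch of these operations on a made-up 2 by 2 matrix A and right-hand side B. The results are checked by multiplying back with %*%.

```r
# A is filled column by column, so A = | 2 0 |
#                                      | 1 3 |
A <- matrix(c(2, 1, 0, 3), 2, 2)
B <- c(4, 11)

At   <- t(A)         # transpose: rows and columns swapped
Ainv <- solve(A)     # inverse of A
X    <- solve(A, B)  # solves A X = B; here X is 2 3

A %*% Ainv           # the identity matrix, up to rounding error
A %*% X              # reproduces B as a one-column matrix: 4 11
```

Calling solve(A, B) directly is normally preferred to forming solve(A) %*% B when only the solution of the system is wanted.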