Monday 9 March 2009

R script to look at data with a lot of columns

I had some performance data to analyse that had over 1000 columns. Excel cannot load such a file, so I turned to a tool that mathematicians and scientists use: R. In this case I used the R script below. There are some interesting twists. For example, when read.csv() reads the header row it turns each column name into a valid R name, replacing every character that is not allowed in a name (spaces, "(", ")", "\", ":" and so on) with ".". That means you have to search for the dotted form of a column name, not the name as it appears in the file. Anyway, this script is very useful for analysing large CSV files.
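A quick way to see which name R will give a column is to call make.names() on the original header, because read.csv() uses make.names() whenever check.names = TRUE (the default). The raw headers below are an assumption about how Windows Performance Monitor writes its CSV export, chosen so that they reproduce two of the dotted names used in the script.

```r
# Assumed perfmon-style raw headers (illustrative, not copied from the real file)
raw <- c("\\\\BMHMPRES911\\PhysicalDisk(0 C: D:)\\Current Disk Queue Length",
         "\\\\BMHMPRES911\\PhysicalDisk(1 G:)\\Disk Read Bytes/sec")

# make.names() prepends "X" when a name does not start with a letter,
# then turns every character that is invalid in an R name into "."
clean <- make.names(raw)
print(clean)
# [1] "X..BMHMPRES911.PhysicalDisk.0.C..D...Current.Disk.Queue.Length"
# [2] "X..BMHMPRES911.PhysicalDisk.1.G...Disk.Read.Bytes.sec"
```

The leading "X.." explains why the script searches with grep() for a substring rather than comparing whole names.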

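# read the perfmon export; header=TRUE takes the column names from the first row
data <- read.csv(file="C:/Data/Work/EXposure+IT/ITBermuda/NetApp/PartnerRe_000005.csv",header=TRUE,sep=",")
colnum <- c(116,137,97,118)   # column numbers of interest (not used below)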

# counters to plot, written with the "." substitutions R applies to the headers
sstr <- c(
"BMHMPRES911.PhysicalDisk.0.C..D...Current.Disk.Queue.Length"
,"BMHMPRES911.PhysicalDisk.0.C..D...Avg..Disk.Queue.Length"
,"BMHMPRES911.PhysicalDisk.0.C..D...Disk.Read.Bytes.sec"
,"BMHMPRES911.PhysicalDisk.0.C..D...Disk.Write.Bytes.sec"
,"BMHMPRES911.PhysicalDisk.1.G...Current.Disk.Queue.Length"
,"BMHMPRES911.PhysicalDisk.1.G...Disk.Read.Bytes.sec"
,"BMHMPRES911.PhysicalDisk.1.G...Disk.Write.Bytes.sec")

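# find the column number(s) whose name contains each counter name;
# fixed=TRUE treats "." literally instead of as a regex wildcard
results <- list()
for(i in seq_along(sstr)){
  results[[i]] <- grep(sstr[i],names(data),fixed=TRUE)
}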

# make a time series
#options(digits.secs = 3)
#datetime <- as.character(data[,1])
#datetime <- strptime(datetime, format="%m/%d/%Y %H:%M:%S")
time <- seq_len(nrow(data))   # x axis: sample number 1..nrow(data)

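# write one PNG per counter, named 1.png, 2.png, ...
for(i in seq_along(results)){
  col <- results[[i]][1]   # use the first matching column
  png(filename=paste("C:/Data/Work/EXposure+IT/ITBermuda/NetApp/",i,".png",sep=""))
  par(ask=FALSE)   # never wait for a click before drawing
  plot(time,data[,col],type="l")
  title(main=names(data)[col])
  dev.off()
}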
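The script hard-codes its paths and plots each counter against a plain row index. Below is a minimal sketch of the same look-up-and-plot loop wrapped in a function that also parses the first column as a timestamp, which is where the commented-out lines seem to be heading. The function name plot_counters, the output directory (tempdir()), the timestamp format with fractional seconds (%OS) and the three-row demo data frame are all assumptions made for illustration, not taken from the original file.

```r
# Hypothetical helper: look up perfmon counters by their dotted name and plot
# each one against a parsed timestamp, one PNG per counter.
# Assumes column 1 holds timestamps like "03/09/2009 10:00:15.000".
plot_counters <- function(data, patterns, out_dir = tempdir()) {
  # %OS accepts fractional seconds when parsing
  when <- as.POSIXct(strptime(as.character(data[[1]]),
                              format = "%m/%d/%Y %H:%M:%OS"))
  files <- character(0)
  for (p in patterns) {
    # fixed = TRUE: the "." characters are matched literally
    cols <- grep(p, names(data), fixed = TRUE)
    if (length(cols) == 0) next        # counter not present in this file
    col <- cols[1]                     # plot the first match only
    f <- file.path(out_dir, paste0(make.names(p), ".png"))
    png(filename = f)
    plot(when, data[[col]], type = "l", xlab = "time", ylab = "")
    title(main = names(data)[col])
    dev.off()
    files <- c(files, f)
  }
  files                                # paths of the PNGs that were written
}

# Usage with a tiny made-up data set (values are illustrative only)
demo <- data.frame(
  time = c("03/09/2009 10:00:00.000", "03/09/2009 10:00:15.000",
           "03/09/2009 10:00:30.000"),
  X..BMHMPRES911.PhysicalDisk.1.G...Disk.Read.Bytes.sec = c(1024, 2048, 512)
)
out <- plot_counters(demo, c("PhysicalDisk.1.G...Disk.Read.Bytes.sec",
                             "PhysicalDisk.9.Z...Missing"))
```

Counters that are not found are skipped instead of stopping the loop, and naming each PNG after its counter makes the output files easier to tell apart than 1.png, 2.png and so on.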