Monday, 9 March 2009

R script to look at data with a lot of columns

I had some performance data that I needed to analyse that had over 1000 rows. Excel cannot load such a file, so I resorted to some tools that mathematicians and scientists use. In this case I used the following R script. There are some interesting twists, for example when R comes across a character it does not recognise then you need to replace the character with ".". For example R does not like spaces, "(", ")" etc... Anyway this script is very usefull for analysing large csv files...

data <- read.csv(file="C:/Data/Work/EXposure+IT/ITBermuda/NetApp/PartnerRe_000005.csv",header=TRUE,sep=",")colnum <- c(116,137,97,118)

sstr <- c(
"BMHMPRES911.PhysicalDisk.0.C..D...Current.Disk.Queue.Length"
,"BMHMPRES911.PhysicalDisk.0.C..D...Avg..Disk.Queue.Length"
,"BMHMPRES911.PhysicalDisk.0.C..D...Disk.Read.Bytes.sec"
,"BMHMPRES911.PhysicalDisk.0.C..D...Disk.Write.Bytes.sec"
,"BMHMPRES911.PhysicalDisk.1.G...Current.Disk.Queue.Length"
,"BMHMPRES911.PhysicalDisk.1.G...Disk.Read.Bytes.sec"
,"BMHMPRES911.PhysicalDisk.1.G...Disk.Write.Bytes.sec")

results <- list()
for(i in 1:length(sstr)){
results[[i]] <- grep(sstr[i],names(data))
}

# make a time series#options("digits.secs") <- 3
#datetime <- as.character(data[,1])
#datetime <- strftime(datetime, format="%m/%d/%Y %H:%M:%S")time <- seq(1,1565)

for(i in 1:length(results)){
par(ask=F)

png(filename=paste("C:/Data/Work/EXposure+IT/ITBermuda/NetApp/",i,".png",sep=""))
plot(time,data[,results[[i]]],type="l")
title(main=names(data)[results[[i]]])
dev.off()
}

Saturday, 7 March 2009

Database structure optimization log

We have just started a database optimization exercise. This is a log of the first steps that we took in doing this. Here is a query that lists the number of seeks, scans, lookups and updates that are made on indexes for tables starting with "MOD_"

-- Index usage
SELECT
object_name(a.object_id) AS table_name,
COALESCE(name, 'object with no clustered index') AS index_name,
type_desc AS index_type,
user_seeks,
user_scans,
user_lookups,
user_updates
FROM sys.dm_db_index_usage_stats a INNER JOIN sys.indexes b
ON a.index_id = b.index_id
AND a.object_id = b.object_id
WHERE database_id = DB_ID('CatFocus')
AND (user_seeks = 0 OR user_scans= 0)
AND object_name(a.object_id) LIKE 'MOD_%'
AND a.object_id > 1000
ORDER BY user_updates

The query lists all accesses since the last reboot, here is an extract.

MOD_EVENT_SET PK_TBL_EVENT_SET CLUSTERED 0 456 0 121
-->MOD_LOSS_FILE IX_MOD_LOSS_FILE NONCLUSTERED 0 0 0 133005
-->MOD_LOSS_FILE IX_MOD_LOSS_FILE_LOSS_FILE_ID NONCLUSTERED 0 1 0 509254

There are a lot of updates to these indexs but they are never used. The next step is to time the delete statement several times with and without the index. It took 3062 ms with the index and 718 without.

The next part of the procedure was to setup a test server where we can make a controlled load. We monitored the server using the SQL PRofiler and then looked through the logs.

Here is a sumary of the findings:
- At the start there where a lot of logins
- There where several places where the same select statement was being executed
- There where a lot of updates related to creating a cumulative frequency
- Connections where not closed.

We where able to fix the login and connections problem by modifying the based class used in our data access layer to:
Private IsInTransaction As Boolean = False
Public Function OpenConnection() As OleDbConnection
REM Establish connection to database
If objConn Is Nothing Then
objConn = New OleDbConnection
End If
If objConn.State <> ConnectionState.Open Then
objConn.ConnectionString = ConnStr
objConn.Open()
End If
Return objConn
End Function

Public Sub CloseConnection(ByRef objConnection As OleDbConnection)
REM Close connection to database
If (Not IsInTransaction) Then
objConnection.Close()
End If
End Sub
Public Function BeginTransaction() As Boolean Implements IDataBase.BeginTransaction
REM Function to to start a transaction
IsInTransaction = True
Return ExecuteSQL("Begin Transaction ")
End Function
Public Function EndTransaction() As Boolean Implements IDataBase.EndTransaction
REM Function to to start a transaction
IsInTransaction = False
Return ExecuteSQL("commit ")
End Function

The problem with the cumulative frequency calculation was made with a stored procedure.

The next step was to look at index fragmentation. The value for avg_fragmentation_in_percent should be as close to zero as possible for maximum performance. However, values from 0 percent through 10 percent may be acceptable. All methods of reducing fragmentation,
such as rebuilding, reorganizing, or re-creating, can be used to reduce these values
http://msdn.microsoft.com/en-us/library/ms188917.aspx

SELECT
object_name(a.object_id) AS table_name,
a.object_id AS objectid,
a.index_id AS indexid,
name AS index_name,
partition_number AS partitionnum,
avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID('CatFocus'), NULL, NULL , NULL, 'LIMITED') a INNER JOIN sys.indexes b
ON a.index_id = b.index_id AND a.object_id = b.object_id
WHERE avg_fragmentation_in_percent > 10.0 AND a.index_id > 0
ORDER BY table_name, avg_fragmentation_in_percent;

Here is an extract of what was found

CED_CEDANT 1758629308 6 IX_CED_CEDANT 1 66.6666666666667
CED_CEDANT 1758629308 1 PK_CED_CEDANT_1 1 83.3333333333333
DSP_PRICING 1609772792 1 PK_DSP_PRICING 1 42.3267326732673

The solution is to include index defragmentation in the backup maintenance plan

We knew that one of our tables has some columns that are not often used. Here there is a possibility to use "Sparse Columns" available in SQL 2008. Sparse Column has the 8% more CPU overhead, and it achieve the better performance result for Warm Run and Cold Run.
LOSS_ZONE_xx must be filled with NULL instead of 0
http://blogs.msdn.com/qingsongyao/archive/2008/09/10/using-sql-server-2008-semi-structured-features-for-performance-tuning-and-application-design-1.aspx
http://msdn.microsoft.com/en-us/library/cc280604.aspx

As we investigated the queries looking at the actuall execution plan in SQL 2008 there is an output that makes suggestions over any missing indexs that can be used to optimize the select statement
eg Missing Index (Impact 98.0075) : CREATE INDEX ......

Improving the speed of DataGridViews by switch of the RaiseListChangeEvent

Here is an example of how to improve the speed of updates to data gridviews in Windows Forms

Control.CheckForIllegalCrossThreadCalls = False
MyDataBindingSource.RaiseListChangedEvents = False

REM Make update to datagrid
MyDataBindingSource.RaiseListChangedEvents = True
MyDataBindingSource.ResetBindings(False)

Base classes for WPF

Here is an example of how to implement base classes for Windows in wpf

BaseWindow.cs
using System.Windows;
namespace PartnerRe.WPF.Utilities
{
public class BaseWindow : Window
{
public BaseWindow()
{
this.ShowInTaskbar = false;
this.WindowStartupLocation = WindowStartupLocation.CenterOwner;
System.Drawing.Color color = System.Drawing.SystemColors.Control;
this.Background = new System.Windows.Media.SolidColorBrush(System.Windows.Media.Color.FromRgb(color.R, color.G, color.B));
}
protected override void OnInitialized(System.EventArgs e)
{
base.OnInitialized(e);
}
}
}

src:BaseWindow x:Class="WinMyWindow"
...
xmlns:src="clr-namespace:PartnerRe.WPF.Utilities;assembly=MyUtilities"
...
/src:BaseWindow

Single Click edit in WPF DataGridView

If you have a datagrid view with a check box the fefault behaviour is that you need to click 2 times to check a check box. 1 click it to select the cell and the second to check the box. To get arround this:

In the XAML File:
UserControl.Resources
. Style TargetType="{x:Type my:DataGridCell}"
EventSetter Event="PreviewMouseLeftButtonDown" Handler="DataGridCell_PreviewMouseLeftButtonDown" /EventSetter
/Style
...
/UserControl.Resources


In the CS-File:
using Microsoft.Windows.Controls;...
private void DataGridCell_PreviewMouseLeftButtonDown(object sender, MouseButtonEventArgs e)
{
PartnerRe.WPF.Utilities.Tools.SingleClickEditing(sender as DataGridCell);
}

Commar format in WPF

Add the following reference
xmlns:sys="clr-namespace:System;assembly=mscorlib"
Then add the following static resource
UserControl.Resources
sys:String x:Key="CommaFormat" {0:#,0} /sys:String
...
Then within the binding element of the Text use String Format
TextBlock Foreground="Black" Text="{Binding COVER, StringFormat={StaticResource CommaFormat}}" HorizontalAlignment="Right"/
There is no way to directly add the string format into the Text property
The alternative using PropertyConverters requires a new class and therefore much more code than this...
8<---
Notes from
http://www.sloppycode.net/articles/formatstrings.aspx
Console.WriteLine("{0:g}", DateTimeKind.Utc); // general
Console.WriteLine("{0:f}", DateTimeKind.Utc); // string
Console.WriteLine("{0:d}", DateTimeKind.Utc); // integer
Console.WriteLine("{0:x}", DateTimeKind.Utc); // hex

Console.WriteLine("{0} {1} {2}",100,200,3000);
Output:
100 200 3000
Console.WriteLine("{0:X} {1:N1} {2:#,#}",100,200,3000);
Output:
64 200.0 3,000
Console.WriteLine("(amount):{0,30:C}", 1123.27);
Output:
(amount): £1,123.27
Console.WriteLine("{0,-30:C}(amount)", 1123.27);
Output
£1,123.27 (amount)
double neg = -1000.123;
double pos = 1000.123;
double z = 0;
Console.WriteLine("{0:A)0.00;B)#.##}", pos);
Console.WriteLine("{0:A)0.00;B)#.##}", z);
Console.WriteLine("{0:A)0.00#;B)#.##}", neg);
Console.ReadLine();
Output
A)1000.12
A)0.00
B)1000.12
double x = 67867987.88666;
Console.WriteLine("{0:C}",x);
Console.WriteLine("£{0:N}", x);
Console.WriteLine("£{0:#,#.###}", x);
Output:
£67,867,987.89
£67,867,987.89
£67,867,987.887
// No number specifier
Console.WriteLine("a) {0}", numSingle);
// Round up, ignore any decimal points.
Console.WriteLine("a) {0:##}", numSingle);
// Round up to 1 dp.
Console.WriteLine("b) {0:.#}", numSingle);
// Round up to 2 dp.
Console.WriteLine("c) {0:.##}", numSingle);
// Round up to 3 dp.
Console.WriteLine("d) {0:.###}", numSingle);
// Floats work up to 7 digits, then any extra digit after the decimal place is rounded up, or ignored.
// So 4 decimal places is the maximum a number up to 1000 can store, e.g. 999.1234.
// It's recommended to always use double (Double) or decimal (Decimal) for applications involving currencies.
double numDouble = 1000000.5664;
float numSingle2 = 999.9999f;
Console.WriteLine("e) {0:.####}", numDouble);
Console.WriteLine("e2) {0:.####}", numSingle2);
Console.WriteLine("e3) {0:##}", numSingle2);
Output:
a) 10.566
a) 11
b) 10.6
c) 10.57
d) 10.566
e) 1000000.5664
e2) 999.9999
e3) 1000
//
// In built formatters. These can have digit specifiers after them, such as {0:D5}
//
Console.WriteLine("");
Console.WriteLine("a) [C]urrency - {0:C}", 1123.27); // Very similar to N (but no rounding), but with the currency symbol. See the section on regions.
Console.WriteLine("b) [D]ecimal - {0:D5}", 1123); // Integers only. The '5' sets the number of zeros to pad to the left.
Console.WriteLine("c) [F]ixed - {0:F3}", 1123.275); // Fixed decimal places including rounding.
Console.WriteLine("d) [G]eneral - {0:G}", 1123.275); // Scientific or decimal notation, whichever is shorter. Has a long set of rules in MSDN.
Console.WriteLine("e) [N]umber - {0:N}", 1123.275); // Similar to fixed, but includes ',' thousand seperator and is rounded to 2 dp by default.
Console.WriteLine("f) [P]ercent - {0:P}", 0.2345); // Percentage to 2 dp. So 0.99998 is 100%
Console.WriteLine("g) [P]ercent - {0:#%} ", 0.915); // This shows the percentage but as a whole number, rounded
double d1 = 1234.9998;
string rd = string.Format("{0:R}", d1);
double d2 = Double.Parse(rd);
Debug.Assert(d1 == d2);
Console.WriteLine("h) [R]ound trip - {0:R}", 0.9998); // Allows you to parse the resulting string using Single.Parse() or Double.Parse()
Console.WriteLine("i) he[X]adecimal - 0x{0:X}", 255); // Integers only
Console.WriteLine("j) [E]xpontial - {0:E}", 10000000);
Output:
a) [C]urrency - £1,123.27
b) [D]ecimal - 01123
c) [F]ixed - 1123.275
d) [G]eneral - 1123.275
e) [N]umber - 1,123.28
f) [P]ercent - 23.45 %
g) [P]ercent - 92%
h) [R]ound trip - 0.9998
i) he[X]adecimal - 0xFF
j) [E]xpontial - 1.000000E+007
// --- Default formats ---
// Tokens:
// dD tT fF gG rsu M Y
// d = short/long [d]ate
// t = short/long [t]ime
// f = short/long [f]ull
// g = short/long [g]eneral or default
// rsu = RFC or ISO format, last 2 are sortable
// M = day & month
// Y = month & year
// --- Custom formats ---
// dd ddd dddd
// 15 Mon Monday
// MM MMM MMMM
// 06 Jun June
// yy yyyy
// 08 2008
// hh HH
// 01 13
// mm
// ss
// tt
// AM or PM

public class FormatProvider : IFormatProvider
{
public object GetFormat(Type formatType)
{
if (formatType is IFormatProvider)
return this;
else
return null;
}
}
public string ToString(string format, IFormatProvider formatProvider)
{
// handle format
}
public class User : IFormattable
{
private int _age;
private string _name;
///
/// Gets/sets the Name.
///

public string Name
{
get { return _name; }
set { _name = value; }
}
///
/// Gets/sets the Age.
///

public int Age
{
get { return _age; }
set { _age = value; }
}
///
/// Object.ToString() implementation
///

public override string ToString()
{
return string.Format("'{0}', age {1}", Name, Age);
}
///
/// Just a format string, functionality you find in DateTime.Now.ToString()
///

public string ToString(string format)
{
return ToString(format, null);
}
///
/// IFormattable.ToString() implementation
///

public string ToString(string format, IFormatProvider formatProvider)
{
// Default format
if ( string.IsNullOrEmpty(format) format == "G" )
return string.Format("'{0}', age {1}", Name, Age);
if (format == "name")
return Name;
else if (format == "age")
return Age.ToString();
else
{
// The IFormattable example shows branching based on the format. This is fine if the class
// is being used with composite formatting, however to support the DateTime.Now.ToString("xxx")
// style formatting, we need the below.
string result = format;
result = result.Replace("name", Name);
result = result.Replace("age", Age.ToString());
return result;
}
}
}
static void Main(string[] args)
{
User user = new User() { Name = "Chris", Age = 30 };
Console.WriteLine("Default format: {0}", user.ToString());
Console.WriteLine("Composite format: {0:name} is {0:age}",user);
Console.WriteLine("ToString(format): {0}", user.ToString("This user (name) is age"));
Console.ReadLine();
}
Output:
Default format: 'Chris', age 30
Composite format: Chris is 30
ToString(format): This user (Chris) is 30