Saturday, 7 March 2009

Database structure optimization log

We have just started a database optimization exercise. This is a log of the first steps we took. Here is a query that lists the number of seeks, scans, lookups and updates made on indexes of tables whose names start with "MOD_", restricted to indexes with zero seeks or zero scans (candidates for removal):

-- Index usage
SELECT
    object_name(a.object_id) AS table_name,
    COALESCE(name, 'object with no clustered index') AS index_name,
    type_desc AS index_type,
    user_seeks,
    user_scans,
    user_lookups,
    user_updates
FROM sys.dm_db_index_usage_stats a
INNER JOIN sys.indexes b
    ON a.index_id = b.index_id
    AND a.object_id = b.object_id
WHERE database_id = DB_ID('CatFocus')
    AND (user_seeks = 0 OR user_scans = 0)
    AND object_name(a.object_id) LIKE 'MOD_%'
    AND a.object_id > 1000
ORDER BY user_updates

The query lists all accesses since SQL Server was last restarted. Here is an extract:

    table_name     index_name                     index_type    user_seeks  user_scans  user_lookups  user_updates
    MOD_EVENT_SET  PK_TBL_EVENT_SET               CLUSTERED     0           456         0             121
--> MOD_LOSS_FILE  IX_MOD_LOSS_FILE               NONCLUSTERED  0           0           0             133005
--> MOD_LOSS_FILE  IX_MOD_LOSS_FILE_LOSS_FILE_ID  NONCLUSTERED  0           1           0             509254

There are a lot of updates to these indexes, but they are almost never read. The next step was to time the delete statement several times with and without the indexes. It took 3062 ms with the indexes in place and 718 ms without them.
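Here is a minimal sketch of how such a timing test can be run. The table and index names come from the extract above, but the DELETE predicate is a made-up example; substitute the application's real delete statement. SET STATISTICS TIME reports the elapsed milliseconds for each statement.

-- Time the delete with the indexes in place, rolling back so the test is repeatable
SET STATISTICS TIME ON;

BEGIN TRANSACTION;
DELETE FROM MOD_LOSS_FILE WHERE LOSS_FILE_ID = 42;   -- hypothetical predicate
ROLLBACK TRANSACTION;

-- Drop the indexes that are updated but never read
DROP INDEX IX_MOD_LOSS_FILE ON MOD_LOSS_FILE;
DROP INDEX IX_MOD_LOSS_FILE_LOSS_FILE_ID ON MOD_LOSS_FILE;

-- Time the same delete without the indexes
BEGIN TRANSACTION;
DELETE FROM MOD_LOSS_FILE WHERE LOSS_FILE_ID = 42;
ROLLBACK TRANSACTION;

SET STATISTICS TIME OFF;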

The next part of the procedure was to set up a test server where we could generate a controlled load. We monitored the server using SQL Profiler and then looked through the logs.

Here is a summary of the findings:
- At the start there were a lot of logins
- There were several places where the same select statement was being executed
- There were a lot of updates related to creating a cumulative frequency
- Connections were not closed.

We were able to fix the login and connection problems by modifying the base class used in our data access layer to:
Private IsInTransaction As Boolean = False

Public Function OpenConnection() As OleDbConnection
    ' Establish the connection to the database, reusing the existing one if possible
    If objConn Is Nothing Then
        objConn = New OleDbConnection
    End If
    If objConn.State <> ConnectionState.Open Then
        objConn.ConnectionString = ConnStr
        objConn.Open()
    End If
    Return objConn
End Function

Public Sub CloseConnection(ByRef objConnection As OleDbConnection)
    ' Close the connection, unless a transaction is still in progress
    If (Not IsInTransaction) Then
        objConnection.Close()
    End If
End Sub

Public Function BeginTransaction() As Boolean Implements IDataBase.BeginTransaction
    ' Start a transaction and keep the connection open until it ends
    IsInTransaction = True
    Return ExecuteSQL("Begin Transaction ")
End Function

Public Function EndTransaction() As Boolean Implements IDataBase.EndTransaction
    ' Commit the transaction and allow the connection to be closed again
    IsInTransaction = False
    Return ExecuteSQL("commit ")
End Function

The problem with the cumulative frequency calculation was fixed by moving the calculation into a stored procedure, replacing the many per-row updates with a single set-based statement.
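For illustration only, here is a sketch of the set-based approach; the EVENT_LOSS table and its columns are hypothetical names, not our real schema. On SQL Server 2008, which has no windowed running sum, a correlated subquery is one way to do it in a single statement:

-- Hypothetical schema: EVENT_LOSS(EVENT_ID INT PRIMARY KEY, FREQUENCY FLOAT, CUM_FREQUENCY FLOAT)
-- Set each row's cumulative frequency to the sum of all frequencies up to that row
UPDATE e
SET CUM_FREQUENCY = (
        SELECT SUM(e2.FREQUENCY)
        FROM EVENT_LOSS e2
        WHERE e2.EVENT_ID <= e.EVENT_ID)
FROM EVENT_LOSS e;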

The next step was to look at index fragmentation. The value for avg_fragmentation_in_percent should be as close to zero as possible for maximum performance, although values from 0 through 10 percent may be acceptable. All methods of reducing fragmentation, such as rebuilding, reorganizing, or re-creating the index, can be used to reduce these values (see http://msdn.microsoft.com/en-us/library/ms188917.aspx).

SELECT
    object_name(a.object_id) AS table_name,
    a.object_id AS objectid,
    a.index_id AS indexid,
    name AS index_name,
    partition_number AS partitionnum,
    avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID('CatFocus'), NULL, NULL, NULL, 'LIMITED') a
INNER JOIN sys.indexes b
    ON a.index_id = b.index_id AND a.object_id = b.object_id
WHERE avg_fragmentation_in_percent > 10.0 AND a.index_id > 0
ORDER BY table_name, avg_fragmentation_in_percent;

Here is an extract of what was found:

table_name   objectid    indexid  index_name       partitionnum  avg_fragmentation_in_percent
CED_CEDANT   1758629308  6        IX_CED_CEDANT    1             66.6666666666667
CED_CEDANT   1758629308  1        PK_CED_CEDANT_1  1             83.3333333333333
DSP_PRICING  1609772792  1        PK_DSP_PRICING   1             42.3267326732673

The solution is to include an index defragmentation step in the backup maintenance plan.
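What that step does in T-SQL, following the thresholds suggested in the MSDN article linked above (reorganize for moderate fragmentation, rebuild above roughly 30 percent), applied to the indexes found in the extract:

-- All three indexes in the extract are above 30 percent fragmented,
-- so a full rebuild is appropriate
ALTER INDEX IX_CED_CEDANT ON CED_CEDANT REBUILD;
ALTER INDEX PK_CED_CEDANT_1 ON CED_CEDANT REBUILD;
ALTER INDEX PK_DSP_PRICING ON DSP_PRICING REBUILD;

-- For indexes fragmented between roughly 10 and 30 percent,
-- the lighter in-place reorganize is usually sufficient:
-- ALTER INDEX SomeIndex ON SomeTable REORGANIZE;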

We knew that one of our tables has some columns that are rarely used, which makes it a candidate for the "Sparse Columns" feature introduced in SQL Server 2008. According to the benchmark linked below, sparse columns cost roughly 8% more CPU but gave better results for both the warm run and the cold run. For this to help, the LOSS_ZONE_xx columns must be filled with NULL instead of 0, since sparse storage only saves space for NULL values.
http://blogs.msdn.com/qingsongyao/archive/2008/09/10/using-sql-server-2008-semi-structured-features-for-performance-tuning-and-application-design-1.aspx
http://msdn.microsoft.com/en-us/library/cc280604.aspx
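A sketch of the change, assuming LOSS_ZONE_01 is a nullable FLOAT on a table called, for illustration, MOD_LOSS_DETAIL (both the table name and the type are placeholders for the real schema):

-- Existing zeros must become NULL, since sparse columns only save space for NULLs
UPDATE MOD_LOSS_DETAIL SET LOSS_ZONE_01 = NULL WHERE LOSS_ZONE_01 = 0;

-- Mark the rarely used column as SPARSE (SQL Server 2008 and later)
ALTER TABLE MOD_LOSS_DETAIL ALTER COLUMN LOSS_ZONE_01 FLOAT SPARSE NULL;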

As we investigated the queries by looking at the actual execution plan, we found that SQL Server 2008 produces suggestions for any missing indexes that could be used to optimize the select statement, e.g.

Missing Index (Impact 98.0075) : CREATE INDEX ......
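The same suggestions are also recorded in the missing-index DMVs, so they can be queried in bulk rather than plan by plan. A sketch, with the column list trimmed:

-- Missing index suggestions recorded by the optimizer since the last restart
SELECT
    d.statement AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.avg_user_impact,
    s.user_seeks
FROM sys.dm_db_missing_index_details d
INNER JOIN sys.dm_db_missing_index_groups g
    ON d.index_handle = g.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats s
    ON g.index_group_handle = s.group_handle
ORDER BY s.avg_user_impact DESC;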