Sunday 11 April 2010

Catching up on some old notes about indexed views

In May 2008 I was looking into optimizing our EDMConverter by using index views. I first came across index views in a talk by Kimberly Tripp in 2005 “SQL Server 2000 Performance Best Practices and Getting Ready for SQL Server 2005”. An index View (Oracle equivalent to Materialized views) is several orders of magnitude faster than a normal query because behind an index view is a cache corresponding to the query. As new data is entered the contents of this cache is updated. Since this data is cached only a read is made hence it’s speed.

Indexed views can be made only on a subset of deterministic data. As new data is added triggers are invoked that update the cached index meaning that there is a smaller performance hit on data updates. Indirectly Indexed views can be used by the sql compiler even when the view is not directly referenced by the select statement. Another consideration is that the contents of the query needs to be stored within the database meaning more space is required

In the case of our EDM data Indexed views are of only very limited value. The reason was that deterministic data includes float, double, min, max etc which in our case was needed because our queries revolved around sums insured which are stored as a float. We used the Tuning advisor to optimize some predefined load, it turned out that that the EDM was already well optimized. Therefore in the end we optimized our queries by adding precalculated data.

I am still looking for ways to optimize and generate reports faster, when I get some time there are a lot of ideas I need to follow up on. At the PDC09 I visited “Data-Intensive Computing on Windows HP Server with the DryadLINQ Framework” presented by John Vert. This could be a way to make reporting faster by spreading the query across nodes of an HPC Cluster. DryadLinq is still experimental but the core structure looks like:

- Scale-out, partitioned container for .NET objects
- Derives from IQueryable<T>, IEnumererable<T> with ToPartitionedTable() extension methods
- DryadLinq operators consume and produce PartitionedTable<T>
- DryadLinq generates code to serialize/deserialize you .NET objects
- Underlying storage can be a partitioned file, partitioned SQL table, cluster files