Thursday 29 December 2011

Moving an SSIS Package from a 32 bit machine to a 64 bit machine

I have an SSIS package that works on a 32 bit server but fails on a 64 bit machine with a number of errors, the most interesting being:


[Excel Destination [511]] Error: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC00F9304. There may be error messages posted before this with more information on why the AcquireConnection method call failed.

  
Surfing I found these sites...

http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/289e29ad-26dc-4f90-bad4-ffb86c76e5f9

http://toddmcdermid.blogspot.com/2009/10/quick-reference-ssis-in-32-and-64-bits.html

It turns out that there are a number of drivers that are available in 32 bit but not in 64 bit. It is, however, possible to run an SSIS package in 32 bit mode on a 64 bit machine, and these URLs suggest how.

The first thing is to make the Visual Studio 2008 DTS package run in 32 bit mode in the designer. This can be done by:

  1. Go to the properties of the SSIS project
  2. In the left tree view select Configuration Properties
  3. Set Run64BitRuntime = false

This enables you to debug the SSIS package. The next step is to execute the package in 32 bit mode. The easiest approach is to use:

DTExec /f "D:\MyPath\MySSISPackage.dtsx" /SET \package.Variables[Variable1].Value;"Value1" /SET \package.Variables[Variable2].Value;"Value2" /X86

This works well if the total length of the command line is less than 255 characters. In my case I had a lot of parameters.

To get around this 255 character limitation I think the best solution is to make a command line executable that is compiled as 32 bit. I thought about making a PowerShell cmdlet for this but decided it would be simpler to keep it as a command line tool and search the standard output for "Success" or "Failed".

To get this to work I included the following DLLs:
D:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SqlServer.ConnectionInfo.dll
D:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SqlServer.Dts.Design.dll
D:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SqlServer.DTSPipelineWrap.dll
D:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SQLServer.DTSRuntimeWrap.dll
D:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SQLServer.ManagedDTS.dll
The code looked like:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using Microsoft.SqlServer.Dts.Runtime;
using System.IO;
using System.Threading;

namespace Ssis32BitExec
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Count() != 2)
            {
                Console.WriteLine("Ssis32BitExec <PackagePath> <ParameterFile>");
                Console.WriteLine("===========================================");
                Console.WriteLine("");
                Console.WriteLine("Utility to execute an SSIS package in 32 bit mode thus enabling 32 bit drivers");
                Console.WriteLine("that may not be available in 64 bit");
                Console.WriteLine("");
                Console.WriteLine("<PackagePath>    UNC path to .dtsx file (SSIS package)");
                Console.WriteLine("<ParameterFile>  Path to parameter name and value pair file (This file");
                Console.WriteLine("                 contains lines like ParameterName;ParameterValue)");
                Console.ReadLine();
                Environment.Exit(0);
            }

            string packagePath = args[0];
            string parameterValuePairsPath = args[1];

            // 4 means we are running as a 32 bit process, 8 means 64 bit
            Console.WriteLine("SizeOf IntPtr is: {0}", IntPtr.Size);

            if (!File.Exists(packagePath))
            {
                Console.WriteLine("Failed: The following file does not exist {0}", packagePath);
                Environment.Exit(1);
            }

            if (!File.Exists(parameterValuePairsPath))
            {
                Console.WriteLine("Failed: The following file does not exist {0}", parameterValuePairsPath);
                Environment.Exit(1);
            }

            // Load the package definition from the .dtsx XML
            XmlDocument packageDoc = new XmlDocument();
            packageDoc.Load(packagePath);

            Package package = new Package();
            package.LoadFromXML(packageDoc.DocumentElement, null);

            // Apply the ParameterName;ParameterValue pairs to the package variables
            string[] lines = File.ReadAllLines(parameterValuePairsPath);
            string[] fields;
            foreach (string line in lines)
            {
                fields = line.Split(';');
                if (fields.Count() != 2)
                {
                    Console.WriteLine("Failed: The following line is not correctly formatted {0}", line);
                    Environment.Exit(1);
                }
                if (package.Variables.Contains(fields[0]))
                {
                    package.Variables[fields[0]].Value = fields[1];
                }
            }

            //This is a fudge factor that allows a badly constructed package to throw some errors and run through
            //package.MaximumErrorCount = 2;

            DTSExecResult result = package.Execute();

            if (result == DTSExecResult.Success)
            {
                Console.WriteLine("Success");
                Environment.Exit(0);
            }
            else
            {
                Console.WriteLine("Failed");
                for (int i = 0; i < package.Errors.Count; i++)
                {
                    Console.WriteLine(package.Errors[i].Description);
                }
                Environment.Exit(1);
            }
        }
    }
}



To confirm that the utility really is running as a 32 bit process, check the "SizeOf IntPtr" output: IntPtr.Size is 4 in a 32 bit process and 8 in a 64 bit process.
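As an aside, if the wrapper happens to target .NET 4.0 or later (an assumption on my part; against earlier frameworks these properties do not exist) there is a more direct check:

    // Only available from .NET 4.0 onwards; false when built as x86 or run with /X86
    Console.WriteLine("Is64BitProcess: {0}", Environment.Is64BitProcess);
    Console.WriteLine("Is64BitOperatingSystem: {0}", Environment.Is64BitOperatingSystem);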

I call this from VB.NET like this:


Dim TempFilePath As String = IO.Path.GetTempFileName()
Dim Sw As New StreamWriter(TempFilePath)
Sw.WriteLine("Param1;{0}", Value1)
Sw.WriteLine("Param2;{0}", Value2)
Sw.Close()

Dim SSIS32BitProcess As New ProcessStartInfo()
SSIS32BitProcess.WorkingDirectory = Environment.CurrentDirectory
SSIS32BitProcess.FileName = "Ssis32BitExec.exe"
SSIS32BitProcess.UseShellExecute = False
SSIS32BitProcess.RedirectStandardOutput = True
SSIS32BitProcess.RedirectStandardError = True
SSIS32BitProcess.Arguments = packagePath + " " + TempFilePath

Dim Proc As Process = Process.Start(SSIS32BitProcess)

' Read both streams, then wait for the process to exit
Dim Out As String = Proc.StandardOutput.ReadToEnd()
Dim Err As String = Proc.StandardError.ReadToEnd()
Proc.WaitForExit()

' Clean up the temporary parameter file before returning
File.Delete(TempFilePath)

If Out.Contains("Success") Then
    Return True
Else
    Message = Out + vbCr + Err
    Return False
End If

Monday 5 December 2011

How to version control databases using TFS and Visual Studio Database Projects

We have some databases that contain certain reference data. We would like to keep track of changes to both the schema and the reference data and store both in one central place. Another challenge was to allow everyone to develop on a local version of the database, thus avoiding collisions and breaking changes. This was solved in the following way:

1. Open Visual Studio 2010
2. File/New Project
3. Go to the tree view of the New Project dialog and select
Database/SQL Server
4. In the list select SQL Server 2008 Database Project
- Choose an appropriate Solution Name
- The Name should be the same as the database you are putting into source control
5. [OK]
6. Right mouse click on the database project and select "Import Database Objects and Settings"
7. Click New Connection
8. Server Name = (local)
Database Name = the local database you are trying to version
[OK]
9. Select the following check boxes:
Script the collation only if it is different from the database collation
Import extended properties
10. [Start]

If you have multiple instances of SQL Server (e.g. 2005, 2008), then you need to create an alias for the database instance that you are targeting.

Aliases are created by:
1. Open Sql Server Configuration Manager
2. Click on the SQL Server Network Configuration and enable Named Pipes
3. Click on all versions of SQL Native Client 10.0 Configuration
Add a new alias
- Alias Name: MyDB_LOCAL
- Pipe Name: \\localhost\pipe\MSSQL2008R2
- Protocol: Named Pipes
- Server: localhost\SQL2008R2
4. Repeat 3 for SQL Native Client 10.0 Configuration (32bit)

The database may be documented using extended properties.

Extended properties are accessed as follows:
1. Open the Microsoft SQL Server Management Studio ...
2. Right mouse click on the object and select Properties
3. In the tree view select Extended Properties
4. Add the combination of Name and Value
Extended properties are available from the database level down to the table field level.

After this you have imported the database schema into a Visual Studio Database project. The next step is to insert data. Depending on what you want to achieve there are several mechanisms for importing data.

The PostDeployment script is intended to be used for populating reference data. It can be found in the following places:
In the VS2010 database project, under the solution directory Scripts\Post-Deployment\Script.PostDeployment.sql
In the file system, under DatabaseProject\Scripts\Post-Deployment

There are several possibilities for inserting reference data:

1. If it's small you can use INSERT INTO eg
INSERT INTO [MyDatabase].[dbo].[DATABASE_VERSION] ([DATE],[MODEL_VERSION],[DATABASE_VERSION],[COMMENTS])
VALUES ('2011-01-07','4.1.0','1.0.0','A comment about what is new in this version')
GO

2. If it's big, spread over multiple tables, and you are not looking to compare changes in the reference data, you can include a backup of a database.

Then the script would look something like:

USE [master]
go

if (exists(select * from sys.databases where name=N'Testdata') )
begin
EXEC msdb.dbo.sp_delete_database_backuphistory @database_name = N'Testdata'

ALTER DATABASE [Testdata] SET SINGLE_USER WITH ROLLBACK IMMEDIATE

DROP DATABASE [Testdata]
end
go

-- Note that SolutionRoot is an environmental variable that can either be set in the control panel or via a batch file that opens the database project

RESTORE DATABASE [Testdata] FROM DISK = N'$(SolutionRoot)\MyDatabase\Scripts\Post-Deployment\Testdata.bak' WITH FILE = 1, MOVE N'Testdata' TO N'$(DefaultSqlFileLocation)\Testdata.mdf', MOVE N'Testdata_log' TO N'$(DefaultSqlFileLocation)\Testdata_1.ldf', NOUNLOAD, REPLACE, STATS = 10
GO

INSERT [MyDatabase].[dbo].[StudentList] (StFName,StLName,StEmail,OrderID)
SELECT StFName,StLName,StEmail,OrderID FROM [Testdata].[dbo].[StudentList]

3. Here is another method using BCP and a binary file.
This can be useful when you don't want the inconvenience of creating a backup etc.

To create the bcp file (adjust the table in the from-clause, and possibly the server after the -S param):
bcp "select field1,field2,field3 from MyDatabase.dbo.MyTable order by Id asc" queryout MyTable.bcp -S localhost -T -E -N


The script looks something like:

:setvar EstimatedRowCount 5000000

ALTER TABLE MyDatabase.dbo.MyTable NOCHECK CONSTRAINT ALL

bulk insert MyDatabase.dbo.MyTable from '$(SolutionRoot)\MyDatabase\Scripts\Post-Deployment\MyTable.bcp' with (DATAFILETYPE = 'widenative', order(ModelEventId asc), keepidentity, KEEPNULLS, ROWS_PER_BATCH = $(EstimatedRowCount))

ALTER TABLE MyDatabase.dbo.MyTable WITH CHECK CHECK CONSTRAINT ALL


4. Here is another method using BCP and a text file.
With this it is possible to compare versions of reference data directly out of TFS
To create the text file

i. Right mouse click on the database name and select All Tasks/Export Data
ii. Select Next and then in the Choose a Source dialog select
- Database source : SQL Server Native Client 10.0
- Server Name : The Server Name
- Database : The database
[Next]
iii. In the Choose a Destination dialog choose:
- Destination : Flat File Destination
- File name : path to tablename.txt
- Locale : English (United States)
- Code Page : 1252 (ANSI Latin1)
- Unicode is not checked
- Format : Delimited
- Text qualifier :
- Column names in the first data row is not checked
[Next]
iv. Copy data from one or more tables or views
[Next]
v. Source table or view : the table to be exported
- Row delimiter : {CR}{LF}
- Column delimiter : Semicolon {;}
[Next]
vi. Run immediately
[Finish]
[Finish]


Then the import script looks like:

BULK INSERT MyTable FROM '$(TFSROOT)\DataBases\MyDatabase\Scripts\Post-Deployment\MyTable.txt' WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\n' )


So now we have a Visual Studio 2010 database project and data. Now suppose that you have a view that references a table in another database. You then create a sub database project within the database solution, but when you want to deploy the database you get an error like:

Error 2 SQL03006: View: [dbo].[MyView] has an unresolved reference to object [MyOtherDatabase].[dbo].[MyOtherTable]. D:\SOURCE\DataBases\MyDatabase\Schema Objects\Schemas\dbo\Views\MyView.view.sql 12 23 MyDatabase

Looking at the SQL:
CREATE VIEW dbo.MyView
AS
SELECT dbo.MyTable.field1, MyOtherDatabase.dbo.MyOtherTable.Field
FROM dbo.MyTable INNER JOIN
MyOtherDatabase.dbo.MyOtherTable ON dbo.MyTable.Id = MyOtherDatabase.dbo.MyOtherTable.FKId

The problem is that MyOtherDatabase.dbo.MyOtherTable.Field refers to a field in a table in another database, and this is what is causing the deployment to fail. The solution can be found in an article on defining cross-database references at http://msdn.microsoft.com/en-us/library/bb386242.aspx
When adding the database reference you must define a database variable.
Fill out the Add Database Reference as follows:
- Database projects in the current solution: YourRefDB
- Database Reference Variables
- Name
- Uncheck Literal
Name $(IO_DB) Value YourRefDB
- Check Update the existing schema object definitions and scripts to use the database variables
- Check Suppress errors caused by unresolved references in the reference project
[OK]
Then you get an opportunity to review the modifications made to the scripts.
This resolves the compilation errors.

The next problem comes when we want to change the deployment from just generating a script to actually deploying it.
Going to the solution view, looking at the properties of the database project and changing the Deploy action to "Create a deployment script (.sql) and deploy to the database" can still leave you with:

Message 1 The deployment script was generated, but was not deployed. You can change the deploy action on the Deploy tab of the project properties. D:\SOURCE\DataBases\DBProj\DBName\sql\debug\DbName.sql 0 0 DBName

The solution to this is:
1 Right mouse click on the database project and select properties
2 Go to the Deployment tab
3 Edit Target connection
Server :(local)
Database name :MyDatabase

After this the deployment of the database project works…

I found some other curious effects. If you try to open this database project on a server where you have uninstalled SQL 2008 R2 and installed SQL 2008 SP3 you will get the following error:

D:\SOURCE\DataBases\MyDBProject\DBName\DBName.dbproj : error : Could not load file or assembly 'Microsoft.SqlServer.Management.SqlParser, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its dependencies. The system cannot find the file specified.

This problem is caused by the uninstall process removing some assemblies that Visual Studio needs to correctly interpret the database project. There are two ways to solve this: either re-install VS2010 or run the following commands.

Open a command line and go to D:\Software\VS2010\DVD\WCU\SMO
Then execute
msiexec /i SQLSysClrTypes_amd64_enu.msi
msiexec /i SharedManagementObjects_amd64_enu.msi

Next change dir to D:\Software\VS2010\DVD\WCU\DAC and execute:
msiexec /i DACFramework_enu.msi
msiexec /i DACProjectSystemSetup_enu.msi
msiexec /i TSqlLanguageService_enu.msi

Another interesting thing happens when the MS SQL 2008 Server has been set up with non-default file paths.
To fix this problem go to the database project and open the solution folder path Schema Objects/Database Level Objects/Storage/Files. Here you will find two files with names like:

MyDatabase.sqlfile.sql
MyDatabase_log.sqlfile.sql

Within these files you have

ALTER DATABASE [$(DatabaseName)]
ADD FILE (NAME = [MyDatabaseName], FILENAME = 'G:\Data\MyDatabaseName.mdf', FILEGROWTH = 1024 KB) TO FILEGROUP [PRIMARY];

To make the database project run on any setup you need to modify the path using the DefaultDataPath and DefaultLogPath variables. For example:

ALTER DATABASE [$(DatabaseName)]
ADD FILE (NAME = [$(DatabaseName)], FILENAME = '$(DefaultDataPath)$(DatabaseName).mdf', FILEGROWTH = 1024 KB) TO FILEGROUP [PRIMARY];

By the way, if you want to use these variables within a query in SQL Server Management Studio you need to:

1. Go to the Query drop down menu
2. Select SQLCMD Mode
3. Cut and paste the script

Then you can use commands like

:setvar somevariable somevalue

Friday 7 October 2011

Cross platform applications with HTML5/JavaScript and how this ties in with SOLID OO principles

I have just started to think about writing applications that are cross platform, i.e. that can run on devices like the iPad. Here are some notes on this subject.
Over the last few years I have made extensive use of the SOLID OO programming principles as I developed C# applications in .NET.

Where
SRP: The Single Responsibility Principle A class should have one, and only one, reason to change.
OCP: The Open Closed Principle You should be able to extend a classes behavior, without modifying it.
LSP: The Liskov Substitution Principle Derived classes must be substitutable for their base classes.
ISP: The Interface Segregation Principle Make fine grained interfaces that are client specific.
DIP: The Dependency Inversion Principle Depend on abstractions, not on concretions.


My first reaction was to reapply these principles in JavaScript. So I started to think of interfaces, dependency injection containers, mocking frameworks and test driven development. It turns out that there is a fundamental difference between C# and JavaScript in the sense that C# is class based and JavaScript is prototypal. So it is difficult to apply the notion of interfaces and abstraction in the world of JavaScript.

So the question is which principles are relevant to a prototypal language as opposed to a class based OO language. As far as I can see:

S Applicable
O Not Applicable
L Applicable
I Not Applicable
D Not Applicable

One principle that applies to both programming paradigms is DRY, which stands for Don't Repeat Yourself.

JavaScript object oriented programming includes classes, inheritance and scope that can be used to encapsulate, support namespaces and avoid collisions. Actually JavaScript does not have a class entity BUT it implements the pattern of classes. The difference is in the inheritance model. In other object oriented languages, a class is an actual data type that represents the blueprint for creating objects. In JavaScript, although we can use functions to simulate an object blueprint, they are in fact just objects themselves. These objects are then used as models (aka prototypes) for other objects. Applying the concept of prototypal inheritance allows us to create "subclasses", or objects that inherit the properties of another object. This becomes particularly useful when we want to use the methods of another object with some slight modifications.


The next thing I was thinking about is that, after programming for a while in XAML, I prefer the MVVM pattern over MVC. This is because in MVC the controller is tightly coupled to the view. In practical terms this means that every time you change a control on the view, the controller needs to deal with how this affects all the other controls. Whereas in MVVM the view is completely separate from the view model, so a change to a control results in a minimal update to the ViewModel. It turns out that there is a JavaScript library called Knockout.js that makes the MVVM pattern possible.

Here are some examples of JavaScript. It can be seen that JavaScript is far from just a beginner's language. JavaScript's prototypal OOP is flexible and does not limit the programmer with things like static typing.


Here is an example of how to create your own foreach function, together with a set of curried comparison operators:

Array.prototype.foreach = function (callback)
{
    for (var i = 0; i < this.length; i++) callback(this[i]);
};

var op =
{
    ">=" : function (comparand) { return function (e) { return e >= comparand; }; },
    "<=" : function (comparand) { return function (e) { return e <= comparand; }; }
};

var a = [2, 4, 6, 8];
a.filter(op[">="](5)).foreach(alert);

This is called currying, after Haskell Curry.
By the way, if you use this at a DOS prompt you will need to escape the > as ^>.

----
Here is a way to build up arrays:

Number.prototype.UpTo = function (upper, step)
{
    var a = [];
    for (var i = this.valueOf(); i <= upper; i += step) a.push(i);
    return a;
};

(1).UpTo(20, 2).foreach(alert);

This is called higher-order programming.

Here's a link to some further examples: http://www.w3schools.com/js/

The next version of Visual Studio will have a lot more support for JavaScript development, so this should reduce the barrier to trying this out...

Tuesday 27 September 2011

Condensed notes from the Build about where Microsoft technology is going over the next year

Here is a summary of what I learned at the Build. I went to a lot of trouble this year to attend this conference, which was not easy considering that the agenda was empty up to the last minute. But it was certainly worthwhile, because Microsoft revealed where it is going with its technology, which includes a unified user experience from mobile devices through to large, television sized devices.

CNN “Microsoft unveils a radically redesigned Windows 8”

This technology will only be available in about a year from now, and industry will probably adopt it in perhaps two years. This conference is important because no strategy would be complete without an idea of where Windows is going, since it is our main operating system. It is necessary because our software engineers need to gather the skills to take advantage of this technology, and in order to make informed decisions over if and how to integrate other technology such as the iPhone and iPad.

Windows 8 Metro Style

We can expect a shift towards touch based applications over the coming years. Microsoft is of the opinion that all screens will be touch enabled and that the ones that are not will feel antiquated when used. To make use of the Windows 8 Metro style the user will have to adapt by learning new gestures and becoming familiar with the concept of a less cluttered desktop using active tiles. This means having to "re-imagine your application" when making an application in Metro style. It was emphasized that there are a number of applications where the chrome style is simply the most appropriate, meaning that this style of UI will continue to be supported. The difference between Metro and chrome will go along the lines of the difference between a DOS prompt and Windows, i.e. an expert system that requires training versus an intuitive system that requires minimal training. Chrome seems to be based on the philosophy that "less is more".

Performance and energy saving have become key factors in the design of Windows 8. A cold start up takes about 20 seconds. To make this possible the number of running processes has been minimized, and tasks that are not currently in view are suspended. This has an impact on the design and development of applications because they only have a few seconds to persist their state before the task is suspended. It is now possible to boot a Windows 8 client directly from a USB memory stick. Also the first bleeding edge problems with VS are coming to light, e.g.

http://blog.galasoft.ch/archive/2011/09/25/quick-tip-killing-a-metro-style-app-in-windows-8.aspx?utm_source=twitterfeed&utm_medium=twitter&utm_campaign=Feed%3A+galasoft+%28Laurent+Bugnion+%28GalaSoft%29%29

There are some features that may take more time to be adopted by the industry, such as location aware applications that can make use of local devices, or the touch sensor that performs a kind of electronic handshake and sets up a link between two devices for collaboration or simply to exchange an electronic business card.

The Windows 8 user experience is based on a fast, fluid, immersive and full screen user experience. Applications communicate with each other through "charms" that are implemented via shared contracts. The same intuitive interface will be used across all Windows devices, from mobile devices through desktops to televisions.

The result is an uncluttered canvas with controls. Commands that are frequently used go on the canvas; all others are revealed through the edge gestures. The Metro style desktop consists of tiles that provide simple text or images via predefined templates and come in two sizes. Secondary tiles are created by pinning the content of the application and have the same capabilities as the main tile; they provide a deep link within the application. Live tiles keep people connected to your app and make it more likely that the application is put on the first page.

Notifications appear via a toast that is shown for a short period of time, and the user must opt in to them. They use the Windows Push Notification Service (WNS) and can send updates at any time your application is running.


There is a new layer that sits directly on the Windows kernel called WinRT. This is a clean API that has no duplication of runtime APIs, and all obsolete or inapplicable APIs have been removed. Metro style apps written in C# and VB.NET can interact directly with the WinRT APIs, or via .NET for Metro style apps, the CLR and Win32 APIs.

To publish a Metro app there is a pipeline of checks that guarantees application quality. This consists of pre-processing, security tests, technical compliance, content compliance, signing and publishing.


x86, x64 and ARM processors will all be supported. This has the big positive implication that mobile devices running on ARM processors will support these applications.

No solid timeline was given. Some guesses included it perhaps being available about this time next year. The next milestone is Beta, then RC, then RTM and then GA.

Windows 8 Server

I did not attend many Windows 8 Server sessions, but a lot of progress has been made on performance. I also believe that the Hyper-V virtualization has been improved by lessons learned in Azure. There is a way to make low cost hard disks available in a SAN; this can perform very fast because it is possible to make use of more than one network card through teaming. It is also possible to change a system drive on the fly.
.Net 4.5

In the keynote it became clear that Silverlight is definitely not dead. Microsoft has been working on making it possible to program the UI in whichever language programmers might choose. This includes HTML5/JavaScript as well as C++. Microsoft is in the process of making a Metro style implementation of Office. From what I could determine this is going to be made using HTML5 for the view part of the presentation layer and JavaScript as the model part of the view model. All this sits on the WinRT APIs, which sit directly on Windows Kernel Services. At first sight this was misinterpreted last year as a move away from Silverlight. This is not the case, because the idea is to allow programmers to express themselves in any appropriate programming language, and this was demonstrated by including demonstrations in C and C++. I asked some experts whether the HTML5/JavaScript implementation would use the MVVM pattern. The answer was that this question had been asked by a lot of people and that it should be possible, but there are no examples for this yet.

Entity Framework 4.5 now includes enums and SQL Server and Azure features such as spatial functions.
In .NET 4, data manipulation was done by first starting with the data and then setting up the computation. TPL Dataflow allows the computation to be set up first and the data to be pushed through it afterwards.
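To make that concrete, here is a minimal sketch (the block names and data are my own, and it assumes the System.Threading.Tasks.Dataflow package is referenced): the pipeline is declared first, and only then is data posted into it.

    using System;
    using System.Threading.Tasks.Dataflow;

    class DataflowSketch
    {
        static void Main()
        {
            // Declare the computation first: a transform stage feeding a print stage
            var square = new TransformBlock<int, int>(n => n * n);
            var print = new ActionBlock<int>(n => Console.WriteLine(n));
            square.LinkTo(print, new DataflowLinkOptions { PropagateCompletion = true });

            // Only now does the data arrive
            for (int i = 0; i < 10; i++) square.Post(i);
            square.Complete();
            print.Completion.Wait();
        }
    }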

Other Parallel Computing Additions

- Combinators: Task.WhenAll, Task.WhenAny
- Timer integration: Task.Delay(TimeSpan), CancellationTokenSource.CancelAfter(TimeSpan)
- Task scheduling: ConcurrentExclusiveSchedulerPair
- Fine-grained control: DenyChildAttach, HideScheduler, LazyCancellation, EnumerablePartitionerOptions
- ThreadLocal<T>.Values
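A rough sketch of how a few of these fit together (the simulated work and timings here are made up for illustration):

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class CombinatorSketch
    {
        static void Main()
        {
            RunAsync().Wait();
        }

        static async Task RunAsync()
        {
            var cts = new CancellationTokenSource();
            cts.CancelAfter(TimeSpan.FromSeconds(30));   // cancel the timeout task after 30 seconds at the latest

            // Stand-ins for two pieces of real asynchronous work
            Task<int> a = Task.Run(() => { Thread.Sleep(500); return 21; });
            Task<int> b = Task.Run(() => { Thread.Sleep(800); return 21; });

            Task allDone = Task.WhenAll(a, b);                              // combinator over both tasks
            Task timeout = Task.Delay(TimeSpan.FromSeconds(2), cts.Token);  // timer integration

            if (await Task.WhenAny(allDone, timeout) == allDone)
                Console.WriteLine("Sum = {0}", a.Result + b.Result);
            else
                Console.WriteLine("Timed out");
        }
    }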
PERFORMANCE (“it’s just faster”)
The garbage collector has been improved. A multi-core JIT with pre-fetch options makes ASP.NET start about 35% faster.

.Net 4.5 is an in place upgrade from .Net 4.0

VS2011

Firstly, VS2011 can open VS2010 projects without changing the format.

There is a 3D editor that has the capability of breaking down each transformation that is made to a 3D object. It is not designed for creating these objects but can manipulate them. Improvements have been made to the IDE experience for C++ programmers, including unit tests, color pickers, IntelliSense, etc.

Visual Studio 11 is the tool for Windows 8 and supports VB.NET, C#, C++ and HTML5/JS. There is a new designer, available today, built on a shared architecture with Expression Blend. When shelving changes, information about which windows are open is saved together with the shelve set, and when the shelve set is reopened the windows are restored in their original positions. There is also a document well that avoids a new document window being created each time a file is clicked on. There is a XAML editor with IntelliSense. It is now possible to use C# code directly in JScript and have changes in the generated HTML fed back into the code-behind files.

There is a static code analysis feature that allows you to find and replace chunks of repeated code.
PowerPoint templates enable the mocking of an interactive UI.

Asynchronous programming is necessary for creating responsive, fast and fluid applications. Here are some of the new features coming with .NET 4.5:

Asynchronous programming models
Windows Runtime: IAsyncOperation
.NET Framework: Task
JavaScript: Promises
All are objects representing “ongoing operations”
All use callbacks to signal completion of operation
Challenge: Callbacks turn your code inside out
Insight: Automatic transformation to callbacks is possible

Asynchronous methods automatically transform normal code into a callback state machine, e.g.:

public async Task<XElement> GetXmlAsync(string url) {
var client = new HttpClient();
var response = await client.GetAsync(url);
var text = response.Content.ReadAsString();
return XElement.Parse(text);
}

.Net 5.0 aka Project Roslyn

The .NET 5.0 compiler will be written in C#. This means that it will be possible to call into the pipeline used to compile code via APIs. This pipeline comprises four steps:

1. Parser
2. Symbols / Meta data export
3. Binder
4. Emitter

Here’s an example

ScriptEngine engine = new ScriptEngine();
Session session = Session.Create();
engine.Execute("using System;", session);
engine.Execute("for(int i = 0; i < 10; i++) Console.WriteLine(i*i);", session);
var f = (Func<int, int>)engine.Execute("new Func<int, int>(Sqr)", session);
for (int i = 0; i < 10; i++) Console.WriteLine(f(i));

This might look strange, but it could be used to allow users to drive your application through a macro style interface. In my case I doubt that I would use this, but it is very interesting. I could imagine that this will allow mixing of languages, which could be useful in creating fast code. There is also a command line Roslyn C# compiler that runs in a similar style to the F# one and includes IntelliSense and code tips. To create a reference to another assembly "er" is used, e.g.:

er "PresentationCore"

Here's an example of translating C# into VB:

public VB.SyntaxNode Convert(
    CS.SyntaxTree syntaxTree,
    IDictionary identifierMap = null,
    bool convertStrings = false)
{
    var text = syntaxTree.Text;
    var node = syntaxTree.Root;
    var vbText = Convert(text, node, identifierMap, convertStrings);
    return VB.SyntaxTree.ParseCompilationUnit(vbText).Root;
}

There is also a ".NET 3.5 on demand" feature, which I believe means that we don't need to pre-install the .NET 3.5 framework; it only gets installed when functionality that requires it is used.

.Net 5 will be a fresh installation

HPC

I met up with some HPC / Azure MVPs and discussed some of the problems that I encountered when making my cat model in the cloud. The problem was that I was not able to transfer a 60 GB VM, including a local SQL Server, into the cloud. The suggestion was to use SQL Azure and transfer data via blob storage. In the next couple of months there will be a release of HPC Server that runs on a standard worker role, meaning that we don't need to construct and maintain our own VM. For very large quantities of data we could consider speeding up the queries by using DryadLINQ, aka LINQ to HPC. This works by producing a sealed data block that is replicated between multiple nodes and is then queried in parallel. In addition to the Windows Server based HPC Scheduler there is an Azure based HPC Scheduler. These are useful because they mean that we don't need to write our own schedulers.

There was a demo of a PC with four water cooled graphics cards that had 2500 times the processing power of a Cray. The HPC team uses these machines for numerical tasks. Currently it takes a crack programmer to correctly program such an algorithm; the hope is that this will become abstracted away at some point in the future.

Azure

Azure is a very cheap, elastic and quickly configurable source of compute power and one that we really should take advantage of. I saw some presentations on debugging and branch caching.

ALM Application Life Cycle Management

Attendees of Build were given a golden ticket to an evaluation version of the SaaS version of Team Foundation Server 2010. This contains many of the Application Lifecycle Management features that are part of professional software engineering. The application life cycle is based on two parts: the development cycle, where a product backlog is worked on during sprints to create a working software asset, and operations, which consists of an ops backlog and monitoring. When a bug is discovered a corresponding requirement is created and sent back to the development backlog. In addition to the existing ALM functions, a code review function was demonstrated in the context of a development team, with inline or side by side code comparison.

Currently both TFS Server and TFS SaaS are capable of:
Work items, Source Control and Build
Agile Product/Project Management
Test Case Management
Heterogeneous Development

TFS SaaS has the advantage when:
Near Zero setup and administration
Collaborate with anyone from anywhere

TFS Server has the advantage when:
Virtual Test Lab Management
SharePoint Integration
Data Warehouse and Reporting

Thursday 22 September 2011

User experience in an L39 Jet


This has got to be the ultimate user experience. I am sitting behind the pilot in this Russian designed L39 jet. If anyone would like to try it, it is quite expensive but it is something you will never forget. Here are some links:



My pilot was Dave Riggs, who is a Hollywood stunt pilot for films like XXX, Iron Man and Lord of War. We flew together with David LaFaille and did some combat flying and some low level flight avoiding radar in a canyon; by low level I mean we flew 4 m off the ground. My stomach had a problem digesting this new user experience, so we finished the day with some formation flying.


Thursday 15 September 2011

First Impression of the Build

Just before the Build started I had a chance to talk to a user experience MVP and to a company that has made some compelling applications for health care and emergency services. This was quite fortunate because the majority of this conference is about a revamp of the way that the user experiences Windows. Basically, over the coming years there will be a shift towards touch based applications. The vision is that all screens will be touch enabled and the ones that are not will feel antiquated when used.

It was very interesting talking to Christian Moser before the conference. He is writing a book on user experience where controls are represented by a kind of design pattern. For a control to be successful its use must be intuitive, which means that its use should be common knowledge. This means that you should think twice before inventing a new way of interaction. In the context of Windows 8 this is going to be interesting because a number of new gestures have been invented. Also the concept of having a less cluttered desktop with active tiles has an impact on the design of applications. I heard the buzzword "re-imagine your application" a lot. There are a lot of sessions around this Metro style. It was also emphasized that for some applications the chrome style is simply the most appropriate UI and that this will continue to be supported. While I was talking to Christian he mentioned that in his book he will be describing some aspects of expert users. Before the Build the example of an expert user was the airline booking system, which is a command line interface that requires a lot of training but is super efficient. I could imagine that the difference between Metro and chrome will go down the same lines. Chrome seems to be based on the philosophy that "less is more"; according to my friends at BlackMarble most users really appreciate something like this.

In the keynote it became clear that Silverlight is definitely not dead. In fact MS has been working on making it possible to program the UI in whichever language programmers might choose, including HTML5/JavaScript as well as C++. Performance and energy saving have become key factors in the design of Windows 8.

Before the conference I was talking to some friends at Black Marble about how an application can determine where it is. For example, in a hospital application your device should know what other devices are nearby. This can be done in several ways. One way is to use the SSID of the wireless network. This works quite well but there can be problems with leakage of signal from one zone to another. Another method that could be used is GPS repeaters. This is essentially like setting up a GPS satellite within your building. This could be particularly interesting for emergency services, in conjunction with some mapping on a Surface 2 system, to keep track of where their fire fighters are, etc.

Part of the Windows 8 slate included a touch sensor. The concept is that two people are using an application and want to collaborate. So they physically touch their slates and an electronic handshake occurs that enables further communication over Bluetooth etc. This also works with sensors which I believe work by induction and need no battery. One example is a business card that when touched opens a web browser to a website. Another example starts an application; if it is not present it will be installed first and then run.

There is really a lot going on. I have met up with some HPC / Azure MVPs and discussed some of the problems that I encountered when making my cat model in the cloud. Basically my problem was that I was not able to transfer a 60 GB VM, including a local SQL Server, into the cloud. The suggestion was to use SQL Azure and transfer data via blob storage, and also to think about DryadLINQ, which is a way to spread queries out onto multiple VMs. The thing is, Azure is simply way cheaper than using in-house servers. So I will attend some sessions that look at these in a little more detail. On the slate PC is a pre-beta version of Windows 8 and Visual Studio 2011. I have also got an evaluation version of the SaaS version of Team Foundation Server 2010, which contains a lot of Application Lifecycle Management features that are all a part of professional software engineering.

Wednesday 20 July 2011

Some notes on the application of Cloud computing

I am just about to go on holiday. So before I forget here are ideas about the application of Cloud Computing to Catastrophe Modeling.

Cloud computing makes sense for cat simulations for various reasons. The costs are very reasonable; for example, all my tests came to about 82 CHF, compared to the fixed cost of a VM in a datacenter that mounts up to several thousand francs. Microsoft can achieve this by making the administration of the hundreds of thousands of VMs as automated as possible, thus getting an economy of scale that we don't have in smaller data centers. Creating the cloud VMs took a few minutes as opposed to weeks via our internal/external processes. I could imagine that VMware will improve the provisioning of virtual machines and that one day this may be as simple as filling out a web form. But we don't have such a system yet, and if we did we would have a much more limited pool of servers and therefore higher costs than if the whole thing was in the cloud. One last aspect was that, surprisingly, the end to end time to process the benchmark cat model in the cloud took less time than on our on-premise servers.

There are different possibilities for how number crunching processes can be implemented. I have seen solutions that call an executable; if this executable needs something to be installed on the machine, it is possible to configure setup tasks that install software as the VM is being instantiated. In the context of my modeling platform I think I would implement the job submission in the same way as in the prototype I described earlier. By this I mean I would transfer the data to be simulated as a blob and, once that has completed, add a reference to this data to a queue that the worker roles poll for work. The difference would be that instead of polling the results queue I would expose an on-premise WCF service over HTTP and call it from the cloud using claims based tokens. I would use queues for status information because they have an intrinsic order, but I would send this information via one way calls to the on-premise web services. Since security is handled using claims based security, the role of the firewall changes slightly. The reason is that cloud apps need to connect via HTTP or HTTPS to internal services, and the services, not the firewall, will carry out authentication. So there is a shift of responsibility from the firewall to the services.

From my experience of uploading custom VMs into the cloud I am not sure how well HPC with burst into Azure works in practice. I will follow up on this at the Build conference.

Thinking about how Cloud computing can be integrated into an organization there are 3 aspects to consider Network, Storage and Compute.

A means of synchronizing data between the office and mobile devices brings substantial benefits. Some years ago I tried a CTP of Microsoft Mesh, which enabled the synchronization of data between devices. Apple recently released a similar cloud service. There was a demo at the PDC which showed a businessman losing his PC, but because all his configuration and working data is continuously in sync it was possible to take a new PC and carry on where he left off. Since the CTP I have seen a version of Mesh in Hotmail and Office 365. Although it is difficult to come up with a single business case where this is useful, it does add to productivity because it improves the functionality of the environment that we work in.

On the network side there are datacenters around the world, making a global presence much easier to maintain. On the other hand there is latency in getting data to and from the cloud. This makes the SaaS Team Foundation Server offering very interesting.

I think cloud computing will bring about a shift from relational databases to a more object oriented data storage model. Relational databases are not intrinsically more performant. At present I believe that properly dimensioned on-premise SQL Servers out-perform the cloud SQL Servers. This will probably change as SQL Azure develops and as technologies like DryadLINQ become available, enabling the distribution of the workload needed to carry out database queries. The reason is that a central SQL Server is a bottleneck, whereas in the cloud the compute needed to carry out queries is designed for scalability.

With this in mind I think it's worthwhile to encourage the development of web based apps, or apps that can be deployed from the web, because these fit easily into the cloud. We would need to manage the scope of the broad set of mobile devices that may or may not be supported in our environment. In particular, Rapid Application Development in the context of a Microsoft oriented company would favor Windows mobile devices, even though these devices have not been on the market long enough to really take a large market share. To develop iPad and iPhone applications means needing an Apple workstation, learning Objective-C and a whole new API.

When looking at data it is difficult to categorize it into security groups. The reason is that, depending on the application and its context, the same data can be either sensitive or non-sensitive. It is therefore more practical to make security groups for applications.

Thursday 9 June 2011

Parallel.For seems to have a large overhead compared to a simpler implementation

I implemented the following alternative to Parallel.For using the thread pool:

        static void MyParallelFor(int fromInclusive, int toExclusive, Action<int> body)
        {
            int numProcs = Environment.ProcessorCount;
            using (CountdownEvent ce = new CountdownEvent(numProcs))
            {
                int rangeSize = (toExclusive - fromInclusive) / numProcs;
                for (int p = 0; p < numProcs; p++)
                {
                    int start = fromInclusive + rangeSize * p;
                    // the last worker takes any remainder left over by the integer division
                    int end = p == numProcs - 1 ? toExclusive : start + rangeSize;
                    ThreadPool.QueueUserWorkItem(delegate
                    {
                        for (int i = start; i < end; i++) body(i);
                        ce.Signal();
                    });
                }
                ce.Wait();
            }
        }

I then tested this on the benchmark earthquake model described in my last blog entry. Since the signatures of Parallel.For and MyParallelFor are identical, the comparison can be made by commenting out the implementation that is not under test.

            for (i = 1; i <= ns; i++) // Outer loop going through lots of lines of exposure
            {
                loss[i] = 0;

                //Parallel.For(1, nr,
                MyParallelFor(1, nr, // Inner loop going through many events
                    (jj) =>
                    {
                        // Some maths to determine the mean damage ratio (see previous blog entry for details)
                        localsum[jj] = mdr * value[jj];
                    });

                loss[i] = localsum.Sum();
            }

I then ran the above code on a physical machine, a ProLiant DL580 G5 with 12 cores of 2.66 GHz Intel Xeon X7460 and 8188 MB RAM.

Parallel.For took about 114 seconds with all 12 CPUs running evenly at about 70%.
MyParallelFor took about 71 seconds with 11 CPUs at 30% and 1 CPU at 80%.

I found this difference quite surprising, as the Parallel.For implementation in System.Threading.Tasks seems to have a very large overhead. Not only does it take longer, but the CPUs are working harder. Using Reflector I could see that the implementation of the private static method ParallelLoopResult ForWorker was quite extensive. When I have some time I will add some buffering in localsum to check whether the problem is caused by cache invalidation.
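For completeness, another variant I would want to include in the benchmark is a range-partitioned Parallel.ForEach (a sketch only; Partitioner.Create hands each worker a whole chunk of indices, which should remove most of the per-iteration delegate overhead that plain Parallel.For pays):

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    static class RangePartitionSketch
    {
        // Same signature as MyParallelFor above, but built on Parallel.ForEach over index ranges
        public static void RangeParallelFor(int fromInclusive, int toExclusive, Action<int> body)
        {
            Parallel.ForEach(Partitioner.Create(fromInclusive, toExclusive), range =>
            {
                // range is a Tuple<int, int> covering [Item1, Item2)
                for (int i = range.Item1; i < range.Item2; i++) body(i);
            });
        }
    }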

Friday 27 May 2011

Results of running a simplified earthquake model in the cloud

Here are some preliminary results of how long it takes to upload and run 10000 iterations of a cat model under various worker role configurations:

Extra Large with 8 cores took 26s
Large with 4 cores took 34s
Medium with 2 cores took 49s
Small with 1 core took 82s

For comparison here are some run times on a D20 desktop

VB.NET on a D20 compiled with VS2010 running on 1 thread  took  104s
F90 on a D20 compiled with Intel running on 1 thread took 162s
C++ on a D20 compiled with Intel with optimization on 1 thread took 58 s

The time needed to upload the exposure and events was about 100 s. Within the cloud it took about 1 s to download them from blob storage.

The initial flat file text files had the following sizes and number of points

Exposure file 7.3MB    385560 Points
Event Catalogue 4.8MB  194510 Events

To reduce the run times we reduced the number of events from 194510 to 10000. So this means the actual run times are 20 times longer.

I have not received the bill for these tests but I think it is probably quite cheap. I think there is a lot to be said for cloud computing, because I only had to worry about my code; the infrastructure was completely abstracted away. In fact setting up a new node takes about 8 minutes, with very friendly billing terms: you pay only for the time that the role is active.

Numerical calculations in Azure continued

After some help from Steve Spencer I figured out that the best way to debug the WorkerRole was not to do the whole thing in unit tests, but to press the play button on the WorkerRole and use a unit test to feed it with data (a sketch of such a feeder test follows the listing below). It turned out the problem was within the blob storage code. I still have some work to do to refactor the worker role, but I want to get some results so I am going to leave it as it is for the moment:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;
using AzureHelper;
using System.Configuration;
using System.IO;
using System.Text;
using CatModel;
using Contracts;
using Entities;
using Infrastructure;

namespace WorkerRole1
{
    public class WorkerRole : RoleEntryPoint
    {
        private String localPath;
        private String calcsimExePath;

        private EQModel eQModel { get; set; }

        private DateTime Start { get; set; }
        private DateTime Finish { get; set; }
        private TimeSpan Stopwatch { get; set; }
        public ILog Log { set; get; }

        private AzQueueHandler jobInputQueue;
        private AzQueueHandler jobOutputQueue;
        private AzQueueHandler jobLogQueue;

        private AzBlobHandler jobInputBlog;

        private AzBlobHandler dataStoreExposureX;
        private AzBlobHandler dataStoreExposureY;
        private AzBlobHandler dataStoreExposureV;

        private AzBlobHandler dataStoreCatalogSm;
        private AzBlobHandler dataStoreCatalogSx;
        private AzBlobHandler dataStoreCatalogSy;
        private AzBlobHandler dataStoreCatalogSd;

        private AzBlobHandler jobLossBlob;
        private AzBlobHandler jobLogBlob;
        private AzBlobHandler jobOutputBlog;

        private int idleCount;
        private int idleMax;
        private int idleSleepTime;

        public override bool OnStart()
        {
            // Set the maximum number of concurrent connections
            ServicePointManager.DefaultConnectionLimit = 12;

            // For information on handling configuration changes
            // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.

            return base.OnStart();
        }

        private void Init()
        {
            AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(CurrentDomain_UnhandledException);
            idleCount = 0; // how many times the instance have been idle
            idleMax = 10;   // After 10 times (5 minutes = 30 secs *10) being idle, then die
            idleSleepTime = 30 * 1000; // sleep for this number of seconds between each queue poll

            var accountName = ConfigurationManager.AppSettings["AzureAccountName"];
            var accountKey = ConfigurationManager.AppSettings["AzureAccountKey"];

            jobInputQueue = new AzQueueHandler(ConfigurationManager.AppSettings["JobInputQueue"], accountName, accountKey);
            jobOutputQueue = new AzQueueHandler(ConfigurationManager.AppSettings["JobOutputQueue"], accountName, accountKey);
            jobLogQueue = new AzQueueHandler(ConfigurationManager.AppSettings["JobLogQueue"], accountName, accountKey);
            jobInputBlog = new AzBlobHandler(ConfigurationManager.AppSettings["JobInputBlob"], accountName, accountKey);

            jobLogBlob = new AzBlobHandler(ConfigurationManager.AppSettings["jobLogBlob"], accountName, accountKey);
            dataStoreExposureX = new AzBlobHandler(ConfigurationManager.AppSettings["dataStoreExposureX"], accountName, accountKey);
            dataStoreExposureY = new AzBlobHandler(ConfigurationManager.AppSettings["dataStoreExposureY"], accountName, accountKey);
            dataStoreExposureV = new AzBlobHandler(ConfigurationManager.AppSettings["dataStoreExposureV"], accountName, accountKey);

            dataStoreCatalogSm = new AzBlobHandler(ConfigurationManager.AppSettings["dataStoreCatalogSm"], accountName, accountKey);
            dataStoreCatalogSx = new AzBlobHandler(ConfigurationManager.AppSettings["dataStoreCatalogSx"], accountName, accountKey);
            dataStoreCatalogSy = new AzBlobHandler(ConfigurationManager.AppSettings["dataStoreCatalogSy"], accountName, accountKey);
            dataStoreCatalogSd = new AzBlobHandler(ConfigurationManager.AppSettings["dataStoreCatalogSd"], accountName, accountKey);

            jobLossBlob = new AzBlobHandler(ConfigurationManager.AppSettings["JobLossBlob"], accountName, accountKey);
            jobOutputBlog = new AzBlobHandler(ConfigurationManager.AppSettings["JobOutputBlob"], accountName, accountKey);

            eQModel = new EQModel();
            this.Log = new Log("WorkerRole");

            localPath = Environment.CurrentDirectory;

            // pull EXE file from blob storage to local file system
            calcsimExePath = System.IO.Path.Combine(localPath, "Startup\\calcsim.exe");
        }

        public override void Run()
        {
            // This is a sample worker implementation. Replace with your logic.
            Trace.WriteLine("WorkerRole1 entry point called", "Information");
            Init();

            TraceInfo("AzJobHost::Run() Azure Instance ID {0}, DeploymentId {1}", RoleEnvironment.CurrentRoleInstance.Id, RoleEnvironment.DeploymentId);

            bool more = true;
            //Stopwatch swWrkRoleLifetime = Stopwatch.StartNew();

            // msg pump loop
            string id = ""; string popId = "";
            while (more)
            {
                id = ""; popId = "";
                string msg = jobInputQueue.GetMessage(ref id, ref popId, false);

                if (msg == null)
                {
                    idleCount++;
                    if (idleCount >= idleMax)
                        more = false;
                    else Thread.Sleep(idleSleepTime);
                }
                else
                {
                    ProcessMsg(id, popId, msg);
                }
            }

            //swWrkRoleLifetime.Stop();
            //TraceInfo("AzJobHost::Exit(). Execution time {0}", swWrkRoleLifetime.Elapsed);
        }

        private bool ProcessMsg(string id, string popId, string msg)
        {
            bool rc = true;

            //Stopwatch sw = Stopwatch.StartNew();
            TraceInfo("AzJobHost::ProcessMsg( '{0}', '{1}') - Azure Instance Id: {2}", id, msg, RoleEnvironment.CurrentRoleInstance.Id);

            List<double> x = new List<double>();
            List<double> y = new List<double>();
            List<double> v = new List<double>();
            List<double> sm = new List<double>();
            List<double> sx = new List<double>();
            List<double> sy = new List<double>();
            List<double> sd = new List<double>();

            Start = DateTime.Now;
            dataStoreExposureX.RecieveDataFromStorage<double>(msg, out x);
            dataStoreExposureY.RecieveDataFromStorage<double>(msg, out y);
            dataStoreExposureV.RecieveDataFromStorage<double>(msg, out v);
            Finish = DateTime.Now;
            Stopwatch = Finish.Subtract(Start);
            this.Log.LogMessage(String.Format("Time to Upload Exposure to model {0} milliseconds",  Stopwatch.TotalMilliseconds), Stopwatch.TotalMilliseconds);

            Start = DateTime.Now;
            dataStoreCatalogSm.RecieveDataFromStorage<double>(msg, out sm);
            dataStoreCatalogSx.RecieveDataFromStorage<double>(msg, out sx);
            dataStoreCatalogSy.RecieveDataFromStorage<double>(msg, out sy);
            dataStoreCatalogSd.RecieveDataFromStorage<double>(msg, out sd);

            Finish = DateTime.Now;
            Stopwatch = Finish.Subtract(Start);
            this.Log.LogMessage(String.Format("Time to Upload Catalog to model {0} milliseconds", Stopwatch.TotalMilliseconds), Stopwatch.TotalMilliseconds);

            List<double> losses = new List<double>();

            Start = DateTime.Now;
            eQModel.RunModel(x, y, v, sm, sx, sy, sd, out losses);
            Finish = DateTime.Now;
            Stopwatch = Finish.Subtract(Start);
            this.Log.LogMessage(String.Format("Time to Run Japan Earthquake {0} milliseconds", Stopwatch.TotalMilliseconds), Stopwatch.TotalMilliseconds);

            this.Log.LogMessage("=== Appending Model Log to Server Log ====");

            List<LogEvent> ModelLogs = eQModel.log.GetLogs();
            this.Log.Add(ModelLogs);

            jobLossBlob.SendDataToStorage<double>(msg, losses);
            jobLogBlob.SendDataToStorage<LogEvent>(msg, this.Log.GetLogs());

            jobOutputQueue.PutMessage(msg);

            jobInputQueue.DeleteMessage(id, popId);
            return rc;
        }

        private void TraceInfo(string format, params object[] args)
        {
            string msg = string.Format(format, args);
            Trace.WriteLine(msg, "Information");
        }

        private void RoleEnvironmentChanging(object sender, RoleEnvironmentChangingEventArgs e)
        {
            // If a configuration setting is changing
            if (e.Changes.Any(change => change is RoleEnvironmentConfigurationSettingChange))
            {
                // Set e.Cancel to true to restart this role instance
                e.Cancel = true;
            }
        }

        void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
        {
            TraceInfo("AzJobHost::UnhandledException: " + (Exception)e.ExceptionObject);
            RoleEnvironment.RequestRecycle();
        }

    }
}

Thursday 26 May 2011

Numerical calculations in Azure

Here are the steps that I took in looking at Azure and getting a feel for how we could use it. Recently we made a number of benchmark tests with a simplified Japan earthquake model, so this seems a natural place to start. The idea is to use the compute power of the cloud to run earthquake simulations. Here is some pseudo code describing what I intend to do; a rough sketch of the queue hand-off follows the list.

1.    The client loads exposure into cloud storage
2.    The client loads earthquake catalog into cloud storage
3.    The client adds a reference to the exposure and earthquake catalogue data to a queue
4.    The worker node listens on the input queue
5.    The worker node dequeues a reference to data for download
6.    The worker node downloads data from cloud storage
7.    The worker node processes the data
8.    The worker node saves loss data using a reference
9.    The worker node adds a reference to the loss data to a queue
10.    The client listens on the output queue
11.    The client dequeues the reference to result data
12.    The client downloads results and logs
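
Before writing any real code, here is a rough, hypothetical sketch of the queue hand-off in steps 3-5 and 9-11, written directly against the Azure StorageClient library; the queue name, key and storageConnectionString are placeholders, and in the actual implementation below this is wrapped in a QueueStorage class.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Client side: enqueue a reference (key) to data already uploaded to blob storage
CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
CloudQueueClient queueClient = account.CreateCloudQueueClient();
CloudQueue jobQueue = queueClient.GetQueueReference("jobqueue");
jobQueue.CreateIfNotExist();
jobQueue.AddMessage(new CloudQueueMessage(key));

// Worker side: dequeue the reference, process the data it points to, then delete the message
CloudQueueMessage msg = jobQueue.GetMessage();
if (msg != null)
{
    string receivedKey = msg.AsString;   // reference used to download the exposure and catalogue blobs
    // ... download data, run the model, upload losses, enqueue the key on the results queue ...
    jobQueue.DeleteMessage(msg);
}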

Looking at this sequence, the first thing I want to do is to be able to make CRUD operations on queues and blobs. I found some good examples of how this could be done, but I wanted to extend them to be more generic. Here are a pair of handy generic functions that send lists of serializable objects to and from blob storage.

        public void RecieveDataFromStorage<T>(string key, out List<T> data)
        {
            CloudBlobContainer container = _blobClient.GetContainerReference(_blobContainer.Name);
            CloudBlob blob = container.GetBlobReference(key);

            // Download the raw bytes and deserialize them back into a List<T>
            byte[] bdata = blob.DownloadByteArray();

            MemoryStream f = new MemoryStream(bdata);
            BinaryFormatter sf = new BinaryFormatter();
            data = (List<T>)sf.Deserialize(f);
        }

        public void SendDataToStorage<T>(string key, List<T> data)
        {
            CloudBlobContainer container = _blobClient.GetContainerReference(_blobContainer.Name);
            CloudBlob blob = container.GetBlobReference(key);

            // Serialize the list into a byte array and upload it under the given key
            MemoryStream f = new MemoryStream();
            BinaryFormatter sf = new BinaryFormatter();
            sf.Serialize(f, data);
            blob.UploadByteArray(f.ToArray());
        }

Where T can be any serializable type. With these functions we have an easy way to transfer data to and from cloud storage.
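
As a quick illustration, here is a hypothetical round trip using these helpers together with the BlobStorage wrapper that appears later in this post; the container name, key and storageConnectionString are placeholders.

// Hypothetical round trip through blob storage using the generic helpers above
IDataStore store = new BlobStorage(storageConnectionString, "exposurex", false);

List<double> xIn = new List<double> { 139.7, 35.7, 135.5 };
store.SendDataToStorage<double>("job-42", xIn);            // serialize and upload under the key "job-42"

List<double> xOut;
store.RecieveDataFromStorage<double>("job-42", out xOut);  // download and deserialize the same list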

The next step is to do some calculations. The simplified Japan earthquake algorithm is single threaded, so in order to make use of varying numbers of processors we need to make it multi-core enabled. Normally, when making an algorithm multi-core enabled, you first need to really understand what the algorithm is trying to do. The next step is to think, still in the problem domain, about how to split the work, for example by data or by process. This is then reviewed in the context of the hardware and technology available. Finally you code it, applying patterns that do things like take account of cache invalidation. In our case we just want to get a feeling for what various CPU configurations can bring, so I simply made the inner loop parallel. The algorithm falls under the category of "embarrassingly parallel code" because the loop iterations are almost completely independent of each other; only the line marked with an arrow below has a cross dependency. The trick is how to sum the losses. Below is the code before and after…

            for (i = 1; i <= ns; i++)
            {
                loss[i] = 0;

                if ((sm[i] > 3))
                {
                    dkrit = 0.5 * sm[i] * sm[i] * sm[i];

                    for (j = 1; j <= nr; j++)
                    {
                        rr[j] = 0.001 * Math.Sqrt((sx[i] - rx[j]) * (sx[i] - rx[j]) + (sy[i] - ry[j]) * (sy[i] - ry[j]));
                        rr[j] = Math.Sqrt(sd[i] * sd[i] + rr[j] * rr[j]);

                        if ((rr[j] < dkrit))
                        {
                            rlog[j] = Math.Log(rr[j]);
                            mmi[j] = 981 * Math.Exp(c1 + c2 * sm[i] + c3 * sm[i] * sm[i] + c4 * rlog[j] + c5 * rr[j]);
                            if ((mmi[j] > 50))
                            {
                                mmi[j] = 3.66 * Math.Log10(mmi[j]) - 1.66;
                                lmmi = Math.Log(mmi[j]);
                                mdr[j] = 0.01 * Math.Exp(v1 * lmmi * lmmi + v2 * lmmi + v3);
                                loss[i] = loss[i] + mdr[j] * value[j]; // <----------
                            }
                        }
                    }
                }

                Parallel.For(1, nr + 1,            // match the sequential loop, which runs j = 1..nr inclusive (localsum is assumed to be sized nr + 1 like the other arrays)
                    (jj) =>
                    {
                        localsum[jj] = 0;          // clear any partial sum left over from the previous event i

                        double rr = 0.001 * Math.Sqrt((sx[i] - rx[jj]) * (sx[i] - rx[jj]) + (sy[i] - ry[jj]) * (sy[i] - ry[jj]));
                        rr = Math.Sqrt(sd[i] * sd[i] + rr * rr);

                        if (rr < dkrit)
                        {
                            double rlog = Math.Log(rr);
                            double mmi = 981 * Math.Exp(c1 + c2 * sm[i] + c3 * sm[i] * sm[i] + c4 * rlog + c5 * rr);
                            if (mmi > 50)
                            {
                                mmi = 3.66 * Math.Log10(mmi) - 1.66;
                                double lmmi = Math.Log(mmi);   // local copy to avoid a race on the shared lmmi variable
                                double mdr = 0.01 * Math.Exp(v1 * lmmi * lmmi + v2 * lmmi + v3);
                                localsum[jj] = mdr * value[jj];
                            }
                        }
                    });
                loss[i] = localsum.Sum();

There is a lot of scope for further optimization; for example, we could pad the localsum array so that values written by different threads do not share a cache line (false sharing, or cache invalidation). A rough sketch of that idea is shown below.
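This is only a minimal sketch of the padding idea, assuming localsum is our own scratch array; the stride of 8 doubles is a guess at a 64-byte cache line and none of this comes from the original model code.

// Hypothetical padding of the per-site partial sums to reduce false sharing:
// each logical slot jj is spread out by 'pad' doubles so that neighbouring
// slots written by different threads land on different cache lines
// (8 doubles * 8 bytes = 64 bytes).
const int pad = 8;
double[] paddedSum = new double[(nr + 1) * pad];

Parallel.For(1, nr + 1, (jj) =>
{
    paddedSum[jj * pad] = 0;
    // ... same per-site calculation as above, but writing the result here:
    // paddedSum[jj * pad] = mdr * value[jj];
});

double total = 0;
for (int j = 1; j <= nr; j++)
    total += paddedSum[j * pad];
loss[i] = total;

Here's an example of what I did with the base class.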

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Contracts;
using Entities;

namespace Contracts
{
    public abstract class JobProcessorBase : IJobProcessor
    {
        private DateTime Start { get; set; }
        private DateTime Finish { get; set; }
        private TimeSpan Stopwatch { get; set; }
        public List<double> LossList { get; set; }

        public ILog Log { set; get; }

        public StorageType StorageType { get; private set; }

        public void ProcessJob()
        {

            string key = "";
            TimedAction(() => key = DequeueAndDeleteKey(), "DequeueAndDeleteKey");

            this.StorageType = StorageType.None;
            if (key.ToLower().IndexOf("blob") == 0)
            {
                this.StorageType = StorageType.Blob;
            }

            TimedAction(() => GetExposure(key), "GetExposure");
            TimedAction(() => GetEventCatalogue(key), "GetEventCatalogue");
            TimedAction(() => RunModel(), "RunModel");
            TimedAction(() => SaveLosses(key), "SaveLosses");
            TimedAction(() => SaveLog(key), "SaveLog");

        }

        private void TimedAction(Action ToDo, string MethodName)
        {
            Start = DateTime.Now;
            ToDo.Invoke();
            Finish = DateTime.Now;
            Stopwatch = Finish.Subtract(Start);
            this.Log.LogMessage(String.Format("Time to {0} {1} milliseconds",MethodName, Stopwatch.TotalMilliseconds), Stopwatch.TotalMilliseconds);
        }

        public abstract string DequeueAndDeleteKey(); 
        public abstract void GetExposure(string key); 
        public abstract void GetEventCatalogue(string key); 
        public abstract void RunModel(); 
        public abstract void SaveLosses(string key); 
        public abstract void SaveLog(string key); 
    }
}

The base class above implements a template method pattern that times each individual step. In the true spirit of test driven development I slowly built up an end to end test that loaded the data, ran the model and downloaded the results.

        [TestMethod]
        public void RunBlobModel()
        {
            BlobJobDispatcher blobJobDispatcher = new BlobJobDispatcher();
            string key = "";

            // Load data into the cloud
            blobJobDispatcher.CreateJob(out key);

            // Pretend to be the Worker Role
            LoadData loadData = new LoadData();
            JobProcessor jobProcessor = new JobProcessor(loadData);
            jobProcessor.ProcessJob();

            // Receive results back from the cloud
            blobJobDispatcher.RecieveResults(out key);
        }

Here's what the client implementation looked like:

namespace Client
{
    public class BlobJobDispatcher : JobDispatcherBase
    {

        private IJobProcessor JobProcessor;
        private LoadData loadData { get; set; }
        private IDataStore dataStoreExposureX { get; set; }
        private IDataStore dataStoreExposureY { get; set; }
        private IDataStore dataStoreExposureV { get; set; }
        private QueueStorage dataStoreExposureQueue { get; set; }

        private IDataStore dataStoreCatalogSm { get; set; }
        private IDataStore dataStoreCatalogSx { get; set; }
        private IDataStore dataStoreCatalogSy { get; set; }
        private IDataStore dataStoreCatalogSd { get; set; }
        private QueueStorage dataStoreCatalogQueue { get; set; }

        private QueueStorage jobQueue { get; set; }

        private QueueStorage resultsQueue { get; set; }

        private IDataStore dataStoreLosses { get; set; }
        private IDataStore dataStoreLog { get; set; }

        public BlobJobDispatcher()
        {
            this.ClientLog = new Log("BlobJobDispatcher");
            loadData = new LoadData();
            string storageConnectionString = ConfigurationManager.ConnectionStrings["Storage"].ConnectionString;
            dataStoreExposureX = new BlobStorage(storageConnectionString, "exposurex", false);
            dataStoreExposureY = new BlobStorage(storageConnectionString, "exposurey", false);
            dataStoreExposureV = new BlobStorage(storageConnectionString, "exposurev", false);
            dataStoreExposureQueue = new QueueStorage(storageConnectionString, "exposurequeue", true);

            dataStoreCatalogSm = new BlobStorage(storageConnectionString, "catalogsm", false);
            dataStoreCatalogSx = new BlobStorage(storageConnectionString, "catalogsx", false);
            dataStoreCatalogSy = new BlobStorage(storageConnectionString, "catalogsy", false);
            dataStoreCatalogSd = new BlobStorage(storageConnectionString, "catalogsd", false);
            dataStoreCatalogQueue = new QueueStorage(storageConnectionString, "catalogqueue", true);

            jobQueue = new QueueStorage(storageConnectionString, "jobqueue", true);

            resultsQueue = new QueueStorage(storageConnectionString, "resultsqueue", true);
            dataStoreLosses = new BlobStorage(storageConnectionString, "joblossblob", false);   // must match the container the server writes losses to
            dataStoreLog = new BlobStorage(storageConnectionString, "log", false);
        }

        #region JobDispatcherBase Members

        public override void ReadExposure()
        {
            loadData.ReadExposure();
        }

        public override void ReadEventCatalogue()
        {
            loadData.ReadEventCatalogue();
        }

        public override void SendExposureToCload(string key)
        {
            this.dataStoreExposureX.SendDataToStorage<double>(key, this.loadData.rxList);
            this.dataStoreExposureY.SendDataToStorage<double>(key, this.loadData.ryList);
            this.dataStoreExposureV.SendDataToStorage<double>(key, this.loadData.valueList);
            this.dataStoreExposureQueue.SendDataToStorage(key);
        }

        public override void SendEventCatalogueToCload(string key)
        {
            dataStoreCatalogSm.SendDataToStorage<double>(key, this.loadData.smList);
            dataStoreCatalogSx.SendDataToStorage<double>(key, this.loadData.sxList);
            dataStoreCatalogSy.SendDataToStorage<double>(key, this.loadData.syList);
            dataStoreCatalogSd.SendDataToStorage<double>(key, this.loadData.sdList);
            this.dataStoreCatalogQueue.SendDataToStorage(key);
        }

        public override void SubmitJobToCload(string key)
        {
            jobQueue.SendDataToStorage(key);
        }

        public override string WaitForKey()
        {
            string recievedKey = "";
            while (recievedKey == "")
            {
                resultsQueue.RecieveDataFromStorage(out recievedKey);
                Thread.Sleep(1000);
            }
            return recievedKey;
        }

        public override List<double> GetLossListFromCloud(string key)
        {
            List<double> recievedLosses = new List<double>();
            dataStoreLosses.RecieveDataFromStorage<double>(key,out recievedLosses);
            return recievedLosses;
        }

        public override List<LogEvent> GetLogsFromCloud(string key)
        {
            List<LogEvent> recievedLogs = new List<LogEvent>();
            dataStoreLog.RecieveDataFromStorage<LogEvent>(key, out recievedLogs);
            return recievedLogs;
        }

        public override void PersistLogs()
        {
            this.ClientLog.Save("C:\\Azure\\PartnerRe\\CloudInfra\\Data\\FlatFileLogg.txt",false);
        }

        public override void PersistLossList()
        {
            StreamWriter FileOut = new StreamWriter("C:\\Azure\\PartnerRe\\CloudInfra\\Data\\LossFile.txt");
            foreach (double l in LossList)
            {
                FileOut.WriteLine(string.Format("{0}", l));
            }
            FileOut.Close();
        }
        #endregion

    }
}

Here's what the server-side implementation looked like:

namespace Server
{
    public class JobProcessor : JobProcessorBase
    {
        public LoadData loadData {get; set;}

        private EQModel eQModel { get; set; }

        private IDataStore dataStoreExposureX { get; set; }
        private IDataStore dataStoreExposureY { get; set; }
        private IDataStore dataStoreExposureV { get; set; }
        private QueueStorage dataStoreExposureQueue { get; set; }

        private IDataStore dataStoreCatalogSm { get; set; }
        private IDataStore dataStoreCatalogSx { get; set; }
        private IDataStore dataStoreCatalogSy { get; set; }
        private IDataStore dataStoreCatalogSd { get; set; }
        private QueueStorage dataStoreCatalogQueue { get; set; }

        private QueueStorage jobQueue { get; set; }

        private QueueStorage resultsQueue { get; set; }

        private IDataStore dataStoreLosses { get; set; }
        private IDataStore dataStoreLog { get; set; }

        public JobProcessor(LoadData LoadData)
        {
            this.loadData = LoadData;
            Log = new Log("FlatFileJobProcessor");
            eQModel = new EQModel();

            string storageConnectionString = ConfigurationManager.ConnectionStrings["Storage"].ConnectionString;
            dataStoreExposureX = new BlobStorage(storageConnectionString, "exposurex", false);
            dataStoreExposureY = new BlobStorage(storageConnectionString, "exposurey", false);
            dataStoreExposureV = new BlobStorage(storageConnectionString, "exposurev", false);
            dataStoreExposureQueue = new QueueStorage(storageConnectionString, "exposurequeue", false);

            dataStoreCatalogSm = new BlobStorage(storageConnectionString, "catalogsm", false);
            dataStoreCatalogSx = new BlobStorage(storageConnectionString, "catalogsx", false);
            dataStoreCatalogSy = new BlobStorage(storageConnectionString, "catalogsy", false);
            dataStoreCatalogSd = new BlobStorage(storageConnectionString, "catalogsd", false);
            dataStoreCatalogQueue = new QueueStorage(storageConnectionString, "catalogqueue", false);

            jobQueue = new QueueStorage(storageConnectionString, "jobqueue", false);

            resultsQueue = new QueueStorage(storageConnectionString, "resultsqueue", false);
            dataStoreLosses = new BlobStorage(storageConnectionString, "joblossblob", false);
            dataStoreLog = new BlobStorage(storageConnectionString, "log", false);

        }

        public override string DequeueAndDeleteKey()
        {
            string recievedKey = "";
            while (recievedKey == "")
            {
                jobQueue.RecieveDataFromStorage(out recievedKey);
                if (recievedKey == "")
                {
                    Thread.Sleep(1000);
                }
            }
            return recievedKey;
        }

        public override void GetExposure(string key)
        {
            switch ( this.StorageType)
            {
                case StorageType.None:
                    break;
                case StorageType.Blob:
                    List<double> recievedx = new List<double>();
                    List<double> recievedy = new List<double>();
                    List<double> recievedv = new List<double>();

                    dataStoreExposureX.RecieveDataFromStorage<double>(key, out recievedx);
                    dataStoreExposureY.RecieveDataFromStorage<double>(key, out recievedy);
                    dataStoreExposureV.RecieveDataFromStorage<double>(key, out recievedv);

                    this.loadData.rxList = recievedx;
                    this.loadData.ryList = recievedy;
                    this.loadData.valueList = recievedv;

                    break;
                case StorageType.SQLAzure:
                    break;
                case StorageType.Table:
                    break;
            }
            return;
        }

        public override void GetEventCatalogue(string key)
        {
            switch (this.StorageType)
            {
                case StorageType.None:
                    break;
                case StorageType.Blob:

                    List<double> recievedsm = new List<double>();
                    List<double> recievedsx = new List<double>();
                    List<double> recievedsy = new List<double>();
                    List<double> recievedsd = new List<double>();

                    dataStoreCatalogSm.RecieveDataFromStorage<double>(key, out recievedsm);
                    dataStoreCatalogSx.RecieveDataFromStorage<double>(key, out recievedsx);
                    dataStoreCatalogSy.RecieveDataFromStorage<double>(key, out recievedsy);
                    dataStoreCatalogSd.RecieveDataFromStorage<double>(key, out recievedsd);

                    this.loadData.sxList = recievedsx;
                    this.loadData.syList = recievedsy;
                    this.loadData.smList= recievedsm;
                    this.loadData.sdList = recievedsd;
                    break;
                case StorageType.SQLAzure:
                    break;
                case StorageType.Table:
                    break;
            }
            return;
        }

        public override void RunModel()
        {
            this.Log.LogMessage("Start earthquake model");

            List<double> result = new List<double>();

            eQModel.RunModel(loadData.rxList,
                             loadData.ryList,
                             loadData.valueList,
                             loadData.smList,
                             loadData.sxList,
                             loadData.syList,
                             loadData.sdList,
                             out result);
            this.LossList = result;

            this.Log.LogMessage("=== Appending Model Log to Server Log ====");

            List<LogEvent> ModelLogs = eQModel.log.GetLogs();
            this.Log.Add(ModelLogs);

        }

        public override void SaveLosses(string key)
        {
            switch (this.StorageType)
            {
                case StorageType.None:
                    break;
                case StorageType.Blob:
                    this.dataStoreLosses.SendDataToStorage<double>(key,this.LossList);
                    this.resultsQueue.SendDataToStorage(key);
                    break;
                case StorageType.SQLAzure:
                    break;
                case StorageType.Table:
                    break;
            }
            return;
        }

        public override void SaveLog(string key)
        {
            switch (this.StorageType)
            {
                case StorageType.None:
                    break;
                case StorageType.Blob:
                    this.dataStoreLog.SendDataToStorage<LogEvent>(key,this.Log.GetLogs());
                    break;
                case StorageType.SQLAzure:
                    break;
                case StorageType.Table:
                    break;
            }
            return;
        }
    }
}

And the unit test worked. My next step was to implement a worker role that executes the ProcessJob method and to change the unit test to look like this:

        [TestMethod]
        public void RunBlobModel()
        {
            BlobJobDispatcher blobJobDispatcher = new BlobJobDispatcher();
            string key = "";
            blobJobDispatcher.CreateJob(out key);
            //LoadData loadData = new LoadData();
            //JobProcessor jobProcessor = new JobProcessor(loadData);
            //jobProcessor.ProcessJob();

            blobJobDispatcher.RecieveResults(out key);
        }

I thought that replacing the processing part with a Worker Role would be a trivial last step, but it wasn't. My main problem was that when I published my worker role, the role would not start, and as far as I could see there were no logs to help figure out what was going wrong. So instead I found an example of a worker role that was deployable and slowly refactored it into a state where it would process my earthquake models. This was quite a slow process because each time I wanted to test whether the worker role could be deployed it took around 9 minutes.
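
For reference, the shape I was aiming for is roughly the following: a standard RoleEntryPoint whose Run() loop simply drives ProcessJob(). This is only a minimal sketch under the assumptions that the storage connection string lives in the role configuration and that LoadData and JobProcessor are wired up exactly as in the unit test; it is not the code of the deployed role.

using System;
using System.Diagnostics;
using System.Net;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;
using Server;                          // JobProcessor (LoadData's namespace is assumed)

namespace ModelWorkerRole
{
    public class WorkerRole : RoleEntryPoint
    {
        public override void Run()
        {
            Trace.WriteLine("Model worker role entry point called", "Information");

            // Each iteration blocks in DequeueAndDeleteKey() until a job key arrives,
            // then runs the full template method: get exposure, get catalogue,
            // run the model, save losses and save the log.
            while (true)
            {
                try
                {
                    LoadData loadData = new LoadData();
                    JobProcessor jobProcessor = new JobProcessor(loadData);
                    jobProcessor.ProcessJob();
                }
                catch (Exception ex)
                {
                    Trace.WriteLine("ProcessJob failed: " + ex, "Error");
                    Thread.Sleep(5000);   // back off briefly before picking up the next job
                }
            }
        }

        public override bool OnStart()
        {
            // The default connection limit is low; raise it for concurrent blob transfers
            ServicePointManager.DefaultConnectionLimit = 12;
            return base.OnStart();
        }
    }
}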