Thursday 30 December 2010

Improvements in number crunching speeds from .Net 3.5 to .Net 4

Here's something interesting: in a previous blog entry we found a 2x speedup in the performance of a C# version of our Cat Models going from .NET 3.5 to .NET 4.0. I followed up with Microsoft on this and got very helpful feedback from Surupa Biswas and Stephen Toub:

Hello again Nigel,

Someone on my team took a deeper look at this and was able to deterministically reproduce a ~2x speed up on his box. It looks like the primary wins are coming from the CRT, not the JIT. The 8.0/3.5 CRT was using x87 floating point instructions, but the 10.0/4.0 CRT is using SSE. The JIT uses the CRT for the Log/ Log10 and Exp calls. It seems like the SSE instructions used by the C runtime are the primary source of the wins although the JIT-ed code was better too.

.NET 3.5 also had the CRT installed in a global location (WinSxS) and hence could be affected by other applications that updated the CRT, which could explain the differences in our observations.

Thanks again for sharing your test case and reporting this to us.

Best,
Surupa


For the non technical:

JIT = Just In Time compiler (the compiler that converts the IL (Intermediate Language) code, what we see as the .exe file, into native machine code)
CRT = C Runtime Library (the native runtime library that compiled code calls into for low-level services such as the math functions). This is a library that is delivered with the compiler/framework and therefore has different versions:
.NET 2.0 -> CRT 8.0.50727.42
.NET 2.0 SP1 -> CRT 8.0.50727.762
.NET 3.5 -> CRT 9.0.21022.8
.NET 3.5 SP1 -> CRT 9.0.30729.1

I believe SSE stands for Streaming SIMD Extensions, a set of instruction set extensions for Intel (and compatible) processors.

On this web site http://neilkemp.us/src/sse_tutorial/sse_tutorial.html I found the following description of what SSE is:
"First what the heck is SSE? Basically SSE is a collection of 128 bit CPU registers. These registers can be packed with 4 32 bit scalars after which an operation can be performed on each of the 4 elements simultaneously."

If I understand this correctly, .NET 4 takes advantage of these processor extensions via the improved version of the C Runtime Library that it ships with.
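To see the effect in isolation you can time the transcendental functions directly. Below is a minimal micro-benchmark sketch (the loop and the use of Math.Log/Math.Exp are illustrative, not our model code); building and running it against .NET 3.5 and then against .NET 4.0 should show the difference described above:

using System;
using System.Diagnostics;

class CrtMathBenchmark
{
    static void Main()
    {
        const int iterations = 10000000;
        double sum = 0.0;

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 1; i <= iterations; i++)
        {
            // Math.Log and Math.Exp call into the CRT, so their speed reflects
            // whether the CRT uses x87 or SSE floating point instructions.
            sum += Math.Exp(Math.Log(i) * 0.5);
        }
        watch.Stop();

        // Print the accumulated value so the JIT cannot optimize the loop away.
        Console.WriteLine("sum = {0}, elapsed = {1} ms", sum, watch.ElapsedMilliseconds);
    }
}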

Here's a great article on the CLR improvements in .NET 3.5: http://msdn.microsoft.com/en-us/magazine/dd569747.aspx

Tuesday 21 December 2010

A WCF Hosting story

I have a VS2010 project that has a WCF service that is ready to be deployed. I now want to setup IIS7 to use WAS to host the WCF components. Here are the steps needed to do this

1. Go to Control Panel/Administrative Tools/Internet Information Services (IIS) Manager
2. Browse to Sites/Default Website
3. Right mouse click and "Add Virtual Directory..."
4. In the Add Virtual Directory
   - Alias : 05
   - Physical path: C:\InetPub\wwwroot\wmi\2010\12\05

(The objectives are: to have a simple URL; to be able to version both the service and the namespaces; and to monitor usage when decommissioning a service so as to prevent breaking applications [in our case done via logging in a base class].)

5. Browse to Application Pools
6. Right mouse click on the list box and Add Application Pool....
7. Add Application Pool
   - Name :Wmi
   - .Net Framework v4.0.30319
   - Managed pipeline mode : Integrated
8. Right mouse click on the newly created Wmi application pool and select Advanced Settings
9. Under Process Model click on the "..." button next to Identity
10. Select Custom account and enter the service account details

At this point you could try and publish...
i Right mouse click on the ASP.NET project containing the WCF service
ii Enter
   - Publish profile: BulsburgProfile
   - Publish method: File System
   - Target location \\chbbpresxxx\wwwroot$\wmi\2010\12\05
Then if you enter the URL of the service http://chbbpresxxx/wmi/Disk.svc you will get the following error:

Server Error in '/' Application.
________________________________________
Runtime Error
Description: An application error occurred on the server. The current custom error settings for this application prevent the details of the application error from being viewed remotely (for security reasons). It could, however, be viewed by browsers running on the local server machine.

Details: To enable the details of this specific error message to be viewable on remote machines, please create a <customErrors> tag within a "web.config" configuration file located in the root directory of the current web application. This <customErrors> tag should then have its "mode" attribute set to "Off".

<!-- Web.Config Configuration File -->

<configuration>
    <system.web>
        <customErrors mode="Off"/>
    </system.web>
</configuration>

Notes: The current error page you are seeing can be replaced by a custom error page by modifying the "defaultRedirect" attribute of the application's <customErrors> configuration tag to point to a custom error page URL.

<!-- Web.Config Configuration File -->

<configuration>
    <system.web>
        <customErrors mode="RemoteOnly" defaultRedirect="mycustompage.htm"/>
    </system.web>
</configuration>

So now we add the following to Web.Config and republish
    <system.web>
        <customErrors mode="Off"/>
    </system.web>

Then you will get the following error:


Server Error in '/' Application.
________________________________________
The type 'Ccp.Wcf.Wmi.Disk', provided as the Service attribute value in the ServiceHost directive, or provided in the configuration element system.serviceModel/serviceHostingEnvironment/serviceActivations could not be found.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

The solution is to
11. Go back to the Internet Information Services (IIS) Manager
12. Right mouse click on the newly created virtual directory and select "Convert to Application"
13. - Alias: wmi
    - Application pool: wmi
    - Physical path: C:\InetPub\wwwroot\wmi\2010\12\05
Then it works...


Next I want to add the finishing touches to the web.config
    <system.serviceModel>
      <services>
        <service behaviorConfiguration="Ccp.Wcf.Wmi.DiskBehavior"
          name="Ccp.Wcf.Wmi.Disk">
          <endpoint address="" binding="wsHttpBinding" contract="Ccp.Wcf.Wmi.IDisk">
            <identity>
              <dns value="localhost" />
            </identity>
          </endpoint>
          <endpoint address="mex" binding="mexHttpBinding" contract="IMetadataExchange" />
          <host>
            <baseAddresses>
              <!-- We don't need a base address when deploying to production -->
              <!--add baseAddress="http://localhost:8731/Design_Time_Addresses/Ccp.Wcf.Wmi/Disk/" /-->
            </baseAddresses>
          </host>
        </service>
      </services>
    </system.serviceModel>

If you happen to try the same thing on IIS6 there are some differences:

1. Inside the Internet Information Services (IIS) Manager
2. Browse to Web Service Extensions and enable ASP.NET v4.0.30319
3. Repeat the exercise of creating the application pool and virtual directory
4. Copy the bin and the web.config file from C:\InetPub\wwwroot\wmi\2010\12\05 to C:\InetPub\wwwroot\

Then when you try to make some special WMI calls you will end up with errors like: Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)). In our case the IIS6 server is a fringe server that will be updated to Server 2008.

So now, taking our system administrator hat off and putting our developer hat back on, we have a client app.config file that looks like:
            <endpoint address="http://chbbpresxxx.global.partnerre.net/Wmi/2010/12/05/Disk.svc"
                binding="wsHttpBinding" bindingConfiguration="DiskwsHttp"
                contract="WmiDeployedServiceReference.IDisk" name="DiskwsHttpBalsburg">
                <identity>
                    <dns value="localhost" />
                </identity>
            </endpoint>

NB: the dns entry is necessary, otherwise the service will not be found.

The corresponding code is as follows:


        [TestMethod]
        public void GetFreeSpaceInGBWCfDeployed_CheckDiskCapacityOnC_StringContainingDiskCapacity()
        {
            WmiDeployedServiceReference.DiskClient proxy = new WmiDeployedServiceReference.DiskClient("DiskwsHttpBalsburg");

            string diskCapacityString = "";
            string serverName = Environment.MachineName;
            string driveLetter = "C";

            diskCapacityString = proxy.GetFreeSpaceInGB(serverName, driveLetter);

            long AvailableSpaceOnC = 0;

            DriveInfo d = new DriveInfo(driveLetter);
            AvailableSpaceOnC = d.AvailableFreeSpace;
            AvailableSpaceOnC = AvailableSpaceOnC / 1024 / 1024 / 1024;

            long diskCapacityLong = Convert.ToInt64(Convert.ToDouble(diskCapacityString));

            Assert.IsFalse(diskCapacityString == "");
            Assert.IsTrue(diskCapacityLong == AvailableSpaceOnC);

        }
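For reference, the same endpoint, including the dns identity, can also be constructed in code rather than in app.config. Here is a minimal sketch, assuming the generated WmiDeployedServiceReference.DiskClient proxy used above:

using System;
using System.ServiceModel;

public static class DiskClientFactory
{
    public static WmiDeployedServiceReference.DiskClient Create()
    {
        WSHttpBinding binding = new WSHttpBinding();

        // The dns identity must match the <dns value="localhost" /> element in the
        // service configuration, otherwise the identity check fails.
        EndpointAddress address = new EndpointAddress(
            new Uri("http://chbbpresxxx.global.partnerre.net/Wmi/2010/12/05/Disk.svc"),
            EndpointIdentity.CreateDnsIdentity("localhost"));

        return new WmiDeployedServiceReference.DiskClient(binding, address);
    }
}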

Now I want to use NetTcp because it is more efficient than wsHttp.
In some early iterations, after adding the netTcp binding, when I tried http://chbbpresxxx/wmi/2010/12/05/Disk.svc I got an error like:


Server Error in '/Wmi/2010/12/05' Application.
--------------------------------------------------------------------------------

Could not find a base address that matches scheme net.tcp for the endpoint with binding NetTcpBinding. Registered base address schemes are [http].
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

To fix this error you need to enable the net.tcp protocol on the application in IIS:
1. Right click the virtual directory/application in IIS -> Manage Application -> Advanced Settings.
2. Set Enabled Protocols to: http,net.tcp

I tried out the WCF Configuration Editor to set up a simple netTcp binding to the existing contract and published it. Then when I opened http://chbbpresxxx/wmi/2010/12/05/Disk.svc I got the error:


Cannot load the X.509 certificate identity specified in the configuration.

The solution in my case was to comment out the certificate in the web.config


          <endpoint binding="wsHttpBinding" name="DiskwsHttp" contract="Ccp.Wcf.Wmi.IDisk"
            listenUriMode="Explicit">
            <identity>
              <dns value="localhost" />
              <!--certificateReference storeName="My" storeLocation="LocalMachine"
                x509FindType="FindBySubjectDistinguishedName" /-->
            </identity>
          </endpoint>
          <endpoint address="mex" binding="mexHttpBinding" contract="IMetadataExchange"
            listenUriMode="Explicit">
            <identity>
              <!--certificateReference storeName="My" storeLocation="LocalMachine"
                x509FindType="FindBySubjectDistinguishedName" /-->
            </identity>
          </endpoint>

Alternatively you can follow the instructions at the following URL to set up the certificate:

http://wcfsecurity.codeplex.com/wikipage?title=How%20To%20-%20Create%20and%20Install%20Temporary%20Certificates%20in%20WCF%20for%20Message%20Security%20During%20Development&ProjectName=wcfsecurity


After some other edits the wsHttp binding works, but the netTcp binding gives an error like:

Could not connect to net.tcp://chbbpresxxx.global.partnerre.net:8731/Wmi/2010/12/05/Disk.svc. The connection attempt lasted for a time span of 00:00:01.1986602. TCP error code 10061: No connection could be made because the target machine actively refused it 10.102.65.150:8731.

This error had me stuck for a while; I tried a lot of things, all of which failed. Here is a list:
- Look at the Advanced settings of wwwroot and add http,net.tcp to the enabled protocols (as in the Virtual directory)
- Right mouse click on wwwroot and add Binding.
   - Modified the existing binding from 808:* to 9* to 9001 and back
   - Adding net-tcp with 9001
- Under Server Manager/Configuration/Microsoft Fire Wall with.../ look at the properties and set the Firewall status to off at the domain, private and public levels

The problem was that the setup of Features was different. To fix this we have to put our system administrator hat back on and do the following:

1. Open the Server Manager
2. Browse to Server Manager/features
3. Under .NET Framework 3.0 Features (Installed), add
       Under WCF Activation
       + HTTP Activation
       + Non-HTTP Activation
4. The installation took some time

Now if we try the test client by:
1. D:\Program Files\Microsoft Visual Studio 10.0\Common7\IDE\WcfTestClient.exe
2. File/ Add Service
3. http://chbbpresxxx.global.partnerre.net/Wmi/2010/12/05/Disk.svc
We get the error


Error: Cannot obtain Metadata from http://chbbpresxxx.global.partnerre.net/Wmi/2010/12/05/Disk.svc
If this is a Windows (R) Communication Foundation service to which you have access, please check that you have enabled metadata publishing at the specified address.  For help enabling metadata publishing, please refer to the MSDN documentation at http://go.microsoft.com/fwlink/?LinkId=65455.WS-Metadata Exchange etc etc

I found a link that describes what is happening

http://blogs.msdn.com/b/webtopics/archive/2010/04/28/system-typeloadexception-for-system-servicemodel-activation-httpmodule-in-asp-net-4.aspx

Installation of .NET 3.5.1 WCF HTTP activation feature adds a global module for the 3.0 framework’s 'System.ServiceModel’ assembly for the type 'System.ServiceModel.Activation.HttpModule'. Since the application pool’s runtime version is v4.0, this assembly is tried to be loaded from the .NET 4 assemblies folder. Since, the definition of the 'System.ServiceModel.Activation.HttpModule’ is now moved to the “System.ServiceModel.Activation” assembly, it fails.

Following the instructions, to fix this open a command prompt:

 
C:\>cd C:\Windows\Microsoft.NET\Framework\v4.0.30319
C:\Windows\Microsoft.NET\Framework\v4.0.30319>aspnet_regiis -iru
Start installing ASP.NET (4.0.30319).
..................................
Finished installing ASP.NET (4.0.30319).

Then reboot the server...
By the way, the following error happened when I tried adding a binding in the IIS Manager for net.tcp port 9001:

The protocol binding '9001' does not conform to the syntax for 'net.tcp'. The following is an example of valid 'net.tcp' protocol bindings: '808:*'.

The solution is to remove the added net.tcp binding in the IIS Manager.


The Final Solution with our developer hat on


We use baseAddress="net.tcp://localhost:8080/Wmi/2010/12/05/Disk.svc" in the Web.config, which looks like:

<?xml version="1.0"?>
<configuration>
    <system.web>
        <compilation debug="true" targetFramework="4.0" />
      <customErrors mode="Off"/>
    </system.web>

    <system.serviceModel>
      <services>
        <service behaviorConfiguration="Ccp.Wcf.Wmi.DiskBehavior" name="Ccp.Wcf.Wmi.Disk">
          <clear />
          <endpoint binding="wsHttpBinding" name="DiskwsHttp" contract="Ccp.Wcf.Wmi.IDisk"
            listenUriMode="Explicit">
            <identity>
              <dns value="localhost" />
            </identity>
          </endpoint>
          <endpoint address="mex" binding="mexHttpBinding" contract="IMetadataExchange"
            listenUriMode="Explicit">
            <identity>
            </identity>
          </endpoint>
          <endpoint address=""
            binding="netTcpBinding"  name="DisknetTcp" contract="Ccp.Wcf.Wmi.IDisk" >
            <identity>
              <dns value="localhost" />
            </identity>
          </endpoint>
          <host>
            <baseAddresses>
              <!-- We don't need a base address when deploying to production -->
              <add baseAddress="net.tcp://localhost:8080/Wmi/2010/12/05/Disk.svc" />
            </baseAddresses>
          </host>

        </service>
      </services>
      <behaviors>
        <serviceBehaviors>
          <behavior name="Ccp.Wcf.Wmi.DiskBehavior">
            <!-- To avoid disclosing metadata information,
          set the value below to false and remove the metadata endpoint above before deployment -->
            <serviceMetadata httpGetEnabled="True"/>
            <!-- To receive exception details in faults for debugging purposes,
          set the value below to true.  Set to false before deployment
          to avoid disclosing exception information -->
            <serviceDebug includeExceptionDetailInFaults="true" />
          </behavior>
        </serviceBehaviors>
      </behaviors>
      <bindings>
        <netTcpBinding>
          <binding name="ReliableSessionBinding">
            <reliableSession ordered="false" inactivityTimeout="00:10:00" enabled="true" />
          </binding>
          <binding name ="DisknetTcp">
          </binding>
        </netTcpBinding>
      </bindings>
      <client>
        <endpoint address="net.tcp://services-tst.global.partnerre.net/Group/Common/Logging/2009/02/11/LogService.svc"
             binding="netTcpBinding" bindingConfiguration="ReliableSessionBinding"
             contract="LogServiceReference.ILogService" >
          <identity>
            <servicePrincipalName value="host/CHBBPRES563.global.partnerre.net" />
          </identity>
        </endpoint>
      </client>
    </system.serviceModel>
</configuration>

We use address="net.tcp://chbbpresxxx.global.partnerre.net/Wmi/2010/12/05/Disk.svc" in the App.config, which ends up looking like:


<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <system.serviceModel>
        <client>
            <endpoint address="http://chbbpresxxx.global.partnerre.net/Wmi/2010/12/05/Disk.svc"
                binding="wsHttpBinding" bindingConfiguration="DiskwsHttp"
                contract="WmiDeployedServiceReference.IDisk" name="DiskwsHttpBalsburg">
                <identity>
                    <dns value="localhost" />
                </identity>
            </endpoint>
            <endpoint address="http://localhost:8732/Design_Time_Addresses/Ccp.Wcf.Wmi/Disk/"
                binding="wsHttpBinding" bindingConfiguration="WSHttpBinding_IDisk"
                contract="WmiServiceReference.IDisk" name="WSHttpBinding_IDisk">
                <identity>
                    <dns value="localhost" />
                </identity>
            </endpoint>
            <endpoint address="net.tcp://localhost:8731/Wmi/2010/12/05/Disk"
                binding="netTcpBinding" contract="WmiServiceReference.IDisk"
                name="NetTcpBinding_IDisk">
                <identity>
                    <dns value="localhost" />
                </identity>
            </endpoint>
            <endpoint address="http://chbbpresxxx.global.partnerre.net/Wmi/2010/12/05/Disk.svc"
                binding="wsHttpBinding" bindingConfiguration="DiskwsHttp1"
                contract="WmiDeployedServiceReference.IDisk" name="DiskwsHttp">
                <identity>
                    <dns value="localhost" />
                </identity>
            </endpoint>
            <endpoint address="net.tcp://chbbpresxxx.global.partnerre.net/Wmi/2010/12/05/Disk.svc"
                binding="netTcpBinding"
                contract="WmiDeployedServiceReference.IDisk" name="DisknetTcp">
                <identity>
                    <dns value="localhost" />
                </identity>
            </endpoint>
        </client>
    </system.serviceModel>
</configuration>

And the unit test looks like:
        [TestMethod]
        public void GetFreeSpaceInGBWCfDeployedTcp_CheckDiskCapacityOnC_StringContainingDiskCapacity()
        {
            WmiDeployedServiceReference.DiskClient proxy = new WmiDeployedServiceReference.DiskClient("DisknetTcp");

            string diskCapacityString = "";
            string serverName = Environment.MachineName;
            string driveLetter = "C";

            diskCapacityString = proxy.GetFreeSpaceInGB(serverName, driveLetter);

            long AvailableSpaceOnC = 0;

            DriveInfo d = new DriveInfo(driveLetter);
            AvailableSpaceOnC = d.AvailableFreeSpace;
            AvailableSpaceOnC = AvailableSpaceOnC / 1024 / 1024 / 1024;

            long diskCapacityLong = Convert.ToInt64(Convert.ToDouble(diskCapacityString));

            Assert.IsFalse(diskCapacityString == "");
            Assert.IsTrue(diskCapacityLong == AvailableSpaceOnC);

        }

Thursday 25 November 2010

Number crunching

The benchmark tests that were carried out last year were extended to include code generated with VS2010 and .NET 4.0. There was a significant improvement in speed (more than 10 times), which makes a model that has been completely written in C# about 0.7 times as fast as code written in unmanaged C++. We are looking into Intel C++ compiler options to determine if we are missing something. But the outcome is that it is no longer a clear-cut decision between managed C# and unmanaged C++. The 30% drop in performance could be compensated for by the business value of having all code in one domain.

For completeness: it is well known that Fortran compilers are more efficient than C compilers and are used for the core algorithms at most supercomputer sites. Our test cases showed that Fortran is 5 times faster than the C#/C++ solution that we have now. If we compile everything with Intel C++ we get a 30% speedup on 32 bit and a 40% speedup on 64 bit.

Although we reserve the right to write code in the most efficient language available, there is a lot of value in keeping to standard, well-known languages. We therefore considered 3 options:

1. Everything in C++
2. Everything in C#
3. The Event loop in C# and the location loop in C++

The idea of options 1 and 2 is to have everything in the same domain. This makes it easier to optimize both the event and the location loops. This is important for the case when we want to look into per-risk policies: instead of having the event loop as the outer loop it makes more sense to have the exposure location loop as the outer loop, since instead of summing on a per-event basis we are summing on a per-risk basis. The point is that the exposure table is very big and it is expensive to make copies of this data, hence it is better to swap the inner and outer loops, as sketched below.
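A caricature of the two loop orderings, with the loss calculation reduced to a delegate (the types and names are placeholders, not our model API):

using System;

static class LoopOrderingSketch
{
    // Per-event aggregation: the event loop is the outer loop and the sums are accumulated per event.
    public static double[] PerEventLosses(double[] events, double[] locations, Func<double, double, double> loss)
    {
        double[] totals = new double[events.Length];
        for (int e = 0; e < events.Length; e++)
            for (int l = 0; l < locations.Length; l++)
                totals[e] += loss(events[e], locations[l]);
        return totals;
    }

    // Per-risk aggregation: the exposure location loop is the outer loop and the sums
    // are accumulated per location, without ever copying the exposure table per event.
    public static double[] PerRiskLosses(double[] events, double[] locations, Func<double, double, double> loss)
    {
        double[] totals = new double[locations.Length];
        for (int l = 0; l < locations.Length; l++)
            for (int e = 0; e < events.Length; e++)
                totals[l] += loss(events[e], locations[l]);
        return totals;
    }
}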

The decision was for option 3 for the following reasons:
- Preserves existing investment
- There will always be a need for a mathematical library; having this in unmanaged code is optimal from a compute point of view and allows portability (e.g. to supercomputers)
- The Per Risk Policies would be handled by adding new functionality to the existing library
- The readability of the code will be improved by writing some more specific functions within the mathematical library

The discussions showed that by choosing this option we do give up the following things:
- A homogeneous language where there is complete freedom over how to optimize and arrange code.
- We need to maintain know-how in 2 languages, and there is an associated barrier for scientists to understand, debug and further develop the core mathematical libraries
- IT will not be able to support C++. (This is not such a big issue because a deep domain knowledge is required which would make such collaboration difficult in the first place)

With the choice of language made, we can look more seriously into the structure of base classes where code could be shared between the different peril models. The base class would use a template method pattern that executes methods for:

- Loading exposure
- Disaggregation
- Running the Event Loop
- Calculating Ground up loss
- Calculating loss sigma
- Applying policy structures

The idea is that the models would override the model-specific operations; a minimal sketch of such a base class follows.
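This is only an illustration; the method names simply mirror the list above and are not a final design:

using System;

// Hypothetical template-method base class; each peril model overrides the model-specific steps.
public abstract class PerilModelBase
{
    public void Run()
    {
        LoadExposure();
        Disaggregate();
        RunEventLoop();
        CalculateGroundUpLoss();
        CalculateLossSigma();
        ApplyPolicyStructures();
    }

    protected abstract void LoadExposure();
    protected abstract void Disaggregate();
    protected abstract void RunEventLoop();
    protected abstract void CalculateGroundUpLoss();
    protected abstract void CalculateLossSigma();
    protected abstract void ApplyPolicyStructures();
}

// A concrete model only supplies the model-specific operations.
public class EarthquakeModel : PerilModelBase
{
    protected override void LoadExposure() { /* read the common exposure format */ }
    protected override void Disaggregate() { /* locator -> lat/long distribution */ }
    protected override void RunEventLoop() { /* iterate over the event set */ }
    protected override void CalculateGroundUpLoss() { }
    protected override void CalculateLossSigma() { }
    protected override void ApplyPolicyStructures() { }
}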

For the loading of exposure to work we need to use one exposure format.

Disaggregation, the conversion of locators into a distribution of lat/long point values according to distributions like population, industry exposure, commercial and residential distribution etc., seems to be a fairly independent operation. But the models make choices such as vulnerability, soil type etc. based on geographic location, and it is much easier to derive this information from a locator than to determine in which location a lat/long point lies. Although building such a database sounds like a simple task, there are implications for how we understand the model output. Basically it is important to have a thorough understanding of disaggregation in order to understand the model results and to make an underwriting decision that makes sense.
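A toy sketch of the core of that conversion, assuming a hypothetical WeightedPoint type: the locator's total value is simply split over the lat/long points in proportion to their weights (population, industry exposure, etc.):

using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one lat/long point of the disaggregation distribution.
public class WeightedPoint
{
    public double Latitude;
    public double Longitude;
    public double Weight;
}

public static class Disaggregation
{
    // Splits a locator's exposure value over the points in proportion to their weights.
    public static IEnumerable<KeyValuePair<WeightedPoint, double>> Disaggregate(
        double locatorValue, IList<WeightedPoint> distribution)
    {
        double totalWeight = distribution.Sum(p => p.Weight);
        foreach (WeightedPoint point in distribution)
        {
            yield return new KeyValuePair<WeightedPoint, double>(
                point, locatorValue * point.Weight / totalWeight);
        }
    }
}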

Looking at how we want to host the models on a server farm

We are looking into service providers that would allow us to rent virtual machines on a seasonal basis. It is not clear whether we will find a provider that is willing to do this at an appropriate price.

Our computational needs are not the same as those of some of the reinsurance companies that rely completely on vendor models. Since the license fee for HPC with RMS is significant, we don't see an immediate advantage in using HPC for RMS. The existing scheduler for the common platform is working well, therefore there is no pressure from this side.

There is value in using a scheduler that has an industry following and that will be commonly used in our industry. There could also be value in the ability to scale out on VM Roles in the cloud. From a pure number-crunching point of view the first priority should be to make the number-crunching modules as efficient as possible by optimizing algorithms, choice of compiler etc.

We had a discussion about distributed computing. Algorithms such as finite element analysis need to pass an entire interface between iterations and will therefore cause a lot of MPI messages to be passed. In this case it is really important that the compute nodes are very well connected. In the case of cat modeling we only pass some totals between iterations, so the MPI messages are very small and it might not be so essential that the compute nodes are very well connected. In the case of windstorm models the algorithm is disk I/O bound because the bottleneck is reading the storm footprint files, in which case it would make sense to partition jobs based on storm footprints.

My first steps in Azure Cloud Computing

Here's a log of my first experiments in Azure. To start off with it's good to have a plan of what to learn. This is what I came up with:

1. Development Environment
-> Now works on PDC laptop
2. Storage Model
- Blobs
- Tables
- Cache
3. How to initialize data in the cloud
4. How to extract data from the cloud
5. Simple UI
6. Data Access layer
7. Worker Roles
8. Cache
9. Test Driven Development

Setting up the development environment has some hurdles. Firstly, it does not seem to work on Windows XP and it requires IIS7. Here's a list of steps:

1. Open VS2010
2. Select the Cloud project
3. Click on Download Windows Azure Tools
Windows Azure Tools for Microsoft Visual Studio 1.2 (June 2010)
4. Click Download
5. Run
6. Run

This time it worked well. When you run an Azure application in debug mode you need to start the AppFabric service on your PC. To do this there is a component installed in the Azure SDK section of the start menu.

I had no problems starting the Development AppFabric but could not start the development storage.

To get around this I ran c:\Program Files\Windows Azure SDK\v1.2\bin\devstore\dsservice.exe which produced an error:

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that the SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified)

I started the SQL Server service and tried again, and I was able to run my hello world Azure application.

The next thing I wanted to do was see how easy it would be to deploy this to Azure. To do this I did the following steps:

1. Right mouse click the cloud project and select Publish
2. Create a certificate and install it on the PC
3. Export the certificate to a file
4. Import the certificate into Azure
5. Create the storage and service accounts
6. This takes a long time

Then I got the warning:

WARNING: This deployment has at least one role with only one instance. We recommend that you deploy at least two instances per role to ensure high availability in case one of the instances becomes unavailable. Doing so also enables coverage of the Windows Azure Compute SLA, which guarantees 99.95% uptime. For more information please visit this link.

To correct this:
1. Go to WebRoleTable under Roles in the TableCloadService subproject
2. Increase the instance count from 1 to 2

This got rid of the error message but the package could not be deployed to staging.

Some hello world experiments later I got a message from Microsoft:


Dear Nigel Findlater,

This e-mail notification is to inform you about your usage of the Windows Azure Platform. According to our records, your subscription has exceeded 125% of the server usage hours included in your offer for the current billing period. Standard charges apply for all hours that exceed the hours included in the subscription.

The usage values to date within the current billing period are listed below.

: 42.000000

If this value is not unexpected, no action is required on your part. If the usage value is unexpected, log in to the Windows Azure Dev portal to review the services you are currently using, and make the necessary changes to those services to bring your usage back to a nominal value.

You can log in to the Microsoft Online Services customer portal at https://mocp.microsoftonline.com at any time to view your subscription's usage hours. Click here for detailed instructions on how to read and interpret the usage information on your invoices.

Do not reply to this e-mail; this mailbox is not monitored. If you need customer support, contact a Customer Partner Care representative by clicking the following
8<--

Name | Resource | Consumed | Included | Billable | Rate | Amount
Windows Azure Compute | Compute hours | 43.000000 | 25.000000 | 18.000000 | 0.132000 | CHF 2.38
Windows Azure Storage | Storage transactions (in 10,000s) | 0.027800 | 1.000000 | 0.000000 | 0.011000 | CHF 0.00

This translates to about 2.50 CHF, which won't break the bank. I am surprised that a couple of hello world applications deployed to staging could run up more than 40 hours of compute time. To be sure that the time did not build up any more I deleted the application and services. Next I am going to take a much closer look at http://msdn.microsoft.com/en-us/windowsazure/ff796218.aspx and order a book.

Thursday 11 November 2010

Notes from PDC2010

High Performance Computing (HPC Server)
There is an industry move in reinsurance towards HPC Server for modeling. RMS and AIR are planning to use HPC Server as a platform for scheduling number-crunching nat cat models.
The news is that the HPC Server Scheduler will support Azure computing nodes in the Cloud. This is very interesting for BUCat because for a large portion of the year the modeling machines are not under heavy load. The advantage of having cloud nodes is that we could scale out during the renewal seasons and scale back when we don’t need the computational power.
Since there is currently no fast interconnect between cloud nodes you must be careful when using MPI, because the performance depends on the slowest connection. To get around this there are small, medium and large HPC nodes that can ensure that processing takes place on the same box. I was talking to the MS consultant who will set up HPC for RMS. He confirmed that Intel C with MPI would be a better choice of compiler than just MS C.

These Azure nodes are called “Virtual Server Role” and are virtual servers that can be RDPed into. Data can be transferred in a blob, or you could host the database on site and use OData WCF services.

I had a discussion about how SQL Azure could be used to process EDM databases. SQL Azure cannot attach mdf database files, therefore it is unlikely that we will use it to process EDMs. SQL Azure has some concepts that differ from SQL Server: for example the idea is to have elastic provisioning of databases; no VMs, no servers; a pay-as-you-go business model and zero physical administration; linear scaling through database independence; and fanning out expensive computation. For more information go to http://player.microsoftpdc.com/Session/1b08b109-c959-4470-961b-ebe8840eeb84

In the keynote there was a presentation called “Pixar's RenderMan using Microsoft Azure” by Chris Ford, where Pixar demonstrated the software they use for rendering their films. Previously they needed a lot of hardware to render the films. Azure allows scalability such that additional hardware can be added or removed within minutes.

HPC and Azure have 2 different perspectives on scalability. The Azure team looks at scalability as the ability to expand horizontally in an unlimited manner. The HPC team sees that certain algorithms, especially MPI-based ones, really require strong connectivity between the nodes. To achieve this a compromise has been made where new nodes can be small (1 core), medium (4 cores) or large (8 cores). The Azure team doesn't seem to see the need for fast network connectivity between nodes, which unfortunately really limits the scalability of algorithms that use MPI. The scenario of scaling out compute is quite common, therefore we can expect some movement on this front.

Microsoft offers to test your application on their datacenter. They can host up to 8 customers at a time and have access to the product teams if things need tweaking or products need new features.

OData (Open Data Protocol)

OData is a layer that sits on top of EF. The idea is to produce a RESTful web interface to the database, thus making it open to all platforms. OData will include structures for relational data and a method for filtering data. According to Alex John it is not possible to expose IQueryable functions through WCF. I need to double-check this, but Alex is the lead on OData.

For more information on OData click on:




The idea behind OData is to be able to embed a query within the URL in a RESTful way. For example, filtering, ordering and paging can be expressed directly in the URL through query options such as $filter, $orderby and $top appended to the resource path.





Entity Framework

I had lunch with the presenter of Code First with EF. This was interesting because we discussed the various methods of building applications. There was agreement that building the database model first is a valid approach, because the lifetime of a data model exceeds that of an application. There will be some improvements made to validation; there was a discussion about whether this should be done within the EDM or in the POCO template. The idea behind the T4 template was that this would somehow make the interface to OData easier. Currently I do this with partial classes and an additional layer over EF; they said that this will improve soon. We also discussed the problems of updating the EDM after updating the database, and there are discussions around how this will improve. Soon there will be some additional features that add heuristics to the POCO class generator; the idea is that you can set up patterns for how to deal with table and field naming like tbl_tableName.
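As an illustration of the partial-class approach mentioned above, validation attributes can be attached to generated (database-first) POCOs through a metadata "buddy" class without touching the generated code. A minimal sketch with a hypothetical User entity:

using System.ComponentModel.DataAnnotations;

// The generated half of the entity lives in the POCO template output; this partial
// class only attaches validation metadata to it.
[MetadataType(typeof(UserMetadata))]
public partial class User
{
}

internal class UserMetadata
{
    [Required]
    [StringLength(5, MinimumLength = 3)]
    public string Name { get; set; }
}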

For more information on Code First with EF goto


The above presentation included a demonstration of the use of DataAnnotations with EF. Here is a small extract of what this makes a POCO look like:

public class User
{
    public User()
    {
        Chirps = new List<Chirp>();
    }
    public int Id { get; set; }

    [StringLength(5, MinimumLength = 3)]
    public string Name { get; set; }
    // public ICollection<Chirp> Chirps { get; set; }
    public virtual ICollection<Chirp> Chirps { get; set; } // virtual enables lazy loading
}

public class Chirp
{
    public int Id { get; set; }
    [StringLength(255, MinimumLength = 1)]
    public string Message { get; set; }
    public DateTime When { get; set; }
    public virtual User User { get; set; }

    [StoreIgnore]
    public int ChirpActivity
    {
        get
        {
            // Chirps made by this user within the last day
            return (from c in User.Chirps
                    where c.When > DateTime.Now.AddDays(-1)
                    select c).Count();
        }
    }
}


Feature Voting: http://ef.mswish.net


Features (yet to come)

· Enums

· Spatial

· Alternate keys

· TVF support

· Migrations and deployment

· Performance & scalability

· Designer improvements

· LINQ improvements




Cloud Computing Workshop

I took part in a cloud computing workshop where participants shared their experiences in cloud computing. The main issues reported in this group were:

· Not knowing exactly on which server and hard disk the data is located. Some auditors need to know this because of the risk that a technician could take data by just removing a hard disk

· Making sure that data stays within the country.

· Configuration control: it is not acceptable to have the operating system update itself without proper testing (configuration management). This can be avoided by using virtual machines, because then the configuration of the machines is not maintained by Microsoft.

· Support

· Unclear pricing model

· Brand name

Many startup companies use this because it means practically no infrastructure needs to be maintained in house.

Later in the workshop there was some concern about the unlimited cost model; in other words there is currently no way to cap the monthly price. There was some discussion about how this could be achieved. My suggestion was to have a burn-down chart, but it is open as to which criteria could be used to slow down the burn rate. There was also some discussion of what criteria could be used to create new worker instances.

Microsoft announced Visual Studio Team Foundation Server on Windows Azure

A trial can be accessed from:


Security can be handled using

· Windows Live ID

· Google

· Yahoo

· CorporateID

Behind the scenes

· Job Agent as a Worker Role

· Team Build as a VM Role

· SQL Azure for tabular data, stored procedures etc

· Blob storage for files and attachments

Microsoft also announced Windows Azure Marketplace, formerly known as Microsoft Codename “Dallas”. This includes a DataMarket, currently with 35 content partners and 60 data offerings including ESRI etc. The idea is to be able to easily discover and explore datasets.

https://datamarket.azure.com


For more information on Building, Deploying and Managing Windows Azure Applications

Wednesday 3 November 2010

An example of paired programming in making a deployment utility

When I wrote this I was sitting in an airplane on my way to the PDC and I could not sleep, so it was time to write on my blog. Last week in Paris I had a great experience doing paired programming. Here is what we did.

We started off from some Use cases that described things that we want to automate when we make our deployment of the modeling platform. For example:

  • Asynchronous update of storm footprint files
  • Synchronous build of server
  • Synchronous deleting of databases
  • Etc

Based on these use cases we set to work on the sequence diagrams of what we want the deployment tool to do. We came to the notion of a task that would have inputs, results and an execute method. Then we came to the notion of an asynchronous task that would inherit from the task and would have the additional property of a list of targets. We then started to think about the components that would make up the system. Keeping simplicity in mind, or in other words treating simplicity as a feature, we decided that the deployment mechanism should not be distributed unless we found no alternative. So we came up with a TaskDispatcher that would process a series of tasks, and a scheduler that could do higher-level operations like scheduling tasks to run.

So with this groundwork done we set about building the interfaces that would be needed as input to the tasks, and we came up with:

namespace PartnerRe.Deployment.Contracts
{
    public interface IParameters
    {
        string Description { get; set; }
        string SourcePath { get; set; }
        string DestinationPath { get; set; }
        bool FailOnFirstError { get; set; }
        TimeSpan TaskTimeOut { get; set; }
    }
}

We compared this interface with the list of Use cases and found that with some interpretation from the custom task implementations this would be sufficient for inputs required by the Tasks.

Next we looked at the outputs and came up with:

namespace PartnerRe.Deployment.Contracts
{
    public interface IExecutionResults
    {
        bool Success { get; set; }
        string Message { get; set; }
        string LogFilePath { get; set; }
    }
}

Again we considered each use case in turn and came to the conclusion that this would be enough.

Next we thought about how to set up the task structures. We decided to use abstract classes for the Tasks; in this way we have a class structure that is easier to extend. Starting with the synchronous TaskBase we came up with:

namespace PartnerRe.Deployment.Contracts
{
    public abstract class TaskBase : PartnerRe.Deployment.Contracts.ITaskBase
    {
        public TaskEnumeration Task = TaskEnumeration.None;
        public TaskType TaskType = TaskType.Synchronous;
        public IParameters Parameters;
        public IExecutionResults ExecutionResults;
        public abstract void TaskOperation();
        public void Execute()
        {
            StartTime = DateTime.Now;
            TaskOperation();
            EndTime = DateTime.Now;
        }
        public DateTime StartTime { get; private set; }
        public DateTime EndTime { get; private set; }
    }
}

The implementation of the ITaskBase interface is a little overkill. We decided we needed something to describe whether the task is synchronous or asynchronous, as well as an enumeration to identify what the task is. The Execute function implements a template method pattern where the programmer must implement the TaskOperation method.
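The supporting contracts (TaskEnumeration, TaskType and ITaskBase) are not shown here; a minimal sketch of what they could look like, reconstructed from how they are used in the code above and below (the real definitions may differ):

using System;

namespace PartnerRe.Deployment.Contracts
{
    // Identifies the concrete task; a new value is added each time a new task is implemented.
    public enum TaskEnumeration
    {
        None = 0,
        SynchronizeDirectory
    }

    // Distinguishes tasks that run once from tasks that fan out over a list of targets.
    public enum TaskType
    {
        Synchronous,
        Asynchrounous // spelling kept consistent with the dispatcher code below
    }

    public interface ITaskBase
    {
        void Execute();
        DateTime StartTime { get; }
        DateTime EndTime { get; }
    }
}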

The asynchronous task looks like:

namespace PartnerRe.Deployment.Contracts
{
    public abstract class AsynchronousTaskBase : TaskBase
    {
        public IList<string> Targets;
        public string CurrentTarget { get; set; }
        public AsynchronousTaskBase()
            : base()
        {
            this.TaskType = Contracts.TaskType.Asynchrounous;
        } 
    }
}

Notice we have the additional list of targets and the CurrentTarget property. We thought for a while about whether we would rather implement a list of tasks. We decided that the above was simpler, because all we are interested in is a list of results, not a list of tasks that also include the input parameters and execution methods. Our idea is to set up one task with placeholders for the list of targets.

Next we wanted to implement the task dispatcher, and we came up with the following:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using PartnerRe.Deployment.Contracts;
using System.Threading.Tasks;
using System.Threading;

namespace PartnerRe.Deployment
{
    public class TaskDispatcher
    {
        private List<object> taskDaisyChain;
        private List<string> targets;

        public TaskDispatcher()
        {
            taskDaisyChain = new List<object>();
        }
        public void AddTask<T>(T task)
        {
            this.taskDaisyChain.Add(task);
        }
        public void ExecuteTasks()
        {
            for (int i = 0; i < taskDaisyChain.Count; i++)
            {
                TaskBase currentTask = this.taskDaisyChain[i] as TaskBase;

                //object task = this.TaskDaisyChain[i];
                if (currentTask.TaskType == TaskType.Synchronous)
                {
                    currentTask.Execute();
                }
                if (currentTask.TaskType == TaskType.Asynchrounous)
                {
                    if (targets == null)
                    {
                        throw new ArgumentException("Asynchronous tasks needs a list of targets to run on");
                    }
                    if (targets.Count == 0)
                    {
                        throw new ArgumentException("Asynchronous tasks needs a list of targets to run on");
                    }

                    AsynchronousTaskBase taskAsynchronous = this.taskDaisyChain[i] as AsynchronousTaskBase;
                    taskAsynchronous.Targets = targets;

                    IExecutionResults[] executionResults = new IExecutionResults[targets.Count];

                    Parallel.For(0, targets.Count, x =>
                    {
                        TaskFactory taskFactory = new TaskFactory();
                        AsynchronousTaskBase taskParallel = taskFactory.CreateTask(taskAsynchronous.Task) as AsynchronousTaskBase;
                        taskParallel.Parameters = taskAsynchronous.Parameters;
                        taskParallel.CurrentTarget = taskAsynchronous.Targets[x];
                        taskParallel.Targets = taskAsynchronous.Targets;
                        taskParallel.Execute();
                        lock (executionResults)
                        {
                            executionResults[x] = taskParallel.ExecutionResults;
                        } 
                    }
                    );

                    taskAsynchronous.ExecutionResults.Message = "";
                    taskAsynchronous.ExecutionResults.Success = true;
                    for (int j = 0; j < targets.Count; j++)
                    {
                        taskAsynchronous.ExecutionResults.Message += executionResults[j].Message;
                        if (!executionResults[j].Success)
                        {
                            taskAsynchronous.ExecutionResults.Success = false;
                        }
                    } 
                }
            } 
        }
        public void AddListOfTargets(List<string> Targets)
        {
            this.targets = Targets;
        } 
    }
}

We decided that we would need a queue of tasks, implemented above as a daisy chain. There is some scope for refactoring; for example we have implemented another list of targets here. But the design principles are sound. Also, the lock in the parallel for is probably overkill, but it is a shared variable, and in this case the time to execute the tasks is far greater than the time lost in locking the results. The syntax around the parallel for was a little different; we did not find many examples of how to implement an iterator.
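As an aside, Parallel.ForEach gives an iterator-style alternative to the indexed Parallel.For used above; a minimal, self-contained sketch with placeholder work (not our actual task code):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelIterationSketch
{
    public static IList<string> RunOnTargets(IEnumerable<string> targets)
    {
        // ConcurrentBag is thread-safe, so no explicit lock is needed around the Add.
        ConcurrentBag<string> results = new ConcurrentBag<string>();
        Parallel.ForEach(targets, target =>
        {
            results.Add("ran on " + target); // placeholder for taskParallel.Execute()
        });
        return results.ToArray();
    }
}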

We implemented a one-way file synchronization task that looks like:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using PartnerRe.Deployment.Contracts;
using System.IO;
using System.Diagnostics;
using System.ComponentModel;
using System.ComponentModel.Composition;

namespace PartnerRe.Deployment
{
    [Export(typeof(TaskBase))]
    public class SynchronizeDirectoryTask : AsynchronousTaskBase
    {

        private string WorkingDirectory = "C:\\Program Files\\Windows Resource Kits\\Tools";

        public SynchronizeDirectoryTask() : base()
        {
            this.Task = TaskEnumeration.SynchronizeDirectory;
            this.Parameters = new Parameters();
            this.ExecutionResults = new ExecutionResults();
        }

        public override void TaskOperation()
        {
            if (this.Task == TaskEnumeration.None)
            {
                throw new Exception("The programmer forgot to set the Task enumeration in the Task constructor");
            }
//… Lots more tests with ArgumentExceptions

            //C:\Program Files\Windows Resource Kits\Tools\Robocopy.exe
            // robocopy Source Destination *.* /XO

            this.Parameters.DestinationPath = this.Parameters.DestinationPath.Replace("<SERVER>", this.CurrentTarget);

            ProcessStartInfo pCopy = new ProcessStartInfo();
            pCopy.WorkingDirectory = WorkingDirectory;
            pCopy.FileName = "Robocopy.exe";
            pCopy.UseShellExecute = false;
            pCopy.RedirectStandardOutput = true;
            pCopy.RedirectStandardError = true;
            pCopy.Arguments = this.Parameters.SourcePath + " "+this.Parameters.DestinationPath+" *.* /XO";
            Process proc = Process.Start(pCopy);

            string output = proc.StandardOutput.ReadToEnd();
            proc.WaitForExit();

            string error = proc.StandardError.ReadToEnd();
            proc.WaitForExit();

            this.ExecutionResults.Message = output;
            this.ExecutionResults.LogFilePath = "\\\\" + this.CurrentTarget + "\\c$\\MyLog";
            this.ExecutionResults.Success = true;

        }
    }
}

Here we wanted to program as little as possible, so we took Robocopy and redirected the standard output. We also implemented a MEF Export. The corresponding object factory looks like:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using PartnerRe.Deployment.Contracts;
using System.ComponentModel;
using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;

namespace PartnerRe.Deployment
{
    public class TaskFactory
    {
        public TaskFactory()
        {
            DirectoryCatalog directoryCatalog = new DirectoryCatalog(@".");
            CompositionBatch batch = new CompositionBatch();
            batch.AddPart(this);
            CompositionContainer container = new CompositionContainer(directoryCatalog);
            container.Compose(batch);
        }

        [ImportMany(typeof(TaskBase))]
        private IEnumerable<TaskBase> Tasks;

        public TaskBase CreateTask(TaskEnumeration TaskType)
        {
            foreach(TaskBase t in Tasks)
            {
                if (t.Task == TaskType)
                {
                    switch (t.TaskType)
                    {
                        case Contracts.TaskType.Asynchrounous:
                            {
                                return t as AsynchronousTaskBase;
                            }
                        case Contracts.TaskType.Synchronous:
                            {
                                return t as TaskBase;
                            }
                    }
                }
            }
            throw new Exception("This task has not yet been implemented in the TaskFactory");
        } 
    }
}

The idea is that when we add new tasks we want them to be imported automatically into our application with as little manual programming as possible. In this case we only need to add to the TaskEnumeration each time we add a new task.

We developed using Test Driven Development; this was used as a way to design the API. In the end our test looked like:

[TestMethod]
public void SynchronizeDirectory_UnsynchronizedFiles_TargetFilesSyncronized()
{
    // Arrange

    DestinationDir = "\\\\<SERVER>\\D$\\SOURCE\\PartnerRe.Deployment\\TestFiles\\Source";

    TaskDispatcher taskDispatcher = new TaskDispatcher();
    SynchronizeDirectoryTask synchronizeDirectoryTask = new SynchronizeDirectoryTask();
    synchronizeDirectoryTask.Parameters.DestinationPath = DestinationDir;
    synchronizeDirectoryTask.Parameters.SourcePath = SourceDir;

    taskDispatcher.AddTask<TaskBase>(synchronizeDirectoryTask);
    List<string> Targets = new List<string>();

    Targets.Add("CHZUPRELR886W9X");
    //Targets.Add("Server2");
    taskDispatcher.AddListOfTargets(Targets);

    // Act
    taskDispatcher.ExecuteTasks();

    //// Assert
    Assert.IsTrue(synchronizeDirectoryTask.ExecutionResults.Success);

}

As can be seen the test is not complete, but the method is sound.

In this case my experience with paired programming showed it to be an efficient method of programming. It has the advantage that it integrates at least 2 people directly into the decision-making process of building and refactoring an architecture while sharing knowledge. The only time I found paired programming not to be appropriate is when the programming approach is unclear, for example when intensive googling is needed to resolve a small detail.

Wednesday 22 September 2010

Making SQL query statements act on the same database across multiple servers

Here's a useful way to apply queries across a number of SQL Server 2008 servers.

1. Open Microsoft SQL Server Management Studio
2. Under Database Engine select Central Management Servers
   (It is on the "Registered Servers" tab next to the "Object Explorer" tab)
3. Right Mouse click "Central Management Servers" and select Register Central Management Server...
4. Choose the Server that you want to add in the "New Server Registration dialog"
   This has the effect of adding a node directly under "Central Management Server". This is used to host the Server groups
5. Right Mouse click on the new node and select "New Server Group..."
6. Give a group name
7. Right mouse click on the new group and select "New Server Registration..."
8. Select the servers that you want to be part of the group

This completes the setup.

To create queries that act across the registered servers
1. Right mouse click on a server group
2. "New Query"

When you execute the query you will receive the results with an additional "Server Name" column giving the result of that query on that server

Setting up Transaction log shipping

We have a high-performance physical SQL Server that we want to back up via NetApp shares. To get this to work we set up a second virtual SQL Server that uses NetApp shares, which are backed up using the SnapManager tool.

Data is sent to the virtual SQL Server using transaction log shipping. Here is how to set up transaction log shipping:

1. Create a backup of the database making sure that:
   - Recovery model = Full
   - Backup type = Full
2. Restore this backup onto the virtual sql server choosing the second option:
   - Leave the database non-operational, and do not roll back transactions. Additional transaction logs can be restored (RESTORE WITH NORECOVERY)
3. After the restore is complete there is a green upward-pointing arrow on the database on the virtual server and the database has (Restoring...) after it.
4. Go to the physical server, right mouse click the database and select Properties
5. In the select page choose "Transaction Log Shipping"
6. Tick the check box "Enable this as a primary database in a log shipping configuration"
7. Click the button Backup Settings
   This will open "Transaction Log Backup Settings" dialog
8. In "Network path to backup folder" give the path on the physical server where the transaction logs are to be found, e.g.
   \\Myserver\LOGSHIPPING$
   In the field "If the backup folder is located on the primary server, type the path to the folder"
   G:\LOGSHIPPING
   OK
9. In the Secondary server instances and databases press the Add button
10. Press the connect button and select the secondary database
11. In "Destination folder for copied files"
    G:\LOGSHIPPING
12. OK
13. Check that the backup and restore jobs created on both servers run successfully

Friday 11 June 2010

Some refactoring using EF4 POCO with generics

Here is a story about refactoring a switch statement using polymorphism, and some repetitive code using generics, to produce DRY code.

We had a smell in our EarthquakeSettingsLegacy class: it contained a switch statement that needed some refactoring to use a polymorphic approach.

   private void GetVulnerability(ModelRun modelRun)
        {

            ModelType modelType;
            modelType = (ModelType)modelRun.ModelId;

            List<Ccp.Entities.VulnerabilityAggregate> vulnerabilityAggregates = databaseContext
                .Query<VulnerabilityAggregate>()
                .Where(v => v.ModelRunId.HasValue && v.ModelRunId.Value.Equals(modelRun.Id)).ToList();

            switch (modelType)
            {
                case ModelType.EarthquakeCanada:
                    GetCanadaVulnerability(modelRun, vulnerabilityAggregates);

                    break;
                case ModelType.EarthquakeMexico:
                    GetMexicoVulnerability(modelRun, vulnerabilityAggregates);
                    break;

                case ModelType.EarthquakeAfricaIndia:
                    GetAfricaToIndiaVulnerability(modelRun, vulnerabilityAggregates);
                    break;

                case ModelType.EarthquakeJapan:
                    GetJapanVulnerability(modelRun, vulnerabilityAggregates);
                    break;

                case ModelType.EarthquakeSouthAmerica:
                    GetSouthAmericaVulnerability(modelRun, vulnerabilityAggregates);
                    break;

                default:
                    return;
            }
        }

The first step was to make the following method more generic. The challenge is that we are using auto-generated POCO classes; in other words, we cannot simply make the classes inherit from a base class that we could use as a compile-time type with generics. A further complication is that we need a compile-time type as opposed to a run-time type, because generics are placeholders for types that are resolved at compile time.

        private void GetSouthAmericaVulnerability(ModelRun modelRun, List<Ccp.Entities.VulnerabilityAggregate> vulnerabilityAggregates)
        {
            List<Ccp.Entities.Earthquake.SouthAmericaVulnerabilityCatalogue> vaList = vulnerabilityAggregates
                .Select<Ccp.Entities.VulnerabilityAggregate, Ccp.Entities.Earthquake.SouthAmericaVulnerabilityCatalogue>(
                va => EntityMapper.Map<Ccp.Entities.VulnerabilityAggregate, Ccp.Entities.Earthquake.SouthAmericaVulnerabilityCatalogue>(va))
                .ToList();

            //Earthquake handles vulnerability functions as strings whereas Tropical Cyclone handles them as integers. Therefore we need
            //some special code to map these together.

            var result = from av in this.databaseContext.GetAll<VulnerabilityAggregate>().Where(z => z.ModelRunId.Value.Equals(modelRun.Id))
                         join vf in this.databaseContext.GetAll<VulnerabilityFunction>()
                         on av.VulnerabilityFunctionId equals vf.Id
                         select new { av.VulnerabilityFunctionId, vf.VulnerabilityFunctionString };

            var dic = result.Select(p => new { p.VulnerabilityFunctionId, p.VulnerabilityFunctionString })
                            .Distinct()
                            .AsEnumerable()
                            .ToDictionary(k => k.VulnerabilityFunctionId, v => v.VulnerabilityFunctionString);

            for (int i = 0; i < vulnerabilityAggregates.Count; i++)
            {
                string vulnerability;
                int? vulnerabilityFunctionId = vulnerabilityAggregates[i].VulnerabilityFunctionId;
                if (dic.TryGetValue(vulnerabilityFunctionId.Value, out vulnerability))
                {
                    List<string> vulnerabilityCoefficients = vulnerability.Split(' ').ToList();

                    vaList[i].A = Convert.ToDouble(vulnerabilityCoefficients[0]);
                    vaList[i].B = Convert.ToDouble(vulnerabilityCoefficients[1]);
                    vaList[i].C = Convert.ToDouble(vulnerabilityCoefficients[2]);
                    vaList[i].VulnerabilityFunction = vulnerability;
                }
            }

            if (vulnerabilityAggregates.Count > 0)
            {
                using (this.databaseContextEarthquake.BeginTransaction())
                {
                    this.databaseContextEarthquake.DeleteAll<Ccp.Entities.Earthquake.SouthAmericaVulnerabilityCatalogue>();
                    this.databaseContextEarthquake.Add(vaList.ToArray());
                    this.databaseContextEarthquake.Save();

                    this.databaseContextEarthquake.CommitTransaction();
                }
            }
        }

We translated the above into the following generic method. To make the POCO DAL entities work, TDestination needs the class constraint (a reference type) and must implement the IVulnerabilityParameters interface:

        public void GetVulnerability<TDestination>(ModelRun modelRun, List<Ccp.Entities.VulnerabilityAggregate> vulnerabilityAggregates)
            where TDestination : class, IVulnerabilityParameters
        {
            List<TDestination> vaList = vulnerabilityAggregates
                .Select<Ccp.Entities.VulnerabilityAggregate, TDestination>(
                va => EntityMapper.Map<Ccp.Entities.VulnerabilityAggregate, TDestination>(va))
                .ToList();

            //Earthquake handles vulnerability functions as strings whereas Tropical Cyclone handles them as integers. Therefore we need
            //some special code to map these together.

            var result = from av in this.databaseContext.GetAll<VulnerabilityAggregate>().Where(z => z.ModelRunId.Value.Equals(modelRun.Id))
                         join vf in this.databaseContext.GetAll<VulnerabilityFunction>()
                         on av.VulnerabilityFunctionId equals vf.Id
                         select new { av.VulnerabilityFunctionId, vf.VulnerabilityFunctionString };

            var dic = result.Select(p => new { p.VulnerabilityFunctionId, p.VulnerabilityFunctionString })
                            .Distinct()
                            .AsEnumerable()
                            .ToDictionary(k => k.VulnerabilityFunctionId, v => v.VulnerabilityFunctionString);

            for (int i = 0; i < vulnerabilityAggregates.Count; i++)
            {
                string vulnerability;
                int? vulnerabilityFunctionId = vulnerabilityAggregates[i].VulnerabilityFunctionId;
                if (dic.TryGetValue(vulnerabilityFunctionId.Value, out vulnerability))
                {
                    List<string> vulnerabilityCoefficients = vulnerability.Split(' ').ToList();

                    vaList[i].A = Convert.ToDouble(vulnerabilityCoefficients[0]);
                    vaList[i].B = Convert.ToDouble(vulnerabilityCoefficients[1]);
                    vaList[i].C = Convert.ToDouble(vulnerabilityCoefficients[2]);
                    vaList[i].VulnerabilityFunction = vulnerability;
                }
            }

            if (vulnerabilityAggregates.Count > 0)
            {
                using (this.databaseContextEarthquake.BeginTransaction())
                {
                    this.databaseContextEarthquake.DeleteAll<TDestination>();
                    this.databaseContextEarthquake.Add(vaList.ToArray());
                    this.databaseContextEarthquake.Save();

                    this.databaseContextEarthquake.CommitTransaction();
                }
            }
        }

The interface looks like:
namespace Ccp.Entities.Earthquake
{
    /// <summary>
    /// All the earthquake vulnerability catalogues need to implement this interface for the settings.
    /// </summary>
    public interface IVulnerabilityParameters
    {
        string VulnerabilityFunction { get; set; }
        Nullable<double> A { get; set; }
        Nullable<double> B { get; set; }
        Nullable<double> C { get; set; }
    }
}

Next we created a partial class for each entity that we need to process.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Ccp.Entities.Earthquake
{
    public partial class SouthAmericaVulnerabilityCatalogue : IVulnerabilityParameters
    {
    }
}

This looks a little strange because the class body is empty, but the properties that satisfy IVulnerabilityParameters are already generated in the other half of the partial class in the POCO DAL; this partial class only adds the interface declaration. With the interface and partial classes in place, the original switch statement can delegate each case to the generic method, as sketched below.
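Here is a minimal sketch of what that dispatch could look like; only the South America case is shown, and the remaining model types would follow the same pattern with their own catalogue entities:

        private void GetVulnerability(ModelRun modelRun)
        {
            ModelType modelType = (ModelType)modelRun.ModelId;

            // Load the aggregates once and pass them to the generic method
            List<Ccp.Entities.VulnerabilityAggregate> vulnerabilityAggregates = databaseContext
                .Query<VulnerabilityAggregate>()
                .Where(v => v.ModelRunId.HasValue && v.ModelRunId.Value.Equals(modelRun.Id))
                .ToList();

            switch (modelType)
            {
                case ModelType.EarthquakeSouthAmerica:
                    // The compile-time type argument satisfies the class + IVulnerabilityParameters constraints
                    GetVulnerability<Ccp.Entities.Earthquake.SouthAmericaVulnerabilityCatalogue>(modelRun, vulnerabilityAggregates);
                    break;

                // ... one case per remaining ModelType, each using its own catalogue entity

                default:
                    return;
            }
        }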

We can continue by refactoring the four LINQ statements into one using AsEnumerable. The difference between LINQ and SQL is that the where clause is not used in the same way: in SQL the where clause would explicitly join a foreign key to a primary key, whereas in LINQ we navigate to the related entity via the relationship v.VulnerabilityFunction. In that case you need to reference a property of the child entity; in the example below this is VulnerabilityFunctionString (as opposed to the Id). The emitted SQL includes an outer join, but since the relationship is 0..1-to-many the outer join is not so bad.

            List<Ccp.Entities.TropicalCyclone.VulnerabilityAggregate> vaList =
                databaseContext
                    .Query<VulnerabilityAggregate>()
                    .Where(v => v.ModelRunId.HasValue && v.ModelRunId.Value.Equals(modelRun.Id) && v.VulnerabilityFunction.VulnerabilityFunctionString != "")
                    .AsEnumerable()
                    .Select<Ccp.Entities.VulnerabilityAggregate, Ccp.Entities.TropicalCyclone.VulnerabilityAggregate>(
                        va =>
                            {
                                var vae = EntityMapper.Map<Ccp.Entities.VulnerabilityAggregate, Ccp.Entities.TropicalCyclone.VulnerabilityAggregate>(va);
                                vae.VfId = Convert.ToInt32(va.VulnerabilityFunction.VulnerabilityFunctionString);
                                return vae;
                            })
                    .ToList();

Here's a book recommendation for writing LINQ queries:
http://www.amazon.de/LINQ-Pocket-Reference-OReilly/dp/0596519249/ref=sr_1_4?ie=UTF8&s=books-intl-de&qid=1276066598&sr=1-4