Thursday 11 November 2010

Notes from PDC2010

High Performance Computing (HPC Server)
There is an industry move in re-insurance towards HPC Server for modeling. RMS and AIR are planning to use HPC Server as a platform for scheduling number-crunching nat cat models.
The news is that the HPC Server Scheduler will support Azure computing nodes in the Cloud. This is very interesting for BUCat because for a large portion of the year the modeling machines are not under heavy load. The advantage of having cloud nodes is that we could scale out during the renewal seasons and scale back when we don’t need the computational power.
Since there is currently no fast interconnect between cloud nodes, you must be careful when using MPI because performance depends on the slowest connection. To work around this there are small, medium and large HPC node sizes that can ensure processing takes place on the same box. I spoke to the Microsoft consultant who will set up HPC for RMS; he confirmed that Intel C with MPI would be a better choice of compiler than MS C alone.
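As a reminder to myself, a minimal sketch of what submitting a model run through the HPC Server scheduler API looks like (the head node name, command line and core counts are placeholders of mine, not anything shown at PDC):

using Microsoft.Hpc.Scheduler;

class SubmitModelRun
{
    static void Main()
    {
        // Connect to the cluster head node (name is a placeholder)
        IScheduler scheduler = new Scheduler();
        scheduler.Connect("hpc-headnode");

        ISchedulerJob job = scheduler.CreateJob();
        job.Name = "Nat cat model run";

        // One MPI task; pinning min = max cores keeps the work on comparably sized nodes
        ISchedulerTask task = job.CreateTask();
        task.CommandLine = @"mpiexec catmodel.exe portfolio.edm"; // placeholder command
        task.MinimumNumberOfCores = 8;
        task.MaximumNumberOfCores = 8;
        job.AddTask(task);

        // Null credentials typically fall back to cached or prompted credentials
        scheduler.SubmitJob(job, null, null);
    }
}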

These Azure nodes are called "VM Roles" and are virtual servers that can be RDPed into. Data can be transferred as a blob, or you could host the database on site and use OData WCF services.
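A rough sketch of the blob option, assuming the Microsoft.WindowsAzure.StorageClient library of that era (connection string, container and file names are made up):

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class UploadToBlob
{
    static void Main()
    {
        // Connection string, container and file names are placeholders
        var account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...");
        var blobClient = account.CreateCloudBlobClient();

        var container = blobClient.GetContainerReference("model-input");
        container.CreateIfNotExist();

        // Push the input data up as a single blob for the cloud nodes to pick up
        var blob = container.GetBlockBlobReference("portfolio-2010.dat");
        blob.UploadFile(@"C:\data\portfolio-2010.dat");
    }
}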

I had a discussion about how SQL Azure could be used to process EDM databases. SQL Azure cannot attach MDF database files, therefore it is unlikely that we will use it to process EDMs. SQL Azure has some concepts that differ from SQL Server: the idea is elastic provisioning of databases with no VMs, no servers, a pay-as-you-go business model, zero physical administration, linear scaling through database independence, and fanning out expensive computation. For more information go to http://player.microsoftpdc.com/Session/1b08b109-c959-4470-961b-ebe8840eeb84
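A small sketch of what the fan-out idea could look like in practice, assuming the data has already been split across several SQL Azure databases (connection strings, table and column names are illustrative only):

using System;
using System.Data.SqlClient;
using System.Linq;

class FanOutQuery
{
    static void Main()
    {
        // One connection string per database shard (placeholders)
        string[] shards =
        {
            "Server=tcp:myserver.database.windows.net;Database=Losses1;User ID=user@myserver;Password=***;Encrypt=True;",
            "Server=tcp:myserver.database.windows.net;Database=Losses2;User ID=user@myserver;Password=***;Encrypt=True;"
        };

        // Run the expensive aggregation on each database in parallel,
        // then combine the partial results locally
        decimal total = shards.AsParallel().Select(cs =>
        {
            using (var conn = new SqlConnection(cs))
            using (var cmd = new SqlCommand("SELECT SUM(GrossLoss) FROM LossResults", conn))
            {
                conn.Open();
                return Convert.ToDecimal(cmd.ExecuteScalar());
            }
        }).Sum();

        Console.WriteLine(total);
    }
}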

In the keynote there was a presentation called "Pixar's RenderMan using Microsoft Azure" by Chris Ford, where Pixar demonstrated the software they use for rendering their films. Previously they needed a lot of hardware to render the films; Azure allows scalability such that additional hardware can be added or removed within minutes.

HPC and Azure have two different perspectives on scalability. The Azure team sees scalability as the ability to expand horizontally in an unlimited manner. The HPC team sees that certain algorithms, especially those using MPI, really require strong connectivity between the nodes. To achieve this a compromise has been made where new nodes can be small (1 core), medium (4 cores) or large (8 cores). The Azure team does not seem to see the need for fast network connectivity between nodes, which unfortunately really limits the scalability of algorithms that use MPI. The scenario of scaling compute is quite common, therefore we can expect some movement on this front.

Microsoft offers to test your application in their datacenter. They can host up to 8 customers at a time and have access to the product teams if things need tweaking or products need new features.

OData (Open Data Protocol)

OData is a layer that sits on top of EF. The idea is to produce a RESTful web interface to the database, thus making it open to all platforms. OData will include structures for relational data and a method for filtering data. According to Alex John it is not possible to expose IQueryable functions through WCF. I need to double-check this, but Alex is the lead on OData.
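To illustrate the "layer on top of EF" point, this is roughly what exposing an EF model as an OData feed looked like with WCF Data Services at the time (the EdmEntities context name is a placeholder for whatever the EF model is called):

using System.Data.Services;
using System.Data.Services.Common;

// Exposes every entity set of the EF model as a read-only OData feed
public class EdmDataService : DataService<EdmEntities>
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
    }
}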

For more information on OData click on:




The idea behind OData is to be able to embed a query within the URL in a RESTful way. For example:
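As an illustration (this is the public Northwind sample service, not anything from the session), query options such as $filter, $orderby and $top are encoded directly in the URL:

http://services.odata.org/Northwind/Northwind.svc/Customers?$filter=Country eq 'Germany'&$orderby=CompanyName&$top=10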



OAuth is described as "An open protocol to allow secure API authorization in a simple and standard method from desktop and web applications."


Entity Framework

I had lunch with the presenter of Code First with EF. This was interesting because we discussed the various methods of building applications. There was agreement that building the database model first is a valid approach, because the lifetime of a data model exceeds that of an application. There will be some improvements made to validation; there was a discussion about whether this should be done within the EDM or the POCO template. The idea behind the T4 template was that it would make the interface to OData easier. Currently I do this with partial classes and an additional layer over EF; they said that this will improve soon. We also discussed the problems of updating the EDM after updating the database, and there are discussions around how this will improve. Soon there will be some additional features that add heuristics to the POCO class generator; the idea is that you can set up patterns for how to deal with table and field naming conventions such as tbl_TableName.

For more information on Code First with EF go to


The above presentation also included a demonstration of using DataAnnotations with EF. Here is a small extract showing what this makes a POCO look like:

public class User
{
    public User()
    {
        Chirps = new List<Chirp>();
    }

    public int Id { get; set; }

    [StringLength(5, MinimumLength = 3)]
    public string Name { get; set; }

    // public ICollection<Chirp> Chirps { get; set; }
    public virtual ICollection<Chirp> Chirps { get; set; } // virtual enables lazy loading
}

public class Chirp
{
    public int Id { get; set; }

    [StringLength(255, MinimumLength = 1)]
    public string Message { get; set; }

    public DateTime When { get; set; }

    public virtual User User { get; set; }

    [StoreIgnore] // computed property, not persisted to the store
    public int ChirpActivity
    {
        get
        {
            // Count the owning user's chirps from the last 24 hours
            return (from c in User.Chirps
                    where c.When > DateTime.Now.AddDays(-1)
                    select c).Count();
        }
    }
}
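For completeness, a minimal sketch of how such POCOs are typically registered with a Code First context (the ChirperContext name is mine, not from the demo):

using System.Data.Entity; // EF Code First (still a CTP at the time of PDC2010)

public class ChirperContext : DbContext
{
    // Each DbSet becomes a table; keys and the User/Chirp relationship
    // are inferred by convention from the Id and navigation properties
    public DbSet<User> Users { get; set; }
    public DbSet<Chirp> Chirps { get; set; }
}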


Feature Voting: http://ef.mswish.net


Features (yet to come)

· Enums

· Spatial

· Alternate keys

· TVF support

· Migrations and deployment

· Performance & scalability

· Designer improvements

· LINQ improvements




Cloud Computing Workshop

I took part in a cloud computing workshop where participants shared their experiences with cloud computing. The main issues reported in this group were:

· Not knowing exactly which server and hard disk the data is located on. Some auditors need to know this because of the attack vector where a technician could take data by simply removing the hard disk.

· Making sure that data stays within the country.

· Configuration control: it is not acceptable to have the operating system update itself without proper testing (configuration management). This can be avoided by using virtual machines, because then the configuration of the machines is not maintained by Microsoft.

· Support

· Unclear pricing model

· Brand name

Many startup companies use this because practically no infrastructure needs to be maintained in house.

Later in the workshop there was some concern about the unlimited cost model; in other words, there is currently no way to cap the monthly price. There was some discussion in the workshop about how this could be achieved. My suggestion was to have a burn-down chart, but it is open as to which criteria could be used to slow down the burn rate. There was also some discussion of what criteria could be used to create new worker instances, as sketched below.
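One possible criterion, sketched here purely as my own assumption (queue name and threshold are made up), is the backlog in a work-item queue; the actual change in instance count would then go through the Windows Azure Service Management API:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class ScaleDecision
{
    static void Main()
    {
        // Placeholders: connection string, queue name and threshold
        var account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...");
        var queue = account.CreateCloudQueueClient().GetQueueReference("work-items");

        int backlog = queue.RetrieveApproximateMessageCount();
        bool needMoreWorkers = backlog > 1000;

        // Acting on this decision (adding or removing instances) is done
        // separately, e.g. via the Windows Azure Service Management API
        System.Console.WriteLine(needMoreWorkers);
    }
}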

Microsoft announced Visual Studio Team Foundation Server on Windows Azure

A trial can be accessed from:


Security can be handled using

· Windows Live ID

· Google

· Yahoo

· CorporateID

Behind the scenes

· Job Agent as a Worker Role

· Team Build as a VM Role

· SQL Azure for tabular data, stored procedures etc

· Blob storage for files and attachments

Microsoft also announced Windows Azure Marketplace, formerly known as Microsoft Codename "Dallas". This includes a Data Market, currently with 35 content partners and 60 data offerings, including ESRI. The idea is to easily discover and explore datasets.

https://datamarket.azure.com


For more information on Building, Deploying and Managing Windows Azure Applications