Saturday 5 September 2009

Some ideas about Optimization and Restructuring

I am looking into restructuring and optimization of a catastrophe modeling platform. Here are some of the ideas that have come to mind.

The Aggregate Cat models consist of a disaggregation step in which the sums insured for each locator are redistributed according to population fraction. Unmanaged mathematical routines then determine the loss ratio for each risk, followed by the application of the treaty structures. We will look into precalculating the loss ratios for each combination of locator, risk and vulnerability function. In this way we can remove the disaggregation and ground-up loss calculation from the productive code, which will significantly improve speed and allow the aggregate models for all perils to run through the same code. The size of the lookup tables depends on the number of vulnerability functions; since the models have varying numbers of them, the hardware will have to be sized for the model with the most vulnerability functions.
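
A minimal sketch of what such a lookup could look like (the type and member names are illustrative, not the platform's actual ones): the precalculated loss ratios are keyed on the (locator, risk, vulnerability function) combination, so the productive code is reduced to a dictionary lookup.

    using System.Collections.Generic;

    // Illustrative key for the precalculated loss ratios: one entry per
    // (locator, risk, vulnerability function) combination. A struct gives
    // value-based equality, so it can be used directly as a dictionary key.
    public struct LossRatioKey
    {
        public int LocatorId;
        public int RiskId;
        public int VulnerabilityFunctionId;
    }

    // The productive code then replaces disaggregation and the ground-up
    // loss calculation with a single lookup into this table.
    public class LossRatioTable
    {
        private readonly Dictionary<LossRatioKey, double> _ratios =
            new Dictionary<LossRatioKey, double>();

        public void Add(int locator, int risk, int vulnerability, double lossRatio)
        {
            _ratios[new LossRatioKey { LocatorId = locator, RiskId = risk, VulnerabilityFunctionId = vulnerability }] = lossRatio;
        }

        public double GetLossRatio(int locator, int risk, int vulnerability)
        {
            return _ratios[new LossRatioKey { LocatorId = locator, RiskId = risk, VulnerabilityFunctionId = vulnerability }];
        }
    }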

The optimization of the detailed models depends on how the loss is calculated for each peril. For example, windstorm calculations are based on storm footprint files, which are grids of information overlaid onto the exposure. In this case we could use the same approach as for the aggregate models, except that we would need to determine from the latitude and longitude which storm footprint grid point to use; this is something the geospatial queries that came with SQL Server 2008 could do. For earthquake there are no grid points, because the damage propagates in concentric circles out from the earthquake's epicentre. We could think of making a pseudo-detailed model, but there are about a thousand vulnerability functions, meaning the database would be big and the time needed to precalculate the loss ratios would be long. So for earthquake we need to optimize the mathematical component that calculates the loss ratio so that it runs under a multi-core, 64-bit OS. We will be considering:
  • Using C# with immutable data (see the sketch after this list)
  • Using F# or a combination of F# and C#
  • Converting the existing C code to run 64-bit and multi-core
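
To illustrate the first option, here is a minimal sketch of the kind of code we have in mind, assuming the Task Parallel Library: the risk data is immutable, so the loss calculation can be spread across cores without locking. The class names and the LossRatio formula below are placeholders for the real routines, not the platform's actual code.

    using System;
    using System.Threading.Tasks;

    // Immutable risk data: once constructed it can be shared between
    // threads without any locking.
    public sealed class Risk
    {
        public readonly double SumInsured;
        public readonly int VulnerabilityFunctionId;

        public Risk(double sumInsured, int vulnerabilityFunctionId)
        {
            SumInsured = sumInsured;
            VulnerabilityFunctionId = vulnerabilityFunctionId;
        }
    }

    public static class EarthquakeLossCalculator
    {
        // Placeholder standing in for the real loss-ratio routine.
        private static double LossRatio(Risk risk, double groundMotion)
        {
            return Math.Min(1.0, groundMotion * 0.01 * risk.VulnerabilityFunctionId);
        }

        // Each output slot is written by exactly one iteration, so the
        // parallel loop needs no shared mutable state.
        public static double[] Losses(Risk[] risks, double groundMotion)
        {
            var losses = new double[risks.Length];
            Parallel.For(0, risks.Length, i =>
            {
                losses[i] = risks[i].SumInsured * LossRatio(risks[i], groundMotion);
            });
            return losses;
        }
    }
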
On the non-science side of the platform we are thinking of replacing the pull-based job scheduling with WCF services that take parameters such as the DLL to run and instructions on where to read and write data. The idea is to call this WCF service asynchronously, which means the client implements an event handler that is called on completion. A Scheduler sends jobs to the WCF services distributed on the modeling machines. This approach improves software distribution, because only the code for a very generic WCF service is deployed on the modeling machines; the DLLs doing the work are distributed at run time from the Scheduler. The Scheduler is a single point of failure, so it would best be placed on the central database server, which is already a single point of failure. The development machine would have the code for the Scheduler as well as for the WCF service that runs the worker DLLs.
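
As a sketch of the generic service on each modeling machine (IJobRunner, JobRequest and JobResult are placeholder names, not the final contract):

    using System.Runtime.Serialization;
    using System.ServiceModel;

    // Generic contract deployed once on every modeling machine; the
    // Scheduler supplies the worker DLL and the data locations at run time.
    [ServiceContract]
    public interface IJobRunner
    {
        [OperationContract]
        JobResult RunJob(JobRequest request);
    }

    [DataContract]
    public class JobRequest
    {
        [DataMember] public byte[] WorkerDll { get; set; }   // assembly pushed by the Scheduler
        [DataMember] public string InputPath { get; set; }   // where to read data
        [DataMember] public string OutputPath { get; set; }  // where to write results
    }

    [DataContract]
    public class JobResult
    {
        [DataMember] public bool Succeeded { get; set; }
        [DataMember] public string Message { get; set; }
    }

When the client proxy is generated with asynchronous operations enabled, it typically exposes a RunJobAsync method together with a RunJobCompleted event, which is the completion handler mentioned above.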

At the moment the server side consists of 140 sub-projects. Instead of decomposing the application into sub-projects we will create a few general projects, using folders within them in place of sub-projects. This has the advantage that the solution compiles faster and the program executes faster. Such a project could consist of:
  • GraphicalUserInterface
  • Utilities (e.g. Zip, log4net)
  • Interfaces
  • DataAccessLayer
  • BusinessObjects (Types)
  • BusinessLogic

Sharing of objects between the client and the server:
1. The server would reference the client projects, as is done now
2. We could host WCF services on IIS 7 on the scheduling server
3. We could load-balance the WCF services if the clients have a heavy computational demand.

We will also consider that Microsoft ships a load-balancing implementation (Network Load Balancing) as part of Windows Server 2008. If we need to load balance then we should use this, because programming your own leaves a lot of things that can go wrong.


Implementation strategy:
My approach would be to progressively transition the live system into the target architecture. Since the live system must keep functioning, every step of the way needs exhaustive tests that guarantee progress is being made. The danger I see in a wholesale rewrite is that testing will never be as thorough as on a real productive system, which means that when the prototype is made live there will be a lot of bugs to fix.
Logging
As we restructure we will build in a configurable logging intensity, so that intensive logging is available when something is being debugged and only minimal logging is produced during normal operation.
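
Since log4net is already on the utilities list, a sketch of what this could look like (the class and message names are only examples): the logging level is set in the configuration file, so the intensity can be changed without recompiling.

    using log4net;

    public class DisaggregationStep
    {
        // log4net is configured from App.config; switching the logger's
        // level between DEBUG and INFO changes the intensity without a
        // recompile.
        private static readonly ILog Log = LogManager.GetLogger(typeof(DisaggregationStep));

        public void Run(int locatorCount)
        {
            // Guard so the detailed message is only built when DEBUG is on.
            if (Log.IsDebugEnabled)
            {
                Log.Debug("Starting disaggregation for " + locatorCount + " locators");
            }

            Log.Info("Disaggregation finished");
        }
    }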

Here is how we are thinking of splitting the tiers of the application:
1. GUI Layer
e.g. has functions like GetLossFileView
2. Business Layer
e.g. LossFileBO as an entity
3. Data access layer, client side
e.g. WCF services using generic functions with the entities defined in the Business Layer
4. Data access layer, server side
e.g. WCF services using generic functions that directly access the database.
The DAL would use a factory built around the typeof construct, which enables IntelliSense through the entity data types.
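
A sketch of the factory idea, using generic type parameters so that the compiler (and therefore IntelliSense) knows the entity type end to end; the repository interface and the registration mechanism are illustrative only.

    using System;
    using System.Collections.Generic;

    // Generic repository contract shared by the client and server DALs.
    public interface IRepository<T> where T : class
    {
        T GetById(int id);
        void Save(T entity);
    }

    // Factory keyed on typeof(T): a WCF-proxy implementation registers
    // itself on the client, a database-backed implementation on the server.
    public static class DalFactory
    {
        private static readonly Dictionary<Type, object> Registrations = new Dictionary<Type, object>();

        public static void Register<T>(IRepository<T> repository) where T : class
        {
            Registrations[typeof(T)] = repository;
        }

        public static IRepository<T> Create<T>() where T : class
        {
            return (IRepository<T>)Registrations[typeof(T)];
        }
    }

    // Usage on the client, with LossFileBO coming from the Business Layer:
    //   IRepository<LossFileBO> lossFiles = DalFactory.Create<LossFileBO>();
    //   LossFileBO current = lossFiles.GetById(42);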

Versioning
1. Using MEF: probably not worthwhile for PRECCP, because it would require very careful source control branching and consequently the maintenance work would increase.
2. Using an interface and built-in database functions: depends on the level at which you want to track versions. E.g. it does not make sense to track the version of events in loss files directly.
3. Just keeping track of versions: this is the simplest method and what we are doing now. Currently, if you ask the underwriters, the only versions that interest them are the current version, the previous version and the next version.