Saturday, 19 March 2016

Using R to visualize data with GoogleEarth

Recently I had to analyse a file with 3.5 million rows of data. The reason why this becomes challenging is that this file cannot be opened in Excel or even notepad The first challenge was to check that we had the right ammount of data. We new the we should have 3455385. Before writing some C# utilities or importing it into SQL I decided to try R.

R is a statistics language and framework that is very popular in accedemia and over the last 7 years has increasingly being used. At the Build last year it was announced that R was to become a part of SQL2016. Also R is being used in Machine Learning in Azure. So it looks worth a closer look

So the first task to determine the total number of rows. This can be done as below

> frame2 <- read.csv("C:\\DAT203xLabfiles\\Jba_FL_USD.txt", header = TRUE, stringsAsFactors = FALSE)
> nrow(frame2)
[1] 3455385

We load a DataFrame using read.csv and assign this to a variable frame2. This takes a couble of minutes. A Dataframe is like a table, internally it is a list of vectors. Each vector can have different data types, the stringsAsFactors=FALSE stops the default behaviour of reading strings as a Factor (enumeration in C#). The total number of rows can then be found using nrow.

To see the data we can use the head function to get the first 5 rows of data.

> head(frame2,n=5)

If frame2 was a sql table In SQL this would be select top 5 * from frame2

In the case of 3.5 Million rows the file was still small enough to fit into memory. Apparently there are ways of handling truely huge files aka big data. In anycase my data is small enough to be handled in memory so I don't have to wory about that yet.

Now we would like to query this data to learn more about it's data quality

We can see how many different countries are involved by

> country<-unique(frame2$Country)
> country
[1] "ENG" "WAL" "SCO" "NIR" "XXX"

The corresponding sql would be select distinct country from frame2

So there are some lines with unknown countries.

> dataUK <- frame2[which(frame2$Country!="XXX"),]
> nrow(dataUK)
[1] 3419335

> nrow(frame2) - nrow(dataUK)
[1] 36050

Next we notice that not all data has Lat Long data

> dataUK.LatLong <- dataUK[which(dataUK$Lat!=0),]
> nrow(dataUK.LatLong)
[1] 3303483
>
> nrow(dataUK) - nrow(dataUK.LatLong)
[1] 115852

> sumtiv <- sum(dataUK.LatLong$TIV1)
> sumtiv
[1] 1.675874e+12

There is a lot more data analysis to be done, looking at distributions of sums insured for different lines of business. Also some data has been geocoded which can also be analysed at various levels

I would like to visualize this data. Here is a way of doing this using GoogleEarth kmz files. Here's my first attempt at visualizing 3.5M locations. Not suprizingly it did not work

>colors<-colorRampPalette(c("white",'lavenderblush',"pink",'yellow','orange',"red", "brown",'forestgreen','green','cyan', "blue", "black"))(12)
>SIplot<-data.frame(latitude=dataUK.LatLong$Lat, longitude=dataUK.LatLong$Long, SI=dataUK.LatLong$TIV1)
>coordinates(SIplot) <- ~longitude + latitude
>crs.geo <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84")
>proj4string(SIplot) <- crs.geo
#plotKML(SIplot, colour_scale=colors)
>plotKML(SIplot, colour_scale=colors, altitude=50, points_names="")

Plotting the first variable on the list
KML file opened for writing...
Writing to KML...
Error: C stack usage 19923204 is too close to the limit
In addition: Warning messages:
Lost warning messages

It makes more sence to look at the top 100 locations. To do this we need to first sort the data: http://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-columns

> sorted <- SIplot[ with(SIplot, order(-SIplot$SI),SIplot$Logitude,SIplot$Latitude),]


colors<-colorRampPalette(c("white",'lavenderblush',"pink",'yellow','orange',"red", "brown",'forestgreen','green','cyan', "blue", "black"))(12)

#SIplot<-data.frame(latitude=dataUK.LatLong$Lat, longitude=dataUK.LatLong$Long, SI=dataUK.LatLong$TIV1)
coordinates(sorted100) <- ~longitude + latitude
crs.geo <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84")
proj4string(sorted100) <- crs.geo
#plotKML(sorted100, colour_scale=colors)
#SIplot_raster<-vect2rast(SIplot, cell.size=0.02)
plotKML(sorted100, colour_scale=colors, altitude=50, points_names="")

R has a lot of powerful libraries that plot shape files. I am impressed by how much can be done with only a few lines of code with a free library

Wednesday, 26 March 2014

WebApi2

Introduction

This is a guide how to implement Web API2. Here is a list of requirements:

·         We shall use the all the HTTP Verbs

·         When an error or a warning occurs the details shall be a part of the response

·         The implementation shall keep the server threads available for serving other requests

·         The API should be callable across multiple domains

·         Shall use Windows Authentication
Installation of WebAPI2

Here are the steps that I took in implementing WebApi from VS2012 update 4

1.    Create an MVC4 project

2.    Select the WebApi template
This results in a WebApi with EF5 and some other old package

3.    Right mouse click on the project and select Manage NuGet Packages

4.    Upgrade all does work, but be aware that MVC5 uses and older JQuery library for compatibility reasons with some of the MVC unobtrusive validation features depend on an older version of JQuery

5.    Add references to MyApplication projects
Pre.MyApplication.Contracts
Pre.MyApplication.DAL

6.    Change the web.config to add the MyApplication specific configuration sections

       <connectionStrings>

              <add name="MYAPPLICATION" connectionString="metadata=res://*/MyApplicationModel.csdl|res://*/MyApplicationModel.ssdl|res://*/MyApplicationModel.msl;provider=System.Data.SqlClient;provider connection string="data source=servername;initial catalog=MYAPPLICATION;integrated security=True;MultipleActiveResultSets=True;App=EntityFramework"" providerName="System.Data.EntityClient" />

       </connectionStrings>

       <appSettings>

              <add key="webpages:Version" value="2.0.0.0" />

              <add key="webpages:Enabled" value="false" />

              <add key="PreserveLoginUrl" value="true" />

              <add key="ClientValidationEnabled" value="true" />

              <add key="UnobtrusiveJavaScriptEnabled" value="true" />

 

              <add key="aspnet:UseTaskFriendlySynchronizationContext" value="true" />

         

          <add key="CoreDataServiceRestURL" value="http://EWP-S-PRE/EWPDataServiceRest/" />

          <add key="ApprovalWorkflowURL" value="http://localhost:18245/Approval.xamlx" />

          <add key="MyServiceWorkflowURL" value="http://localhost:18245/MyService.xamlx" />

   </appSettings>

Finally check that the sample webapi works by opening chrome going to the following url http://localhost:55528/api/values (note that the port may be different)

When you start the default website you will get the error The object has not yet been initialized. Ensure that HttpConfiguration.EnsureInitialized() is called in the application's startup code after all other initialization code.

To fix this goto Global.asax.cs and in the Application_Start:

            //WebApiConfig.Register(GlobalConfiguration.Configuration);

            GlobalConfiguration.Configure(WebApiConfig.Register);

The next message you will get is Inheritance security rules violated by type: 'System.Web.Mvc.MvcWebRazorHostFactory'. Derived types must either match the security accessibility of the base type or be less accessible.

To fix this change the version from Version=4.0.0.0 to Version5.0.0.0

 

  <system.web.webPages.razor>

    <host factoryType="System.Web.Mvc.MvcWebRazorHostFactory, System.Web.Mvc, Version=5.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35" />

Automated Api documentation

This works via the NuGet Package Microsoft.AspNet.WebApi.HelpPage. To get this to work do the following steps:

1.    Goto the build properties of the WebAPI project

2.    In the Output section of the Build tab enable XML documentation file

3.    Change the path from
bin\Pre.MyApplication.WebApi.XML to App_Data\Pre.MyApplication.WebApi.XML

4.    Goto Area/HelpPage/App_Start/HelpPageConfig.cs

5.    Uncomment the following line

// Uncomment the following to use the documentation from XML documentation file.

config.SetDocumentationProvider(new XmlDocumentationProvider(

HttpContext.Current.Server.MapPath("~/App_Data/Pre.MyApplication.WebApi.XML")));

 

To use the automated documentation features you need to comment the WebApi controller. Here is an example:

    ///
    /// MyApplication Users
    ///
    public class UserController : ApiController
    {

        ///
        /// Gets specified User
        ///
        /// User Id
        /// The user
        public UserModel Get(int id)
        {

            ...
Implementing return code
 

Use IHttpActionResult as in the following example

 

        [ResponseType(typeof(UserModel))]
        public IHttpActionResult Get(int id)
        {
            UserModel result = new UserModel()
            {
                ErrorLevel = "Warning",
                ErrorMessage = "Not Implemented yet!"
            };
            User u = new User() { Id = 1, ADUserName = "nfindlater",            
                                  DefaultRoutingGroupId = 1 };

            result.Data = u;
            var helper = new UrlHelper(Request);
            result.Url = helper.Link("User", new { userId = 1 });

            return Ok (result);
        }

Note that you need to specify [ResponseType(typeof(UserModel))] because otherwise the sample code will be missing the API documentation

Routing and Versioning
There are a number of way to version REST APIs. The following closely matches the existing WCF versioning policy. To set this up you need to do the following:

1.    WebApiConfig.cs file you need to add

             config.MapHttpAttributeRoutes();

 

            //This matches the url to use a controller
            config.Routes.MapHttpRoute(
                name: "DefaultApi",
                routeTemplate: "api/{controller}/{id}",
                defaults: new { id = RouteParameter.Optional }
            );


Here is an example of how to make use of these attributes 

namespace Pre.MyApplication.WebApi.Controllers

{
    ///
    /// MyApplication Users
    ///
    [RoutePrefix("2013/12/05")]    
    public class UserController : BaseApiController
    {

        ///
        /// Gets specified User
        ///
        /// User Id
        /// The user
        [Route("user/{id}")]
        [ResponseType(typeof(UserModel))]
        public IHttpActionResult Get(int id)
        {
..

        }

 


In the above example all methods in the UserController are prefixed with 2013/12/05. This provides a versioning of this controller that enables multiple versions of the User WebApi to be instanciable in the same project.
 
The Route attributes also provide much greater flexibility in the design of the REST API
 

Route
Description
[Route("patients/{id}/medications")]                        
Easier to get to "nested" resources
[Route["patients/{id:int:min(1)}"]                              
Constraints
[Route("patients/{id:int}")][Route["patients/{name:alpha}"] 
Multiple routes peraction
 
[Route("patients/{id?}")]                                           
Can apply to entire controller
 
[RoutePrefix("api/patients")]                                    
Prefix for all other routes in controller
 
[RouteArea("Admin")]                                              
Attributes for MVC Areas
 


CORS Cross Origin Resource Sharing


When a browser makes a request to a server an “Origin : url” header is sent. If the origin is different to that of the server and the server allows CORS the header returned is “Access-Control-Allowed-Origin: OrginalUrl”. Then the browser will allow the REST call to be made even though the JavaScript is hosted on a different domain

To get this to work the following needs to be done:

1.    Right Mouse click on the WebApi project and select Managed NuGet Packages

2.    Select Microsoft ASP.NET WebApi 2 Cross-Origin Support

3.    In the WebApiConfig.cs

using System.Web.Http.Cors;

..


        public static void Register(HttpConfiguration config)
            var cors = new EnableCorsAttribute("*", "*", "GET,POST");
            config.EnableCors(cors);

 

 

An alternative approach is to use the EnableCorsAttribute in the controller

    [EnableCors("*", "*", "GET,POST")]
    [RoutePrefix("2013/12/05")]    
    public class UserController : BaseApiController
    {

Authentication
Set the Website to not allow anonymous users but to use Windows Authentication. The server authenticates the user according to some configuration in the machine.config file
If the user is not authenticated then the browser prompts the user for a user name and password and authenticates the user.

To take advantage of this you can decorate your controller with attributes. Here is an example:

 

    [AllowAnonymous]
    public class HomeController : Controller
    {
        [Authorize(Roles="MyRole",Users="MyUser")];
        public ActionResult Index()
        {
            return View();
        }
    }

[This needs some further work because we would want to make a configuration file that would specify resource, action and ADGroup

JSON CamelCasing

In C# there is a PascalCasing convention. In JavaScript there is a camelCasing convention. To enable C# POCO classes that have Pascal Cased variables to be used in JavaScript with camel Casing add the following to the WebApiConfig.cs

var jsonFormatter = Config.Formatters.OfType<JsonMediaTypeFormatter>().FirstOrDefault();

 

jsonFormatter.SerializerSettings.ContractResolver = new

                            CamelCasePropertyNamesContractResolver();

Adding error information in the REST response

 This can be done by using the following base class::

    public abstract class BaseModel
    {
        public string Url { get; set; }
        public T Data { get; set; }
        public string ErrorLevel { get; set; }
        public string ErrorMessage { get; set; }
    }

 
This can be used in the following way:

namespace Pre.MyApplication.WebApi.Models
{
    public class UserModel : BaseModel<User>
    {
    }
}

        [Route("user/{id}")]
        [ResponseType(typeof(UserModel))]
        public IHttpActionResult Get(int id)
        {
            UserModel result = new UserModel()
            {
                ErrorLevel = "Warning",
                ErrorMessage = "Not Implemented yet!"
            };

            User u = new User() { Id = 1, ADUserName = "nfindlater", DefaultRoutingGroupId = 1 };
            result.Data = u;

            var helper = new UrlHelper(Request);
            result.Url = helper.Request.RequestUri.AbsoluteUri;

            return Ok (result);
        }

dsd
 UrlHelper is used to determine the url of the what ever server is used to host this REST service. The resulting response looks like:

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Expires: -1
Server: Microsoft-IIS/8.0
X-AspNet-Version: 4.0.30319
X-SourceFiles: =?UTF-8?B?RDpcTmV3IGZvbGRlclxXb3JrZmxvdyAyLjBcU291cmNlXFRBT3BzXFByZS5UQU9wc1xQcmUuVEFPcHMuV2ViQXBpXDIwMTNcMTJcMDVcVXNlclwx?=
X-Powered-By: ASP.NET
Date: Wed, 27 Nov 2013 10:11:31 GMT
Content-Length: 191

 
   ”data":
  {"id":1,
   "adUserName":"nfindlater",
   "name":null
   ,"defaultRoutingGroupId":1},
   "errorLevel":"Warning"
   ,"errorMessage":"Not Implemented yet!"}


The url is a useful piece of information. In this case it identifies where the information comes from. In a list this can be useful. The use of this style of url can be extended in paginated lists. For performance reasons you never want to dump the entire contents of a table. Instead you page through contents. This can be made using code that looks like the following:

//var totalCount = result.Count();
//var totalPages = Math.Ceiling((double)totalCount / PAGE_SIZE);

//var helper = new UrlHelper(Request);
//var prevUrl = page > 0 ? helper.Link("User", new { page = page - 1 }) : "";
//var nextUrl = page < totalPages - 1 ? helper.Link("User", new { page = page + 1 }) : "";

//var results = result.Skip(PAGE_SIZE * page)
//                       .Take(PAGE_SIZE)
//                       .ToList()
//                       .Select(f => TheModelFactory.Create(f));

Implementation of Asynchronous calls

There are a limited number of threads available for processing REST calls. To keep these free the REST functions can be asynchronous, here is an example of how this can be done:

        ///
        /// Gets specified User
        ///
        /// User Id
        /// The user
        [Route("user/{id}")]
        [ResponseType(typeof(UserModel))]
        public async Task<IHttpActionResult> Get(int id)
        {
            //http://localhost:55528/2013/12/05/user
            UserModel result = new UserModel()
            {
                ErrorLevel = "Warning",
                ErrorMessage = "Not Implemented yet!"
            };

            // Long running task goes here
            Task<User> t = new Task<User>(() => {
                    User u = new User() { Id = 1,
                                         ADUserName = "nfindlater",
                                         DefaultRoutingGroupId = 1 };
                    return u;
                    });
            t.Start();
            await t;
            result.Data = t.Result;
            var helper = new UrlHelper(Request);
            result.Url = helper.Request.RequestUri.AbsoluteUri;
            return Ok(result);
        }

Making Json the default return type
To make the default Json we can add the following to WebApiConfig.cs

// Remove XML so that all responses are made in Json
// This is important for the Chrome browser
var appXmlType = config.Formatters.XmlFormatter.SupportedMediaTypes
                   .FirstOrDefault(t => t.MediaType == "application/xml");
config.Formatters.XmlFormatter.SupportedMediaTypes.Remove(appXmlType);

WebAPI caching

When we have static data such as the UsersByTaskAndRoutingGroup we want to avoid a time consuming database hit. This can be done by caching the data on the WebAPI side. To do this do the following

1.  Install the nugget package CacheCow
>Install-Package CacheCow.Server
    2.  Decorate methods with [HttpCacheControlPolicy(true, 120)]

Here’s an example

 
        [HttpCacheControlPolicy(true, 120)]
        [Route("UsersByTaskAndRoutingGroup/{name}")]
        [ResponseType(typeof(UsersByTaskAndRoutingGroupListModel))]
        public async Task<IHttpActionResult> Get(String name)
        {
            ...
            return Ok(result);
        }

This reduced a REST call that took 2s to a call the second time around to 300ms

Exception handling
We will handle exceptions within the WebAPI project. In this way we centralize the error handling functions at a higher level. We decided to use try catch blocks in the web api methods and make the exceptions more readable using a method in the base class:

namespace Pre.MyApplication.WebApi.Controllers
{
    ///
    /// Base class for controllers needing access to the MyApplication database
    ///
    public abstract class BaseApiController : ApiController
    {

...

 

        ///
        /// This is used to help extract important information form exceptions
        ///
        /// The exception
        /// A string containing a summary of usefull debugging information
        public string MakeReadableExceptionMessage(Exception ex)
        {
            if (ex is DbEntityValidationException)
            {
                // These are errors ccoming from EF
                string outerErrors = "";
                string outerDel = "";
                DbEntityValidationException eV = ex as DbEntityValidationException;
 

                foreach (var entity in eV.EntityValidationErrors)
                {
 
                    string innerDel = "";
                    string ErrorMsg = "";
                    foreach (var e in entity.ValidationErrors)
                    {
                        ErrorMsg = e.PropertyName + " " + e.ErrorMessage;
                        outerErrors += innerDel + ErrorMsg;
                        innerDel = ",";
                    }

                    outerErrors += outerDel + ErrorMsg;
                    outerDel = ";";
                }

                return outerErrors;

            }
            else if (ex is DbUpdateException)
            {

                // These are errors coming from the database
                string Message = ex.InnerException.InnerException.ToString();
                return Message;
                //SqlException s = e.InnerException.InnerException as SqlException;
                //if (s != null && s.Number == 2627)
                //{
                //}
            }

            else
            {
                return ex.Message;
            }
        }


This is used in the WebApi in the following way:

        [HttpGet]
        [Route("MyService")]
        [ResponseType(typeof(MyServiceListModel))]
        public async Task<IHttpActionResult> Get()
        {
            //http://localhost:55528/2013/12/05/MyService
            MyServiceListModel result = new MyServiceListModel();
            var helper = new UrlHelper(Request);
            result.Url = helper.Request.RequestUri.AbsoluteUri;
            try
            {

                Task<List<MyService>> t = new Task<List<MyService>>(() =>
                {
                    List<MyService> list = new List<MyService>();
                    //throw new DivideByZeroException();
                    list = this.Db.MyServiceList();
                    return list;
                });
                t.Start();
                await t;
                Task.WaitAll(t);
 
                result.DataList = t.Result;
            }
            catch (Exception ex)
            {
                result.ErrorMessage = MakeReadableExceptionMessage(ex);
                result.ErrorLevel = "Error";
                Request.CreateResponse(HttpStatusCode.BadRequest, result);
            }
            return Ok(result);
        }

This is work in progress. We should log these errors meaning we need mockable loggers. In other words a logger that implements an ILogger interface. We can also handle un handled exceptions either adding the following to the Global.asax.cs file:

AppDomain.CurrentDomain.UnhandledException += (s, e) => {
    // This one hardly get's called at all
    // Your logging logic
};
 

AppDomain.CurrentDomain.FirstChanceException += (s, e) => {

    // This one get called quite a few times when simulating a divide by zero error
    // Your logging logic
};

 


Referencing WF services


Here are the steps needed to include asynchronous calls to the WF services

1.    Right Mouse click on References
2.    Select Add new Service Reference…
3.    Click the Discover button
4.    Select the appropriate xamlx file (for example MyWorkflow.xamlx
5.    Use an appropriate namespace eg MyServiceService
6.    Click the advanced button
7.    Check the Allow generation of asynchronous operations
8.    Select Generate Task based operations
If you get the error

Error 4 Custom tool error: Failed to generate code for the service reference 'MyServiceService'. Please check other error and warning messages for details. D:\New folder\Workflow 2.0\Source\MyApplication\Pre.MyApplication\Pre.MyApplication.WebApi\Service References\MyServiceService\Reference.svcmap 1 1 Pre.MyApplication.WebApi

Then you may need to uncheck the Reuse types in all referenced assemblies from Configure service reference option

Here’s a code snippet how to use services asynchronously:

 [HttpPost]
 [Route("MyServiceNew")]
 [ResponseType(typeof(Pre.MyApplication.WebApi.Models.MyServiceResponse))]
 public async Task<IHttpActionResult> PostMyServiceNew([FromBody]JToken data)
 {
    try
    {
    MyServiceNewRequest value = DeserializeJson<MyServiceNewRequest>(data);
 
    MyServiceService.MyServiceResponse wfResponse = new MyServiceService.MyServiceResponse();
 
     using (MyServiceService.MyServiceWFClient wfClient = new MyServiceService.MyServiceWFClient())
     {
         string url = ConfigurationManager.AppSettings["MyServiceWorkflowURL"].ToString();
 
         wfClient.Endpoint.Address = new EndpointAddress(url);
         wfResponse = await wfClient.MyServiceNewAsync(value);
     }

Deploayment 

Here is how to setup the WebAPI infrastructure

1.    First Browse to the folder that is named after the DNS entry to the web site. For example under D:\WebServices\MYAPPLICATION-S
2.    Create a subdirectory such as MYAPPLICATION_20131205
The rational is that we want to make it possible to run 2 versions of the WebApi at the same time. This would be needed for smooth migration from one version to another. The naming is choose to represent the web service and the version of the web service. The idea is that this can be shared without having to rename the share from the folder name
3.    Right mouse click the newly created folder select properties. Then select Sharing and click the Advanced sharing button
4.    Check the share folder box
5.    Click permissions and give appropriate permissions
6.    This makes it possible to reach \\MYAPPLICATION-s-tst\MYAPPLICATION_20131205
It was necessary to add MYAPPLICATION to the share name because the same Server is used to host other WebApi projects

7.    Open the Internet Information Services (IIS) Manager and browse to MYAPPLICATION-S

8.    Next right mouse click on the newly created subdirectory and Add Application
9.    Change the alias to just have YYYYMMDD The idea is that we don’t want to repeat too much information in the url later used in the WebApi



Setting up Visual studio to deploy WebAPI

First you need to setup a compiler configuration.
1.    Right mouse click on the solution and select configuration manager
2.    Click the Configuration Manager button
3.    Under the Configuration column select combo box against the project that you want to add a configuration to and Select
4.    Create a new configuration with a name like TEST and copy settings from Release and check the Create new solution configurations check box
 
5.    Right mouse click on the Web.config
6.    Add Config Transform
7.    Then you get an extra web.config

Next setting up the publish:

1.    Right mouse click on the WebApi project and select Publish

2.    Click the Select or import a publish profile and select “

3.    Give the profile a name like TEST

4.    Under Publish method select File system and enter the path to where you want to publish to

5.    Click next

6.    Change Configuration to TEST

7.    Click Publish