Monday 15 July 2013

Notes from Build 2013 HTML5 vs XAML, Neural Networks and security


I met some people who were very pro HTML5 and see it as the future. I met others who were very pro XAML and see that as the future. After the silent demise of Silverlight it’s a bit difficult to know where the future really lies.

On the HTML5 front I intend to catch up on subjects such as Bootstrap (MVC5), TypeScript, Reactive Binding (MDV ECMA), Shadow DOM, Angular, Bower, CommonJS, Node/Grunt, Handlebars.js...

I heard a rumor that out of 10 projects MS started in HTML5, 7 have been rewritten in XAML. The pragmatic approach of hybrid solutions is the way to go: use HTML5 when it makes sense, but be aware there is a cost associated with its use. WPF is a more elegant solution because it uses OO and properly separates concerns. But when you need customer reach then HTML5 is the way to go; just make sure the target customers have browsers capable of handling HTML5.

A friend of mine will be working together with Xamarin to provide a VS template with MVVM Light for iOS native apps with portable C# libraries.
 
 

Here’s a session that I can highly recommend
 
4-554 Building Big: Lessons Learned from Windows Azure Customers


It’s not likely that I will be writing software that needs to scale in quite the same way as described in this session. The real-life examples were very interesting and definitely worth watching.

Another session that is definitely worth watching is

AZ-18 Securing Windows Store Applications and REST services with Active Directory



The talk was arranged around a story and was very good:

  1. The story started around an isolated corporate network that had users, resources and access control that could be administered easily.
  2. Then along comes an external resource that needs to be accessed by domain users and the administrator looks a little less happy
  3. Next, external users need to access domain resources, which really upsets the administrator
  4. Finally, BYOD devices need to access domain resources (Vittorio then drew a picture of The Scream)

REST OAuth2

  1. The user authenticates at an authorization endpoint
  2. The user receives a code
  3. The user sends this code to a token endpoint and receives an authorization (access) token (see the sketch below)
  4. This token can then be used to access external resources
  5. There is also a refresh token that can be used to obtain a new access token once the current one expires
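
As a rough C# sketch of step 3, exchanging the code for a token (my own sketch; the endpoint URL, client id and redirect URI are placeholders, and exact form field names vary between providers):

using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Exchange the authorization code for an access token at the token endpoint
static async Task<string> ExchangeCodeForTokenAsync(string code)
{
    using (var client = new HttpClient())
    {
        var form = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            { "grant_type", "authorization_code" },
            { "code", code },
            { "client_id", "<client-id>" },               // placeholder
            { "redirect_uri", "https://example.com/cb" }  // placeholder
        });

        HttpResponseMessage response = await client.PostAsync(
            "https://login.example.com/oauth2/token", form); // placeholder endpoint
        response.EnsureSuccessStatusCode();

        // The JSON response contains the access token, its lifetime and a refresh token
        return await response.Content.ReadAsStringAsync();
    }
}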

Windows Azure Active Directory

  • This can be stand-alone or a synchronized part of an on-premises AD directory
  • Supports OAuth2, SAML-P, WS-Federation and has metadata endpoints
  • There is a OneClient preview in the Azure portal that is used to maintain the Azure AD
  • Windows Azure Authentication Library (AAL)
    See the presentation for links on how to use this
     

Essential AAL Usage
  • Authenticate the user to get a token (the authority, resource and client id below are placeholder values; see the presentation for the exact overloads):
    AuthenticationContext authContext = new AuthenticationContext("https://login.windows.net/<tenant>");
    AuthenticationResult result = await authContext.AcquireTokenAsync("<resource>", "<clientId>");
  • Use the token to invoke a REST service
    HttpClient httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", result.AccessToken);
      
Although I don’t have a direct application for neural networks, this talk was really well presented:

2-401 Developing Neural Networks Using Visual Studio

 
Agenda
 
  1. What types of problems does a neural network solve
  2. What exactly is a neural network
  3. How does a neural network actually work
  4. Understanding activation functions
  5. Alternatives to neural networks
  6. Understanding neural networks training
  7. Neural network over-fitting
  8. Developing with Visual Studio
  9. Summary and resources

What types of problems does a neural network solve

Tabular information where you have some inputs (independent variables) used to produce an output (the thing you want to predict). The idea is that you have some training data that is used to fit the internal variables of the neural network, after which you have a system that can predict an output from a given set of inputs.

What exactly is a neural network

The inputs are normalized: Boolean variables are converted to -1 and +1, and enumerations to a set of individual inputs that are set to 0 or 1. These are then used in the input nodes of the neural network. A to-be-determined number of hidden nodes then evaluate a function based on all these inputs to produce a set of output nodes.
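
As a rough illustration of that encoding in C# (my own example; the Boolean, age and three-value colour inputs are invented):

// Hypothetical encoder for one Boolean, one numeric and one three-value enumeration input
static double[] EncodeInputs(bool isMember, double age, int colour /* 0=Red, 1=Green, 2=Blue */)
{
    double[] inputs = new double[5];
    inputs[0] = isMember ? +1.0 : -1.0;   // Boolean -> -1 / +1
    inputs[1] = (age - 50.0) / 50.0;      // numeric value normalized to roughly [-1, +1]
    inputs[2 + colour] = 1.0;             // enumeration -> one 0/1 input per value
    return inputs;
}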

Activation Functions

  • Logistic sigmoid: output between [0,1], y = 1.0/(1.0 + exp(-x))
  • Hyperbolic tangent: output between [-1,+1], y = tanh(x) = (e^x - e^-x)/(e^x + e^-x)
  • Heaviside step: output either 0 or 1, if (x < 0) then y = 0 else y = 1
  • Softmax: outputs between [0,1] that sum to 1.0, y_i = e^(x_i)/Sum_j(e^(x_j))
The ability to customize these functions means that it is often better to write your own neural network
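
A minimal C# sketch of these four functions (my own sketch, not from the talk; the softmax works on the whole vector of output sums):

using System;
using System.Linq;

static double LogisticSigmoid(double x) { return 1.0 / (1.0 + Math.Exp(-x)); }

static double HyperbolicTangent(double x) { return Math.Tanh(x); }

static double HeavisideStep(double x) { return x < 0.0 ? 0.0 : 1.0; }

static double[] Softmax(double[] x)
{
    double max = x.Max();                                       // subtract the max for numerical stability
    double[] exp = x.Select(v => Math.Exp(v - max)).ToArray();
    double sum = exp.Sum();
    return exp.Select(v => v / sum).ToArray();                  // values in [0,1] that sum to 1.0
}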
 

Alternatives to neural networks
  1. Linear regression: y = a·x1 + b·x2 + ...
  2. Logistic regression: y = 1.0/(1.0 + e^-(a·x1 + b·x2 + ... + k))
  3. Naive Bayes: assumes input data are all independent and output is binary
  4. Decision trees
  5. Support vector machines: extremely complex implementation, assumes binary output

Neural networks pros and cons

  • Pro: can model any underlying math equation!
  • Pro: can handle multinomial output without resorting to tricks.
  • Con: moderate complexity, requires lots of training data.
  • Con: must pick the number of hidden nodes, activation functions, input/output encoding, and error definition.
  • Con: must pick training method, training “free parameters,”
    (and over-fitting defense strategy).
     

Training

Back-propagation
Fastest technique.
Does not work with Heaviside activation.
Requires “learning rate” and “momentum” (see the sketch below).
 
Genetic algorithm
Slowest technique.
Generally most effective.
Requires “population size,” “mutation rate,” “max generations,” “selection probability.”
 
Particle swarm optimization
Good compromise.
Requires “number particles,” “max iterations,” “cognitive weight,” “social weight.”
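
To make the back-propagation “learning rate” and “momentum” free parameters concrete, a single weight update looks roughly like this (my own sketch, not the presenter’s code; the names are illustrative):

// prevDelta is the weight change from the previous iteration (the momentum term)
static double UpdateWeight(double weight, double gradient,
                           double learnRate, double momentum, ref double prevDelta)
{
    double delta = (-learnRate * gradient) + (momentum * prevDelta);
    prevDelta = delta;
    return weight + delta;
}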
 
Avoiding Over-fitting
What is it?
Symptom: Model is great on predicting existing data, but fails miserably on new data.
Roulette example: red, red, black, red, red, black, red, red, black, red, red, ??
A serious problem for all classification/prediction techniques, not just neural networks.
 
Five most common techniques
Use lots of training data.
Train-Validate-Test (early stop when the error on the validation set begins to increase; see the sketch below).
K-fold cross validation.
Repeated sub-sampling validation.
Jittering: deliberately adding noise data to make over-fitting impossible.
Quite a few exotic techniques also available (weight penalties, Bayesian learning, etc.).
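
A rough sketch of the train-validate early-stop idea (TrainOneEpoch, ValidationError and the data/parameter names are hypothetical, not a real API):

double bestError = double.MaxValue;
int epochsSinceImprovement = 0;

for (int epoch = 0; epoch < maxEpochs; epoch++)
{
    TrainOneEpoch(network, trainingData);                     // fit weights on the training set
    double error = ValidationError(network, validationData);  // measure error on held-out data

    if (error < bestError)
    {
        bestError = error;
        epochsSinceImprovement = 0;
    }
    else if (++epochsSinceImprovement >= patience)
    {
        break; // validation error has stopped improving: stop before over-fitting sets in
    }
}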
 

Summary

Existing neural network tools are difficult or impossible to integrate into a software system.
Commercial and Open Source API libraries work well for some machine learning tasks but are extremely limited for neural networks.
To develop neural networks using Visual Studio you must understand seven core concepts: feed-forward, activation, data encoding, error, training, free parameters, and over-fitting.
Once the concepts are mastered, implementation with Visual Studio
is not difficult (but not easy either).
 

 

Monday 8 July 2013

Notes from Build 2013 Scaling the real time web with ASP.NET SignalR

ASP.NET and SignalR have some improvements. For example, SignalR is now being used in the Visual Studio development experience, making it possible for edits to be propagated across browsers without a refresh. Here are some notes from Damian Edwards' presentation:
3-502 Scaling the real time web with ASP.NET SignalR
See www.asp.net/signalr for getting started
 
·    Scaling real-time traffic shares many considerations with traditional web traffic eg CPU, bandwidth, memory
·    Application scenarios have huge impact on scaling patterns.
·    The big difference is in concurrency: supporting many long-running idle and active connections vs short requests
·    Different SignalR transports have different overheads
 
General things to watch out for:
Blocking calls, eg blocking I/O
·    Never ever block a Hub method, it jams up pipes
·    Use .NET 4.5 async where possible (see the sketch below)
Sending large messages
·    Memory leaks caused by misunderstanding SignalR object lifetime eg Hub instances
·    Session - don't use it from SignalR. Use Hub state, cookies, browser storage, a database etc. instead
Remember the secret of scale "Have your app do as little as possible. If you do nothing, you can scale infinitely" - Scott Hanselman
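
For example, awaiting an outbound call inside a Hub method instead of blocking on it (my own sketch; the hub, method and URL are made up):

using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNet.SignalR;

public class OrdersHub : Hub
{
    public async Task PlaceOrder(string orderId)
    {
        using (var http = new HttpClient())
        {
            // await keeps the request thread free; calling .Result here would block and jam up the pipes
            string status = await http.GetStringAsync("http://example.com/api/orders/" + orderId);
            Clients.Caller.orderPlaced(orderId, status);
        }
    }
}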
 
SignalR core architecture: Pub/Sub
1. Publisher
Message is serialized and saved to the cache associated with the signal; the topic is marked for delivery
2. Message Cache
3. Worker
Worker is scheduled for signal, selects a waiting subscriber, retrieves message from cache
Worker sends message to client as bytes over transport
4. Client
 
Pattern 1 Server broadcast
Low rate, message to all clients
Low rate broadcast of the same payload to clients
One message bus send maps to many users (fan out)
More clients don't increase bus traffic
eg application wide alerts
 
Pattern 2 Server Push
Low rate, message to unique clients
Low rate broadcast of a unique payload to each client
One message bus send maps to one user (no fan out)
More clients means more message bus traffic
eg Job monitor
 
Pattern 3 User event driven
Broadcast on client actions
One message bus send maps to many users (fan out)
More clients means more message traffic
eg Chat
 
Pattern 4 High frequency real-time
Fixed high rate, unique message
Fixed high rate broadcast from servers and clients (don't go above 25Hz)
One message bus send maps to one user (no fan out)
More clients means more message traffic
eg Gaming
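 
The first two patterns map roughly onto Hub calls like this (my own sketch; the hub and client method names are made up):

using Microsoft.AspNet.SignalR;

public class NotificationsHub : Hub
{
    // Pattern 1, server broadcast: the same payload fanned out to every client
    public void BroadcastAlert(string message)
    {
        Clients.All.showAlert(message);
    }

    // Pattern 2, server push: a unique payload to a single client, no fan out
    public void SendJobStatus(string connectionId, string status)
    {
        Clients.Client(connectionId).jobStatusChanged(status);
    }
}

From outside a Hub the same calls can be made via GlobalHost.ConnectionManager.GetHubContext<NotificationsHub>().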
 
Demo
There are command line utilities for SignalR in Microsoft.AspNet.SignalR.Utils. These include things like
signalR [args]
·    ipc Installs SignalR performance counters
·    upc Uninstalls SignalR performance counters
·    ghp Generates HubProxy JavaScript files for server Hub classes
To generate load use crank.exe /url:http://localhost:29573/TestConnection /Clients:100 /BatchSize:200
For Isolated scenarios in-proc use Stress (Microsoft.AspNet.SignalR.Stress\bin\Debug)
Adjust IIS Settings for concurrency limits
There was something said about Max concurrent requests per CPU
appcmd.exe set config /section:system.webserver/serverRuntime /appConcurrentRequestLimit:100000
 
Scale-out Issues
·    Client transience: How do messages sent from one server get to the other servers in my web farm?
·    Client transience: When does a client disconnect from my app?
·    Client transience: How do I avoid duplicate and missed messages as a client moves from one server to another?
·    Client distribution: What happens if one function on one server is called many more times than others?
 
To solve these issues use SQL Server, Redis or Windows Azure Service Bus as a backplane (available from NuGet). This works well for the server broadcast pattern but is limited for the others because every message goes to every server; therefore, as traffic increases you are limited by how fast any one web server can pull messages off the backplane.
 
Backplanes are much slower than a single server
microsoft.aspnet.signalr.redis
 
public static void Start()
{
    GlobalHost.DependencyResolver.UseRedis("localhost", 6379, "", "build");
    RouteTable.Routes.MapHubs();
}
 
Custom scale-out
·    Common Server
·    Specific Server
·    Filtered message bus
·    Server transition
·    Hybrid
2.0 Scale out improvements
·    Support for pre-serialized messages
·    Support for single serialization when sending multiple messages
Resources
·    github.com/signalr/signalr
·    twitter.com/damianedwards
 

Tuesday 2 July 2013

Notes from Build 2013 Cloud based performance testing

I have just come back from Build 2013 where I covered a lot of subjects. Over the next few weeks I am going to revisit my notes so that I can retain more of what I learned at this conference. So here is my first set of notes, from session 2-346 Cloud powered Load testing with Team Foundation Service. To watch the session follow this url http://channel9.msdn.com/Events/Build/2013/2-346
Ankit Saraf @vauntgarde
Some terms:
  • Performance: How does my application behave
  • Load: How will my application behave in Production
  • Stress: Can my application handle a lot of users
  • Scale/Capacity: How many servers do I need
In VS2013 there is a new project template called "Web Performance and Load Test Project". This project has a "Web test recorder" that records the interactions that a user makes while stripping out extra tags / cookies in the URLs etc. These form tests that can be replayed. The tests can be parameterized using context parameters, eg:
WebServer1=http://demomusicstore.cloudapp.net
{{WebServer1}}/

The recorded sessions can also be used to Generate code.

There is a New Load Test Wizard that has a number of steps to specify a load test. These include
Constant Load specifying the number of users
Step load specifying:
  • Start user count
  • Step duration
  • Step user count
  • Maximum user count

How should the test mix be modeled:
  • Based on the total number of tests
  • Based on the number of virtual users
  • Based on user pace
  • Based on sequential test order

The relative proportions of individual tests in a given load test can be specified.
A warm-up duration and run duration can be specified, or a number of test iterations.
The load test scenarios are presented in a tree view and the results in an HTML format.