Monday, 15 July 2013

Notes from Build 2013 HTML5 vs XAML, Neural Networks and security


I met some people who were very pro-HTML5 and see it as the future. I met others who were very pro-XAML and see that as the future. After the silent demise of Silverlight it’s a bit difficult to know where the future lies.

On the HTML5 front I intend to catch up on subjects such as Bootstrap (MVC5), TypeScript, Reactive Binding (MDV ECMA), Shadow DOM, Angular, Bower, Command.js, Node/Grunt, Handlebars.js...

I heard a rumor that out of 10 projects that MS started in HTML5, 7 have been rewritten in XAML. The pragmatic approach of hybrid solutions is the way to go. Use HTML5 when it makes sense, but be aware that there is a cost associated with its use. WPF is a more elegant solution because it uses OO and properly separates concerns. When you need customer reach, HTML5 is the way to go, but check that the target customers have browsers capable of handling HTML5.

A friend of mine will be working together with Xamarin to provide a VS template with MVVM Light for native iOS apps with portable C# libraries.
 
 

Here’s a session that I can highly recommend
 
4-554 Building Big: Lessons Learned from Windows Azure Customers


It’s not likely that I will be writing software that needs to scale in quite the same way as described in this session. The real-life examples were very interesting and definitely worth watching.

Another session that is definitely worth watching is

AZ-18 Securing Windows Store Applications and REST services with Active Directory



The talk was arranged around a story and was very good:

  1. The story started with an isolated corporate network whose users, resources and access control could be administered easily.
  2. Then along comes an external resource that needs to be accessed by domain users, and the administrator looks a little less happy.
  3. Next, external users need to access domain resources, which really upsets the administrator.
  4. Finally, BYOD devices need to access domain resources (Vittorio then drew the picture of The Scream).

REST OAuth2

  1. The user enters credentials at an authorization endpoint
  2. The user receives an authorization code
  3. The user sends this code to a token endpoint and receives an access token
  4. This token can then be used to access external resources
  5. There is also a refresh token that allows the access token to be renewed, for a limited period of time, without prompting the user again
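
The steps above can be sketched in a few lines. This is only an illustration of the message shapes in an OAuth2 authorization code flow, not what AAL does internally. The endpoints, client id and redirect URI are hypothetical placeholders, and the fifth step's token is assumed to be what OAuth2 calls a refresh token. Nothing is sent over the network.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical values, for illustration only.
AUTHORIZE_ENDPOINT = "https://login.example.com/oauth2/authorize"
TOKEN_ENDPOINT = "https://login.example.com/oauth2/token"
CLIENT_ID = "my-client-id"
REDIRECT_URI = "https://app.example.com/callback"


def build_authorization_url(state):
    """Steps 1-2: URL the user visits to sign in and be handed a code."""
    query = urlencode({
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "state": state,
    })
    return AUTHORIZE_ENDPOINT + "?" + query


def build_token_request(code):
    """Step 3: form body posted to TOKEN_ENDPOINT to redeem the code."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
    })


def bearer_header(access_token):
    """Step 4: header that carries the token to the protected resource."""
    return {"Authorization": "Bearer " + access_token}


def build_refresh_request(refresh_token):
    """Step 5: form body that trades a refresh token for a new access token."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": CLIENT_ID,
    })
```

Keeping the request builders separate from any HTTP client makes each step easy to inspect, for example by printing build_authorization_url("xyz").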

Windows Azure Active Directory

  • This can be standalone or a synchronized part of an on-premises AD directory
  • Supports OAuth2, SAML-P and WS-Federation, and has metadata endpoints
  • There is a OneClient preview in the Azure portal that is used to maintain the Azure AD
  • Windows Azure Authentication Library (AAL)
    See the presentation for links on how to use this
     

Essential AAL Usage
  • Authenticate the user to get a token:
    AuthenticationContext authContext = new AuthenticationContext(...);
    AuthenticationResult result = await authContext.AcquireTokenAsync(...);
  • Use the token to invoke a REST service:
    HttpClient httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", result.AccessToken);
      
Although I don’t have a direct application for neural networks, this talk was really well presented:

2-401 Developing Neural Networks Using Visual Studio

 
Agenda
 
  1. What types of problems does a neural network solve
  2. What exactly is a neural network
  3. How does a neural network actually work
  4. Understanding activation functions
  5. Alternatives to neural networks
  6. Understanding neural network training
  7. Neural network over-fitting
  8. Developing with Visual Studio
  9. Summary and resources

What types of problems does a neural network solve

Tabular information where you have some inputs (independent variables) that are used to produce an output (the thing you want to predict). The idea is that you have some training data that is used to fit the internal variables of the neural network, after which you have a system that can predict an output from a given set of inputs.

What exactly is a neural network

The inputs are normalized: Boolean variables are converted to -1 and +1, and enumerations to a set of individual inputs that are each set to 0 or 1. These are then fed to the input nodes of the neural network. A number of hidden nodes (how many is yet to be determined) then each evaluate a function of all these inputs, and the results are combined to produce the values of the output nodes.
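
A minimal sketch of that encoding and of a single feed-forward pass, under some assumptions that are mine rather than the talk's: numeric inputs are normalized by subtracting a mean and dividing by a standard deviation, the hidden nodes use tanh, and the output nodes are plain weighted sums. The example record and all names are hypothetical.

```python
import math


def encode_boolean(value):
    """Boolean inputs become -1.0 (False) or +1.0 (True)."""
    return 1.0 if value else -1.0


def encode_enum(value, categories):
    """An enumeration becomes one 0/1 input per possible category."""
    return [1.0 if value == category else 0.0 for category in categories]


def normalize(value, mean, std_dev):
    """One common normalization: subtract the mean, divide by the std dev."""
    return (value - mean) / std_dev


def feed_forward(inputs, hidden_weights, hidden_biases,
                 output_weights, output_biases):
    """Each hidden node sees every input; each output sees every hidden node."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return [
        sum(w * h for w, h in zip(weights, hidden)) + bias
        for weights, bias in zip(output_weights, output_biases)
    ]


# Hypothetical record: age 40 (mean 35, std dev 10), a True flag, colour "green".
inputs = [normalize(40.0, 35.0, 10.0), encode_boolean(True)]
inputs += encode_enum("green", ["red", "green", "blue"])
```

The three-valued enumeration turns into three input nodes, so this record feeds five input nodes in total.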

Activation Functions

  • Logistic sigmoid: output between [0, 1], y = 1.0 / (1.0 + e^-x)
  • Hyperbolic tangent: output between [-1, +1], y = tanh(x) = (e^x - e^-x) / (e^x + e^-x)
  • Heaviside step: output is either 0 or 1, if (x < 0) then y = 0 else if (x >= 0) then y = 1
  • Softmax: outputs between [0, 1] that sum to 1.0, y_i = e^(x_i) / Sum_j(e^(x_j))
The ability to customize these functions means that it is often better to write your own neural network.
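
For reference, here is a small sketch of the four functions using their standard textbook definitions (sigmoid with e^-x, softmax with positive exponents). The function names are my own, and the two numerical-stability tricks are common practice rather than anything shown in the talk.

```python
import math


def logistic_sigmoid(x):
    """Output in [0, 1]; written in two branches so exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hyperbolic_tangent(x):
    """Output in [-1, +1]."""
    return math.tanh(x)


def heaviside_step(x):
    """Output is 0 for negative x and 1 otherwise."""
    return 0.0 if x < 0 else 1.0


def softmax(xs):
    """Outputs in [0, 1] that sum to 1.0, one per input value."""
    largest = max(xs)  # subtracting the max leaves the result unchanged
    exps = [math.exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Softmax works on the whole list of output-node sums at once, which is why it takes a list while the other three take a single value.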
 

Alternatives to neural networks
  1. Linear regression: y = a*x1 + b*x2 + ...
  2. Logistic regression: y = 1.0 / (1.0 + e^-(a*x1 + b*x2 + ... + k))
  3. Naive Bayes: assumes input data are all independent and output is binary
  4. Decision trees
  5. Support vector machines: extremely complex implementation, assumes binary output

Neural networks pros and cons

  • Pro: can model any underlying math equation!
  • Pro: can handle multinomial output without resorting to tricks.
  • Con: moderate complexity, requires lots of training data.
  • Con: must pick the number of hidden nodes, activation functions, input/output encoding and error definition.
  • Con: must pick a training method, training “free parameters” and an over-fitting defense strategy.
     

Training

Back-propagation
Fastest technique.
Does not work with Heaviside activation.
Requires “learning rate” and “momentum.”
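
To show where the two free parameters enter, here is a sketch of the usual gradient-descent-with-momentum weight update, applied to a toy one-weight error function rather than a real network. The function names and the toy error (w - 3)^2 are my own illustration. It also hints at why Heaviside does not work: its gradient is zero everywhere except at the jump, so the update would never move a weight.

```python
def update_weight(weight, gradient, previous_delta, learning_rate, momentum):
    """Return (new_weight, delta) for one training step.

    The learning rate scales the step against the error gradient and the
    momentum term re-applies a fraction of the previous step.
    """
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta


def train(gradient_of, weight, learning_rate, momentum, epochs):
    """Repeatedly apply update_weight, starting with no previous step."""
    previous_delta = 0.0
    for _ in range(epochs):
        weight, previous_delta = update_weight(
            weight, gradient_of(weight), previous_delta,
            learning_rate, momentum)
    return weight


# Toy error (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
trained = train(lambda w: 2.0 * (w - 3.0), 0.0, 0.1, 0.5, 200)
```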
 
Genetic algorithm
Slowest technique.
Generally most effective.
Requires “population size,” “mutation rate,” “max generations,” “selection probability.”
 
Particle swarm optimization
Good compromise.
Requires “number particles,” “max iterations,” “cognitive weight,” “social weight.”
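
A compact sketch of particle swarm optimization that uses the four parameters above. The inertia weight, the starting range of -10 to 10 and the fixed random seed are my own assumptions, and the error function would normally be the network's error over the training data rather than the toy function in the usage note.

```python
import random


def particle_swarm_minimize(error, dimensions, number_particles,
                            max_iterations, cognitive_weight, social_weight,
                            inertia_weight=0.729, seed=0):
    """Return (best_position, best_error) found by the swarm."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-10.0, 10.0) for _ in range(dimensions)]
                 for _ in range(number_particles)]
    velocities = [[0.0] * dimensions for _ in range(number_particles)]
    best_positions = [p[:] for p in positions]
    best_errors = [error(p) for p in positions]
    leader = min(range(number_particles), key=lambda i: best_errors[i])
    global_best = best_positions[leader][:]
    global_error = best_errors[leader]

    for _ in range(max_iterations):
        for i in range(number_particles):
            for d in range(dimensions):
                r1, r2 = rng.random(), rng.random()
                # Pull towards this particle's own best (cognitive) and
                # towards the best position any particle has seen (social).
                velocities[i][d] = (
                    inertia_weight * velocities[i][d]
                    + cognitive_weight * r1
                    * (best_positions[i][d] - positions[i][d])
                    + social_weight * r2
                    * (global_best[d] - positions[i][d]))
                positions[i][d] += velocities[i][d]
            current = error(positions[i])
            if current < best_errors[i]:
                best_errors[i] = current
                best_positions[i] = positions[i][:]
                if current < global_error:
                    global_error = current
                    global_best = positions[i][:]
    return global_best, global_error
```

For example, particle_swarm_minimize(lambda p: sum((x - 1.0) ** 2 for x in p), 2, 20, 200, 1.49445, 1.49445) searches for the point (1, 1).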
 
Avoiding Over-fitting
What is it?
Symptom: Model is great on predicting existing data, but fails miserably on new data.
Roulette example: red, red, black, red, red, black, red, red, black, red, red, ??
A serious problem for all classification/prediction techniques, not just neural networks.
 
Five most common techniques
Use lots of training data.
Train-Validate-Test (early stop when error on validate set begins to increase).
K-fold cross validation.
Repeated sub-sampling validation.
Jittering: deliberately adding noise data to make over-fitting impossible.
Quite a few exotic techniques also available (weight penalties, Bayesian learning, etc.).
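
As an illustration of one of these techniques, here is a sketch of how k-fold cross validation divides the data: every item is used for validation exactly once and for training in the other folds. The function name is hypothetical, and a real run would usually shuffle the items first.

```python
def k_fold_splits(number_items, k):
    """Yield (train_indices, validate_indices) for each of the k folds."""
    indices = list(range(number_items))
    fold_size, remainder = divmod(number_items, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds take one extra item each.
        end = start + fold_size + (1 if fold < remainder else 0)
        validate = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, validate
        start = end
```

The model is trained k times, once per fold, and the k validation errors are averaged to estimate how it will behave on new data.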
 

Summary

Existing neural network tools are difficult or impossible to integrate into a software system.
Commercial and Open Source API libraries work well for some machine learning tasks but are extremely limited for neural networks.
To develop neural networks using Visual Studio you must understand seven core concepts: feed-forward, activation, data encoding, error, training, free parameters, and over-fitting.
Once the concepts are mastered, implementation with Visual Studio is not difficult (but not easy either).
 

 

Monday, 8 July 2013

Notes from Build 2013 Scaling the real time web with ASP.NET SignalR

ASP.NET and SignalR have some improvements. For example, SignalR is now used in the Visual Studio development experience, making it possible for edits to be propagated across browsers without a refresh. Here are some notes from Damian Edwards' presentation:
3-502 Scaling the real time web with ASP.NET SignalR
See www.asp.net/signalr for getting started
 
·    Scaling real-time traffic shares many considerations with traditional web traffic eg CPU, bandwidth, memory
·    Application scenarios have huge impact on scaling patterns.
·    Big difference is in concurrency, supporting many long running idle and active connections vs short requests
·    Different SignalR transports have different overheads
 
General things to watch out for:
Blocking calls eg blocking I/O
·    Never ever block a Hub method, it jams up the pipes
·    Use .NET 4.5 async where possible
Sending large messages
·    Memory leaks caused by misunderstanding SignalR object lifetime eg Hub instances
·    Session - don't use it from SignalR. Use Hub state, cookies, browser storage, a database etc. instead
Remember the secret of scale "Have your app do as little as possible. If you do nothing, you can scale infinitely" - Scott Hanselman
 
SignalR core architecture: Pub/Sub
1. Publisher
Message serialized and saved to cache associated with Signal, topic is marked for delivery
2. Message Cache
3. Worker
Worker is scheduled for signal, selects a waiting subscriber, retrieves message from cache
Worker sends message to client as bytes over transport
4. Client
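
The four stages can be mimicked with a toy in-memory bus. This is only a sketch of the publish, cache, worker, client sequence described above and not SignalR's actual implementation. The class and method names are hypothetical, and JSON stands in for whatever serialization the real bus uses.

```python
import json
from collections import defaultdict


class MessageBus:
    def __init__(self):
        self.cache = defaultdict(list)    # signal -> serialized messages
        self.cursors = defaultdict(dict)  # signal -> {subscriber: next index}
        self.pending = set()              # signals marked for delivery

    def subscribe(self, signal, subscriber):
        """A new subscriber only sees messages published from now on."""
        self.cursors[signal][subscriber] = len(self.cache[signal])

    def publish(self, signal, message):
        """Publisher: serialize once, save to the cache, mark the signal."""
        self.cache[signal].append(json.dumps(message).encode("utf-8"))
        self.pending.add(signal)

    def run_worker(self, send):
        """Worker: hand each waiting subscriber its unread bytes."""
        for signal in sorted(self.pending):
            messages = self.cache[signal]
            for subscriber, cursor in self.cursors[signal].items():
                for payload in messages[cursor:]:
                    send(subscriber, payload)  # bytes over the transport
                self.cursors[signal][subscriber] = len(messages)
        self.pending.clear()
```

Because the message is serialized when it is published, sending it to many subscribers costs one serialization and many sends.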
 
Pattern 1 Server broadcast
Low rate, message to all clients
Low rate broadcast of the same payload to clients
One message bus send maps to many users (fan out)
More clients don't increase bus traffic
eg application wide alerts
 
Pattern 2 Server Push
Low rate, message to unique clients
Low rate broadcast of a unique payload to each client
One message bus send maps to one user (no fan out)
More clients means more message bus traffic
eg Job monitor
 
Pattern 3 User event driven
Broadcast on client action
One message bus send maps to many users (fan out)
More clients means more message traffic
eg Chat
 
Pattern 4 High frequency real-time
Fixed high rate, unique message
Fixed high rate broadcast from servers and clients (don't go above 25Hz)
One message bus send maps to one user (no fan out)
More clients means more message traffic
eg Gaming
 
Demo
There are command line utilities for SignalR in Microsoft.AspNet.SignalR.Utils. These include things like:
signalR [args]
·    ipc Installs SignalR performance counters
·    upc Uninstalls SignalR performance counters
·    ghp Generates HubProxy JavaScript files for server Hub classes
To generate load use crank.exe /url:http://localhost:29573/TestConnection /Clients:100 /BatchSize:200
For isolated in-proc scenarios use Stress (Microsoft.AspNet.SignalR.Stress\bin\Debug)
Adjust IIS Settings for concurrency limits
There was something said about Max concurrent requests per CPU
appcmd.exe set config /section:system.webserver/serverRuntime /appConcurrentRequestLimit:100000
 
Scale-out Issues
·    Client transience: How do messages from one server get to the other servers in my web farm
·    Client transience: When does a client disconnect from my app
·    Client transience: How do I avoid duplicate and missed messages as a client moves from one server to another
·    Client distribution: What happens if one function on one server is called many more times than others
 
To solve these issues use a backplane: SQL Server, Redis or Windows Azure Service Bus (available from NuGet). This works well for the Server Broadcast pattern but is limited for the others because every message goes to every server; therefore, as traffic increases, you are limited by how fast any one web server can pull messages off the backplane.
 
Backplanes are much slower than a single server.
NuGet package: Microsoft.AspNet.SignalR.Redis
 
public static void Start()
{
    // Route SignalR messages through a Redis backplane (server, port, password, event key)
    GlobalHost.DependencyResolver.UseRedis("localhost", 6379, "", "build");
    RouteTable.Routes.MapHubs();
}
 
Custom scale-out
·    Common Server
·    Specific Server
·    Filtered message bus
·    Server transition
·    Hybrid
2.0 Scale out improvements
·    Support for pre serialized messages
·    Support for single serialization when sending multiple messages
Resources
·    github.com/signalr/signalr
·    twitter.com/damianedwards
 

Tuesday, 2 July 2013

Notes from Build 2013 Cloud based performance testing

I have just come back from Build 2013 where I covered a lot of subjects. Over the next few weeks I am going to revisit my notes so that I can retain more of what I learned at this conference. So here is my first set of notes, from session 2-346 Cloud powered Load testing with Team Foundation Service. To watch the session follow this url: http://channel9.msdn.com/Events/Build/2013/2-346
Ankit Saraf @vauntgarde
Some terms:
  • Performance: What is my application's behavior
  • Load: How will my application behave in Production
  • Stress: Can my application handle a lot of users
  • Scale/Capacity: How many servers do I need
In VS2013 there is a new project template called "Web Performance and Load Test Project". This project has a "Web test recorder" that records the interactions that a user makes while stripping out extra tags / cookies in the urls etc. These form tests that can be replayed. The tests can be parameterized using Context Parameters eg:
WebServer1=http://demomusicstore.cloudapp.net
{{WebServer1}}/
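
To make the idea concrete, here is a small sketch of how a {{Name}} placeholder could be expanded from a table of context parameters. It illustrates the substitution only and is not how Visual Studio implements it. The function name is hypothetical.

```python
import re


def expand_context_parameters(text, parameters):
    """Replace each {{Name}} with its value; unknown names are left as-is."""
    def replace(match):
        return parameters.get(match.group(1), match.group(0))
    return re.sub(r"\{\{(\w+)\}\}", replace, text)


parameters = {"WebServer1": "http://demomusicstore.cloudapp.net"}
url = expand_context_parameters("{{WebServer1}}/", parameters)
```

Changing the single WebServer1 value then points every recorded request at a different server.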

The recorded sessions can also be used to generate code.

There is a New Load Test Wizard that has a number of steps to specify a load test. These include:
Constant load, specifying the number of users
Step load, specifying:
  • Start user count
  • Step duration
  • Step user count
  • Maximum user count
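
A sketch of how the four step load settings combine into a user count over time, assuming the load rises by the step user count at the end of every step duration until it reaches the maximum. That reading of the settings, and the function and parameter names, are my own.

```python
def step_load_user_count(elapsed_seconds, start_user_count,
                         step_duration_seconds, step_user_count,
                         maximum_user_count):
    """Number of virtual users running at a given time into the test."""
    completed_steps = int(elapsed_seconds // step_duration_seconds)
    users = start_user_count + completed_steps * step_user_count
    return min(users, maximum_user_count)
```

With a start of 10 users, 30 second steps of 10 users and a maximum of 50, the load is 10 users at the start, 20 after 30 seconds, and stays at 50 from the two minute mark onwards.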

How should the test mix be modeled:
  • Based on the total number of tests
  • Based on the number of virtual users
  • Based on user pace
  • Based on sequential test order

The relative proportions of individual tests in a given load test can be specified.
A warm-up duration and a run duration can be specified, or alternatively a number of test iterations.
The load test scenarios are presented in a tree view and the results in HTML format.