Hello again Nigel,
Someone on my team took a deeper look at this and was able to deterministically reproduce a ~2x speedup on his box. It looks like the primary wins are coming from the CRT, not the JIT. The 8.0/3.5 CRT was using x87 floating-point instructions, but the 10.0/4.0 CRT is using SSE. The JIT uses the CRT for the Log, Log10 and Exp calls. It seems like the SSE instructions used by the C runtime are the primary source of the wins, although the JIT-ed code was better too.
.NET 3.5 also had the CRT installed in a global location (WinSxS) and hence could be affected by other applications that updated the CRT, which could explain the differences in our observations.
Thanks again for sharing your test case and reporting this to us.
Best,
Surupa
For the non-technical reader:
JIT = Just-In-Time compiler (it converts the IL, or Intermediate Language, code stored in the .exe or .dll file into native machine code at run time, as each method is first called)
CRT = C Runtime Library (the standard C library of ready-made routines, such as the math functions log, log10 and exp, that native code calls instead of implementing them itself). This library is delivered with the framework and therefore comes in different versions:
.NET 2.0 -> CRT 8.0.50727.42
.NET 2.0 SP1 -> CRT 8.0.50727.762
.NET 3.5 -> CRT 9.0.21022.8
.NET 3.5 SP1 -> CRT 9.0.30729.1
SSE stands for Streaming SIMD Extensions (SIMD = Single Instruction, Multiple Data), a set of instruction-set extensions introduced by Intel for x86 processors.
This web site http://neilkemp.us/src/sse_tutorial/sse_tutorial.html gives the following description of SSE:
"First what the heck is SSE? Basically SSE is a collection of 128 bit CPU registers. These registers can be packed with 4 32 bit scalars after which an operation can be performed on each of the 4 elements simultaneously."
If I understand this correctly, .NET 4 takes advantage of these processor extensions through an improved version of the C Runtime Library, whose math routines use SSE instead of the older x87 floating-point instructions.
Here's a great article on the CLR improvements in .NET 3.5: http://msdn.microsoft.com/en-us/magazine/dd569747.aspx