Wednesday, 28 April 2010

How to delete objects in EF4 without loading them first

In LINQ to SQL it was possible to write a delete command in the following style:

    Public Sub DeleteRecord(ByVal COUNTRY_ID As Integer)
        iDataClasses1DataContext.CED_COUNTRies.DeleteOnSubmit( _
              (From Data In iDataClasses1DataContext.CED_COUNTRies _
               Where Data.COUNTRY_ID = COUNTRY_ID).Single)
        iDataClasses1DataContext.SubmitChanges()
    End Sub

In EF4 this is not possible in the same way (see http://stackoverflow.com/questions/1895455/get-database-table-name-from-entity-framework-metadata). There was a design decision that, in order to maintain constraints properly, entities must be loaded into an object graph; this way the constraints and dependencies can be maintained and transactions applied within the business logic before anything is persisted to the storage model. The problem with issuing direct commands like DELETE FROM is that constraints and dependencies may be broken. But for performance reasons such a delete operation is a handy thing to have.


The first thing we looked at was how to delete the entire contents of a table. To do this we would like to emit SQL such as DELETE FROM TABLE. So we need to determine the table name of the entity. We would like to implement an interface looking like:
        void DeleteAll<TEntity>() where TEntity : class;

Our first attempt at determining the table name looked like:

ItemCollection itemCollection;
context.MetadataWorkspace.TryGetItemCollection(DataSpace.SSpace, out itemCollection);

This did not work because the mapping to the persistence layer is not accessible (the C-Space and the S-Space are separated by design in EF4). Our next attempt was to look at the ToTraceString of an ObjectSet. This is the emitted native SQL statement, which can then be parsed to extract the table name. The following piece of code successfully retrieves the table name of a given entity:

        private string GetTableName<TEntity>() where TEntity : class
        {
            // Assumes the entity maps to a table in the dbo schema.
            string snippet = "FROM [dbo].[";

            string sql = this.context.CreateObjectSet<TEntity>().ToTraceString();

            // Take everything after "FROM [dbo].[" and cut at the closing bracket to get the table name.
            string sqlAfterFrom = sql.Substring(sql.IndexOf(snippet) + snippet.Length);
            string tableName = sqlAfterFrom.Substring(0, sqlAfterFrom.IndexOf("]"));
            return tableName;
        }

We can then implement the interface for DeleteAll:

        public void DeleteAll<TEntity>() where TEntity : class
        {
            string tableName = GetTableName<TEntity>();
            this.context.ExecuteStoreCommand(string.Format(CultureInfo.InvariantCulture, "DELETE FROM {0}", tableName));
        }
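
A minimal usage sketch (assuming DeleteAll lives on the same database context wrapper that exposes Query<TEntity>() later in this post; the entity name is borrowed from the examples below):

        [TestMethod]
        public void DeleteAll_TableContainsRows_TableIsEmptyAfterwards()
        {
            this.databaseContext.DeleteAll<VulnerabilityAggregate>();

            // The emitted command is a plain "DELETE FROM [TableName]", so no rows should remain.
            Assert.AreEqual(0, this.databaseContext.Query<VulnerabilityAggregate>().Count());
        }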

By the way, for massive tables it is faster to drop and recreate the table.
Next we want to issue a delete statement with a where clause. The problem is that LINQ is designed for queries, not for commands. Ideally we would like to write a LINQ expression that looks like this:

            this.databaseContext
                .Query<VulnerabilityAggregate>()
                .Where<VulnerabilityAggregate>(v => v.Id == 1
                       && v.MainOccupancyType.Equals("OccType"))
                .Delete();

To get to this we need the database field names so that we can build a where clause like DELETE FROM TABLE WHERE X=1. Again, by design it is not possible to access the mapping to the table fields, so we play again with ObjectQueries and ToTraceString.
The following code returns the SQL clause from the FROM keyword onwards:

        private string GetClause<TEntity>(IQueryable<TEntity> clause) where TEntity : class
        {
            // Assumes the entity maps to a table in the dbo schema.
            string snippet = "FROM [dbo].[";

            string sql = ((ObjectQuery<TEntity>)clause).ToTraceString();

            // Keep everything from the FROM keyword onwards and strip the [Extent1] alias,
            // which is not valid when the clause is appended to a DELETE statement.
            string fromClause = sql.Substring(sql.IndexOf(snippet));
            fromClause = fromClause.Replace("AS [Extent1]", "");
            fromClause = fromClause.Replace("[Extent1].", "");

            return fromClause;
        }

We needed to remove the [Extent1] alias because it causes a syntax error when the clause is appended to a DELETE statement (the trace SQL is roughly of the form SELECT ... FROM [dbo].[Table] AS [Extent1] WHERE [Extent1].[X] = 1). The next step is to implement the interface:

        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Design", "CA1004:GenericMethodsShouldProvideTypeParameter")]
        void DeleteAll<TEntity>(IQueryable<TEntity> clause) where TEntity : class;

The implementation looks like:

        public void DeleteAll<TEntity>(IQueryable<TEntity> clause) where TEntity : class
        {
            string sqlClause = GetClause<TEntity>(clause);
            this.context.ExecuteStoreCommand(string.Format(CultureInfo.InvariantCulture, "DELETE {0}", sqlClause));
        }

This worked very well. The next step was to make an extension method so that the delete can be chained onto a LINQ statement. To do this we need to pass in the ObjectContext (and make a static version of GetClause available to the extension class):

        public static IQueryable<T> Delete<T>(this IQueryable<T> instance, ObjectContext context) where T: class
        {
            string sqlClause = GetClause<T>(instance);
            context.ExecuteStoreCommand(string.Format(CultureInfo.InvariantCulture, "DELETE {0}", sqlClause));

            return instance;
        }
This made the Linq query look like:
            this.databaseContext
                .Query<VulnerabilityAggregate>()
                .Where<VulnerabilityAggregate>(v => v.Id == 1
                           && v.MainOccupancyType.Equals("OccType"))
                .Delete<VulnerabilityAggregate>(this.objectContext);

Looking at the SQL Profiler we could see that the emitted SQL was well formed. We also ran some performance tests, which showed an initial cost of about 300 ms followed by subsequent deletes of around 100 ms. In Visual Studio 2010 (Test menu/Options, under Test Tools/Test Execution) there is an option to “Keep test execution engine running between test runs”. If this is enabled you get connection pooling between runs, which explains the initially long execution time followed by the shorter ones. 100 ms is a good time, given that running the equivalent statement from SQL Server Management Studio takes 40-60 ms.

Conclusion: This is an efficient way to delete data from tables but you need to take care of the dependencies and constraints outside of the EF component model because we are directly interfacing with the database. In other words use with care.

Using AutoMapper to map entities with similar property names

We started to use AutoMapper from http://automapper.codeplex.com/, a tool that maps entities with similar properties onto each other. We need this when transferring small amounts of data from one database to another. When you download the DLL into your external libraries directory you may need to right-click it and look at its properties, because Windows might have blocked it; if it is blocked, just click the Unblock button. It is good practice to wrap third-party libraries, so we did that in the following self-initializing static class. The first time a mapping is requested the Initialize method is called.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Ccp.Entities;
using AutoMapper;

namespace Ccp.Infrastructure
{
    /// <summary>
    /// Mapper to convert entity to entity.
    /// </summary>
    public static class EntityMapper
    {
        private static bool IsInitialized = false;

        /// <summary>
        /// Create all the static entity mapping declarations for the current process/AppDomain.
        /// </summary>
        internal static void Initialize()
        {
            Mappings.Initialize();
            IsInitialized = true;
        }

        /// <summary>
        /// Convert the source type object into a destination type object.
        /// </summary>
        /// <typeparam name="TSource">The type of the source entity.</typeparam>
        /// <typeparam name="TDest">The type of the destination entity.</typeparam>
        /// <param name="source">The source object.</param>
        /// <returns>The destination object.</returns>
        public static TDest Map<TSource, TDest>(TSource source)
        {
            if (!IsInitialized)
            {
                Initialize();
            }

            TDest dest = Mapper.Map<TSource, TDest>(source);
            return dest;
        }
    }
}
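
The Mappings class referenced above is not shown here. A minimal sketch of what it could contain, inferred from the assertions in the unit test below (Id is ignored, VulnerabilityFunctionId is mapped onto VfId, everything else maps by matching name), would be:

using AutoMapper;

namespace Ccp.Infrastructure
{
    /// <summary>
    /// Central place for all static AutoMapper mapping declarations.
    /// </summary>
    public static class Mappings
    {
        internal static void Initialize()
        {
            Mapper.CreateMap<Ccp.Entities.VulnerabilityAggregate, Ccp.Entities.TropicalCyclone.VulnerabilityAggregate>()
                .ForMember(dest => dest.Id, opt => opt.Ignore())
                .ForMember(dest => dest.VfId, opt => opt.MapFrom(src => src.VulnerabilityFunctionId));
            // Properties with identical names (e.g. BuildingQuality) are mapped automatically.
        }
    }
}
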
Here is the unit test that shows how the mapping works:

        [TestMethod]
        public void CreateMap_BetweenVulnerabilityAggregates_MapIsCreated()
        {
            Ccp.Infrastructure.Mappings.Initialize();

            Ccp.Entities.VulnerabilityAggregate source = CreateVulnerabilityAggregate();
            Ccp.Entities.TropicalCyclone.VulnerabilityAggregate destination = Mapper.Map<Ccp.Entities.VulnerabilityAggregate, Ccp.Entities.TropicalCyclone.VulnerabilityAggregate>(source);

            Assert.AreNotEqual<int>(source.Id, destination.Id); // because ignored in mapping
            Assert.AreEqual<string>(source.BuildingQuality, destination.BuildingQuality);
            Assert.AreEqual<int>(source.VulnerabilityFunctionId.Value, destination.VfId.Value);
        }

Tuesday, 20 April 2010

Today’s journey in the world of EF DALs

Here is a small story of debugging and developing a POCO EF4 DAL. It starts off with this piece of code:

          CED_COUNTRY country =  this.databaseContext
                .Query<CED_COUNTRY>()
                .Where(c => c.REGION_ID.Equals(region.REGION_ID))
                .First();

This produced the error:
“Unable to create a constant value of type 'System.Object'. Only primitive types ('such as Int32, String, and Guid') are supported in this context.”

The mistake was that REGION_ID is nullable; the query can be fixed by comparing the underlying value:

           CED_COUNTRY country =  this.databaseContext
                .Query<CED_COUNTRY>()
                .Where(c => c.REGION_ID.Value.Equals(region.REGION_ID))
                .First();

Next I was interested in being able to debug the LINQ expression. Using F11 (Step Into) we can step into query extension methods such as Query, but we are not able to step into the Func inside the Where clause. After some experimentation we found that it is not possible to step into a Func, but it is possible to step into an Action. We decided to write some query extension methods purely for debugging. The idea is that the LINQ expression can be decorated with these additional debugging extension methods, much like tee in a Unix pipeline.
To debug the above example you can edit the LINQ expression to look like this:

            CED_COUNTRY country =  this.databaseContext
                .Query<CED_COUNTRY>()
                .Debug()
                .Where(c => c.REGION_ID.Value.Equals(region.REGION_ID))
                .Debug()
                .First();

The corresponding extension methods look like:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace CCP.DataAccessLayerTest
{
    public static class EnumerableExtensions
    {
        public static IQueryable<T> Debug<T>(this IQueryable<T> instance)
        {
            // Do stuff to debug here, for example:
            Assert.IsNotNull(instance);
            Console.WriteLine(instance.Count()); // note: Count() executes the query against the database
            return instance;
        }
    }
}
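
Since we found that it is possible to step into an Action (but not into the Func passed to Where), an overload that takes an Action is also handy. A possible sketch:

        public static IQueryable<T> Debug<T>(this IQueryable<T> instance, Action<IQueryable<T>> inspect)
        {
            // Unlike the Func inside Where, F11 steps into this Action,
            // so arbitrary inspection code can be written in the caller's lambda.
            inspect(instance);
            return instance;
        }

This lets you write things like .Debug(q => Console.WriteLine(q.Count())) in the middle of a query chain.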


Next we had a problem when updating a string that is defined as varchar(100) in the database: when we persisted a value with more than 100 characters we got a truncation error.
We first looked at the EDM to see whether there were any truncation or validation options; unfortunately there are none. Next we considered modifying the setter in an attribute-driven way, along the lines of the Enterprise Library validation example below.

http://msdn.microsoft.com/en-us/library/cc309320%28v=MSDN.10%29.aspx

using Microsoft.Practices.EnterpriseLibrary.Validation;
using Microsoft.Practices.EnterpriseLibrary.Validation.Validators;
public class Customer
{
    [StringLengthValidator(0, 20)]
    public string CustomerName;
    public Customer(string customerName)
    {
        this.CustomerName = customerName;
    }
}
public class MyExample
{
    public static void Main()
    {
        Customer myCustomer = new Customer("A name that is too long");
        ValidationResults r = Validation.Validate<Customer>(myCustomer);
        if (!r.IsValid)
        {
            throw new InvalidOperationException("Validation error found.");
        }
    }
}
In our case this would look like

using System.Collections.Generic;
using System.ComponentModel;
using System.ComponentModel.DataAnnotations;
using System.Globalization;
using System.Linq;

namespace Ccp.Entities
{
    public class Validation
    {
        public static IEnumerable<ValidationResult> Validate(object component)
        {
            return from descriptor in TypeDescriptor.GetProperties(component).Cast<PropertyDescriptor>()
                   from validation in descriptor.Attributes.OfType<ValidationAttribute>()
                   where !validation.IsValid(descriptor.GetValue(component))
                   select new ValidationResult(
                       validation.ErrorMessage ?? string.Format(CultureInfo.CurrentUICulture, "{0} validation failed.", validation.GetType().Name),
                       new[] { descriptor.Name });
        }
    }
}

With our T4 template modified, the generated property would carry an attribute like this:

        [StringLength(100)]
        public virtual string SUBMISSION_NAME
        {
            get;
            set;
        }
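
For completeness, this is roughly how the attribute-driven validator would then be used (an illustrative sketch only, since we did not go down this route):

SUB_SUBMISSION submission = new SUB_SUBMISSION { SUBMISSION_NAME = new string('x', 150) };

// Validate reflects over the properties and runs every ValidationAttribute it finds, so the
// 150-character name trips the [StringLength(100)] attribute generated by the T4 template.
foreach (ValidationResult error in Validation.Validate(submission))
{
    Console.WriteLine(error.ErrorMessage);
}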

But this would mean adding validation everywhere, which has a performance cost, and it would also mean yet another alteration to the T4 template.
Here is how we solved the problem instead:

1.    Create the IEntityValidator and IEntityRectifier interfaces

using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

namespace Ccp.Entities
{
    public interface IEntityValidator
    {
        IEnumerable<ValidationResult> Validate();
    }
}

namespace Ccp.Entities
{
    public interface IEntityRectifier
    {
        void Rectify();
    }
}

2.    Create a partial class that extends the POCO that you want to validate and implement the interfaces as appropriate

using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.ComponentModel.DataAnnotations;
namespace Ccp.Entities
{
    public partial class SUB_SUBMISSION : IEntityValidator, IEntityRectifier
    {
        #region IEntityValidator Members
        public IEnumerable<ValidationResult> Validate()
        {
            if (IsSubscriptionNameTooLong())
            {
                yield return new ValidationResult("SUBMISSION_NAME should not be longer than 100");
            }
            if (this.CEDANT_ID < 0)
            {
                yield return new ValidationResult("CEDANT_ID should not be less than 0");
            }
        }
        private bool IsSubscriptionNameTooLong()
        {
            // Guard against null, since SUBMISSION_NAME may not have been set yet.
            return this.SUBMISSION_NAME != null && this.SUBMISSION_NAME.Length > 100;
        }
        #endregion
        #region IEntityRectifier Members
        public void Rectify()
        {
            if (IsSubscriptionNameTooLong())
            {
                this.SUBMISSION_NAME = this.SUBMISSION_NAME.Substring(0, 100);
            }
        }
        #endregion
    }
}
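
A sketch of a unit test (in the same style as the mapping test above) shows the behaviour we want from Rectify:

        [TestMethod]
        public void Rectify_SubmissionNameTooLong_NameIsTruncatedTo100Characters()
        {
            SUB_SUBMISSION submission = new SUB_SUBMISSION { SUBMISSION_NAME = new string('x', 150) };

            ((IEntityRectifier)submission).Rectify();

            Assert.AreEqual(100, submission.SUBMISSION_NAME.Length);
        }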

3. In the generic DataBaseContext add the following methods to validate and rectify entities. Note that if an entity does not implement the corresponding interface, validation or rectification is simply skipped.

        private static void ValidateEntity(object entity)
        {
            IEntityValidator validator = entity as IEntityValidator;
            if (validator != null)
            {
                foreach (ValidationResult result in validator.Validate())
                {
                    Console.WriteLine(result);
                }
            }
        }

        private static void RectifyEntity(object entity)
        {
            IEntityRectifier rectifier = entity as IEntityRectifier;
            if (rectifier != null)
            {
                rectifier.Rectify();
            }
        }

4. Invoke Validate and Rectify from the Add and Save methods
        public void Add(params object[] entities)
        {
            foreach (object entity in entities)
            {
                ValidateEntity(entity);
                RectifyEntity(entity);
                string entitySetName = GetEntitySetName(entity.GetType());
                context.AddObject(entitySetName, entity);
            }
        }

        public void Save()
        {
            foreach (ObjectStateEntry objectStateEntry in context.ObjectStateManager.GetObjectStateEntries(
                EntityState.Added | EntityState.Modified | EntityState.Unchanged | EntityState.Deleted))
            {
                ValidateEntity(objectStateEntry.Entity);
                RectifyEntity(objectStateEntry.Entity);
            }

            context.SaveChanges();
        }

Unfortunately the ObjectStateManager has some problems returning only the Modified entries. This is probably because the POCO is ignorant of the database, so the ObjectStateManager cannot tell on its own whether the object has been modified. If we used the EDM-generated context without POCO this would probably work without problems.
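
One thing still worth trying (we have not verified it yet): with POCO entities EF4 uses snapshot change tracking, so the ObjectStateManager only reports an entity as Modified after ObjectContext.DetectChanges() has run. Calling it before walking the state entries might make the Modified filter behave:

        public void Save()
        {
            // Snapshot change tracking: ask the context to compare the current POCO
            // values against the snapshot taken when the entities were attached.
            context.DetectChanges();

            foreach (ObjectStateEntry objectStateEntry in context.ObjectStateManager.GetObjectStateEntries(
                EntityState.Added | EntityState.Modified))
            {
                ValidateEntity(objectStateEntry.Entity);
                RectifyEntity(objectStateEntry.Entity);
            }

            context.SaveChanges();
        }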

Advantages of this method
- Optional interface: you only need to implement it where you need it
- The entities stay POCO and database ignorant
- Several properties can be combined to build more advanced validation
- Better performing than a declarative approach based on reflection

Disadvantages
- The ObjectStateManager does not seem to work as expected.
- When we change the ORM table naming and regenerate the POCOs, the partial class will not produce a compiler error. We could introduce a property that is referenced from the partial class, so that we would get the desired compiler error, but we decided this is an unclean approach and we will either live with it or find an alternative.

Sunday, 11 April 2010

Catching up on some old notes about indexed views

In May 2008 I was looking into optimizing our EDMConverter by using indexed views. I first came across indexed views in a talk by Kimberly Tripp in 2005, “SQL Server 2000 Performance Best Practices and Getting Ready for SQL Server 2005”. An indexed view (the SQL Server counterpart of Oracle’s materialized views) can be several orders of magnitude faster than a normal query because behind the view sits a persisted result set corresponding to the query. As new data is entered, the stored results are kept up to date; since the data is precomputed, only a read is needed at query time, hence the speed.

Indexed views can only be built over deterministic expressions. As new data is added, the database engine updates the stored results as part of the modifying statement, which means a small performance hit on data updates. The query optimizer can also use an indexed view indirectly, even when the view is not referenced by the select statement. Another consideration is that the results of the query have to be stored within the database, so more space is required.

For our EDM data, indexed views turned out to be of very limited value. The determinism and precision restrictions exclude things like float and double columns and aggregates such as MIN and MAX, which we needed because our queries revolve around sums insured stored as floats. We also ran the Tuning Advisor against a predefined load, and it turned out that the EDM was already well optimized. So in the end we optimized our queries by adding precalculated data.

I am still looking for ways to optimize and generate reports faster; when I get some time there are a lot of ideas I need to follow up on. At PDC09 I attended “Data-Intensive Computing on Windows HPC Server with the DryadLINQ Framework”, presented by John Vert. This could be a way to make reporting faster by spreading the query across the nodes of an HPC cluster. DryadLINQ is still experimental, but the core structure looks like this:

- Scale-out, partitioned container for .NET objects
- Derives from IQueryable<T>, IEnumerable<T> with ToPartitionedTable() extension methods
- DryadLINQ operators consume and produce PartitionedTable<T>
- DryadLINQ generates code to serialize/deserialize your .NET objects
- Underlying storage can be a partitioned file, a partitioned SQL table, or cluster files

Tuesday, 6 April 2010

Implementation of custom types within the mapping of an EF4 POCO DAL

I have a data access layer that I am replacing with an EF4 POCO DAL. For historical reasons the database has some custom data types, such as a Boolean that is implemented in the database as a string containing ‘True’ or ‘False’. We want the business entities to have strongly typed properties, in other words a real Boolean.

We opened up our EDM and wanted to change the data type of the mapped field so that on the database side it is a string and on the business logic side it is a Boolean. The editor in VS2010 RC2 even allows you to do this, but when you generate the POCOs you end up with a compile-time error. Mike Taulty describes the same problem on his blog: http://mtaulty.com/CommunityServer/blogs/mike_taultys_blog/archive/2008/02/14/10182.aspx

NHibernate has an extensible model where you can implement IUserType to handle this kind of custom type. Since EF4 is not quite as extensible, we decided to modify the T4 template to generate an additional property whose setter writes the underlying string value. This is quite fiddly since the template is like a script with a lot of if statements, a long way from an OO style of programming. We replaced lines 159-160 of the template with:

<#
        // NOTE: added for simple test to map string column to bool type
        if (code.FieldName(edmProperty).EndsWith("StringFlag"))
        {
#>
    }
    <#=PropertyVirtualModifier(Accessibility.ForProperty(edmProperty))#> bool <#=code.Escape(edmProperty).Replace("StringFlag", "")#>
    {
        <#=code.SpaceAfter(Accessibility.ForGetter(edmProperty))#>get
        {
            bool result = false;
            bool.TryParse(<#=code.Escape(edmProperty)#>, out result);
            return result;
        }

        <#=code.SpaceAfter(Accessibility.ForSetter(edmProperty))#>set
        {
            <#=code.Escape(edmProperty)#> = value.ToString();
        }
<#
        }
    }
#>

The effect is that if we rename a POCO property mapped to a database field so that it ends with “StringFlag”, the template adds another, strongly typed property, as shown below:

    public virtual string IsFloridaOnlyStringFlag
    {
        get;
        set;
    }
    public virtual bool IsFloridaOnly
    {
        get
        {
            bool result = false;
            bool.TryParse(IsFloridaOnlyStringFlag, out result);
            return result;
        }

        set
        {
            IsFloridaOnlyStringFlag = value.ToString();
        }
    }

 

In order to get the layout of the generated code to be nice, the T4 template itself looks a little unreadable! We did not make the StringFlag property protected because EF4 needs to reflect over it and find it. So, unfortunately, our business entities still expose the string property we are trying to replace. In our case this is OK because we will systematically refactor the database model as we would any other code.

The other interesting problem came when we wanted to rename a business entity so that it no longer matches the table name. For example, MOP_MODEL_RUN should become ModelRun, which keeps the FxCop code analysis rules happy. The problem showed up in the following code:

public IList<TEntity> GetAll<TEntity>()
{

    IList<TEntity> list = this.context
        .CreateQuery<TEntity>(
        "[" + typeof(TEntity).Name + "]")
        .ToList();
    return list;
}

This failed because typeof(TEntity).Name now returns the renamed business entity name (ModelRun) rather than the entity set name the query expects. Unfortunately the mapping information between the database world and the business entity world is quite hard to get at; it is surprising that almost nobody out there complains about this. To fix this we changed the above code to:

public IList<TEntity> GetAll<TEntity>()
{
    string entityType = GetEntitySetName(typeof(TEntity));

    IList<TEntity> list = this.context
        .CreateQuery<TEntity>(
        /* "[" + */ entityType /* + "]" */)
        .ToList();
    return list;
}

We cache this mapping in a dictionary so we added:

private IDictionary<Type, string> objectSetNameMap = new Dictionary<Type, string>();

and

private string GetEntitySetName(Type entityType)
{
    if (!this.objectSetNameMap.ContainsKey(entityType))
    {
        this.objectSetNameMap[entityType] = EntityHelper.GetEntitySetName(entityType, this.context);
    }
    return this.objectSetNameMap[entityType];
}

And here is the helper class that figures out the mapping between the business entities and the database entities:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.Metadata.Edm;
using System.Reflection;
using System.Data.Objects;
using System.ComponentModel;

namespace Ccp.DataAccessLayer
{
    public static class EntityHelper
    {
        private static void LoadAssemblyIntoWorkspace(MetadataWorkspace workspace, Assembly assembly)
        {
            workspace.LoadFromAssembly(assembly);
        }

        #region GetEntitySetName

        public static string GetEntitySetName(Type entityType, ObjectContext context)
        {
            EntityType edmEntityType = GetEntityType(context, entityType);
            EntityContainer container = context.MetadataWorkspace.GetItems<EntityContainer>(DataSpace.CSpace).Single<EntityContainer>();
            EntitySet set = (EntitySet)container.BaseEntitySets.Single<EntitySetBase>(delegate(EntitySetBase p)
            {
                return (p.ElementType == edmEntityType);
            });
            return (container.Name + "." + set.Name);
        }

        #endregion

        #region GetEntityType
        public static EntityType GetEntityType(ObjectContext context, Type clrType)
        {
            if (context == null)
            {
                throw new ArgumentNullException("context");
            }
            if (clrType == null)
            {
                throw new ArgumentNullException("clrType");
            }
            EdmType type = null;
            try
            {
                type = context.MetadataWorkspace.GetType(clrType.Name, clrType.Namespace, DataSpace.OSpace);
            }
            catch (ArgumentException)
            {
                LoadAssemblyIntoWorkspace(context.MetadataWorkspace, clrType.Assembly);
                type = context.MetadataWorkspace.GetType(clrType.Name, clrType.Namespace, DataSpace.OSpace);
            }
            return (EntityType)context.MetadataWorkspace.GetEdmSpaceType((StructuralType)type);
        }

        public static bool TryGetEntityType(ObjectContext context, Type clrType, out EntityType entityType)
        {
            entityType = null;
            if (context == null)
            {
                throw new ArgumentNullException("context");
            }
            if (clrType == null)
            {
                throw new ArgumentNullException("clrType");
            }
            EdmType type = null;
            bool flag = context.MetadataWorkspace.TryGetType(clrType.Name, clrType.Namespace, DataSpace.OSpace, out type);
            if (!flag)
            {
                LoadAssemblyIntoWorkspace(context.MetadataWorkspace, clrType.Assembly);
                flag = context.MetadataWorkspace.TryGetType(clrType.Name, clrType.Namespace, DataSpace.OSpace, out type);
            }
            if (flag)
            {
                entityType = (EntityType)context.MetadataWorkspace.GetEdmSpaceType((StructuralType)type);
                return true;
            }
            return false;
        }
        #endregion

        #region GetReferenceProperty

        public static PropertyDescriptor GetReferenceProperty(PropertyDescriptor pd)
        {
            return GetReferenceProperty(pd, TypeDescriptor.GetProperties(pd.ComponentType).Cast<PropertyDescriptor>());
        }

        public static PropertyDescriptor GetReferenceProperty(PropertyDescriptor pd, IEnumerable<PropertyDescriptor> properties)
        {
            string refPropertyName = pd.Name + "Reference";
            return properties.SingleOrDefault<PropertyDescriptor>(delegate(PropertyDescriptor p)
            {
                return (p.Name == refPropertyName);
            });
        }
        #endregion

    }
}
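
To tie this back to the renamed entity example above, the helper returns the container-qualified entity set name (the container name below is hypothetical), which ObjectContext.CreateQuery<TEntity> accepts directly as Entity SQL:

// Returns something like "CcpEntityContainer.ModelRuns" for the renamed ModelRun entity.
string entitySetName = EntityHelper.GetEntitySetName(typeof(ModelRun), this.context);

IList<ModelRun> modelRuns = this.context
    .CreateQuery<ModelRun>(entitySetName)
    .ToList();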