Wednesday 28 April 2010

How to delete objects in EF4 without loading them first

In LinqForSQL it was possible to make the delete commands in the following style

    Public Sub DeleteRecord(ByVal COUNTRY_ID As Integer)
        iDataClasses1DataContext.CED_COUNTRies.DeleteOnSubmit( _
              (From Data In iDataClasses1DataContext.CED_COUNTRies _
               Where Data.COUNTRY_ID = COUNTRY_ID).Single)
        iDataClasses1DataContext.SubmitChanges()
    End Sub

In EF4 this is not possible in this way see http://stackoverflow.com/questions/1895455/get-database-table-name-from-entity-framework-metadata. There was a design decision that in order to maintain constraints properly within the entities must be loaded into an object graph, in this way the constraints and dependencies can be maintained and transactions made within the business logic before it is persisted into the storage model. The problem is making direct commands like DELETE FROM means that the constraints and dependencies may be broken. But for performance reasons such a delete operation is a handy thing to have.


The first thing we looked at was how to delete the entire contents of a table. To do this we would like to emit SQL such as DELETE FROM TABLE. So we need to determine the table name of the entity. We would like to implement an interface looking like:
        void DeleteAll<TEntity>() where TEntity : class;

Our first attempt at determining the table name looked like:

ItemCollection itemCollection;
context.MetadataWorkspace.TryGetItemCollection(DataSpace.SSpace, out itemCollection);

This did not work because the ORM to the persistence layer is not accessible (The C-Space and the S-Space are separated by design in EF4). Our next attempt was to look at the ToTraceString of an ObjectSet. This is the emitted native SQL statement which can then be passed to return the table name. The following piece of code successfully retrieves the table name of a given entity:

        private string GetTableName<TEntity>() where TEntity : class
        {
            string snippet = "FROM [dbo].[";

            string sql = this.context.CreateObjectSet<TEntity>().ToTraceString();
            string sqlFirstPart = sql.Substring(sql.IndexOf(snippet) + snippet.Length);
            string tableName = sqlFirstPart.Substring(0, sqlFirstPart.IndexOf("]"));
            return tableName;
        }

We can the implement the interface for DeleteAll

        public void DeleteAll<TEntity>() where TEntity : class
        {
            string tableName = GetTableName<TEntity>();
            this.context.ExecuteStoreCommand(string.Format(CultureInfo.InvariantCulture, "DELETE FROM {0}", tableName));
        }

By the way, for massive tables it is faster to drop and recreate the table.
Next we want to make a delete statement with a where clause. The problem is that Linq is designed for queries and not for commands. Our ideal world is to create a Linq expression that looks like

            this.databaseContext
                .Query<VulnerabilityAggregate>()
                .Where<VulnerabilityAggregate>(v => v.Id == 1
                       && v.MainOccupancyType.Equals("OccType"))
                .Delete();

To get to this we need to get to the database field names so that we can build a where clause like DELETE FROM TABLE WHERE X=1. Again it is not possible to access the ORM to the table fields by design. So we play again with ObjectQueries and ToTraceString.
The following code will return the sql clause from the FROM sql statement

        private string GetClause<TEntity>(IQueryable<TEntity> clause) where TEntity : class
        {
            string snippet = "FROM [dbo].[";

            string sql = ((ObjectQuery<TEntity>)clause).ToTraceString();
            string sqlFirstPart = sql.Substring(sql.IndexOf(snippet));

            sqlFirstPart = sqlFirstPart.Replace("AS [Extent1]", "");
            sqlFirstPart = sqlFirstPart.Replace("[Extent1].", "");

            return sqlFirstPart;
        }

We needed to replace the alias of the SQL statement because this will cause a syntax error in when appended to a DELETE statement. The next step is to implement the interface

        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Design", "CA1004:GenericMethodsShouldProvideTypeParameter")]
        void DeleteAll<TEntity>(IQueryable<TEntity> clause) where TEntity : class;

The implementation looks like:

        public void DeleteAll<TEntity>(IQueryable<TEntity> clause) where TEntity : class
        {
            string sqlClause = GetClause<TEntity>(clause);
            this.context.ExecuteStoreCommand(string.Format(CultureInfo.InvariantCulture, "DELETE {0}", sqlClause));
        }

This worked very well. The next step was to make an extension method so that we can include the where clause within a Linq statement. To do this we need to include the ObjectContext as follows: 

        public static IQueryable<T> Delete<T>(this IQueryable<T> instance, ObjectContext context) where T: class
        {
            string sqlClause = GetClause<T>(instance);
            context.ExecuteStoreCommand(string.Format(CultureInfo.InvariantCulture, "DELETE {0}", sqlClause));

            return instance;
        }
This made the Linq query look like:
            this.databaseContext
                .Query<VulnerabilityAggregate>()
                .Where<VulnerabilityAggregate>(v => v.Id == 1
                           && v.MainOccupancyType.Equals("OccType"))
                .Delete<VulnerabilityAggregate>(this.objectContext);

Looking at the SQL Profiler we could see that the emitted SQL was well formed. We also carried out some performance tests that showed an initial cost of 300ms followed by subsequent deletes taking 100ms. In visual studio 2010 select Test menu/Options Under Test Tools/Test Execution there is an option to “Keep test execution engine running between test runs”. If this is enabled you will get connection pooling and hence the previously observed initially long execution times followed by subsequent shorter times. 100 ms is a good time because using the SQL manager you will get times of 40-60 ms.

Conclusion: This is an efficient way to delete data from tables but you need to take care of the dependencies and constraints outside of the EF component model because we are directly interfacing with the database. In other words use with care.