开发者

Entity Framework 4.0 Automatically Truncate/Trim String Before Insert

开发者 https://www.devze.com 2023-02-12 22:02 出处:网络
Suppose I have a table with the column Description, varchar(100). If try to insert a string with more than 100 characters, the insert will fail.

Suppose I have a table with the column Description, varchar(100). If try to insert a string with more than 100 characters, the insert will fail.

Is there a way in Entity Framework to automatically truncate or trim the string to fit into the column before inserting into the column? In my scenario, I really don't care whether the string is truncated, I just want it inserted rather 开发者_JAVA技巧than just failing and logging the rror.

Since the model already knows the length limits, I was thinking there might be a way for Entity Framework to do this for me.

If this is not supported, what is the best way to do this? Extend the auto-generated partial classes and override the On*Changed methods? I would prefer not to hard-code the length limits, but rather use the length limits already defined in the entity model. How could I get access to this?

Edit

My final solution was to implement the On*Changed partial method of the autogenerated entity.

I used this method of getting the ObjectContext from the entity instance, and then used the below method to extract the max length, and truncate the string.


This will give you the max length of a column..

public int? GetColumnMaxLength(ObjectContext context, string entityTypeName, string columnName)
    {
        int? result = null;

        Type entType = Type.GetType(entityTypeName);
        var q = from meta in context.MetadataWorkspace.GetItems(DataSpace.CSpace)
                          .Where(m => m.BuiltInTypeKind == BuiltInTypeKind.EntityType)
                from p in (meta as EntityType).Properties
                .Where(p => p.Name == columnName
                            && p.TypeUsage.EdmType.Name == "String")
                select p;

        var queryResult = q.Where(p =>
        {
            bool match = p.DeclaringType.Name == entityTypeName;
            if (!match && entType != null)
            {
                //Is a fully qualified name....
                match = entType.Name == p.DeclaringType.Name;
            }

            return match;

        }).Select(sel => sel.TypeUsage.Facets["MaxLength"].Value);
        if (queryResult.Any())
        {
            result = Convert.ToInt32(queryResult.First());
        }

        return result;
    }


Here's my One-Line Solution

(invoking it is one line, the implementation is a little more)

I took the code from @elbweb and adapted it for my purposes. In my case I was parsing EDI files, some of which had 15 different levels to the hierarchy and I didn't want to explicitly specify all 15 different types - I wanted a one-liner that worked for all entity types.

It's a bit different but it's now painless to call. There is definitely a performance hit on this but it's acceptable for me. Essentially put this inside of your DbContext class and then it's a one-liner to manually call (or you can automatically call it by overriding SaveChanges to invoke it).

Code in your DbContext:

public class MyContext : DbContext
{

    ...

    public void TruncateAllStringsOnAllEntitiesToDbSize()
    {
        var objectContext = ((IObjectContextAdapter) this).ObjectContext;

        var stringMaxLengthsFromEdmx =
                objectContext.MetadataWorkspace
                             .GetItems(DataSpace.CSpace)
                             .Where(m => m.BuiltInTypeKind == BuiltInTypeKind.EntityType)
                             .SelectMany(meta => ((EntityType) meta).Properties
                             .Where(p => p.TypeUsage.EdmType.Name == "String"))
                             .Select(d => new
                                          {
                                              MaxLength = d.TypeUsage.Facets["MaxLength"].Value,
                                              PropName = d.Name,
                                              EntityName = d.DeclaringType.Name
                                          })
                             .Where(d => d.MaxLength is int)
                             .Select(d => new {d.PropName, d.EntityName, MaxLength = Convert.ToInt32(d.MaxLength)})
                             .ToList();

        var pendingEntities = ChangeTracker.Entries().Where(e => e.State == EntityState.Added || e.State == EntityState.Modified).Select(x => x.Entity).ToList();
        foreach (var entityObject in pendingEntities)
        {
            var relevantFields = stringMaxLengthsFromEdmx.Where(d => d.EntityName == entityObject.GetType().Name).ToList();

            foreach (var maxLengthString in relevantFields)
            {
                var prop = entityObject.GetType().GetProperty(maxLengthString.PropName);
                if (prop == null) continue;

                var currentValue = prop.GetValue(entityObject);
                var propAsString = currentValue as string;
                if (propAsString != null && propAsString.Length > maxLengthString.MaxLength)
                {
                    prop.SetValue(entityObject, propAsString.Substring(0, maxLengthString.MaxLength));
                }
            }
        }
    }
}

Consumption

try
{
    innerContext.TruncateAllStringsOnAllEntitiesToDbSize();
    innerContext.SaveChanges();
}
catch (DbEntityValidationException e)
{
    foreach (var err in e.EntityValidationErrors)
    {
        log.Write($"Entity Validation Errors: {string.Join("\r\n", err.ValidationErrors.Select(v => v.PropertyName + "-" + v.ErrorMessage).ToArray())}");
    }
    throw;
}

Before this code, the SaveChanges would trigger the catch in my example above when you tried inserting a string that was too large. After adding the TruncateAllStringsOnAllEntitiesToDbSize line, it works great now! I'm sure there are some optimizations that can go into this, so do please critique/contribute! :-)

Note: I have only tried this on EF 6.1.3


I took some of the logic from Richard's answer and turned it into a method to truncate all strings of an entity framework object based on their max length, if they're limited.

public static void TruncateStringsInEFObject<T>(List<T> entityObjects, ObjectContext context)
{
    var stringMaxLengthsFromEdmx = context.MetadataWorkspace.GetItems(DataSpace.CSpace)
        .Where(m => m.BuiltInTypeKind == BuiltInTypeKind.EntityType)
        .SelectMany(meta => (meta as EntityType).Properties
            .Where(p => p.TypeUsage.EdmType.Name == "String"
                        && p.DeclaringType.Name == typeof(T).Name))
        .Select(d => new {MaxLength = d.TypeUsage.Facets["MaxLength"].Value, d.Name})
        .Where(d => d.MaxLength is int)
        .Select(d => new {d.Name, MaxLength = Convert.ToInt32(d.MaxLength)})
        .ToList();

    foreach (var maxLengthString in stringMaxLengthsFromEdmx)
    {
        var prop = typeof(T).GetProperty(maxLengthString.Name);
        if (prop == null) continue;

        foreach (var entityObject in entityObjects)
        {
            var currentValue = prop.GetValue(entityObject);
            var propAsString = currentValue as string;
            if (propAsString != null && propAsString.Length > maxLengthString.MaxLength)
            {
                prop.SetValue(entityObject, propAsString.Substring(0, maxLengthString.MaxLength));
            }
        }
    }


I used a slightly different tack, but also utilizing the On*Changed methods. I'm generating partial classes using a stripped down version of the .tt file used by EF. The relevant section is where the properties are generated. The Maximum length is available and can be used to truncate the string.

 foreach (EdmProperty property in 
         entity.Properties.Where(p => p.DeclaringType == entity 
         && p.TypeUsage.EdmType is PrimitiveType))
 {

        /// If this is a string implements its OnChanged method
        if (property.TypeUsage.ToString() != "Edm.String") continue;

        int maxLength = 0;

        if (property.TypeUsage.Facets["MaxLength"].Value == null) continue;

        if (!Int32.TryParse(property.TypeUsage.Facets["MaxLength"].Value.ToString(), 
            out maxLength)) continue;

        if (maxLength == 0) continue;
        // Implement the On*Changed method

        #>
        partial void On<#= property.Name#>Changed() {
            <#=code.FieldName(property)#> =#=code.FieldName(property)#>.Substring(0,<#= maxLength #>);

        } 
        <#
    } 


This approach uses the attributes of the object properties, so it works with EF or perhaps other scenarios. If a property has a "StringLength" attribute, it will be truncated.

// Truncate any string that is too long.
var entry = new MyObject(); // Entity Framework object
entry.GetType().GetProperties().ToList().ForEach(p =>
{
    foreach (StringLengthAttribute attribute in p.GetCustomAttributes(true)
        .Where(a => a is StringLengthAttribute).Cast<StringLengthAttribute>())
    {
        string value = (p.GetValue(entry) ?? "").ToString();
        if (value.Length > attribute.MaximumLength)
        {
            // oops. Its too Long, so truncate it.
            p.SetValue(entry, value.Substring(0, attribute.MaximumLength));
        }
    }
});

this tested correctly using this example property (due to StringLength)

[Required]
[StringLength(6)] // only 6, for testing
public string Message { get; set; }


I'm going to present a unique solution to this problem.

In my use-case I wanted greater control over which fields are auto-truncated.

So, I created an attribute to annotate the entity properties I want truncated:

/// <summary>
/// Indicates that the assigned string should be automatically truncated to the 
/// max length of the database column as specified by 
/// <see cref="System.ComponentModel.DataAnnotations.MaxLengthAttribute"/>
/// </summary>
public class AutoTruncateAttribute : Attribute { }

public class TestEntity
{
    [MaxLength(10), AutoTruncate]
    public string SomeProperty { get; set; }
}

Next, I needed a way to actually truncate the entity. I want this to happen automatically as a pre-process step when the entity is saved.

So, I override the DbContext.SaveAs() methods and added a stub method for my truncation functionality:

public class DbContext
{
    public DbContext() : base("name=DbContext") { }

    public DbSet<TestEntity> TestEntities { get; set; }

    public override int SaveChanges()
    {
        PreProcessEntities();
        return base.SaveChanges();
    }

    public override Task<int> SaveChangesAsync()
    {
        PreProcessEntities();
        return base.SaveChangesAsync();
    }

    public override Task<int> SaveChangesAsync(CancellationToken cancellationToken)
    {
        PreProcessEntities();
        return base.SaveChangesAsync(cancellationToken);
    }

    /// <summary>Process entities before save</summary>
    public void PreProcessEntities()
    {
        // my pre-processing functionality will go here...
    }
}

I could stop at this point and just copy the code from the other answers and paste it into the PreProcessEntities() method. Eg I could loop through the added and modified entities, get the entity type, find properties with the AutoTruncate, and MaxLength attributes, then truncate the property value if needed.

BUT! There are a couple of problems with this approach.

  • Lack of flexibility to add additional pre-processing. Eg In my case, I also want some string properties containing numerical values to be to be formatted in different ways.
  • Looping through all the entities and properties and finding these attributes every time I run SaveAs() must give a noticeable performance hit.

So I wondered if it possible to cache the actions to be performed for each entity type. Turns out... there is :). It takes a bit of code but is easy enough to follow and implement.

Firstly, I need a thread-safe singleton to cache the actions for each entity. Then I want a class or method that can analyse an entity class and its attributes. The result of the analysis will be a list of actions to be applied to instances of that type before being saved to the database.

I created three classes/interfaces:

  • IEntityProcessor is an interface that accepts an entity and performs some action on it
  • AutoTruncateProcessor implements IEntityProcessor and truncates the entity's properties which have been marked with AutoTruncate and MaxLength
  • EntityProcessorChecker analyses an entity for certain attributes such as AutoTruncate and produces a list of IEntityProcessor instances.
  • EntityProcessorCache maintains a thread-safe cache of IEntityProcessor for each entity.

Naming isn't my strong suit, hopefully you get the idea though.

First, I'll show you the implementation of `DbContext.PreProcessEntities().

This can be implemented either using generics or not.

The first approach is without generics. The drawback is that we are accessing the cache for every entity being added or modified. This which may increase the hit to performance.

// without generics 
public void PreProcessEntities()
{
    var entities = ChangeTracker.Entries<T>()
        .Where(e => e.State == EntityState.Added || e.State == EntityState.Modified)
        .ToList();

    foreach (var entity in entities)
    {
        // drawback: access cache for every entity
        var processors = EntityProcessorCache.GetProcessors(entity);
        foreach (var processor in processors)
        {
            processor.Process(entity.Entity);
        }
    }
}

The second approach uses generics. The drawback is we need to explicitly call our generic method with explicit types. Maybe there's a way around this using reflection. The benefit is the cache is accessed only once per type, and an early exit if there are no actions.

// using generics
public void PreProcessEntities()
{
    // drawback: we need to remember to explicitly add new entity types here,
    // or find a dynamic solution using reflection
    PreProcessEntities<TestEntity>();
}

public void PreProcessEntities<T>() where T : class
{
    // benefit: access cache at the start
    var processors = EntityProcessorCache.GetProcessors<T>();

    if (!processors.Any()) return; // benefit: early exit

    // benefit: only processing entities of the given type
    var entities = ChangeTracker.Entries<T>()
        .Where(e => e.State == EntityState.Added || e.State == EntityState.Modified)
        .ToList();

    foreach (var entity in entities)
    {
        foreach (var processor in processors)
        {
            processor.Process(entity.Entity);
        }
    }
}

Finally, here is the implementation for IEntityProcessor, AutoTruncateProcessor, EntityProcessorChecker, and EntityProcessorCache.

public interface IEntityProcessor
{
    void Process(object entity);
}
public class AutoTruncateProcessor : IEntityProcessor
{
    IDictionary<System.Reflection.PropertyInfo, int> PropertyMaxLengths { get; }

    public AutoTruncateProcessor(List<System.Reflection.PropertyInfo> autoTruncateProperties)
    {
        PropertyMaxLengths = new Dictionary<System.Reflection.PropertyInfo, int>();

        // pre-compute values for the properties that should be truncated
        foreach (var property in autoTruncateProperties)
        {
            var customAttributes = property.GetCustomAttributes(true);
            var maxLengthAttribute = customAttributes.FirstOrDefault(a => a is MaxLengthAttribute) as MaxLengthAttribute;
            var maxLength = maxLengthAttribute?.Length;
            if (maxLength.HasValue) PropertyMaxLengths.Add(property, maxLength.Value);
        }
    }

    public void Process(object entity)
    {
        // use the pre-compute values to process entity
        foreach (var kv in PropertyMaxLengths)
        {
            var property = kv.Key;
            var maxLength = kv.Value;
            var currentValue = property.GetValue(entity) as string;

            // exit early
            if (string.IsNullOrEmpty(currentValue )) return;
            if (currentValue .Length < maxLength) return;

            var newValue = str.Substring(0, maxLength);
            property.SetValue(entity, newValue);
        }
    }
}

The EntityProcessorChecker looks through the properties of the given type and instantiates subclasses of IEntityProcessor when it finds matching attributes. This is where you customise the functionality based on your own applications needs. In my implementation I want to look for AutoTruncate and NumberFormat attributes.

public class EntityProcessorChecker
{
    IList<IEntityProcessor> Processors { get; }

    public EntityProcessorChecker(Type type)
    {
        Processors = new List<IEntityProcessor>();

        var properties = type.GetProperties();

        // get properties where there is an AutoTruncateAttribute
        var autoTruncateProperties = properties.Where(p => p.GetCustomAttributes(true).Any(a => a is AutoTruncateAttribute)).ToList();
        if (autoTruncateProperties.Any()) Processors.Add(new AutoTruncateProcessor(autoTruncateProperties));

        // get properties where there is a number formatter
        var formatterProperties = properties.Where(p => p.GetCustomAttributes(true).Any(a => a is NumberFormatAttribute)).ToList();
        // TODO: add this processor
    }

    public IList<IEntityProcessor> GetProcessors() => Processors;
}

// Some people may want a generic version of this class
public class EntityProcessorChecker<T> : EntityProcessorChecker
{
    public EntityProcessorChecker() : base(typeof(T)) { }
}

The EntityProcessorCache means we don't need to continually analyse entity types for the actions we need to perform on their instances. It's an optimisation step. The singleton design was inspired by this article.

// sealed so that we can ensure it remains a singleton
public sealed class EntityProcessorCache
{
    // private concurrent dictionary for thread safety
    private static readonly ConcurrentDictionary<Type, IList<IEntityProcessor>> Cache = new ConcurrentDictionary<Type, IList<IEntityProcessor>>();

    // a lock for when we intend to write to the cache
    private static readonly object CacheWriteLock = new object();

    // Explicit static constructor to tell C# compiler
    // not to mark type as `beforefieldinit`
    static EntityProcessorCache() { }

    // the only way to access the cache
    public static IList<IEntityProcessor> GetProcessors(Type type)
    {
        // return early if cache is populated
        if (Cache.ContainsKey(type)) return Cache[type];

        // lock cache writing writing
        lock (CacheWriteLock)
        {
            // another caller may have locked before this call
            // return early if cache is now populated
            if (Cache.ContainsKey(type)) return Cache[type];

            // analyse the type and cache the list of `IEntityProcessor`
            var checker = new EntityProcessorChecker(type);
            var processors = checker.GetProcessors();
            Cache[type] = processors;

            return processors;
        }
    }

    // alternatively, pass in a typed instance
    public static IList<IEntityProcessor> GetProcessors(object obj) => GetProcessors(obj.GetType());

    // alternatively, use generics
    public static IList<IEntityProcessor> GetProcessors<T>() => GetProcessors(typeof(T));
}

Wow that was a lot of code and explanation!

Hopefully you can see some benefits in this approach.

I like that I can quickly and easily create new attributes to mark up my entity properties and processers to modify my entities before save.

There is of course plenty of room for improvements:

EntityProcessorChecker should probably be refactored to make the analysis process more extensible and explicitly obvious when looking at an entity. It should be easy to define and check for new attributes and create corresponding processors.

EntityProcessorCache was designed the way it was because of how the DbContext is built in .Net Framework. In .Net Core, I presume we could use the built-in DI system to create and manage the singleton instance instead. As long as the DbContext is able to access it, I assume the cache would be passed in via the constructor.

0

精彩评论

暂无评论...
验证码 换一张
取 消