Need a Better Way Than Reflection

Developer (https://www.devze.com), 2022-12-16 15:01. Source: web
I'm reading a CSV file and the records are recorded as a string[]. I want to take each record and convert it into a custom object.

T GetMyObject<T>();

Currently I'm doing this through reflection, which is really slow. I'm testing with a 515 MB file containing several million records. Parsing takes under 10 seconds. Creating the custom objects with manual conversions using Convert.ToSomeType takes under 20 seconds, but doing the conversion through reflection takes around 4 minutes.

What is a good way to handle this automatically?

It seems a lot of time is spent in the PropertyInfo.SetValue method. I tried caching each property's setter MethodInfo and invoking that instead, but it was actually slower.

I have also tried converting the setter into a delegate, as the great Jon Skeet suggested here: "Improving performance reflection, what alternatives should I consider", but the problem is that I don't know the property type ahead of time. I'm able to get the delegate:

var myObject = Activator.CreateInstance<T>();
foreach( var property in typeof( T ).GetProperties() )
{
    // Builds an Action<T, TProperty> for the setter; the result is only typed as Delegate.
    var d = Delegate.CreateDelegate(
        typeof( Action<,> ).MakeGenericType( typeof( T ), property.PropertyType ),
        property.GetSetMethod() );
}

The problem here is that I can't cast the delegate to a concrete type like Action<T, int>, because the property type (int, in this case) isn't known ahead of time.
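One possible way around that cast is sketched below, under assumptions: a generic helper method is closed over both T and the property type at runtime with MakeGenericMethod, so the strongly typed Action<T, TProp> is created inside code where TProp is known, and only an Action<T, object> wrapper is handed back. The names TypedSetter, Create, and Wrap, and the small Entity class, are hypothetical and used only for illustration.

```csharp
using System;
using System.Reflection;

class Entity
{
    public string Name { get; set; }
    public int Id { get; set; }
}

static class TypedSetter
{
    // Called once per property; uses reflection only to pick the generic arguments.
    public static Action<T, object> Create<T>(PropertyInfo property) where T : class
    {
        MethodInfo helper = typeof(TypedSetter)
            .GetMethod("Wrap", BindingFlags.Static | BindingFlags.NonPublic)
            .MakeGenericMethod(typeof(T), property.PropertyType);
        return (Action<T, object>)helper.Invoke(null, new object[] { property.GetSetMethod() });
    }

    // Inside this method TProp is a real type, so the cast to Action<T, TProp> compiles.
    static Action<T, object> Wrap<T, TProp>(MethodInfo setter) where T : class
    {
        Action<T, TProp> typed =
            (Action<T, TProp>)Delegate.CreateDelegate(typeof(Action<T, TProp>), setter);
        // The wrapper casts (and unboxes, for value types) the value on each call.
        return (target, value) => typed(target, (TProp)value);
    }
}
```

With this sketch, `TypedSetter.Create<Entity>(typeof(Entity).GetProperty("Id"))` returns a delegate that can be cached per property and called as `setId(entity, 42)`. The value still has to be converted from string to the property type before it is passed in.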


The first thing I'd say is to write the conversion by hand for a sample type, so that you know the absolute best case you can expect; that tells you whether your current code is worth fixing.
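A hand-written baseline along those lines might look like the following minimal sketch. It assumes a record with the columns Id, DateOfBirth, and Name in that order, and the class name Baseline and its methods are hypothetical. Timing it with Stopwatch gives a lower bound to compare any automatic mapper against.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

class Entity
{
    public string Name { get; set; }
    public DateTime DateOfBirth { get; set; }
    public int Id { get; set; }
}

static class Baseline
{
    // Manual conversion with no reflection: assumes columns Id, DateOfBirth, Name.
    public static Entity FromRecord(string[] record)
    {
        return new Entity
        {
            Id = int.Parse(record[0], CultureInfo.InvariantCulture),
            DateOfBirth = DateTime.Parse(record[1], CultureInfo.InvariantCulture),
            Name = record[2]
        };
    }

    // Converts the same sample record repeatedly and returns the elapsed time.
    public static TimeSpan Time(int iterations)
    {
        string[] record = { "42", "01/15/1980 00:00:00", "Fred" };
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            FromRecord(record);
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Calling `Baseline.Time(1000000)` and comparing the result with the reflection-based path over the same number of records shows how much room for improvement there really is.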

If you are using PropertyInfo.SetValue etc., then you can absolutely make it quicker, even when working with just object. HyperDescriptor might be a good start (it is significantly faster than raw reflection, without making the code any more complicated).

For optimal performance, dynamic IL methods (compiled once and then reused) are the way to go; in .NET 2.0/3.0 that probably means DynamicMethod, but in 3.5 I'd favor Expression (with Compile()). Let me know if you want more detail.
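As a rough illustration of the Expression route, the sketch below compiles one setter delegate per property. It is not taken from the answer: the names SetterFactory and Create and the small Entity class are hypothetical. It calls the setter through Expression.Call rather than Expression.Assign, so it stays within the expression API available in .NET 3.5.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

class Entity
{
    public string Name { get; set; }
    public int Id { get; set; }
}

static class SetterFactory
{
    // Builds and compiles (target, value) => target.set_Property((TProp)value) once.
    public static Action<T, object> Create<T>(PropertyInfo property) where T : class
    {
        ParameterExpression target = Expression.Parameter(typeof(T), "target");
        ParameterExpression value = Expression.Parameter(typeof(object), "value");
        MethodCallExpression body = Expression.Call(
            target,
            property.GetSetMethod(),
            // Cast or unbox the object argument to the property's type.
            Expression.Convert(value, property.PropertyType));
        return Expression.Lambda<Action<T, object>>(body, target, value).Compile();
    }
}
```

The compiled delegate should be cached (for example, in a dictionary keyed by property name), because Compile() is the expensive step; calling the delegate afterwards costs roughly the same as a direct call plus the cast.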


Here is an implementation using Expression and CsvReader that uses the column headers to provide the mapping (it invents some sample data to read). It returns IEnumerable<T> to avoid having to buffer the data, since you seem to have quite a lot of it:

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using LumenWorks.Framework.IO.Csv;
class Entity
{
    public string Name { get; set; }
    public DateTime DateOfBirth { get; set; }
    public int Id { get; set; }

}
static class Program {

    static void Main()
    {
        string path = "data.csv";
        InventData(path);

        int count = 0;
        foreach (Entity obj in Read<Entity>(path))
        {
            count++;
        }
        Console.WriteLine(count);
    }
    static IEnumerable<T> Read<T>(string path)
        where T : class, new()
    {
        using (TextReader source = File.OpenText(path))
        using (CsvReader reader = new CsvReader(source,true,delimiter)) {

            string[] headers = reader.GetFieldHeaders();
            Type type = typeof(T);
            List<MemberBinding> bindings = new List<MemberBinding>();
            ParameterExpression param = Expression.Parameter(typeof(CsvReader), "row");
            MethodInfo method = typeof(CsvReader).GetProperty("Item",new [] {typeof(int)}).GetGetMethod();
            Expression invariantCulture = Expression.Constant(
                CultureInfo.InvariantCulture, typeof(IFormatProvider));
            for(int i = 0 ; i < headers.Length ; i++) {
                MemberInfo member = type.GetMember(headers[i]).Single();
                Type finalType;
                switch (member.MemberType)
                {
                    case MemberTypes.Field: finalType = ((FieldInfo)member).FieldType; break;
                    case MemberTypes.Property: finalType = ((PropertyInfo)member).PropertyType; break;
                    default: throw new NotSupportedException();
                }
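                // row[i]: read column i of the current record as a string.
                Expression val = Expression.Call(
                    param, method, Expression.Constant(i, typeof(int)));
                if (finalType != typeof(string))
                {
                    // finalType.Parse(row[i], CultureInfo.InvariantCulture)
                    val = Expression.Call(
                        finalType, "Parse", null, val, invariantCulture);
                }
                // member = val, as one entry of the object initializer.
                bindings.Add(Expression.Bind(member, val));
            }

            // new T { Member1 = ..., Member2 = ... }
            Expression body = Expression.MemberInit(
                Expression.New(type), bindings);

            // Compiled once per file, then reused for every row.
            Func<CsvReader, T> func = Expression.Lambda<Func<CsvReader, T>>(body, param).Compile();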
            while (reader.ReadNextRecord()) {
                yield return func(reader);
            }
        }
    }
    const char delimiter = '\t';
    static void InventData(string path)
    {
        Random rand = new Random(123456);
        using (TextWriter dest = File.CreateText(path))
        {
            dest.WriteLine("Id" + delimiter + "DateOfBirth" + delimiter + "Name");
            for (int i = 0; i < 10000; i++)
            {
                dest.Write(rand.Next(5000000));
                dest.Write(delimiter);
                dest.Write(new DateTime(
                    rand.Next(1960, 2010),
                    rand.Next(1, 13),
                    rand.Next(1, 28)).ToString(CultureInfo.InvariantCulture));
                dest.Write(delimiter);
                dest.Write("Fred");
                dest.WriteLine();
            }
            dest.Close();
        }
    }
}

Second version (see comments) that uses TypeConverter rather than Parse:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using LumenWorks.Framework.IO.Csv;
class Entity
{
    public string Name { get; set; }
    public DateTime DateOfBirth { get; set; }
    public int Id { get; set; }

}
static class Program
{

    static void Main()
    {
        string path = "data.csv";
        InventData(path);

        int count = 0;
        foreach (Entity obj in Read<Entity>(path))
        {
            count++;
        }
        Console.WriteLine(count);
    }
    static IEnumerable<T> Read<T>(string path)
        where T : class, new()
    {
        using (TextReader source = File.OpenText(path))
        using (CsvReader reader = new CsvReader(source, true, delimiter))
        {

            string[] headers = reader.GetFieldHeaders();
            Type type = typeof(T);
            List<MemberBinding> bindings = new List<MemberBinding>();
            ParameterExpression param = Expression.Parameter(typeof(CsvReader), "row");
            MethodInfo method = typeof(CsvReader).GetProperty("Item", new[] { typeof(int) }).GetGetMethod();

            var converters = new Dictionary<Type, ConstantExpression>();
            for (int i = 0; i < headers.Length; i++)
            {
                MemberInfo member = type.GetMember(headers[i]).Single();
                Type finalType;
                switch (member.MemberType)
                {
                    case MemberTypes.Field: finalType = ((FieldInfo)member).FieldType; break;
                    case MemberTypes.Property: finalType = ((PropertyInfo)member).PropertyType; break;
                    default: throw new NotSupportedException();
                }
                Expression val = Expression.Call(
                    param, method, Expression.Constant(i, typeof(int)));
                if (finalType != typeof(string))
                {
                    // One TypeConverter per target type, shared by all columns of that type.
                    ConstantExpression converter;
                    if (!converters.TryGetValue(finalType, out converter))
                    {
                        converter = Expression.Constant(TypeDescriptor.GetConverter(finalType));
                        converters.Add(finalType, converter);
                    }
                    // (finalType)converter.ConvertFromInvariantString(row[i])
                    val = Expression.Convert(Expression.Call(converter, "ConvertFromInvariantString", null, val),
                        finalType);
                }
                bindings.Add(Expression.Bind(member, val));
            }

            Expression body = Expression.MemberInit(
                Expression.New(type), bindings);

            Func<CsvReader, T> func = Expression.Lambda<Func<CsvReader, T>>(body, param).Compile();
            while (reader.ReadNextRecord())
            {
                yield return func(reader);
            }
        }
    }
    const char delimiter = '\t';
    static void InventData(string path)
    {
        Random rand = new Random(123456);
        using (TextWriter dest = File.CreateText(path))
        {
            dest.WriteLine("Id" + delimiter + "DateOfBirth" + delimiter + "Name");
            for (int i = 0; i < 10000; i++)
            {
                dest.Write(rand.Next(5000000));
                dest.Write(delimiter);
                dest.Write(new DateTime(
                    rand.Next(1960, 2010),
                    rand.Next(1, 13),
                    rand.Next(1, 28)).ToString(CultureInfo.InvariantCulture));
                dest.Write(delimiter);
                dest.Write("Fred");
                dest.WriteLine();
            }
            dest.Close();
        }
    }
}


You should make a DynamicMethod or an expression tree and build statically typed code at runtime.

This incurs a fairly large one-time setup cost, but no per-object reflection overhead.
However, it is somewhat difficult to do and results in complicated code that is hard to debug.
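To give a sense of what the DynamicMethod route involves, here is a minimal sketch that emits a property setter as IL. The names EmittedSetter and Create and the small Entity class are hypothetical, and the sketch assumes T is a reference type with a public setter on the property.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

class Entity
{
    public string Name { get; set; }
    public int Id { get; set; }
}

static class EmittedSetter
{
    // Emits: void Set_X(T target, object value) { target.X = (TProp)value; }
    public static Action<T, object> Create<T>(PropertyInfo property) where T : class
    {
        DynamicMethod method = new DynamicMethod(
            "Set_" + property.Name,
            typeof(void),
            new[] { typeof(T), typeof(object) },
            typeof(T).Module,
            true); // skip visibility checks
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                           // push target
        il.Emit(OpCodes.Ldarg_1);                           // push value (as object)
        il.Emit(OpCodes.Unbox_Any, property.PropertyType);  // unbox, or cast for reference types
        il.Emit(OpCodes.Callvirt, property.GetSetMethod()); // call the property setter
        il.Emit(OpCodes.Ret);
        return (Action<T, object>)method.CreateDelegate(typeof(Action<T, object>));
    }
}
```

As with the other approaches, the delegate is created once per property and cached. A mistake in the emitted IL typically surfaces only at runtime (for example, as an InvalidProgramException), which is part of why this style is harder to debug than an expression tree.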
