In an C#-4.0 application, I have a Dictionary of strongly typed ILists having the same length - a dynamically strongly typed column based table. I want the user to provide one or more (python-)expressions based on the available columns that will be aggregated over all rows. In a static context it would be:
IDictionary<string, IList> table;
// ...
IList<int> a = table["a"] as IList<int>;
IList<int> b = table["b"] as IList<int>;
double sum = 0;
for (int i = 0; i < n; i++)
sum += (double)a[i] / b[i]; // Expression to sum up
For n = 10^7 this runs in 0.270 sec on my laptop (win7 x64). Replacing the expression by a delegate with two int arguments it takes 0.580 sec, for a nontyped delegate 1.19 sec. Creating the delegate from IronPython with
IDictionary<string, IList> table;
// ...
var options = new Dictionary<string, object>();
options["DivisionOptions"] = PythonDivisionOptions.New;
var engine = Python.CreateEngine(options);
string expr = "a / b";
Func<int, int, double> f = engine.Execute("lambda a, b : " + expr);
IList<int> a = table["a"] as IList<int>;
IList<int> b = table["b"] as IList<int>;
double sum = 0;
for (int i = 0; i < n; i++)
sum += f(a[i], b[i]);
it takes 3.2 sec (and 5.1 sec with Func<object, object, object>
) - factor 4 to 5.5. Is this the expected overhead for what I'm doing? What could be improved?
If I have many columns, the approach chosen above will not be sufficient any more. One solution could be to determine the required columns for each expression and use only those as arguments. The other solution I've unsuccessfully tried was using a ScriptScope and dynamically resolve the columns. For that I defined a RowIterator that has a RowIndex for the active row and a property for each column.
class RowIterator
{
IList<int> la;
IList<int> lb;
public RowIterator(IList<int> a, IList<int> b)
{
this.la = a;
this.lb = b;
}
public int Ro开发者_StackOverflow社区wIndex { get; set; }
public int a { get { return la[RowIndex]; } }
public int b { get { return lb[RowIndex]; } }
}
A ScriptScope can be created from a IDynamicMetaObjectProvider, which I expected to be implemented by C#'s dynamic - but at runtime engine.CreateScope(IDictionary) is trying to be called, which fails.
dynamic iterator = new RowIterator(a, b) as dynamic;
var scope = engine.CreateScope(iterator);
var expr = engine.CreateScriptSourceFromString("a / b").Compile();
double sum = 0;
for (int i = 0; i < n; i++)
{
iterator.Index = i;
sum += expr.Execute<double>(scope);
}
Next I tried to let RowIterator inherit from DynamicObject and made it to a running example - with terrible performance: 158 sec.
class DynamicRowIterator : DynamicObject
{
Dictionary<string, object> members = new Dictionary<string, object>();
IList<int> la;
IList<int> lb;
public DynamicRowIterator(IList<int> a, IList<int> b)
{
this.la = a;
this.lb = b;
}
public int RowIndex { get; set; }
public int a { get { return la[RowIndex]; } }
public int b { get { return lb[RowIndex]; } }
public override bool TryGetMember(GetMemberBinder binder, out object result)
{
if (binder.Name == "a") // Why does this happen?
{
result = this.a;
return true;
}
if (binder.Name == "b")
{
result = this.b;
return true;
}
if (base.TryGetMember(binder, out result))
return true;
if (members.TryGetValue(binder.Name, out result))
return true;
return false;
}
public override bool TrySetMember(SetMemberBinder binder, object value)
{
if (base.TrySetMember(binder, value))
return true;
members[binder.Name] = value;
return true;
}
}
I was surprised that TryGetMember is called with the name of the properties. From the documentation I would have expected that TryGetMember would only be called for undefined properties.
Probably for a sensible performance I would need to implement IDynamicMetaObjectProvider for my RowIterator to make use of dynamic CallSites, but couldn't find a suited example for me to start with. In my experiments I didn't know how to handle __builtins__
in BindGetMember:
class Iterator : IDynamicMetaObjectProvider
{
IList<int> la;
IList<int> lb;
public Iterator(IList<int> a, IList<int> b)
{
this.la = a;
this.lb = b;
}
public int RowIndex { get; set; }
public int a { get { return la[RowIndex]; } }
public int b { get { return lb[RowIndex]; } }
public DynamicMetaObject GetMetaObject(Expression parameter)
{
return new MetaObject(parameter, this);
}
private class MetaObject : DynamicMetaObject
{
internal MetaObject(Expression parameter, Iterator self)
: base(parameter, BindingRestrictions.Empty, self) { }
public override DynamicMetaObject BindGetMember(GetMemberBinder binder)
{
switch (binder.Name)
{
case "a":
case "b":
Type type = typeof(Iterator);
string methodName = binder.Name;
Expression[] parameters = new Expression[]
{
Expression.Constant(binder.Name)
};
return new DynamicMetaObject(
Expression.Call(
Expression.Convert(Expression, LimitType),
type.GetMethod(methodName),
parameters),
BindingRestrictions.GetTypeRestriction(Expression, LimitType));
default:
return base.BindGetMember(binder);
}
}
}
}
I'm sure my code above is suboptimal, at least it doesn't handle the IDictionary of columns yet. I would be grateful for any advices on how to improve design and/or performance.
I also compared the performance of IronPython against a C# implementation. The expression is simple, just adding the values of two arrays at a specified index. Accessing the arrays directly provides the base line and theoretical optimum. Accessing the values via a symbol dictionary has still acceptable performance.
The third test creates a delegate from a naive (and bad by intend) expression tree without any fancy stuff like call-side caching, but it's still faster than IronPython.
Scripting the expression via IronPython takes the most time. My profiler shows me that most time is spent in PythonOps.GetVariable, PythonDictionary.TryGetValue and PythonOps.TryGetBoundAttr. I think there's room for improvement.
Timings:
- Direct: 00:00:00.0052680
- via Dictionary: 00:00:00.5577922
- Compiled Delegate: 00:00:03.2733377
- Scripted: 00:00:09.0485515
Here's the code:
public static void PythonBenchmark()
{
var engine = Python.CreateEngine();
int iterations = 1000;
int count = 10000;
int[] a = Enumerable.Range(0, count).ToArray();
int[] b = Enumerable.Range(0, count).ToArray();
Dictionary<string, object> symbols = new Dictionary<string, object> { { "a", a }, { "b", b } };
Func<int, object> calculate = engine.Execute("lambda i: a[i] + b[i]", engine.CreateScope(symbols));
var sw = Stopwatch.StartNew();
int sum = 0;
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += a[i] + b[i];
}
}
Console.WriteLine("Direct: " + sw.Elapsed);
sw.Restart();
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += ((int[])symbols["a"])[i] + ((int[])symbols["b"])[i];
}
}
Console.WriteLine("via Dictionary: " + sw.Elapsed);
var indexExpression = Expression.Parameter(typeof(int), "index");
var indexerMethod = typeof(IList<int>).GetMethod("get_Item");
var lookupMethod = typeof(IDictionary<string, object>).GetMethod("get_Item");
Func<string, Expression> getSymbolExpression = symbol => Expression.Call(Expression.Constant(symbols), lookupMethod, Expression.Constant(symbol));
var addExpression = Expression.Add(
Expression.Call(Expression.Convert(getSymbolExpression("a"), typeof(IList<int>)), indexerMethod, indexExpression),
Expression.Call(Expression.Convert(getSymbolExpression("b"), typeof(IList<int>)), indexerMethod, indexExpression));
var compiledFunc = Expression.Lambda<Func<int, object>>(Expression.Convert(addExpression, typeof(object)), indexExpression).Compile();
sw.Restart();
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += (int)compiledFunc(i);
}
}
Console.WriteLine("Compiled Delegate: " + sw.Elapsed);
sw.Restart();
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += (int)calculate(i);
}
}
Console.WriteLine("Scripted: " + sw.Elapsed);
Console.WriteLine(sum); // make sure cannot be optimized away
}
Although I don't know all the specific details in your case, a slowdown of only 5x for doing anything this low level in IronPython is actually pretty good. Most entries in the Computer Languages Benchmark Game show a 10-30x slowdown.
A major part of the reason is that IronPython has to allow for the possibility that you've done something sneaky at runtime, and thus can't produce code of the same efficiency.
精彩评论