I have a Message object which wraps a message format I do not have control over. The format is a simple list of key/value pairs. I want to extract a list of Users from a given Message. For example, given the following message:
200->....
300->....
....
405->....
001->first_user_name
002->first_user_phone
003->first_user_fax
001->second_user_name
001->third_user_name
002->third_user_phone
003->third_user_fax
004->third_user_address
.....
001->last_user_name
003->last_user_fax
I want to extract four Users with the provided properties set. The initial keys 200/300....405 represent fields I don't need and can skip to get to the User data.
Each user's data is in consecutive fields, but the number of fields varies depending on how much information is known about a user. The following method does what I'm looking for. It uses an enumeration of possible key types and a method to find the index of the first field with user data.
private List<User> ParseUsers( Message message )
{
    List<User> users = new List<User>( );
    User user = null; String val = String.Empty;
    for( Int32 i = message.IndexOfFirst( Keys.Name ); i < message.Count; i++ )
    {
        val = message[ i ].Val;
        switch( message[ i ].Key )
        {
            case Keys.Name:
                user = new User( val );
                users.Add( user );
                break;
            case Keys.Phone:
                user.Phone = val;
                break;
            case Keys.Fax:
                user.Fax = val;
                break;
            case Keys.Address:
                user.Address = val;
                break;
            default:
                break;
        }
    }
    return users;
}
I'm wondering if it's possible to replace the method with a LINQ query. I'm having trouble telling LINQ to select a new user and populate its fields with all matching data until it finds the start of the next user entry.
Note: Relative key numbers are random (not 1,2,3,4) in the real message format.
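For anyone who wants to run ParseUsers outside the real code base, here is a rough sketch of stand-in types. Everything in it is an assumption for illustration: the members of `Keys`, the `Field` class, the `Message` constructor, and in particular the choice that `IndexOfFirst` returns `Count` when no field matches. The real wrapper is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key enumeration; the real key numbers are not known.
public enum Keys { Other, Name, Phone, Fax, Address }

// Hypothetical key/value entry, mirroring message[i].Key and message[i].Val.
public sealed class Field
{
    public Field(Keys key, string val) { Key = key; Val = val; }
    public Keys Key { get; }
    public string Val { get; }
}

public sealed class User
{
    public User(string name) { Name = name; }
    public string Name { get; }
    public string Phone { get; set; }
    public string Fax { get; set; }
    public string Address { get; set; }
}

// Hypothetical stand-in for the Message wrapper.
public sealed class Message
{
    private readonly List<Field> fields;

    public Message(IEnumerable<Field> fields) { this.fields = new List<Field>(fields); }

    public int Count => fields.Count;
    public Field this[int i] => fields[i];

    // Assumed behaviour: returns Count when no field has the key,
    // so a loop starting at this index simply does not run.
    public int IndexOfFirst(Keys key)
    {
        int i = fields.FindIndex(f => f.Key == key);
        return i < 0 ? Count : i;
    }
}
```

With these in place, a Message built from a few Field instances can be passed straight to ParseUsers.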
I don't see the benefit in changing your code to a LINQ query, but it's definitely possible:
private List<User> ParseUsers(Message message)
{
    return Enumerable
        .Range(0, message.Count)
        .Select(i => message[i])
        .SkipWhile(x => x.Key != Keys.Name)
        .GroupAdjacent((g, x) => x.Key != Keys.Name)
        .Select(g => g.ToDictionary(x => x.Key, x => x.Val))
        .Select(d => new User(d[Keys.Name])
        {
            Phone = d.ContainsKey(Keys.Phone) ? d[Keys.Phone] : null,
            Fax = d.ContainsKey(Keys.Fax) ? d[Keys.Fax] : null,
            Address = d.ContainsKey(Keys.Address) ? d[Keys.Address] : null,
        })
        .ToList();
}
This relies on the following extension method:
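static IEnumerable<IEnumerable<T>> GroupAdjacent<T>(
    this IEnumerable<T> source, Func<IEnumerable<T>, T, bool> adjacent)
{
    var g = new List<T>();
    foreach (var x in source)
    {
        // Close the current group when x does not belong to it.
        if (g.Count != 0 && !adjacent(g, x))
        {
            yield return g;
            g = new List<T>();
        }
        g.Add(x);
    }
    // Skip the final yield for an empty source, so no empty group is emitted.
    if (g.Count != 0)
        yield return g;
}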
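To see what the grouping step produces on its own, here is a small sketch that runs it over a cut-down version of the sample message. The helper is copied in (with a guard against emitting an empty group) so the snippet compiles by itself; the string keys and the choice of "001" as the name key are taken from the sample data and are only illustrative, since the real key numbers differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupAdjacentDemo
{
    // Copy of the helper so the snippet compiles on its own.
    public static IEnumerable<IEnumerable<T>> GroupAdjacent<T>(
        this IEnumerable<T> source, Func<IEnumerable<T>, T, bool> adjacent)
    {
        var g = new List<T>();
        foreach (var x in source)
        {
            if (g.Count != 0 && !adjacent(g, x))
            {
                yield return g;
                g = new List<T>();
            }
            g.Add(x);
        }
        if (g.Count != 0)
            yield return g;
    }

    // Hypothetical sample: "001" marks the start of a user.
    public static List<string> SampleGroups()
    {
        var fields = new[]
        {
            new KeyValuePair<string, string>("405", "skipped"),
            new KeyValuePair<string, string>("001", "first_user_name"),
            new KeyValuePair<string, string>("002", "first_user_phone"),
            new KeyValuePair<string, string>("001", "second_user_name"),
            new KeyValuePair<string, string>("001", "third_user_name"),
            new KeyValuePair<string, string>("003", "third_user_fax"),
        };

        return fields
            .SkipWhile(f => f.Key != "001")            // drop the leading non-user fields
            .GroupAdjacent((g, f) => f.Key != "001")   // a name key starts a new group
            .Select(g => string.Join(",", g.Select(f => f.Value)))
            .ToList();
    }
}
```

SampleGroups() returns three strings, one per user, each joining that user's values in their original order.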
No. In general, most LINQ operators, like SQL queries, are designed around unordered data: they make no assumptions about the order of the incoming elements, which leaves room for parallelization and similar optimizations. Your data has an intrinsic order, so it does not fit the query model well.
How about splitting the message into a List<List<KeyValuePair<int, string>>>, where each List<KeyValuePair<int, string>> represents a single user? You could then do something like:
// SplitToUserLists would need a sensible implementation.
List<List<KeyValuePair<int,string>>> splitMessage = message.SplitToUserLists();
IEnumerable<User> users = splitMessage.Select(ConstructUser);
With
private User ConstructUser(List<KeyValuePair<int, string>> userList)
{
    // The accumulator lambda must return the User; assumes User exposes an indexer by key.
    return userList.Aggregate(new User(),
        (user, kvp) => { user[kvp.Key] = kvp.Value; return user; });
}
I don't think there is any performance benefit, but it increases readability a lot in my opinion.
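For SplitToUserLists, one possible sketch is below. It is written as an extension method over a plain sequence of pairs rather than over Message, and it takes the name key as a parameter; both choices, and the name `MessageSplitting`, are assumptions made so the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MessageSplitting
{
    // Hypothetical helper: splits a flat field sequence into one list per user.
    // A new list starts at every field whose key equals nameKey; fields that
    // appear before the first name key are dropped.
    public static List<List<KeyValuePair<int, string>>> SplitToUserLists(
        this IEnumerable<KeyValuePair<int, string>> fields, int nameKey)
    {
        var result = new List<List<KeyValuePair<int, string>>>();
        foreach (var field in fields)
        {
            if (field.Key == nameKey)
                result.Add(new List<KeyValuePair<int, string>>());

            // Only collect once the first user has started.
            if (result.Count > 0)
                result[result.Count - 1].Add(field);
        }
        return result;
    }
}
```

Because the split is a single pass with explicit state, it keeps the input order and handles users with any number of fields.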
A possible solution could look like this:
var data = File.ReadAllLines("data.txt")
    .Select(line => line.Split(new[] {"->"}, StringSplitOptions.RemoveEmptyEntries))
    .GroupByOrder(ele => ele[0]);
The real work happens in GroupByOrder, which is an extension method.
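public static IEnumerable<IEnumerable<T>> GroupByOrder<T, K>(this IEnumerable<T> source, Func<T, K> keySelector) where K : IComparable {
    var prevKey = default(K);
    var captured = new List<T>();
    foreach (var curr in source) {
        var key = keySelector(curr);
        // Start a new group when the key stops increasing; the Count check
        // keeps the first element from emitting an empty group.
        if (captured.Count > 0 && key.CompareTo(prevKey) <= 0) {
            yield return captured;
            captured = new List<T>();
        }
        captured.Add(curr);
        prevKey = key; // remember the key for the next comparison
    }
    if (captured.Count > 0) yield return captured;
}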
(Disclaimer: idea stolen from Tomas Petricek)
Your sample data yields the following groups, which now just have to be parsed into your User object.
User:
first_user_name
first_user_phone
first_user_fax
User:
second_user_name
User:
third_user_name
third_user_phone
third_user_fax
third_user_address
User:
last_user_name
last_user_fax
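Turning one of those groups into an object could look roughly like the sketch below. The `ParsedUser` class and the mapping of "001" to "004" onto name, phone, fax and address are assumptions read off the sample data; the real key numbers are different and would need to be substituted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical user shape; the real User class is not shown in the thread.
public sealed class ParsedUser
{
    public string Name { get; set; }
    public string Phone { get; set; }
    public string Fax { get; set; }
    public string Address { get; set; }
}

public static class GroupParsing
{
    // Each element is the [key, value] array produced by line.Split(...).
    public static ParsedUser ToUser(IEnumerable<string[]> group)
    {
        var user = new ParsedUser();
        foreach (var pair in group)
        {
            // Lines without a value part cannot be mapped; skip them.
            if (pair.Length < 2) continue;

            switch (pair[0])
            {
                case "001": user.Name = pair[1]; break;
                case "002": user.Phone = pair[1]; break;
                case "003": user.Fax = pair[1]; break;
                case "004": user.Address = pair[1]; break;
                // Any other key is ignored.
            }
        }
        return user;
    }
}
```

Applied to the grouped data, this would be something like data.Select(GroupParsing.ToUser), with any group that has no name key filtered out first.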