Nested LINQ Query Question

Developer (https://www.devze.com), 2023-02-14 18:06. Source: Internet
I ran into an issue today and I have been stumped for some time in trying to get the results I am searching for.

I currently have a class that resembles the following:

public class InstanceInformation
{
     public string PatientID {get; set;}
     public string StudyID {get; set;}
     public string SeriesID {get; set;}
     public string InstanceID {get; set;}
}

I have a List<InstanceInformation>, and I am trying to use LINQ (or any other means) to generate paths (for a file directory) from this list that resemble the following:

PatientID/StudyID/SeriesID/InstanceID

My issue is that the data arrives unstructured, as the flat List mentioned above, and I need a way to group all of it under the following constraints:

  • Group InstanceIDs by SeriesID
  • Group SeriesIDs by StudyID
  • Group StudyIDs by PatientID

I currently have something that resembles this:

var groups = from instance in instances
             group instance by instance.PatientID into patientGroups
             from studyGroups in
                 (from instance in patientGroups
                   group instance by instance.StudyID)
                   from seriesGroup in
                       (from instance in studyGroups
                        group instance by instance.SeriesID)
                            from instanceGroup in
                                 (from instance in seriesGroup
                                  group instance by instance.InstanceID)
             group instanceGroup by patientGroups.Key;

which just groups all of my InstanceIDs by PatientID, and it's quite hard to cull through all of the data after this massive grouping to see if the areas in between (StudyID/SeriesID) are being lost. Any other methods of solving this issue would be more than welcome.

This is primarily just for grouping the objects, as I would then need to iterate through them (using a foreach).


I have no idea if the query you've come up with is the query you actually want or need, but assuming that it is, let's consider the question of whether there is a better way to write it.

The place you want to look is section 7.16.2.1 of the C# 4 specification, a portion of which I quote here for your convenience:


A query expression with a continuation

from ... into x ...

is translated into

from x in ( from ... ) ...

Is that clear? Let's take a look at a fragment of your query that I've marked with stars:

var groups = from instance in instances
             group instance by instance.PatientID into patientGroups
             from studyGroups in
                 **** (from instance in patientGroups
                   group instance by instance.StudyID) ****
                   from seriesGroup in
                       (from instance in studyGroups
                        group instance by instance.SeriesID)
                            from instanceGroup in
                                 (from instance in seriesGroup
                                  group instance by instance.InstanceID)
             group instanceGroup by patientGroups.Key;

Here we have

from studyGroups in ( from ... ) ...

the spec says that this is equivalent to

from ... into studyGroups ...

so we can rewrite your query as

var groups = from instance in instances
             group instance by instance.PatientID into patientGroups
             from instance in patientGroups
             group instance by instance.StudyID into studyGroups
             from seriesGroup in
             **** (from instance in studyGroups
                  group instance by instance.SeriesID) ****
                      from instanceGroup in
                           (from instance in seriesGroup
                            group instance by instance.InstanceID)
             group instanceGroup by patientGroups.Key;

Do it again. Now we have

from seriesGroup in (from ... ) ...

and the spec says that this is the same as

from ... into seriesGroup ...

so rewrite it like that:

var groups = from instance in instances 
             group instance by instance.PatientID into patientGroups
             from instance in patientGroups 
             group instance by instance.StudyID into studyGroups
             from instance in studyGroups
             group instance by instance.SeriesID into seriesGroup
             from instanceGroup in
              ****     (from instance in seriesGroup
                   group instance by instance.InstanceID) ****
             group instanceGroup by patientGroups.Key;

And again!

var groups = from instance in instances 
             group instance by instance.PatientID into patientGroups
             from instance in patientGroups 
             group instance by instance.StudyID into studyGroups
             from instance in studyGroups
             group instance by instance.SeriesID into seriesGroup
             from instance in seriesGroup
             group instance by instance.InstanceID into instanceGroup
             group instanceGroup by patientGroups.Key;

Which I hope you agree is a whole lot easier to read. I would improve its readability more by changing the fact that "instance" is used half a dozen times to mean different things:

var groups = from instance in instances 
             group instance by instance.PatientID into patientGroups
             from patientGroup in patientGroups 
             group patientGroup by patientGroup.StudyID into studyGroups
             from studyGroup in studyGroups
             group studyGroup by studyGroup.SeriesID into seriesGroups
             from seriesGroup in seriesGroups
             group seriesGroup by seriesGroup.InstanceID into instanceGroup
             group instanceGroup by patientGroups.Key;

Whether this is actually the query you need to solve your problem, I don't know, but at least this one you can reason about without turning yourself inside out trying to follow all the nesting.

This technique is called "query continuation". Basically the idea is that the continuation introduces a new range variable over the query so far.
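
To make the continuation rule concrete, here is a minimal sketch on made-up data (the Words array and the ContinuationDemo class are illustrative names, not part of the question). Both methods should produce the same sequence, since the second is simply the form the specification translates the first into. One thing worth keeping in mind: after an into, only the new range variable is in scope, so range variables declared earlier in the query can no longer be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContinuationDemo
{
    // Hypothetical sample data, chosen only for illustration.
    public static readonly string[] Words = { "ab", "cd", "efg", "hij", "klmn" };

    // Query with a continuation: "into g" starts a new query over the groups.
    // After this point "w" is out of scope; only "g" can be used.
    public static List<int> WithInto()
    {
        var query = from w in Words
                    group w by w.Length into g
                    where g.Count() > 1
                    select g.Key;
        return query.ToList();
    }

    // The same query written as a "from" over a parenthesized subquery.
    public static List<int> WithNestedFrom()
    {
        var query = from g in (from w in Words
                               group w by w.Length)
                    where g.Count() > 1
                    select g.Key;
        return query.ToList();
    }
}
```

Calling ContinuationDemo.WithInto() and ContinuationDemo.WithNestedFrom() on the sample data should return the same list of word lengths that occur more than once.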


I think this will yield what you're looking for:

public class InstanceInformation {
    public string PatientID { get; set; }
    public string StudyID { get; set; }
    public string SeriesID { get; set; }
    public string InstanceID { get; set; }

    public override string ToString() {
        return String.Format("Series = {0} Study = {1} Patient = {2}", SeriesID, StudyID, PatientID);
    }
}

class Program {
    static void Main(string[] args) {
        List<InstanceInformation> infos = new List<InstanceInformation>() {
            new InstanceInformation(){ SeriesID = "A", StudyID = "A1", PatientID = "P1" },
            new InstanceInformation(){ SeriesID = "A", StudyID = "A1", PatientID = "P1" },
            new InstanceInformation(){ SeriesID = "A", StudyID = "A1", PatientID = "P2" },
            new InstanceInformation(){ SeriesID = "A", StudyID = "A2", PatientID = "P1" },
            new InstanceInformation(){ SeriesID = "B", StudyID = "B1", PatientID = "P1"},
            new InstanceInformation(){ SeriesID = "B", StudyID = "B1", PatientID = "P1"},
        };

        IEnumerable<IGrouping<string, InstanceInformation>> bySeries = infos.GroupBy(g => g.SeriesID);
        IEnumerable<IGrouping<string, InstanceInformation>> byStudy = bySeries.SelectMany(g => g.GroupBy(g_inner => g_inner.StudyID));
        IEnumerable<IGrouping<string, InstanceInformation>> byPatient = byStudy.SelectMany(g => g.GroupBy(g_inner => g_inner.PatientID));

        foreach (IGrouping<string, InstanceInformation> group in byPatient) {
            Console.WriteLine(group.Key);
            foreach(InstanceInformation II in group)
                Console.WriteLine("  " + II.ToString());
        }
    }
}


In your class, override the ToString method, as shown below.

    public class InstanceInformation
    {
        public string PatientID { get; set; }
        public string StudyID { get; set; }
        public string SeriesID { get; set; }
        public string InstanceID { get; set; }
        public override string ToString()
        {
            var r = string.Format("{0}/{1}/{2}/{3}", PatientID, StudyID, SeriesID, InstanceID);
            return r;
        }
    } 

// list is the List<InstanceInformation>; ConvertAll already returns a List<string>.
var listofstring = list.ConvertAll(x => x.ToString());
var listofstringdistinct = listofstring.Distinct().ToList();

This is easier to read and understand.


I don't know exactly what you need, but this (very long) code will return a dictionary (of dictionaries...) grouped as you said (i.e. PatientID/StudyID/SeriesID/InstanceID):

var byPatient = new Dictionary<string, Dictionary<string, Dictionary<string, Dictionary<string, InstanceInformation>>>>();
foreach (var patientGroup in instances.GroupBy(x => x.PatientID))
{
    var byStudy = new Dictionary<string, Dictionary<string, Dictionary<string, InstanceInformation>>>();
    byPatient.Add(patientGroup.Key, byStudy);
    foreach (var studyGroup in patientGroup.GroupBy(x => x.StudyID))
    {
        var bySeries = new Dictionary<string, Dictionary<string, InstanceInformation>>();
        byStudy.Add(studyGroup.Key, bySeries);
        foreach (var seriesIdGroup in studyGroup.GroupBy(x => x.SeriesID))
        {
            var byInstance = new Dictionary<string, InstanceInformation>();
            bySeries.Add(seriesIdGroup.Key, byInstance);
            foreach (var inst in seriesIdGroup)
            {
                byInstance.Add(inst.InstanceID, inst);
            }
        }
    }
}

P.S.
I've assumed that InstanceID is unique among all instances.

Otherwise, the last dictionary level should be: Dictionary<string, List<InstanceInformation>>
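
For the non-unique case, a more compact way to build the same shape might be nested ToDictionary calls. This is only a sketch: the InstanceTree class and Build method are names I made up, and it assumes none of the ID properties are null (ToDictionary throws on a null key).

```csharp
using System.Collections.Generic;
using System.Linq;

public class InstanceInformation
{
    public string PatientID { get; set; }
    public string StudyID { get; set; }
    public string SeriesID { get; set; }
    public string InstanceID { get; set; }
}

public static class InstanceTree
{
    // Patient -> Study -> Series -> InstanceID -> all instances sharing that ID.
    // Assumes no ID is null; a null key would make ToDictionary throw.
    public static Dictionary<string, Dictionary<string, Dictionary<string, Dictionary<string, List<InstanceInformation>>>>>
        Build(IEnumerable<InstanceInformation> instances)
    {
        return instances
            .GroupBy(x => x.PatientID)
            .ToDictionary(
                patient => patient.Key,
                patient => patient
                    .GroupBy(x => x.StudyID)
                    .ToDictionary(
                        study => study.Key,
                        study => study
                            .GroupBy(x => x.SeriesID)
                            .ToDictionary(
                                series => series.Key,
                                series => series
                                    .GroupBy(x => x.InstanceID)
                                    .ToDictionary(
                                        instance => instance.Key,
                                        instance => instance.ToList()))));
    }
}
```

A lookup then reads like tree["P1"]["S1"]["R1"]["I1"], which returns the list of instances stored under that path.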

EDIT:

Reading your last comment, I think you don't need a real GroupBy, but rather an OrderBy().ThenBy()...

foreach (var el in instances.OrderBy(x => x.PatientID)
                            .ThenBy(x => x.StudyID)
                            .ThenBy(x => x.SeriesID)
                            .ThenBy(x => x.InstanceID))
{
    // it yields:
    // Pat1 Std1 Srs1 Inst1
    // Pat1 Std1 Srs1 Inst2
    // Pat1 Std1 Srs2 Inst1
    // Pat1 Std2 Srs2 Inst2
    // ...
}
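
If the end goal is just the path strings from the question, the ordered sequence can be projected directly. The following is a sketch under that assumption; InstancePaths and Build are hypothetical names, and duplicate instances in the input will yield duplicate paths.

```csharp
using System.Collections.Generic;
using System.Linq;

public class InstanceInformation
{
    public string PatientID { get; set; }
    public string StudyID { get; set; }
    public string SeriesID { get; set; }
    public string InstanceID { get; set; }
}

public static class InstancePaths
{
    // Sorts by each level in turn, then formats PatientID/StudyID/SeriesID/InstanceID.
    // Duplicates are kept; the default string comparer is used for ordering.
    public static List<string> Build(IEnumerable<InstanceInformation> instances)
    {
        return instances
            .OrderBy(x => x.PatientID)
            .ThenBy(x => x.StudyID)
            .ThenBy(x => x.SeriesID)
            .ThenBy(x => x.InstanceID)
            .Select(x => string.Join("/", x.PatientID, x.StudyID, x.SeriesID, x.InstanceID))
            .ToList();
    }
}
```

Because the list is sorted level by level, all paths that share a patient, study, or series end up next to each other, which may be all the "grouping" a simple foreach needs.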


The following LINQ statement in query syntax should solve your problem.

var groups = from instance in instances
             group instance by instance.PatientGuid into patientGroups
             select new
             {
                 patientGroups.Key,
                 StudyGroups = from instance in patientGroups
                               group instance by instance.StudyGuid into studyGroups
                               select new
                               {
                                   studyGroups.Key,
                                   SeriesGroups = from c in studyGroups
                                                  group c by c.SeriesGuid into seriesGroups
                                                  select seriesGroups
                               }
             };

You can then iterate your groups with the following set of nested foreach loops on the groups. This will allow you to create your directory tree efficiently and do any other operations at each level.

foreach (var patientGroups in groups)
{
    Console.WriteLine("Patient Level = {0}", patientGroups.Key);
    foreach (var studyGroups in patientGroups.StudyGroups)
    {
        Console.WriteLine("Study Level = {0}", studyGroups.Key);
        foreach (var seriesGroups in studyGroups.SeriesGroups)
        {
            Console.WriteLine("Series Level = {0}", seriesGroups.Key);
            foreach (var instance in seriesGroups)
            {
                Console.WriteLine("Instance Level = {0}", instance.InstanceGuid);
            }
        }
    }
}

This is a proof of concept, but initial testing shows that it works properly. Any comments would be appreciated.


Eric Lippert perfectly explained how you can avoid the horrible nesting and write just a single flat query using "query continuation" (the into keyword).

I think you can do one more step and write it directly using the GroupBy method. Sometimes, using the LINQ methods directly gives you clearer code and I think this is one such example:

var groups = instances.
    GroupBy(instance => instance.PatientID).
    GroupBy(patientGroup => patientGroup.StudyID).
    GroupBy(studyGroup => studyGroup.SeriesID).
    GroupBy(seriesGroup => seriesGroup.InstanceID).
    GroupBy(instanceGroup => patientGroups.Key);

(I don't really know if this is what you're looking for - I just did a "syntactic transformation" of what Eric wrote - and I believe I didn't change the meaning of Eric's query)

EDIT There may be some trickery with the last group by, because it is not completely regular.
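
One caveat with chaining GroupBy calls back to back: each call returns groups rather than instances, so the lambda in the next call receives an IGrouping and cannot read StudyID or SeriesID from it directly. A possible way to stay in method syntax is to nest each GroupBy inside a Select over the parent group. The sketch below is my own illustration of that idea (NestedGrouping and Describe are made-up names), not a transformation taken from the answers above.

```csharp
using System.Collections.Generic;
using System.Linq;

public class InstanceInformation
{
    public string PatientID { get; set; }
    public string StudyID { get; set; }
    public string SeriesID { get; set; }
    public string InstanceID { get; set; }
}

public static class NestedGrouping
{
    // Groups patient -> study -> series with method syntax, then walks the
    // hierarchy and returns one PatientID/StudyID/SeriesID/InstanceID line per instance.
    public static List<string> Describe(IEnumerable<InstanceInformation> instances)
    {
        var groups = instances
            .GroupBy(i => i.PatientID)
            .Select(patient => new
            {
                patient.Key,
                // Each inner GroupBy runs over the instances of the parent group.
                Studies = patient
                    .GroupBy(i => i.StudyID)
                    .Select(study => new
                    {
                        study.Key,
                        Series = study.GroupBy(i => i.SeriesID)
                    })
            });

        var lines = new List<string>();
        foreach (var patient in groups)
            foreach (var study in patient.Studies)
                foreach (var series in study.Series)
                    foreach (var instance in series)
                        lines.Add(string.Join("/", patient.Key, study.Key, series.Key, instance.InstanceID));
        return lines;
    }
}
```

The intermediate levels stay available as patient.Key, study.Key and series.Key, so nothing between PatientID and InstanceID is lost.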
