Parse nested groups

Developer https://www.devze.com 2022-12-10 09:22 Source: Internet
I have a string with nested groups like this ('blabla' is some text within the string that must be ignored)

string Stream1 = @"group ""Main""
                           bla
                           bla
                               group ""Sub1"" -- block-group
                               var1
                               var2
                               endgroup -- block-group ""Sub1""
                               bla
                               bla
                               group ""Sub2"" -- block-group
                               var1
                               endgroup -- block-group ""Sub2""
                               bla
                               group ""Sub3"" -- block-group
                               var1
                               var2
                               var3
                                  group ""SubSub31"" -- block-group
                                  var10
                                  var20
                                  endgroup -- block-group ""SubSub31""
                               endgroup -- block-group ""Sub3""
                           endgroup";

The expected output is a list of GroupObjects like this:

public class GroupObject
{
    public string GroupName = ""; // Example: SubSub31
    public string GroupPath = ""; // Example: Main/Sub3/SubSub31
    public List<Var> LocalVar = new List<Var>(); // Example: var10, var20
}

I guess a recursive regex could solve this, but I can't figure out how to write one.

Can someone give me a hint?

Sample code would be highly appreciated.


A recursive regular expression might solve the problem - but the complexity of it may be too high to easily maintain (and I speak as someone who once implemented and sold a Regular Expression engine).

I'm not going to give you a complete solution - but here's one way to solve the problem.

Your output object needs to change to allow for the nested groups, something like this:

public class Group
{      
    public string Name { get; set; }
    public string GroupPath { get; set; }
    public IEnumerable<VarBlock> Variables { get; }
    public IEnumerable<Group> NestedGroups { get; }
}

(Note use of properties instead of public members)

Assuming your input stream is a line based format, create a function that divides the string into lines:

public Queue<string> GetLines(string definition) { ... }
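
One possible body for GetLines, as a minimal sketch: it assumes every group header, variable and endgroup sits on its own line, and that leading and trailing whitespace carries no meaning. The LineReader class name is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LineReader
{
    // Splits the definition into trimmed, non-empty lines, preserving order.
    // Assumption: indentation is purely cosmetic in this format.
    public static Queue<string> GetLines(string definition)
    {
        var lines = new Queue<string>();
        var separators = new[] { "\r\n", "\n" };

        foreach (var raw in definition.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            var line = raw.Trim();
            if (line.Length > 0)
            {
                lines.Enqueue(line);
            }
        }

        return lines;
    }
}
```

A queue fits here because the parser only ever looks at or consumes the next line, so no index bookkeeping is needed.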

Then, create a routine to parse a group:

public Group ParseGroup(Queue<string> lines) { ... }
  • When this routine encounters the start of a group, it should recursively call itself to parse the nested group and then add the result to NestedGroups.
  • When this routine encounters the end of a group, it should finish assembling the block, and return the object.
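
The two rules above could be sketched roughly as follows. This is not a complete solution and rests on several assumptions not stated in the question: a header looks like group "Name", a line starting with endgroup closes the current group, lines starting with var are variables, and every other line (the 'bla' text) is ignored. Variables are stored as plain strings because VarBlock is not defined anywhere in the thread; the parentPath parameter and the Flatten helper are additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Group
{
    public string Name { get; set; }
    public string GroupPath { get; set; }
    // Assumption: plain strings stand in for the undefined VarBlock type.
    public List<string> Variables { get; } = new List<string>();
    public List<Group> NestedGroups { get; } = new List<Group>();
}

public static class GroupParser
{
    // Matches a header such as: group "Sub1" -- block-group
    private static readonly Regex GroupStart = new Regex("^group\\s+\"([^\"]+)\"");

    // Expects the next line in the queue to be a group header.
    public static Group ParseGroup(Queue<string> lines, string parentPath = "")
    {
        var header = GroupStart.Match(lines.Dequeue().Trim());
        if (!header.Success)
        {
            throw new FormatException("Expected a group header.");
        }

        var name = header.Groups[1].Value;
        var group = new Group
        {
            Name = name,
            GroupPath = parentPath.Length == 0 ? name : parentPath + "/" + name
        };

        while (lines.Count > 0)
        {
            var line = lines.Peek().Trim();
            if (GroupStart.IsMatch(line))
            {
                // Start of a nested group: recurse. The call consumes
                // everything up to and including the matching endgroup.
                group.NestedGroups.Add(ParseGroup(lines, group.GroupPath));
            }
            else if (line.StartsWith("endgroup", StringComparison.Ordinal))
            {
                lines.Dequeue();
                return group;
            }
            else
            {
                lines.Dequeue();
                if (line.StartsWith("var", StringComparison.Ordinal))
                {
                    group.Variables.Add(line);
                }
                // Any other line is treated as ignorable text.
            }
        }

        throw new FormatException("Missing endgroup for group " + name);
    }

    // Walks the tree depth-first to produce the flat list the question asks for.
    public static IEnumerable<Group> Flatten(Group root)
    {
        yield return root;
        foreach (var child in root.NestedGroups)
        {
            foreach (var nested in Flatten(child))
            {
                yield return nested;
            }
        }
    }
}
```

Passing the parent's path down the recursion is what lets each group record a path such as Main/Sub3/SubSub31 without a second pass over the tree.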

Hope this is helpful.


I recommend ANTLR (http://www.antlr.org/) which has been developed for parsing a wide range of semi-structured documents. There's a book (The Definitive ANTLR Reference) which will get you off the ground. It's capable of providing complete parsers for languages such as Java and C#. You can include (Java) code in the parser which will allow you to process the results into the data structures you require.
