I have some data in this form:
@"Managers Alice, Bob, Charlie
Supervisors Don, Edward, Francis"
I need a flat output like this:
@"Managers Alice
Managers Bob
Managers Charlie
Supervisors Don
Supervisors Edward
Supervisors Francis"
The actual "job title" above could be any single word; there's no discrete list to work from.
Replacing the , with \r\n is easy enough, as is the first replacement:
Replace (^|\r\n)(\S+\s)([^,\r\n]*),\s
With $1$2$3\r\n$2
But capturing the other names and applying the same prefix is what is eluding me today. Any suggestions?
I'm looking for a series of one or more Regex.Replace() calls only, without any LINQ or procedural code in C#, which would of course be trivial. The implementation is not directly in C# code; I'm configuring a generic parsing tool that uses a series of .NET regular expressions to transform incoming data from a variety of sources for several uses.
Here's a pure-Replace solution:
string s = @"Managers Alice, Bob, Charlie
Supervisors Don, Edward, Francis";
Regex r = new Regex(@"(?:^\w+)?( \w+)(?<=^(\w+)\b.*)[,\r\n]*",
RegexOptions.Multiline);
string s1 = r.Replace(s, "$2$1\r\n");
After each name is matched, the lookbehind goes back to the beginning of the current line to capture the title. The (?:^\w+)? and [,\r\n]* are only there to consume the parts of the string you don't want to keep.
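To see what the lookbehind captures on each match, here is a small sketch that lists the title and name groups instead of replacing. The class and method names (LookbehindDemo, Pairs) are made up for illustration; only the pattern comes from the answer above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Hypothetical helper: returns "title|name" for every match of the pattern.
    public static List<string> Pairs(string input)
    {
        var r = new Regex(@"(?:^\w+)?( \w+)(?<=^(\w+)\b.*)[,\r\n]*",
                          RegexOptions.Multiline);
        var pairs = new List<string>();
        foreach (Match m in r.Matches(input))
        {
            // Group 1 is the name (with its leading space);
            // group 2 is the title, captured inside the lookbehind.
            pairs.Add(m.Groups[2].Value + "|" + m.Groups[1].Value.Trim());
        }
        return pairs;
    }
}
```

Calling LookbehindDemo.Pairs("Managers Alice, Bob\r\nSupervisors Don") should give Managers|Alice, Managers|Bob and Supervisors|Don: every name gets paired with the title of its own line, even though the title is only consumed once per line.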
Why use a regex if you can do it with LINQ?
string s = "Managers Alice, Bob, Charlie\r\nSupervisors Don, Edward, Francis";
var result =
from line in s.Split(new string[] { "\r\n" }, StringSplitOptions.None)
let parts = line.Split(new char[] { ' ' }, 2)
let title = parts[0]
let names = parts[1]
from name in names.Split(new char[] { ',' })
select title.Trim() + " " + name.Trim();
string.Join("\r\n", result) is
Managers Alice
Managers Bob
Managers Charlie
Supervisors Don
Supervisors Edward
Supervisors Francis
Since you stressed the need for regex here's a solution that should work for you.
string input = @"Managers Alice, Bob, Charlie
Supervisors Don, Edward, Francis";
string pattern = @"(?<Title>\w+)\s+(?:(?<Names>\w+)(?:,\s+)?)+";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine("Title: {0}", m.Groups["Title"].Value);
foreach (Capture c in m.Groups["Names"].Captures)
{
Console.WriteLine(c.Value);
}
Console.WriteLine();
}
The main concept is to use the named "Title" group to store the job titles and reference them later. The names are stored in the capture collection. The pattern will only work if the data is properly formatted of course, as given in your sample data.
The pattern breakdown is as follows: (?<Title>\w+)\s+(?:(?<Names>\w+)(?:,\s+)?)+
(?<Title>\w+)\s+ - matches the title before the first space and places it in a named Title group. At least one space must follow.
(?:(?<Names>\w+)(?:,\s+)?)+ - the name is stored in a Names group via the (?<Names>\w+) part. A comma and at least one space are matched (but not captured, since (?:...) is used) via the (?:,\s+)? part, which is optional because a ? is placed after it. Finally, this entire portion of the pattern is enclosed in a (?:...)+ group that has to be matched at least once but is not captured, since we only capture the parts we are interested in.
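If the flat "title name" lines are needed rather than console output, the same captures can be combined directly. This is only a sketch under the same assumption of well-formed input; CaptureFlattener and Flatten are hypothetical names, and it uses procedural code, so it does not fit a Replace-only tool.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureFlattener
{
    // Hypothetical helper: pairs each captured name with the title of its match.
    public static List<string> Flatten(string input)
    {
        const string pattern = @"(?<Title>\w+)\s+(?:(?<Names>\w+)(?:,\s+)?)+";
        var lines = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            string title = m.Groups["Title"].Value;
            // The Names group keeps every repetition in its capture collection.
            foreach (Capture c in m.Groups["Names"].Captures)
            {
                lines.Add(title + " " + c.Value);
            }
        }
        return lines;
    }
}
```

Joining the result with "\r\n" for the sample input should produce the six lines asked for in the question.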
You could search for
^(\w+)[ \t]+(\w+),[ \t]+(.+)$
and replace all with
\1 \2\r\n\1 \3
You need to apply it twice to your example, three times if the list of managers grows to four, etc.
So, in C#:
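resultString = Regex.Replace(subjectString, @"^(\w+)[ \t]+(\w+),[ \t]+(.+)$", "$1 $2\r\n$1 $3", RegexOptions.Multiline);
(The replacement must be a regular string, not a verbatim one; in @"..." the \r\n would be inserted as literal backslash characters.)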
Explanation:
^ : Match the start of the line.
(\w+)[ \t]+ : Match one or more word characters and capture them (the title), then match the following spaces or tabs.
(\w+) : Match the next "word" (the first name) and capture it.
,[ \t]+(.+)$ : Match a comma, spaces or tabs, and then whatever follows until the end of the line. This will only match if the line still contains content that needs to be split up.
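To illustrate how many passes are needed, the sketch below repeats the same replacement until the text stops changing. The loop is only for demonstration (RepeatedSplit and Flatten are hypothetical names); in a tool that accepts nothing but a list of replacements, the same step would presumably be listed as many times as the longest expected list requires.

```csharp
using System.Text.RegularExpressions;

public static class RepeatedSplit
{
    // Hypothetical helper: applies the single-step replacement until nothing changes.
    public static string Flatten(string input)
    {
        var step = new Regex(@"^(\w+)[ \t]+(\w+),[ \t]+(.+)$", RegexOptions.Multiline);
        string previous;
        string current = input;
        do
        {
            previous = current;
            // Regular (non-verbatim) string, so \r\n is a real line break.
            current = step.Replace(previous, "$1 $2\r\n$1 $3");
        } while (current != previous);
        return current;
    }
}
```

Each pass splits one name off the front of every line that still contains a comma, so a line with n names is fully split after n - 1 passes; the loop then runs one more time to confirm nothing changed.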