I have the following robots.txt as an example -
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: W3C-checklink
User-agent: WDG_SiteValidator
Disallow: /
Disallow: /js/
Disallow: /Web_References/
Disallow: /webresource.axd
Disallow: /scriptresource.axd

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /webresource.axd
Disallow: /scriptresource.axd
Disallow: /js/
Disallow: /Web_References/
I may be asking too much of regex, but I want to write a single expression that returns matches grouped and ordered as follows -
Matches
- [0]
  - [UserAgents]
    - "googlebot"
    - "slurp"
    - "msnbot"
    - "teoma"
    - "W3C-checklink"
    - "WDG_SiteValidator"
  - [Routes]
    - [0]
      - [Permission] "Allow"
      - [Url] "/"
    - [1]
      - [Permission] "Disallow"
      - [Url] "/js/"
    - [2]
      - [Permission] "Disallow"
      - [Url] "/Web_References/"
... etc ...
I've written individual expressions to match elements of the document, however I can't get them to work when pieced together. Maybe someone can point out where I'm going wrong?
Patterns
User agents: (?:user-agent:\s*)(?<UserAgent>[a-z_0-9-*]*)
Permissions: (?<Permission>(?:allow|disallow))(?:\s*:\s*)(?<Url>[/0-9_a-z.]*)
My attempt
((?<UserAgents>(?:user-agent:\s*)(?<UserAgent>[a-z_0-9-*]*))+(?<Routes>(?<Permission>(?:allow|disallow))(?:\s*:\s*)(?<Url>[/0-9_a-z.]*))+)+
FYI, I'm using Expresso to debug these expressions and have the following options checked: Multiline, Compiled and Ignore Case.
Try this:
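^User-agent:[ \t]*(?<UserAgent>[^\r\n]*)|^(?<Permission>Allow|Disallow):[ \t]*(?<Url>[^\r\n]*)
I'm not sure about the exact format you want, but with the Multiline and Ignore Case options the regex above matches one line at a time and names the parts you are interested in: each match is either a User-agent line (UserAgent is set) or an Allow/Disallow line (Permission and Url are set; Url is empty for a bare "Disallow:"). Maybe you can build on top of that regex. I hardly do C#, but something like this might work:
// Needs: using System; using System.Text.RegularExpressions;
try {
    // Multiline lets ^ match at the start of every line; IgnoreCase covers "user-agent", "DISALLOW", etc.
    Regex regexObj = new Regex(
        @"^User-agent:[ \t]*(?<UserAgent>[^\r\n]*)|^(?<Permission>Allow|Disallow):[ \t]*(?<Url>[^\r\n]*)",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success) {
        // Group 0 is the whole match, so the named groups start at index 1
        for (int i = 1; i < matchResults.Groups.Count; i++) {
            Group groupObj = matchResults.Groups[i];
            if (groupObj.Success) {
                // matched text: groupObj.Value
                // match start: groupObj.Index
                // match length: groupObj.Length
            }
        }
        matchResults = matchResults.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}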
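For the grouped result the question asks for, one option is to let a single match cover a whole record and read the repeated groups through .NET's Captures collection. The combined attempt in the question has nothing that consumes the line break between directives, so the sketch below matches the line endings explicitly. Treat it as a rough sketch rather than a full robots.txt parser: the Route, Record and RobotsParser names are made up for illustration, and it assumes each record is a run of User-agent lines directly followed by Allow/Disallow lines, with no comments or blank lines inside the record.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container types, not part of any library.
public sealed class Route
{
    public string Permission;
    public string Url;
}

public sealed class Record
{
    public List<string> UserAgents = new List<string>();
    public List<Route> Routes = new List<Route>();
}

public static class RobotsParser
{
    // One match per record: one or more User-agent lines,
    // then one or more Allow/Disallow lines (the last may end at end of input).
    private static readonly Regex RecordRegex = new Regex(
        @"(?:^User-agent:[ \t]*(?<UserAgent>[^\r\n]*)\r?\n)+" +
        @"(?:^(?<Permission>Allow|Disallow):[ \t]*(?<Url>[^\r\n]*)(?:\r?\n|\z))+",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    public static List<Record> Parse(string robotsTxt)
    {
        var records = new List<Record>();
        foreach (Match m in RecordRegex.Matches(robotsTxt))
        {
            var record = new Record();

            // A group inside a repeated construct keeps every capture, not just the last one.
            foreach (Capture c in m.Groups["UserAgent"].Captures)
                record.UserAgents.Add(c.Value.Trim());

            // Permission and Url are captured once per rule line, so the two collections line up by index.
            CaptureCollection permissions = m.Groups["Permission"].Captures;
            CaptureCollection urls = m.Groups["Url"].Captures;
            for (int i = 0; i < permissions.Count; i++)
                record.Routes.Add(new Route { Permission = permissions[i].Value, Url = urls[i].Value.Trim() });

            records.Add(record);
        }
        return records;
    }
}

public static class Program
{
    public static void Main()
    {
        string sample =
            "User-agent: googlebot\nUser-agent: slurp\nDisallow: /\nDisallow: /js/\n\n" +
            "User-agent: *\nDisallow:\n";

        foreach (Record record in RobotsParser.Parse(sample))
        {
            Console.WriteLine("UserAgents: " + string.Join(", ", record.UserAgents));
            foreach (Route route in record.Routes)
                Console.WriteLine("  " + route.Permission + " \"" + route.Url + "\"");
        }
    }
}
```

If the file can contain comments or blank lines in the middle of a record, it is probably simpler to use the line-by-line regex above and start a new record whenever a User-agent line follows an Allow/Disallow line.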