So I have a string of data (this is a small snippet) which looks similar to:
L L L LIL5LSOLSLQLL AL
BL
B6LLALBLYLL69L6KL6L6L7LLLLLLHZLMhuLPLHILHLHLILILQZLSoLULWLXL4L4L>LZLL
L
LoLzLVLVLLLLLLLDLeLkLLaLLLLLLL5M 5string1:string2:(RANDOM):string3:(RANDOM)R<baseversion><version>0000000297000000025309458093771<version><baseversion> BLYLL69L6KL6L6L7LLLL
I wish to extract all strings which conform to the pattern:
string1:string2:[A-Za-z0-9]:string3:[A-Za-z0-9]
NOTE: There are many matches throughout the text, but at most one per line, and not every line contains one.
Any guidance would be greatly appreciated :)
Sounds to me like you want:
/string1:string2:[A-Za-z0-9]+:string3:[A-Za-z0-9]+(?=R<baseversion>)/
You could use named groups instead of the lookahead here, but this should get the job done. Also, I'm not sure whether you need those + signs, since your sample regex didn't have them; without them, each [A-Za-z0-9] matches exactly one character. I'm kind of guessing what the (RANDOM) bits look like.
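For comparison, here is a rough PHP sketch of the named-group variant, run against a made-up line. The random parts abc123 and XYZ789 are assumptions about what (RANDOM) looks like, and the group name record is hypothetical. It also shows what happens when the + signs are dropped.

```php
<?php
// Hypothetical sample line; the (RANDOM) parts are assumed to be alphanumeric.
$line = 'L5M 5string1:string2:abc123:string3:XYZ789R<baseversion><version>0297<version><baseversion> BLY';

// Instead of a lookahead, wrap the wanted part in a named group and let
// "R<baseversion>" be consumed as ordinary text after it.
$pattern = '/(?<record>string1:string2:[A-Za-z0-9]+:string3:[A-Za-z0-9]+)R<baseversion>/';

$record = null;
if (preg_match($pattern, $line, $m)) {
    $record = $m['record'];
}
echo $record, "\n"; // string1:string2:abc123:string3:XYZ789

// Without the + signs each class matches exactly one character, so a
// multi-character random part such as "abc123" no longer matches.
$withoutPlus = preg_match('/string1:string2:[A-Za-z0-9]:string3:[A-Za-z0-9]/', $line);
echo $withoutPlus, "\n"; // 0
```

Either form gives the same text here; the named group just makes the wanted part explicit instead of relying on the lookahead to exclude the marker.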
Note that the whole point here is to capture everything from string1 up to, but not including, the R<baseversion>. Looks like that's what you're asking for.
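A minimal sketch of the lookahead pattern with PHP's preg_match_all(), which returns every match rather than just the first one. The three-line $text is a hypothetical stand-in for the real data, with one line deliberately containing no record.

```php
<?php
// Hypothetical multi-line input: two lines contain a record, one does not.
$text = "junk string1:string2:abc123:string3:XYZ789R<baseversion><version>1<version><baseversion> LL\n"
      . "LLLL no record on this line\n"
      . "LoLz string1:string2:q9:string3:Z0R<baseversion><version>2<version><baseversion>\n";

// The lookahead (?=R<baseversion>) requires the marker to follow but does not
// include it in the match, so each match stops right before "R<baseversion>".
$pattern = '/string1:string2:[A-Za-z0-9]+:string3:[A-Za-z0-9]+(?=R<baseversion>)/';

preg_match_all($pattern, $text, $matches);
$records = $matches[0]; // full matches, in order of appearance

foreach ($records as $record) {
    echo $record, "\n";
}
// string1:string2:abc123:string3:XYZ789
// string1:string2:q9:string3:Z0
```

Note that [A-Za-z0-9]+ first swallows the "R" of "R<baseversion>" and then backtracks one character so the lookahead can succeed.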
I'm not entirely sure what you want to extract, but the following (untested) code would extract the entire match as well as only the version number inside <baseversion>.
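$handle = fopen('/path/to/file.txt', 'r');
if ($handle === false) {
    die("Could not open file\n");
}
// Read one line at a time: there is at most one match per line, and a match
// can no longer be split across two fread() chunks or reported twice.
while (($line = fgets($handle)) !== false) {
    if (preg_match('/string1:string2:.+?:string3:.+?R<baseversion><version>(.+?)<version><baseversion>/', $line, $matches)) {
        print 'Match: '.$matches[0]."\n";
        print 'Version: '.$matches[1]."\n";
    }
}
fclose($handle);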
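If the file is small enough to read into memory in one go, preg_match_all() with PREG_SET_ORDER can collect every record in a single call instead of looping line by line. A rough sketch, assuming a hypothetical helper named extract_records() and a made-up $text; with a real file you would pass it file_get_contents('/path/to/file.txt') instead.

```php
<?php
// Hypothetical helper: returns ['match' => ..., 'version' => ...] for every
// record in $text, using the same pattern as the loop above.
function extract_records(string $text): array
{
    $pattern = '/string1:string2:.+?:string3:.+?R<baseversion><version>(.+?)<version><baseversion>/';
    // PREG_SET_ORDER groups results per match: [$fullMatch, $group1].
    preg_match_all($pattern, $text, $sets, PREG_SET_ORDER);
    $records = [];
    foreach ($sets as $set) {
        $records[] = ['match' => $set[0], 'version' => $set[1]];
    }
    return $records;
}

// Made-up input; without the s modifier, "." never crosses a newline,
// so each match stays within one line.
$text = "xx string1:string2:a1:string3:b2R<baseversion><version>0297<version><baseversion> yy\n"
      . "LLLL\n"
      . "zz string1:string2:c3:string3:d4R<baseversion><version>0310<version><baseversion>\n";

foreach (extract_records($text) as $r) {
    print 'Match: ' . $r['match'] . "\n";
    print 'Version: ' . $r['version'] . "\n";
}
```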
The lazy .+? parts are the interesting bit. While .* (or .+) matches as many characters as possible, .*? (or .+?) matches as few as possible. Say the string is "xaaay": the pattern /xa+/ matches "xaaa", while /xa+?/ matches only "xa". (The trailing ? turns a greedy quantifier into a lazy, or non-greedy, one; look up "greediness" in the docs. People often reach for lookaheads because they are not aware of greediness.)
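The "xaaay" example in PHP, plus a made-up line with two string3/R<baseversion> sections to show why laziness matters for this particular pattern:

```php
<?php
$subject = 'xaaay';

preg_match('/xa+/', $subject, $greedy);  // greedy: takes as many a's as it can
preg_match('/xa+?/', $subject, $lazy);   // lazy: stops after the first a

echo $greedy[0], "\n"; // xaaa
echo $lazy[0], "\n";   // xa

// Hypothetical line with two records: the greedy .+ runs on to the LAST
// "R<baseversion>", while the lazy .+? stops at the first one.
$line = 'string1:string2:a:string3:bR<baseversion>junk:string3:cR<baseversion>';
preg_match('/string1:string2:.+:string3:.+R<baseversion>/', $line, $g);
preg_match('/string1:string2:.+?:string3:.+?R<baseversion>/', $line, $l);

echo $g[0], "\n"; // the whole line
echo $l[0], "\n"; // string1:string2:a:string3:bR<baseversion>
```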
I've written a cheatsheet which might come in handy:
http://www.bitcetera.com/en/techblog/2008/04/01/regex-in-a-nutshell/
As a side note: [A-Za-z0-9] does not match arbitrary characters; it wouldn't match "%", for instance.
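To illustrate, here is a hypothetical line whose first random part contains a "%". The [^:]+? alternative is a swapped-in suggestion, not from the question, and it assumes the random parts never contain a colon.

```php
<?php
// Hypothetical line whose first random part contains a "%".
$line = 'string1:string2:ab%12:string3:XY9R<baseversion>';

// [A-Za-z0-9]+ cannot get past the "%", so this finds nothing.
$alnum = preg_match('/string1:string2:[A-Za-z0-9]+:string3:[A-Za-z0-9]+(?=R<baseversion>)/', $line, $m1);

// Assumption: the random parts never contain ":". Then [^:] accepts any other
// character, including "%". The lazy +? stops at the first "R<baseversion>".
$anyButColon = preg_match('/string1:string2:[^:]+:string3:[^:]+?(?=R<baseversion>)/', $line, $m2);

echo $alnum, "\n"; // 0
echo $m2[0], "\n"; // string1:string2:ab%12:string3:XY9
```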