I have a string:
Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>
It is all on a single line, so how would I extract only the information about the balls? That is, the output should be:
. . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . .
The closest I got was [^(Recent overs|<b>|<tt>|</b>|</tt>|</p>)]+, but it matches the 1 and not the 1b.
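First, the brackets [] are used to create what is called a "character class", which matches a single character. So your pattern doesn't exclude the words and tags you listed; it says "match one or more characters that are not any of ( ) R e c n t o v r s b p | < > / or a space". Because b is in that set, 1b can only ever match as 1.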
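To see the character class in action, here is a small sketch that runs the question's own pattern over the sample string in JavaScript. The variable names are only for illustration, and the / characters inside the class are escaped because the pattern is written as a regex literal.

```javascript
// The question's pattern. Everything between "[^" and "]" is ONE negated
// character class: "Recent overs", the pipes and the tags are not words or
// alternatives here, just a set of individual characters to exclude.
const input =
  "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";
const askerPattern = /[^(Recent overs|<b>|<tt>|<\/b>|<\/tt>|<\/p>)]+/g;

// The space and the letter "b" are both in the excluded set, so every ball
// is matched on its own and "1b" is cut down to "1".
const matches = input.match(askerPattern);
console.log(matches.join(" "));
// → ". . . . . . 3 . . 1 4 . 1 1 1 . . 4 . . . 4 . ."
```

The output looks almost right, which is what makes this mistake easy to miss: only the "b" of 1b gives it away.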
You'd be better off using a regex to remove the unwanted strings first, and then parsing what's left. In JavaScript, since you didn't specify a language:
var s = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";
s = s.replace(/(Recent overs|<[^>]+>|\|)/ig, '');
The resulting 's' is much easier to parse.
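To show what "easier to parse" can look like in practice, here is a sketch that continues from the replace call. Once the label, the tags and the pipes are gone, only balls and whitespace remain, so trimming and splitting on runs of whitespace is enough. The names cleaned and balls are just illustrative.

```javascript
const s =
  "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Same clean-up as the answer: drop "Recent overs", every tag, and the pipes.
const cleaned = s.replace(/(Recent overs|<[^>]+>|\|)/ig, '');

// What is left has a leading space and some double spaces where the
// "<b>|</b>" separators used to be, so trim and split on any run of spaces.
const balls = cleaned.trim().split(/\s+/);
console.log(balls.join(" "));
// → ". . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . ."
```

Splitting on /\s+/ rather than on a single space matters here; a plain split(' ') would produce empty strings at the double spaces.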
Try \s[\d\.][\w]*
to match every digit or dot that is preceded by a space, together with any word characters that follow it (so 1b stays whole).
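Here is a quick check of that pattern against the sample, in JavaScript. Two things are worth knowing: every match includes its leading space, so the matches need trimming, and the very first ball comes straight after "<tt>" rather than after a space, so it is not matched at all. The second regex is a hypothetical tweak, not part of the original suggestion: it also accepts ">" before a ball and uses a capture group so only the ball itself is kept.

```javascript
const input =
  "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// The suggested pattern: whitespace, then a digit or dot, then any word
// characters. Each match starts with the whitespace, so trim it off.
const spaced = (input.match(/\s[\d\.][\w]*/g) || []).map((m) => m.trim());
console.log(spaced.length); // 23: the first "." follows "<tt>", not a space

// Hypothetical variant: allow ">" as well as whitespace before a ball, and
// capture only the ball itself (group 1).
const balls = Array.from(input.matchAll(/[\s>]([\d.]\w*)/g), (m) => m[1]);
console.log(balls.join(" "));
// → ". . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . ."
```

String.prototype.matchAll needs a global regex and a reasonably recent engine (ES2020); in older environments a loop over RegExp.prototype.exec does the same job.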
Based solely on the example you gave, you could try something like:
/(?<=>)[a-z\d\s\.]+/g
Alternatively, in case your regex engine doesn't support lookbehinds:
/>([a-z\d\s\.]+)/g  (the matches will be in the first capture group)
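A sketch of both versions in JavaScript; the lookbehind form needs an engine with lookbehind support (ES2018 or later, which covers current Node.js and browsers). Each match is a whole run of text between a ">" and the next "<", and the first run is only the space after "Recent overs</b>", so a final split is still needed to get individual balls. The variable names are illustrative.

```javascript
const input =
  "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Lookbehind version: each match starts immediately after a ">".
const viaLookbehind = input.match(/(?<=>)[a-z\d\s\.]+/g);

// Capture-group version: the ">" is consumed by the match, so read group 1.
const viaGroup = Array.from(input.matchAll(/>([a-z\d\s\.]+)/g), (m) => m[1]);

// Both give the same runs of text; join them and split into single balls.
const balls = viaGroup.join(" ").trim().split(/\s+/);
console.log(viaLookbehind.length); // 5 runs, the first being just " "
console.log(balls.join(" "));
// → ". . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . ."
```

After "<b>" the next character is "|", which is not in the class, so the separators produce no matches of their own.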
However, it's a little hard to infer the rules of what should/should not be allowed based on the small sample you gave, and your output sample doesn't make much sense to me as a data structure. It seems like you might be better off using an HTML parser for this, since using regex to process HTML is frequently a bad idea.
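If the goal is a data structure rather than a flat string, one option is to group the balls by over. This sketch assumes, from the sample alone, that the balls sit between "<tt>" and "</tt>" and that each "<b>|</b>" marks the boundary between two overs; neither rule is stated in the question. It is plain string handling, not an HTML parser, so it will break if the markup changes.

```javascript
const input =
  "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Assumption: everything between <tt> and </tt> is ball data.
const start = input.indexOf("<tt>") + "<tt>".length;
const body = input.slice(start, input.indexOf("</tt>"));

// Assumption: "<b>|</b>" separates overs. Split on it, then split each over
// into individual balls on runs of whitespace.
const overs = body
  .split("<b>|</b>")
  .map((over) => over.trim().split(/\s+/));

console.log(overs);
// → [[".", ".", ".", ".", ".", "."], ["3", ".", ".", "1b", "4", "."],
//    ["1", "1", "1", ".", ".", "4"], [".", ".", ".", "4", ".", "."]]
```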