I am trying to replace two or more consecutive occurrences of the <br/> tag (like <br/><br/><br/>) with exactly two tags (<br/><br/>), using the following pattern:
Pattern brTagPattern = Pattern.compile("(<\\s*br\\s*/\\s*>\\s*){2,}",
Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
But there are some cases where the tags come separated by a space ('<br/> <br/>'), and they get replaced with 4 <br/> tags when they were supposed to be replaced with just 2.
What can I do to ignore the 2 or 3 (a few) spaces that come in between the tags?
Probably not the answer you want to hear, but it is general wisdom that you should not attempt to parse XML/HTML with regular expressions. So many things can go wrong -- it's a much better idea to use a parsing library specifically meant for such data, which will also completely bypass the issue you're having.
Take a look at JAXB if you are certain your HTML is well-formed XML; if the HTML is likely to be messy and non-compliant (like most real-world HTML), you should try something like TagSoup.
Here's some Groovy code to test your Pattern:
import java.util.regex.*
Pattern brTagPattern = Pattern.compile( "(<\\s*br\\s*/\\s*>\\s*){2,}", Pattern.CASE_INSENSITIVE | Pattern.DOTALL )
def testData = [
['', ''],
['<br/>', '<br/>'],
['< br/> <br />', '<br/><br/>'],
['<br/> <br/><br/>', '<br/><br/>'],
['<br/> < br/ > <br/>', '<br/><br/>'],
['<br/> <br/> <br/>', '<br/><br/>'],
['<br/><br/><br/> <br/><br/>', '<br/><br/>'],
['<br/><br/><br/><b>w</b><br/>','<br/><br/><b>w</b><br/>'],
]
testData.each { inputStr, expected ->
Matcher matcher = brTagPattern.matcher( inputStr )
assert expected == matcher.replaceAll( '<br/><br/>' )
}
And everything seems to pass fine...
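For anyone who wants to run the same check without Groovy, here is a minimal plain-Java sketch of it. The class name BrCollapser and the method collapse are made up for illustration; the pattern itself is the one from the question, unchanged.

```java
import java.util.regex.Pattern;

public class BrCollapser {
    // Pattern from the question: a run of two or more <br/> tags,
    // with optional whitespace inside and between the tags.
    private static final Pattern BR_RUN = Pattern.compile(
            "(<\\s*br\\s*/\\s*>\\s*){2,}",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: replaces every run of 2+ tags with exactly two.
    public static String collapse(String html) {
        return BR_RUN.matcher(html).replaceAll("<br/><br/>");
    }

    public static void main(String[] args) {
        System.out.println(collapse("<br/> <br/>"));
        System.out.println(collapse("<br/> < br/ > <br/>"));
        System.out.println(collapse("<br/><br/><br/><b>w</b><br/>"));
    }
}
```

Note that the trailing \\s* inside the group also swallows whitespace that follows the last tag of a run, which may or may not be what you want.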
You can do that by changing your regex a little:
Pattern brTagPattern = Pattern.compile("<\\s*br\\s*/\\s*>\\s*<\\s*br\\s*/\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
This will ignore any whitespace between two <br/> tags. If you want to allow exactly 2 or 3 whitespace characters between them, you can use:
Pattern brTagPattern = Pattern.compile("<\\s*br\\s*/\\s*>(\\s){2,3}<\\s*br\\s*/\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
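One thing to keep in mind with the pair-only patterns: with replaceAll they work on two tags at a time, so a run of four tags is treated as two separate pairs and still comes out as four tags. The sketch below compares the pair-only pattern with the {2,} pattern from the question; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class BrPatternComparison {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    // Matches exactly two adjacent tags, whitespace allowed between them.
    private static final Pattern PAIR = Pattern.compile(
            "<\\s*br\\s*/\\s*>\\s*<\\s*br\\s*/\\s*>", FLAGS);

    // Matches a whole run of two or more tags (pattern from the question).
    private static final Pattern RUN = Pattern.compile(
            "(<\\s*br\\s*/\\s*>\\s*){2,}", FLAGS);

    public static String replacePairs(String html) {
        return PAIR.matcher(html).replaceAll("<br/><br/>");
    }

    public static String replaceRuns(String html) {
        return RUN.matcher(html).replaceAll("<br/><br/>");
    }

    public static void main(String[] args) {
        String four = "<br/><br/><br/><br/>";
        // Two separate pair matches, each replaced by two tags.
        System.out.println(replacePairs(four)); // <br/><br/><br/><br/>
        // One match covering the whole run.
        System.out.println(replaceRuns(four));  // <br/><br/>
    }
}
```

If the goal is to collapse any run down to two tags, the {2,} quantifier is the part that does that, so it is probably worth keeping.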