Say I have some HTML with an image tag like this:
<p> (1) some image is below:
<img src="/somwhere/filename_(1).jpg">
</p>
I want a regex that will just get rid of the parentheses in the filename, so my HTML will look like this:
<p> (1) some image is below:
<img src="/somwhere/filename_1.jpg">
</p>
Does anyone know how to do this? My programming language is C#, if that makes a difference...
I will be eternally grateful and send some very nice karma your way. :)
I suspect your job would be much easier if you used the HTML Agility Pack instead of regexes. Judging from the other answers, it will make parsing the HTML, and achieving what you are trying to do, a lot easier.
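A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is referenced. The class and method names (SrcCleaner, StripParensFromImgSrc) are made up for illustration:

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class SrcCleaner
{
    // Hypothetical helper: removes '(' and ')' from the src of every <img>.
    public static string StripParensFromImgSrc(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard against that.
        var images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
        {
            foreach (var img in images)
            {
                string src = img.GetAttributeValue("src", "");
                img.SetAttributeValue("src", src.Replace("(", "").Replace(")", ""));
            }
        }

        // Serialize the (possibly modified) document back to a string.
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        string html = "<p> (1) some image is below:\n<img src=\"/somwhere/filename_(1).jpg\">\n</p>";
        Console.WriteLine(StripParensFromImgSrc(html));
    }
}
```

Only the src attribute is touched, so the "(1)" in the paragraph text stays as it is. The parser may normalize other parts of the markup when it writes the document back out, so compare the output against your input before relying on it.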
Hope this helps, Best regards, Tom.
This (rather dense) regex should do it:
string s = Regex.Replace(input, @"(<img\s+[^>]*src=""[^""]*)\((\d+)\)([^""]*""[^>]*>)", "$1$2$3");
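If the density is a problem, the same pattern can be spread over several lines with RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and allows # comments. This is only a sketch of the regex above with comments added; the class name DenseRegexExplained is made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DenseRegexExplained
{
    public static string Strip(string input)
    {
        // Same pattern as the one-liner, laid out with comments.
        const string pattern = @"
            ( <img\s+ [^>]* src="" [^""]* )  # $1: start of the tag, the src attribute, value up to the paren
            \( (\d+) \)                       # $2: the digits between the literal parentheses
            ( [^""]* "" [^>]* > )             # $3: rest of the value, closing quote, rest of the tag
        ";
        return Regex.Replace(input, pattern, "$1$2$3",
                             RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        string html = "<p> (1) some image is below:\n<img src=\"/somwhere/filename_(1).jpg\">\n</p>";
        Console.WriteLine(Strip(html));
    }
}
```

The replacement keeps the three captured groups and drops the two literal parentheses between them, so only a "(digits)" pair inside a double-quoted src value is affected.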
Nick's solution is fine if the file names always match that format, but this one matches any parenthesis, anywhere in the attribute:
s = Regex.Replace(input, @"(?i)(?<=<img\s+[^>]*\bsrc\s*=\s*""[^""]*)[()]", "");
The lookbehind ensures that the match occurs inside the src attribute of an img tag. It assumes the attribute is enclosed in double-quotes (quotation marks); if you need to allow for single-quotes (apostrophes) or no quotes at all, the regex gets much more complicated. I'll post that if you need it.
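For what it's worth, here is one way the other quoting styles could be handled. It is not the more complicated lookbehind regex mentioned above, just a sketch of a different technique: capture the attribute value (double-quoted, single-quoted, or unquoted) and clean it in a MatchEvaluator. The class name ImgSrcParens is made up, and the pattern assumes no '>' appears inside attribute values before src:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcParens
{
    // Group 1: everything from "<img" up to and including "src=".
    // Group 2: the value, double-quoted, single-quoted, or unquoted.
    private static readonly Regex SrcAttr = new Regex(
        @"(<img\b[^>]*?\ssrc\s*=\s*)(""[^""]*""|'[^']*'|[^\s>]+)",
        RegexOptions.IgnoreCase);

    public static string Strip(string html)
    {
        // Keep the prefix as-is and drop parentheses from the value only.
        return SrcAttr.Replace(html, m =>
            m.Groups[1].Value + m.Groups[2].Value.Replace("(", "").Replace(")", ""));
    }

    public static void Main()
    {
        Console.WriteLine(Strip("<p> (1) <img src='/somwhere/filename_(1).jpg'></p>"));
        // prints: <p> (1) <img src='/somwhere/filename_1.jpg'></p>
        Console.WriteLine(Strip("<p> (1) <img src=/somwhere/filename_(1).jpg></p>"));
        // prints: <p> (1) <img src=/somwhere/filename_1.jpg></p>
    }
}
```

Requiring whitespace directly before src keeps attributes such as data-src from matching. Text that merely looks like an attribute inside another quoted value can still fool it, which is the usual argument for a real HTML parser.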
In this simple case, you could just use string.Replace, for example:
string imgFilename = "/somewhere/image_(1).jpg";
imgFilename = imgFilename.Replace("(", "").Replace(")", "");
Or do you need a regex for replacing the complete tag inside an HTML string?
Regex.Replace(some_input, @"(?<=<\s*img\s*src\s*=\s*""[^""]*?)(?:\(|\))(?=[^""]*?""\s*\/?\s*?>)", "");
Finds ( or ) preceded by <img src =" and, optionally, text, with any whitespace combination (in .NET, \s also matches newlines), and followed by optional text and "> or "/>, again with any whitespace combination, and replaces them with nothingness.