It's easy when you understand it... unfortunately, I don't! I'd really appreciate it if you could guide me to the answer, thanks.
I want to capture a string, using just regex, but remove any text that's within brackets. e.g.
This is a typical string...
<td class="rc_entry_alt" >Mark Anthony (IRE)</td>
I can capture "Mark Anthony (IRE)" very easily. I'm currently using...
/<td class="rc_entry(_alt)?" >.*<\/td>/
What I'd like is to remove the " (IRE)". Note the space before the opening bracket; I want to remove that too. Also, the text between the ( and ) will vary, e.g. USA, ITY, FR, etc. It should look like this...
Mark Anthony
I've no doubt it's very simple, and yet it eludes me. Thanks for your time :)
n.b. The stuff in brackets isn't always there. Sometimes I get what I want with the original code I mentioned.
Your regexp would look something like this. The actual syntax depends on your programming language / tool.
First you need to match the opening <td ...> tag. Then you capture everything up to the "(". Then, to be sure, match everything in brackets followed by </td>. Since the bracketed part isn't always there, make it optional.
/<td[^>]*>([^(<]*?)\s*(?:\([^)]*\))?<\/td>/
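For illustration, here is a minimal Python sketch of the capture approach described above. It assumes the bracketed suffix is optional and never nested, and that the cell contains no other markup. The pattern is an adaptation rather than the exact expression from this answer, and the names TD_NAME and extract_name are made up for the example.

```python
import re

# Assumed pattern: group 1 is the name; the whitespace and "(...)" suffix
# are matched outside the group, and the suffix is optional.
TD_NAME = re.compile(r'<td[^>]*>([^(<]*?)\s*(?:\([^)]*\))?</td>')


def extract_name(html):
    """Return the cell text without a trailing bracketed suffix, or None."""
    match = TD_NAME.search(html)
    return match.group(1) if match else None


print(extract_name('<td class="rc_entry_alt" >Mark Anthony (IRE)</td>'))
print(extract_name('<td class="rc_entry" >Mark Anthony</td>'))
```

Both calls should print Mark Anthony. The lazy quantifier in group 1 is what keeps the space before the bracket out of the captured text.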
I'd also recommend reading the book Mastering Regular Expressions by Jeffrey Friedl.
Okay, so remove the HTML first, then do something like this to remove the (...) part:
\s+\(.*?\)
If you know the (...) part is the very last thing in the string (i.e. there's nothing after it), you can use this to check that it's at the end, too:
\s+\(.*?\)$
Just use a Regex find and replace function, find the expression above, and replace with nothing.
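As a rough sketch of that two-step idea in Python (the thread doesn't name a language, so Python is an assumption here, and strip_tags and strip_suffix are hypothetical helper names). The tag-stripping regex is deliberately crude and only meant for a single known cell, not for general HTML.

```python
import re


def strip_tags(cell):
    """Crudely remove anything that looks like an HTML tag."""
    return re.sub(r'<[^>]+>', '', cell)


def strip_suffix(text):
    """Remove whitespace plus a bracketed group at the very end of the string."""
    return re.sub(r'\s+\(.*?\)$', '', text)


cell = '<td class="rc_entry_alt" >Mark Anthony (IRE)</td>'
print(strip_suffix(strip_tags(cell)))
```

If the string has no bracketed part, re.sub finds nothing to replace and returns the text unchanged, which covers the case where the brackets aren't there.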