Well, here I am back at regex and my poor understanding of it. I spent more time learning it, and this is what I came up with:
/<a href=\"travis.php?theTaco=([0-9999999])\">(.*)</a>
I basically want the number in this string:
<a href="travis.php?theTaco=510973">510973</a>
Is my regex almost right? My original was:
"/<a href=\"travis.php?theTaco(.*)\">(.*)<\/a>/";
But sometimes it returned huge strings, so I just want to get the numbers only. I searched through other posts, but there is a lot of unrelated material. Please give an example, a resource, or a link to a closely related question.
Thank you.
Try using an HTML parser provided by the language you are using.
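As a rough illustration of the parser approach, here is a minimal sketch. The question does not say which language is in use, so Python and its standard html.parser module are an assumption here, and the names TacoLinkParser and extract_taco_ids are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class TacoLinkParser(HTMLParser):
    """Collects the theTaco query value from <a href="travis.php?..."> links."""

    def __init__(self):
        super().__init__()
        self.taco_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urlparse(href)
        if url.path != "travis.php":
            return
        # parse_qs maps each query key to a list of values,
        # e.g. {"theTaco": ["510973"]}
        for value in parse_qs(url.query).get("theTaco", []):
            if value.isdigit():  # keep only values made of digits
                self.taco_ids.append(value)


def extract_taco_ids(html):
    """Hypothetical helper: return every numeric theTaco value found in html."""
    parser = TacoLinkParser()
    parser.feed(html)
    parser.close()
    return parser.taco_ids
```

Calling extract_taco_ids on the sample link from the question should give a one-element list holding the number as a string. The parser reads the href attribute itself, so it does not depend on quoting style or attribute order the way a regex does.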
Reason why your first regex fails: [0-9999999] is not what you think. It is the same as [0-9], which matches one digit. To match a number you need [0-9]+. Also, .* is greedy and will try to match as much as it can. You can use .*? to make it non-greedy. Since you are trying to match a number again, use [0-9]+ there again instead of .* as well. Also, if the two numbers you are capturing will be the same, you can just match the first and use a back reference \1 for the second one.
And there are a few regex meta-characters which you need to escape, such as . and ?
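The two points above (the character class and greediness) can be checked with a small sketch. Python's re module is assumed here because the question does not name its language, and the two-link html string is invented test data.

```python
import re

# Invented sample: two links on one line, to show how far .* can run.
html = (
    '<a href="travis.php?theTaco=510973">510973</a> '
    '<a href="travis.php?theTaco=42">42</a>'
)

# [0-9999999] is a character class: the range 0-9 plus redundant literal 9s,
# so it matches exactly one digit, just like [0-9].
one_digit = re.fullmatch(r"[0-9999999]", "5") is not None
many_digits = re.fullmatch(r"[0-9999999]", "510973") is not None
with_plus = re.fullmatch(r"[0-9]+", "510973") is not None

# Greedy .* runs to the last possible "> and </a>, swallowing both links.
greedy = re.findall(r'theTaco(.*)">(.*)</a>', html)

# Lazy .*? stops at the first possible match, so each link is found separately.
lazy = re.findall(r'theTaco(.*?)">(.*?)</a>', html)
```

Here one_digit and with_plus are true while many_digits is false. The greedy pattern yields a single match whose first group spans from the first link into the second, which is the "huge strings" symptom, while the lazy pattern yields one match per link.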
Try:
<a href=\"travis\.php\?theTaco=([0-9]+)\">\1<\/a>
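For anyone who wants to try that pattern, here is a minimal sketch in Python (an assumption, since the question does not name its language). The names TACO_LINK and taco_ids are hypothetical. In a Python raw string the double quotes and the slash need no backslash; those escapes in the pattern above belong to the surrounding string quoting and the / delimiter.

```python
import re

# Same pattern as above: escaped . and ?, digits captured once,
# then \1 requires the link text to repeat the captured number.
TACO_LINK = re.compile(r'<a href="travis\.php\?theTaco=([0-9]+)">\1</a>')


def taco_ids(html):
    """Hypothetical helper: return the numbers from all matching links."""
    # With a single capturing group, findall returns the group-1 strings.
    return TACO_LINK.findall(html)
```

Because of the back reference, a link whose text differs from the number in its URL is not matched at all. If that is too strict, replace \1 with [0-9]+ or .*?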
To capture a number, you don't use a range like [0-99999]; you match it digit by digit. Something like [0-9]+ is closer to what you want for that section. Also, escaping is important, as codaddict said.
Others have already mentioned some issues regarding your regex, so I won't bother repeating them.
There are also issues regarding how you specified what it is you want. You can simply match via
/theTaco=(\d+)/
and take the first capturing group. You have not given us enough information to know whether this suits your needs.
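A minimal sketch of this simpler approach, again assuming Python because the question does not name its language (first_taco_id is a made-up name):

```python
import re


def first_taco_id(text):
    """Hypothetical helper: return the first theTaco number in text, or None."""
    match = re.search(r"theTaco=(\d+)", text)
    # group(1) is the first capturing group, i.e. the run of digits.
    return match.group(1) if match else None
```

This ignores the surrounding markup entirely, so it also matches theTaco=123 in plain text or in other attributes. Whether that is acceptable depends on the input.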