How do I extract the title value from a string using Javascript regexp?

Developer https://www.devze.com 2023-01-25 16:16 Source: Internet
I have a string variable from which I would like to extract the title value of the element with id="resultcount". The output should be 2.

var str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table><span id="resultcount" title="2" style="display:none;">2</span><span style="font-size: 10pt">2 matching results. Please select your address to proceed, or refine your search.</span>';

I tried the following regex but it is not working:

/id=\"resultcount\" title=['\"][^'\"](+['\"][^>]*)>/


Since var str = ... is JavaScript syntax, I assume you need a JavaScript solution. As Peter Corlett said, you can't parse HTML using regular expressions, but if you are using jQuery you can take advantage of the browser's own parser with little effort:

$('#resultcount', '<div>'+str+'</div>').attr('title')

It will return undefined if resultcount is not found or if it has no title attribute.


To make sure it doesn't matter which attribute (id or title) comes first in the string, first take the entire HTML element with the required id:

var tag = str.replace(/^.*(<[^<]+?id=\"resultcount\".+?\/.+?>).*$/, "$1")

Then extract the title from that tag:

var res = tag.replace(/^.*title=\"(\d+)\".*$/, "$1");
// res is 2

But, as people have previously mentioned, it is unreliable to use regular expressions for parsing HTML: something as trivial as a different quote (single instead of double) or a space in the "wrong" place will break it.
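As a rough sketch of the two-step idea above, here is a variant that also accepts single quotes, optional spaces around =, and either attribute order. The helper name titleOfElementWithId is hypothetical (it does not come from any answer here), and it is still a regex hack rather than a parser: an attribute value containing > or <, or an unquoted attribute, will defeat it.

```javascript
// Hypothetical helper, not from the original answers: returns the title
// attribute of the first opening tag whose id equals `id`, or undefined.
function titleOfElementWithId(html, id) {
  // Escape regex metacharacters so the id is matched literally.
  var safeId = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

  // Step 1: find the opening tag carrying the id (single or double quotes).
  var tagRe = new RegExp('<[^<>]*\\sid\\s*=\\s*(["\'])' + safeId + '\\1[^<>]*>');
  var tagMatch = html.match(tagRe);
  if (!tagMatch) return undefined;

  // Step 2: read the title attribute from that tag only.
  var titleMatch = tagMatch[0].match(/\stitle\s*=\s*(["'])(.*?)\1/);
  return titleMatch ? titleMatch[2] : undefined;
}
```

Called as titleOfElementWithId(str, 'resultcount') on the string from the question, it returns the attribute value as a string, so convert it with Number(...) if a numeric count is needed.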


Please see this earlier response, entitled "You can't parse [X]HTML with regex":

RegEx match open tags except XHTML self-contained tags


Well, since no one else is jumping in on this and I'm assuming you're just looking for a value and not trying to create a parser, I'll give you what works for me with PCRE. I'm not sure how to put it into JavaScript syntax for you, but I think you'll be able to do that.

span id="resultcount" title="(\d+)"

The part you're looking for is the capturing (non-passive) group $1, which is the (\d+) part. It matches one or more digits between the quote marks.
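For reference, the same pattern can be written as a JavaScript regex literal and used with String.prototype.match. This is only a sketch of how that pattern translates: it assumes the id attribute comes immediately before title, with double quotes and single spaces, exactly as in the question's string. The variable names match and count are illustrative.

```javascript
var str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table><span id="resultcount" title="2" style="display:none;">2</span><span style="font-size: 10pt">2 matching results. Please select your address to proceed, or refine your search.</span>';

// match() returns null when the pattern is not found; otherwise index 1
// holds the text captured by (\d+).
var match = str.match(/span id="resultcount" title="(\d+)"/);
var count = match ? match[1] : undefined;
// count is the string "2" for the input above
```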
