Android regexp HTML

Developer https://www.devze.com 2023-04-03 22:48 Source: the web
I've got HTML code stored in a string and I want to extract all parts that match this pattern:

<a href="http://abc.pl/(.*?)/(.*?)"><img src="(.*?)"

(.*?) stands for any string. I've tried dozens of combinations and couldn't get it working. Can somebody show me sample code that extracts all matched data from a String and stores it in variables?

Thanks in advance


Here is a solution using JavaScript. I hope this helps.

First, we need a working pattern:

var pattern = '<a href="http://abc.pl/([^/"]+)/([^/"]*)".*?><img src="([^"]*)"';
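
Because the question targets Android, the difference between this pattern and the original (.*?) version can be checked in Java. This is only a minimal sketch: the input string and the class name PatternComparison are made up for illustration. It suggests that the lazy (.*?) groups can run across tag boundaries, while the negated character classes stop at the next slash or quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternComparison {
    // The question's pattern, with every placeholder written as a lazy (.*?).
    static final Pattern LAZY = Pattern.compile(
            "<a href=\"http://abc.pl/(.*?)/(.*?)\"><img src=\"(.*?)\"");

    // The pattern from this answer, using negated character classes.
    static final Pattern BOUNDED = Pattern.compile(
            "<a href=\"http://abc.pl/([^/\"]+)/([^/\"]*)\".*?><img src=\"([^\"]*)\"");

    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: a plain text link followed by the image link we want.
        String html = "<a href=\"http://abc.pl/about\">About</a> "
                + "<a href=\"http://abc.pl/cat/item\"><img src=\"pic.png\"></a>";
        // The lazy group starts at the first link and swallows markup up to the next slash.
        System.out.println(firstGroup(LAZY, html));
        // The bounded group skips the text link and captures only "cat".
        System.out.println(firstGroup(BOUNDED, html));
    }
}
```

With this input the lazy pattern starts matching at the first link, so its first group contains leftover markup rather than a clean path segment.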

Now, the problem is that in JavaScript, String.match() called with a global regexp returns only the full matches and drops the submatches (capture groups).

We can easily retrieve an array of all the full matches:

var re = new RegExp(pattern, "g");
var matches = yourHtmlString.match(re) || []; // match() returns null when nothing matches

But we also want the submatches, right? In my humble opinion, the simplest way to achieve this is to apply the non-global version of the same regexp (no "g" flag) to each match we obtained, because match() only returns submatches when the regexp is not global:

var reNonGlobal = new RegExp(pattern);
var matchesAndSubmatches = [];
for (var i = 0; i < matches.length; i++) {
    // Without the "g" flag, match() returns [fullMatch, submatch1, submatch2, ...]
    matchesAndSubmatches[i] = matches[i].match(reNonGlobal);
}

Each element of matchesAndSubmatches is now an array such that:

matchesAndSubmatches[n][0] is the n-th full match,
matchesAndSubmatches[n][1] is the first submatch of the n-th full match,
matchesAndSubmatches[n][2] is the second submatch of the n-th full match, and so on.


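Well, here's a sample in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compile the regex once, then scan the input for every occurrence.
Pattern pattern = Pattern.compile("patternGoesHere");
Matcher matcher = pattern.matcher(textGoesHere);
while (matcher.find())
{
    // matcher.group(0) is the whole match; the captured substrings are read
    // with matcher.group(n), where n starts at 1, not 0.
}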
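
To make the skeleton concrete, here is a hedged sketch that plugs in the pattern from the JavaScript answer and stores each captured part in a variable. The class name LinkExtractor, the method extract, and the sample HTML are my own assumptions for illustration, not anything from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkExtractor {
    // Pattern borrowed from the JavaScript answer above; compiled once and reused.
    private static final Pattern LINK = Pattern.compile(
            "<a href=\"http://abc.pl/([^/\"]+)/([^/\"]*)\".*?><img src=\"([^\"]*)\"");

    // Returns one String[3] per match: {first path segment, second path segment, image src}.
    static List<String[]> extract(String html) {
        List<String[]> results = new ArrayList<>();
        Matcher matcher = LINK.matcher(html);
        while (matcher.find()) {
            String first = matcher.group(1);   // text between abc.pl/ and the next slash
            String second = matcher.group(2);  // text between that slash and the closing quote
            String imgSrc = matcher.group(3);  // value of the img src attribute
            results.add(new String[] {first, second, imgSrc});
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical input with two matching links.
        String html = "<a href=\"http://abc.pl/cat/item\"><img src=\"pic.png\"></a>"
                + "<a href=\"http://abc.pl/dog/toy\"><img src=\"toy.jpg\"></a>";
        for (String[] link : extract(html)) {
            System.out.println(link[0] + " | " + link[1] + " | " + link[2]);
        }
    }
}
```

Unlike the JavaScript version, a single find() loop gives both the full match and its groups, so no second pass over the matches is needed. For anything beyond simple, predictable markup an HTML parser is probably the safer choice.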