How to use preg_match_all in PHP?

Developer https://www.devze.com 2023-03-07 13:20 Source: the web
Hi, I want to retrieve certain information from a website.

This is what is displayed on the website, with HTML tags:

    <a href="ProductDisplay?catalogId=10051&amp;storeId=90001&amp;productId=258033&amp;langId=-1" id="WC_CatalogSearchResultDisplay_Link_6_3" class="s_result_name">

                                                                SALT - Fine
</a>

What I want to extract is "SALT - Fine" using preg_match, but I do not know why it doesn't work. Is it because the markup is spread over several lines? I noticed that if it is all on a single line, I can retrieve what I want.

This is my code:

$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3.*<\/a>/';
preg_match_all($pattern, $response, $match);
print_r($match);

I do not get anything in my array. If everything is on a single line, it works. Why is that?


Have a look at:

http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

especially the m and s modifiers.

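Also, I would recommend changing the pattern to something like:

$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3"[^>]*>(.*?)<\/a>/ims';

Otherwise, the match will also include the rest of the opening a-tag. The lazy (.*?) stops the capture at the first closing </a> rather than the last one on the page.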
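
To make the suggestion above concrete, here is a minimal, self-contained sketch. The $response string is sample input modeled on the markup in the question (the second link is made up to show where the match stops), and the pattern uses a lazy (.*?) so the capture ends at the first closing tag.

```php
<?php
// Sample input modeled on the question's markup; assumed to resemble $response.
$response = <<<HTML
<a href="ProductDisplay?catalogId=10051&amp;storeId=90001&amp;productId=258033&amp;langId=-1" id="WC_CatalogSearchResultDisplay_Link_6_3" class="s_result_name">

                SALT - Fine
</a>
<a href="#" id="other_link">Other</a>
HTML;

// [^>]*> skips the rest of the opening tag; (.*?) lazily captures up to the first </a>.
// The s modifier lets . match the newlines around the link text.
$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3"[^>]*>(.*?)<\/a>/is';

$name = null;
if (preg_match($pattern, $response, $m) === 1) {
    $name = trim($m[1]); // strip the surrounding whitespace and newlines
}

echo $name, PHP_EOL; // SALT - Fine
```

The captured group still contains the indentation and line breaks from the page, which is why the sketch calls trim() on it.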

And on a side note, don't use regex to parse HTML/XML.

Something like this:

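<?php
// Real-world HTML is often malformed; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);

// loadHTML() is an instance method; calling it statically throws an Error on PHP 8.
$dom = new DOMDocument();
$dom->loadHTML($response);
$xpath = new DOMXPath($dom);

// Select the text node directly inside the element with the given id.
$node = $xpath->query('//*[@id="WC_CatalogSearchResultDisplay_Link_6_3"]/text()')->item(0);
if ($node instanceof DOMText) {
    echo trim($node->nodeValue);
}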

will also work, and will be a lot more robust.


You should wrap the part you want to capture in parentheses (). So I guess your pattern would then become

$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3(.*)<\/a>/';

However, I don't fully see how you arrived at this pattern, since it would be simpler to just match everything enclosed by a-tags.

Edit: You also need the s modifier, as mentioned by Yoshi, so that . matches a newline. The quantifier should also be lazy (.+?) so that each match stops at its own closing tag. I would thus suggest you use this code:

$pattern = '/<a[^>]*>(.+?)<\/a>/si';
preg_match_all($pattern, $response, $match);
print_r($match);
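
As a rough illustration of what ends up in $match, the sketch below runs a pattern of this shape over a hypothetical two-link string (not the real page). It uses a lazy (.+?) so the two links are not merged into a single match.

```php
<?php
// Hypothetical two-link input, used only to show the shape of the result array.
$response = "<a href=\"/a\">\n   SALT - Fine\n</a>\n<a href=\"/b\">Sugar</a>";

// Lazy (.+?) so each match ends at its own </a>; s lets . cross newlines.
$pattern = '/<a[^>]*>(.+?)<\/a>/si';

$count = preg_match_all($pattern, $response, $match);

// With the default PREG_PATTERN_ORDER flag:
//   $match[0] holds the full matches, $match[1] holds capture group 1.
$names = array_map('trim', $match[1]);

print_r($names);
```

The text you are after is in $match[1], not $match[0]; the latter contains the whole matched tags.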


You're right, it's because it's a multi-line input string.

You need to add the s modifier (and the m modifier, if the pattern uses ^ or $ per line) to the regex pattern to match multi-line strings:

$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3.*<\/a>/ms';

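The m modifier makes ^ and $ match at the start and end of every line instead of only at the start and end of the whole string. This pattern has no anchors, so m has no effect on it.

The s modifier makes the . metacharacter match newline characters as well as all others (by default it does not match newlines).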
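
The difference between the two modifiers can be checked with a small sketch. The two-line $text string is a made-up example, not data from the question.

```php
<?php
$text = "first\nsecond";

// Without s, . stops at the newline, so the pattern cannot span both lines.
$withoutS = preg_match('/first.*second/', $text);
// With s, . also matches "\n".
$withS = preg_match('/first.*second/s', $text);

// m changes only the anchors: ^ and $ then match at every line boundary.
$withoutM = preg_match('/^second$/', $text);
$withM = preg_match('/^second$/m', $text);

var_dump($withoutS, $withS, $withoutM, $withM); // int(0) int(1) int(0) int(1)
```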
