Hi, I want to retrieve certain information from a website.
This is what is displayed on the website, with its HTML tags:
<a href="ProductDisplay?catalogId=10051&storeId=90001&productId=258033&langId=-1" id="WC_CatalogSearchResultDisplay_Link_6_3" class="s_result_name">
SALT - Fine
</a>
What I want to extract is "SALT - Fine" using preg_match, but I can't get it to work. Is it because the opening tag, the text, and the closing tag are all on different lines? I noticed that if they are on a single line, I can retrieve what I want.
This is my code:
$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3.*<\/a>/';
preg_match_all($pattern, $response, $match);
print_r($match);
I do not get anything in my array. If they are on a single line, it works. Why is that so?
Have a look at:
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
especially the m and s modifiers.
Also, I would recommend changing the pattern to something like:
$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3"[^>]*>(.*)<\/a>/ims';
Otherwise, the match will also include the end of your opening a-tag (the remaining attributes and the closing >).
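To see what the s modifier changes on the markup from the question, here is a minimal sketch. It assumes $response holds exactly the three-line snippet shown in the question (a real page would contain much more), and the variable names are only illustrative.

```php
<?php
// Sample markup copied from the question; $response stands in for the fetched page.
$response = <<<HTML
<a href="ProductDisplay?catalogId=10051&storeId=90001&productId=258033&langId=-1" id="WC_CatalogSearchResultDisplay_Link_6_3" class="s_result_name">
SALT - Fine
</a>
HTML;

// Same pattern twice: once without and once with the s (dot-all) modifier.
$withoutS = '/id="WC_CatalogSearchResultDisplay_Link_6_3"[^>]*>(.*)<\/a>/';
$withS    = '/id="WC_CatalogSearchResultDisplay_Link_6_3"[^>]*>(.*)<\/a>/s';

// Without s, the dot cannot cross the newline after the opening tag, so nothing matches.
$countWithoutS = preg_match($withoutS, $response, $m1);

// With s, the dot also matches newlines, so group 1 captures the link text.
$countWithS = preg_match($withS, $response, $m2);

// The captured text still carries the surrounding newlines, hence trim().
$name = $countWithS ? trim($m2[1]) : null;

var_dump($countWithoutS, $countWithS, $name);
```

Note that on a full page with several links the greedy (.*) would run up to the last </a> in the document, so a lazy (.*?) is probably the safer choice there.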
And on a side note, don't use regex to parse HTML/XML.
Something like this:
<?php
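// loadHTML() is an instance method; calling it statically fails on PHP 8.
$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$dom->loadHTML($response);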
$xpath = new DOMXPath($dom);
$node = $xpath->query('//*[@id="WC_CatalogSearchResultDisplay_Link_6_3"]/text()')->item(0);
if ($node instanceof DOMText) {
echo trim($node->nodeValue);
}
will also work, and will be a lot more robust.
You should enclose the part you want to capture in parentheses (). So I guess your pattern would then become:
$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3(.*)<\/a>/';
However, I don't fully see how you arrived at this pattern, since it would be simpler to just match everything enclosed by a-tags.
Edit:
You also need the s modifier, as mentioned by Yoshi, so that the . matches a newline. I would thus suggest you use this code:
$pattern = '/<a[^>]*>(.+)<\/a>/si';
preg_match_all($pattern, $response, $match);
print_r($match);
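One caveat with a greedy (.+) under the s modifier: once the page contains more than one link, it runs from the first opening tag to the last </a>. The sketch below compares it with a lazy (.+?). The two-link page is hypothetical, loosely modelled on the markup in the question, and "SALT - Coarse" is an invented second product.

```php
<?php
// Hypothetical two-link page, loosely modelled on the markup in the question.
$response = <<<HTML
<a href="ProductDisplay?productId=258033" class="s_result_name">
SALT - Fine
</a>
<a href="ProductDisplay?productId=258034" class="s_result_name">
SALT - Coarse
</a>
HTML;

// Greedy: (.+) swallows everything up to the LAST </a>, giving one oversized match.
preg_match_all('/<a[^>]*>(.+)<\/a>/si', $response, $greedy);

// Lazy: (.+?) stops at the FIRST </a> it can, giving one match per link.
preg_match_all('/<a[^>]*>(.+?)<\/a>/si', $response, $lazy);

// Strip the newlines that surround each captured link text.
$greedyNames = array_map('trim', $greedy[1]);
$lazyNames   = array_map('trim', $lazy[1]);

print_r($lazyNames);
```

With the lazy version, $lazyNames holds one entry per link, while the greedy version returns a single entry that spans both links.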
You're right, it's because it's a multi-line input string.
You need to add the s modifier to the regex pattern so that it can match across lines (the m modifier is harmless here, but not required):
$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3.*<\/a>/ms';
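The m modifier makes ^ and $ match at the start and end of every line rather than only at the start and end of the whole string. This pattern has no anchors, so m has no effect on it.
The s modifier makes the . (dot) match newline characters as well as all others (by default it doesn't match newlines). This is the one that fixes the problem.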
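The difference between the two modifiers is easy to check on a small string. This is only a sketch: the two-line $text and the variable names are made up for illustration.

```php
<?php
// Made-up two-line input for illustration.
$text = "first line\nsecond line";

// Without m, ^ matches only at the very start of the string: one match.
$anchoredPlain = preg_match_all('/^\w+/', $text, $a);

// With m, ^ also matches right after each newline: two matches.
$anchoredMulti = preg_match_all('/^\w+/m', $text, $b);

// Without s, the dot stops at the newline, so the pattern cannot span both lines.
$dotPlain = preg_match('/first.*second/', $text);

// m does not change what the dot matches, so this still fails.
$dotWithM = preg_match('/first.*second/m', $text);

// With s, the dot matches the newline too, so the pattern spans both lines.
$dotWithS = preg_match('/first.*second/s', $text);

var_dump($anchoredPlain, $anchoredMulti, $dotPlain, $dotWithM, $dotWithS);
```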