Extracting specific <a href> URLs out of the document

Developer https://www.devze.com 2023-01-08 13:25 Source: web
I think this should be elementary, but I still can't get my head around it. Let's say there's a fair number of HTML documents and I need to extract every image URL from them.

The rest of the content changes, but the base of the URL is always the same, for example: http://images.examplesite.com/images/

So I want to extract every string that contains that part. The problem is that the URLs are always wrapped in <a href=''> or <img src=''> tags, so how can I strip the tags away and keep only the URLs? preg_match, probably?


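Try something like:

  preg_match_all('/http:\/\/images\.examplesite\.com\/images\/(.*?)["\']/i', $html_data, $results, PREG_SET_ORDER);

The ["\'] at the end accepts either quote style around the attribute value. With PREG_SET_ORDER, each $results[$n][0] holds the matched text including the closing quote, and $results[$n][1] holds the part of the URL after the base.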


You can either use an HTML DOM parser or a regular expression:

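  // Dots are escaped so they match literally; ["'] accepts either closing quote.
  preg_match_all("/http:\/\/images\.examplesite\.com\/images\/(.*?)[\"']/s", $str, $preg);
  // $preg[0] holds the full matches, $preg[1] the part after the base URL.
  print_r($preg);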
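
For the DOM parser route, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. The function name extractUrlsByPrefix and the sample markup are made up for illustration and are not from the answers above. It assumes the URLs of interest only appear in the href attribute of <a> tags and the src attribute of <img> tags.

```php
<?php
// Hypothetical helper: collect href/src attribute values that start with
// a given base URL. Assumes only <a href> and <img src> matter.
function extractUrlsByPrefix(string $html, string $base): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    $xpath = new DOMXPath($doc);
    // Select every href attribute on <a> and every src attribute on <img>.
    foreach ($xpath->query('//a/@href | //img/@src') as $attr) {
        // Keep only values that begin with the base URL.
        if (strpos($attr->value, $base) === 0) {
            $urls[] = $attr->value;
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($urls));
}

// Sample input mixing both quote styles and one unrelated link.
$html = '<a href="http://images.examplesite.com/images/a.png">a</a>'
      . "<img src='http://images.examplesite.com/images/b.jpg'>"
      . '<a href="http://other.example.com/page">x</a>';

print_r(extractUrlsByPrefix($html, 'http://images.examplesite.com/images/'));
```

Unlike the regular expressions, this does not care which quote style the attribute uses, and it ignores occurrences of the base URL outside of href and src attributes, which may or may not be what you want.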