
Extracting an image tag from HTML code in PHP

Developer (开发者) https://www.devze.com 2023-02-13 06:30 Source: web

I'm trying to fetch my articles and I need to make a slider out of them.

Each of my articles has an image inside its text, like this:

<p>
<img src="story_img.jpg" width=120 height=80>
In the last couple of weeks I often had to download a lot of files, submitted to a web-based teaching platform. Downloading all these files by hand is very annoying so I implemented a short Groovy script. Since Groovy has a great support for parsing well-formed XML-like information it fails if you want to parse unstructured and nasty HTML code.
</p>

What I need is simple: first extract the image from the text, then remove it from the text.

That way I end up with two variables:

$imgOfText = ?

$TextWithOutImg = ?

I tried several approaches in PHP and even read this topic, but I couldn't get it to work.


It's HTML, so you can parse it. Use DOMDocument:

$html = '<p>';
$html.= '<img src="story_img.jpg" width=120 height=80>';
$html.= 'In the last couple of weeks I often had to download a lot ';
$html.= 'of files, submitted to a web-based teaching platform. Downloading ';
$html.= 'all these files by hand is very annoying so I implemented a short ';
$html.= 'Groovy script. Since Groovy has a great support for parsing well-';
$html.= 'formed XML-like information it fails if you want to parse ';
$html.= 'unstructured and nasty HTML code.';
$html.= '</p>';
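// Parse the fragment; DOMDocument tolerates loose HTML such as unquoted attributes.
$doc = new DOMDocument();
$doc->loadHTML($html);
// Take the first <p> and the first <img> in the document.
$p = $doc->getElementsByTagName('p')->item(0);
$img = $doc->getElementsByTagName('img')->item(0);
// The image URL, and the paragraph's text content (an <img> contributes no text).
$imgOfText = $img->getAttribute('src');
$TextWithOutImg = $p->nodeValue;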

Demo here
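
If the markup of the paragraph should be kept (rather than only its text), one possible variation is to detach the <img> node from the tree and serialize the paragraph again. This is only a sketch under assumptions: the shortened sample string and the $plainText variable are made up for illustration, and it assumes the fragment contains at least one <p>.

```php
<?php
// Shortened, hypothetical sample; the real article HTML would go here.
$html = '<p><img src="story_img.jpg" width=120 height=80>Some article text.</p>';

// Collect libxml warnings about loose markup instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$p   = $doc->getElementsByTagName('p')->item(0);
$img = $doc->getElementsByTagName('img')->item(0);

$imgOfText = null;
if ($img !== null) {
    $imgOfText = $img->getAttribute('src');
    // Detach the <img> element from its parent so it no longer appears in the output.
    $img->parentNode->removeChild($img);
}

// Serialize only the paragraph node, now without the image.
$TextWithOutImg = trim($doc->saveHTML($p));
// Plain text of the paragraph, in case the slider needs a caption without tags.
$plainText = $p->textContent;
```

Checking $img against null keeps the script from failing on articles that have no image; in that case $imgOfText simply stays null.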


How about this Live Demo I whipped up? It's just some very basic parsing using strpos(). I'm sure this could be done with regular expressions, but I was never any good at those :)

CODE

<?php

    $html = '<p>';
    $html.= '    <img src="story_img.jpg" width=120 height=80>';
    $html.= '    In the last couple of weeks I often had to download a lot ';
    $html.= 'of files, submitted to a web-based teaching platform. Downloading ';
    $html.= 'all these files by hand is very annoying so I implemented a short ';
    $html.= 'Groovy script. Since Groovy has a great support for parsing well-';
    $html.= 'formed XML-like information it fails if you want to parse ';
    $html.= 'unstructured and nasty HTML code.';
    $html.= '</p>';

    $spot = strpos($html, 'src="', strpos($html, '<img'))+5;
    $spot2 =strpos($html, '"', $spot);
    $imgOfText = substr($html, $spot, $spot2-$spot);

    $spot = strpos($html, '<img');
    $spot2 = strpos($html, '>', $spot)+1;
    $TextWithOutImg = substr($html,0,$spot).substr($html,$spot2);

    echo "Image Source: ".$imgOfText."\n\n";
    echo "Text Without Image:\n".$TextWithOutImg;

?>

OUTPUT

Image Source: story_img.jpg

Text Without Image:

<p>In the last couple of weeks I often had to download a lot of files, submitted to a web-based teaching platform. Downloading all these files by hand is very annoying so I implemented a short Groovy script. Since Groovy has a great support for parsing well-formed XML-like information it fails if you want to parse unstructured and nasty HTML code.</p>
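
For comparison, the regular-expression route mentioned above might look roughly like the following. It is a sketch, not a robust HTML parser: the patterns are assumptions that expect a quoted src attribute and no ">" character inside the tag, and the sample string is shortened for illustration.

```php
<?php
// Shortened, hypothetical sample; the real article HTML would go here.
$html = '<p><img src="story_img.jpg" width=120 height=80>Some article text.</p>';

$imgOfText = null;
// Find the first <img ...> tag and capture its single- or double-quoted src value.
if (preg_match('/<img\b[^>]*\bsrc\s*=\s*(["\'])(.*?)\1[^>]*>/i', $html, $m)) {
    $imgOfText = $m[2];
}

// Strip only the first <img ...> tag; the last argument limits the replacement to one.
$TextWithOutImg = preg_replace('/<img\b[^>]*>/i', '', $html, 1);
```

For markup that is not under your control, a real parser such as DOMDocument is usually the safer choice.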


Try this topic: PHP - remove <img> tag from string


There are a number of PHP libraries that can parse HTML, even invalid HTML:

phpQuery

Simple HTML DOM

Zend DOM Query

Here is a phpQuery example that prints the src of every img tag that appears on the Stack Overflow home page.

<?php

$html = file_get_contents('http://stackoverflow.com');

include('phpQuery.php');

$pq = phpQuery::newDocumentHTML($html, 'utf-8');

foreach ($pq->find('img') as $img)
{
    echo pq($img)->attr('src') .'<br>';

}

?>

Another example, which extracts the text of all paragraphs:

foreach ($pq->find('p') as $p)
{
    echo pq($p)->text() .'<br>';

}