$content = '<!--<sup><span style="font-weight:bold;color:black;">0</span></sup><br/>-->
<div class="popular-video-image">
<a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>">
<img src="/images/topvideo/1.jpg" alt=""/>
</a>
<span class="popular-video-artist ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
<span class="popular-video-title ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
</div>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadHTML($content);
foreach ($dom->getElementsByTagName('a') as $node)
{
$node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
}
$dom->formatOutput = true;
echo $dom->saveXml($dom->documentElement);
Output:
<html>
<body>
<div class="popular-video-image">
<a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement -开发者_如何学编程 Like a G6>">
<img src="/images/topvideo/1.jpg" alt=""/></a>
<span class="popular-video-artist ellipsis"><a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
<span class="popular-video-title ellipsis"><a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
</div>
</body>
</html>
I do not want to add html and body tags. Also do not want to tag replaced to <lang>
. And
is also unnecessary.
I want to receive such content, which was at the entrance, only with modified links..
Sorry for bad english!
You are seeing
at the end of each line because your HTML has Windows-style line endings CR+LF
. To get rid of them, run this on it before you feed it into DOMDocument
— to convert them to Unix-style line endings LF
:
$content = preg_replace('/\r\n/', "\n", $content);
saveXml takes an optional parameter to allow you to specify the node to output.
$dom->saveXml($dom->documentElement->firstChild->firstChild);
This will remove the html and body tags from the output.
I guess that the <html>
and <body>
tags get placed in because you are using loadHTML
. Try using loadXML
instead.
As for <lang>
, it has to be replaced because otherwise the resulting XML would not be valid. If it is causing you problems, you should change your approach a little and work with it, not against it.
<?php
$content = '<!--<sup><span style="font-weight:bold;color:black;">0</span></sup><br/>-->
<div class="popular-video-image">
<a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>">
<img src="/images/topvideo/1.jpg" alt=""/>
</a>
<span class="popular-video-artist ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
<span class="popular-video-title ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
</div>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadHTML($content);
foreach ($dom->getElementsByTagName('a') as $node)
{
$node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
}
$dom->formatOutput = true;
echo preg_replace('#^<!DOCTYPE.+?>#', '', str_replace( array('<html>', '</html>', '<body>', '</body>', "\n\n", '<', '>'), array('', '', '', '', '', '<', '>',), $dom->saveHTML()));
精彩评论