I'm trying to create a function in PHP that searches a string for all <a href> occurrences and, if title is not set, sets it to the link text, i.e. the text between > and </a>.
I don't know the best way to do it; I was thinking about something like:
$s = preg_replace('/<a[^>]*?href=[\'"](.*?)[\'"][^>]*?title=[\'"](.*?)[\'"][^>]*?>(.*?)<\/a>/si','<a href="$1" title="$2">$3</a>',$s);
How can I check in the regex whether $2 is set and, if it isn't, replace it with $3? Also, $3 can be something like <img src="..." alt="...">, in which case I would like to use the value of alt instead.
First of all I would like to know whether this can be done in PHP and how, but any help would be appreciated.
The uninformative link is somewhat fitting here. That's not easily doable with regular expressions. For example, you cannot use a (?!\4) negative assertion with a forward backreference to compare the title= against the <img alt= attribute (which is hard enough to extract on its own).
At the very least you will have to use preg_replace_callback and handle the replacement in a separate function. There it's easier to break out the attributes and compare alt= against title=.
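A minimal sketch of the preg_replace_callback route, assuming attributes are quoted with " or ', anchors are not nested, and a link without a usable title should get either the alt of a nested <img> or its plain text. The function name add_missing_titles() is hypothetical. The callback receives each whole <a ...>...</a> match, so the attributes can be inspected with small, ordinary patterns instead of one giant regex:

```php
<?php
// Hypothetical helper: give every <a> without a (non-empty) title one.
// Assumes quoted attribute values and no nested <a> elements.
function add_missing_titles(string $html): string
{
    $result = preg_replace_callback(
        '/<a\b([^>]*)>(.*?)<\/a>/is',          // $m[1] = attributes, $m[2] = inner HTML
        function (array $m): string {
            $attrs = $m[1];
            $inner = $m[2];

            // A non-empty title already exists: return the tag untouched.
            if (preg_match('/\stitle\s*=\s*(["\'])(.*?)\1/is', $attrs, $t) && trim($t[2]) !== '') {
                return $m[0];
            }

            // Prefer the alt of a nested <img>, otherwise the plain link text.
            if (preg_match('/<img\b[^>]*\salt\s*=\s*(["\'])(.*?)\1/is', $inner, $img)) {
                $title = $img[2];
            } else {
                $title = trim(strip_tags($inner));
            }

            // Drop an existing empty title (title="" or title='') before adding ours.
            $attrs = preg_replace('/\s+title\s*=\s*(["\'])\s*\1/i', '', $attrs);

            // Values come from HTML source, so only the delimiter quote needs escaping.
            return '<a' . $attrs . ' title="' . str_replace('"', '&quot;', $title) . '">' . $inner . '</a>';
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}
```

For example, add_missing_titles('<a href="http://google.com">Google</a>') returns <a href="http://google.com" title="Google">Google</a>. Unquoted or value-less title attributes are outside the assumptions above and are not handled.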
If you aren't using this for output rewriting, make the task simpler by not using regular expressions at all. Performance-wise this is not the best choice, but it is easy to do with, e.g., phpQuery or QueryPath:
$qp = qp($html);
foreach ($qp->find("a") as $a) {
    $title = $a->attr("title");
    // Read the alt of a nested <img>; branch() keeps $a itself pointing at the <a>.
    $alt = $a->branch()->find("img")->attr("alt");
    if (!$title) { $a->attr("title", $alt); }
}
$html = $qp->top()->writeHtml();
(The same can be done, only with more elaborate code, using DOMDocument...)
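A sketch of that DOMDocument variant, using only the bundled DOM extension. It assumes $html is a fragment rather than a full page, and add_missing_titles_dom() is a hypothetical name. The fragment is wrapped in a <div> so it can be serialized back out on its own, and, like the regex sketch, it falls back from an <img> alt to the plain link text:

```php
<?php
// Hypothetical helper: same job as the regex version, but via a real HTML parser.
function add_missing_titles_dom(string $html): string
{
    $doc = new DOMDocument();
    // libxml warns about HTML5 tags and sloppy markup; collect those quietly.
    $previous = libxml_use_internal_errors(true);
    // The meta tag makes libxml read the input as UTF-8; the <div> wraps the fragment.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        // Links that already carry a non-empty title are left alone.
        if (trim($a->getAttribute('title')) !== '') {
            continue;
        }
        // Prefer the alt of the first nested <img>, else fall back to the link text.
        $img = $a->getElementsByTagName('img')->item(0);
        $title = ($img !== null && $img->hasAttribute('alt'))
            ? $img->getAttribute('alt')
            : trim($a->textContent);
        $a->setAttribute('title', $title);
    }

    // The wrapper is the first <div> in document order; serialize only its children.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    if ($wrapper === null) {
        return $html;
    }
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Unlike the regex, this copes with any attribute order or quoting style, at the cost of re-serializing the markup: libxml writes attributes back with double quotes and may normalize other details of the original HTML.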
Maybe presume that an unset title will always appear as title='' and look only for that:
echo preg_replace("/<a[^>]*?href=[\'\"](.*?)[\'\"][^>]*?title=''>(.*?)<\/a>/i", "<a href='$1' title='$2'>$2</a>", "<a href='http://google.com' title=''>Google</a>");
Output:
<a href='http://google.com' title='Google'>Google</a>
Good luck.
EDIT
Sorry, I'm not quite sure what you mean by:
also $3 can be something like img src="..." alt="..." and in this case I would like to get the value of alt.
Isn't $3 the link text in your example?