Parsing HTML with phpQuery: how to handle C++ code inside a pre tag?

Developer https://www.devze.com 2023-03-29 21:49 Source: Internet

In the database I have some content like this:

Some text
<pre>
#include <cstdio> 

int x = 1;
</pre>
Some text

When I try to parse this with phpQuery, it fails because <cstdio> is interpreted as a tag.

I could use htmlspecialchars, but to apply it only inside pre tags I would still need to do some parsing. I could use a regex, but that would be much more difficult (I would need to handle the possible attributes of the pre tag), and the whole point of using a parser was to avoid this kind of regex work.

What's the best way to do this?


Remember to encode the HTML special characters (&, <, > and so on) before assembling the markup.
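
As a rough illustration of that advice, here is a minimal sketch that escapes the raw C++ source before it is wrapped in a pre tag. The helper name wrapInPre and the sample strings are hypothetical, made up for this example; htmlspecialchars() is the standard PHP function.

```php
<?php
// Hypothetical helper (not from the original post): escape raw code, then wrap it.
function wrapInPre(string $rawCode): string
{
    // ENT_NOQUOTES escapes &, < and > only; quotes are harmless inside element content.
    return '<pre>' . htmlspecialchars($rawCode, ENT_NOQUOTES, 'UTF-8') . '</pre>';
}

$code = "#include <cstdio>\n\nint x = 1;";

// Assemble the final markup only after the code has been escaped.
$html = "Some text\n" . wrapInPre($code) . "\nSome text";

echo $html, "\n";
```

With the code escaped at this point, <cstdio> reaches the parser as &lt;cstdio&gt; and can no longer be mistaken for a tag.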


I finally went the regex way, considering only simple attributes for the pre tag (no '>' inside the attributes):

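  foreach (array('pre', 'code') as $sTag) {
     // Match <tag attrs>body</tag>; \b keeps "pre" from also matching e.g. <prefix>.
     $s = preg_replace_callback("#<($sTag)\\b([^>]*)>(.+?)</$sTag>#si",
     function ($matches) {
        // Decode entities that are already present so they are not encoded twice.
        // '&amp;' is decoded last so that '&amp;lt;' becomes '&lt;' and not '<'.
        $body = str_replace(array('&lt;', '&gt;', '&quot;', '&amp;'), array('<', '>', '"', '&'), $matches[3]);
        // $matches[2] already carries its leading space, so no extra space is added.
        return "<{$matches[1]}{$matches[2]}>".htmlentities($body, ENT_COMPAT, "UTF-8")."</{$matches[1]}>";
     },
     $s);
  }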

It also deals with characters that have already been converted to HTML entities (we don't want to encode them twice).

Not a perfect solution, but given the data I need to apply it to, it will do the job.


The error is that your database contains HTML in which some of the text is not correctly encoded.

So, if you want to save time and have a correct solution, make sure that the HTML in your database is correctly encoded. This means that everything should be correctly encoded (using htmlspecialchars()) before it is saved to your database!

Otherwise you just save garbage in your database, and you will have to write special code to "prettify that garbage".

Any other solution is a workaround, and workarounds will cost you precious time in the future.

So: the best solution is to make sure that anything you write to your database is correct.
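
To see what correct encoding at write time buys you, here is a small sketch that parses both variants of the stored snippet. It uses PHP's built-in DOMDocument as a stand-in for phpQuery, which is an assumption made only to keep the example self-contained; the helper countElementsInPre is hypothetical as well.

```php
<?php
// Hypothetical helper: counts the element nodes found inside the first <pre>.
function countElementsInPre(string $html): int
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();

    $pre = $doc->getElementsByTagName('pre')->item(0);
    return $pre->getElementsByTagName('*')->length;
}

$code = "#include <cstdio>\n\nint x = 1;";

// Stored as-is: the markup contains a literal <cstdio>.
$bad = "<pre>\n" . $code . "\n</pre>";

// Encoded before saving: the markup contains &lt;cstdio&gt;.
$good = "<pre>\n" . htmlspecialchars($code, ENT_NOQUOTES, 'UTF-8') . "\n</pre>";

$badCount  = countElementsInPre($bad);   // the parser turns <cstdio> into an element
$goodCount = countElementsInPre($good);  // the code stays plain text

printf("unescaped: %d element(s) in pre, escaped: %d element(s) in pre\n", $badCount, $goodCount);
```

In the escaped variant the pre tag contains only text, so reading it back through the parser returns the original C++ source without any special handling.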
