Ok, so I'm currently running this script to remove all the excess spaces, linebreaks, and tabs from my final HTML output:
$html = preg_replace(array("/\t/", "/\s{2,}/", "/\n/"), array("", " ", " "), $html);
However, this breaks my code blocks, which are similar to the code blocks here: their indentation is lost and the entire block ends up on one line. How can I run the code above only on text that is not enclosed in <code></code> tags, which is the only element I need this for? I know how to do this for the text inside a code block, but I'm a bit lost on how to approach it for text outside of code blocks.
The only reasonable thing I've come up with is removing all the code blocks then doing the replacement and putting the code blocks back in.
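A minimal sketch of that remove/replace/restore idea, assuming the <code> blocks are not nested and that the HTML never contains NUL bytes (which the placeholders rely on). The function name collapse_outside_code and the placeholder format are made up for illustration:

```php
<?php
// Hypothetical helper: collapse whitespace everywhere except inside <code>...</code>.
function collapse_outside_code(string $html): string
{
    $blocks = [];

    // Step 1: swap each <code> block for a unique placeholder and remember the original.
    // Assumes <code> elements are not nested; the "s" flag lets "." match newlines.
    $stripped = preg_replace_callback(
        '~<code\b[^>]*>.*?</code>~is',
        function (array $m) use (&$blocks): string {
            $key = "\x00CODE" . count($blocks) . "\x00";
            $blocks[$key] = $m[0];
            return $key;
        },
        $html
    );
    if ($stripped === null) {
        // A PCRE error occurred (e.g. backtrack limit); leave the input untouched.
        return $html;
    }

    // Step 2: run the original whitespace cleanup on what is left.
    $stripped = preg_replace(array("/\t/", "/\s{2,}/", "/\n/"), array("", " ", " "), $stripped);

    // Step 3: put the untouched code blocks back.
    return $blocks ? strtr($stripped, $blocks) : $stripped;
}
```

Calling collapse_outside_code($html) in place of the bare preg_replace should leave tabs and line breaks inside <code> alone while still collapsing everything around it.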
I would avoid using regular expressions alone for this. I'm sure someone will post a half-baked regex that will be 1) unmaintainable, 2) buggy, or both, but realistically, you'll want to lex your input into tokens and output it according to the context those tokens construct.
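To give a rough idea of what "tokens plus context" can mean here, this is a deliberately tiny sketch with only two token types (a <code> block, and everything else). It is not how a full lexer works: it still leans on a regex to find token boundaries, it assumes <code> blocks are not nested, and the name minify_by_token is hypothetical:

```php
<?php
// Hypothetical two-token "lexer": <code> blocks versus everything else.
function minify_by_token(string $html): string
{
    // Split into alternating tokens; the capture group keeps the <code> blocks
    // in the result instead of discarding them as delimiters.
    $tokens = preg_split(
        '~(<code\b[^>]*>.*?</code>)~is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($tokens === false) {
        // A PCRE error occurred; leave the input untouched.
        return $html;
    }

    $out = '';
    foreach ($tokens as $i => $token) {
        if ($i % 2 === 1) {
            // Odd indexes are the captured <code> blocks: emit them verbatim.
            $out .= $token;
        } else {
            // Even indexes are ordinary markup: apply the whitespace cleanup.
            $out .= preg_replace(array("/\t/", "/\s{2,}/", "/\n/"), array("", " ", " "), $token);
        }
    }
    return $out;
}
```

The point of structuring it this way is that the decision lives in one place (the branch on token type), so adding another protected context such as <pre> means adding a token type rather than rewriting a pattern.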
I have a tool that I use to create HTML entities from existing HTML. For example, it turns I'm into I’m as long as it's in a context where changing that entity would make sense (for example, not in a <code> block, not in a URL, etc.).
I've just imported this from my old, dusty Subversion repository to GitHub, here: https://github.com/scoates/lexentity
Here's an example of lexentity in use: http://files.seancoates.com/lexentity/ (we use it for the articles at http://phpadvent.org/)
All of this to say that a system like this will create a much more flexible and robust solution than a pure regular expression-based system, in my opinion. You'll have to modify lexentity for your purposes, but feel free to borrow as much or as little as you need.
S