Developer https://www.devze.com 2023-03-15 21:56 Source: web

I am looking for a free (as in freedom) HTML indenter (or re-indenter) written in Python (module or command line). I don't need to filter HTML with a white list. I just want to indent (or re-indent) HTML source to make it more readable. For example, say I have the following code:

<ul><li>Item</li><li>Item
</li></ul>

the output could be something like:

<ul>
    <li>Item</li>
    <li>Item</li>
</ul>

Note: I am not looking for an interface to non-Python software (for example Tidy, written in C), but for a 100% Python script.

Thanks a lot.

You can use the toprettyxml method of the built-in xml.dom.minidom module:

>>> from xml.dom import minidom
>>> x = minidom.parseString("<ul><li>Item</li><li>Item\n</li></ul>")
>>> print(x.toprettyxml())
<?xml version="1.0" ?>
<ul>
    <li>
        Item
    </li>
    <li>
        Item
    </li>
</ul>
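
One caveat worth illustrating: minidom is an XML parser, so it only accepts well-formed input. The following is a minimal sketch, not part of the answer above; `try_pretty` is a hypothetical helper name, and the four-space indent is an arbitrary choice. It calls toprettyxml on the document element, which leaves out the `<?xml ... ?>` declaration, and returns None when the markup cannot be parsed, for instance because of an unclosed `<br>`.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def try_pretty(markup, indent="    "):
    """Return indented markup, or None if minidom cannot parse it.

    Hypothetical helper for illustration only.
    """
    try:
        dom = minidom.parseString(markup)
    except ExpatError:
        # Not well-formed XML (e.g. an unclosed <br> or a bare attribute).
        return None
    # Pretty-printing the root element skips the XML declaration.
    return dom.documentElement.toprettyxml(indent=indent)


print(try_pretty("<ul><li>Item</li><li>Item</li></ul>"))
print(try_pretty("<p>line one<br>line two</p>"))  # prints None
```

So this route is fine for XHTML-like fragments, but ordinary HTML with void elements needs a more forgiving parser.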

Using BeautifulSoup

There are a dozen ways to use the BeautifulSoup module and its prettify method. Here are some examples to get you started.

From the Command Line

$ python -m BeautifulSoup < somefile.html > prettyfile.html

Within VIM (manually)

You don't have to write the file back to disk if you don't want to, but I included that step so the result is identical to the command-line example.

$ vi somefile.html
:%!python -m BeautifulSoup
:w prettyfile.html

Within VIM (define key-mapping)

In ~/.vimrc define:

nmap =h :%!python -m BeautifulSoup<CR>

Then, when you open a file in Vim that needs beautification:

$ vi somefile.html
=h
:w prettyfile.html

Once again, saving the beautification is optional.

Python Shell

$ python
>>> from BeautifulSoup import BeautifulSoup as parse_html_string
>>> from os import path
>>> uglyfile = path.abspath('somefile.html')
>>> path.isfile(uglyfile)
True
>>> prettyfile = path.abspath(path.join('.', 'prettyfile.html'))
>>> path.exists(prettyfile)
>>> doc = None
>>> with open(uglyfile, 'r') as infile, open(prettyfile, 'w') as outfile:
...     # Assuming very simple case
...     htmldocstr = infile.read()
...     doc = parse_html_string(htmldocstr)
...     outfile.write(doc.prettify())

>>> # That's it, though you can also manipulate the DOM by hand
>>> scripts = doc.findAll('script')
>>> meta = doc.findAll('meta')
>>> print(doc.prettify())
[imagine beautiful html here]

>>> import jsbeautifier
>>> # Assumes the page contains at least one inline <script>
>>> print(jsbeautifier.beautify(scripts[0].string))
[imagine beautiful script here]
>>>

BeautifulSoup has a method called prettify which does this. See this question.

There's also the html5print module. Key features from the description page:

Pretty print HTML as well as embedded CSS and JavaScript within it
Pretty print pure CSS and JavaScript
Try to fix fragmented HTML5
Try to fix HTML with broken unicode encoding
Try to guess encoding of the document, and in some cases manage to convert 8-bit byte code back into correct UTF-8 format
Support both Python 2 and 3

Here's my pure-Python solution:

from xml.dom.minidom import parseString as string_to_dom

def prettify(string, html=True):
    # Parse the markup; it must be well-formed XML
    dom = string_to_dom(string)
    ugly = dom.toprettyxml(indent="  ")
    # Drop any blank lines that toprettyxml emits
    split = [line for line in ugly.split('\n') if line.strip()]
    if html:
        # Skip the leading <?xml version="1.0" ?> declaration
        split = split[1:]
    return '\n'.join(split)

def pretty_print(html):
    print(prettify(html))

When used on your block of html:

html = """<ul><li>Item</li><li>Item</li></ul>"""
pretty_print(html)

I get:

<ul>
  <li>Item</li>
  <li>Item</li>
</ul>
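
Since the minidom-based function above only accepts well-formed XML, here is a rough sketch of the same idea built on the standard library's html.parser, which tolerates ordinary HTML such as an unclosed `<br>`. Everything in it is my own assumption rather than something from the answers: the names `HTMLIndenter` and `indent_html` are hypothetical, tags are assumed to be properly nested, whitespace-only text is dropped and other text is trimmed (so `<pre>` content and inline spacing may change), and an element that holds only text is kept on one line.

```python
import html
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = frozenset(["area", "base", "br", "col", "embed", "hr", "img",
                  "input", "link", "meta", "source", "track", "wbr"])
# Elements whose content html.parser hands over unescaped.
RAW_TEXT = frozenset(["script", "style"])


class HTMLIndenter(HTMLParser):
    """Collect (depth, text, kind) items; kind is "open", "text" or "done"."""

    def __init__(self, pad="    "):
        super().__init__()
        self.pad = pad
        self.depth = 0
        self.in_raw_text = False
        self.items = []

    def _emit(self, text, kind="done"):
        self.items.append((self.depth, text, kind))

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self._emit(self.get_starttag_text())
            return
        self._emit(self.get_starttag_text(), "open")
        self.depth += 1
        self.in_raw_text = tag in RAW_TEXT

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit as-is, no nesting.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        self.in_raw_text = False
        self.depth = max(self.depth - 1, 0)
        close = "</%s>" % tag
        kinds = [kind for _, _, kind in self.items[-2:]]
        if kinds[-1:] == ["open"]:
            # Empty element: keep <p></p> on one line.
            depth, start, _ = self.items.pop()
            self.items.append((depth, start + close, "done"))
        elif kinds == ["open", "text"]:
            # Text-only element: keep <li>Item</li> on one line.
            _, text, _ = self.items.pop()
            depth, start, _ = self.items.pop()
            self.items.append((depth, start + text + close, "done"))
        else:
            self._emit(close)

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Character references were decoded by the parser; re-escape
            # them, except inside <script>/<style>.
            if not self.in_raw_text:
                text = html.escape(text, quote=False)
            self._emit(text, "text")

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._emit("<!%s>" % decl)


def indent_html(markup, pad="    "):
    parser = HTMLIndenter(pad)
    parser.feed(markup)
    parser.close()
    return "\n".join(pad * depth + text for depth, text, _ in parser.items)


print(indent_html("<ul><li>Item</li><li>Item\n</li></ul>"))
```

On the markup from the question this should give the four-line layout the asker wanted. It is only meant to show the approach; a real page with inline elements or preformatted text needs more care about whitespace.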
