The "twill" documentation page says:
By default, twill will run pages through tidy before processing them. This is on by default because the Python libraries that parse HTML are very bad at dealing with incorrect HTML, and will often return incorrect results on "real world" Web pages. To disable this feature, set config do_run_tidy 0
But where is this tidy program located inside twill? I downloaded "twill 0.9" and looked through the contents of the "twill" folder, but I can't find any file or module there named "tidy".
twill uses the command-line version of tidy if it is installed on your system. The method that calls tidy to clean your code is located in utils.py and is named 'run_tidy'. It is called by the command 'tidy_ok', which is defined in commands.py.
If use_tidy is set to true (which it is by default), the _cleanup_html method in ConfigurableParsingFactory calls the run_tidy method.
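To illustrate what "uses the command-line version of tidy if installed" means in practice, here is a minimal sketch of that kind of wrapper. It is not twill's actual run_tidy code: the function name clean_html_with_tidy and its tidy_cmd parameter are made up for this example, and it assumes a tidy executable that reads HTML from stdin and accepts the -q (quiet) flag.

```python
import shutil
import subprocess


def clean_html_with_tidy(html, tidy_cmd="tidy"):
    """Hypothetical helper (not twill's code): pipe HTML through command-line tidy.

    Returns (cleaned_html, diagnostics), or (None, None) when no tidy
    executable can be found on PATH.
    """
    tidy_path = shutil.which(tidy_cmd)
    if tidy_path is None:
        # tidy is an external program, so there is nothing to run here
        return None, None

    # With no file argument, tidy reads the document from stdin and
    # writes the cleaned markup to stdout and its messages to stderr.
    proc = subprocess.run(
        [tidy_path, "-q"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # A nonzero exit status can simply mean tidy emitted warnings,
    # so it is not treated as a failure in this sketch.
    cleaned = proc.stdout.decode("utf-8", "replace")
    diagnostics = proc.stderr.decode("utf-8", "replace")
    return cleaned, diagnostics


if __name__ == "__main__":
    cleaned, diagnostics = clean_html_with_tidy("<p>unclosed paragraph")
    if cleaned is None:
        print("tidy is not installed; the HTML would be used as-is")
    else:
        print(cleaned)
```

This is also why there is no "tidy" file inside the twill folder: tidy is a separate program looked up on your PATH, and when it is missing a wrapper like this has nothing to call.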