I have a collection of one thousand HTML files and need to trim them. I need to delete all the tags inside the <body></body> area of each file except for one, <div.pg>, so that the pages print cleanly. The excess consists of navigation links, which make the printouts messy and make the pages use more paper. The contents are not the same, so I can't find and replace a fixed code excerpt, but the tags are the same: for example, there are 3 <table> tags to be deleted, each with a specific class.
How can I manipulate specific tags inside a batch of HTML files?
Is there any batch processing technique or software to do this job? What is an easy solution on Windows?
I would use an XSLT transform on each HTML page you have. Batch is not the right tool for manipulating HTML files, but you can use a batch script as a "manager" that passes each file to the XSL transform. Windows also has a rudimentary MSXML utility which you can download and install on your machine: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=21714
That's how I would do it. I am sure there are more options.
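As one of those other options, here is a minimal sketch in Python (standard library only) rather than XSLT. It plays both roles described above: a small filter that transforms one page, and a loop that acts as the "manager" over a folder. The names `KeepDivFilter`, `strip_body` and `clean_folder` are made up for illustration, and the sketch assumes the files are UTF-8 and that the part to keep is a `<div>` whose class list contains `pg`.

```python
from html.parser import HTMLParser
from pathlib import Path


class KeepDivFilter(HTMLParser):
    """Copy a page through, dropping everything in <body> outside the kept div."""

    def __init__(self, keep_class="pg"):
        # convert_charrefs=False so entities like &amp; are passed through unchanged
        super().__init__(convert_charrefs=False)
        self.keep_class = keep_class
        self.out = []
        self.in_body = False
        self.div_depth = 0  # > 0 while we are inside the kept div

    def _emitting(self):
        return not self.in_body or self.div_depth > 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.out.append(self.get_starttag_text())
            self.in_body = True
            return
        if tag == "div" and self.in_body:
            classes = (dict(attrs).get("class") or "").split()
            # Count the kept div itself and any divs nested inside it
            if self.div_depth > 0 or self.keep_class in classes:
                self.div_depth += 1
        if self._emitting():
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self._emitting():
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
            self.div_depth = 0
        if self._emitting():
            self.out.append(f"</{tag}>")
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

    def handle_data(self, data):
        if self._emitting():
            self.out.append(data)

    def handle_entityref(self, name):
        if self._emitting():
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self._emitting():
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if self._emitting():
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_body(html, keep_class="pg"):
    """Return the page with only the kept div left inside <body>."""
    parser = KeepDivFilter(keep_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def clean_folder(src, dst, keep_class="pg"):
    """The 'manager': run strip_body over every .html file in src, write to dst."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src).glob("*.html")):
        text = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
        (dst / path.name).write_text(strip_body(text, keep_class), encoding="utf-8")
```

Calling something like `clean_folder("pages", "pages_clean")` (both folder names are placeholders) would write trimmed copies and leave the originals alone, which seems safer for a thousand files than editing in place.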
If it is XHTML, you could use XSLT to transform your HTML to "another" format. See, for example, http://www.w3schools.com/xsl/ or http://help.hannonhill.com/discussions/how-do-i/269-strip-specific-html-tag-in-xslt
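The reason XHTML matters is that it is well-formed XML, so any XML tool can edit it as a tree. The sketch below is not XSLT; it shows the same idea with Python's `xml.etree.ElementTree`. It rests on several assumptions: the files parse as XML, they use the standard XHTML namespace, they contain no named entities such as `&nbsp;` (the parser does not know them), and the `<div class="pg">` is a direct child of `<body>`. The function name `keep_only_div` is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def keep_only_div(xhtml, keep_class="pg"):
    """Remove every direct child of <body> except <div class="...pg...">."""
    # Serialize with a default namespace instead of ns0: prefixes
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml)
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return xhtml  # nothing to trim

    # Iterate over a copy because we remove children while looping
    for child in list(body):
        is_kept = (
            child.tag == f"{{{XHTML_NS}}}div"
            and keep_class in child.get("class", "").split()
        )
        if not is_kept:
            body.remove(child)

    body.text = None  # drop stray text that preceded the first child
    return ET.tostring(root, encoding="unicode")
```

Note that `ET.tostring` does not write the DOCTYPE back out, so if the printed pages depend on it, it would need to be prepended again. For files that are ordinary (not well-formed) HTML, this approach will raise a parse error.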