An AJAX response is returning the full HTML page. I need to extract the fragment between the <body> and </body> tags. This must be done on the client side using JavaScript. Any help will be appreciated.
The simplest but kind-of worst way would be simple string hacking on the response text.
var bodyhtml= html.split('<body>').pop().split('</body>')[0];
This is unsatisfactory in the general case, but can be feasible if you know the exact format of the HTML being returned (e.g. that there are no attributes on the <body> tag, that the sequences <body> and </body> aren't used in a comment in the middle of the page, etc).
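If the only thing breaking the split above is attributes on the <body> tag or differences in letter case, a regular expression can tolerate both. The following is a minimal sketch, not a parser: `extractBody` is a hypothetical helper name, it takes the first opening tag rather than the last, and it still fails if <body> appears inside a comment or an attribute value contains `>`.

```javascript
// Hypothetical helper: returns the markup between the first <body ...> tag
// and the first </body> that follows it, or null if there is no <body> tag.
function extractBody(html) {
  // Match an opening body tag with optional attributes, case-insensitively.
  var open = /<body\b[^>]*>/i.exec(html);
  if (!open) {
    return null;
  }
  var rest = html.slice(open.index + open[0].length);
  // Find the closing tag; if it is missing, keep everything after <body>.
  var end = rest.search(/<\/body\s*>/i);
  return end === -1 ? rest : rest.slice(0, end);
}
```

Usage would look like `var bodyhtml = extractBody(html);`, with a null check for responses that contain no <body> tag at all.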
Another still-quite-bad way is to write the whole document to the innerHTML of a newly created <div> and fish out the elements you want, not caring that writing <html> or <body> inside a <div> is broken. You'll be unable to reliably separate the child elements of <head> from those in <body> this way, but this is what e.g. jQuery does.
A more robust but more painful way would be to use a separate HTML document:
var iframe= document.createElement('iframe');
iframe.style.display= 'none';
document.body.insertBefore(iframe, document.body.firstChild);
var idoc= 'contentDocument' in iframe? iframe.contentDocument : iframe.contentWindow.document;
idoc.write(htmlpage);
idoc.close();
alert(idoc.body.innerHTML);
document.body.removeChild(iframe);
though this would also execute all scripts inside the document, potentially changing it, so that might not be satisfactory either.
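If script execution is the concern, one alternative in browsers that support parsing 'text/html' with DOMParser is to build the document with that instead of an iframe; scripts in a document created this way are not run. This is a different technique from the iframe code above, sketched under assumptions: `extractBodyInert` is a hypothetical name, and the `parser` argument exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: parses the full page into a separate, inert document
// and returns the serialized contents of its <body>.
// `parser` is optional; in a browser it defaults to a new DOMParser.
function extractBodyInert(html, parser) {
  parser = parser || new DOMParser();
  // 'text/html' makes the parser use the HTML parsing rules.
  var doc = parser.parseFromString(html, 'text/html');
  return doc.body ? doc.body.innerHTML : null;
}
```

In a browser this would be called as `extractBodyInert(htmlpage)`. Older browsers without 'text/html' support in DOMParser would still need the iframe approach.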
If your HTML page is publicly available on the web, you can use YQL.
For example, if your page URL is http://xyz.com/page.html and you want everything in the body element, use a query like this:
select * from html where url="http://xyz.com/page.html" and xpath='//body'
If you are new to YQL, read http://en.wikipedia.org/wiki/YQL_Page_Scraping
There is also a simple way to do it using the Chromyqlip extension: https://chrome.google.com/extensions/detail/bkmllkjbfbeephbldeflbnpclgfbjfmn
Hope this helps!
// Get the XML object for the "body" tag from the XMLHttpRequest/ActiveXObject
// object (requestObj).
// NOTE: This assumes there is only one "body" tag in your HTML document.
var body = requestObj.responseXML.getElementsByTagName("body")[0];
// Get the "body" tag as an XML string.
var bodyXML;
// for Internet Explorer
if (body.xml) {
    bodyXML = body.xml;
}
// for every other browser
if (typeof (XMLSerializer) != "undefined") {
    var serializer = new XMLSerializer();
    bodyXML = serializer.serializeToString(body);
}
This gets you the XML for the "body" tag, as a string. Unfortunately, it still includes the "<body>" and "</body>", so if you want only the contents of the tag you will have to strip these off.
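Stripping the outer tags can be sketched with two anchored replacements. `stripBodyTags` is a hypothetical helper name, and the code assumes the string holds exactly one serialized body element, possibly with attributes such as an xmlns declaration added by the serializer.

```javascript
// Hypothetical helper: removes the outer <body ...> and </body> tags from a
// serialized body element, leaving only its contents.
function stripBodyTags(bodyXML) {
  // An empty element may be serialized as a self-closing tag: no contents.
  if (/^\s*<body\b[^>]*\/>\s*$/i.test(bodyXML)) {
    return '';
  }
  return bodyXML
    .replace(/^\s*<body\b[^>]*>/i, '')   // leading opening tag
    .replace(/<\/body\s*>\s*$/i, '');    // trailing closing tag
}
```

With the code above, the call would be `var contents = stripBodyTags(bodyXML);`.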
You might want to take a look at the second example ("Example HTML code 2") on this page.