How to extract the fragment between the body (<body>...</body>) tags from an AJAX response in JavaScript

开发者 https://www.devze.com 2023-01-17 13:52 Source: the web
An AJAX response is returning the full HTML page. I need to extract the fragment between the body (<body> and </body>) tags. This has to be done on the client side using JavaScript. Any help will be appreciated.


The simplest, but arguably worst, way is plain string hacking on the response text.

var bodyhtml= html.split('<body>').pop().split('</body>')[0];

This is unsatisfactory in the general case, but can be feasible if you know the exact format of the HTML being returned (e.g. that there are no attributes on the <body>, that the sequences <body> and </body> aren't used in a comment in the middle of the page, etc.).
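If the only worry is attributes on the <body> tag or differences in letter case, a regular expression is a bit more tolerant than split(). This is only a sketch of the same string-hacking idea, not a real parser: extractBodyHtml is a made-up helper name, and it still breaks if a body tag appears inside a comment or script, or if an attribute value contains a literal ">".

```javascript
// Sketch: pull out the text between the first <body ...> and the first
// following </body>, ignoring case and any attributes on the opening tag.
// extractBodyHtml is a hypothetical helper, not part of any library.
function extractBodyHtml(html) {
  var match = /<body\b[^>]*>([\s\S]*?)<\/body\s*>/i.exec(html);
  // No body element found: return null so the caller can tell.
  return match ? match[1] : null;
}
```

It would be called on the raw response text, e.g. extractBodyHtml(xhr.responseText), where xhr stands for whatever request object you are using.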

Another still-quite-bad way is to write the whole document to the innerHTML of a newly-created <div> and fish out the elements you want, not caring that writing <html> or <body> inside a <div> is broken. You'll be unable to reliably separate the child elements of <head> from those in <body> this way, but this is what e.g. jQuery does.

A more robust but more painful way would be to use a separate HTML document:

var iframe= document.createElement('iframe');
iframe.style.display= 'none';
document.body.insertBefore(iframe, document.body.firstChild);
var idoc= 'contentDocument' in iframe? iframe.contentDocument : iframe.contentWindow.document;
idoc.write(htmlpage);
idoc.close();
alert(idoc.body.innerHTML);
document.body.removeChild(iframe);

Note, though, that this also executes all scripts inside the document, potentially changing it, so it might not be satisfactory either.
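A different technique from the iframe, not mentioned above, is to let the browser parse the text with DOMParser. Documents created this way are inert, so scripts in the response are parsed but not run. The sketch below assumes a browser whose DOMParser supports the 'text/html' type (older browsers do not); extractBodyWithDomParser is a made-up name.

```javascript
// Sketch: parse the response into a separate, inert document and read
// its body. Assumes DOMParser with 'text/html' support is available.
// extractBodyWithDomParser is a hypothetical helper name.
function extractBodyWithDomParser(htmlpage) {
  var parser = new DOMParser();
  var doc = parser.parseFromString(htmlpage, 'text/html');
  // The HTML parser builds a body element even if the markup omits one.
  return doc.body.innerHTML;
}
```

Unlike the string approaches, the result is the markup as the browser re-serialized it, so whitespace and attribute quoting may differ slightly from the original response.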


If your HTML page is publicly available on the web, you can use YQL.

E.g. if your page URL is http://xyz.com/page.html and you want everything in the body element, use a query like this:

select * from html where url="http://xyz.com/page.html" and xpath='//body'

If you are new to YQL, read this: http://en.wikipedia.org/wiki/YQL_Page_Scraping

There is also a simple way to do it using the Chromyqlip extension: https://chrome.google.com/extensions/detail/bkmllkjbfbeephbldeflbnpclgfbjfmn

Hope this helps!


// Get the XML node for the "body" tag from the XMLHttpRequest/ActiveXObject
// object (requestObj).

// NOTE: responseXML is only populated when the response parses as XML
// (e.g. well-formed XHTML served with an XML content type); otherwise it
// is typically null. This also assumes there is only one "body" tag in
// your HTML document.
var body = requestObj.responseXML.getElementsByTagName("body")[0];

// Get the "body" tag as an XML string.
var bodyXML;

if (typeof XMLSerializer != "undefined") {
    // Standards-based browsers
    bodyXML = new XMLSerializer().serializeToString(body);
} else if (body.xml) {
    // Older Internet Explorer (its XML nodes expose an "xml" property)
    bodyXML = body.xml;
}

This gets you the XML for the "body" tag, as a string. Unfortunately, it still includes the "<body>" and "</body>", so if you want only the contents of the tag you will have to strip these off.
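Stripping the wrapper can be done with two index lookups. This is a minimal sketch, assuming the serialized string starts with the element's opening tag and that no attribute value on it contains a literal ">" character; stripOuterTag is a made-up helper name.

```javascript
// Sketch: drop the outermost opening and closing tag from a serialized
// element and return what is between them.
// stripOuterTag is a hypothetical helper, not part of any library.
function stripOuterTag(xml) {
  var open = xml.indexOf('>');        // end of the opening tag
  var close = xml.lastIndexOf('</');  // start of the closing tag
  // A self-closing element such as <body/> has no closing tag.
  if (open === -1 || close === -1 || close < open) {
    return '';
  }
  return xml.slice(open + 1, close);
}
```

With the snippet above it would be used as stripOuterTag(bodyXML).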

You might want to take a look at the second example ("Example HTML code 2") on this page.
