I'm trying to scrape information off of Facebook account settings pages for a browser app that will provide feedback to the Facebook member. I'd like to read in several pages and scrape using jQuery without displaying the pages. Unfortunately, I'm having trouble getting Facebook's BigPipe code to run as part of the page load using two of the three methods I've tried.
The third method below seems to work, but I'm still left wondering why the other two failed. Is there a way to trigger the BigPipe onPageletArrival scripts to run using either the jQuery .load() method or document.implementation.createDocument?
UPDATE: I didn't ask the core question as succinctly as I should have: How do I get either of the first two methods to behave like the iFrame method? Put another way: How do I get those BigPipe scripts to execute and fill in the content when I begin with either Attempt 1 or 2?
Attempt 1 (using document.implementation.createDocument):
var html = getHttpPage( 'http://www.facebook.com/editprofile.php?sk=basic' );
var fbPageHdr = /^<!DOCTYPE html><html[^>]+>/,
fbPageTail = /<\/html>/;
var dt = document.implementation.createDocumentType("html", "-//W3C//DTD HTML 4.01 Transitional//EN", "http://www.w3.org/TR/html4/loose.dtd");
var doc = document.implementation.createDocument('','', dt);
var htmlElem = doc.createElement('html');
html = html.replace( fbPageHdr, "");
html = html.replace( fbPageTail, "");
htmlElem.innerHTML = html;
doc.appendChild( htmlElem );
Attempt 2 (using jQuery .load() method):
// .load() on an empty jQuery set is a no-op, so start from a detached <div>.
var $njqNode = $('<div>');
$njqNode.load('http://www.facebook.com/editprofile.php?sk=basic body');
Attempt 3 - this one works! (using jQuery and iFrame):
if ($('#hiddenpage').length == 0){
var $hiddenPage = $('<iframe id="hiddenpage" width="1000px" height="1000px"></iframe>');
$('body').append($hiddenPage);
}
$('#hiddenpage').attr('src', 'http://www.facebook.com/editprofile.php?sk=basic');
BigPipe sends a static header first and leaves the HTTP connection open. The remaining content is then added to the document out of order by appending <script> tags. The first two approaches don't work because they only grab the flat markup of the page (which is rather unremarkable) and never execute the BigPipe scripts that modify the document.
The last approach does work because the iframe behaves like a new browser window: it loads the page and runs its scripts as they arrive.
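To make the difference concrete, here is a toy model of a BigPipe-style response. It is only a sketch and not Facebook's actual code: the `response` string, the `pagelet_basic` id, and the `bigPipe` handler object are all made up for illustration. It compares what a flat parse sees with what you get once the inline scripts are actually run.

```javascript
// Toy model of a BigPipe-style response (hypothetical, not Facebook's real code).
// The skeleton holds an empty placeholder; a later chunk is a <script> that
// hands the pagelet's markup to a client-side handler.
var response =
  '<div id="pagelet_basic"></div>' +
  '<script>bigPipe.onPageletArrival({"id":"pagelet_basic","content":"<b>Basic info</b>"})</script>';

// Collect the source text of every inline <script> in the raw markup.
function extractScripts(html) {
  var re = /<script>([\s\S]*?)<\/script>/g, scripts = [], m;
  while ((m = re.exec(html)) !== null) {
    scripts.push(m[1]);
  }
  return scripts;
}

// "Flat" view: what you are left with if the scripts are never run.
function flatView(html) {
  return html.replace(/<script>[\s\S]*?<\/script>/g, '');
}

// "Browser" view: run each script against a handler that fills placeholders.
function executedView(html) {
  var page = flatView(html);
  var bigPipe = {  // hypothetical stand-in for the page's own handler
    onPageletArrival: function (pagelet) {
      page = page.replace(
        '<div id="' + pagelet.id + '"></div>',
        '<div id="' + pagelet.id + '">' + pagelet.content + '</div>'
      );
    }
  };
  extractScripts(html).forEach(function (src) {
    new Function('bigPipe', src)(bigPipe);
  });
  return page;
}

var flat = flatView(response);         // placeholder stays empty
var executed = executedView(response); // placeholder is filled in
```

In this model, Attempts 1 and 2 correspond to `flatView`, while the iframe corresponds to `executedView`: the content only shows up once something runs the scripts in a context where their handler exists.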