
Zombie error: error when fetching an HTTP request

Developer (https://www.devze.com), 2023-03-04 17:16, source: web

I am using Node.js and Zombie.js to fetch URLs in a virtual browser environment, with the following code:

var zombie = require('zombie'),
    jsdom = require('jsdom'),
    my_sandbox = require('sandbox'),
    url = require('url'),
    http = require('http'),
    request = require('request'),
    httpProxy = require('./lib/node-http-proxy'),
    dest = '',        // URL of the first request; later relative paths are resolved against it
    util = require('util'),
    colors = require('colors'),
    is_host = true;   // true until the first request has been seen

var s = new my_sandbox();
var browser = new zombie.Browser();

// Proxy listening on port 8000 that forwards everything to the server on port 9000 below
httpProxy.createServer(9000, 'localhost').listen(8000);

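function zombieFetching(page) {
    // Load the URL in Zombie's headless browser; like a real browser,
    // Zombie also fetches the resources the page references.
    browser.visit(page, { debug: false }, function (err, browser, status) {
        if (err) {
            console.log('There is an error. Fix it');
            throw err;   // rethrow the Error itself so the stack trace is kept
        } else {
            console.log('Browser visit successful');
        }
    });
}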

var server = http.createServer(function (req, res) {
    var pathname = '';

    if (is_host) {
        // First request: remember its URL as the base for later relative paths
        dest = req.url;
        pathname = dest;
        is_host = false;
    } else {
        pathname = req.url;
        if (pathname.charAt(0) == "/") {
            // Relative path: prefix it with the URL from the first request
            console.log('new request');
            console.log(pathname);
            pathname = dest + pathname;
        }
    }

    // Fetch the URL, load it in Zombie, and return the raw body to the client
    request.get({uri: pathname}, function (err, response, html) {
        console.log('The pathname is:::::::::: ' + pathname);
        zombieFetching(pathname);
        res.end(html);
    });
});

server.listen(9000);

I see the following error when I try to fetch the URL "www.yahoo.com":

home/seed/Desktop/Cloud project/node_modules/zombie/node_modules/html5/lib/html5/tokenizer.js:62
                throw(e);
    ^
Error: undefined: Invalid character in tag name: ��
    at Object.createElement (/home/seed/Desktop/Cloud project/node_modules/zombie/node_modules/jsdom/lib/jsdom/level1/core.js:1174:13)
    at TreeBuilder.createElement (/home/seed/Desktop/Cloud project/node_modules/zombie/node_modules/html5/lib/html5/treebuilder.js:29:25)
    at TreeBuilder.insert_element_normal (/home/seed/Desktop/Cloud project/node_modules/zombie/node_modules/html5/lib/html5/treebuilder.js:61:21)
    at TreeBuilder.insert_element (/home/seed/Desktop/Cloud project/node_modules/zombie/node_modules/html5/lib/html5/treebuilder.js:52:15)
    at Object.startTagOther (/home/seed/Desktop/Cloud project/node_modules/zombie/node_modules/html5/lib/html5/parser/in_body_phase.js:483:12)
    at Object.processStartTag (/home/seed/Desktop/Cloud project/node_modules/zombie/node_modules/html5/lib/html5/parser/phase.js:43:44)
    at EventEmitter.do_token (/home/seed/Desktop/Cloud project/node_modules/zombie/node_modules/html5/lib/html5/parser.js:94:20)
    at EventEmitter.<anonymous> (/home/seed/Desktop/Cloud project/node_modules/zombie/node_modules/html5/lib/html5/parser.js:112:30)
    at EventEmitter.emit (events.js:64:17)
    at EventEmitter.emitToken (/home/seed/Desktop/Cloud project/node_modules/zombie/node_modules/html5/lib/html5/tokenizer.js:84:7)

Also, the log statements are as follows:

The pathname is:::::::::: http://www.yahoo.com/
The pathname is:::::::::: http://l1.yimg.com/a/i/ww/news/2011/05/06/zuckhouse-sm.jpg
The pathname is:::::::::: http://l1.yimg.com/a/i/ww/news/2011/05/07/cable-sm.jpg
The pathname is:::::::::: http://l.yimg.com/a/a/1-/flash/promotions/yahoo/081120/70x50iltlb_2.jpg

Browser visit successful
Browser visit successful
Browser visit successful
Browser visit successful

The pathname is:::::::::: http://l.yimg.com/a/i/vm/2011may/bird74.jpg
The pathname is:::::::::: http://www.yahoo.com/jserror?ad=1&target=cms&data=FPAD

From what I understand, the first four GET requests succeeded. However, I am not sure why Zombie is fetching this invalid URL:

"http://www.yahoo.com/jserror?ad=1&target=cms&data=FPAD"

Also, what is causing the "Invalid character in tag name" error?

Thanks, Sony


favicon.ico is always requested by the browser, and Zombie is emulating this behavior correctly. It isn't part of the HTTP protocol; it's just what browsers tend to do, so they can display that nice icon in the address bar for sites that support it.

You are probably seeing the jserror? request because at some point Zombie received a 301 (redirect) to that URL and is blindly following it, or because some other element on the page references it. By default, Zombie's handlers try to follow everything, which is why you are also getting images and so forth, just like a browser would.
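Because Zombie follows redirects and requests every image and script a page references, it can help to see what kind of response each proxied URL actually is. A minimal sketch, assuming you want the proxy to log redirects and only hand HTML pages (not the .jpg URLs in your log) to zombieFetching: classifyResponse is a hypothetical helper, not part of Zombie or the request library. It relies only on the fact that Node lowercases incoming header names.

```javascript
// Hypothetical helper (not a Zombie or `request` API): classify a proxied
// response from its HTTP status code and (lowercased) response headers.
function classifyResponse(statusCode, headers) {
  headers = headers || {};

  // A 3xx status with a Location header is a redirect that a browser,
  // and therefore Zombie, would follow.
  if (statusCode >= 300 && statusCode < 400 && headers['location']) {
    return { kind: 'redirect', location: headers['location'] };
  }

  // Treat only HTML documents as pages; everything else (images,
  // scripts, stylesheets, ...) is a subresource of some page.
  var contentType = (headers['content-type'] || '').toLowerCase();
  if (contentType.indexOf('text/html') === 0) {
    return { kind: 'page' };
  }
  return { kind: 'subresource', contentType: contentType };
}
```

To use it, you could call classifyResponse(response.statusCode, response.headers) inside the request.get callback and only call zombieFetching(pathname) when it returns kind 'page'. Keep in mind that request follows GET redirects by default (its followRedirect option), so the callback will only see a 3xx status if you pass followRedirect: false.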

If you set browser.debug = true, I think you will get a lot more information than your log statements are giving you.

