Take a look at this page:
http://www.ptmytrade.com/product.asp?id=61363
It's loading fine (at least here). Now I would like to grab it with wget.
$ wget http://www.ptmytrade.com/product.asp?id=61363 --debug
DEBUG output created by Wget 1.12 on linux-gnu.
--2011-05-21 18:24:51-- http://www.ptmytrade.com/product.asp?id=61363
Resolving www.ptmytrade.com... 205.209.150.134
Caching www.ptmytrade.com => 205.209.150.134
Connecting to www.ptmytrade.com|205.209.150.134|:80... connected.
Created socket 3.
Releasing 0x0890e260 (new refcount 1).
---request begin---
GET /product.asp?id=61363 HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: www.ptmytrade.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 500 Internal Server Error
Connection: keep-alive
Date: Sat, 21 May 2011 16:24:56 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Length: 471822
Content-Type: text/html
Set-Cookie: ASPSESSIONIDSCACCAQA=FOCCMJODFHHMOKNKPAIHJCIL; path=/
Cache-control: private
---response end---
500 Internal Server Error
Stored cookie www.ptmytrade.com -1 (ANY) / <session> <insecure> [expiry none] ASPSESSIONIDSCACCAQA FOCCMJODFHHMOKNKPAIHJCIL
Registered socket 3 for persistent reuse.
Disabling further reuse of socket 3.
Closed fd 3
2011-05-21 18:24:57 ERROR 500: Internal Server Error.
OK, so I check the headers when fetching the page using my browser (using Live HTTP Headers add-on):
http://www.ptmytrade.com/product.asp?id=61361
GET /product.asp?id=61361 HTTP/1.1
Host: www.ptmytrade.com
User-Agent: Mozilla/5.0 (X11开发者_开发技巧; Linux i686; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: ASPSESSIONIDSCACBBRA=AMPBLLNDGMFLNPNCPEBPNNLB; ASPSESSIONIDSCACCAQA=FJNBMJODLHHJNDHPFBIEEPEM
HTTP/1.1 500 Internal Server Error
Date: Sat, 21 May 2011 16:20:46 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Length: 471822
Content-Type: text/html
Cache-Control: private
----------------------------------------------------------
http://www.ptmytrade.com/images/index_117.jpg
GET /images/index_117.jpg HTTP/1.1
Host: www.ptmytrade.com
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://www.ptmytrade.com/product.asp?id=61361
Cookie: ASPSESSIONIDSCACBBRA=AMPBLLNDGMFLNPNCPEBPNNLB; ASPSESSIONIDSCACCAQA=FJNBMJODLHHJNDHPFBIEEPEM
HTTP/1.1 404 Not Found
Content-Length: 1635
Content-Type: text/html
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Sat, 21 May 2011 16:20:48 GMT
I'm not sure what's going on here. The page displays just fine, but I'm getting the 500 error code in the header.
The problem was solved by using curl
(which was also getting a 500, but fetched the page just fine) instead, but I'm curious what's going here.
Hi everybody I had this huge problem too, I don't know why, but the solution was add this:
wget -U "Opera 11.0" "http://your_link" -O out.csv
I found it on
[Curl and wget return error 500 for helloworld.php on new install but browser is fine
Using this option will fix the issue:
--content-on-error
If this is set to on, wget will not skip the content when the
server responds with a http status code that indicates error.
So the command looks like this:
wget --content-on-error "https://stackoverflow.com"
NOTE: It's important to put the URL inside double-quotes, otherwise, wget will get stuck on Redirecting output to ‘wget-log’.
.
Or as stated in the comments and by OP, use curl
instead.
But I should note that curl
cannot download whole webpages (css, js, images etc.) because it cannot parse HTML. Source and Taken from.
It's a bug in the webpage. The HTTP status is indeed seemingly incorrectly set to HTTP 500. Firefox/Firebug also confirms this. Basically, you're facing a HTTP 500 error page with "normal" content.
Report it to the site admin.
Try enclosing it in quotes:
wget "http://www.ptmytrade.com/product.asp?id=61363"
instead of:
wget http://www.ptmytrade.com/product.asp?id=61363
精彩评论