开发者

curl: downloading from dynamic url

开发者 https://www.devze.com 2022-12-25 01:55 出处:网络
I\'m trying to download an html file with curl in bash. Like this site: http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=10S&subareasel=开发者_如何学JAVAPHYSICS&idxcrs=0001B+++

I'm trying to download an html file with curl in bash. Like this site: http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=10S&subareasel=开发者_如何学JAVAPHYSICS&idxcrs=0001B+++

When I download it manually, it works fine. However, when i try and run my script through crontab, the output html file is very small and just says "Object moved to here." with a broken link. Does this have something to do with the sparse environment the crontab commands run it? I found this question:

php ssl curl : object moved error

but i'm using bash, not php. What are the equivalent command line options or variables to set to fix this problem in bash?

(I want to do this with curl, not wget)

Edit: well, sometimes downloading the file manually (via interactive shell) works, but sometimes it doesn't (I still get the "Object moved here" message). So it may not be a a specifically be a problem with cron's environment, but with curl itself.

the cron entry:

* * * * * ~/.class/test.sh >> ~/.class/test_out 2>&1

test.sh:

#! /bin/bash
PATH=/usr/local/bin:/usr/bin:/bin:/sbin
cd ~/.class

course="physics 1b"
url="http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=10S<URL>subareasel=PHYSICS<URL>idxcrs=0001B+++"

curl "$url" -sLo "$course".html  --max-redirs 5

Edit: Problem solved. The issue was the stray tags in the url. It was because I was doing sed s,"<URL>",\""$url"\", template.txt > test.sh to generate the scripts and sed replaced all instances of & with the regular expression <URL>. After fixing the url, curl works fine.


You want the -L or --location option, which follows 300 series redirects. --maxredirs [n] will limit curl to n redirects.

Its curious that this works from an interactive shell. Are you fetching the same url? You could always try sourcing your environment scripts in your cron entry:

* * * * * . /home/you/.bashrc ; curl -L --maxredirs 5 ...

EDIT: the example url is somewhat different than the one in the script. $url in the script has an additional pair of <URL> tags. Replacing them with &, the conventional argument seperators for GET requests, works for me.


Without seeing your script it's hard to guess what exactly is going on, but it's likely that it's an environment problem as you surmise.

One thing that often helps is to specify the full path to executables and files in your script.

If you show your script and crontab entry, we can be of more help.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号