I have been building websites for several years now, mostly in PHP. Several of the sites have cron jobs that typically run once a day. The PHP files that the cron jobs execute are stored on the server alongside the files that deliver the site pages.
I know that various crawlers, legitimate and otherwise, visit pages on my sites. If a crawler were to request one of my cron job files, it would trigger the job, sometimes with undesirable results.
I'm fairly sure this has never happened, and while I'm grateful for that, I'd like to understand why. There are, of course, no links anywhere to any of my cron job URLs, but I'm fairly sure crawlers have visited other pages of mine that were never linked to either.
What do other developers do to address this? Add a line to robots.txt? Change the permissions on the cron-related PHP files?
Thanks in advance.
Don't store any cron scripts in a publicly accessible directory.
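For example (all paths here are illustrative), if the web server's document root is /var/www/example.com/public, the cron script can live in a sibling directory that no URL maps to, and the crontab can invoke it directly through the PHP CLI:

    # Illustrative layout -- only public/ is reachable over HTTP:
    #   /var/www/example.com/public/index.php   <- served by the web server
    #   /var/www/example.com/cron/daily.php     <- not reachable by any URL
    #
    # Crontab entry: run the script once a day at 03:00 via the PHP CLI
    0 3 * * * /usr/bin/php /var/www/example.com/cron/daily.php

Because the script is outside the document root, no crawler (or human) can trigger it with a request; only cron can run it.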
Along with @Jeff's great answer:
The only way a search engine will crawl your page is if there is something linking to it. This might be another page on your site, a page on someone else's site, or your own sitemap.
Regardless, your cron job should never be directly accessible from the outside.
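If moving the file outside the web root isn't an option, a common fallback (a minimal sketch; the comments and file name are illustrative) is to have the script refuse to run unless it was started from the command line, which is how cron invokes it:

    <?php
    // daily.php -- abort immediately if this script is requested over HTTP.
    // php_sapi_name() returns "cli" only when PHP is run from the command
    // line; any web request goes through a different SAPI and is rejected.
    if (php_sapi_name() !== 'cli') {
        http_response_code(403);
        exit('Forbidden');
    }

    // ... the actual cron work goes here ...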