I wrote a web crawler which requests a web page in a do-while loop, pausing about 3 seconds between requests.
In total there are 7000 sites. I parse the data and save it in my DB.
Sometimes, because the script runs for a long time, I get a timeout in the browser,
but in the background it continues; I can see that in my database.
Can I prevent this? At the moment the only way to stop it is to stop the web server.
Thank you and best regards.
Your web page is kicking off a server-side process. Killing or closing your browser is not going to stop it, because the process runs on the server, not in the browser. It sounds to me like controlling this from a web page is the wrong approach, and you should be looking at a connected application that stays running alongside the crawl, such as a WinForms/WPF app. There are ways to get this to work with ASP.NET, but they are not simple. I think you have just chosen the wrong technology.
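Whatever hosts the crawl, the loop needs its own stop signal, since closing the browser never reaches it. Below is a minimal sketch in Python (not the C# an ASP.NET or WPF app would use), assuming a hypothetical `fetch_and_store(url)` callback that stands in for the download/parse/save step. `threading.Event.wait` doubles as the ~3-second pause, so setting the event ends the loop promptly instead of requiring the web server to be stopped.

```python
import threading
from typing import Callable, Iterable, List


def crawl(urls: Iterable[str],
          fetch_and_store: Callable[[str], None],
          stop: threading.Event,
          delay_seconds: float = 3.0) -> List[str]:
    """Visit each URL in turn, pausing between requests, until done or stopped.

    fetch_and_store is a hypothetical callback (download, parse, save to DB).
    Returns the URLs that were processed before the loop ended.
    """
    done: List[str] = []
    for url in urls:
        # Check before each request so a stop issued during a fetch is honoured next.
        if stop.is_set():
            break
        fetch_and_store(url)
        done.append(url)
        # Event.wait returns True as soon as stop is set, so the pause is interruptible.
        if stop.wait(delay_seconds):
            break
    return done
```

In use, `crawl` would run on a `threading.Thread`, and whatever control surface the application has (a button in a desktop app, an admin-only action) calls `stop.set()`.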
Starting an intensive, long-running process like this from a web page is almost never a good idea. There are lots of reasons, but the main ones are:
1) If you get a timeout in the browser (this is your scenario) the data you have harvested may not be displayed.
2) What happens if you hit refresh in the browser? Will it attempt to start the whole process again? This is an easy target for an attacker who wants to tie up all your server resources.
3) Is the data you are crawling really likely to change to such an extent that you need "live" crawling? 99% of cases would be served just as well with a background timed job running the crawl, and your front end just displaying the contents of the database.
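Points 2 and 3 can be handled together: a timer drives the crawl, a lock refuses a second start (so a refresh cannot launch a parallel crawl), and the page only reads rows that are already stored. This is a minimal Python sketch; `CrawlJob`, `run_every`, `read_pages` and the `pages(url, fetched_at)` table are hypothetical names, and in practice an OS scheduler (cron, Windows Task Scheduler) could replace the hand-rolled timer loop.

```python
import sqlite3
import threading
from typing import Callable, List, Tuple


class CrawlJob:
    """Runs one crawl at a time; extra start requests are refused, not queued."""

    def __init__(self, crawl_once: Callable[[], None]) -> None:
        self._crawl_once = crawl_once  # hypothetical: one full pass over all sites
        self._running = threading.Lock()

    def try_run(self) -> bool:
        # Non-blocking acquire: if a crawl is already in progress, return False at once.
        if not self._running.acquire(blocking=False):
            return False
        try:
            self._crawl_once()
            return True
        finally:
            self._running.release()


def run_every(job: CrawlJob, interval_seconds: float, stop: threading.Event) -> None:
    """Timer loop: trigger the job, then wait for the next run or until stopped."""
    while not stop.is_set():
        job.try_run()
        stop.wait(interval_seconds)


def read_pages(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    # The web page only reads what the background job has already stored.
    return conn.execute(
        "SELECT url, fetched_at FROM pages ORDER BY fetched_at DESC"
    ).fetchall()
```

Rejecting rather than queueing duplicate starts is deliberate: a burst of refreshes then costs nothing beyond a failed lock acquisition.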
I would seriously recommend you rethink your crawling strategy to something more controllable and stable.