I need to parse many HTML files using PHP.
foreach ($url_array as $url) {
    $file = file_get_contents($url);
    parse_html($file);
}
For some reason (perhaps the file is too big), parse_html() takes a very long time to run, or it leaks memory.
I want to monitor parse_html(): if its running time exceeds a given limit, the script should skip the current URL and continue with the next one.
Most of the time my code runs fine, but some URLs cannot be parsed. There is no error output, so I suspect a memory leak.
This cannot be done as easily as you might think. Since you are running on a single thread, you cannot perform any checks alongside the work: if that thread is blocked, it is blocked.
You need to create some sort of multi-threaded environment with one worker thread that runs parse_html()
(to increase speed and take advantage of multi-core processors you could even spawn more worker threads) and another thread that checks on the workers and kills any that take too long.
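Standard PHP has no built-in threads, but on the CLI under Linux/macOS you can approximate this worker/watchdog pattern with processes. Below is a minimal sketch, assuming the pcntl and posix extensions are enabled; the 30-second limit is an arbitrary placeholder:

foreach ($url_array as $url) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("fork failed");
    }
    if ($pid == 0) {
        // Child process: do the potentially slow/leaky work, then exit.
        parse_html(file_get_contents($url));
        exit(0);
    }
    // Parent process: poll the child and kill it if it runs too long.
    $deadline = time() + 30;
    while (time() < $deadline) {
        if (pcntl_waitpid($pid, $status, WNOHANG) == $pid) {
            continue 2;              // child finished in time, next URL
        }
        usleep(100000);              // poll every 0.1 s
    }
    posix_kill($pid, SIGKILL);       // give up on this URL
    pcntl_waitpid($pid, $status);    // reap the killed child
}

A side benefit of this approach: because each URL is parsed in its own process, any memory parse_html() leaks is reclaimed by the operating system as soon as the child exits.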
Taking what @klaus said into account, you can still perform this check if you are able to edit the parse_html()
function yourself. Inside the function there are likely either a number of calls to various subfunctions or some large loops. Add a check inside those subfunctions, or at the top of each loop iteration, to see whether the function has been running for too long.
Simple example in PHP (get_elements_to_parse() and MAX_PARSE_SECONDS are placeholders for whatever your parser actually uses):

function parse_html($file)
{
    $start_time = microtime(true);             // record when parsing began
    $elements = get_elements_to_parse($file);  // read/split the file
    foreach ($elements as $element) {
        // bail out if this file has already taken too long
        if (microtime(true) - $start_time > MAX_PARSE_SECONDS) {
            break;
        }
        // ...do parsing stuff...
    }
}
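Keep in mind that this in-function check only works while parse_html() spends its time in a loop you control: if a single call inside the loop blocks (one huge regex or DOM operation, say), the timeout is not tested until that call returns. For those cases you need the separate-process approach described above, where the supervisor can kill the worker no matter what it is doing.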