Interesting questions related to lighttpd on Amazon EC2

Developer (https://www.devze.com), 2022-12-29 11:48, source: Internet

This problem appeared today and I have no idea what is going on. Please share your ideas.

I have one EC2 DB server (MySQL + NFS file sharing + Memcached).

I also have three EC2 web servers (lighttpd), each of which mounts the NFS folders from the DB server.

Everything ran smoothly for months, but suddenly an interesting phenomenon appeared.

Every 8 to 10 minutes, PHP files become unreachable. This lasts about 1 minute and then everything goes back to normal. Static files such as .html files are unaffected. All servers have the same problem at exactly the same time.

I spent a whole day analyzing the cause. Finally, I found that when the problem appears, the number of file descriptors held by lighttpd suddenly increases a lot.


I used ls /proc/1234/fd | wc -l to count the open file descriptors (fds).

The number of fds is around 250 in normal operation. When the problem appears, however, it rises to about 1500 and then drops back to normal.
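A raw count does not say what the extra descriptors are. As a minimal sketch (not something from the original post), the Python below groups the entries under /proc/<pid>/fd by the kind of object they point to and prints a timestamped line per sample, so a spike can be lined up with the outage window. The function names, the proc_root parameter, and the 5-second interval are assumptions made for illustration; it needs to run as root or as the user that owns the process.

```python
import os
import time
from collections import Counter


def summarize_fds(pid, proc_root="/proc"):
    """Count the open descriptors of `pid`, grouped by what they point to."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("anon_inode:"):
            kinds["anon_inode"] += 1
        else:
            # Regular paths, including files on an NFS mount.
            kinds["file"] += 1
    return kinds


def watch(pid, interval=5.0, samples=None):
    """Print one timestamped line per sample; loops forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        kinds = summarize_fds(pid)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "total=%d" % sum(kinds.values()), dict(kinds))
        taken += 1
        time.sleep(interval)
```

Calling watch(1234), with 1234 being the PID used in the command above, would log the breakdown every 5 seconds. If the growth turns out to be almost entirely sockets, that would point toward connections piling up rather than files being opened.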

It sounds funny, right? Do you have any idea what's going on?

========================

The CPU graph of one of the web servers: http://pencake.images.s3.amazonaws.com/4be1055884133.jpg


Thoughts:

  • Have a look at dmesg output.
  • The number of file descriptors jumping up sounds to me like something is blocking, including the processing of connections to lighttpd/PHP, which build up until the blocking condition ends.
  • When you say the PHP file is unreachable, do you mean the file is missing? Or does the PHP script stall during execution, or something else? What do the lighttpd log files say is happening on the calls to this PHP script? Are there any other hints in the lighttpd logs?
  • What is the maximum number of file descriptors for the process/user?
  • I and others have had bizarre networking behavior on EC2 instances from time to time. Give us more details on it. Maybe set up some additional monitoring of the connectivity between your instances. Consider moving off your problem instance to another instance in the hopes of the problem magically disappearing. (Shot in the dark.)
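
On the question of the maximum number of file descriptors: on Linux the per-process limit can be read from /proc/<pid>/limits. The sketch below parses the "Max open files" line and compares it with an observed count. The function names and the proc_root parameter are made up for illustration, and it only covers the kernel limit; lighttpd's own server.max-fds setting, if configured, would have to be checked separately.

```python
import os


def max_open_files(pid, proc_root="/proc"):
    """Return (soft, hard) 'Max open files' limits of `pid`; None means unlimited."""
    path = os.path.join(proc_root, str(pid), "limits")
    label = "Max open files"
    with open(path) as handle:
        for line in handle:
            if line.startswith(label):
                # Remaining columns are: soft limit, hard limit, units.
                soft, hard = line[len(label):].split()[:2]
                return tuple(
                    None if value == "unlimited" else int(value)
                    for value in (soft, hard)
                )
    raise ValueError("no %r line in %s" % (label, path))


def fd_headroom(open_count, soft_limit):
    """Fraction of the soft limit still free (1.0 when the limit is unlimited)."""
    if soft_limit is None:
        return 1.0
    return max(0.0, 1.0 - open_count / soft_limit)
```

For example, if the soft limit were 1024, the roughly 1500 descriptors reported during a spike could not be reached at all, so comparing the two numbers is a quick way to tell whether the limit is anywhere near the problem.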

And finally...

  • DoS attack? I doubt it; the site would be either offline or not. It is way too early in the debugging process for you to infer malice on someone else's part.