I have a system which makes use of a remote database and a web application for the front end.
The system is now at the stage where it is ready to be scaled up (i.e. from being tested by one person to being used by hundreds of people)...
Obviously there will be a testing stage with larger groups of people, but I was wondering if anyone with experience could outline the main points to consider (i.e. common problems and solutions) when scaling a system up in this way.
One example might be server space: ensuring that there is always enough to accommodate the growth in system usage.
It would be interesting to hear about the relevant issues, as this is my first big project.
The ultimate answers are unique to each application. Try to find some concrete evidence of what "normal" usage will entail and, if possible, when and what your max usage will be. Determine what acceptable performance looks like for your app, and use a load testing tool like JMeter (free) or WAPT (not free) to simulate user load and see how your app holds up. If you don't get acceptable performance numbers, throw a profiler like VisualVM (free, also bundled with the JDK) or YourKit (not free) into the mix to identify bottlenecks. Fix the most severe bottleneck, rinse, and repeat.
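If you want a feel for what a load test does before committing to a tool, here is a minimal Java sketch (not a substitute for JMeter or WAPT) that fires concurrent requests at an endpoint and reports each simulated user's worst-case response time. The URL, user count, and request count are placeholder assumptions to replace with your own.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SimpleLoadTest {
    // Placeholder endpoint -- replace with a real URL in your application.
    private static final String TARGET = "http://localhost:8080/your-app/health";
    private static final int CONCURRENT_USERS = 50;
    private static final int REQUESTS_PER_USER = 20;

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(CONCURRENT_USERS);
        List<Future<Long>> results = new ArrayList<>();

        for (int u = 0; u < CONCURRENT_USERS; u++) {
            results.add(pool.submit(() -> {
                long worstNanos = 0;
                for (int i = 0; i < REQUESTS_PER_USER; i++) {
                    long start = System.nanoTime();
                    HttpURLConnection conn =
                            (HttpURLConnection) new URL(TARGET).openConnection();
                    conn.getResponseCode();              // forces the request to complete
                    conn.disconnect();
                    worstNanos = Math.max(worstNanos, System.nanoTime() - start);
                }
                return TimeUnit.NANOSECONDS.toMillis(worstNanos);
            }));
        }

        for (Future<Long> f : results) {
            System.out.println("Worst response time for one simulated user: " + f.get() + " ms");
        }
        pool.shutdown();
    }
}
```

A real tool gives you ramp-up profiles, percentiles, and reporting, which is exactly why this sketch is only useful for building intuition about what "50 users" actually does to your app.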
I see three important things to look at:
- Unexpected interactions
- Badly tuned parameters and graceless degradation
- Unexpected resource consumption
First, you must have clear objectives. @Ryan has already alluded to this. What does acceptable performance look like? Your objective cannot be "as fast as possible" or you will never stop tuning. You must be very clear: for specified workload patterns and specified user populations, response times are ...
As you scale up the workload you are likely to hit the problems I alluded to earlier.
Unexpected interactions: for example, some user action triggers a lengthy DB operation during which certain locks are held, and other users now experience unacceptable performance. Or several users all attempt to buy the same product at the same time and an unacceptable number of optimistic lock failures occur. Or the system deadlocks.
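As a rough illustration of the optimistic-lock case, here is a sketch in plain JDBC. The connection URL, credentials, and the `product` table with its `stock` and `version` columns are assumptions for the example; the point is the retry loop and the version check in the UPDATE.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OptimisticPurchase {
    // Connection details and table layout (id, stock, version) are assumptions for this sketch.
    private static final String URL = "jdbc:postgresql://dbhost/shop";
    private static final int MAX_RETRIES = 3;

    public static boolean buyProduct(long productId) throws SQLException {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try (Connection con = DriverManager.getConnection(URL, "app", "secret")) {
                int stock, version;
                try (PreparedStatement read = con.prepareStatement(
                        "SELECT stock, version FROM product WHERE id = ?")) {
                    read.setLong(1, productId);
                    try (ResultSet rs = read.executeQuery()) {
                        if (!rs.next() || rs.getInt("stock") <= 0) return false;  // sold out
                        stock = rs.getInt("stock");
                        version = rs.getInt("version");
                    }
                }
                // The version check in the WHERE clause is the optimistic lock: the update
                // only succeeds if nobody else changed the row since we read it.
                try (PreparedStatement update = con.prepareStatement(
                        "UPDATE product SET stock = ?, version = version + 1 "
                      + "WHERE id = ? AND version = ?")) {
                    update.setInt(1, stock - 1);
                    update.setLong(2, productId);
                    update.setInt(3, version);
                    if (update.executeUpdate() == 1) return true;  // purchase succeeded
                }
                // 0 rows updated => another user won the race; loop and retry.
            }
        }
        return false;  // after retries, show a friendly "please try again" message, not a stack trace
    }
}
```

Under light testing the retry branch almost never fires, which is exactly why this class of problem only shows up once many users hit the same rows at once.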
Such problems often don't show up until testing scales up. To detect them you need to design your test data and your test scripts carefully, covering both normal load and "unusual" peaks.
Tuning parameters: your infrastructure is likely to have connection pools and thread pools, and the default sizes of those pools may well need to be adjusted. There are two considerations here. First, what works for your target workload? If you increase the connection pool size, the database server now has more open connections, so you may need to increase some database parameter or available memory, and so on. Second, what happens in unusual situations when resources run out? Suppose the system times out waiting for a connection: does the user get a friendly error message and the system administrator get notified, or does something very unpleasant happen?
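As a sketch of the thread-pool side of this, here is a bounded `ThreadPoolExecutor` with an explicit rejection handler, using only the JDK. The pool sizes and queue depth are illustrative placeholders rather than recommendations; the point is that when the pool is exhausted the request is rejected loudly and gracefully instead of hanging or failing silently.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TunedWorkerPool {
    public static ThreadPoolExecutor create() {
        // Illustrative starting points only; tune these against your measured workload.
        int corePoolSize = 10;
        int maxPoolSize  = 50;
        int queueDepth   = 100;

        RejectedExecutionHandler gracefulRejection = (task, pool) -> {
            // Resources ran out: degrade gracefully. In a real application this would
            // return a friendly "busy, please try later" response to the user and
            // notify the administrator (log alert, pager, etc.).
            System.err.println("Server busy: request rejected, notify the ops team");
        };

        return new ThreadPoolExecutor(
                corePoolSize, maxPoolSize,
                60, TimeUnit.SECONDS,                     // idle threads above core size time out
                new ArrayBlockingQueue<>(queueDepth),     // bounded queue: no unbounded backlog
                gracefulRejection);
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create();
        for (int i = 0; i < 500; i++) {
            final int requestId = i;
            pool.execute(() -> System.out.println("Handling request " + requestId));
        }
        pool.shutdown();
    }
}
```

The same "bounded plus explicit failure behaviour" thinking applies to connection pools: a capped size with a timeout and a clear error path beats an unbounded pool that quietly pushes the problem onto the database.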
Unexpected resource consumption: what happens to resource consumption when the workload scales? Are the logs now much bigger, so disk space becomes insufficient? Do you need to increase heap sizes? What is the long-term trend over time: memory growth, perhaps a memory leak? There are often unpleasant surprises.
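One cheap way to spot these trends early is to log heap and disk usage periodically using the JDK's own management beans. The log directory path and the sampling interval below are assumptions for the sketch.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ResourceTrendLogger {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        File logDir = new File("/var/log/myapp");   // placeholder path for your log directory

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            long heapUsedMb = memory.getHeapMemoryUsage().getUsed() / (1024 * 1024);
            long heapMaxMb  = memory.getHeapMemoryUsage().getMax() / (1024 * 1024);
            long diskFreeMb = logDir.getUsableSpace() / (1024 * 1024);

            // A steadily climbing heapUsedMb under constant load is the classic signature
            // of a memory leak; a shrinking diskFreeMb warns that log growth is outpacing
            // the disk.
            System.out.printf("heap %d/%d MB, disk free %d MB%n",
                    heapUsedMb, heapMaxMb, diskFreeMb);
        }, 0, 5, TimeUnit.MINUTES);
    }
}
```

In practice you would feed these numbers into whatever monitoring you already have, but even a periodic log line makes the week-over-week trend visible before it becomes an outage.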