We have developed a .NET web application that uses SQL Server as a backend. Now we would like to provide a monitoring dashboard app for the tech support team. The idea is that this monitoring app will show a global picture of the "health" of the web servers hosting the application and the database servers holding the data. This "health" measure should reflect the workload of each machine, and would be a number (between 0 and 100, let's say) computed from some inputs that I need to determine.
For the web servers, I imagine that HTTP requests per time unit must be considered, and perhaps bandwidth consumed.
For the database servers, I reckon that transactions per time unit and maybe locks or some other indicator of database concurrency should be used.
In addition, some other generic inputs, such as CPU load, memory usage and disk queue length should also be taken into account.
All these factors should be weighed as necessary to obtain the final "health" figure for each server.
Edit: The idea is that the "health" measure gives the technician a global view of a server's workload. If a server appears with low "health", the technician will be able to drill down and look at the details of the machine to see which specific inputs are causing the low "health".
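The weighted figure and the drill-down described above could be sketched roughly as follows. This is only an illustration in Python: the `Reading` type, the `saturation` ceilings, and the weights are all hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    name: str          # e.g. "cpu", "requests_per_sec" (illustrative names)
    value: float       # latest measured value
    saturation: float  # assumed value at which this input counts as fully loaded
    weight: float      # assumed relative importance of this input


def load_fraction(reading):
    """Scale a reading to 0..1, clamping values outside the expected range."""
    return min(max(reading.value / reading.saturation, 0.0), 1.0)


def health(readings):
    """Return (score, penalties): a 0-100 score plus the points each input cost."""
    total_weight = sum(r.weight for r in readings)
    penalties = {
        r.name: 100.0 * r.weight * load_fraction(r) / total_weight
        for r in readings
    }
    return 100.0 - sum(penalties.values()), penalties
```

With CPU at 50 of 100 (weight 2), requests at or above saturation (weight 1) and an idle disk queue (weight 1), the score comes out at 50, and the penalties show that CPU and requests each cost 25 points. Returning the per-input penalties alongside the score is what makes the drill-down possible.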
My questions are:
- Do you think this "health" measure makes sense?
- I am thinking of using performance counters to capture the input data. Is this the best option?
- Can you suggest appropriate inputs for the web servers (IIS 7) and the database servers (SQL Server 2008)?
Thanks.
Do you think this "health" measure makes sense?
No. The first thing someone will ask if your single number is off is "what's wrong?" Also, consider the fact that trend analysis can be very important for early error detection.
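To make the trend point concrete, here is a minimal sketch that fits a straight line through the most recent samples of one input and reports its slope. The function name and the idea of one sample per polling interval are assumptions for illustration; a real monitor would likely use something more robust to noise.

```python
def trend_slope(samples):
    """Least-squares slope of equally spaced samples (units per interval)."""
    n = len(samples)
    if n < 2:
        return 0.0  # not enough data to talk about a trend
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator
```

A steadily positive slope on, say, memory usage can flag a problem well before the current value crosses any alarm threshold.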
I am thinking of using performance counters to capture the input data. Is this the best option?
I think that would be an excellent starting point.
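As a rough sketch of what sampling counters could look like, the snippet below shells out to the Windows `typeperf` command-line tool and parses its CSV output. It is written in Python for brevity; inside a .NET app the `System.Diagnostics.PerformanceCounter` class would be the more natural route. The counter paths listed are common ones for a default IIS and SQL Server install, but treat them as assumptions and verify that they exist on your machines (named SQL Server instances use a different prefix).

```python
import csv
import subprocess

# Illustrative counter paths; check them against your own servers.
EXAMPLE_PATHS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Available MBytes",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\Web Service(_Total)\Current Connections",
    r"\SQLServer:Databases(_Total)\Transactions/sec",
]


def parse_typeperf_csv(text):
    """Return {counter path: latest numeric value} from typeperf CSV output."""
    header = None
    latest = {}
    for row in csv.reader(text.splitlines()):
        if not row:
            continue
        if header is None:
            if row[0].startswith("(PDH-CSV"):
                header = row[1:]  # the first column is the timestamp
            continue
        if len(row) != len(header) + 1:
            continue  # status lines printed after the samples
        for name, raw in zip(header, row[1:]):
            try:
                latest[name] = float(raw)
            except ValueError:
                pass  # blank or non-numeric sample
    return latest


def sample_counters(paths):
    """Take one sample of each counter (Windows only; requires typeperf)."""
    result = subprocess.run(
        ["typeperf", *paths, "-sc", "1"],
        capture_output=True, text=True, check=True,
    )
    return parse_typeperf_csv(result.stdout)
```

Calling `sample_counters(EXAMPLE_PATHS)` on a Windows server should return one value per counter, keyed by the full path including the machine name.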
Can you suggest appropriate inputs for the web servers (IIS 7) and the database servers (SQL Server 2008)?
This is a big subject for a forum post, and the answer depends heavily on the details of your app. In broad terms, you want to look at things like the frequency of error conditions, some measure of throughput for each subsystem, counts of how often out-of-process calls exceed performance thresholds, etc. It's usually a good idea to show current numbers as well as historical values and trends.
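As one example of the "how often do calls exceed a threshold" idea, here is a small sliding-window counter. The class name, the millisecond threshold, and the explicit `now` timestamps are all assumptions made to keep the sketch self-contained and easy to test.

```python
from collections import deque


class SlowCallCounter:
    """Counts calls slower than a threshold within a sliding time window."""

    def __init__(self, threshold_ms, window_s):
        self.threshold_ms = threshold_ms
        self.window_s = window_s
        self._slow = deque()  # timestamps (seconds) of slow calls, oldest first

    def record(self, duration_ms, now):
        """Register one call; only calls above the threshold are remembered."""
        if duration_ms > self.threshold_ms:
            self._slow.append(now)

    def count(self, now):
        """Number of slow calls whose timestamp is still inside the window."""
        cutoff = now - self.window_s
        while self._slow and self._slow[0] <= cutoff:
            self._slow.popleft()
        return len(self._slow)
```

In practice you would pass something like `time.monotonic()` as `now` and record the duration of each database or web service call as it completes.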
You might want to have a look at Microsoft's product in this area, System Center Operations Manager (SCOM), to see the types of things it does.
First of all, I think you are designing a different dashboard from the one you are describing. Tech support wants to know whether machines are up or down and what to do when there is a problem.
Requests and transactions per second are useful for capacity planning and/or system and application tuning, not for tech support.
Also, I believe a single figure makes no sense and helps nobody, because what would 87.75% mean?
So, I believe you want a dashboard for sysadmins and app developers, where this type of measurement makes sense, to tune the OS or know when to add a new machine or which query is bogging down SQL Server.
That said, performance counters already store much of the information you want to present, so using them does make sense. Additionally, you can use SQL Server traces to gather performance data about the queries; the traces should not run constantly, only at defined intervals.
Now, if you really wanted a dashboard for tech support, two types of monitors would be enough:
- Server up/down
- Application responsive/unresponsive
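Those two monitors could be sketched along these lines. This assumes "up" means a TCP port accepts connections and "responsive" means an HTTP GET returns a non-error status within a timeout; the function names, timeouts, and any URL you pass in are placeholders.

```python
import socket
import urllib.error
import urllib.request


def server_is_up(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def app_is_responsive(url, timeout=5.0):
    """True if an HTTP GET on url returns a 2xx or 3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses.
        return False
```

For the database servers, the same `server_is_up` check could point at the SQL Server port (1433 by default), while the application check would target a lightweight page of the web app.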
SQL Server 2008 comes with a performance data collector and a data warehouse out of the box; see SQL Server 2008 Data Collections and the Management Data Warehouse. SQL Server 2005 also has a similar Performance Dashboard. I'm not saying you should necessarily use these as your dashboard (although you could), but you should look at these two SQL dashboards to see what the MS team considered important to put in a dashboard.