What are the first issues to check while optimizing an existing database?

What are the top issues and in which order of importance to look into while optimizing (performance tuning, troubleshooting) an existing (but unknown to you) database?

Which actions/measures in your previous optimizations gave the most effect (possibly with the minimum of work)?

I'd like to partition this question into the following categories (in order of interest to me):

  1. methods that show a performance boost (improvement) in the shortest time, i.e. the most cost-effective methods/actions;
  2. non-intrusive or least-troublesome effective methods (those that don't change existing schemas, etc.);
  3. intrusive methods.

Update:

Suppose I have a copy of the database on a dev machine, without access to the production environment to observe stats, most-used queries, performance counters, etc. in real use.

This is a development-related question, not a DBA-related one.

Update2:

Suppose the database was developed by others and was given to me for optimization (review) before it was delivered to production.

It is quite usual to have outsourced development detached from end-users.

Besides, there is a database design paradigm holding that a database, in contrast to mere application data storage, should be a value in itself, independently of the specific applications that use it or of the context of its use.

Update3: Thanks to all the answerers! You all pushed me to open a subquestion:

How do you stress load dev database (server) locally?


  • Create a performance Baseline (non-intrusive, use performance counters)

  • Identify the most expensive queries (non-intrusive, use SQL Profiler)

  • Identify the most frequently run queries (non-intrusive, use SQL Profiler)

  • Identify any overly complex queries, or those using slowly performing constructs or patterns. (non-intrusive to identify, use SQL Profiler and/or code inspections; possibly intrusive if changed, may require substantial re-testing)

  • Assess your hardware

  • Identify Indexes that would benefit the measured workload (non-intrusive, use SQL Profiler)

  • Measure and compare to your baseline.

  • If you have very large databases, or extreme operating conditions (such as 24/7 operation or ultra-high query loads), look at the high-end features offered by your RDBMS, such as table/index partitioning (see the partitioning sketch after this list).
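On that last point, here is a minimal partitioning sketch for SQL Server. The table and column names (dbo.Sales, SaleDate) and the boundary date are assumptions for illustration, not anything from the question:

    -- Partition rows by date: SaleDate < '2023-01-01' lands in the first
    -- partition, everything else in the second.
    CREATE PARTITION FUNCTION pf_SaleDate (date)
        AS RANGE RIGHT FOR VALUES ('2023-01-01');

    -- Map all partitions to one filegroup for simplicity.
    CREATE PARTITION SCHEME ps_SaleDate
        AS PARTITION pf_SaleDate ALL TO ([PRIMARY]);

    -- The partitioning column must be part of the clustered key.
    CREATE TABLE dbo.Sales (
        SaleId   int IDENTITY(1,1) NOT NULL,
        SaleDate date              NOT NULL,
        Amount   decimal(10, 2)    NOT NULL,
        CONSTRAINT PK_Sales PRIMARY KEY CLUSTERED (SaleId, SaleDate)
    ) ON ps_SaleDate (SaleDate);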

This may be of interest: How Can I Log and Find the Most Expensive Queries?
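For the "most expensive" and "most frequently run" items, a non-intrusive alternative (or complement) to SQL Profiler on SQL Server 2005+ is the plan-cache DMVs. A minimal sketch; it requires VIEW SERVER STATE permission, and the counters reset whenever the service restarts:

    -- Top 20 statements by total CPU, with execution counts.
    SELECT TOP (20)
        qs.execution_count,
        qs.total_worker_time / 1000 AS total_cpu_ms,               -- DMV reports microseconds
        qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
        qs.total_logical_reads,
        SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,    -- carve out the statement
            ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
              END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_worker_time DESC;

Sorting by execution_count instead of total_worker_time surfaces the "most frequently run" list from the same data.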


If the database is unknown to you, and you're under pressure, then you may not have time for Mitch's checklist, which is a good best practice for monitoring server health.

You also need access to production to gather real info from assorted queries you can run. Without this, you're doomed. The server load pattern is important: you can't reproduce many issues yourself on a development server, because you won't use the system like an end user does.

Also, focus on the "biggest bang for the buck". An expensive query running once daily at 3 am can be ignored; a not-so-expensive one running every second is well worth optimising. However, you may not know this without knowing the server load pattern.

So, the basic steps...

Assuming you're firefighting:

  • server logs
  • SQL Server logs
  • sys.sysprocesses, e.g. ASYNC_NETWORK_IO waits (see the wait-snapshot sketch below)
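A minimal sketch of such a wait snapshot, using sys.dm_exec_requests (the modern counterpart of sys.sysprocesses); the session_id > 50 filter to skip system sessions is a common heuristic, not a hard rule:

    -- What is each active session waiting on right now?
    SELECT r.session_id,
           r.status,
           r.wait_type,            -- e.g. ASYNC_NETWORK_IO: client is slow consuming results
           r.wait_time,            -- in milliseconds
           r.blocking_session_id,  -- non-zero means blocked by that session
           t.text AS running_sql
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.session_id > 50;       -- skip most system sessions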

Slow response:

  • Profiler, with a duration filter: what runs often and is lengthy
  • most expensive queries, weighted by how often they are used
  • open transactions, with their plans
  • missing indexes, weighted by estimated impact (see the sketch after this list)
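A minimal sketch of that weighted missing-index query, built on SQL Server's missing-index DMVs; treat the output as advisory (these DMVs over-suggest), and review each candidate before creating anything:

    -- Indexes the optimizer wished it had, weighted by estimated impact.
    SELECT TOP (20)
        migs.avg_user_impact * migs.avg_total_user_cost
            * (migs.user_seeks + migs.user_scans) AS weight,
        mid.statement AS table_name,
        mid.equality_columns,
        mid.inequality_columns,
        mid.included_columns
    FROM sys.dm_db_missing_index_group_stats AS migs
    JOIN sys.dm_db_missing_index_groups  AS mig ON mig.index_group_handle = migs.group_handle
    JOIN sys.dm_db_missing_index_details AS mid ON mid.index_handle = mig.index_handle
    ORDER BY weight DESC;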

Things you should have:

  • Backups
  • Tested restores of the aforementioned backups
  • Regular index and statistics maintenance
  • Regular DBCC and integrity checks (a maintenance sketch follows this list)
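A minimal maintenance skeleton covering those last two points; the database and table names are placeholders, and in practice you would schedule this through SQL Server Agent or a maintenance solution:

    -- Integrity check (NO_INFOMSGS keeps the output readable)
    DBCC CHECKDB ('YourDatabase') WITH NO_INFOMSGS;

    -- Index and statistics maintenance on a hypothetical large table
    ALTER INDEX ALL ON dbo.YourBigTable REBUILD;       -- or REORGANIZE for moderate fragmentation
    UPDATE STATISTICS dbo.YourBigTable WITH FULLSCAN;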

Edit: After your update

  • Static analysis is best practices only: you can't optimise for usage. That is all you can do here, and it is what marc_s' answer covers.

  • You can guess what the most common query may be, but you can't guess how much data will be written or how badly a query scales with more data.

  • In many shops, developers provide some support, either directly or as "3rd line".

  • If you've been given a DB for review by another team that you hand over to another team to deploy: that's odd.


If you're not interested in (or can't observe) the runtime behavior of the database, e.g. which queries are executed most frequently and which consume the most time, you can only do a "static" analysis of the database structure itself. That has a lot less value, really, since you can only check for a number of key indicators of bad design; you cannot really tell much about the "dynamics" of the system in use.

Things I would check for in a database that I get as a .bak file - without the ability to collect live, actual runtime performance statistics - are:

  1. normalization - is the table structure normalized to third normal form? (at least most of the time - there might be some exceptions)

  2. do all tables have a primary key? ("if it doesn't have a primary key, it's not a table", after all; the catalog-view sketch after this list can flag offenders)

  3. For SQL Server: do all the tables have a good clustering key? A unique, narrow, static, and preferably ever-increasing clustered key - ideally an INT IDENTITY, and most definitely not a large compound index of many fields, no GUIDs, and no large VARCHAR fields (see Kimberly Tripp's excellent blog posts on the topic for details)

  4. are there any check and default constraints on the database tables?

  5. are all the foreign key fields backed by a non-clustered index, to speed up JOIN queries?

  6. are there any other obvious "deadly sins" in the database, e.g. overly complicated views or really badly designed tables?
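A minimal sketch of two such static checks against the catalog views: heaps (tables without a clustered index, point 3) and tables without a primary key (point 2). A similar join of sys.foreign_key_columns against sys.index_columns can flag unindexed foreign keys (point 5):

    -- Heaps: user tables with no clustered index (index_id 0 = heap)
    SELECT OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
           OBJECT_NAME(i.object_id)        AS table_name
    FROM sys.indexes AS i
    JOIN sys.tables  AS t ON t.object_id = i.object_id  -- user tables only
    WHERE i.index_id = 0;

    -- Tables without any primary key constraint
    SELECT t.name AS table_name
    FROM sys.tables AS t
    WHERE NOT EXISTS (SELECT 1
                      FROM sys.key_constraints AS kc
                      WHERE kc.parent_object_id = t.object_id
                        AND kc.type = 'PK');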

But again: without actual runtime statistics, you're quite limited in what you can do from a "static analysis" point of view. The real optimization can only happen once you have a workload from a regular day of operation, to see which queries are used frequently and put the most stress on your database; then use Mitch's checklist to check those points.


The most important thing to do is collect up-to-date statistics. Performance of a database depends on:

  • the schema;
  • the data in the database; and
  • the queries being executed.

Looking at any of those in isolation is far less useful than the whole.

Once you have collected the statistics, then you start identifying operations that are sub-par.

For what it's worth, the vast majority of performance problems we've fixed have been resolved by adding indexes, by adding extra columns and triggers to move the cost of calculations from the SELECT to the INSERT/UPDATE (a computed-column sketch of this follows below), or by tactfully informing the users that their queries are, shall we say, less than optimal :-)

They're usually pleased that we can just give them an equivalent query that runs much faster.
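A minimal sketch of the "move the cost to write time" idea. On SQL Server, a PERSISTED computed column (which can also be indexed) often replaces a hand-written trigger; the table and column names here are assumptions for illustration:

    -- The expression is evaluated on INSERT/UPDATE and stored,
    -- so SELECTs read a plain column instead of recomputing it.
    ALTER TABLE dbo.Orders
        ADD LineTotal AS (Quantity * UnitPrice) PERSISTED;

    -- The persisted column can even back an index for range queries.
    CREATE NONCLUSTERED INDEX IX_Orders_LineTotal
        ON dbo.Orders (LineTotal);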
