Huge transaction in SQL Server, are there any problems?

I have a program which does many bulk operations on an SQL Server 2005 or 2008 database (drops and creates indexes, creates columns, full table updates etc), all in one transaction.

Are there any problems to be expected?

  • I know that the transaction log expands even in the simple recovery model.
  • This program is not executed during normal operation of the system, so locking and concurrency are not an issue.

Are there other reasons to split the transaction into smaller steps?


In short,

  • Using smaller transactions provides more robust recovery from failure.
  • Long transactions may also unnecessarily hold locks on objects for extended periods of time, blocking other processes that require access to them.

Consider that if your server experienced a failure at any point between the time the transaction started and finished, then in order to bring the database online SQL Server would have to perform crash recovery, which involves rolling back all uncommitted transactions from the log.

Suppose you developed a data processing solution that is intelligent enough to pick up from where it left off. With a single transaction this would not be an option available to you, because you would need to start the process from the beginning once again.
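
For illustration, here is a minimal sketch of that kind of restartable, batched processing. The object names (dbo.BigTable, dbo.ProcessProgress, Id, SomeColumn) are hypothetical and the update is a placeholder; the point is that each batch commits on its own and progress is recorded, so a restart can resume where it left off.

    -- A minimal sketch of a restartable batch update, assuming a hypothetical
    -- dbo.BigTable (clustered key Id) and a pre-seeded single-row tracking table
    -- dbo.ProcessProgress(LastProcessedId). Each batch commits independently, so
    -- a crash only rolls back the batch that was in flight.
    DECLARE @BatchSize int = 10000;
    DECLARE @LastId    int;
    DECLARE @MaxId     int;

    SELECT @LastId = COALESCE(MAX(LastProcessedId), 0) FROM dbo.ProcessProgress;

    WHILE 1 = 1
    BEGIN
        -- Find the upper key of the next batch.
        SELECT @MaxId = MAX(Id)
        FROM (SELECT TOP (@BatchSize) Id
              FROM dbo.BigTable
              WHERE Id > @LastId
              ORDER BY Id) AS nextBatch;

        IF @MaxId IS NULL BREAK;   -- nothing left to process

        BEGIN TRANSACTION;

        UPDATE dbo.BigTable
        SET    SomeColumn = SomeColumn + 1     -- placeholder for the real work
        WHERE  Id > @LastId AND Id <= @MaxId;

        UPDATE dbo.ProcessProgress
        SET    LastProcessedId = @MaxId;       -- remember where we got to

        COMMIT TRANSACTION;

        SET @LastId = @MaxId;
    END;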


If the transaction generates too many database log entries (updates), the log can hit what is known as the "high water mark". That is the point at which the log reaches (about) half of its absolute maximum size, at which point it must commence rolling back all updates (which will consume about the same amount of disk space as it took to do the updates).

Not rolling back at this point would risk eventually reaching the maximum log size without the transaction having finished or been rolled back, at which point the database is screwed because there is not enough log space left to roll back.
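
If you do go the single-transaction route, it is worth watching log usage while it runs. A couple of standard commands that work on 2005/2008:

    DBCC SQLPERF(LOGSPACE);        -- log size and percentage used, per database

    SELECT name, log_reuse_wait_desc
    FROM sys.databases;            -- what, if anything, is preventing log truncation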


It isn't really a problem until you run out of disk space, but you'll find that rollback will take a long time. I'm not saying to plan for failure of course.

However, consider the process, not the transaction log as such. I'd consider separating the work (a rough sketch follows the list):

  • DDL into a separate transaction
  • Bulk load staging tables with a transaction
  • Flush data from staging to final table in another transaction
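
A minimal sketch of that kind of split, using hypothetical object names (dbo.Staging, dbo.FinalTable, IX_FinalTable_SomeColumn) and a made-up file path:

    -- 1. DDL in its own transaction.
    BEGIN TRANSACTION;
        DROP INDEX IX_FinalTable_SomeColumn ON dbo.FinalTable;
        ALTER TABLE dbo.FinalTable ADD NewColumn int NULL;
    COMMIT TRANSACTION;

    -- 2. Bulk load the staging table. BULK INSERT manages its own transactions,
    --    and BATCHSIZE breaks the load into smaller committed pieces.
    BULK INSERT dbo.Staging
    FROM 'C:\load\data.dat'
    WITH (TABLOCK, BATCHSIZE = 50000);

    -- 3. Flush data from staging to the final table, then recreate the index,
    --    in a third transaction.
    BEGIN TRANSACTION;
        INSERT INTO dbo.FinalTable (Id, SomeColumn, NewColumn)
        SELECT Id, SomeColumn, NewColumn
        FROM dbo.Staging;

        CREATE INDEX IX_FinalTable_SomeColumn ON dbo.FinalTable (SomeColumn);
    COMMIT TRANSACTION;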

If something goes wrong I'd hope that you have rollback scripts and/or a backup.

Is there really a need to do everything atomically?


Depending on the complexity of your update statements, I'd recommend doing this in a single transaction only on small tables of, say, a few hundred rows, especially if you have only a small amount of main memory available. Otherwise, updates on big tables can take a very long time and even appear to hang, and it then becomes difficult to figure out what the process (spid) is doing and how long it might take.
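
If you do find yourself staring at a long-running spid, a few standard tools help; the session id 53 below is just an example:

    EXEC sp_who2;                     -- quick overview of all sessions and their activity

    SELECT session_id, status, command, wait_type,
           percent_complete, total_elapsed_time
    FROM   sys.dm_exec_requests       -- percent_complete is only populated for a few
    WHERE  session_id = 53;           -- commands (e.g. ROLLBACK, BACKUP, DBCC)

    DBCC INPUTBUFFER(53);             -- last statement the session sent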

I'm not sure whether DROP INDEX is a transaction-logged operation anyway. See this question here on stackoverflow.com.
