Long-running-to-death migration / find_each

Developer https://www.devze.com 2023-03-02 17:28 Source: Internet
I'm running Rails 3 with PostgreSQL, and I have a migration that updates millions of small records.

Record.find_each do |r|
  r.doing_incredibly_complex_stuff
  r.save!
  puts "#{r.id} updated"
end

I think ActiveRecord wraps such updates in a transaction, because the "commit" takes a very long time and uses a HUGE amount of memory, even though every record has already been "printed" on screen in the example above.

So, could I run this find_each outside a transaction (which would be quite safe here), saving a lot of "commit" time and memory?

A kind of ActiveRecord::Base.without_transaction do ... ; end I guess :-)

OR: am I wrong, migrations are not wrapped in transactions, and the time I see is just the SQL update statements being applied?

EDIT: It seems there is no link with transactions. Here's the stack trace I got when I interrupted the migration, after everything had been printed on screen and while free RAM was dropping from 500MB to ~30MB:

IRB::Abort: abort then interrupt!!
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activesupport-3.0.4/lib/active_support/whiny_nil.rb:46:in `call'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activesupport-3.0.4/lib/active_support/whiny_nil.rb:46:in `method_missing'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activerecord-3.0.4/lib/active_record/connection_adapters/postgresql_adapter.rb:978:in `flatten'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activerecord-3.0.4/lib/active_record/connection_adapters/postgresql_adapter.rb:978:in `block in select'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activerecord-3.0.4/lib/active_record/connection_adapters/postgresql_adapter.rb:977:in `map'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activerecord-3.0.4/lib/active_record/connection_adapters/postgresql_adapter.rb:977:in `select'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activerecord-3.0.4/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activerecord-3.0.4/lib/active_record/connection_adapters/abstract/query_cache.rb:56:in `select_all'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activerecord-3.0.4/lib/active_record/base.rb:467:in `find_by_sql'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activerecord-3.0.4/lib/active_record/relation.rb:64:in `to_a'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/activerecord-3.0.4/lib/active_record/relation.rb:356:in `inspect'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/railties-3.0.4/lib/rails/commands/console.rb:44:in `start'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/railties-3.0.4/lib/rails/commands/console.rb:8:in `start'
from /Users/clement/.rvm/gems/ruby-1.9.2-p136@gemset/gems/railties-3.0.4/lib/rails/commands.rb:23:in `<top (required)>'
from script/rails:6:in `require'
from script/rails:6:in `<main>'

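EDIT(2): Wow. It turned out it took so long because find_each returns the whole relation after iterating, and the console then loads every element to print it (see inspect and to_a in the stack trace above).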

I tried :

Record.tap do |record_class|
  record_class.find_each do |r|
    r.doing_incredibly_complex_stuff
    r.save!
    puts "#{r.id} updated"
  end
end
=> Record(id: integer, ...)

So it gave back the console instantly as expected. :)

But I still see a strange behavior: the RAM isn't freed. Even once I exit the terminal, free RAM still plunges...

Maybe my solution with tap is not satisfactory? Is it still doing a mass select? How would I avoid the mass select after the find_each?

Thanks!


Neither ActiveRecord::Migration nor find_each does anything to wrap your code in a database transaction. Each r.save! is wrapped in its own transaction, which covers any cascading effects of the save.

As the comments above note, update_all or a raw execute will be faster for mass updates, though I have no way to tell whether that would be appropriate for what you're doing. If you're having memory issues, you can also tweak the batch size on find_each and see whether it has an effect. If it doesn't, you may be holding on to those objects somewhere.
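
To make the batch-size suggestion concrete, here is a minimal plain-Ruby sketch of why batched iteration keeps memory bounded. It does not use ActiveRecord: FakeTable, fetch and each_in_batches are hypothetical stand-ins, written on the assumption that find_each walks the table in primary-key order, one batch at a time, and that ids are positive integers.

```ruby
# Hypothetical stand-in for a table; this is not ActiveRecord.
class FakeTable
  attr_reader :max_loaded

  def initialize(ids)
    @ids = ids.sort
    @max_loaded = 0 # largest number of rows held in memory at once
  end

  # Returns up to `limit` ids greater than `after_id`, roughly like
  # "WHERE id > ? ORDER BY id LIMIT ?".
  def fetch(after_id, limit)
    @ids.select { |id| id > after_id }.first(limit)
  end

  # Yields every row while loading only one batch at a time.
  def each_in_batches(batch_size)
    last_id = 0 # assumes ids start above 0
    loop do
      batch = fetch(last_id, batch_size)
      break if batch.empty?
      @max_loaded = [@max_loaded, batch.size].max
      batch.each { |id| yield id }
      last_id = batch.last
    end
    nil # hand nothing heavy back to the caller
  end
end

table = FakeTable.new((1..10).to_a)
seen = []
table.each_in_batches(3) { |id| seen << id }
```

In Rails 3 itself the batch size is passed as an option, for example Record.find_each(:batch_size => 200) do |r| ... end; the default is 1000.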


Perhaps you could add a final statement to serve as the return value, so that the console does not get back (and print) the return value of find_each. You could replace the last "end" with

end ; nil
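
The effect of the trailing nil can be sketched without Rails. In this plain-Ruby illustration, FakeRelation and console_eval are hypothetical stand-ins. It assumes, as the stack trace in the question suggests, that find_each hands back the relation itself and that the console calls inspect on whatever the last expression returns.

```ruby
# Hypothetical stand-in for a relation; this is not ActiveRecord.
class FakeRelation
  attr_reader :inspect_calls

  def initialize(rows)
    @rows = rows
    @inspect_calls = 0
  end

  # Yields each row, then returns self (assumed Rails 3.0 behavior).
  def find_each
    @rows.each { |row| yield row }
    self
  end

  # Stands in for the expensive "load everything to print it" step.
  def inspect
    @inspect_calls += 1
    @rows.inspect
  end
end

# Roughly what a console does: evaluate the input, then inspect the result.
def console_eval
  result = yield
  result.inspect
end

relation = FakeRelation.new([1, 2, 3])

# Without the trailing nil, the console inspects the whole relation.
slow = console_eval { relation.find_each { |r| r } }

# With the trailing nil, only nil is inspected.
fast = console_eval { relation.find_each { |r| r }; nil }
```

The second call never touches FakeRelation#inspect, which is why the prompt comes back immediately.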