EDIT: I have totally rewritten this question for clarity. I got no comments and no answers earlier.
I am maintaining a 2.x Rails app with plenty of statistical data. Some data is real and some is estimated for the future years. Every year I need to update estimated data with real data and calculate new estimates.
I have been using BIG yml-files and migrations for load开发者_运维技巧ing the data into the app every year. My migrations are full of estimation calculations and data corrections.
Problem
My migrations are full of none-schema related material and I can't even dream of doing db:migrate:reset without waiting few hours (if it even works). I'd love to see my migrations nice and clean - only with schema related modifications. But how I am suppose to update the data every year if not using migrations?
Help needed
I'd like to hear your comments and answers. I'm not looking for a silver bullet - more like best practises and ideas how people are handling similar situation.
It sounds like you have a large operation (data load using yml files) once a year but smaller operations once a month.
From my experience with statistical data you will probably end up doing more and more of these operations to clean and add more data.
I would use a job processing framework like resque and resque scheduler.
You can schedule the jobs to run once a month, year, day or constantly running. A job is something like loading yml files (or sets of yml files) or cleaning up data. You can control parameters to send to your job so you can use one class but alternate how it updates or cleans your data based on the way you enqueue or schedule the job.
First of all, I have to say that this is a very interesting question. As far as i know, it isn't a good idea loading data from migrations. Generally speaking you should use db/seeds.rb for data loading in your db and I think it could be a good idea to write a little class helper to put in your lib dir and then call it from db/seeds.rb. I image you could organize you files in the following way:
lib/data_loader.rb
lib/years/2009.rb
lib/years/2010.rb
Obviously, you should clear your migrations and write code for lib/data_loader.rb in the way you should prefer but I was only trying to offer a general idea of how I'd organize my code if I have to face a problem like that.
I'm not sure I've replied to your question in a way that helps but I hope it does.
If I were you I would go with creating custom rake task. You will have access to all you models and activerecord connections and once a year you will end up doing:
rake calculate
I have a situation where I need to load data from CSV files that change infrequently and update data from the Internet daily. I'll include a somewhat complete example on how to do the former.
First I have a rake file in lib/tasks/update.rake
:
require 'update/from_csv_files.rb'
namespace :update do
task :csvfiles => :environment do
Dir.glob('db/static_data/*.csv') do |file|
Update::FromCsvFiles.load(file)
end
end
end
The => :environment
means we will have access to the database via the usual models.
Then I have code in the lib/update/from_csv_files.rb
file to do the actual work:
require 'csv'
module Update
module FromCsvFiles
def FromCsvFiles.load(file)
csv = CSV.open(file, 'r')
csv.each do |row|
id = row[0]
s = Statistic.find_by_id(id)
if (s.nil?)
s = Statistic.new
s.id= id
end
s.survey_area = row[1]
s.nr_of_space_men = row[2]
s.save
end
end
end
end
Then I can just run rake update:csvfiles
whenever my CSV files changes to load the new data. I also have another task that is set up in a similar way to update my daily data.
In your case you should be able to write some code to load your YML files or make your calculations directly. To handle your smaller corrections you could make a generic method for loading YML files and call it with specific files from the rake task. That way you only need to include the YML file and update the rake file with a new task. To handle execution order you can make a rake task that calls the other rake tasks in the appropriate order. I'm just throwing around some ideas now, you know better than me.
精彩评论