
Periodically import data from files on Heroku

https://www.devze.com 2023-03-09 20:08 (source: web)

I need to periodically import some data into my rails app on Heroku.

The task to execute is split into the following parts:

* download a big zip file (e.g. ~100 MB) from a website
* unzip the file (unzipped size is ~1.5 GB)
* run a rake script that reads those files and creates or updates records using my ActiveRecord models
* cleanup
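The four steps could be sketched as a single method driven by a rake task. This is only a sketch under assumptions: `DATA_URL`, `WORK_DIR`, and the CSV headers in `row_to_attributes` are hypothetical placeholders, and it shells out to the `unzip` binary rather than using a gem:

```ruby
require "csv"
require "fileutils"
require "open-uri"

# Hypothetical source URL and working directory -- adjust to your setup.
DATA_URL = "https://example.com/export.zip"
WORK_DIR = "tmp/import"

# Map one CSV row to model attributes. The headers "id", "name", "price"
# are assumptions -- use whatever your files actually contain.
def row_to_attributes(row)
  { external_id: row["id"].to_i, name: row["name"], price: row["price"].to_f }
end

def run_import
  FileUtils.mkdir_p(WORK_DIR)
  zip_path = File.join(WORK_DIR, "export.zip")
  File.binwrite(zip_path, URI.open(DATA_URL).read)            # 1. download
  system("unzip", "-o", zip_path, "-d", WORK_DIR) or
    raise "unzip failed"                                      # 2. unzip
  Dir[File.join(WORK_DIR, "*.csv")].each do |path|            # 3. import
    CSV.foreach(path, headers: true) do |row|
      attrs = row_to_attributes(row)
      # e.g. Product.find_or_initialize_by(external_id: attrs[:external_id]).update!(attrs)
    end
  end
ensure
  FileUtils.rm_rf(WORK_DIR)                                   # 4. cleanup
end
```

Whether this fits on Heroku depends on the dyno's disk and memory limits, which is exactly the problem discussed below.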

How can I do this on heroku? Is it better to use some external storage (e.g. S3). How would you approach such a thing?

Ideally this needs to run every night.


I tried the exact same thing a couple of days back, and the conclusion I came to was that this can't be done, because of the memory limits Heroku imposes on each process. (I was building a data structure from the files I read from the internet and trying to push it to the DB.)

I was using a rake task that would pull and parse a couple of big files and then populate the database.

As a workaround, I now run this rake task on my local machine, push the database dump to S3, and issue a heroku command from my local machine to restore the Heroku DB instance:

heroku pgbackups:restore 'http://s3.amazonaws.com/#{yourfilepath}' --app #{APP_NAME} --confirm #{APP_NAME}

You could push the dump to S3 using the fog library:

require 'rubygems'
require 'fog'
connection = Fog::Storage.new(
    :provider              => 'AWS',
    :aws_secret_access_key => "#{YOUR_SECRECT}",
    :aws_access_key_id     => "#{YOUR_ACCESS_KEY}"
)

directory = connection.directories.get("#{YOUR_BACKUP_DIRECTORY}")

# upload the file
file = directory.files.create(
    :key    => "#{REMOTE_FILE_NAME}",  # double quotes, so interpolation works
    :body   => File.open("#{LOCAL_BACKUP_FILE_PATH}"),
    :public => true
)

The command that I use to make a pgbackup on my local machine is

system "PGPASSWORD=#{YOUR_DB_PASSWORD} pg_dump -Fc --no-acl --no-owner -h localhost -U #{YOUR_DB_USER_NAME} #{YOUR_DB_DATABASE_NAME} > #{LOCAL_BACKUP_FILE_PATH}"

I have written a rake task that automates all of these steps.
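A minimal sketch of that automation, wiring the dump, upload, and restore together. The bucket and app names are placeholders, credentials are read from the environment, `backup_key` and `push_db!` are illustrative helper names, and the availability of `public_url` on fog file objects is an assumption:

```ruby
require "time"

# Timestamped S3 key so nightly backups don't overwrite each other.
def backup_key(time = Time.now)
  "backups/#{time.strftime('%Y-%m-%d-%H%M')}.dump"
end

# Dump the local DB, push it to S3 with fog, then restore it on Heroku.
def push_db!(dump_path = "tmp/db.dump")
  require "fog"  # lazy-load so backup_key stays dependency-free
  system("PGPASSWORD=#{ENV['DB_PASSWORD']} pg_dump -Fc --no-acl --no-owner " \
         "-h localhost -U #{ENV['DB_USER']} #{ENV['DB_NAME']} > #{dump_path}") or
    raise "pg_dump failed"

  storage = Fog::Storage.new(
    :provider              => "AWS",
    :aws_access_key_id     => ENV["AWS_ACCESS_KEY_ID"],
    :aws_secret_access_key => ENV["AWS_SECRET_ACCESS_KEY"]
  )
  file = storage.directories.get("your-backup-bucket").files.create(
    :key    => backup_key,
    :body   => File.open(dump_path),
    :public => true
  )

  system("heroku pgbackups:restore '#{file.public_url}' " \
         "--app your-app --confirm your-app") or raise "restore failed"
end
```

Running `push_db!` nightly from cron on the local machine covers the "every night" requirement without touching Heroku's process limits.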

Another thing you might try is using a worker (DelayedJob). I guess you can configure your workers to run every 24 hours. I think workers don't have the 30-second request timeout restriction, but I am not sure about the memory usage.
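Delayed::Job has no built-in scheduler, but a common pattern for "every 24 hours" is a job that re-enqueues itself from its `success` hook. The class name here is illustrative, and `1.day.from_now` assumes ActiveSupport is loaded (as in a Rails app):

```ruby
# A self-re-enqueueing job: Delayed::Job runs #perform, and on success
# calls the #success hook, where the next run is scheduled.
class NightlyImportJob
  def perform
    # download / unzip / import / cleanup steps go here
  end

  def success(job)
    Delayed::Job.enqueue(NightlyImportJob.new, :run_at => 1.day.from_now)
  end
end

# Kick off the first run (e.g. from a console or an initializer):
# Delayed::Job.enqueue(NightlyImportJob.new, :run_at => Date.tomorrow.midnight)
```

Note this still runs on a Heroku worker dyno, so the memory-limit caveat above applies to whatever `perform` does.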

