Best cloud hosted database solution for 900,000 row database that has to be updated daily? [closed]

Developer https://www.devze.com 2023-03-12 16:07 Source: web
Closed. This question is off-topic. It is not currently accepting answers. Want to improve this question? Update the question so it's on-topic for Stack Overflow.

Closed 11 years ago.

A company we deal with sends us a ~900,000 row CSV daily of their product listings.

I want to store this in the cloud, with someone else handling patching, administration, etc. The underlying engine does not matter (MySQL, SQL Server, MongoDB, CouchDB, etc.).

The major requirement, though, is that there is some way to automatically flush and reload the database from the CSV without running 900,000 INSERT statements or the equivalent every day. With SQL Server we could use bcp, and with MySQL we could use mysqlimport. The listings change so much from day to day that diffing today's file against yesterday's doesn't make sense.

It will only be queried 400-500 times per day, one query at a time, never concurrently. But the data all has to be there and be updated daily.

Any suggestions? We're looking into MongoHQ, Windows Azure, Xeround, and stuff like that.


If there are only 400-500 queries a day, do you have control over when they happen? 900,000 rows is not a lot by today's standards.

If it were me, I'd simply load the file into an existing DB as a table named table_new, and then, once it's loaded, rename the original table to table_old and finally rename table_new to table.

Your switch over takes minimal time, and you have no downtime waiting for the table to load. While it's loading, the original table remains in play. Finally, when it's all done, drop table_old.
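The load-then-rename sequence above can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for a hosted SQL database, and the table name `listings`, the columns `sku`, `name`, `price`, and the function `reload_listings` are all hypothetical. On a real MySQL or SQL Server instance the staging load would more likely go through the engine's own bulk loader (mysqlimport, bcp) rather than `executemany`.

```python
import csv
import sqlite3


def reload_listings(conn: sqlite3.Connection, csv_file) -> int:
    """Load csv_file into a staging table, then swap it in by renaming.

    Assumes a header row followed by (sku, name, price) rows; these
    column names are hypothetical, not taken from the original post.
    Returns the number of rows in the live table after the swap.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row

    cur = conn.cursor()
    # 1. Build the staging table while the live table stays queryable.
    cur.execute("DROP TABLE IF EXISTS listings_new")
    cur.execute(
        "CREATE TABLE listings_new (sku TEXT PRIMARY KEY, name TEXT, price REAL)"
    )
    cur.executemany("INSERT INTO listings_new VALUES (?, ?, ?)", reader)

    # 2. Swap: live -> old, staging -> live. Only this part needs to be quick.
    cur.execute("DROP TABLE IF EXISTS listings_old")
    live_exists = cur.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'listings'"
    ).fetchone()
    if live_exists:
        cur.execute("ALTER TABLE listings RENAME TO listings_old")
    cur.execute("ALTER TABLE listings_new RENAME TO listings")

    # 3. Discard yesterday's data once the new table is in place.
    cur.execute("DROP TABLE IF EXISTS listings_old")
    conn.commit()

    return cur.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
```

Calling `reload_listings(conn, open("listings.csv", newline=""))` once a day would replace the whole table. In MySQL specifically, both renames can be issued as a single `RENAME TABLE ... , ...` statement, which performs the swap atomically.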

If you have relationships (foreign keys) pointing at the renamed table, the simplest solution is to drop them in production. Keep them in development and testing, but strive to ensure that the relations are always consistent so the DB doesn't have to enforce them. No big deal.

Modern SQL databases support this kind of table rename; I can't really say about the others.


You could try MongoLab as well.


I'd go with MongoDB. There are lots of MongoDB hosting options including cloud services.
Mongo has a bulk import tool mongoimport which can load data directly from a CSV file.

Depending on the size of your records, it should take about a minute to import 900,000 rows into MongoDB, then another minute or so to create the necessary index for your query.

To minimize downtime, import into a new collection products.import, then once the import has finished drop the old products collection and rename products.import to products.
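As a rough sketch of the import-then-rename flow, the Python below only assembles the two command lines involved and leaves running them to the caller. Several things here are assumptions rather than details from the answer: the database name `shop`, the indexed field `sku`, the helper names, a MongoDB instance reachable on the default local host, and the use of the `mongosh` shell. It also passes `true` as the `dropTarget` argument of `renameCollection`, which drops the old `products` collection as part of the rename instead of as a separate step.

```python
import subprocess

STAGING = "products.import"  # staging collection name from the answer
LIVE = "products"


def build_reload_commands(csv_path: str, db: str = "shop") -> list[list[str]]:
    """Return [mongoimport argv, mongosh argv] for a daily reload.

    `db` and the indexed field `sku` are hypothetical placeholders.
    """
    import_cmd = [
        "mongoimport",
        "--db", db,
        "--collection", STAGING,
        "--type", "csv",
        "--headerline",  # take field names from the first CSV line
        "--drop",        # clear any leftover staging data first
        "--file", csv_path,
    ]
    # Index the staging collection, then rename it over the live one.
    # The second argument (dropTarget = true) drops the old collection.
    script = (
        f'db.getCollection("{STAGING}").createIndex({{ sku: 1 }}); '
        f'db.getCollection("{STAGING}").renameCollection("{LIVE}", true)'
    )
    swap_cmd = ["mongosh", db, "--quiet", "--eval", script]
    return [import_cmd, swap_cmd]


def run_reload(csv_path: str, db: str = "shop") -> None:
    """Run both steps in order, stopping if either command fails."""
    for cmd in build_reload_commands(csv_path, db):
        subprocess.run(cmd, check=True)
```

Because the index is built on the staging collection before the rename, queries against `products` should not hit an unindexed collection at any point.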
