What is the best way to migrate data from BigTable/GAE Datastore to RDBMS?

Developer https://www.devze.com 2023-04-12 20:48 Source: Internet
Now that Google has announced availability of Cloud SQL storage for app engine, what will be the best way to migrate existing data from BigTable/GAE Datastore to MySQL?

To respond to the excellent questions brought up by Peter:

  • In my particular scenario, I kept my data model very relational.
  • I am fine with taking my site down for a few hours to make the transition, or at least warning people that any changes they make for the next few hours will be lost due to database maintenance, etc.
  • My data set is not very large: the main dashboard for my app says 0.67 GB, and the datastore statistics page says it's more like 200 MB.
  • I am using python.
  • I am not using the blobstore (although I think that is a separate question from a pure datastore migration: one could migrate datastore usage to MySQL while keeping the blobstore).
  • I would be fine with paying a reasonable amount (say, less than $100).
  • I believe my application is Master/Slave - it was created during the preview period of App Engine. I can't seem to find an easy way to verify that though.

It seems like the bulk uploader could be used to download the data into a text format that could then be loaded with mysqlimport, but I don't have any experience with either technology. Also, it appears that Cloud SQL only supports importing mysqldump files, so I would have to install MySQL locally, load the data with mysqlimport, dump it, and then import the dump?
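As a rough illustration of the text-format route described above, the sketch below turns a CSV export into plain INSERT statements. It is only a sketch under several assumptions: the export has a header row, the table name `game` and the sample fields are hypothetical, an empty field is treated as NULL, and the escaping assumes MySQL's default handling of backslashes in string literals. Whether the Cloud SQL import accepts a hand-generated SQL file instead of a real mysqldump is not confirmed here.

```python
import csv
import io


def sql_literal(value):
    """Render one CSV field as a MySQL literal (empty field -> NULL, an assumption)."""
    if value is None or value == "":
        return "NULL"
    # Escape backslashes first, then single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def csv_to_inserts(csv_file, table):
    """Yield one INSERT statement per data row, using the header row as column names."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    column_list = ", ".join("`%s`" % c for c in columns)
    for row in reader:
        values = ", ".join(sql_literal(row[c]) for c in columns)
        yield "INSERT INTO `%s` (%s) VALUES (%s);" % (table, column_list, values)


if __name__ == "__main__":
    # Hypothetical two-row export; the second row has empty fields.
    sample = io.StringIO("key,name,score\nagx1,O'Brien,212\nagx2,,\n")
    for statement in csv_to_inserts(sample, "game"):
        print(statement)
```

All values are emitted as quoted strings and left for MySQL to coerce to the column types, which keeps the sketch short but may not suit every property type.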

An example of my current model code, in case it's required:

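from google.appengine.ext import db

# Center, entry_modes and default_entry_mode are defined elsewhere in the app.

class OilPatternCategory(db.Model):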
    version = db.IntegerProperty(default=1)
    user = db.UserProperty()
    name = db.StringProperty(required=True)
    default = db.BooleanProperty(default=False)

class OilPattern(db.Model):
    version = db.IntegerProperty(default=2)
    user = db.UserProperty()
    name = db.StringProperty(required=True)
    length = db.IntegerProperty()
    description = db.TextProperty()
    sport = db.BooleanProperty(default=False)
    default = db.BooleanProperty(default=False)
    retired = db.BooleanProperty(default=False)
    category = db.CategoryProperty()

class League(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty(required=True)
    name = db.StringProperty(required=True)
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()

class Tournament(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty(required=True)
    name = db.StringProperty(required=True)
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()

class Series(db.Model):
    version = db.IntegerProperty(default=3)
    created = db.DateTimeProperty(auto_now_add=True)
    user = db.UserProperty(required=True)
    date = db.DateProperty()
    name = db.StringProperty()
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    league = db.ReferenceProperty(League)
    tournament = db.ReferenceProperty(Tournament)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()
    allow_comments = db.BooleanProperty(default=True)
    complete = db.BooleanProperty(default=False)
    score = db.IntegerProperty(default=0)

class Game(db.Model):
    version = db.IntegerProperty(default=5)
    user = db.UserProperty(required=True)
    series = db.ReferenceProperty(Series)
    score = db.IntegerProperty()
    game_number = db.IntegerProperty()
    pair = db.StringProperty()
    notes = db.TextProperty()
    entry_mode = db.StringProperty(choices=entry_modes, default=default_entry_mode)


Have you considered using the MapReduce framework? You could write mappers that store the datastore entities in Cloud SQL. Do not forget to add a column for the datastore key; this can help you avoid duplicate rows and identify missing rows.

You might have a look at https://github.com/hudora/gaetk_replication for inspiration on the mapper functions.
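To make the datastore-key column concrete, here is a minimal sketch of what a mapper body might do. It uses Python's built-in sqlite3 as a stand-in for Cloud SQL so it can run anywhere; the table layout, the function names and the sample keys are hypothetical. In a real mapper the key string could come from `str(entity.key())`, and on MySQL the write would be something like `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE` rather than SQLite's `INSERT OR REPLACE`.

```python
import sqlite3


def create_table(conn):
    # datastore_key is the primary key, so each entity maps to at most one row.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS game ("
        "datastore_key TEXT PRIMARY KEY, "
        "score INTEGER, "
        "game_number INTEGER, "
        "notes TEXT)"
    )


def store_entity(conn, datastore_key, properties):
    """Write one entity as one row, overwriting any earlier copy of the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO game (datastore_key, score, game_number, notes) "
        "VALUES (?, ?, ?, ?)",
        (
            datastore_key,
            properties.get("score"),
            properties.get("game_number"),
            properties.get("notes"),
        ),
    )


def missing_keys(conn, expected_keys):
    """Return the datastore keys that have no row yet, sorted."""
    stored = {row[0] for row in conn.execute("SELECT datastore_key FROM game")}
    return sorted(set(expected_keys) - stored)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    # Running the mapper twice over the same entity must not create a second row.
    store_entity(conn, "key-1", {"score": 200, "game_number": 1})
    store_entity(conn, "key-1", {"score": 212, "game_number": 1})
    print(conn.execute("SELECT COUNT(*) FROM game").fetchone()[0])
    print(missing_keys(conn, ["key-1", "key-2"]))
```

Because the write is idempotent, a mapper shard that is retried rewrites the same rows instead of duplicating them, and comparing the stored keys against a list of expected keys shows which entities still need to be copied.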
