开发者

Improve performance when importing data to MySQL?

开发者 https://www.devze.com 2023-04-13 06:51 出处:网络
I\'m using Django to build a website with a MySQL (MyISAM) backend. The database data is imported from a number of XML files that a开发者_C百科n external script process and output as a JSON-file. Wh

I'm using Django to build a website with a MySQL (MyISAM) backend.

The database data is imported from a number of XML files that a开发者_C百科n external script process and output as a JSON-file. Whenever a new JSON file differ from the old one, I need to wipe the old MySQL-db and recreate it using manage.py loaddata, (at least that's the easy way to do it, I guess I could check the differences between the JSON files and apply those to the database, but I haven't figured out a good solution for this (I'm neither a very good coder nor a web developer)).

Anyway, the JSON file is around 10 Mb, and ends up being about 21,000 rows of SQL (It's not expected to grow significantly). There are 7 tables, and they all look something like this:

class Subnetwork(models.Model):
   SubNetwork = models.CharField(max_length=50)
   NetworkElement = models.CharField(max_length=50)
   subNetworkId = models.IntegerField()
   longName = models.CharField(max_length=50)
   shortName = models.CharField(max_length=50)
   suffix = models.CharField(max_length=50)

It takes up to a minute (sometimes only 30 seconds) to import it into MySQL. I don't know if this is to be expected from a file of this size? What can I do (if anything) to improve perfomance?

For what it's worth, here's some profiler output https://gist.github.com/1287847


There are a couple of solutions, same decent than others, but here is a workaround to keep your system's "downtime" minimal, without needing to write a db synchronize mechanism (which would probably be a better solution in most times).:

  • Create a custom settings_build.py file, with from settings import * that chooses a random name for a new db (probably with the date in the db name), creates it by calling mysqladmin, and update the name into DATABASES.
  • Create a custom django management command (let's call it builddb) by either cloning the loaddata command or calling it, and on successful result, it should write the db name to a dbname text file with one line and executes a shell command that reloads your django (apache/gunicorn/?) server.
  • Modify your settings.py to load the database name from the text file.

And now run your build process like this:

./manage.py builddb --settings=settings_build


I solved it by exporting the processed XML-files to csv instead of json, and then used a separate script that called mysqlimport to do the importing.

0

精彩评论

暂无评论...
验证码 换一张
取 消