开发者

How do you find the limiting factor of a program?

开发者 https://www.devze.com 2023-03-10 06:54 出处:网络
I\'ve created a program that parses data from a file and imports it into a relational postgresql database.

I've created a program that parses data from a file and imports it into a relational postgresql database. The program has been running for 2 weeks and looks like it has a few more days left. It is averaging ~150 imports a second. How can I find the limiting factor and make it go faster? The CPU for my program does not go above 10%, The Memory does not go above 7%. The Postgresql database CPU does not go above 10%, and 25% Memory.

I'm guessing that the limiting factor is the hard-disk write speed, but how can I verify, and if the case; improve it? (short of buying a faster hard drive)

This is the output of "iostat -d -x":

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.59     3.23    0.61    1.55    23.15    38.37    28.50     0.01    5.76   1.04   0.22
sdb               0.02   308.37   21.72  214.53   706.41  4183.68    20.70     0.56    2.38   2.24  52.89

As you can likely guess, the database is on sdb.

EDIT: The file I am parsing is ~7GB. For most (but not all) of the data in the file I go line by line, here is an example"

  1. Return the ID of partA in ta开发者_开发百科bleA.
    • If ID does not exist insert partA into tableA returning ID
  2. Return the ID of partB in tableB.
    • If ID does not exist insert partB into tableB returning ID
  3. Return the ID of the many-to-many relationship of partA and partB.
    • If the ID of the relationship does not exist create it.
    • else update the relationship (with a date id)
  4. move onto the next line.

To save many queries, I save the IDs of inserted PartA and PartB items in memory to reduce lookups.

Here is a copy of my postgresql.conf file: http://pastebin.com/y9Ld2EGz The only things I changed where the default data directory, and the memory limits.


You should have killed this process many, many days ago and asked not how to find the limiting factor of the app but rather:

Is there a faster way to import data into pgSQL?

Take a look at this question and then the pgSQL COPY documentation. It's possible that the import process you're running could be achieved in hours rather than weeks with the proper tools.

By the way, regardless of what RDBMS you're using, programmatic insertions of data are never as performant as native tools provided by the RDBMS's vendor to handle bulk operations such as these. For example: SQL Server has bcp, DTS/SSIS and a few other options for bulk data import/export. Oracle has its own etc.

0

精彩评论

暂无评论...
验证码 换一张
取 消