开发者

How to improve the speed of a loop containing a sqlalchemy query statement as conditional

开发者 https://www.devze.com 2022-12-25 23:00 出处:网络
This loop checks if a record is in the sqlite database and builds a list of dictionaries for those records that are missing and then executes a multiple insert statement with the list. This works but

This loop checks if a record is in the sqlite database and builds a list of dictionaries for those records that are missing and then executes a multiple insert statement with the list. This works but it is very slow (at least i think it is slow) as it takes 5 minutes to loop over 3500 queries. I am a complete newbie in python, sqlite and sqlalchemy so I wonder if there is a faster way of doing this.

list_dict = []

session = Session()

for data in data_list:
    if session.query(Class_object).filter(Class_object.column_name_01 == data[2]).filter(Class_object.column_name_00 == an_id).count() == 0:
        list_dict.append({'column_name_00':a_id,
       开发者_开发问答                   'column_name_01':data[2]})

conn = engine.connect()
conn.execute(prices.insert(),list_dict)
conn.close()
session.close()

edit: I moved session = Session() outside the loop. Did not make a difference.

SOLUTION:

thanks to mcabral answer I modified the code as:

existing_record_list = []
list_dict = []

conn = engine.connect()
s = select([prices.c.column_name_01], prices.c.column_name_00==a_id)
result = conn.execute(s) 
for row in result:       
    existing_record_list.append(row[0])

for data in raw_data['data']:
    if data[2] not in existing_record_list:
        list_dict.append({'column_name_00':a_id,
                          'column_name_01':data[2]}

conn = engine.connect()
conn.execute(prices.insert(),list_dict)
conn.close()

This now takes 6 seconds. That is some improvement!!


3500 queries seems a big number,

Have you consider fetching all entities in one query? Then you will be iterating over a list in memory, and not querying the database for each item.


Glad you found something that works, as an extra 2 cents:

I agree with mcabral. As a general rule, if you are putting a query inside of a loop, you are asking for trouble. Popular SQL DBs are generally optimized for data acquisition. Looping through a query generally indicates that you are procedurally doing something that should/could be done with either a single query or a string a queries that put data into each other.

There are exceptions to this, but from my experience they are usually few and far between... Every time I ran a query through a loop, I regretted it later.

0

精彩评论

暂无评论...
验证码 换一张
取 消