
Performance of table access

We have an application which is written entirely in C. For table access inside the code, such as fetching some values from a table, we use Pro*C. To increase the performance of the application we also preload some tables into memory. In general, we take some input fields and fetch the corresponding output fields from the table.

We usually have around 30,000 entries in the table, and at most it sometimes reaches about 100,000.

But if the table grows to around 10 million entries, I think it will seriously affect the performance of the application.

Am I wrong somewhere? If it really does affect the performance, is there any way to keep the performance of the application stable?

What is a possible workaround if the number of rows in the table increases to 10 million, considering the way the application works with tables?


If you are not sorting the table you'll get a proportional increase in search time: assuming nothing else is coded wrong, going from 30K rows to 10 million in your example means roughly 330 times longer searches. I'm assuming you're iterating over the table incrementally (i++ style).

However, if it's somehow possible to sort the table, then you can greatly reduce search times. That is possible because a search algorithm working on sorted information does not have to examine every element until it reaches the sought one: it can use auxiliary structures (trees, hashes, etc.), which are usually much faster to search, to pinpoint the correct element, or at least get a much closer estimate of where it sits in the master table.

Of course, that comes at the expense of having to keep the table sorted, either when you insert or remove elements, or before you perform a search.
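To make that trade-off concrete, here is a minimal sketch in plain C (not the asker's actual code; the row struct and its fields are hypothetical) contrasting a linear scan with qsort plus bsearch over a preloaded, in-memory table:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical preloaded table row: one input key mapped to one output value. */
    struct row {
        int  key;
        char value[32];
    };

    /* Linear scan: O(n) comparisons, cost grows in proportion to the row count. */
    static const struct row *find_linear(const struct row *rows, size_t n, int key)
    {
        for (size_t i = 0; i < n; i++)
            if (rows[i].key == key)
                return &rows[i];
        return NULL;
    }

    /* Comparator over the key field, used by both qsort and bsearch. */
    static int cmp_row(const void *a, const void *b)
    {
        const struct row *ra = a, *rb = b;
        return (ra->key > rb->key) - (ra->key < rb->key);
    }

    /* Binary search: O(log n) comparisons, but the table must be kept sorted. */
    static const struct row *find_sorted(const struct row *rows, size_t n, int key)
    {
        struct row probe = { .key = key };
        return bsearch(&probe, rows, n, sizeof *rows, cmp_row);
    }

    int main(void)
    {
        struct row table[] = { {3, "three"}, {1, "one"}, {2, "two"} };
        size_t n = sizeof table / sizeof table[0];

        /* Sort once after loading; every lookup afterwards is logarithmic. */
        qsort(table, n, sizeof *table, cmp_row);

        const struct row *hit = find_sorted(table, n, 2);
        printf("%s\n", hit ? hit->value : "not found");

        (void)find_linear; /* kept only to contrast the two approaches */
        return 0;
    }

For 10 million rows that is roughly 23 comparisons per lookup instead of millions, at the cost of a one-time sort after loading.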


Maybe you can look up 'google hash' and take a look at their implementation? Although it is in C++.
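If staying in C is a requirement, the same idea can be sketched with the POSIX hcreate/hsearch API instead of that C++ library; the keys and values below are invented for illustration, and error handling is minimal:

    #include <stdio.h>
    #include <search.h>   /* POSIX hcreate/hsearch/hdestroy */

    int main(void)
    {
        /* Illustrative key/value pairs standing in for the preloaded table. */
        char *keys[]   = { "IN0001", "IN0002", "IN0003" };
        char *values[] = { "OUT_A",  "OUT_B",  "OUT_C"  };
        size_t n = sizeof keys / sizeof keys[0];

        /* Size the hash table generously; hcreate takes an estimated element count. */
        if (hcreate(n * 2) == 0) {
            perror("hcreate");
            return 1;
        }

        /* Insert every row once, after loading the data. */
        for (size_t i = 0; i < n; i++) {
            ENTRY e = { .key = keys[i], .data = values[i] };
            if (hsearch(e, ENTER) == NULL) {
                perror("hsearch ENTER");
                return 1;
            }
        }

        /* A lookup is a single hash probe, largely independent of the row count. */
        char lookup[] = "IN0002";
        ENTRY probe = { .key = lookup };
        ENTRY *hit = hsearch(probe, FIND);
        printf("%s\n", hit ? (char *)hit->data : "not found");

        hdestroy();
        return 0;
    }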


It might be that you are getting too many cache misses once the data grows beyond 1 MB, or whatever your cache size is.

If you iterate over the table multiple times, or access its elements randomly, you can also incur a lot of cache misses.

http://en.wikipedia.org/wiki/CPU_cache#Cache_Misses


Well, it really depends on what you are doing with the data. If you have to load the whole kit and caboodle into memory, then a reasonable approach would be to use a large bulk (array) size, so that the number of Oracle round trips that need to occur is small.

If you don't really have the memory resources to hold the whole result set, a large bulk size will still help with the Oracle overhead: get a reasonably sized chunk of records into memory, process them, then get the next chunk.
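As an illustration of the array-fetch idea, here is a hedged Pro*C sketch (the table and column names are invented, and error handling is left out); the key point is that each FETCH pulls back up to a full host-array's worth of rows in a single Oracle round trip:

    #include <stdio.h>

    EXEC SQL INCLUDE sqlca;

    EXEC SQL BEGIN DECLARE SECTION;
    int  in_keys[1000];        /* one FETCH fills up to 1000 rows */
    char out_vals[1000][51];
    EXEC SQL END DECLARE SECTION;

    void load_table(void)
    {
        int total = 0, fetched;

        EXEC SQL DECLARE c CURSOR FOR
            SELECT key_col, val_col FROM lookup_table;

        EXEC SQL OPEN c;

        for (;;) {
            /* Each FETCH brings back up to 1000 rows in one round trip. */
            EXEC SQL FETCH c INTO :in_keys, :out_vals;

            /* sqlca.sqlerrd[2] holds the cumulative number of rows fetched,
               so the size of this chunk is the difference from last time. */
            fetched = sqlca.sqlerrd[2] - total;
            total   = sqlca.sqlerrd[2];

            printf("fetched %d rows this round trip\n", fetched);
            /* process_chunk(in_keys, out_vals, fetched);  hypothetical hook */

            if (sqlca.sqlcode == 1403)   /* ORA-01403: no more rows */
                break;
        }

        EXEC SQL CLOSE c;
        printf("loaded %d rows\n", total);
    }

Tuning the array size is a trade-off between memory for the host arrays and the number of round trips; somewhere in the hundreds to low thousands is a common starting point.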

Without more information about your actual runtime environment and business goals, that is about as specific as anyone can get.

Can you tell us more about the issue?
