The situation:
- I need to do a procedural sort of a query result set.
- The data set size/access frequency does not allow this sort to occur in application memory.
- I want a shared library written in C to act as the ORDER BY expression in the query. It should accept some fields from the row being sorted and assign a score, with the result depending on what has been read already.
So: how do I handle heap data in a PostgreSQL shared library so that it persists within a query but not between queries?
The DBMS will determine whether the ORDER BY clause means that the data is kept in memory or spilled to disk. It is highly unlikely that you can alter that by a stored procedure invoked in the ORDER BY clause of your query. It is also completely unclear to me whether your hypothetical procedure would try to keep the data in memory or spill it to disk. You should let the DBMS do the sorting; its sort is usually fairly well tuned. You just need to ensure that it (the DBMS) can do the comparison you need.
Unfortunately the sort is procedural; SQL just can't do it. The data in question is memo data for the procedural sort and PostgreSQL has no knowledge of its existence.
If you can write a stored procedure (or C function) that takes in the 'memo data' and generates a sortable string (or other type, but string is most plausible), then you can evaluate the function on the data in the select-list, and have SQL sort by the result value. The procedure will have to determine a stable value for the string based solely on one row at a time.
```sql
SELECT t.id, t.memo_data, magic_function(t.memo_data) AS sortable
FROM SomeTable AS t
ORDER BY sortable;
```
Depending on the DBMS, you might have to repeat the function call in the ORDER BY clause, or fall back on the 'ordinal position' sort (ORDER BY 3); PostgreSQL accepts the output-column alias directly. You write the C code that SQL knows as magic_function().
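To make the data flow concrete, here is a minimal standalone C model of the query above, not a PostgreSQL extension. It assumes a hypothetical `Row` type, a hypothetical scoring rule (the number of words in `memo_data`) standing in for whatever `magic_function()` really computes, and uses `qsort` to play the part of the DBMS's sort. The key is computed once per row from that row alone, then the sort compares only the precomputed keys, with `id` as a tie-breaker so the order is fully determined.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* One row of the hypothetical SomeTable. */
typedef struct {
    int id;
    const char *memo_data;
    long sortable;   /* the select-list column produced by magic_function */
} Row;

/* Hypothetical stand-in for the C function SQL knows as magic_function().
 * It looks only at its own argument, so the same input always yields the
 * same key. The "score" here is simply the number of words in the memo. */
static long magic_function(const char *memo)
{
    long words = 0;
    int in_word = 0;
    for (const unsigned char *p = (const unsigned char *)memo; *p; p++) {
        if (isspace(*p)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}

/* ORDER BY sortable, then id, so ties are broken deterministically. */
static int by_sortable(const void *a, const void *b)
{
    const Row *x = a;
    const Row *y = b;
    if (x->sortable != y->sortable)
        return x->sortable < y->sortable ? -1 : 1;
    return (x->id > y->id) - (x->id < y->id);
}

/* Evaluate the key once per row (like the select-list), then sort on it. */
static void order_by_sortable(Row *rows, size_t n)
{
    assert(rows != NULL);
    for (size_t i = 0; i < n; i++)
        rows[i].sortable = magic_function(rows[i].memo_data);
    qsort(rows, n, sizeof rows[0], by_sortable);
}
```

In the real system the sort itself stays with PostgreSQL; only the per-row key function would be written in C and registered with `CREATE FUNCTION ... LANGUAGE C`.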
Note that this function must operate on only a single value (or, more accurately, on the arguments it is passed from a single row of data at a time). It is not usually feasible to make it depend on any other rows. It must be an invariant function: given the same input, it must always produce the same output (in PostgreSQL such a function can be declared IMMUTABLE). If you don't do that, you are going to get quasi-random results.
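A small standalone C model shows why a key that "depends on what has been read already" breaks this rule. The hypothetical `stateful_score` keeps a running total of memo lengths seen so far, so the key it assigns to a given row changes with the order in which rows happen to be scanned, and the DBMS gives no guarantee about that order. The hypothetical `invariant_score` depends only on its argument.

```c
#include <assert.h>
#include <string.h>

/* State carried between calls: memo bytes consumed by earlier rows. */
typedef struct {
    size_t total_seen;
} ScanState;

/* NOT invariant: the result depends on which rows were read before. */
static size_t stateful_score(ScanState *st, const char *memo)
{
    assert(st != NULL && memo != NULL);
    st->total_seen += strlen(memo);
    return st->total_seen;
}

/* Invariant: the same memo always produces the same key. */
static size_t invariant_score(const char *memo)
{
    assert(memo != NULL);
    return strlen(memo);
}
```

Scanning "xx" before "yyy" gives "xx" the key 2; scanning "yyy" first gives the very same row the key 5, while `invariant_score("xx")` is 2 either way.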
You may need to look up 'memory duration' (in PostgreSQL the equivalent concept is the memory context). You might, conceivably, be able to allocate memory that lasts for the duration of the statement, which the function could use, but you then need to consider how that memory is initialized and released. You might need to look at the manual's coverage of memory management.
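In a PostgreSQL C function, the usual slot for state that should survive across calls within one query but not beyond it is `fcinfo->flinfo->fn_extra`, allocated in the memory context `fcinfo->flinfo->fn_mcxt`, which normally lives as long as the query. The sketch below is a standalone model of that pattern only: the hypothetical `CallInfo` stands in for `FmgrInfo`, and `malloc`/`free` stand in for allocating in and releasing that context. The cached state is a lookup table built on the first call (a hypothetical weighting rule), so the key it produces still depends only on the current row.

```c
#include <assert.h>
#include <stdlib.h>

/* Models the per-call-site info block; NULL before the first call. */
typedef struct {
    void *fn_extra;
} CallInfo;

/* Hypothetical per-query cache: per-byte weights, built once. */
typedef struct {
    long weight[256];
} ScoreCache;

static int setups_performed = 0;   /* demonstration counter only */

static long cached_score(CallInfo *info, const char *memo)
{
    ScoreCache *cache = info->fn_extra;
    if (cache == NULL) {
        /* First call in this query: build the expensive state once. */
        cache = malloc(sizeof *cache);
        assert(cache != NULL);
        for (int i = 0; i < 256; i++)   /* hypothetical rule: a-z weigh 1 */
            cache->weight[i] = (i >= 'a' && i <= 'z') ? 1 : 0;
        info->fn_extra = cache;
        setups_performed++;
    }
    long score = 0;
    for (const unsigned char *p = (const unsigned char *)memo; *p; p++)
        score += cache->weight[*p];
    return score;
}

/* Models query end: PostgreSQL releases the context for you; here we free. */
static void end_query(CallInfo *info)
{
    free(info->fn_extra);
    info->fn_extra = NULL;
}
```

Within one query the table is built once and reused for every row; a new query starts with an empty slot and builds it again.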