
Long-running queries: observing partial results?

Developer https://www.devze.com 2022-12-18 05:53 Source: Internet

As part of a data analysis project, I will be issuing some long-running queries on a MySQL database. My future course of action is contingent on the results I obtain along the way. It would be useful for me to be able to view partial results generated by a SELECT statement that is still running.

Is there a way to do this? Or am I stuck with waiting until the query completes to view results which were generated in the very first seconds it ran?

Thank you for any help : )


In the general case, partial results cannot be produced. For example, if the query uses an aggregate function with a GROUP BY clause, all of the data has to be analysed before the first row is returned. A LIMIT clause will not help here, because it is applied after the output has been computed. Could you post some concrete data and the SQL query?
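To make the GROUP BY point concrete, here is a small Python sketch. It is not MySQL, only a toy model of an aggregation over made-up (group, value) rows, and the function name running_group_sums is hypothetical. It shows that the per-group totals seen partway through the scan differ from the final ones, which is why a server cannot safely emit a group's row early.

```python
from collections import defaultdict

def running_group_sums(rows):
    """Yield a snapshot of the per-group sums after each input row."""
    totals = defaultdict(int)
    for group, value in rows:
        totals[group] += value
        yield dict(totals)

# Hypothetical (group, value) rows, in the order a table scan might return them.
rows = [("a", 1), ("b", 5), ("a", 2), ("b", 1), ("a", 10)]

snapshots = list(running_group_sums(rows))
halfway = snapshots[1]   # state after the first two rows
final = snapshots[-1]    # state after every row has been read

print("after 2 rows:  ", halfway)
print("after all rows:", final)
```

Any row for group "a" reported from the halfway snapshot would have been wrong, because later rows still change its sum.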


One thing you may consider is sampling your tables down. This is good practice in data analysis in general, because it speeds up your iterations while you're writing code.

For example, suppose you have CREATE TABLE privileges and some mega-huge table X with a key unique_id and a data column data_value.

If unique_id is numeric, then in nearly any database

create table sample_table as
select unique_id, data_value
  from X
 where mod(unique_id, 1013) = 1;  -- 1013 is just an example of a large prime

will give you a roughly random sample of the data to work your queries out on, and you can inner join sample_table against the other tables to speed up testing and query results. Thanks to the sampling, your query results should be roughly representative of what you will get on the full data. Note: the number you mod by should be prime, otherwise the sample may not come out correct. Assuming the ids are evenly spread, the example above will shrink your table down to about 0.1% of the original size (0.0987%, to be exact).
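If you want to check the sampling fraction before touching the real table, here is a minimal sketch using Python's built-in sqlite3 module. The assumptions are mine: an in-memory SQLite database stands in for MySQL, the table is filled with made-up rows whose ids run densely from 1 to 101300, and SQLite's % operator is used in place of mod().

```python
import sqlite3

MODULUS = 1013  # the example prime from the query above

conn = sqlite3.connect(":memory:")
conn.execute("create table X (unique_id integer primary key, data_value text)")

# Made-up data: dense ids 1..101300 (exactly 100 * 1013 rows).
conn.executemany(
    "insert into X (unique_id, data_value) values (?, ?)",
    ((i, f"row-{i}") for i in range(1, 101_301)),
)

# Same idea as the sampling query, written with SQLite's % operator.
conn.execute(
    "create table sample_table as "
    "select unique_id, data_value from X "
    f"where unique_id % {MODULUS} = 1"
)

total = conn.execute("select count(*) from X").fetchone()[0]
sampled = conn.execute("select count(*) from sample_table").fetchone()[0]
fraction = sampled / total

print(f"{sampled} of {total} rows kept ({fraction:.4%})")
```

With dense ids the kept fraction is 1/1013. On a real table with gaps or clustered ids it will only be approximately that.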

Most databases also have better sampling and random number methods than just using mod. Check the documentation to see what's available for your version.

Hope that helps, McPeterson


It depends on what your query is doing. If it needs to have the whole result set before producing any output, as can happen for queries with GROUP BY, ORDER BY or HAVING clauses, then there is nothing to be done.

If, however, the reason for the delay is client-side buffering (which is the default mode), then that can be adjusted by setting "mysql_use_result" as an attribute of the database handle rather than the default "mysql_store_result". This is true for the Perl and Java interfaces; in the C interface, I think you have to use an unbuffered version of the function that executes the query.
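The difference between the two modes can be sketched without a database at all. The following is a pure-Python simulation, not MySQL client code: slow_query, buffered and unbuffered are hypothetical names, and the generator merely stands in for a server that produces rows one at a time.

```python
def slow_query(n, log):
    """Stand-in for a server that produces rows one at a time."""
    for i in range(n):
        log.append(f"server produced row {i}")
        yield i

def buffered(n, log):
    """Like store-result: collect the whole result set, then look at it."""
    rows = list(slow_query(n, log))
    for row in rows:
        log.append(f"client saw row {row}")

def unbuffered(n, log):
    """Like use-result: handle each row as soon as it arrives."""
    for row in slow_query(n, log):
        log.append(f"client saw row {row}")

buffered_log, unbuffered_log = [], []
buffered(3, buffered_log)
unbuffered(3, unbuffered_log)

print("\n".join(buffered_log))
print("---")
print("\n".join(unbuffered_log))
```

In the buffered log the client sees nothing until the last row has been produced. In the unbuffered log each row is seen right after it is produced, which is the behaviour you want for watching partial results.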
