I am dealing with an application with huge SQL queries. They are so complex that when I finish understanding one I have already forgotten how it all started.
I was wondering if it will be a good practice to pull more data from database and make t开发者_开发百科he final query in my code, let's say, with Python. Am I nuts? Would it be that bad for performance?
Note, results are huge too, I am talking about an ERP in production developed by other people.
Let the DB figure out how best to retrieve the information that you want, else you'll have to duplicate the functionality of the RDBMS in your code, and that will be way more complex than your SQL queries.
Plus, you'll waste time transferring all that unneeded information from the DB to your app, so that you can filter and process it in code.
All this is true because you say you're dealing with large data.
I would have the business logic in the application, as much as possible. Complex business logic in queries are difficult to maintain. (when I finish understanding one I have already forgotten how it all started)Complex logic in stored procedures are ok. But with a typical python application, you would want your business logic to be in python.
Now, the database is way better in handling data than your application code. So if your logic involves huge amount of data, you may get better performance with the logic in the database. But this will be for complex reports, bookkeeping operations and such, that operate on a large volume of data. You may want to use stored procedures, or systems that specialize in such operations (a data warehouse for reports) for these types of operations.
Normal OLTP operations do not involve much of data. The database may be huge, but the data required for a typical transaction will be (typically) a very small part of it. Querying this in a large database may cause performance issues, but you can optimize this in several ways (indexes, full text searches, redundancy, summary tables... depends on your actual problem).
Every rule has exceptions, but as a general guideline, try to have your business logic in your application code. Stored procedures for complex logic. A separate data warehouse or a set of procedures for reporting.
My experience is that you should let the database do the processing. It will be much faster than retrieving all data from the db first and have some code to join and filter the results.
The hard part here is to document your queries, so someone else (or even you) will understand what is going on if you look at them after some time. I found that most databases allow comments in SQL. Sometimes between /* comment */ tags and sometimes commenting a line with -- comment.
A documented query can look like this
select name, dateofbirth from (
-- The next subquery will retrieve ....
select ....
) SR /* SR SubResults */
....
@Nivas is generally correct.
These are pretty common patterns
Division of labour - the DBAs have to return all the data the business need, but they only have a database to work with. The developers could work with the DBAs to do it better but departmental responsbilities make it nearly impossible. So SQL to do morethan retrieve data is used.
lack of smaller functions. Could the massive query be broken down into smaller stages, using working tables? Yes, but I have known environments where a new table needs reams of approavals - a heavy Query is just written
So, in general, getting data out of the database - thats down to the database. But if a SQL query is too long its going to be hard for the RDBMS to optimise, and it probably means the query is spanning data, business logic and even presentation in one go.
I would suggest a saner approach is usually to seperate out the "get me the data" portions into stored procedures or other controllable queries that populate staging tables. Then the business logic can be written into a scripting language sitting above and controlling the stored procedures. And presentation is left elsewhere. In essence solutions like cognos try to do this anyway.
But if you are looking at an ERP in production, the constraints and the solutions above probably already exist - are you talking to the right people?
One of my app (B) use tempdb to split complex queries to batch. Another app (B) use complex queries without that.
App A is more effective for large DB.
But to take a look to your situation you can exeute DMV script like that:
-- Get top total worker time queries for entire instance (Query 38) (Top Worker Time Queries)
SELECT TOP(50) DB_NAME(t.[dbid]) AS [Database Name], t.[text] AS [Query Text],
qs.total_worker_time AS [Total Worker Time], qs.min_worker_time AS [Min Worker Time],
qs.total_worker_time/qs.execution_count AS [Avg Worker Time],
qs.max_worker_time AS [Max Worker Time], qs.execution_count AS [Execution Count],
qs.total_elapsed_time/qs.execution_count AS [Avg Elapsed Time],
qs.total_logical_reads/qs.execution_count AS [Avg Logical Reads],
qs.total_physical_reads/qs.execution_count AS [Avg Physical Reads],
qp.query_plan AS [Query Plan], qs.creation_time AS [Creation Time]
FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
ORDER BY qs.total_worker_time DESC OPTION (RECOMPILE);
then you can open query plan for heavies query and try to search something like:
StatementOptmEarlyAbortReason="TimeOut" or StatementOptmEarlyAbortReason="MemoryLimitExceeded"
These facts tell you to split your complex query into batch+tempdb
PS. It works for good queries without index/table scans, missing indexes and so on.
精彩评论