I need database help once again, and yes, efficiency matters A LOT.
Let's say I have a table where Id is the primary key:
Id  Activity  Time
1   Lunch     2:00
2   Lunch     1:00
4   Lunch     4:00
5   Meeting   1:00
6   Meeting   12:00
7   Meeting   3:00
8   break     1:00
I would like the query to return every activity exactly once, with its latest time.
So the answer would be:
Id  Activity  Time
4   Lunch     4:00
7   Meeting   3:00
8   break     1:00
I have done it with a nested select that returns the whole table and then applies select first_value(table) over (partition by activity order by time desc).
But I am looking for a more efficient way to do this.
Thank you.
Yes,
select activity, max(time) from mytable group by activity
would work perfectly, but it takes far too long to retrieve the data. Is there any way that is MUCH faster?
Re-writing simple queries to make them faster is almost never the answer. The question you want to ask is "what other changes can I make to the database that will make this query faster."
As others have suggested, you can always create a materialized view. However, since this is an aggregate, fast refresh may be restricted or unavailable, depending on your database. That means you'll probably have to decide if letting the result get out-of-sync periodically is acceptable.
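To make the out-of-sync trade-off concrete, here is a minimal sketch in Python using SQLite. SQLite has no materialized views, so a plain summary table rebuilt by hand stands in for one. The table name latest_activity, the helper functions, and the 24-hour zero-padded times are all assumptions made for illustration, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, activity TEXT, time TEXT);
    -- Assumption: times stored as zero-padded 24h text so MAX() compares correctly.
    INSERT INTO mytable VALUES
        (1, 'Lunch', '14:00'), (2, 'Lunch', '13:00'), (4, 'Lunch', '16:00'),
        (5, 'Meeting', '13:00'), (6, 'Meeting', '12:00'), (7, 'Meeting', '15:00'),
        (8, 'break', '13:00');
    -- Hypothetical summary table standing in for a materialized view.
    CREATE TABLE latest_activity (activity TEXT PRIMARY KEY, latest_time TEXT);
""")

def refresh_latest(conn):
    """Rebuild the summary from scratch (a complete refresh)."""
    with conn:  # run both statements in one transaction
        conn.execute("DELETE FROM latest_activity")
        conn.execute(
            "INSERT INTO latest_activity "
            "SELECT activity, MAX(time) FROM mytable GROUP BY activity"
        )

def read_latest(conn):
    """Cheap read: no aggregation happens at query time."""
    return conn.execute(
        "SELECT activity, latest_time FROM latest_activity ORDER BY activity"
    ).fetchall()

refresh_latest(conn)
before = read_latest(conn)

# A new row arrives; the summary does not see it until the next refresh.
conn.execute("INSERT INTO mytable VALUES (9, 'break', '17:00')")
stale = read_latest(conn)

refresh_latest(conn)
fresh = read_latest(conn)
```

Between the insert and the second refresh, readers still see the old answer for break. That window is the staleness you would have to accept.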
Another possibility is adding an index on activity and time, in that order, which will effectively pre-sort the data and can allow the optimizer to jump between groups rather than scanning each row. Leading the index with ID would not help here: ID is unique, so the entries would be ordered by ID instead of being grouped by activity.
Response to comments: you can create the index with syntax like create index ak_activities_activ_time on mytable (activity, time);
To understand why this would help, you need to first understand some database fundamentals.
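One way to check whether such an index is actually picked up is to compare the query plan before and after creating it. The sketch below does this in Python with SQLite, which is only a stand-in: your database will have its own EXPLAIN syntax and its own optimizer behaviour. The index name and the 24-hour zero-padded times are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, activity TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        # Assumption: zero-padded 24h text so MAX() orders times correctly.
        (1, "Lunch", "14:00"), (2, "Lunch", "13:00"), (4, "Lunch", "16:00"),
        (5, "Meeting", "13:00"), (6, "Meeting", "12:00"), (7, "Meeting", "15:00"),
        (8, "break", "13:00"),
    ],
)

QUERY = "SELECT activity, MAX(time) FROM mytable GROUP BY activity"

def plan(conn):
    """Return the plan as one string; the last column of each row is the detail text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(conn)

# Grouping column first, aggregated column second.
conn.execute("CREATE INDEX ak_activities_activ_time ON mytable (activity, time)")
plan_after = plan(conn)

result = conn.execute(QUERY + " ORDER BY activity").fetchall()

print("before:", plan_before)
print("after: ", plan_after)
print(result)
```

The exact plan wording differs between SQLite versions, so the sketch prints the plans instead of assuming their text. What to look for is whether the index name shows up in the second plan.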
By default, databases store table data in no particular order. If you query an unindexed table for a specific value, the database literally has to look at every row in the table to see if it contains that value. An aggregate query has to do a little more work: it has to sort the data into groups, then apply the aggregate function to the appropriate column(s) to find the answer you're looking for.
Indexes improve this by creating a table-like object behind the scenes where it keeps an ordered set of every unique value from the specified column, along with the address of the rows that value is found in. Once you have that index, a search for a value in that column can be looked up in the index without looking at every row. Since the values are ordered, the database can use search logic rather than looking at every value.
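As a rough illustration of that idea, the following Python sketch models an index as a sorted list of (value, row id) pairs and compares a full scan with an ordered lookup. This is a toy model only: real databases typically use B-tree structures on disk, and the names rows, activity_index, full_scan and index_lookup are made up for the example.

```python
from bisect import bisect_left

# The "table": row id -> (activity, time), stored in no useful order.
rows = {
    1: ("Lunch", "14:00"), 2: ("Lunch", "13:00"), 4: ("Lunch", "16:00"),
    5: ("Meeting", "13:00"), 6: ("Meeting", "12:00"), 7: ("Meeting", "15:00"),
    8: ("break", "13:00"),
}

# The "index" on activity: ordered (value, row id) pairs.
activity_index = sorted((activity, rowid) for rowid, (activity, _) in rows.items())

def full_scan(value):
    """Without an index: inspect every row."""
    return [rowid for rowid, (activity, _) in rows.items() if activity == value]

def index_lookup(value):
    """With an index: binary-search to the first match, then read neighbours."""
    # (value,) sorts before every (value, rowid), so this finds the first entry.
    pos = bisect_left(activity_index, (value,))
    found = []
    while pos < len(activity_index) and activity_index[pos][0] == value:
        found.append(activity_index[pos][1])
        pos += 1
    return found

print(full_scan("Meeting"))     # checks all 7 rows
print(index_lookup("Meeting"))  # binary search, then 3 adjacent entries
```

Both calls return the same row ids; the difference is how many entries each one has to examine to get there.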
When you have multiple columns in an index, it creates a tree of values. Each value in the first column listed is only found in the index once. Each value in the second column appears once for every value in the first column that it is paired with. This pattern continues for as many columns as you list.
This helps when you're performing an aggregate, because it removes the need for ordering and grouping: that's already done by the index. It can further help when you're looking for minimums or maximums because, by definition, that's the first or last entry in that branch of the index. Please note that this is only true if all of the grouping columns are present in the index, and in practice they need to be its leading columns.
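The "last entry in the branch" point can be sketched with a toy composite index in Python: a sorted list of (activity, time, row id) tuples. Because entries are already grouped by activity and ordered by time within each group, the latest time per activity is simply the last entry before the next group starts. The data uses 24-hour zero-padded times, which is an assumption made for the example, and the function name is hypothetical.

```python
from bisect import bisect_right

# Toy composite "index" on (activity, time), with the row id as payload.
index = sorted([
    ("Lunch", "14:00", 1), ("Lunch", "13:00", 2), ("Lunch", "16:00", 4),
    ("Meeting", "13:00", 5), ("Meeting", "12:00", 6), ("Meeting", "15:00", 7),
    ("break", "13:00", 8),
])
# First-column keys only, used to binary-search for group boundaries.
keys = [entry[0] for entry in index]

def latest_per_activity():
    """Visit one entry per group instead of every row."""
    results = []
    pos = 0
    while pos < len(index):
        # Jump to the position just past the last entry of the current group.
        end = bisect_right(keys, keys[pos], pos)
        activity, time, rowid = index[end - 1]  # last entry = max(time)
        results.append((rowid, activity, time))
        pos = end
    return results

print(latest_per_activity())
```

No sorting or grouping happens at query time here; the ordering work was paid for when the index was built.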
So, why wouldn't you index everything? The answer is that indexes are a trade-off. A well-designed index will speed up some queries, but every index will take up disk space and slow down inserts and updates (the index will have to be modified along with the table). Usually this isn't too significant, but creating too many indexes on a table will cause noticeable performance issues.
Now that I've explained some basic database concepts, I have a recommendation: hire someone that knows what they're doing to work on your database. Tuning databases is an entirely different skill set than programming and asking an application programmer to do that work is usually a waste of time and money. Even if it's just a consultant that can come in to work on specific problems, having someone available that can access your database and understands the theory behind it will be far more productive than feeling your way in the dark.
select activity, max(time) from mytable group by activity
select max(id), activity, max(time) from yourTable group by activity
This is enough if your table is organized as you have shown. However, if your ids and times do not increase together, max(id) and max(time) can come from different rows and the result will be wrong.
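Here is a small sketch of that failure case and one possible way around it, written in Python with SQLite as a stand-in database. Row 9 is an invented late insert with an early time, and the times are 24-hour zero-padded text; both are assumptions for illustration. The alternative query joins the table back to its own per-activity maximum, which is one common pattern and not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, activity TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "Lunch", "14:00"), (2, "Lunch", "13:00"), (4, "Lunch", "16:00"),
        (5, "Meeting", "13:00"), (6, "Meeting", "12:00"), (7, "Meeting", "15:00"),
        (8, "break", "13:00"),
        (9, "Lunch", "11:00"),  # hypothetical row: highest id, but not the latest time
    ],
)

# max(id) and max(time) are computed independently, so they can mix rows.
naive = conn.execute(
    "SELECT MAX(id), activity, MAX(time) FROM mytable "
    "GROUP BY activity ORDER BY activity"
).fetchall()

# Find the latest time per activity, then fetch the row(s) that hold it.
correct = conn.execute("""
    SELECT t.id, t.activity, t.time
    FROM mytable AS t
    JOIN (SELECT activity, MAX(time) AS latest
          FROM mytable GROUP BY activity) AS m
      ON m.activity = t.activity AND m.latest = t.time
    ORDER BY t.activity
""").fetchall()

print(naive)    # Lunch pairs id 9 with 16:00, which is not a real row
print(correct)  # Lunch correctly reports id 4 with 16:00
```

Note that the join returns more than one row for an activity if two rows share the same latest time, so you may need a tie-breaker depending on your data.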