I am using python/django as for programming language/framework. What i need to know is totally about postgresql and indexing...
For those who uses django probably know Content Type and Django Admin Log. But shortly, Admin log is logging user action. I also use it for logging all actions performed within the site. So it has 1.000.000+ records. I use sql queries to filter results, Thats ok up to here...
Problem is, i use two fields to select data from diffrent tables. One of them is, content type, which stores the related database table info and the field is indexed...
Other field is, object id, which stores id of the related开发者_如何学编程 object, the field type is varchar and field is not indexed...
Examle of usage is :
Select from django_admin_log where content_type_id=15 and object_id="12343545";
Since content_type_id=15 points my blog_texts table and the id of the related object is 12343545, i can easily get related data...
But object_id is not indexed, and table have 1.000.000+ records, a query like i write above takes a lot of execution time.
What will be the benefits and drawbacks of using index in object_id. Will the benefits be greated than drawbacks or not?
UPDATE: I have no updates on admin log table. It just logs each of all user actions... 40.000-45.000 records are inserted to the table each day. And system is really busy during 2/3 of the day, about 15-16 hours (morning to evening). So 45.000 records are inserted during 8.00am to 11.00pm...
So at this point of view, will it cause too much database overwight if i create indexes?
UPDATE 2: One more question. Another table with 2.000.000+ records with a boolean field. Field is something like "wwill it be displayed", and it is used with other filter criteria. Is it logical to create an index for a such boolean field.
Second conditin is, indexing a boolean and a datetime fields together in a table with 1.000.000 records...
Using index for these two condition is a good idea or not?
Just for clarification....
For this particular SQL you should use one index that includes both columns (content_type_id
and object_id
)--concatenated index.
In that case you can drop the existing index that is on content_type_id
only--the new index will be able to server queries that only filter for content_type_id
as well as queries where both columns are filtered.
Two indexes--the existing one and a new one on object_id
only--will probably not give the best result for this query.
EDIT: if you extend the existing index by the object_id
column, the performance penalty for the insert will be negligible but your select will work much faster.
EDIT 2: if you have statements like this
WHERE bool = true
and other one like that:
WHERE bool = true AND date > something
I would suggest an concatenated index on BOOL first then DATE.
columns that are used with inequality comparisons should be moved towards the end of the index.
However, depending on your data it might make sense NOT to index the BOOL field. e.g. if 95% of all rows have true, the above statements would not filter very much. In that case, an index can potentially decrease performance for the select statement. However, a good optimizer will just ignore the index. still the insert/update/delete cost would remain.
Read more about concatenated indexes in my free eBook.
What will be the benefits and drawbacks of using index in object_id.
Benefits will be faster queries. Drawbacks will be slower inserts.
Will the benefits be greater than drawbacks or not?
I dare say that yes, they will, since the increase in Google Ads revenue from faster page loads will certainly be worth the extra second you'll have to wait while posting the blog entry.
Update:
From your post I can conclude that the table receives about 4 records per second at peak hours.
4 records a second on a 1M
table is not a problem for any decent server, so I believe you can safely create the index.
It will be better to create a composite index on (object_id, content_type_id)
.
精彩评论