开发者

Naming conventions for non-normalized fields

开发者 https://www.devze.com 2023-02-19 15:54 出处:网络
Is it a common practice to use special naming conventions when you\'re denormalizing for performance?

Is it a common practice to use special naming conventions when you're denormalizing for performance?

For example, let's say you have a customer table with a date_of_birth column. You might then add an age_range column because sometimes it's too expensive to calculate that customer's age range on the fly. However, one could see this getting messy because it's not abundantly clear which val开发者_StackOverflow中文版ues are authoritative and which ones are derived. So maybe you'd want to name that column denormalized_age_range or something.

Is it common to use a special naming convention for these columns? If so, are there established naming conventions for such a thing?

Edit: Here's another, more realistic example of when denormalization would give you a performance gain. This is from a real-life case. Let's say you're writing an app that keeps track of college courses at all the colleges in the US. You need to be able to show, for each degree, how many credits you graduate with if you choose that degree. A degree's credit count is actually ridiculously complicated to calculate and it takes a long time (more than one second per degree). If you have a report comparing 100 different degrees, it wouldn't be practical to calculate the credit count on the fly. What I did when I came across this problem was I added a credit_count column to our degree table and calculated each degree's credit count up front. This solved the performance problem.


I've seen column names use the word "derived" when they represent that kind of value. I haven't seen a generic style guide for other kinds of denormalization.

I should add that in every case I've seen, the derived value is always considered secondary to the data from which it is derived.


In some programming languages, eg Java, variable names with the _ prefix are used for private methods or variables. Private means it should not be modified/invoked by any methods outside the class.

I wonder if this convention can be borrowed in naming derived database columns.

In Postgres, column names can start with _, eg _average_product_price.

It can convey the meaning that you can read this column, but don't write it because it's derived.


I'm in the same situation right now, designing a database schema that can benefit from denormalisation of central values. For example, table partitioning requires the partition key to exist in the table. So even if the data can be retrieved by following some levels of foreign keys, I need the data right there in most tables.

Maybe the suffix "copy" could be used for this. Because after all, the data is just a copy of some other location where the primary data is stored. Since it's a word, it can work with all naming conventions, like .NET PascalCase which can be mapped to SQL snake_case, e. g. CompanyIdCopy and company_id_copy. And it's a short word so you don't have to write too much. And it's not an abbreviation so you don't have to spell it or ever wonder what it means. ;-)

I could also think of the suffix "cache" or "cached" but a cache is usually filled on demand and invalidated some time later, which is usually not the case with denormalised columns. That data should exist at all times and never be outdated or missing.

The word "derived" is just a bit longer than "copy". I know that one special DBMS, an expensive one, has a column name limit of 30 characters, so that could be an issue.


If all of the values required to derive the calculation are in the table already, then it is extremely unlikely that you will gain any meaningful (or even measurable) performance benefit by persisting these calculated values.

I realize this doesn't answer the question directly, but it would seem that the premise is faulty: if such conditions existed for the question to apply, then you don't need to denormalize it to begin with.

0

精彩评论

暂无评论...
验证码 换一张
取 消