Here’s an example of a problem that is probably caused by too much denormalization. In many cases designers try to get fast access to an aggregated number by storing it in a separate table. From my own observation it’s usually a SUM(), but here we have a MAX(). Same principle.
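As a sketch of the sort of design I mean (all table and column names here are hypothetical), the parent table carries a column that merely duplicates an aggregate of its detail table:

```sql
-- Hypothetical schema: max_version duplicates what the detail
-- table already knows, i.e. MAX(version_no) per doc_id.
create table documents (
  doc_id       number primary key,
  max_version  number not null    -- the denormalized aggregate
);

create table document_versions (
  doc_id      number not null references documents,
  version_no  number not null,
  constraint doc_ver_pk primary key (doc_id, version_no)
);
```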
Often an attempt is made to use a trigger to maintain the higher-level table by reading the table on which the trigger acts. Here are some of the issues with that approach.
- Obviously there is a mutating table condition here, which requires complex code to work around. In this case the requirement could actually be met just by querying the aggregate table, of course, to see whether the new version number is higher than the one already stored there.
- It is not at all obvious to a reader of the application code how the higher-level aggregate is maintained, because the code that performs the maintenance is nowhere near the code that modifies the triggered table. Once you have established the practice of using triggers as part of an application, you have instantly made maintenance more difficult.
- To get an accurate value stored in the aggregate table you must serialise the application’s update operations against each key, which limits concurrency.
- Oh, there are just so many more. I don’t like triggers for application code.
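To make the mutating table point concrete, here’s a sketch (all object names hypothetical, continuing the documents/document_versions example) of the naive trigger alongside the variant from the first bullet that reads only the aggregate table:

```sql
-- Naive version: the row trigger reads the table it fires on.
-- Oracle raises ORA-04091 ("table ... is mutating") for this on
-- multi-row DML (single-row INSERT ... VALUES is exempt).
create or replace trigger doc_versions_bad_trg
after insert on document_versions
for each row
begin
  update documents d
     set d.max_version = (select max(v.version_no)
                            from document_versions v  -- the mutating table
                           where v.doc_id = :new.doc_id)
   where d.doc_id = :new.doc_id;
end;
/

-- The variant from the first bullet: touch only the aggregate table,
-- raising the stored value when the new version number exceeds it.
create or replace trigger doc_versions_trg
after insert on document_versions
for each row
begin
  update documents d
     set d.max_version = :new.version_no
   where d.doc_id = :new.doc_id
     and d.max_version < :new.version_no;
end;
/
```

Even the second form still has the serialisation and maintainability problems listed above, of course.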
If the aim is to be able to quickly work out what the aggregate value is for a particular key or set of key values, then scanning the detail records may actually be a fast option. An index range scan is likely to be used (provided that the appropriate indexing is in place, of course), and the “trick” to good performance is to ensure that the detail rows are co-located in the table. And how do we ensure such a thing? Well, typically with a hash cluster. In fact, if the detail table is often subjected to range scans on the same key value (as might be the case with an invoice detail table, or an employee’s pay records, to take two examples), then a hash cluster may be beneficial in many other queries as well.
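Here’s a sketch of the clustered alternative, keeping the same hypothetical names. The hash on doc_id co-locates each document’s version rows, so the aggregate can simply be computed on demand:

```sql
-- Single-table hash cluster: rows sharing a doc_id hash to the
-- same block(s), so reading all versions of one document is
-- typically a single-block visit.
create cluster doc_ver_cluster (doc_id number)
  size 1024          -- bytes expected per key value; size from real data
  single table
  hashkeys 10000;    -- expected number of distinct doc_id values

create table document_versions (
  doc_id      number not null,
  version_no  number not null
)
cluster doc_ver_cluster (doc_id);

-- No summary table, no trigger: just ask the detail table.
select max(version_no)
  from document_versions
 where doc_id = :doc_id;
```

The SIZE and HASHKEYS figures above are placeholders; getting them wrong is one of the main ways a hash cluster goes bad, which is part of the reading the next paragraph points you at.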
You’d be a fool to go taking my word* for this of course, so if this is all jibba-jabba to you then go and take half an hour to read the relevant sections of the Concepts Guide (link in side panel), then go play around with the structure. Pay attention also to the possible negative impacts of such a beast — remember, if hash clusters were 100% advantageous and 0% disadvantageous then every table would have one. Evidently, that is not the case.
* To be fair to me, I should say “don’t take anyone’s word for it”. :)