I was asked in an email this morning about how to improve materialized view refresh performance, and it’s something that appears in Google hits for the blog quite frequently. So I thought I’d write a quick-and-dirty posting to summarise some of the research (and the resulting practical applications) that I’ve been fiddling with recently.
Here are some bullet points:
- The materialized view fast refresh mechanism is a one-size-fits-all solution, and is probably not efficient for 99% of summary table maintenance operations.
- The join of the aggregated change data to the MV is function-based: the columns of both relations are wrapped in the Sys_Op_Map_NonNull() function, which allows “null = null” joins. I think it is extremely unlikely that anyone has nullable attribute columns in their fact or summary tables, so this join (and the composite function-based index required to support it) is a waste of resources.
- Because of the nature of that join, it seems extremely unlikely that partition pruning of the summary table can take place.
- The join mechanism promotes nested loop joins, where a hash join is probably more efficient (that’s technically an outer join in the merge, of course).
- The refresh mechanism assumes that a merge will be required, when sometimes an insert is not only possible but is very much more efficient.
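To make the first few points concrete, here is a sketch of the kind of statement the fast refresh mechanism generates. All table and column names are invented for illustration, and the real generated SQL is considerably more involved:

```sql
-- Hypothetical sketch of a fast-refresh MERGE: aggregated change data
-- from the MV log is merged into the summary via SYS_OP_MAP_NONNULL(),
-- so the join predicate is function-based rather than a plain equality.
MERGE INTO sales_mv mv
USING (SELECT product_id,
              SUM(amount) AS amount_sum,
              COUNT(*)    AS cnt
       FROM   mlog$_sales                -- MV log capturing the changes
       GROUP  BY product_id) chg
ON (SYS_OP_MAP_NONNULL(mv.product_id) = SYS_OP_MAP_NONNULL(chg.product_id))
WHEN MATCHED THEN UPDATE
  SET mv.amount_sum = mv.amount_sum + chg.amount_sum,
      mv.cnt        = mv.cnt + chg.cnt
WHEN NOT MATCHED THEN INSERT (product_id, amount_sum, cnt)
  VALUES (chg.product_id, chg.amount_sum, chg.cnt);
```

If product_id is declared NOT NULL on both sides, a plain equality join (supported by a plain index) does the same job; and if a load only ever introduces new key values, a simple INSERT … SELECT beats the MERGE entirely.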
If performance (and robustness, IMHO) is an issue for you, then I would advise the following:
- Use materialized views only for enabling query rewrite. This means creating them on a prebuilt summary table, and unless you are loading the summary by partition exchange you have to drop the MV, refresh the summary table, and recreate the MV. In 10g it is much easier to use the DBMS_Advanced_Rewrite package instead of MVs.
- Write your own refresh code, based on the usual principles of writing good SQL. If you don’t need a merge then don’t use it. If you don’t need to join to dimension tables to get higher attributes then don’t do it.
- Leverage lower levels of aggregation to produce higher ones. For a series of hierarchical summaries, building each level from the one below can be extremely beneficial.
- Consider storing the refresh and MV definition SQL in CLOB columns of a summary management table, so they can be tuned and edited without needing to open up package code to do so.
- Consider using a complete refresh, either through MV’s or manually, for higher aggregation levels, particularly when you can reference another summary table to do so.
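As a sketch of a couple of these points together (all object names here are hypothetical, and the DBMS_Advanced_Rewrite route is 10g only): a hand-written refresh that rolls a daily summary up into a monthly one needs no MERGE, no MV log, and no join back to the fact table, and the summary can still serve query rewrite without an MV object at all.

```sql
-- Hand-rolled refresh: rebuild the current month of a monthly summary
-- from the daily summary beneath it. Names are invented for illustration.
DELETE FROM sales_month_sum
WHERE  mth >= TRUNC(SYSDATE, 'MM');

INSERT /*+ APPEND */ INTO sales_month_sum (mth, product_id, amount_sum)
SELECT TRUNC(day, 'MM'), product_id, SUM(amount_sum)
FROM   sales_day_sum
WHERE  day >= TRUNC(SYSDATE, 'MM')       -- refresh the current month only
GROUP  BY TRUNC(day, 'MM'), product_id;

-- 10g alternative to an MV: declare to the optimizer that the summary
-- table answers the aggregate query, with no materialized view involved.
BEGIN
  DBMS_ADVANCED_REWRITE.DECLARE_REWRITE_EQUIVALENCE(
    name             => 'sales_month_rw',
    source_stmt      => 'SELECT TRUNC(day, ''MM''), product_id, SUM(amount)
                         FROM sales GROUP BY TRUNC(day, ''MM''), product_id',
    destination_stmt => 'SELECT mth, product_id, amount_sum
                         FROM sales_month_sum',
    validate         => FALSE,
    rewrite_mode     => 'GENERAL');
END;
/
```

Because the refresh SQL is just an ordinary delete-plus-insert, it can be tuned, parallelised, or swapped for a partition exchange without touching any MV machinery.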
My practical experience of taking this path delivered a reduction in the refresh time of a set of seven materialized views based on a single fact table, from a couple of hours down to six minutes. The publication of this is an example of what my client’s technical lead calls “polishing the crown” :D
Anyway, this is all a sort-of abstract from a whitepaper that I’m working on right now, which will include all sorts of proofs and demonstrations of the above, plus some process and risk management advice, but it doesn’t look like it’ll be ready very soon. Feel free to comment or ask questions on the points raised above though — I don’t want anyone to be kept in suspenders till the paper is finished.