Here is a thought brought on by this posting at the Dizwell forum.
The question seems to me to boil down to this essential issue: “Can a large memory allocation compensate for poor I/O performance in a data warehouse?” It also seems to me that the answer is a partly-qualified “No”, for the following reasons.
Firstly, parallel query reads blocks directly from disk, and does not look for them in the buffer cache. Serial reads would therefore be required to populate the buffer cache, and a serial process with wait times of between 180 and 250ms and multiblock reads of 1MB (to take a generous number) is going to read around 5-6MB/sec — at that rate it would take in the order of two and a half hours to populate the buffer cache.
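As a back-of-envelope check, taking the middle of that range (about 5.5MB/sec) and the 50GB cache mentioned below:

```sql
-- Rough arithmetic only: time to read 50GB serially at ~5.5MB/sec
select round(50 * 1024 / 5.5 / 3600, 1) as hours_to_populate_cache
from   dual;
-- => 2.6
```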
Secondly, there is little hope of caching a respectable portion of the fact tables of a 1.4 TB data warehouse: a 50GB buffer cache represents only around three or four percent of the total data set.
So if a disk upgrade is not feasible, what can be done to provide acceptable performance? My advice in this situation would be to throw all of the team’s efforts into the creation of summary tables. In fact I’d suggest that this is a critical issue — there is no way that the system as described is going to be usable without the enormous reduction in required disk bandwidth that summary tables represent.
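To make that concrete, a summary at (say) the product/month grain might look something like the following sketch. The table and column names are invented for the example, and with query rewrite enabled the existing reports can benefit without being changed.

```sql
-- Hypothetical monthly sales summary over an equally hypothetical fact table.
create materialized view sales_month_mv
  build immediate
  refresh force on demand
  enable query rewrite
as
select prod_id,
       trunc(sale_date, 'MM')  as sale_month,
       sum(sale_amount)        as sale_amount,
       count(*)                as row_count
from   sales_fact
group  by prod_id, trunc(sale_date, 'MM');
```

Depending on the grain chosen, a summary like this can be orders of magnitude smaller than the fact table it is built on, which is where the reduction in required disk bandwidth comes from.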
Now I also believe that where summary tables represent a critical performance enhancement, what is required is an entire structure for their creation and maintenance, comprising the following elements:
- Identify opportunities for summarisation
- Define summary queries
- Create the summary and its supporting maintenance code
- Monitor the summary usage
There are some automated tools in Oracle for handling the first two stages, for example the DBMS_OLAP.Recommend_MView_Strategy procedure. However I haven’t used them, preferring instead to use reporting metrics to find reports that take a long time to execute or are executed frequently (the metric of report executions multiplied by report duration is a handy one), and then to look for opportunities to provide summary definitions that improve their performance.
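The sort of query I have in mind runs against whatever execution log the reporting tool keeps; the table and column names below are just placeholders for that.

```sql
-- report_execution_log stands in for the reporting tool's own metrics table.
-- Total elapsed time (executions x average duration) ranks the reports where
-- a summary table would pay off most.
select report_name,
       count(*)                    as executions,
       round(avg(elapsed_seconds)) as avg_seconds,
       round(sum(elapsed_seconds)) as total_seconds
from   report_execution_log
where  run_date > add_months(sysdate, -3)
group  by report_name
order  by total_seconds desc;
```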
For efficient summary maintenance I default to hand-crafted approaches, with the SQL statements that handle the loading of new data stored in CLOB columns of an MV metadata table, rather than relying on the standard materialized view fast refresh mechanisms. There are a few articles on the blog about this already.
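A minimal sketch of the sort of structure I mean (the names and columns are illustrative only):

```sql
-- One row per summary, with the refresh/load SQL held as text.
create table mv_metadata
( summary_name  varchar2(30) primary key,
  refresh_order number       not null,
  refresh_sql   clob         not null,
  enabled_flag  varchar2(1)  default 'Y' not null
);

-- Run after each batch of new fact data has been loaded.
begin
  for r in ( select summary_name, refresh_sql
             from   mv_metadata
             where  enabled_flag = 'Y'
             order  by refresh_order )
  loop
    -- execute immediate accepts a CLOB from 11g onwards; on earlier
    -- versions the statement would need to go through dbms_sql instead.
    execute immediate r.refresh_sql;
    commit;
  end loop;
end;
/
```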
In terms of monitoring the summaries I like to use segment statistics. There is generally a rough correlation between the logical I/Os required against a summary table and those that would have been required against the master table(s), and an “effectiveness ratio” can be estimated by comparing the two: just use a hint to prevent rewrite against the summary table for a sample of rewritable queries and see how much work is being saved. Where the disk subsystem is known to be performing very poorly, the absolute number of physical reads saved (cached summary data versus uncached fact table data) can be the more significant measure. Another consideration is that a given summary table may provide extremely high savings in comparison to a non-rewritten execution but only marginal savings in comparison to a rewrite against a different summary, in which case the value of maintaining the summary might be questionable.
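In practice the comparison looks something like the following (object names are again illustrative): pull the work done against the summary segment from v$segment_statistics, then run a sample report query with rewrite suppressed and compare the buffer gets and disk reads.

```sql
-- Work recorded against the summary segment since instance startup.
select object_name, statistic_name, value
from   v$segment_statistics
where  owner = 'DW'
and    object_name = 'SALES_MONTH_MV'
and    statistic_name in ('logical reads', 'physical reads');

-- The same report query with rewrite suppressed, to see what it would
-- have cost against the fact table.
select /*+ NOREWRITE */
       prod_id,
       sum(sale_amount)
from   sales_fact
where  sale_date >= date '2006-01-01'
group  by prod_id;
```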
So, the qualification to the original answer of “No” is due to the much smaller size of the summary tables, which makes it more feasible to avoid using parallel query against them, and thus there are benefits to be gained from a larger buffer cache. However, as a guide to right-sizing the cache I would be relying much more on Oracle’s built-in buffer cache advice functionality.
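That advice is exposed through v$db_cache_advice, which estimates the physical read activity at a range of candidate cache sizes, along these lines:

```sql
-- Estimated physical reads at each candidate cache size
-- (size_factor = 1 is the current setting).
select size_for_estimate as cache_mb,
       size_factor,
       estd_physical_read_factor,
       estd_physical_reads
from   v$db_cache_advice
where  name = 'DEFAULT'
order  by size_for_estimate;
```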