As part of our load procedure we have various packaged Oracle procedures called by Informatica, in order to perform such tasks as:
- Analyzing objects
- Moving tables
- Rebuilding indexes
When a patch from 126.96.36.199 to 188.8.131.52 was installed, something deep within the bowels of the package went *twang* and the jobs that run the analyze statements started to fail. Or in other words, stopped to work. Or … never mind. It broke.
The problem was a Method_Opt clause of “FOR TABLE”. What that? Well, it’s not in the documentation, but in 184.108.40.206 and 10.1.0.2 it appears to be equivalent to “FOR ALL COLUMNS SIZE AUTO”. I’m not sure what we’re doing using it, but I’m sure that it seemed like a good idea at the time. Anyway, it’s ouda here, and while I’m sniffing around the code I’m going to be running some 10046 traces to see what’s really going on with DBMS_STATS and some of it’s options.
For one thing I want to get rid of collecting histograms for SKEWONLY columns, and just go with all of them. Small-scale tests suggest that the optimizer gets much better statistics to work with when all indexed columns on the fact table are analyzed, and I suspect that the amount of work involved in detecting column value skew is not much different that that involved in collecting the statistics themselves.
Another issue is parallelism. Since a parallelised full table scan (for reasonably large tables anyway) uses direct i/o, there is no caching of the table’s blocks. Fine for tables large enough to flood the block buffers, because physical reads would just keep occuring anyway, but where a table is around three-quarters the size of the block buffers I’m going to experiment with non-parallel statistics gathering, and hopefully this will reduce physical i/o during “the gathering”.
Actually I’ve been doing a smiliar thing already, by setting parallelism on tables based on what proportion of the block buffers their total size, or in some cases their average partition size, represents. The tweaking of the DBMS_STATS parallelism just represents a modification of that principle to reflect the multiple back-to-back reads of the analyzed segment. Maybe that’s another posting though.
Anyhoo, more later.