As a followup to my previous post on the importance of gathering partition-level statistics on partition key columns it occured to me that if a table is list partitioned with a single value per partition then Oracle theoretically does not need statistics on that column to infer that only a single value is present.
Also I didn’t give any scripts to demonstrate the benefit of partition-level column statistics so I thought I’d combine the two matters into a single demonstration script.
In the script I create two dimension tables and a range-partitioned fact table, which is populated with 12,000 row per partition. The fact table is analyzed, excluding the collection of statistics on the partitioning key of MONTH_OF_SALE.
A query is executed to select all rows from a single partition of the fact table, then column statistics are gathered for the table, and the same query re-executed.
This is repeated for the same table structure and data except that list partitioning is used instead of range partitioning.
Here is the result on 10.2.0.1.0, in which the absence of partition key column statistics leads to an underestimation of the cardinality by a factor of 100 whether the partitioning mechanism of list or range .
So I conclude that using list partitioning has no effect on the need to gather statistics for partition key columns.
To be fair though, this is a rather special case as list partitioning does not have to be confined to a single value per partition, and for multiple values it would obviously be necessary to collect partition-level column statistics. Furthermore list partitioning is currently much more restrictive than range partitioning, which allows multicolumn partition keys and composite partitioning, so if there was an optimisation in place for this scenario then it would be of less importance.
Finally, on this topic, it is trivial to devise a range partitioning scheme in which each partition is logically confined to a single possible value but it seems very doubtful that the optimizer would recognise this as a special case. I didn’t test it.
An Oddly Harmful Constraint
Moving along, I thought it would be interesting to see what effect the application of a check constraint might have on this scenario. You’ll note that the month_of_sale attribute is a date in which the day is set to the first of the month. There are, of course, many other representations that can be used, (“2006-03″, “March 2006″, “Y2006 M03″ etc) which each have their own pros and cons, but what would be the effect of applying a check constraint to this date column to prevent other day numbers from being used? Will Oracle recognise that the presence of the constraint reduces the theoretical range of values that the column can hold, and modify the cardinality in some way?
I modified the test script by adding appropriate check constraints to the dimension and fact table and included an additional SELECT: Predicating on the fact table month_of_sale instead of the dimension table month_of_sale — just a speculative test, and not expecting anything to change.
In some ways this shouldn’t be a surprise. See comments on this previous post for more information on functions affecting estimated cardinality.
However, what I do find to be a little odd and rather disturbing about this effect is that when the optimizer applies the check constraint as one or more additional filter predicates on the query it appears to infer that the query is only going to access a subset of rows in the table — those matching the check constraint. It seems to me that it fails to take into account that the table is constrained to store only such values as meet the check condition, therefore the filter predicate ought at best to be “cardinality neutral” in this case. The optimizer ought to be inferring from this constraint “fewer unique values than you expect”, but in fact it is inferring the opposite.
On the other hand it correctly infers a reduction in the cardinality if the value of the user-supplied predicate on the month_of_sale does not meet the check constraint, as this script and it’s result demonstrate. It might go a little further and infer zero rows instead of one, and maybe avoid all the unpleasantness of actually accessing the database at all. In this case it does perform the query, though — taking the tablespace offline gives a “ORA-00376: file 9 cannot be read at this time”.