More on Partition Key Statistics, and an Oddly Harmful Constraint

As a follow-up to my previous post on the importance of gathering partition-level statistics on partition key columns, it occurred to me that if a table is list partitioned with a single value per partition then Oracle theoretically does not need statistics on that column to infer that only a single value is present.
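
To make that concrete, here is a minimal sketch of what I mean by "a single value per partition" (the table and column names are illustrative only, not taken from the test script):

    -- each partition's VALUES clause names exactly one MONTH_OF_SALE,
    -- so the dictionary already records which single value a partition can hold
    create table sales_fact_list
    (
      month_of_sale  date          not null,
      product_id     number        not null,
      sale_amount    number(10,2)  not null
    )
    partition by list (month_of_sale)
    (
      partition p200601 values (to_date('2006-01-01','YYYY-MM-DD')),
      partition p200602 values (to_date('2006-02-01','YYYY-MM-DD')),
      partition p200603 values (to_date('2006-03-01','YYYY-MM-DD'))
    );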

Also, I didn’t give any scripts to demonstrate the benefit of partition-level column statistics, so I thought I’d combine the two matters into a single demonstration script.

In the script I create two dimension tables and a range-partitioned fact table, which is populated with 12,000 rows per partition. The fact table is analyzed, excluding the collection of statistics on the partitioning key, MONTH_OF_SALE.
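
The statistics gathering looks something like the following sketch (the names and parameter choices are assumptions, not the actual script), with METHOD_OPT listing every column except the partition key:

    begin
      dbms_stats.gather_table_stats(
        ownname     => user,
        tabname     => 'SALES_FACT',
        granularity => 'ALL',
        -- column statistics only on the non-key columns; MONTH_OF_SALE is omitted
        method_opt  => 'for columns product_id size 1, sale_amount size 1',
        cascade     => true
      );
    end;
    /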

A query is executed to select all rows from a single partition of the fact table, then column statistics are gathered for the table, and the same query re-executed.
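
The test query and the follow-up statistics gathering are along these lines (again a sketch with assumed names, not the script itself):

    -- select every row in one month's partition, filtering through the dimension
    select f.*
    from   sales_fact f,
           month_dim  d
    where  f.month_of_sale = d.month_of_sale
    and    d.month_of_sale = to_date('2006-03-01','YYYY-MM-DD');

    -- then column statistics are gathered for all columns, including the
    -- partition key, and the query above is re-executed
    begin
      dbms_stats.gather_table_stats(
        ownname     => user,
        tabname     => 'SALES_FACT',
        granularity => 'ALL',
        method_opt  => 'for all columns size 1',
        cascade     => true
      );
    end;
    /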

This is repeated for the same table structure and data except that list partitioning is used instead of range partitioning.

Here is the result on 10.2.0.1.0, in which the absence of partition key column statistics leads to an underestimation of the cardinality by a factor of 100, whether the partitioning mechanism is list or range.

So I conclude that using list partitioning has no effect on the need to gather statistics for partition key columns.

To be fair though, this is a rather special case, as list partitioning does not have to be confined to a single value per partition, and for multiple values it would obviously be necessary to collect partition-level column statistics. Furthermore, list partitioning is currently much more restrictive than range partitioning, which allows multicolumn partition keys and composite partitioning, so if there were an optimisation in place for this scenario it would be of less importance.

Finally, on this topic, it is trivial to devise a range partitioning scheme in which each partition is logically confined to a single possible value, but it seems very doubtful that the optimizer would recognise this as a special case. I didn’t test it.
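
For example (a contrived illustration of my own, not something from the test script), range boundaries one day apart over a month-truncated date leave each partition after the first with only one value it could ever contain:

    -- assuming month_of_sale is always truncated to the first of the month,
    -- every partition after the first admits exactly one possible value
    create table sales_fact_rng
    (
      month_of_sale  date          not null,
      sale_amount    number(10,2)
    )
    partition by range (month_of_sale)
    (
      partition p200601 values less than (to_date('2006-01-02','YYYY-MM-DD')),
      partition p200602 values less than (to_date('2006-02-02','YYYY-MM-DD')),
      partition p200603 values less than (to_date('2006-03-02','YYYY-MM-DD'))
    );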

An Oddly Harmful Constraint

Moving along, I thought it would be interesting to see what effect the application of a check constraint might have on this scenario. You’ll note that the month_of_sale attribute is a date in which the day is set to the first of the month. There are, of course, many other representations that can be used (“2006-03”, “March 2006”, “Y2006 M03”, etc.), each of which has its own pros and cons, but what would be the effect of applying a check constraint to this date column to prevent other day numbers from being used? Will Oracle recognise that the presence of the constraint reduces the theoretical range of values that the column can hold, and modify the cardinality in some way?
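
The constraint I have in mind is along these lines (a sketch; the constraint name is made up and the actual script may express it differently):

    -- forbid any value that is not truncated to the first of its month
    alter table sales_fact
      add constraint sales_fact_month_chk
      check (month_of_sale = trunc(month_of_sale, 'MM'));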

I modified the test script by adding appropriate check constraints to the dimension and fact tables, and included an additional SELECT predicating on the fact table month_of_sale instead of the dimension table month_of_sale (just a speculative test, and I wasn’t expecting anything to change).
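
The extra query is simply the partition-key filter applied directly to the fact table, something like this (names assumed as before):

    -- predicate on the fact table's own partition key rather than the dimension
    select f.*
    from   sales_fact f
    where  f.month_of_sale = to_date('2006-03-01','YYYY-MM-DD');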

Here is the new test script, and here is the result showing a decrease in the estimated cardinality of the result.

In some ways this shouldn’t be a surprise. See comments on this previous post for more information on functions affecting estimated cardinality.

However, what I do find to be a little odd and rather disturbing about this effect is that when the optimizer applies the check constraint as one or more additional filter predicates on the query it appears to infer that the query is only going to access a subset of rows in the table — those matching the check constraint. It seems to me that it fails to take into account that the table is constrained to store only such values as meet the check condition, therefore the filter predicate ought at best to be “cardinality neutral” in this case. The optimizer ought to be inferring from this constraint “fewer unique values than you expect”, but in fact it is inferring the opposite.

On the other hand, it correctly infers a reduction in the cardinality if the value of the user-supplied predicate on month_of_sale does not meet the check constraint, as this script and its result demonstrate. It might go a little further and infer zero rows instead of one, and maybe avoid all the unpleasantness of actually accessing the database at all. In this case it does perform the query, though: taking the tablespace offline gives an “ORA-00376: file 9 cannot be read at this time”.
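
For the sake of illustration, a predicate of that kind would look like this (a mid-month date that can never satisfy the check constraint; names assumed):

    -- no row can ever hold a mid-month date, so the estimate drops to a single row,
    -- yet the partition is still physically accessed at run time
    select f.*
    from   sales_fact f
    where  f.month_of_sale = to_date('2006-03-15','YYYY-MM-DD');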


5 thoughts on “More on Partition Key Statistics, and an Oddly Harmful Constraint”

  1. Hello David. I understand that you are a master of table partitioning, so please take my question. The Oracle literature (Oracle 9i data warehousing) glorifies table partitioning a lot as a means of improving performance (for DWH as well as OLTP systems), but I am told by some that there are many hardware constraints on implementing partitioning, for example that a multi-processor machine is needed. I am not clear on the hardware requirements for partitioning tables; please enlighten me.

  2. Well, I wouldn’t say there were that many constraints, certainly not on processors. In DWH we use partitioning mostly for partition pruning, i.e. for accessing parts of a table through multiblock reads rather than through index-based single block reads. Whether that means a higher or a lower processor load is tricky to say: if there are high processor loads they tend to follow from the use of parallelism (which you don’t want to be too aggressive with) and from calculating many metrics through aggregation. Partitioning doesn’t really impact that.

  3. Pingback: Log Buffer #26: a Carnival of the Vanities for DBAs · Steve Karam · The Oracle Alchemist

  4. Pingback: A Fix for Check Constraints That Harm Cardinalities « The Oracle Sponge

  5. Could you please advise me on:
    1) How can we split up the “catchall” to get it back to the regular structure? What is the impact? Does this table have to remain unused while we take the required steps?
    2) How do we maintain this going forward? Since the partitions are built on predictable keys, can we build them into the future and do that periodically?
