Partition Pruning Prevented by Functions on Predicates

A rather straightforward issue here, prompted by a question on the forum at DBA Support.

When you place a predicate on a function of a partition key column, the optimizer is prevented from pruning unless there is a check constraint to indicate logical equivalence between the column and the function of the column.

Here is a test script – a minimal sketch along the lines of the demonstration (object names are illustrative):
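
create table demo_part
   (dt   date,
    val  number)
partition by range (dt)
   (partition p1 values less than (to_date('02/01/2007','MM/DD/YYYY')),
    partition p2 values less than (to_date('03/01/2007','MM/DD/YYYY')),
    partition p3 values less than (to_date('04/01/2007','MM/DD/YYYY')))
/

-- midnight date values only, spread across the three partitions
insert into demo_part
select to_date('01/01/2007','MM/DD/YYYY') + mod(rownum,80), rownum
from   dual
connect by level <= 10000
/

commit
/

begin
   dbms_stats.gather_table_stats(ownname => user, tabname => 'DEMO_PART');
end;
/

-- a predicate on a function of the partition key: with no constraint
-- in place, expect PARTITION RANGE ALL (no pruning)
explain plan for
select count(*) from demo_part
where  trunc(dt) = to_date('02/14/2007','MM/DD/YYYY')
/

select * from table(dbms_xplan.display)
/

-- declare the logical equivalence between dt and trunc(dt) ...
alter table demo_part add constraint demo_part_dt_chk
   check (dt = trunc(dt))
/

-- ... and the same query should now show PARTITION RANGE SINGLE
explain plan for
select count(*) from demo_part
where  trunc(dt) = to_date('02/14/2007','MM/DD/YYYY')
/

select * from table(dbms_xplan.display)
/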

The result on 10.2 bears this out: with the check constraint in place the plan prunes down to the single relevant partition, and without it the scan covers every partition.

What I particularly like about this demonstration is the way that it shows both regular check constraints and partition definitions acting together — as Jonathan Lewis demonstrated here the partitioning scheme acts effectively as a series of check constraints on the individual partitions that allow table access to be avoided when it can be deduced that a predicate does not resolve to any partitions at all.


44 thoughts on “Partition Pruning Prevented by Functions on Predicates”

  1. You actually had two ways of defining partition views – one was the check constraint method you’ve mentioned, the other was to include a where clause on every table in the UNION ALL.

    The check constraint strategy is why I make the point that predicate generation through transitive closure is a feature that worked in 7, but got turned off in 8.

  2. Dave, looking back I have seen some slow queries of this form with a customer – they partition by day and all date values are trunc(sales_dt) which makes the use of trunc in the query redundant…

    Is there merit in using a RELY check constraint? It would avoid the need to check the value on data load.
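
    Something like this is what I have in mind (a sketch – the table and constraint names are illustrative, and whether the optimizer will trust an unvalidated RELY constraint for this purpose would need testing):

    alter table sales add constraint sales_dt_midnight
       check (sales_dt = trunc(sales_dt))
       rely disable novalidate
    /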

  3. Jonathan – true, the cardinality of the constraint version is ghastly, but when you get 700-odd partitions there is a strong benefit in just hitting the one you want.

    Of course the ideal is not having functions applied to partition keys – now how to convince users of that?

  4. I was going to wait until I had had another flick through your book and the Hotsos 2006 training day notes before commenting on that :)

    I’m assuming that some generic estimation takes over from measured statistics … there’s really nothing to recommend gratuitous functions on columns, prune-wise or statistics-wise. Also it ought to be demonstrable that Oracle’s too-low estimate of the cardinality of the table’s result set causes it to perform a hash join the wrong way round — by hashing the fact table data instead of the dimension table — with the “right” data set.

    Even without such a demonstration the poor statistics ought to be red flag enough to right-minded folk.

  5. I might get away with the odd cardinality estimate as the tables where I store day level detail are partitioned by day – so I would expect 1 day per partition.

    One of my team is running some query tests against three months of production-volume data to see what happens for single- and multiple-day selects.

  6. If anyone is interested – on a production-volume but limited-date-range (just 112 days) data set on 9.2.0.8, we ran a query to find the total sales value for 1 store out of 200 for a single day. The source table is not indexed.

    If you select on SALES_DT the query takes just under 2 seconds with or without the check constraint. Change to TRUNC(SALES_DT) and the constrained table takes about 3.5 seconds, and without the check constraint over 6 minutes.

  7. Ah, interesting. No partition pruning on the last of those I suppose. Do you have a trace that shows the difference between the 2 sec and 3.5 sec queries — CPU time in particular?

  8. Not today – and I doubt that we would fit that in until the middle of next week. But a wild guess would be that the effort to change the time component of the date from midnight to midnight would account for some of the time.

  9. David,
    On generic estimation: it looks like Oracle is using the same 5% that it uses for col > {unknown} to deal with the calculation of trunc(date_col) > {constant}, hence the cardinality is 1/400 of the number of rows in the table (0.05 * 0.05 = 0.0025 = 1/400, since “between” turns into a combination of “less than” and “greater than”).

  10. Pingback: A weak-end post « Pete-s random notes

  11. This gives me some food for thought…

    I’m trying to optimize a query on a view (well… hope I can make it a partitioned view) based on 3 list-partitioned tables; we have no stats on any of them, and the optimizer is set to CHOOSE.

    The query is like (tab_pkey is the actual partition column of the tables):
    select * from View
    where tab_pkey in (select data from …
    where col=(select max(col1)…
    )
    )

    The thoughts are:
    - we don’t see table partition pruning: maybe this is because of the 5% rule…
    - if I rewrite the query on a “per table” basis, I can prune partitions! *STRANGE* or 5% rule?

    Any idea?

  12. Well firstly I’d make sure that I have statistics, global and partition-level, on those tables.

    Your view, is it a straight UNION ALL of the three list partitioned tables, or are there additional predicates in there?

    How does the estimated cardinality of that subquery look? How many rows is it returning? Have you tried rewriting the subquery as a materialized subquery factoring clause, or as an inline view?
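
    For instance (a sketch – the view and table names are stand-ins for yours, and the MATERIALIZE hint, though undocumented, is commonly used to force the subquery result into a temporary table):

    with keys as (
       select /*+ materialize */ data
       from   key_table
       where  col = (select max(col1) from other_table)
    )
    select *
    from   the_view v
    where  v.tab_pkey in (select data from keys)
    /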

  13. It’s a straight UNION ALL: actually we don’t see view pruning, but by adding a fixed column in the view we can make it happen.

    Cardinalities are completely wrong: main table 25M rows, estimated by the optimizer 2K… we cannot analyze the tables: it’s a diktat… for now.

    I’ll check the materialized subquery factoring clause (where to look? any info?), because the inline view doesn’t “prune”…

    Thank you!

  14. You’re almost certainly suffering from bad diktat syndrome there then. Is there some rationale for that? It really amounts to nothing other than sabotaging the system.

  15. It’s only a problem like: What? 4h for the analyze? NEVER!

    But, after a bit of testing, it seems that:
    if you have a partitioned view based on partitioned tables the optimizer can’t prune…

    My test was:
    create a list-partitioned table with 4 partitions (2 columns: the 1st a number partition key, the 2nd a varchar2(20)), table name PART, values for the partition key: 1, 2, 3, 4

    create a one column normal table, table name PART_KEY.

    Analyze them:

    begin
    sys.dbms_stats.gather_table_stats(ownname=>'MTPN',tabname=>'PART');
    end;

    the query:
    Select * From (
    Select * From PART
    union all
    Select * From PART
    )
    where pkey=(Select p_value From part_key where p_value=3)

    Results in a PARTITION LIST ALL…

    While the “equivalent” query:

    Select * From (
    Select * From PART
    union all
    Select * From PART
    )
    where pkey=3

    resolves in a single partition access…
    Oracle version 9.2.0.5.0

  16. Antonio

    I am a little confused. Are you talking about partitioned tables or partition views? Because I do not follow what is going on with the UNION ALL select. I would have thought that, as written, you would be doubling up on the rows returned as both selects in the union all would return the same data.

    When you use the constant where pkey=3 you could be seeing the predicate being pushed into the in-line view by the CBO, which would explain the single partition access.

  17. >> It’s only a problem like: What? 4h for the analyze? NEVER!

    How about using a sampling percent like 2 or 3, and considering block-based sampling if your data is reasonably homogeneous? I really feel that without statistics that are even vaguely representative you’re in a very difficult situation.
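
    For instance (a sketch using documented dbms_stats parameters, against your MTPN.PART example):

    begin
       dbms_stats.gather_table_stats(
          ownname          => 'MTPN',
          tabname          => 'PART',
          estimate_percent => 2,
          block_sample     => true,
          granularity      => 'ALL',
          cascade          => true);
    end;
    /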

  18. @Peter
    You’re right!
    When I use pkey=3 I see a single partition access.
    The UNION ALL with the same table is only a test!

    The problem is with a view like this:

    Select * from table_1
    union all
    Select * from table_2

    both table_1 and table_2 are list-partitioned.

    So a query like

    Select * From (
    Select * From table_1
    union all
    Select * From table_2
    )
    where pkey=(Select p_value From part_key where p_value=3)

    does not prune TABLE partitions! I find this a bit strange… isn’t it?

    It’s like the optimizer can prune view partitions (it happens when I add a condition on another column…) but then stops; it doesn’t look for table partition elimination.

    Hope it’s (a bit) more clear now

  19. The latter query uses dynamic partition pruning, and there are some restrictions on when that will be used. The restrictions are heavily dependent on good statistics on the fact table. (When dynamic pruning does happen you see KEY instead of literal partition numbers in the Pstart/Pstop columns of the plan.)

  20. @David

    I feel like Don Quixote and the windmills.

    I keep saying that we must analyze, the answers are (in turn):
    - too expensive (in terms of time)
    - too bad a result (when I was trying to analyze with some percent value)
    - why analyze? An Oracle person says that we don’t need it, we need hints! Find out the good ones!

    I understand that dynamic pruning depends on statistics, but on a test db even with computed statistics I wasn’t able to prune!

    If I don’t use the view and join the table (for example I query table_1 and not the view) I can see dynamic pruning.

    I repeat: it’s like the optimizer stops at view level and says: I cannot go any further, I’m sure that partition pruning can’t be done -> ALL PARTITION ACCESS.

  21. >> I keep saying that we must analyze, the answers are (in turn):

    Sadly you work in a dysfunctional environment, it seems. Your Oracle person is unfortunately not competent if their answer is to avoid statistics and use hints. Have hints demonstrated their worth in this scenario?

  22. @David

    Absolutely…NOT! :)

    The idea behind this is:
    - don’t build indexes: you will save space, and the queries are all the same – “select a large portion of data and then sum” – so indexes are useless!
    - don’t analyze: you will save time… and the queries are all the same – “select a large portion of data and then sum” – so analyzing is useless!

  23. Well I’m inclined to wonder whether a partition view is amenable to dynamic partition pruning in the way that a partitioned table is. I’ll see if a test or two will resolve it.

  24. @David

    Many thanks for your interest.

    I made some “stupid” tests and it seems that Oracle doesn’t prune table partitions if the table is inside a partitioned view and the conditions are a bit complex.

    Maybe more testing is required…

  25. Antonio, the difficulty here is deciding what constitutes “a bit complex”. I remember having to deal with a system running 8.1.6, using a partition view of 3 partitioned tables – and it was always a problem trying to get all but the most trivial (filter predicate push) queries to do table partition elimination.

  26. @Jonathan

    You’re right – as always there’s not an “absolute complex”, and many times you lose your time only because “Oracle, you’re so stupid, can you not see that optimization?”… then most of the time it’s not Oracle’s fault… sadly :-(

  27. Partition pruning? Union all views? Surely you jest…

    I’m thinking about the union all view queries I see daily – quarterly partitions, check constraints, millions of rows per quarter. I don’t recall ever seeing even one plan that does partition elimination. It just doesn’t happen where I work. Of course we are using so-called daft data types: '20051257', '20060111', etc.

    On the other hand I have been paying attention to the big hitters. It might be interesting to mine v$sql_plan or dba_hist_sql_plan for distinct plans that include references to one or more of the tables in the partitioned view.
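
    For instance (a sketch – the owner and table names are placeholders):

    select distinct sql_id, plan_hash_value
    from   v$sql_plan
    where  object_owner = 'DW_OWNER'
    and    object_name in ('SALES_2005_Q3', 'SALES_2005_Q4')
    /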

  28. @Paul

    Same here!

    Now we are working on a different strategy.

    The basic idea is to “cheat” by creating the view with some “information” added.

    More or less:

    Instead of the basic union-all view:
    select * from (
    Select * from tabA
    union all
    Select * from tabB
    ) where pkey=(select etc…)

    we move the table containing the keys inside the partitioned view, like this:

    select * from (
    Select * from tabA where pkey=(select etc…)
    union all
    Select * from tabB where pkey=(select etc…)
    ) where pkey=(select etc…)

  29. Sorry if this is too far off topic. If it is, I will try to behave myself in the future.

    I have found the following technique to be useful, and I would be interested if anyone else had any opinions about this. It might fit better in a thread called “one way to force a star join from snowflaked dimensions to a union all view”.

    1. pre-join all of the snowflaked dimensions in an inline view, use no_merge to keep Oracle from mucking with what I know is a good plan

    2. hash join the pre-joined dimensions to the generated inline union all view containing only the relevant tables – this inline view also has a no_merge hint – “hands off, Oracle”
    I don’t like generating the inline view, but it saves several minutes per eliminated partition.

    3. Also use the hint use_hash ( dims, fact ).

    This technique seems useful for large retrievals where partition full table scans are appropriate. For smaller retrievals I’m pretty sure it would also work well – maybe without the use_hash hint – in cases where the tables in the fact union-all view have an index whose leading columns consist of the dimension keys from all three dimensions.
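
    In outline the query shape is something like this (a sketch – all table and column names invented for illustration):

    select /*+ no_merge(dims) no_merge(fact) use_hash(dims fact) */
           dims.prod_desc, dims.loc_desc, sum(fact.sales_amt)
    from   ( select p.prod_key, p.prod_desc, l.loc_key, l.loc_desc
             from   product p, product_group g, location l
             where  p.group_key = g.group_key ) dims,
           ( select prod_key, loc_key, sales_amt from sales_2006_q1
             union all
             select prod_key, loc_key, sales_amt from sales_2006_q2 ) fact
    where  fact.prod_key = dims.prod_key
    and    fact.loc_key  = dims.loc_key
    group  by dims.prod_desc, dims.loc_desc
    /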

  30. In my case it’s more a legacy thing. We have a partitioned view that was introduced years ago, sometime between 1996 and 1998 I think. The view contains a few years of quarterly partitions of sparse daily sales-and-other-transactions by 400 or so locations by a couple of levels of product keys – mild denormalization of product. The benefit was administration – easy to roll off old quarters and add new ones.

    Check constraints are defined on the quarters, but we tend to not get partition elimination, perhaps because we use a lot of range predicates on partition keys, or maybe the queries are just too complicated. For a particularly greedy extract I’m using an inline union-all view generated at run-time that includes only the relevant quarters. The benefit is several minutes per eliminated quarter per execution. It can add up to as much as 40 minutes or so per execution.

    I happened to use the prejoined snowflake first in a partitioned context, so I posted here.
    At the risk of hijacking the thread – it seems reasonable to expect that prejoining the snowflaked dimensions will work well with a non-partitioned fact table.

    The technique seems to go like this: find an efficient way to join up all of the dimensions, put that join in an inline view with no_merge hints, and join the dimensions inline view to the fact table – which in this case happens to be an inline view.

    It seems pretty easy to build up a join of snowflaked dimensions a table at a time and get it running well. More than likely in that set of data you have a full set of keys that can be applied to the fact table – you don’t get Oracle applying 5 tables of the snowflake up front to the fact table and then doing some major reduction on the back end using two dimension tables from the remaining wing of the flake. In this case it’s nice to see a plan that does a bunch of dimension stuff up front and then hash joins to a full table scan of the fact table – in this case some subset of partitions.

    It also doesn’t hurt that this technique does with one session in a couple of hours what they were using 30 sessions to accomplish in 4 hours, not to mention the reduction in load on the database servers.

    Sorry this was so off topic. I will try to stifle myself in the future. Maybe a couple of tattoos across the backs of my hands – Stifle – Yourself.

  31. However since I don’t have the tattoos yet, I’ll answer my own question briefly.

    If the fact table is sparse and the inline view of dimensions is dense then you get a great deal of throwaway involving the dimensions. It might not be a good idea to build up a 49 million row cartesian product of some high level product key X time X location when you know that say 15 million of them are imaginary and will be thrown away. In such a case it might be better to hit the fact table with a subset of dimensions, then hash join to the remaining dimensions.

    Ok, that’s really the end.

  32. David,
    I landed on this site when I was searching for a solution to my problem… which is similar but kind of reversed.

    I have a range-partitioned table in 10.1.0.3, partitioned on a date column (say one date per partition). When I tried a query, explain plan showed wrong estimates on cardinality and, poking into it more, I figured out there is a check constraint at the table level (partition_key = trunc(partition_key))…

    Here is the script (same behavior in 10gR2 too)

    Create table test_part(
    Edate date,
    sid number,
    pid number,
    c1 char(10)
    )
    partition by range(Edate)
    (
    partition p1 values less than (to_date('01/30/2007','MM/DD/YYYY'))
    tablespace users,
    partition p2 values less than (to_date('02/01/2007','MM/DD/YYYY'))
    tablespace users,
    partition p3 values less than (to_date('02/02/2007','MM/DD/YYYY'))
    tablespace users,
    partition p4 values less than (MAXVALUE)
    tablespace users
    )
    /

    Create index test_part_idx1 on test_part(sid)
    local tablespace users nologging
    /

    Create unique index test_part_idx2 on test_part(edate,sid,pid)
    local tablespace users nologging
    /

    alter table test_part add constraint datechk check
    (trunc(edate)= edate) enable
    /

    alter table test_part add constraint testcons
    primary key(edate,sid,pid)
    using index enable
    /

    insert /*+ APPEND */ into test_part
    select trunc(sysdate)+mod(rownum,4), rownum sid, rownum pid, rpad('x',10)
    from sys.all_source a, sys.all_source b
    where rownum < 60000  -- limit lost in transcription; < 60000 reproduces the 14999-row count below
    /

    commit
    /

    begin
    dbms_stats.gather_table_stats(
    ownname=>USER,
    tabname=>'TEST_PART',
    cascade=>dbms_stats.AUTO_CASCADE,
    estimate_percent=>dbms_stats.AUTO_SAMPLE_SIZE,
    block_sample=>FALSE
    );
    end;
    /

    -- run the query & test explain plan

    set autotrace on exp
    set lines 1000

    select count(*) from test_part p
    where p.edate=to_date('01/29/2007','MM/DD/YYYY')
    /
    (explain plan shows 150 rows for me instead of the ~15000-row reality)

    set autotrace off
    alter table test_part drop constraint datechk
    /

    set autotrace on exp
    select count(*) from test_part p
    where p.edate=to_date('01/29/2007','MM/DD/YYYY')
    /

    (explain plan now shows correct cardinality – 14999 for me)

    set autotrace off
    alter table test_part add constraint datechk check
    (trunc(edate)= edate) enable;

    set autotrace on exp
    same query

    – back to wrong estimates…

    Any clues..?

    Thanks in advance

  33. Here are my results (when I ran the scripts above)

    – check constraint enabled and stats in place
    – ——————————————-
    SQL> select count(*) from test_part p
    2 where p.edate=to_date('01/29/2007','MM/DD/YYYY')
    3 /

    COUNT(*)
    ———-
    14999

    Execution Plan
    ———————————————————-
    Plan hash value: 2176230086

    —————————————————————————————————-
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    —————————————————————————————————-
    | 0 | SELECT STATEMENT | | 1 | 8 | 15 (14)| 00:00:01 | | |
    | 1 | SORT AGGREGATE | | 1 | 8 | | | | |
    | 2 | PARTITION RANGE SINGLE| | 150 | 1200 | 15 (14)| 00:00:01 | 1 | 1 |
    |* 3 | INDEX FAST FULL SCAN | TEST_PART_IDX2 | 150 | 1200 | 15 (14)| 00:00:01 | 1 |
    —————————————————————————————————-

    Predicate Information (identified by operation id):
    —————————————————

    3 - filter(TRUNC(INTERNAL_FUNCTION("EDATE"))=TO_DATE('2007-01-29 00:00:00', 'yyyy-mm-dd
    hh24:mi:ss') AND "P"."EDATE"=TO_DATE('2007-01-29 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

    – delete the stats and let dynamic sampling kick-in
    – ————————————————–
    Execution Plan
    ———————————————————-
    Plan hash value: 4135154925

    —————————————————————————————————-
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    —————————————————————————————————-
    | 0 | SELECT STATEMENT | | 1 | 9 | 19 (0)| 00:00:01 | | |
    | 1 | SORT AGGREGATE | | 1 | 9 | | | | |
    | 2 | PARTITION RANGE SINGLE| | 14411 | 126K| 19 (0)| 00:00:01 | 1 | 1 |
    |* 3 | TABLE ACCESS FULL | TEST_PART | 14411 | 126K| 19 (0)| 00:00:01 | 1 | 1 |
    —————————————————————————————————-

    Predicate Information (identified by operation id):
    —————————————————

    3 - filter("P"."EDATE"=TO_DATE('2007-01-29 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
    TRUNC(INTERNAL_FUNCTION("EDATE"))=TO_DATE('2007-01-29 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

    Note
    —–
    – dynamic sampling used for this statement

    – rerun stats gathering & disable/drop the check constraint
    – ———————————————————

    Execution Plan
    ———————————————————-
    Plan hash value: 2176230086

    —————————————————————————————————-
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
    —————————————————————————————————-
    | 0 | SELECT STATEMENT | | 1 | 8 | 14 (8)| 00:00:01 | | |
    | 1 | SORT AGGREGATE | | 1 | 8 | | | | |
    | 2 | PARTITION RANGE SINGLE| | 14999 | 117K| 14 (8)| 00:00:01 | 1 | 1 |
    |* 3 | INDEX FAST FULL SCAN | TEST_PART_IDX2 | 14999 | 117K| 14 (8)| 00:00:01 | 1 |
    —————————————————————————————————-

    Predicate Information (identified by operation id):
    —————————————————

    3 - filter("P"."EDATE"=TO_DATE('2007-01-29 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

  34. Yes, this is something I wrote about here: http://oraclesponge.wordpress.com/2006/12/18/more-on-partition-key-statistics-and-an-oddly-harmful-constraint/

    There seems to be no way of switching off this functionality of inferring cardinality based on check constraints.

    If you think it is safe to do so then I’d drop that constraint completely, or only enable it during loads and disable it at all other times. Relying on dynamic sampling might also be an option if the overhead is not significant.
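
    i.e. something like this against the test_part example (a sketch – it’s worth verifying on your version that a disabled constraint really does drop out of the optimizer’s reasoning):

    -- before the load: enforce the rule for incoming rows only
    alter table test_part enable novalidate constraint datechk
    /

    -- after the load: take the constraint out of play
    alter table test_part disable constraint datechk
    /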

  35. @David Aldridge

    On your question of why you would want to UNION ALL two partitioned tables – a “better late than never” possible answer…

    One reason is a Kimball-type real-time data warehouse design where you have a large static partitioned fact table with bitmap indexes and then a small unindexed fact table (cached) which receives your frequent minute-to-minute updates. A transfer from the small fact table to the large fact table is then made overnight (or weekly or monthly for small amounts of data).

    Unfortunately, as mentioned by a couple of people here, dynamic partition elimination does not get passed through the view easily.

    Cheers,
    Steve.
