More on Redundant Materialized Views
Posted by David Aldridge on 2006-08-14
I wrote before on the subject of why you might find it beneficial to create materialized views that are apparantly completely redundant, and here are some more notes (and a syntatical curiosity, I think) on the subject. What I didn’t demonstrate in my previous writing was the way that the optimizer can use a summary table to quickly andefficiently execute queries with a “having” clause, nor how the DBMS_Advanced_Rewrite package can be even more efficient.
I present below a series of scripts to demonstrate the following:
- Use of a materialized view to allow a GROUP BY clause to be eliminated and a HAVING clause to be rewritten to a WHERE clause.
- Consequent use an index on a materialized view to optimize the original HAVING clause.
- Use of a HAVING clause in the materialized view to reduce the size of the result set for groups of particular interest.
- Use of partitioning in the materialized view to further optimize the original HAVING clause.
- Use of the DBMS_Advanced_Rewrite package to remove the need for materializing a result set, and allow the original GROUP BY/HAVING query to be rewritten against the original data set.
We start the demonstration by creating a sales table that has a primary key declared on the columns of (product_cd, location_cd, date_of_sale).
SQL*Plus: Release 10.2.0.2.0 - Production on Mon Aug 14 14:50:27 2006 Copyright (c) 1982, 2005, Oracle. All Rights Reserved. Enter password: Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production With the Partitioning, OLAP and Data Mining options SQL> drop table sales 2 / Table dropped. SQL> SQL> create table sales 2 ( 3 product_cd not null, 4 location_cd not null, 5 date_of_sale not null, 6 sales_amt not null 7 ) 8 pctfree 0 compress nologging 9 as 10 select mod(rownum,101), 11 mod(rownum,5), 12 date '2005-01-01' + mod(rownum-1,701), 13 dbms_random.value(1,1000) 14 from dual 15 connect by level <= 101*5*701 16 / Table created. SQL> SQL> alter table sales 2 add constraint pk_sales 3 primary key(product_cd,location_cd,date_of_sale) 4 / Table altered. SQL> SQL> exec dbms_stats.gather_table_stats(user,'sales') PL/SQL procedure successfully completed.
We execute a query to find very high sales by product, location and day.
SQL> select product_cd,
2 location_cd,
3 date_of_sale,
4 sum(sales_amt) sum_sales_amt
5 from sales
6 group by product_cd,
7 location_cd,
8 date_of_sale
9 having sum(sales_amt) > 999.9
10 /
Execution Plan
----------------------------------------------------------
Plan hash value: 2867331465
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 88005 | 3007K| | 2085 (4)| 00:00:26 |
|* 1 | FILTER | | | | | | |
| 2 | HASH GROUP BY | | 88005 | 3007K| 20M| 2085 (4)| 00:00:26 |
| 3 | TABLE ACCESS FULL| SALES | 351K| 11M| | 363 (4)| 00:00:05 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUM("SALES_AMT")>999.9)
(Note: I edited the plan to remove dumb “more than 80 column” formatting issues)
So the execution plan requires a full table access, a hash group by (this being 10gR2, otherwise you’d expect a SORT (GROUP BY) there), and finally a filter. A total cost of 2,086 (your result may vary).
Let us assume that this is a very commonly executed query. How can we optimize it?
We create a materialized view that contains identical data to the sales table, but of course has the additional metadata associated with the MV’s defining select statement. Implicit to this metadata is the information that the data set is unique on the key columns of product_cd, location_cd and date_of_sale — information that the optimizer had access to through the definition of the primary key on the sales table but which it did not act on.
SQL> drop materialized view sales_mv
2 /
drop materialized view sales_mv
*
ERROR at line 1:
ORA-12003: materialized view "DAVE"."SALES_MV" does not exist
SQL>
SQL> create materialized view sales_mv
2 build immediate
3 enable query rewrite
4 as
5 select product_cd,
6 location_cd,
7 date_of_sale,
8 sum(sales_amt) s_sales_amt
9 from sales
10 group by product_cd,
11 location_cd,
12 date_of_sale
13 /
Materialized view created.
SQL> create index sales_mv_sales_amt
2 on sales_mv (s_sales_amt)
3 /
Index created.
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 'sales_mv',
5 method_opt => 'for all indexed columns size 254');
6 end;
7 /
PL/SQL procedure successfully completed.
SQL> select product_cd,
2 location_cd,
3 date_of_sale,
4 sum(sales_amt) sum_sales_amt
5 from sales
6 group by product_cd,
7 location_cd,
8 date_of_sale
9 having sum(sales_amt) > 999.9
10 /
Execution Plan
----------------------------------------------------------
Plan hash value: 2514656059
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 32 | 1664 | 36 (0)| 00:00:01 |
| 1 | MAT_VIEW REWRITE ACCESS BY INDEX ROWID| SALES_MV | 32 | 1664 | 36 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SALES_MV_SALES_AMT | 32 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("SALES_MV"."S_SALES_AMT">999.9)
As you can see the query can now rewrite against the materialized view, and in addition the HAVING clause has been rewritten to a WHERE clause (see the predicate information line). Consequently the optimizer used an index range scan to satisfy the SUM(SALES_AMT) > 999.9 clause. The cost estimate of the rewritten query is 36 — only 1.7% of the 2,085 of the unrewritten query.
Let us suppose that business users are only interested in very high sales. We create a materialized view that has its own HAVING clause, to filter the result set to those key combinations with a sum of sales greater than or equal to 999 (note: not 999.9, as in the test query — we’re leaving some wiggle room here).
SQL> drop materialized view sales_mv
2 /
Materialized view dropped.
SQL> drop materialized view sales_mv_hi_sales
2 /
drop materialized view sales_mv_hi_sales
*
ERROR at line 1:
ORA-12003: materialized view "DAVE"."SALES_MV_HI_SALES" does not exist
SQL>
SQL> create materialized view sales_mv_hi_sales
2 build immediate
3 enable query rewrite
4 as
5 select product_cd,
6 location_cd,
7 date_of_sale,
8 sum(sales_amt) s_sales_amt
9 from sales
10 group by product_cd,
11 location_cd,
12 date_of_sale
13 having sum(sales_amt) >= 999
14 /
Materialized view created.
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 'sales_mv_hi_sales',
5 method_opt => 'for all indexed columns size 254');
6 end;
7 /
PL/SQL procedure successfully completed.
SQL> select product_cd,
2 location_cd,
3 date_of_sale,
4 sum(sales_amt) sum_sales_amt
5 from sales
6 group by product_cd,
7 location_cd,
8 date_of_sale
9 having sum(sales_amt) > 999.9
10 /
Execution Plan
----------------------------------------------------------
Plan hash value: 98437802
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost(%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 17 | 595 | 3 (0)| 00:00:01 |
|* 1 | MAT_VIEW REWRITE ACCESS FULL| SALES_MV_HI_SALES | 17 | 595 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SALES_MV_HI_SALES"."S_SALES_AMT">999.9)
We have confined the materialized view to so few rows that indexing is not required to bring the cost estimate down to 3.
Next we assume that they may be interested in a few different ranges. We create a partitioned materialized view that allows pruning based on the sum of sales.
SQL> drop materialized view sales_mv_part;
Materialized view dropped.
SQL> drop materialized view sales_mv_hi_sales
2 /
Materialized view dropped.
SQL>
SQL> create materialized view sales_mv_part
2 (
3 product_cd,
4 location_cd,
5 date_of_sale,
6 s_sales_amt
7 )
8 partition by range (s_sales_amt)
9 (partition very_low_sales values less than (1),
10 partition low_sales values less than (250),
11 partition medium_sales values less than (750),
12 partition high_sales values less than (999),
13 partition very_high_sales values less than (maxvalue))
14 build immediate
15 enable query rewrite
16 as
17 select product_cd,
18 location_cd,
19 date_of_sale,
20 sum(sales_amt) s_sales_amt
21 from sales
22 group by product_cd,
23 location_cd,
24 date_of_sale
25 having sum(sales_amt) >= 999
26 /
Materialized view created.
SQL> begin
2 dbms_stats.gather_table_stats(
3 user,
4 'sales_mv_part',
5 method_opt => 'for all indexed columns size 254');
6 end;
7 /
PL/SQL procedure successfully completed.
SQL> select product_cd,
2 location_cd,
3 date_of_sale,
4 sum(sales_amt) sum_sales_amt
5 from sales
6 group by product_cd,
7 location_cd,
8 date_of_sale
9 having sum(sales_amt) > 999.9
10 /
Execution Plan
----------------------------------------------------------
Plan hash value: 2776446017
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 17 | 595 | 3(0) | 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE | | 17 | 595 | 3(0) | 00:00:01 | 5 | 5 |
|* 2 | MAT_VIEW REWRITE ACCESS FULL| SALES_MV_PART | 17 | 595 | 3(0) | 00:00:01 | 5 | 5 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("SALES_MV_PART"."S_SALES_AMT">999.9)
Thus we demonstrate that the optimizer can also use partition pruning to optimize the rewritten query.
Finally, we turn our attention to the advantages that we can gain from the DBMS_Advanced_Rewrite package (10g+).
SQL> drop materialized view sales_mv_part
2 /
Materialized view dropped.
SQL> begin
2 begin
3 sys.DBMS_Advanced_Rewrite.Drop_Rewrite_Equivalence (
4 Name => 'SALES_PK_OPTIMIZE');
5 exception
6 when others then null;
7 end;
8 sys.DBMS_Advanced_Rewrite.Declare_Rewrite_Equivalence (
9 Name => 'SALES_PK_OPTIMIZE',
10 Source_Stmt => 'select product_cd,
11 location_cd,
12 date_of_sale,
13 sum(sales_amt) s_sales_amt
14 from sales
15 group by product_cd,
16 location_cd,
17 date_of_sale',
18 Destination_Stmt => 'select product_cd,
19 location_cd,
20 date_of_sale,
21 sales_amt
22 from sales',
23 Validate => TRUE,
24 Rewrite_Mode => 'GENERAL');
25 end;
26 /
PL/SQL procedure successfully completed.
SQL> select product_cd,
2 location_cd,
3 date_of_sale,
4 sum(sales_amt) sum_sales_amt
5 from sales
6 group by product_cd,
7 location_cd,
8 date_of_sale
9 having sum(sales_amt) > 999.9
10 /
Execution Plan
----------------------------------------------------------
Plan hash value: 781590677
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 12 | 420 | 366 (5)| 00:00:05 |
|* 1 | TABLE ACCESS FULL| SALES | 12 | 420 | 366 (5)| 00:00:05 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SALES_AMT">999.9)
Thus we demonstrate that with this new package we do not need to materialized a new copy of the data and rewrite against it. Instead we give the optimizer the information that it can avoid the group by entirely, and if we add the appropriate indexes and partitions to the original table then we gain similar performance benefits as demonstrated for the materialized views — an exercise for the reader :D
Last, that syntactical curiosity I promised …
SQL> create materialized view sales_mv 2 build immediate 3 enable query rewrite 4 as 5 select product_cd, 6 location_cd, 7 date_of_sale, 8 sum(sales_amt) 9 from sales 10 group by product_cd, 11 location_cd, 12 date_of_sale 13 / Materialized view created. SQL> desc sales_mv Name Null? Type ----------------------------------------- -------- ---------------------------- PRODUCT_CD NOT NULL NUMBER LOCATION_CD NOT NULL NUMBER DATE_OF_SALE NOT NULL DATE SUM(SALES_AMT) NUMBER
Fun, huh? I wasn’t expecting that.
2006-08-15 at 8:50 am
desc sales_mv ???
Makes me glad to use aliases! :-)
2006-08-15 at 9:29 am
Yes, aliases are A Good Thing. It seems to me that such code used to raise an error, and I only found this by chance. I am an alias kind-of guy myself.
2006-08-24 at 10:46 pm
[...] More on Redundant Materialized Views [...]