The Oracle Sponge

Oracle Data Warehouse Design and Architecture

Archive for the ‘Oracle’ Category

Has Everyone Forgotten “Keep It Simple, Stupid”?

Posted by David Aldridge on 2008-07-02

OK, we all love a clever piece of code to do something a little bit tricky, but has everyone lost sight of the benefits of simplicity?

My case study for the day, a thread on the Oracle forums in which various all_objects, user_objects, connect by, MODEL, xmltable, and pipelined function techniques are suggested for generating a list of days.

Well, my comments are in the thread. Just use a table … that’s why we have them. Keep … it … simple.

 

Posted in Oracle | 4 Comments »

The “OVERLAPS” Predicate

Posted by David Aldridge on 2008-06-12

I was browsing a pretty interesting “e-book” this evening: “Developing Time-Oriented Database Applications in SQL” By Richard T. Snodgrass. I quietly bemoaned the missing functionality of Oracle in some regards, for instance the rather neat OVERLAPS predicate that returns true when two intervals … well, overlap. We tend to jump through hoops a little to achieve this sometimes, I felt.

But you never know, things have moved on since the Oracle 8, so I searched the 10g and 11g documentation to see if I’dbeen negligent in not reading the New Features Guide. Alas, nothing in the SQL Reference, but I did turn up this interesting entry:

ORA-30085: syntax error was found in overlaps predicate
Cause: A syntax error was found during parsing an overlaps predicate.
Action: Correct the syntax.

“How”, I asked myself, “can there be an error message relating to a non-existent feature”?

“What if”, I continued, “said feature is not missing, but merely undocumented?”

So I tested it.

And it works.

SQL*Plus: Release 10.2.0.1.0 - Production on Thu Jun 12 23:14:03 2008
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Enter user-name: XXX/XXX@XXX
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
With the Partitioning, Oracle Label Security and Data Mining Scoring Engine options
SQL> select * from dual
  2  where          (date '2007-01-01', date '2008-01-01')
  3        overlaps (date '2005-01-01', date '2006-01-01')
  4  /
no rows selected
SQL>
SQL> select * from dual
  2  where          (date '2007-01-01', date '2008-01-01')
  3        overlaps (date '2005-01-01', date '2007-01-02')
  4  /
D
-
X
SQL>
SQL> select * from dual
  2  where          (date '2007-01-01', interval '5' year)
  3        overlaps (date '2005-01-01', date '2007-01-02')
  4  /
D
-
X
SQL>
SQL> select * from dual
  2  where          (date '2007-01-01', interval '5' year)
  3        overlaps (date '2005-01-01', interval '10' year)
  4  /
D
-
X
SQL>
SQL> select * from dual
  2  where NOT         (date '2007-01-01', date '2008-01-01')
  3           overlaps (date '2005-01-01', date '2006-01-01')
  4  /
D
-
X
SQL>
SQL>
SQL> with test_data as
  2  (select date '2007-01-01' from_dt, date '2007-12-01' to_dt from dual union all
  3   select date '2007-04-01' from_dt, date '2007-08-01' to_dt from dual union all
  4   select date '2007-07-01' from_dt, date '2007-08-01' to_dt from dual union all
  5   select date '2007-10-01' from_dt, date '2008-02-01' to_dt from dual union all
  6   select date '2008-01-01' from_dt, date '2008-12-01' to_dt from dual)
  7  select *
  8  from   test_data t1,
  9         test_data t2
 10  where  (t1.from_dt,t1.to_dt) overlaps (t2.from_dt,t2.to_dt)
 11  /
FROM_DT   TO_DT     FROM_DT   TO_DT
--------- --------- --------- ---------
01-JAN-07 01-DEC-07 01-JAN-07 01-DEC-07
01-JAN-07 01-DEC-07 01-APR-07 01-AUG-07
01-JAN-07 01-DEC-07 01-JUL-07 01-AUG-07
01-JAN-07 01-DEC-07 01-OCT-07 01-FEB-08
01-APR-07 01-AUG-07 01-JAN-07 01-DEC-07
01-APR-07 01-AUG-07 01-APR-07 01-AUG-07
01-APR-07 01-AUG-07 01-JUL-07 01-AUG-07
01-JUL-07 01-AUG-07 01-JAN-07 01-DEC-07
01-JUL-07 01-AUG-07 01-APR-07 01-AUG-07
01-JUL-07 01-AUG-07 01-JUL-07 01-AUG-07
01-OCT-07 01-FEB-08 01-JAN-07 01-DEC-07
FROM_DT   TO_DT     FROM_DT   TO_DT
--------- --------- --------- ---------
01-OCT-07 01-FEB-08 01-OCT-07 01-FEB-08
01-OCT-07 01-FEB-08 01-JAN-08 01-DEC-08
01-JAN-08 01-DEC-08 01-OCT-07 01-FEB-08
01-JAN-08 01-DEC-08 01-JAN-08 01-DEC-08
15 rows selected.
SQL>
SQL>  with test_data as
  2  (select date '2007-01-01' from_dt, date '2007-12-01' to_dt from dual union all
  3   select date '2007-04-01' from_dt, date '2007-08-01' to_dt from dual union all
  4   select date '2007-07-01' from_dt, date '2007-08-01' to_dt from dual union all
  5   select date '2007-10-01' from_dt, date '2008-02-01' to_dt from dual union all
  6   select date '2008-01-01' from_dt, date '2008-12-01' to_dt from dual)
  7  select *
  8  from   test_data t1,
  9         test_data t2
 10  where  NOT (t1.from_dt,t1.to_dt) overlaps (t2.from_dt,t2.to_dt)
 11  /
FROM_DT   TO_DT     FROM_DT   TO_DT
--------- --------- --------- ---------
01-JAN-07 01-DEC-07 01-JAN-08 01-DEC-08
01-APR-07 01-AUG-07 01-OCT-07 01-FEB-08
01-APR-07 01-AUG-07 01-JAN-08 01-DEC-08
01-JUL-07 01-AUG-07 01-OCT-07 01-FEB-08
01-JUL-07 01-AUG-07 01-JAN-08 01-DEC-08
01-OCT-07 01-FEB-08 01-APR-07 01-AUG-07
01-OCT-07 01-FEB-08 01-JUL-07 01-AUG-07
01-JAN-08 01-DEC-08 01-JAN-07 01-DEC-07
01-JAN-08 01-DEC-08 01-APR-07 01-AUG-07
01-JAN-08 01-DEC-08 01-JUL-07 01-AUG-07
10 rows selected.
SQL> explain plan for
  2  with test_data as
  3  (select date '2007-01-01' from_dt, date '2007-12-01' to_dt from dual union all
  4   select date '2007-04-01' from_dt, date '2007-08-01' to_dt from dual union all
  5   select date '2007-07-01' from_dt, date '2007-08-01' to_dt from dual union all
  6   select date '2007-10-01' from_dt, date '2008-02-01' to_dt from dual union all
  7   select date '2008-01-01' from_dt, date '2008-12-01' to_dt from dual)
  8  select *
  9  from   test_data t1,
 10         test_data t2
 11  where  (t1.from_dt,t1.to_dt) overlaps (t2.from_dt,t2.to_dt)
 12  /
Explained.
SQL> set linesize 200
SQL> select * from table(dbms_xplan.display)
  2  /
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
----------------------------------------
Plan hash value: 1640239735
----------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                        | Rows  | Bytes | Cost (%CPU)| Time  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                             |     1 |    36 |    22  (37)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                             |       |       |            |       |
|   2 |   LOAD AS SELECT           |                             |       |       |            |       |
|   3 |    UNION-ALL               |                             |       |       |            |       |
|   4 |     FAST DUAL              |                             |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL              |                             |     1 |       |     2   (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
----------------------------------------
|   6 |     FAST DUAL              |                             |     1 |       |     2   (0)| 00:00:01 |
|   7 |     FAST DUAL              |                             |     1 |       |     2   (0)| 00:00:01 |
|   8 |     FAST DUAL              |                             |     1 |       |     2   (0)| 00:00:01 |
|   9 |   NESTED LOOPS             |                             |     1 |    36 |    12   (0)| 00:00:01 |
|  10 |    VIEW                    |                             |     5 |    90 |     2   (0)| 00:00:01 |
|  11 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6602_EE7BE7EC |     5 |    90 |     2   (0)| 00:00:01 |
|* 12 |    VIEW                    |                             |     1 |    18 |     2   (0)| 00:00:01 |
|  13 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6602_EE7BE7EC |     5 |    90 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
----------------------------------------
---------------------------------------------------
  12 - filter((INTERNAL_FUNCTION("T1"."FROM_DT"),INTERNAL_FUNCTION("T1"."TO_DT"))OVERLAPS(INTERNAL
              _FUNCTION("T2"."FROM_DT"),INTERNAL_FUNCTION("T2"."TO_DT")))
26 rows selected.

So not only does it let you check two pairs of dates to see if they overlap, it lets you check if they do not overlap (pretty simple stuff), and it also accepts a date and interval expression in place of date pairs.

It doesn’t work for pairs of numbers though, it seems:

SQL> select * from dual
  2  where          (1,5)
  3        overlaps (3,8)
  4  /
where          (1,5)
                *
ERROR at line 2:
ORA-00932: inconsistent datatypes: expected DATE got NUMBER

SQL>

Still, very cool! Undocumented, but still cool.

Does it make it more cool if it is undocumented?

Posted in Oracle | 21 Comments »

A Fix for Check Constraints That Harm Cardinalities

Posted by David Aldridge on 2008-05-21

I wrote a while ago about the harmful effect of check constraints on query optimization. I’ll pause while you catch up on that …

Anyway, this appears to have been addressed in 10.2.0.4, and it’s now safe to go back into the water on applying check constraints. Bug 5891471.

The bug description also notes that you can set event 10195 to disable transitive predicate generation using check constraints.

I haven’t tested any of this — I will if I get time. Unless someoneelse wants to …

 

Posted in Oracle | 3 Comments »

Testing a No-statistics Environment: Part II

Posted by David Aldridge on 2008-05-07

Following on from the previous post, a little glitch in the plans: the instance appeared to crash late last night during the load. Hopefully not some exotic bug associated with dynamic sampling.

I modified the delete-and-lock strategy yesterday after remembering that we have a number of indexes that are created with the “compute statistics” option. Since DBMS_STATS.LOCK_TABLE_STATS is intended to prevent the modification of any statistics associated with a table I ran a quick test to see if compute statistics can be specified for an index created against a statistics-locked table, and indeed it cannot.

SQL Error: ORA-38029: object statistics are locked
38029. 00000 -  “object statistics are locked”
*Cause:    An attept was made to modify optimizer statistics of the object.
*Action:   Unlock statistics with the DBMS_STATS.UNLOCK_TABLE_STATS procedure
           on base table(s). Retry the operation if it is okay to update statistics.

Well, it was probably safer not to anyway for a first run.

Now, whatever happened to that database instance?

 

Posted in Oracle | Leave a Comment »

Testing a No-statistics Environment.

Posted by David Aldridge on 2008-05-06

I am absolutely sick and tired of dealing with issues where misleading partition statistics (for example) cause ETL queries to go crazy and try to sort-merge tables with tens of millions of rows each.

Since we’re about to run a test load on production data I’m going to drop all table and index statistics and lock them. OK, actually I’m going to back the statistics up first, and then I’m going to drop and lock them.

We’ll see how we do on a 100% dynamic sampling-based system …

Posted in Oracle | 15 Comments »

An Empassioned and Meaningless Debate

Posted by David Aldridge on 2008-04-25

“See-qwel” or “S-Q-L”? Have your say on this highly important matter here, and help shape the future of our industry.

Posted in Oracle | 11 Comments »

ETL Powered by Rice and Beans

Posted by David Aldridge on 2008-04-17

I’m told that it’s very important to keep one’s brain in shape as one grows older. Or “drifts gently towards the twilight” as I prefer to think of it. Trying to look on the optimistic side, obviously. Fortunately I end each day with my brain wrung out like a well-used sponge courtesy of some ETL development work on a largeish data warehouse development.

Although the regular ETL development is outsourced, we are responsible for the coding that processes historical data, so on deployment we will run our historical load code and then the “other team’s” regular load will take over. Yes, it is a strange arrangement. And by “we” I really mean my collegues somewhere in the pestilent jungles of Brazil, surrounded by poisonous frogs and who-knows-what other kind of other fearsome thing, and living on a diet of rice and beans in their houses on stilts.

My own role is not so bad. I don’t have to get too involved in the detailed busness logic that the code has to implement, and thank goodness for that because frankly I’ve not seen anything so complex for … well, forever. So I sit here in a metaphorically lofty perch, making myself popular by criticising the hard work of others and encouraging them to greater efforts by calling them lazy if they stop work before midnight (the traditional management role). Yes, it’s a good life alright. I constantly remind them that the real aim of the project is to produce a process for loading five years of historical data that is faster than the “B-Team’s” load of single day.

I think our methodology works quite well though. When you are down in the weeds of the logic (did I mention that it is particularly complex logic?) it is always difficult to see the slightly larger picture of where you’re taking the data next and how you optimise both specific ETL stages and how you can adjust the overall process to make everything more efficient. It’s a little like pair programming, I guess.

And we do get to do the fun bit — 10 million records joining 0-M:0-N through eight OR’d join conditions to 10 million other records, with a couple of billion joins resulting that we then artfully winnow down to a few million. The optimization process is a little mind bending, with all sorts of issues that we wouldn’t normally worry about too much becoming vitally important in an effort to avoid blowing out the host server’s memory with a degree of parallelism of 40. I have to say that there was initially some FUD over the historical process, with estimates of the load duration being bandied about that were in the order of months — we never thought it would take more than five days, and it’s now looking like it’ll run in under one day, which means that we’d be able to start the daily load process directly afterwards without altering it. And I think we still have a few more tricks up our sleeves as well in case we need to buy an extra hour somehow.

In fact it’s turning into something of a stroll through the SQL Reference book. The list of features we’re touching on include analytic functions (which have become a second language for everyone now, along with the “PCTFREE 0 NOLOGGING COMPRESS” which I nearly appended to my sandwich order the other day), connect by clauses, and even some rather exotic stuff with finding the lowest set bit in a something-or-other that I don’t understand yet but which Heitor thought of while lying in bed one night and which he used to help reduce an eight hour process to fifteen minutes.  They say that they don’t have siestas in Brazil, but maybe they should start.

 

Oh well, another morning … Brazil is online … back to work!

Posted in Data Warehousing, Oracle, Performance | 9 Comments »

Choosing An Optimal Stats Gathering Strategy

Posted by David Aldridge on 2008-04-13

Greg Rahn has an excellent entry on statistics gathering and the use and abuse of dbms_stats and initialization parameters at the Structured Data blog, one of my favourites. I added a comment on dynamic sampling, of which I am a big proponent, and Greg has another recent post on a case where dynamic sampling saved the day. Hey, I commented on that too …

Posted in Oracle | 2 Comments »

Indexing Options for Change Data Capture

Posted by David Aldridge on 2008-04-08

I just posted this question to the Oracle list, but thought I might post it here in case that reaches a wider audience. It also may be a situation that others in the DW arena have faced before.

 

I have a large and busy OLTP table, 100GB or so, against which there is a need to capture changes. Until an asynchronous CDC solution is in place we have to rely on two columns: create_date and update_date (null until the first update), both being of DATE type of course.

These are currently unindexed, but there is a desire to index them to improve change capture performance for queries such as:

 
select …
from   …
where     (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate))
       or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
 
The desire is obviously to provide the maximum performance benefit while reducing the impact on the OLTP system.
 
I thought of four different indexing options:
 
i) Indexing the columns separately, leading in the best case to an unusual execution plan where the indexes are range scanned and the results merged before accessing the table.
ii) A single composite index (create_date,update_date),leading to a fast full index scan.
iii) A single composite index (update_date,create_date), rewriting the query predicate as …
   (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate) and update_date is null)
or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
… and leading to two index range scans. (not sure about this)
iv) A single-column function based index on (Nvl(update_dt,create_dt)) and rewriting the predicate appropriately.
Whichever of these is chosen the growth pattern for the data naturally tends towards index contention as all the new values are equal to sysdate.
 
So the nub of my problem comes down to these questions:
 
Is it possible to mitigate the index contention problem with a suitably high initrans values? Does it seem likely that implementing a reverse index would be worthwhile in reducing contention (I could probably take the pain of the fast full scan if it was, although it would lead to a larger index due to the 50/50 block splits).
 
Would you regard implementing a function-based index as risky in any significant way on major tables of a busy OLTP system? How about a reverse function based index?
 
Basically, “what would you do”?
 
Maybe I’m overthinking it, but I’d like to go to the sytem developers with a fully thought out set of options (even if they do tell me to take a running jump anyway).

 

 

Posted in Data Warehousing, Oracle, Performance | 26 Comments »

Stored Procedure are Wonderful/Evil

Posted by David Aldridge on 2008-04-07

The same old “stored procedures are wonderful/evil” debate is again being practiced at the Joel on Software forums.

Enjoy.

Posted in Forums, Oracle | Leave a Comment »