The Oracle Sponge

Oracle Data Warehouse Design and Architecture

42

Posted by David Aldridge on 2009-01-05

Well, 2009 is here, and it’s a time of change for sure.

There’s that whole US election thing of course, which made me a happy non-participant in the democratic process, but more importantly it’s a time of change in the Sponge household. After 10 (is that right? maybe more) years of life in the United States, experincing Dayton OH, Colorado Springs CO, and Fairfax VA, we’re heading back to the Old Country at the end of March or thereabouts. Not out in some gods-forsaken wilderness either, but just a shade west of London. Ten years ago I’d had enough of London, and maybe I still couldn’t take living there and commuting into the city five days a week, but it’s a fine place to be near and I’m looking forward to be back close enough for a stroll around Hampstead Heath, or Greenwich, or the Inns of Court.

Mixed feelings in the house about it of course. Nervousness for the kids (5, 7 and 9), who still miss the mountains and sunshine of Colorado. Despite the inherent dangers of the area (yes Jennifer, “escaped lion”  makes me react with “it’s a little concerning” also), Numbers 2 and 3 sons have decided to go to college there and Number 3 will live in Number 2’s house. I’m not sure that I planned ahead that far when I was their ages, but it’ll be nice to have somewhere sunny for me to visit when I’m in my dotage.

Well, it’s been hard to be so far away from family for all these years and it’ll be good to be back in England, just a drive away from siblings and their own families, and a short flight away from parents. I smell an extended summer vacation  for three lucky boys and four mixed-fortune grandparents! If they need me, I’ll be windsurfing or summit.

Anyhoo, in retrospect it was a simple business to move over here, though it didn’t seem so at the time. There was the paperwork and bureaucratic waiting game of the visa process for me, but that aside there was just some furniture, some boxes, and an airplane ticket. In ten years it seems that you can pick up a lot of possessions — a cat, a couple of cars, a TV or two, three kids, an artificial Christmas tree … the list seems to be endless. The tree aside they all require their own confusing set of paperwork or other accoutrements. Importing a car is no easy matter, and anything electrical requires transformers and converters and whatnot. We have to find a school for the cat, get the kids vaccinated and into quaranteen kennels, and get new passports for both. It’s a confusing business that will take our organisational skills to a whole new level.

Plus, we’ll be missing out on several years’ worth of East Enders when we move, but with the local PBS station only broadcasting two episodes a week we weren’t catching up. Just to give you an idea of how far behind we are, Tom just died trying to save whats-his-name from the fire in the Slaters’ house that Little Mo accidentally started after what-his-name doused the place in petrol with the intention of scaring Little Mo into staying with him. The bastard. Still, Tom had that tumor so he wasn’t going to be around much longer. And that new manager at the Old Vic is obviously a bad sort.

Trevor, that was his name. The guy with the petrol.

So, there is enough to be done with this move that I didn’t feel that I could juggle personal and work commitments for the next three months, so I left XM Satellite Radio almost on the last day of the year and I now kid myself that I will be a Gentleman of Leisure for a while. XM and Sirius satellite radio companies are now but a single entity so there’s a certain amount of merging and sorting and whatnot to be done, and many challenges ahead that I really couldn’t have put all of my energies into. It’s been interesting to see how two companies that were pretty much the only ones of their kind in the country are similar in some ways and poles-apart in others. I can certainly say that there were many organisational and philisophical differences in the two IT organisations, which were almost complete opposites of each other in many aspects. And that’s all I have to say about that :)

I’ll miss much of it, of course. I worked with some people there who I hope to be in touch with for many years, some of them in strange foreign countries full of poison frogs and monkeys. Oh, they’d deny it if you asked them straight, but if you work ‘em for 40 hours straight they’ll confess it all. Really, we’d lure people to lovely Washington DC with promises of an easy couple of weeks getting to know us and helping out with a few things here and there, then work them relentlessly every single day and return a wornout husk of a person to their next of kin — a mere hollow shell. “On the way out I passed Henrique at Sao Paolo airport — I thought he was a homeless guy! What did you do to him?”. A real and unsolicited testimonial from one of our wonderful developers in Curitiba.

Good times. I’ll miss those crazy Brazilians and the special Anglo-Saxon vocabulary we used on a daily basis to keep ourselves sane.

Moving along … aside from that whole “relocating to a different continent” thing, I have a number of other pursuits to keep me entertained.

Firstly, I have a little list of technical topics to write about. Some parallelism things,  some SQL stuff. I’d also like to get more familiar with Oracle’s other data warehousing technologies, so I’ll be fiddling with them as well. I had a brief and not very technical encounter with OBIEE a year or so ago (the same project that destroyed poor Henrique, I think) and the architecture was very appealing (Henrique might differ). It’s one of those “why yes, that seems like it would work very nicely” ways of doing things that I’m keen to know more about. Also, after around ten years of working with Informatica it will be interesting to see how Oracle’s own data warehouse ETL toolset handles. I dipped my toe in that water many years ago but at that stage I think that I was looking at a less than mature application, and I’ve been hearing good things about itfor a while now. Time to take that plunge.

Secondly, there’s that whole  ”gained X pounds in weight while living in America” thing to deal with. Giving up the smokes, working too many sedentary hours, and the dining-out culture have really done a number on my waistline. A sparkly new membership at the enormous local fitness club makes me wonder why I didn’t do this before, and hopefully will help me get to grips with that. I’m no fan of that sort of thing (”what is this strange salty fluid exuding from my pores?”), but it has to be done. It’s been OK so far. You know, bearable.

Lastly, the kids will be out of day care and under Direct Parental Supervision. Now that should really be interesting.

However before all of that begins I’m off to the UK and Spain for a couple of weeks first, reminding friends and family what it will be like to have us back within unexpected-visit range and giving them a last chance to change names and move to a different city. I like to think of it as a Mental Health Vacation for me to decompress from a couple of years of rather hard and stressful work, and to get me refreshed for the next few months of what will doubtless be one of my more interesting years of family life.

“Happy New Year”, everyone.

Posted in Personal | 8 Comments »

Has Everyone Forgotten “Keep It Simple, Stupid”?

Posted by David Aldridge on 2008-07-02

OK, we all love a clever piece of code to do something a little bit tricky, but has everyone lost sight of the benefits of simplicity?

My case study for the day, a thread on the Oracle forums in which various all_objects, user_objects, connect by, MODEL, xmltable, and pipelined function techniques are suggested for generating a list of days.

Well, my comments are in the thread. Just use a table … that’s why we have them. Keep … it … simple.

 

Posted in Oracle | 4 Comments »

The “OVERLAPS” Predicate

Posted by David Aldridge on 2008-06-12

I was browsing a pretty interesting “e-book” this evening: “Developing Time-Oriented Database Applications in SQL” By Richard T. Snodgrass. I quietly bemoaned the missing functionality of Oracle in some regards, for instance the rather neat OVERLAPS predicate that returns true when two intervals … well, overlap. We tend to jump through hoops a little to achieve this sometimes, I felt.

But you never know, things have moved on since the Oracle 8, so I searched the 10g and 11g documentation to see if I’dbeen negligent in not reading the New Features Guide. Alas, nothing in the SQL Reference, but I did turn up this interesting entry:

ORA-30085: syntax error was found in overlaps predicate
Cause: A syntax error was found during parsing an overlaps predicate.
Action: Correct the syntax.

“How”, I asked myself, “can there be an error message relating to a non-existent feature”?

“What if”, I continued, “said feature is not missing, but merely undocumented?”

So I tested it.

And it works.

SQL*Plus: Release 10.2.0.1.0 - Production on Thu Jun 12 23:14:03 2008
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Enter user-name: XXX/XXX@XXX
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
With the Partitioning, Oracle Label Security and Data Mining Scoring Engine options
SQL> select * from dual
  2  where          (date '2007-01-01', date '2008-01-01')
  3        overlaps (date '2005-01-01', date '2006-01-01')
  4  /
no rows selected
SQL>
SQL> select * from dual
  2  where          (date '2007-01-01', date '2008-01-01')
  3        overlaps (date '2005-01-01', date '2007-01-02')
  4  /
D
-
X
SQL>
SQL> select * from dual
  2  where          (date '2007-01-01', interval '5' year)
  3        overlaps (date '2005-01-01', date '2007-01-02')
  4  /
D
-
X
SQL>
SQL> select * from dual
  2  where          (date '2007-01-01', interval '5' year)
  3        overlaps (date '2005-01-01', interval '10' year)
  4  /
D
-
X
SQL>
SQL> select * from dual
  2  where NOT         (date '2007-01-01', date '2008-01-01')
  3           overlaps (date '2005-01-01', date '2006-01-01')
  4  /
D
-
X
SQL>
SQL>
SQL> with test_data as
  2  (select date '2007-01-01' from_dt, date '2007-12-01' to_dt from dual union all
  3   select date '2007-04-01' from_dt, date '2007-08-01' to_dt from dual union all
  4   select date '2007-07-01' from_dt, date '2007-08-01' to_dt from dual union all
  5   select date '2007-10-01' from_dt, date '2008-02-01' to_dt from dual union all
  6   select date '2008-01-01' from_dt, date '2008-12-01' to_dt from dual)
  7  select *
  8  from   test_data t1,
  9         test_data t2
 10  where  (t1.from_dt,t1.to_dt) overlaps (t2.from_dt,t2.to_dt)
 11  /
FROM_DT   TO_DT     FROM_DT   TO_DT
--------- --------- --------- ---------
01-JAN-07 01-DEC-07 01-JAN-07 01-DEC-07
01-JAN-07 01-DEC-07 01-APR-07 01-AUG-07
01-JAN-07 01-DEC-07 01-JUL-07 01-AUG-07
01-JAN-07 01-DEC-07 01-OCT-07 01-FEB-08
01-APR-07 01-AUG-07 01-JAN-07 01-DEC-07
01-APR-07 01-AUG-07 01-APR-07 01-AUG-07
01-APR-07 01-AUG-07 01-JUL-07 01-AUG-07
01-JUL-07 01-AUG-07 01-JAN-07 01-DEC-07
01-JUL-07 01-AUG-07 01-APR-07 01-AUG-07
01-JUL-07 01-AUG-07 01-JUL-07 01-AUG-07
01-OCT-07 01-FEB-08 01-JAN-07 01-DEC-07
FROM_DT   TO_DT     FROM_DT   TO_DT
--------- --------- --------- ---------
01-OCT-07 01-FEB-08 01-OCT-07 01-FEB-08
01-OCT-07 01-FEB-08 01-JAN-08 01-DEC-08
01-JAN-08 01-DEC-08 01-OCT-07 01-FEB-08
01-JAN-08 01-DEC-08 01-JAN-08 01-DEC-08
15 rows selected.
SQL>
SQL>  with test_data as
  2  (select date '2007-01-01' from_dt, date '2007-12-01' to_dt from dual union all
  3   select date '2007-04-01' from_dt, date '2007-08-01' to_dt from dual union all
  4   select date '2007-07-01' from_dt, date '2007-08-01' to_dt from dual union all
  5   select date '2007-10-01' from_dt, date '2008-02-01' to_dt from dual union all
  6   select date '2008-01-01' from_dt, date '2008-12-01' to_dt from dual)
  7  select *
  8  from   test_data t1,
  9         test_data t2
 10  where  NOT (t1.from_dt,t1.to_dt) overlaps (t2.from_dt,t2.to_dt)
 11  /
FROM_DT   TO_DT     FROM_DT   TO_DT
--------- --------- --------- ---------
01-JAN-07 01-DEC-07 01-JAN-08 01-DEC-08
01-APR-07 01-AUG-07 01-OCT-07 01-FEB-08
01-APR-07 01-AUG-07 01-JAN-08 01-DEC-08
01-JUL-07 01-AUG-07 01-OCT-07 01-FEB-08
01-JUL-07 01-AUG-07 01-JAN-08 01-DEC-08
01-OCT-07 01-FEB-08 01-APR-07 01-AUG-07
01-OCT-07 01-FEB-08 01-JUL-07 01-AUG-07
01-JAN-08 01-DEC-08 01-JAN-07 01-DEC-07
01-JAN-08 01-DEC-08 01-APR-07 01-AUG-07
01-JAN-08 01-DEC-08 01-JUL-07 01-AUG-07
10 rows selected.
SQL> explain plan for
  2  with test_data as
  3  (select date '2007-01-01' from_dt, date '2007-12-01' to_dt from dual union all
  4   select date '2007-04-01' from_dt, date '2007-08-01' to_dt from dual union all
  5   select date '2007-07-01' from_dt, date '2007-08-01' to_dt from dual union all
  6   select date '2007-10-01' from_dt, date '2008-02-01' to_dt from dual union all
  7   select date '2008-01-01' from_dt, date '2008-12-01' to_dt from dual)
  8  select *
  9  from   test_data t1,
 10         test_data t2
 11  where  (t1.from_dt,t1.to_dt) overlaps (t2.from_dt,t2.to_dt)
 12  /
Explained.
SQL> set linesize 200
SQL> select * from table(dbms_xplan.display)
  2  /
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
----------------------------------------
Plan hash value: 1640239735
----------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                        | Rows  | Bytes | Cost (%CPU)| Time  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                             |     1 |    36 |    22  (37)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                             |       |       |            |       |
|   2 |   LOAD AS SELECT           |                             |       |       |            |       |
|   3 |    UNION-ALL               |                             |       |       |            |       |
|   4 |     FAST DUAL              |                             |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL              |                             |     1 |       |     2   (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
----------------------------------------
|   6 |     FAST DUAL              |                             |     1 |       |     2   (0)| 00:00:01 |
|   7 |     FAST DUAL              |                             |     1 |       |     2   (0)| 00:00:01 |
|   8 |     FAST DUAL              |                             |     1 |       |     2   (0)| 00:00:01 |
|   9 |   NESTED LOOPS             |                             |     1 |    36 |    12   (0)| 00:00:01 |
|  10 |    VIEW                    |                             |     5 |    90 |     2   (0)| 00:00:01 |
|  11 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6602_EE7BE7EC |     5 |    90 |     2   (0)| 00:00:01 |
|* 12 |    VIEW                    |                             |     1 |    18 |     2   (0)| 00:00:01 |
|  13 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6602_EE7BE7EC |     5 |    90 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
----------------------------------------
---------------------------------------------------
  12 - filter((INTERNAL_FUNCTION("T1"."FROM_DT"),INTERNAL_FUNCTION("T1"."TO_DT"))OVERLAPS(INTERNAL
              _FUNCTION("T2"."FROM_DT"),INTERNAL_FUNCTION("T2"."TO_DT")))
26 rows selected.

So not only does it let you check two pairs of dates to see if they overlap, it lets you check if they do not overlap (pretty simple stuff), and it also accepts a date and interval expression in place of date pairs.

It doesn’t work for pairs of numbers though, it seems:

SQL> select * from dual
  2  where          (1,5)
  3        overlaps (3,8)
  4  /
where          (1,5)
                *
ERROR at line 2:
ORA-00932: inconsistent datatypes: expected DATE got NUMBER

SQL>

Still, very cool! Undocumented, but still cool.

Does it make it more cool if it is undocumented?

Posted in Oracle | 21 Comments »

A Fix for Check Constraints That Harm Cardinalities

Posted by David Aldridge on 2008-05-21

I wrote a while ago about the harmful effect of check constraints on query optimization. I’ll pause while you catch up on that …

Anyway, this appears to have been addressed in 10.2.0.4, and it’s now safe to go back into the water on applying check constraints. Bug 5891471.

The bug description also notes that you can set event 10195 to disable transitive predicate generation using check constraints.

I haven’t tested any of this — I will if I get time. Unless someoneelse wants to …

 

Posted in Oracle | 3 Comments »

Testing a No-statistics Environment: Part II

Posted by David Aldridge on 2008-05-07

Following on from the previous post, a little glitch in the plans: the instance appeared to crash late last night during the load. Hopefully not some exotic bug associated with dynamic sampling.

I modified the delete-and-lock strategy yesterday after remembering that we have a number of indexes that are created with the “compute statistics” option. Since DBMS_STATS.LOCK_TABLE_STATS is intended to prevent the modification of any statistics associated with a table I ran a quick test to see if compute statistics can be specified for an index created against a statistics-locked table, and indeed it cannot.

SQL Error: ORA-38029: object statistics are locked
38029. 00000 -  “object statistics are locked”
*Cause:    An attept was made to modify optimizer statistics of the object.
*Action:   Unlock statistics with the DBMS_STATS.UNLOCK_TABLE_STATS procedure
           on base table(s). Retry the operation if it is okay to update statistics.

Well, it was probably safer not to anyway for a first run.

Now, whatever happened to that database instance?

 

Posted in Oracle | Leave a Comment »

Testing a No-statistics Environment.

Posted by David Aldridge on 2008-05-06

I am absolutely sick and tired of dealing with issues where misleading partition statistics (for example) cause ETL queries to go crazy and try to sort-merge tables with tens of millions of rows each.

Since we’re about to run a test load on production data I’m going to drop all table and index statistics and lock them. OK, actually I’m going to back the statistics up first, and then I’m going to drop and lock them.

We’ll see how we do on a 100% dynamic sampling-based system …

Posted in Oracle | 15 Comments »

An Empassioned and Meaningless Debate

Posted by David Aldridge on 2008-04-25

“See-qwel” or “S-Q-L”? Have your say on this highly important matter here, and help shape the future of our industry.

Posted in Oracle | 11 Comments »

ETL Powered by Rice and Beans

Posted by David Aldridge on 2008-04-17

I’m told that it’s very important to keep one’s brain in shape as one grows older. Or “drifts gently towards the twilight” as I prefer to think of it. Trying to look on the optimistic side, obviously. Fortunately I end each day with my brain wrung out like a well-used sponge courtesy of some ETL development work on a largeish data warehouse development.

Although the regular ETL development is outsourced, we are responsible for the coding that processes historical data, so on deployment we will run our historical load code and then the “other team’s” regular load will take over. Yes, it is a strange arrangement. And by “we” I really mean my collegues somewhere in the pestilent jungles of Brazil, surrounded by poisonous frogs and who-knows-what other kind of other fearsome thing, and living on a diet of rice and beans in their houses on stilts.

My own role is not so bad. I don’t have to get too involved in the detailed busness logic that the code has to implement, and thank goodness for that because frankly I’ve not seen anything so complex for … well, forever. So I sit here in a metaphorically lofty perch, making myself popular by criticising the hard work of others and encouraging them to greater efforts by calling them lazy if they stop work before midnight (the traditional management role). Yes, it’s a good life alright. I constantly remind them that the real aim of the project is to produce a process for loading five years of historical data that is faster than the “B-Team’s” load of single day.

I think our methodology works quite well though. When you are down in the weeds of the logic (did I mention that it is particularly complex logic?) it is always difficult to see the slightly larger picture of where you’re taking the data next and how you optimise both specific ETL stages and how you can adjust the overall process to make everything more efficient. It’s a little like pair programming, I guess.

And we do get to do the fun bit — 10 million records joining 0-M:0-N through eight OR’d join conditions to 10 million other records, with a couple of billion joins resulting that we then artfully winnow down to a few million. The optimization process is a little mind bending, with all sorts of issues that we wouldn’t normally worry about too much becoming vitally important in an effort to avoid blowing out the host server’s memory with a degree of parallelism of 40. I have to say that there was initially some FUD over the historical process, with estimates of the load duration being bandied about that were in the order of months — we never thought it would take more than five days, and it’s now looking like it’ll run in under one day, which means that we’d be able to start the daily load process directly afterwards without altering it. And I think we still have a few more tricks up our sleeves as well in case we need to buy an extra hour somehow.

In fact it’s turning into something of a stroll through the SQL Reference book. The list of features we’re touching on include analytic functions (which have become a second language for everyone now, along with the “PCTFREE 0 NOLOGGING COMPRESS” which I nearly appended to my sandwich order the other day), connect by clauses, and even some rather exotic stuff with finding the lowest set bit in a something-or-other that I don’t understand yet but which Heitor thought of while lying in bed one night and which he used to help reduce an eight hour process to fifteen minutes.  They say that they don’t have siestas in Brazil, but maybe they should start.

 

Oh well, another morning … Brazil is online … back to work!

Posted in Data Warehousing, Oracle, Performance | 9 Comments »

Choosing An Optimal Stats Gathering Strategy

Posted by David Aldridge on 2008-04-13

Greg Rahn has an excellent entry on statistics gathering and the use and abuse of dbms_stats and initialization parameters at the Structured Data blog, one of my favourites. I added a comment on dynamic sampling, of which I am a big proponent, and Greg has another recent post on a case where dynamic sampling saved the day. Hey, I commented on that too …

Posted in Oracle | 2 Comments »

Indexing Options for Change Data Capture

Posted by David Aldridge on 2008-04-08

I just posted this question to the Oracle list, but thought I might post it here in case that reaches a wider audience. It also may be a situation that others in the DW arena have faced before.

 

I have a large and busy OLTP table, 100GB or so, against which there is a need to capture changes. Until an asynchronous CDC solution is in place we have to rely on two columns: create_date and update_date (null until the first update), both being of DATE type of course.

These are currently unindexed, but there is a desire to index them to improve change capture performance for queries such as:

 
select …
from   …
where     (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate))
       or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
 
The desire is obviously to provide the maximum performance benefit while reducing the impact on the OLTP system.
 
I thought of four different indexing options:
 
i) Indexing the columns separately, leading in the best case to an unusual execution plan where the indexes are range scanned and the results merged before accessing the table.
ii) A single composite index (create_date,update_date),leading to a fast full index scan.
iii) A single composite index (update_date,create_date), rewriting the query predicate as …
   (create_dt >= trunc(sysdate)-1 and create_dt < trunc(sysdate) and update_date is null)
or (update_dt >= trunc(sysdate)-1 and update_dt < trunc(sysdate))
… and leading to two index range scans. (not sure about this)
iv) A single-column function based index on (Nvl(update_dt,create_dt)) and rewriting the predicate appropriately.
Whichever of these is chosen the growth pattern for the data naturally tends towards index contention as all the new values are equal to sysdate.
 
So the nub of my problem comes down to these questions:
 
Is it possible to mitigate the index contention problem with a suitably high initrans values? Does it seem likely that implementing a reverse index would be worthwhile in reducing contention (I could probably take the pain of the fast full scan if it was, although it would lead to a larger index due to the 50/50 block splits).
 
Would you regard implementing a function-based index as risky in any significant way on major tables of a busy OLTP system? How about a reverse function based index?
 
Basically, “what would you do”?
 
Maybe I’m overthinking it, but I’d like to go to the sytem developers with a fully thought out set of options (even if they do tell me to take a running jump anyway).

 

 

Posted in Data Warehousing, Oracle, Performance | 26 Comments »