Following on from my reply about code comments at AskTom, here is a lovely example from The Daily WTF of how not to do it. It looks like we have something to learn from both ends of the spectrum.
Does nobody do code reviews any more?
Posted by David Aldridge on 2009-09-15
Following on from my reply about code comments at AskTom, here is a lovely example from The Daily WTF of how not to do it. It looks like we have something to learn from both ends of the spectrum.
Does nobody do code reviews any more?
Posted in Oracle | 6 Comments »
Posted by David Aldridge on 2009-07-17
I was intrigued by this article in the New York Times. In brief, it reviews some tests on the relationship between colour and cognitive performance, and highlights the ways in which the brain is affected by colour in the environment. For example studies seem to suggest that exposure to the colour red promotes accuracy, attention to detail, and fact recall, whereas a blue environment promotes creative thinking and imagination.
Yellow seems to make people eat more food. Go figure.
I have a weak spot for studies like this. I recall reading of one that demonstrated quite convincingly that creativity in finding an elegant solution to a software development problem was stifled by the playing of music through headphones. How many developers do we know who do just that very thing, albeit it in many cases to drown out the babble of the marketing department that they are colocated with? That’s workplace creativity being strangled, that is.
What I find interesting about this colour study is that two desirable aspects of the software development process — creativity and accuracy — are apparantly promoted by different colours, and these aspects are also desirable at different stages in the development cycle.
Designing an ETL process? Creativity … get yourself in the blue room, and those intuitive leaps that suddenly simplify the whole process will flow naturally to your pen. Implementing the design as code? Red room for you, and you’ll be focused on syntax and detail, with fewer bugs and faster development.
And keep yellow out of the work place. The sedentary office life and donuts in the morning are already our worst enemy.
Posted in Oracle | 10 Comments »
Posted by David Aldridge on 2008-07-02
OK, we all love a clever piece of code to do something a little bit tricky, but has everyone lost sight of the benefits of simplicity?
My case study for the day, a thread on the Oracle forums in which various all_objects, user_objects, connect by, MODEL, xmltable, and pipelined function techniques are suggested for generating a list of days.
Well, my comments are in the thread. Just use a table … that’s why we have them. Keep … it … simple.
Posted in Oracle | 6 Comments »
Posted by David Aldridge on 2008-06-12
I was browsing a pretty interesting “e-book” this evening: “Developing Time-Oriented Database Applications in SQL” By Richard T. Snodgrass. I quietly bemoaned the missing functionality of Oracle in some regards, for instance the rather neat OVERLAPS predicate that returns true when two intervals … well, overlap. We tend to jump through hoops a little to achieve this sometimes, I felt.
But you never know, things have moved on since the Oracle 8, so I searched the 10g and 11g documentation to see if I’dbeen negligent in not reading the New Features Guide. Alas, nothing in the SQL Reference, but I did turn up this interesting entry:
ORA-30085: syntax error was found in overlaps predicate Cause: A syntax error was found during parsing an overlaps predicate. Action: Correct the syntax.
“How”, I asked myself, “can there be an error message relating to a non-existent feature”?
“What if”, I continued, “said feature is not missing, but merely undocumented?”
So I tested it.
And it works.
SQL*Plus: Release 10.2.0.1.0 - Production on Thu Jun 12 23:14:03 2008 Copyright (c) 1982, 2005, Oracle. All rights reserved. Enter user-name: XXX/XXX@XXX Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production With the Partitioning, Oracle Label Security and Data Mining Scoring Engine options SQL> select * from dual 2 where (date '2007-01-01', date '2008-01-01') 3 overlaps (date '2005-01-01', date '2006-01-01') 4 / no rows selected SQL> SQL> select * from dual 2 where (date '2007-01-01', date '2008-01-01') 3 overlaps (date '2005-01-01', date '2007-01-02') 4 / D - X SQL> SQL> select * from dual 2 where (date '2007-01-01', interval '5' year) 3 overlaps (date '2005-01-01', date '2007-01-02') 4 / D - X SQL> SQL> select * from dual 2 where (date '2007-01-01', interval '5' year) 3 overlaps (date '2005-01-01', interval '10' year) 4 / D - X SQL> SQL> select * from dual 2 where NOT (date '2007-01-01', date '2008-01-01') 3 overlaps (date '2005-01-01', date '2006-01-01') 4 / D - X SQL> SQL> SQL> with test_data as 2 (select date '2007-01-01' from_dt, date '2007-12-01' to_dt from dual union all 3 select date '2007-04-01' from_dt, date '2007-08-01' to_dt from dual union all 4 select date '2007-07-01' from_dt, date '2007-08-01' to_dt from dual union all 5 select date '2007-10-01' from_dt, date '2008-02-01' to_dt from dual union all 6 select date '2008-01-01' from_dt, date '2008-12-01' to_dt from dual) 7 select * 8 from test_data t1, 9 test_data t2 10 where (t1.from_dt,t1.to_dt) overlaps (t2.from_dt,t2.to_dt) 11 / FROM_DT TO_DT FROM_DT TO_DT --------- --------- --------- --------- 01-JAN-07 01-DEC-07 01-JAN-07 01-DEC-07 01-JAN-07 01-DEC-07 01-APR-07 01-AUG-07 01-JAN-07 01-DEC-07 01-JUL-07 01-AUG-07 01-JAN-07 01-DEC-07 01-OCT-07 01-FEB-08 01-APR-07 01-AUG-07 01-JAN-07 01-DEC-07 01-APR-07 01-AUG-07 01-APR-07 01-AUG-07 01-APR-07 01-AUG-07 01-JUL-07 01-AUG-07 01-JUL-07 01-AUG-07 01-JAN-07 01-DEC-07 01-JUL-07 01-AUG-07 01-APR-07 01-AUG-07 01-JUL-07 01-AUG-07 01-JUL-07 01-AUG-07 01-OCT-07 01-FEB-08 01-JAN-07 01-DEC-07 FROM_DT TO_DT FROM_DT TO_DT --------- --------- --------- --------- 01-OCT-07 01-FEB-08 01-OCT-07 01-FEB-08 01-OCT-07 01-FEB-08 01-JAN-08 01-DEC-08 01-JAN-08 01-DEC-08 01-OCT-07 01-FEB-08 01-JAN-08 01-DEC-08 01-JAN-08 01-DEC-08 15 rows selected. SQL> SQL> with test_data as 2 (select date '2007-01-01' from_dt, date '2007-12-01' to_dt from dual union all 3 select date '2007-04-01' from_dt, date '2007-08-01' to_dt from dual union all 4 select date '2007-07-01' from_dt, date '2007-08-01' to_dt from dual union all 5 select date '2007-10-01' from_dt, date '2008-02-01' to_dt from dual union all 6 select date '2008-01-01' from_dt, date '2008-12-01' to_dt from dual) 7 select * 8 from test_data t1, 9 test_data t2 10 where NOT (t1.from_dt,t1.to_dt) overlaps (t2.from_dt,t2.to_dt) 11 / FROM_DT TO_DT FROM_DT TO_DT --------- --------- --------- --------- 01-JAN-07 01-DEC-07 01-JAN-08 01-DEC-08 01-APR-07 01-AUG-07 01-OCT-07 01-FEB-08 01-APR-07 01-AUG-07 01-JAN-08 01-DEC-08 01-JUL-07 01-AUG-07 01-OCT-07 01-FEB-08 01-JUL-07 01-AUG-07 01-JAN-08 01-DEC-08 01-OCT-07 01-FEB-08 01-APR-07 01-AUG-07 01-OCT-07 01-FEB-08 01-JUL-07 01-AUG-07 01-JAN-08 01-DEC-08 01-JAN-07 01-DEC-07 01-JAN-08 01-DEC-08 01-APR-07 01-AUG-07 01-JAN-08 01-DEC-08 01-JUL-07 01-AUG-07 10 rows selected. SQL> explain plan for 2 with test_data as 3 (select date '2007-01-01' from_dt, date '2007-12-01' to_dt from dual union all 4 select date '2007-04-01' from_dt, date '2007-08-01' to_dt from dual union all 5 select date '2007-07-01' from_dt, date '2007-08-01' to_dt from dual union all 6 select date '2007-10-01' from_dt, date '2008-02-01' to_dt from dual union all 7 select date '2008-01-01' from_dt, date '2008-12-01' to_dt from dual) 8 select * 9 from test_data t1, 10 test_data t2 11 where (t1.from_dt,t1.to_dt) overlaps (t2.from_dt,t2.to_dt) 12 / Explained. SQL> set linesize 200 SQL> select * from table(dbms_xplan.display) 2 / PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------ ---------------------------------------- Plan hash value: 1640239735 ---------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 36 | 22 (37)| 00:00:01 | | 1 | TEMP TABLE TRANSFORMATION | | | | | | | 2 | LOAD AS SELECT | | | | | | | 3 | UNION-ALL | | | | | | | 4 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | | 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------ ---------------------------------------- | 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | | 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | | 8 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | | 9 | NESTED LOOPS | | 1 | 36 | 12 (0)| 00:00:01 | | 10 | VIEW | | 5 | 90 | 2 (0)| 00:00:01 | | 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6602_EE7BE7EC | 5 | 90 | 2 (0)| 00:00:01 | |* 12 | VIEW | | 1 | 18 | 2 (0)| 00:00:01 | | 13 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6602_EE7BE7EC | 5 | 90 | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------ ---------------------------------------- --------------------------------------------------- 12 - filter((INTERNAL_FUNCTION("T1"."FROM_DT"),INTERNAL_FUNCTION("T1"."TO_DT"))OVERLAPS(INTERNAL _FUNCTION("T2"."FROM_DT"),INTERNAL_FUNCTION("T2"."TO_DT"))) 26 rows selected.
So not only does it let you check two pairs of dates to see if they overlap, it lets you check if they do not overlap (pretty simple stuff), and it also accepts a date and interval expression in place of date pairs.
It doesn’t work for pairs of numbers though, it seems:
SQL> select * from dual 2 where (1,5) 3 overlaps (3,8) 4 / where (1,5) * ERROR at line 2: ORA-00932: inconsistent datatypes: expected DATE got NUMBER SQL>
Still, very cool! Undocumented, but still cool.
Does it make it more cool if it is undocumented?
Posted in Oracle | 21 Comments »
Posted by David Aldridge on 2008-05-21
I wrote a while ago about the harmful effect of check constraints on query optimization. I’ll pause while you catch up on that …
Anyway, this appears to have been addressed in 10.2.0.4, and it’s now safe to go back into the water on applying check constraints. Bug 5891471.
The bug description also notes that you can set event 10195 to disable transitive predicate generation using check constraints.
I haven’t tested any of this — I will if I get time. Unless someoneelse wants to …
Posted in Oracle | 4 Comments »
Posted by David Aldridge on 2008-05-07
Following on from the previous post, a little glitch in the plans: the instance appeared to crash late last night during the load. Hopefully not some exotic bug associated with dynamic sampling.
I modified the delete-and-lock strategy yesterday after remembering that we have a number of indexes that are created with the “compute statistics” option. Since DBMS_STATS.LOCK_TABLE_STATS is intended to prevent the modification of any statistics associated with a table I ran a quick test to see if compute statistics can be specified for an index created against a statistics-locked table, and indeed it cannot.
SQL Error: ORA-38029: object statistics are locked
38029. 00000 - “object statistics are locked”
*Cause: An attept was made to modify optimizer statistics of the object.
*Action: Unlock statistics with the DBMS_STATS.UNLOCK_TABLE_STATS procedure
on base table(s). Retry the operation if it is okay to update statistics.
Well, it was probably safer not to anyway for a first run.
Now, whatever happened to that database instance?
Posted in Oracle | Leave a Comment »
Posted by David Aldridge on 2008-05-06
I am absolutely sick and tired of dealing with issues where misleading partition statistics (for example) cause ETL queries to go crazy and try to sort-merge tables with tens of millions of rows each.
Since we’re about to run a test load on production data I’m going to drop all table and index statistics and lock them. OK, actually I’m going to back the statistics up first, and then I’m going to drop and lock them.
We’ll see how we do on a 100% dynamic sampling-based system …
Posted in Oracle | 15 Comments »
Posted by David Aldridge on 2008-04-25
“See-qwel” or “S-Q-L”? Have your say on this highly important matter here, and help shape the future of our industry.
Posted in Oracle | 11 Comments »
Posted by David Aldridge on 2008-04-17
I’m told that it’s very important to keep one’s brain in shape as one grows older. Or “drifts gently towards the twilight” as I prefer to think of it. Trying to look on the optimistic side, obviously. Fortunately I end each day with my brain wrung out like a well-used sponge courtesy of some ETL development work on a largeish data warehouse development.
Although the regular ETL development is outsourced, we are responsible for the coding that processes historical data, so on deployment we will run our historical load code and then the “other team’s” regular load will take over. Yes, it is a strange arrangement. And by “we” I really mean my collegues somewhere in the pestilent jungles of Brazil, surrounded by poisonous frogs and who-knows-what other kind of other fearsome thing, and living on a diet of rice and beans in their houses on stilts.
My own role is not so bad. I don’t have to get too involved in the detailed busness logic that the code has to implement, and thank goodness for that because frankly I’ve not seen anything so complex for … well, forever. So I sit here in a metaphorically lofty perch, making myself popular by criticising the hard work of others and encouraging them to greater efforts by calling them lazy if they stop work before midnight (the traditional management role). Yes, it’s a good life alright. I constantly remind them that the real aim of the project is to produce a process for loading five years of historical data that is faster than the “B-Team’s” load of single day.
I think our methodology works quite well though. When you are down in the weeds of the logic (did I mention that it is particularly complex logic?) it is always difficult to see the slightly larger picture of where you’re taking the data next and how you optimise both specific ETL stages and how you can adjust the overall process to make everything more efficient. It’s a little like pair programming, I guess.
And we do get to do the fun bit — 10 million records joining 0-M:0-N through eight OR’d join conditions to 10 million other records, with a couple of billion joins resulting that we then artfully winnow down to a few million. The optimization process is a little mind bending, with all sorts of issues that we wouldn’t normally worry about too much becoming vitally important in an effort to avoid blowing out the host server’s memory with a degree of parallelism of 40. I have to say that there was initially some FUD over the historical process, with estimates of the load duration being bandied about that were in the order of months — we never thought it would take more than five days, and it’s now looking like it’ll run in under one day, which means that we’d be able to start the daily load process directly afterwards without altering it. And I think we still have a few more tricks up our sleeves as well in case we need to buy an extra hour somehow.
In fact it’s turning into something of a stroll through the SQL Reference book. The list of features we’re touching on include analytic functions (which have become a second language for everyone now, along with the “PCTFREE 0 NOLOGGING COMPRESS” which I nearly appended to my sandwich order the other day), connect by clauses, and even some rather exotic stuff with finding the lowest set bit in a something-or-other that I don’t understand yet but which Heitor thought of while lying in bed one night and which he used to help reduce an eight hour process to fifteen minutes. They say that they don’t have siestas in Brazil, but maybe they should start.
Oh well, another morning … Brazil is online … back to work!
Posted in Data Warehousing, Oracle, Performance | 10 Comments »
Posted by David Aldridge on 2008-04-13
Greg Rahn has an excellent entry on statistics gathering and the use and abuse of dbms_stats and initialization parameters at the Structured Data blog, one of my favourites. I added a comment on dynamic sampling, of which I am a big proponent, and Greg has another recent post on a case where dynamic sampling saved the day. Hey, I commented on that too …
Posted in Oracle | 2 Comments »