Comments (the type you get in code, that is)

“Supposed” to get, that is. What is it with SQL and comments? Why do we so rarely see explanatory comments in SQL code? Or even a decent structure?

One of the features of front-end tools like Business Objects or Microstrategy is their ability to allow the designer to create extraordinarily long and complex looking SQL with ease. (If you don’t use such tools then don’t go away — I’ll deal with you in a moment). It starts by having an object or two that are pretty simple, then you add them together … then you divide by another two objects added together, but then you need a decode on the divisor to change zero’s to null’s so that the SQL doesn’t error out, and pretty soon you’ve got an expression with a length of 1000 characters that is unreadable to the human eye.

Of course it’s not just extremely long expressions that are tricky … sometimes it’s to look at a piece of SQL even to find out which simple expression relates to which object in the universe/project/whatever. So do me a favour and put a comment block in the definitions … something like …

Sum(order_amt)/Decode(Sum(order_qty),0,Null,Sum(order_qty)) /* “Average Unit Cost” */

… and it’ll appear in the SQL itself. Not too tricky, eh?

You know, I do have a method for doing this automatically by hacking a Business Objects repository with an update statement, but if I told you how to do that and you fkd it up then you’d probably sue me or something.

By the way, the same thing applies to join definitions.

Moving along … if you’re hand-coding SQL statement then you have even less excuse for not commenting your SQL. If you follow the principle that it is better to do things in SQL than PL/SQL then you’re probably packing a fair amount of functionality into each statement, and that means …

  • Write a header to say what the hell the SQL is supposed to do … doesn’t have to be in the SQL itself, just above it will do nicely. In fact, work out this statement even before you write the SQL itself and you’ll keep a more clear mind when you are writing the code. Include in that header how it is going to do it — remember that complex SQL packs a huge amount of functionality into a small amount of text — it’s much more “logically dense” than regular 3GL code. TK has an example of this here, and thanks for the suggestion to include this, Tom.
  • Maybe you might even like to put in a comment about performance, eh? “Runs in about 0.2 seconds” … “Uses a lot of hash joins” … etc
  • Put a statement inside the SQL to say where in the code it is located (package/procedure). You wouldn’t inflict a twenty-page procedure on anyone so it’ll be easy to find.
  • You might even like to put in a unique identifier for the SQL as well as it’s location (for example, so that it’s easy to reference the copy of a “good” explain plan that you’ve kept for that query, in case of problems down the road).
  • Format the code nicely
  • NEW ITEM: if you are referencing any view more complex than “SELECT blah from whatever”, or if any of the tables are accessed remotely through a db link that is obscured by a synonym, or is actually the query name for a subquery factoring clause, then a comment would be appropriate. And when I say “appropriate” I mean just do it.
  • Clearly seperate the joins from the filters
  • Any tricky looking expressions, state what they do. Especially analytic functions and subquery factors (“what they?“)
  • Document anything that looks funky (if you are using “… and rownum > 1” in an in-line view to stop the optimizer from pushing predicates then say so … not everyone is as smart as you are).
  • (Except Tom Kyte (“who he?“), who tells me that he’s perfect. My opinion: man, has he aged since his photo in “Expert One On One”. Am I right, ladies?).
  • Erm …
  • For the love of God, use table aliases. Friendly ones, not a, b, and c.
  • Just use your common sense. People have to support that code
  • That’s it

Back to work, fellas.

Advertisements

Thread about block sizes

An interesting question from sejasp on blocks sizes … http://dba.ipbhost.com/index.php?showtopic=1512

A five-times improvement in i/o performance, just by changing a block size? It's the sort of thing that ought to be making the hairs on the back of your neck rise.

UPDATE: April 23rd

Well the thread has morphed into the issue of whether it is appropriate/rude/presumptious of a person to ask questions of the original poster in a forum thread.

My opinion, which I think I state pretty clearly there, is based on the following:

  • A person has asked a question in a public forum, requesting advice.
  • As their question stands, it offers the possibility of a number of different answers, and some of the answers may be inappropriate.
  • Erm … that's it.

The most appropriate answer for the situation may be, "I do not think that any action is required here … what you perceive as a problem does not in my opinion justify any corrective action." You know that's advice that I've given quite frequently. Here's some examples of questions to which the answer is "Do Nothing"

  1. My data warehouse application is not using bind variables … what should I do?
  2. I have a number of indexes with more than 120 extents … how should I reduce this?
  3. My buffer cache hit ratio has dropped below 99% to 97% … help!

So under these circumstances, can anyone tell me what is wrong with probing for more information? Especially when you are giving free advice.

UPDATE: 24th April

Thanks for the nice thoughts there, fellas. Much appreciated.

UPDATE: 25th April

Happy ending — see, I knew sejasp was a good bloke on account of being an Aussie. I have about nine great relatives over there (mostly poaching, but a little light larceny also).

Hey, just jokes fellas. Happy ANZAC day. (Is that appropriate? Is ANZAC day potentially a happy thing? comments welcome, as always)

Thoughts on Cost Based Optimization

Just a brief note.

In general terms there are three elements to getting a good execution plan out of Oracle’s Cost Based Optimizer (CBO).

The first of these is to give Oracle sufficient information about the data – “metadata”. This means, for example:

  • Using constraints where possible to indicate uniqueness, nullability, foreign key relations
  • Using statistics to indicate data volumes and distributions

The second of these is to give Oracle sufficient information about the system on which the instance is running. This means, for example:

  • Correct values for initiation parameters such as db_file_multiblock_read_count, cpu_count.
  • Before 9i, appropriate values for optimizer_index_cost_adj
  • From 9i upwards, use of DBMS_Stats.Gather_System_Stats() to benchmark the hardware.

The third of these is to write SQL that the optimizer can work with. I have no examples of this, but I have an instinctive feeling that it is possible to write suboptimal SQL that is “difficult” to optimize. If I think of anything more substantial then I’ll update.

So given that you believe that you have a CBO-related problem, I would propose that there are three categories of investigation: Metadata, Configuration, and SQL.

I can’t decide whether this is blindingly obvious to everyone or a sudden flash of meaningful insight. Is there a way of conducting a poll in this blog? Just post a comment. “Tautology” or “Genius”.

As always, thoughts are welcome, particularly on that last point but on anything herein of course.

Upgrade News

Informatica

The worlds smallest hooray for our upgrade from Informatica 5 to the latest version 7. Apparantly in version 5 when some numeric fields where contaminated with non-numerics, they were just being set to zero. Version 7 correctly rejects these rows, thus revealing a problem with one of our source data feeds. Looks like some of the columns sometimes get shifted over.

Informatica is not-so-good at dealing with such an oddity, although to be fair I suppose that if a file is suypposed to be “fixed width” then the very least that you could expect of it is that it’s widths would be in some way “fixed”.

I guess that we’ll probably apply some kind of “preprocessing” script to the raw data files to detect, correct, and log the problem.

Oracle

I’m a little startled to find that an upgrade to 10g is suddenly a possibility/probability. This brings to mind the saying about being careful what you wish for, ‘cos my mental (at the moment) list of things to be be done is stretching out towards the distant horizon.