How to Create Multiple Indexes in a Single DDL Statement

First, raise an enhancement request with your database vendor …

Yes, don’t get excited — I don’t know of a way to do it. I wish there was though, and the ability to rebuild multiple indexes (or partitions of different indexes) at the same time would be useful because …

Well the problem is in the domain of data warehousing — frankly, all the interesting problems are there ;) — and it arises in ETL operations when we want to load to an unindexed tables and then create or rebuild multiple indexes or index partitions on that data. With each of those index operations we need a full table or partition scan (you know, let’s just assume the “or partition” bit from now on) and for a large data set that may mean very high physical read rates are required.  Although we can avoid or mitigate the physical reads by building multiple indexes at the same time in different sessions using dbms_jobs or dbms_scheduler and relying on caching in the SGA or at a hardware level, we still incur the logical reads.

So I’d like to be able to create multiple indexes based on a single scan of the data segment. A “Create Indexes” statement in which some slave process spits out the required columns and rowids to a set of processes which handle the sorts etc.. Even if it meant writing a temporary (compresed?) data set of the required columns that then gets scanned multiple times then as long as we’re not indexing the majority of the columns on the table we could still see a net benefit.

I really can’t think of a downside to this. Of course, you could try to build fifty indexes on a fact table in a single operation and end up cooking your CPU’s and/or blowing out your temporary disk space and/or memory and incurring horrendous levels of the dreaded multipass sort, but so much of what we do is finding that sweet spot between two extremes anyway. Maybe building those fifty indexes in ten groups of five would reduce the number of reads of the data segment by a factor of five without spilling the sorts to disk, and would also put some decent load on those pricey CPU’s.

Enhancement request time again, then.

Actually, I suppose you can create mulitple indexes in a single DDL statement …

Create table my_table
  (col1 constraint xpk_my_table primary key,
  col2 constraint unq_my_table)
select blah
from whatdjamacallit;

Cheating really though, and I don’t suppose that it avoids rescanning the data segment. If I had a little time I’d test that with a bit of sql tracing.

“How do you create multiple indexes with a single statement” … now that’s quite the interesting interview question.


5 thoughts on “How to Create Multiple Indexes in a Single DDL Statement

  1. Wouldn’t buffer caching give you most of the benefit here, assuming you ran your index creates simultaneously in different sessions without parallelism?

    If it’s partitioned, you could build index partitions one at a time…

  2. Yes, and I have done that before, but …
    i) It’s a real pain to code
    ii) there are still the logical reads required
    iii) all of those buffer busy waits don’t look good.
    iv) No parallelism — although i recall hearing that using the cache hint with an parallel query does load blocks to the buffer cache, so you could start a parallel cached full table/partition scan and follow it up with simultaneous sessions building indexes noparallel (I’m not sure if parallel query on a table set to cache does the same thing). You’d need a buffer cache that was (probably considerably) larger than the data segment you’re scanning. On the other hand a cache in the hardware would help anyway.

  3. I have another thought on this — in some ways this is similar to multitable insert. You can reduce the impact of running multiple single table inserts from the same source tables by relying on caching, but ultimately multitable insert is still a valuable tool

  4. Sounds like a possible enhancement request against CREATE SCHEMA:

    The documentation states that parallel hints are supported but will be ignored – though only documented for tables. It’s not clear whether parallel creation of a single index is allowed. Most of the contents of the command would have to be executed sequentially, for dependency reasons.

    I can’t see any reason why multiple indexes from the statement couldn’t be created in parallel though.

    As you say, enhancement request time…

  5. I’m not sure if anyone’s using that command much, Mathew. There are so many limitations on it that I don’t.

    I feel like it’s one of those features where all of the underlying technologies are there, they just have to be stitched together.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s