Multitable Insert and Distinct Values

I puzzled a little on the way to work about efficiently populating a hierarchy table based on a flat dimension. (“Why?” … seriously, you just don’t want to know that).

So let us suppose you have a table like this:

create table dim 
  (city varchar2(30), 
  state varchar2(30), 
  region varchar2(30), 
  country varchar2(30)) 
/

… and data like this … 

insert into dim values ('Fairfax'         ,'Virginia','US-East'   ,'USA'  ); 
insert into dim values ('Merrifield'      ,'Virginia','US-East'   ,'USA'  ); 
insert into dim values ('Alexandria'      ,'Virginia','US-East'   ,'USA'  ); 
insert into dim values ('Portland'        ,'Maine'   ,'US-East'   ,'USA'  ); 
insert into dim values ('Colorado Springs','Colorado','US-Central','USA'  ); 
insert into dim values ('Denver'          ,'Colorado','US-Central','USA'  ); 
insert into dim values ('Longmont'        ,'Colorado','US-Central','USA'  ); 
insert into dim values ('Salt Lake City'  ,'Utah'    ,'US-Central','USA'  ); 
insert into dim values ('Rome'            ,'Lazio'   ,'IT-Central','Italy');

… and you want to turn it into a hierarchy in the following table …

create table hier 
(attribute varchar2(30), 
parent varchar2(30), 
lvl varchar2(30)) 
/

You could do it as follows:

insert into hier 
select          city   , state  , 'City'    from dim 
union all 
select distinct state  , region , 'State'   from dim 
union all 
select distinct region , country, 'Region'  from dim 
union all 
select distinct country, null   , 'Country' from dim 
/

No distinct on the first select clause because we assume that City is unique (it would be if it were the PK on a real dimension, of course). In the absence of indexes to support fast full index scans we have four full table scans and three operations to support the distinct operator …

----------------------------------------------------------------------------                                                                                                                                                                                                                                 
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                                                                                                                                                 
----------------------------------------------------------------------------                                                                                                                                                                                                                                 
|   0 | INSERT STATEMENT    |      |    25 |   418 |    11  (82)| 00:00:01 |                                                                                                                                                                                                                                 
|   1 |  UNION-ALL          |      |       |       |            |          |                                                                                                                                                                                                                                 
|   2 |   TABLE ACCESS FULL | DIM  |     9 |   171 |     2   (0)| 00:00:01 |                                                                                                                                                                                                                                 
|   3 |   HASH UNIQUE       |      |     9 |   162 |     3  (34)| 00:00:01 |                                                                                                                                                                                                                                 
|   4 |    TABLE ACCESS FULL| DIM  |     9 |   162 |     2   (0)| 00:00:01 |                                                                                                                                                                                                                                 
|   5 |   HASH UNIQUE       |      |     5 |    75 |     3  (34)| 00:00:01 |                                                                                                                                                                                                                                 
|   6 |    TABLE ACCESS FULL| DIM  |     9 |   135 |     2   (0)| 00:00:01 |                                                                                                                                                                                                                                 
|   7 |   HASH UNIQUE       |      |     2 |    10 |     3  (34)| 00:00:01 |                                                                                                                                                                                                                                 
|   8 |    TABLE ACCESS FULL| DIM  |     9 |    45 |     2   (0)| 00:00:01 |                                                                                                                                                                                                                                 
----------------------------------------------------------------------------                                                                                                                                                                                                                                

A pretty clean looking piece of SQL, but multiple scans of tables make my hackles rise. If this were a very large table and indexes couldn’t be leveraged then it could be a performance problem.

An alternative occured to me:

insert all 
when 1               = 1 then into hier values (city   ,state  ,'City'   ) 
when state_row_ind   = 1 then into hier values (state  ,region ,'State'  ) 
when region_row_ind  = 1 then into hier values (region ,country,'Region' ) 
when country_row_ind = 1 then into hier values (country,null   ,'Country') 
select city, 
       state, 
       region, 
       country, 
       row_number() over (partition by region  order by 1) region_row_ind, 
       row_number() over (partition by state   order by 1) state_row_ind, 
       row_number() over (partition by country order by 1) country_row_ind 
from dim 
/

A single scan of the dim table is performed, with three window sorts:

--------------------------------------------------------------------------------                                                                                                                                                                                                                             
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                                                                                                                                             
--------------------------------------------------------------------------------                                                                                                                                                                                                                             
|   0 | INSERT STATEMENT        |      |     9 |   963 |     5  (60)| 00:00:01 |                                                                                                                                                                                                                             
|   1 |  MULTI-TABLE INSERT     |      |       |       |            |          |                                                                                                                                                                                                                             
|   2 |   INTO                  | HIER |       |       |            |          |                                                                                                                                                                                                                             
|   3 |   INTO                  | HIER |       |       |            |          |                                                                                                                                                                                                                             
|   4 |   INTO                  | HIER |       |       |            |          |                                                                                                                                                                                                                             
|   5 |   INTO                  | HIER |       |       |            |          |                                                                                                                                                                                                                             
|   6 |    VIEW                 |      |     9 |   963 |     5  (60)| 00:00:01 |                                                                                                                                                                                                                             
|   7 |     WINDOW SORT         |      |     9 |   288 |     5  (60)| 00:00:01 |                                                                                                                                                                                                                             
|   8 |      WINDOW SORT        |      |     9 |   288 |     5  (60)| 00:00:01 |                                                                                                                                                                                                                             
|   9 |       WINDOW SORT       |      |     9 |   288 |     5  (60)| 00:00:01 |                                                                                                                                                                                                                             
|  10 |        TABLE ACCESS FULL| DIM  |     9 |   288 |     2   (0)| 00:00:01 |                                                                                                                                                                                                                             
--------------------------------------------------------------------------------

The cost of the two methods is exactly the same, but the cardinalities are both out — the correct answer is “19″. Of the two methods I’d expect the first to be more accurate and it did do pretty well. With such a small amount of data it’s difficult to make an accurate comparison, and the proper approach would be to benchmark each method on the production data.

One interesting difference is that with the single table insert method you can control the order of insert into the hierarchy table, which might be important if you wanted to physically cluster particular rows together. I don’t think that there is a deterministic way of controlling the insert order in the multitable insert.

Anyway, that was it. No rocket science, just an idle thought.

About these ads

One thought on “Multitable Insert and Distinct Values

  1. Darn it, here you go making me look at the docs and see things like “You cannot specify a sequence in any part of a multitable insert statement. A multitable insert is considered a single SQL statement. Therefore, the first reference to NEXTVAL generates the next number, and all subsequent references in the statement return the same number.”

    No rocket science, but maybe some subatomic physics? There’s a dead cat trying to claw its way out of my head now.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s