The Oracle Sponge

Oracle Data Warehouse Design and Architecture

Year-to-Date Query Improvement Through Transformation Tables

Posted by David Aldridge on 2006-08-15

A quick comparison here between two methods of deriving year-to-date metric values from a data warehouse fact table.

Suppose we have a simple fact table of the following form:

SQL> desc fct_sales
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------- 

PRODUCT_CD                                NOT NULL NUMBER
 LOCATION_CD                               NOT NULL NUMBER
 DATE_OF_SALE                              NOT NULL DATE
 SALES_AMT                                 NOT NULL NUMBER

We have a business requirement to report sales by year-to-date. This is a resource intensive query because we have to report every day’s sales against every subsequent day in the calendar (or fiscal) year.

Given a dimension table “dim_sales_dates” of the structure :

SQL> desc dim_sales_dates
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------- 

DATE_OF_SALE                              NOT NULL DATE
 MTH_OF_SALE                               NOT NULL VARCHAR2(8)
 QTR_OF_SALE                               NOT NULL VARCHAR2(7)
 YR_OF_SALE                                NOT NULL VARCHAR2(4)

Method 1 relies on modifying the join between the date dimension table and the fact table to the following form:

select   d.mth_of_sale,
         sum(sales_amt)
from     fct_sales f,
         dim_sales_dates d
where    f.date_of_sale between trunc(d.date_of_sale,'yyyy')
                            and d.date_of_sale and
         d.mth_of_sale in ('2005 M05','2005 M06')
group by d.mth_of_sale
/

The potential problem with this is that the high number of joins required between the two tables and the relatively small data volume for the dimension table suggests that a hash join would be the optimum method. However as is documented in the Database Performance Tuning Guide the hash join can only be used for equijoins, for reasons that are obvious if you know how hashes work.

An alternative is to materialize the result of the BETWEEN join in a transformation table that relates DATE_OF_SALE to those dates prior to it in the year. Here is a sample structure:

SQL> desc trn_sales_dates_ytd
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------- 

DATE_IN_YTD                               NOT NULL DATE
 DATE_OF_SALE                              NOT NULL DATE
 MTH_OF_SALE                               NOT NULL VARCHAR2(8)
 QTR_OF_SALE                               NOT NULL VARCHAR2(7)
 YR_OF_SALE                                NOT NULL VARCHAR2(4)

The column DATE_IN_YTD contains all values of date from 1st January for the year of DATE_OF_SALE up to and including DATE_OF_SALE — the primary key here is of course (DATE_OF_SALE, DATE_IN_YTD).

We can now rewrite the YTD query as follows:

select   t.mth_of_sale,
         sum(sales_amt),
         count(*)
from     fct_sales           f,
         trn_sales_dates_ytd t
where    f.date_of_sale = t.date_in_ytd and
         t.mth_of_sale in ('2005 M05','2005 M06')
group by t.mth_of_sale
/

The hash join is now permitted and theoretically the query performance can improve.

Here is a complete test script, and here is the result of an execution on Oracle 10.2.0.2.0.

The demonstration shows the following:

  1. Use of a nested loop in method 1 anda hash join in method 2
  2. A drop in the consistent gets from 109,971 to 2,550
  3. An improvement in wall clock time from 18.51 seconds to 4.98 seconds.

Conclusion

Implementing transformation tables to allow the use of equijoins in data warehouse queries can provide performance benefits that make them well-worth considering.

15 Responses to “Year-to-Date Query Improvement Through Transformation Tables”

  1. Pete_S Says:

    Giving away my secrets? ;-)
    The transform table gets a little more interesting if you also put in YTD (last year) and various moving totals.

    Have you also tried functions to return a table of dates? Worked nicely in SQL but a bit awkward to use in BO.

  2. David Aldridge Says:

    Heh, I’m sure you have plenty more Pete.

    “… functions to return a table of dates …” — sounds like an ideal topic for a blog entry. I look forward to it :D

  3. Mathew Butler Says:

    So, the transformation table is created and loaded once during warehouse load time. Hence, no need to include the LIO for the transformation table creation/load?

    Mat.

  4. joel garry Says:

    The transform table gets a little more interesting if you also put in YTD (last year) and various moving totals.

    The very first report I wrote as a junior programmer was that (in Basic+/DMS500). 20+ years and several generations of technology later, I had a brief contract to make a very similar report written in a db-independent 4GL finish in a useable time. The feelings of Sisyphusian deja vu were almost overwhelming. The funny thing is, that latter customer just refuses to use DW technologies or modern SQL features (they have EE). The result was, and remains, awk, sort and SQL in korn shell scripts to load some “special tables.”

  5. David Aldridge Says:

    Mat, yes the transformation table could either be populated in advance, or my own preference would be to maintain it by adding in new dates only as required. Either way the resources required to maintain it are insignificant in relation to the resources saved.

    Joel, it’s a funny thing, when Luddites are put in charge of data warehouses. Strange things happen.

  6. Pete-s random notes » Blog Archive » Year-to-date alternatives Says:

    [...] David Aldridge challenged me to write about functions that return lists of dates. So here goes [...]

  7. The Oracle Sponge » Blog Archive » Whoops! Says:

    [...] Year-to-Date Query Improvement Through Transformation Tables [...]

  8. Dave Edwards Says:

    Thanks, good post. Mike Kruckenberg mentions it in Log Buffer #6.

    Cheers,
    Dave Edwards
    Log Buffer

  9. Rogier Werschkull Says:

    I don’t quite get it, take the example below for data that has a yearmonth grain. if i would like the yearmonth totals for 2006Q1 and 2006Q2 i would get wrong results because I would get duplicates of the yearmonth in ytd column that will be joined to the facts…

    regards,
    Rogier Werschkull
    ————
    yearmonth in ytd yearmonth quarter
    200601 200601 2006Q1
    200601 200602 2006Q1
    200602 200602 2006Q1
    200601 200603 2006Q1
    200602 200603 2006Q1
    200603 200603 2006Q1
    200601 200604 2006Q2
    200602 200604 2006Q2
    200603 200604 2006Q2
    200604 200604 2006Q2
    200601 200605 2006Q2
    200602 200605 2006Q2
    200603 200605 2006Q2
    200604 200605 2006Q2
    200605 200605 2006Q2
    200601 200606 2006Q2
    200602 200606 2006Q2
    200603 200606 2006Q2
    200604 200606 2006Q2
    200605 200606 2006Q2
    200606 200606 2006Q2

  10. David Aldridge Says:

    Rogier,

    The month always has to be included in the select clause, and if you have any other higher dimensional columns in the YTD table they would be purely for predicating — for example, “give me the year-to-month totals for all months in quarter 2006 Q3″.

    If you wanted to apply the same method to year-to-quarter queries you’d use a different table ….

    2006Q1 2006Q1
    2006Q2 2006Q1
    2006Q2 2006Q2
    2006Q3 2006Q1
    2006Q3 2006Q2
    2006Q3 2006Q3
    2006Q4 2006Q1
    2006Q4 2006Q2
    2006Q4 2006Q3
    2006Q4 2006Q4

  11. Rogier Werschkull Says:

    Thank you David, it is clear to me now!

  12. Stephen Says:

    it is clear to me too!!!

    thanks

  13. Time series problems « Pete-s random notes Says:

    [...] series problems A while back David Aldridge and I both blogged about the joys of time series analysis on sparse data; that is, the sort of data [...]

  14. Joe Says:

    I can’t get to the complete test script, and the result of an execution on Oracle 10.2.0.2.0. Please correct link. thanks.

  15. Rittman Mead Consulting » Blog Archive » Year-to-date alternatives Says:

    [...] Aldridge challenged me to write about functions that return lists of dates. So here [...]

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>