Back to a More Simple Time

If we browse the Oracle Data Warehousing Guide we find a number of interesting database structures and features that can make our warehouses easier to load, faster to query, and simpler to manage: partitioning, parallelism, bitmap indexes, data segment compression, materialized views and query rewrite, for example. It has been a long, long time since I implemented anything of any size without at least two or three of these.

So let us imagine, gentle reader, a return to the days of simplicity when we had none of these — how would that look? How would it perform? Well it turns out that it would look a lot like the data warehouse that comes out of the box with Siebel Marketing Analytics. No partitioning? Check! Only the most ordinary, uncompressed b-tree indexes? Check! Etc.. You get the idea. And how will it perform? Our lucky band of adventurers is about to find out, as we implement it with the bare minimum of modifications to add some custom data elements from our Siebel CRM system.

Some current areas of concern:

  • A fact table with 50 indexes that in development consume 1.5Gb of space for every 1Gb of fact table data. How is the drop-all-indexes-and-rebuild-after-every-load strategy going to turn out with production-sized data?
  • The use of updates against the largest fact table, which means no data segment compression!
  • No foreign keys and no bitmap indexes. Do star transformations even work with that? (See the sketch just below this list.)
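
A minimal sketch of what that last concern is about, with made-up table and column names rather than the shipped Siebel schema: star transformation generally wants single-column bitmap indexes on the fact table's dimension-key columns plus the star_transformation_enabled parameter, and it does not require declared foreign key constraints.

    ALTER SESSION SET star_transformation_enabled = TRUE;

    -- Hypothetical fact table and dimension-key columns
    CREATE BITMAP INDEX w_revn_f_day_bmx  ON w_revn_f (day_wid);
    CREATE BITMAP INDEX w_revn_f_cust_bmx ON w_revn_f (contact_wid);

    -- With those in place a typical star query becomes a candidate for the transformation
    SELECT d.per_name_month,
           SUM(f.revn_amt)
    FROM   w_revn_f f,
           w_day_d  d
    WHERE  f.day_wid = d.row_wid
    AND    d.per_name_year = '2006'
    GROUP  BY d.per_name_month;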

On the plus side we do have a performance testing environment sized at around 80% of production data volumes, so we will have an opportunity to tweak the worst offenders. We do have parallelism and the licensing for everything that we might want to throw at the system, and we do have Quest Performance Analysis for Oracle already prepped on the test and production instances.

In preparation for performance testing I have a little mental checklist of what can be implemented quickly and quietly to tackle certain problems, such as replacing b-tree indexes with bitmaps, or a little partitioning, but it’s complexified by the presence of the “Datawarehouse Administration Console” (DAC), an apparently essential component of the architecture that is responsible for executing the 400+ Informatica workflows that load the staging and warehouse tables. The philosophy around it seems to be that anything platform-specific, such as dropping and creating indexes, is defined and controlled through DAC rather than through Informatica (where it is generally a little kludgy, even if it is serviceable). So we have some exploration to do on that product to see how it handles customisation of indexes (maybe it recreates them the way it found them, but I suspect some customisation is required to make it deal with creating “exotics” such as local bitmap indexes).
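
To make that concrete, this is the flavour of DDL I would like to end up with; a sketch only, with invented table and column names, and the point is that DAC would need to know how to drop and recreate such an index around each load:

    CREATE TABLE w_order_f (
      row_wid    NUMBER(10)   NOT NULL,
      day_wid    NUMBER(10)   NOT NULL,
      order_amt  NUMBER(22,7)
    )
    PARTITION BY RANGE (day_wid) (
      PARTITION p_2005 VALUES LESS THAN (20060101),
      PARTITION p_2006 VALUES LESS THAN (20070101),
      PARTITION p_max  VALUES LESS THAN (MAXVALUE)
    );

    -- The sort of "exotic" index in question
    CREATE BITMAP INDEX w_order_f_day_bmx
      ON w_order_f (day_wid)
      LOCAL NOLOGGING COMPUTE STATISTICS;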

We go into integration testing this coming week, with performance testing to follow. Should be an interesting ride.


13 thoughts on “Back to a More Simple Time”

  1. Good to know you’re still with us, David.
    As for the demented design, nothing surprises me anymore from the modern “application design” brigade. The most egregious designs are being put forward, and when folks who know what volumes of processing and capacity planning mean put forward their concerns, the whole thing is overridden with a magical “but the app server cache takes care of all that”.

    Let them sink in their own self-made hell, is my motto now.

  2. I don’t believe it’s really Dave. I think the sponge has been blogjacked and this is the work of some evil villain trying to pass themselves off as the bearded one!

    Tim…

  3. Pingback: Oracle Siebel Analytics Data Warehouse: Part II « The Oracle Sponge

  4. Hmm, maybe what I meant was “less complex”. Now that we’re into the meat of the project I can’t associate it with the concept of “simple” at all :(

  5. I work at a variety of customers implementing Siebel Analytics (now Oracle BI). While the OTB product works fine in most environments, this kind of post-installation tuning is not uncommon to account for customer-specific environments. I’ll be really interested to pick up any tips.

    Which version of Siebel Analytics are you working with? Siebel Analytics is meant to support most major database platforms so is built for the ‘lowest common denominator.’ The most recent version (7.9) introduces some database specific features such as bitmap index support for Oracle and SQL Server’s clustered indexes. Teradata specific features were introduced in 7.8 I think.

    Since it is built to run on any database and customers have different data profiles, it’s hard to imagine they could ship a product that implemented many of these Oracle-specific features. For example, they only ship a handful of aggregates, as these have been identified as commonly used by most customers and must be able to run on any database, so they implement them as database tables rather than MVs. It seems like it would be near impossible to manage the product if they shipped these aggregates as Oracle MVs, Teradata joins, DB2 views and SQL Server tables.

    Here’s a tip for you:
    DAC exposes the SQL it uses to build indexes in an XML file so you can modify how indexes are created. In the \DAC\CustomSQLS folder, there is an XML file ‘customsql.xml’ which has a template of all SQL commands DAC issues. You can modify the Oracle template to include Oracle-specific options such as parallelism, logging vs nologging, etc. You can also use this file to modify how DAC analyzes tables and indexes and truncates tables.
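
    As an illustration (my example, not the shipped file contents): once the Oracle template has had parallel and nologging clauses added, the statements DAC issues would come out looking something like this, with an invented index and table name:

        CREATE INDEX w_person_f_f1
          ON w_person_f (contact_wid)
          NOLOGGING
          PARALLEL
          COMPUTE STATISTICS;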

  6. Thanks Patrick.

    I think that the number one lesson we have learned is that Oracle 10g is a much, much better version to be extracting data from Siebel on than 9i, specifically because 9i will not swap join inputs on a hash outer join. Because of this we are staging data on a 10g database prior to extraction, and also creating skinny versions of some of our larger tables (e.g. s_order_item) to give better performance on some of the self-joins we encounter. The overhead on moving the data is generally in the network, so using a multitable insert to populate two or three different (compressed) versions of the major tables is not on the critical path for performance.
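
    A rough sketch of that staging step, with invented staging table names, approximate column names and an assumed database link back to the CRM instance; the compression only comes from creating the tables COMPRESS and loading them direct-path:

        CREATE TABLE stg_order_item_keys (
          row_id    VARCHAR2(15) NOT NULL,
          order_id  VARCHAR2(15),
          prod_id   VARCHAR2(15)
        ) COMPRESS NOLOGGING;

        CREATE TABLE stg_order_item_amts (
          row_id     VARCHAR2(15) NOT NULL,
          qty_req    NUMBER,
          net_price  NUMBER,
          last_upd   DATE
        ) COMPRESS NOLOGGING;

        -- One pass over S_ORDER_ITEM populates both skinny, compressed copies
        INSERT /*+ APPEND */ ALL
          INTO stg_order_item_keys (row_id, order_id, prod_id)
            VALUES (row_id, order_id, prod_id)
          INTO stg_order_item_amts (row_id, qty_req, net_price, last_upd)
            VALUES (row_id, qty_req, net_price, last_upd)
        SELECT row_id, order_id, prod_id, qty_req, net_price, last_upd
        FROM   s_order_item@crm_src;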

    So that has taken execution time for some extraction queries from 23+ hours (failed on a snapshot too old error and hammered the CPUs with billions of LIOs) down to 35 minutes.

    We did already modify the customsql.xml; the out of the box analyze was painful. An estimate_percent of 30 and a method_opt of ‘for all indexed columns size auto’ was killing us. We’re using 2% block sampling and ‘for all columns size 1’ now, which gives very acceptable performance, and we have reasonably aggressive dynamic sampling at level 4 to handle skewed data.
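
    In dbms_stats terms those settings amount to something like the following sketch (the schema and table names are placeholders), with the dynamic sampling set at the instance level:

        BEGIN
          dbms_stats.gather_table_stats(
            ownname          => 'DWH',            -- placeholder schema
            tabname          => 'W_ORDER_F',      -- placeholder table
            estimate_percent => 2,
            block_sample     => TRUE,             -- 2% block sampling
            method_opt       => 'FOR ALL COLUMNS SIZE 1',
            cascade          => TRUE);
        END;
        /

        ALTER SYSTEM SET optimizer_dynamic_sampling = 4;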

    Later on we’ll certainly be switching a lot of indexes to compress or bitmap, and probably introducing partitioning.

  7. Pingback: How to Create Multiple Indexes in a Single DDL Statement « The Oracle Sponge
