The Oracle Sponge

Oracle Data Warehouse Design and Architecture

Archive for September, 2007

Oracle Siebel Analytics Data Warehouse: Part II

Posted by David Aldridge on 2007-09-10

Following on from this, it occured to me this morning that accurate systems statistics may be an important performance factor for us, with all of these indexing options that the designers so thoughtfully provided. *ahem*

This is because there is much evidence that the SANs we connect to are tuned for OLTP-stlye high i/o rate performance, rather than high bandwidth BI performance. My usual instinct to promote full table scans in preference to index-based access methods when accessing a high proportion of a table have led me astray a couple of times recently and it is specifically the balance between index-based single block reads and the multiblock reads associated with table/partition scans (hypothetically supposing that we have partitions) that I want the optimizer to get right.

This very regular reason for statistics aside, we also have another interesting issue – a Very Important System that has a Very Particular Problem. Although a full-sized performance environment (PRF) is available (and used) before every release, the performance characteristics of the storage are exactly the opposite on the performance system as they are on the production (PRD) system. Specifically, PRF is fast to read and slow to write, and PRD is slow to read and fast to write (or is it the other way round? I forget). The mount points are on different SANs for security reasons (things are generally buttoned-down very tightly here, as you’d probably expect of a company with numerous external partners and a very public website) so that probably accounts for the effect in some way, but the implications are rather interesting — if we want to improve PRD performance, we have to tune the statements in PRF that are already fast.

So creating indexes in PRF is fast, because the read is large and the write is small. In production … well, let us just say it’s a bit of a time consumer. Creating a large summary table with a high volume of data performs about the same in both systems but the critical path for performance is different.

System statistics sadly do not distinguish between read and write performance in 10g (afaik), but wouldn’t it be interesting if they did? It’s a little thought experiment for later … “What database design and optimizer decisions might be affected by different balances between read and write performance?”

Hmmm …

Oh, I upped the optimizer_dynamic_sampling from 2 to 4 … or at least I raised a Remedy ticket to have a DBA do it. Another symptom of a buttoned-down environment :D

Posted in Data Warehousing, Oracle, Performance, Statistics | 5 Comments »

Back to a More Simple Time

Posted by David Aldridge on 2007-09-09

If we browse the Oracle Data Warehousing Guide we find a number of interesting database structures and features that can make our warehouses easier to load, faster to query, more simple to manage etc.. Partitioning, parallelsim, bitmap indexes, data segment compression, materialized views and query rewrite, for example. It has been a long, long time since I implemented anything of any size without at least two or three of these.

So let us imagine, gentle reader, a return to the days of simplicity when we had none of these — how would that look? How would it perform? Well it turns out that it would look a lot like the data warehouse that comes out of the box with Siebel Marketing Analytics. No partitioning? Check! Only the most ordinary, uncompressed b-tree indexes? Check! Etc.. You get the idea. And how will it perform? Our lucky band of adventurers is about to find out, as we implement it with the bare minimum of modifications to add some custom data elements from our Siebel CRM system.

Some current areas of concern:

  • A fact table with 50 indexes that in development consume 1.5Gb of space for every 1Gb of fact table data. How is the drop-all-indexes-and-rebuild-after-every-load strategy going to turn out with production sized data?
  • The use of updates against the largest fact table. No data segment compression!
  • No foreign keys and bitmap indexes. Do star transformations even work with that?

 On the plus side we do have a performance testing environment sized at around 80% of production data volumes, so we will have an opportunity to tweak the worst offenders. We do have parallelsim and the licensing for everything that we moght want to throw at the system, and we do have Quest Performance Analysis for Oracle already prepped on the test and production instances.

In preparation for performance testing I have a little mental checklist of what can be implemented quickly and quietly to tackle certain problems, such as replacing b-tree indexes with bitmaps, or a little partitioning, but it’s complexified by the presence of the ”Datawarehouse Administration Console” (DAC), an apparantly essential component of the architecture that is responsible for executing the 400+ Informatica workflows that load the staging and warehouse tables. The philosophy around it seems to be that anything platform specific, such as dropping and creating indexes, is defined and controlled through DAC rather than through Informatica (where it is generally a little kludgy, evn if it is serviceable). So we have some exploration to do on that product to see how it handles customisation of indexes (maybe it recreates them the way it found them, but I suspect some customisation is required to make it deal with creating “exotics” such as local bitmap indexes).

We go into integration testing this coming week, with performance testing to follow. Should be an interesting ride.

Posted in Data Warehousing, Materialized Views, Oracle, Parallelism, Partitioning, Performance | 13 Comments »