Optimizing Materialized Views Part I: Principles For Analysis

Some Background

For a variety of reasons that are too painful to relate, query rewrite against some largish fact tables has been unsuccessful in the past, and manual builds of summary tables have been combined with the aggregate awareness of Business Objects to produce the same effect.

The major drawback to this was the extraordinary number of hoops that had to be jumped through to make a change. The creation of a new summary table required coordinated efforts by a database designer, a DBA, a Business Objects designer and an ETL designer. As we all know, the difficulty of coordinating such an effort scales as something like the 7th power* of the number of people involved; in other words, by the time you have four people involved the job will take forever to get done.

In other other words, new summary tables were never added.

However, now that we have an apparently robust and automatic method for maintaining the integrity of the materialized view refresh process and the resultant query rewrite ability, we find ourselves with a different choke-point on the creation of new materialized view-based summaries: the duration of the daily fast refresh process that accompanies the arrival of new fact table data. So here is a thought experiment that I'm going through at the moment.

"What techniques do we have for optimizing the refresh of materialized views?"

For my own selfish reasons I'm thinking about conventional fast refresh of materialized views based on a single table. Any resemblance between this scenario and others is purely coincidental, but may exist all the same.

The Scenario

So let us examine a scenario where we have multiple fast-refreshable materialized views based on a single table. Personally, when thinking about fast-refreshable MV's I prefer to consider that they are based on a materialized view log, not on the table itself. It is fairly trivial to demonstrate that a fast refresh of a materialized view is based entirely on the MV log, without any requirement for access to the master table: I did it for Oracle Support once by creating a fast-refresh, on-demand MV and taking the tablespace containing the master table offline before successfully refreshing the MV.
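A minimal sketch of that demonstration follows. All of the names here are hypothetical (a SALES master table in a FACT_DATA tablespace, with the MV log deliberately placed in a separate LOG_DATA tablespace); the point is simply that the fast refresh succeeds while the master table's tablespace is offline.

```sql
-- Hypothetical names throughout: SALES master table, FACT_DATA and
-- LOG_DATA tablespaces. The MV log is deliberately placed in a
-- different tablespace from the master table.
CREATE MATERIALIZED VIEW LOG ON sales
  TABLESPACE log_data
  WITH SEQUENCE, ROWID (date_of_day, location_cd, product_cd, amount)
  INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW sales_summary
  REFRESH FAST ON DEMAND
  AS SELECT date_of_day,
            SUM(amount)   AS sum_amount,
            COUNT(amount) AS cnt_amount,  -- required for fast refresh of SUM
            COUNT(*)      AS cnt_rows     -- required for any fast-refresh aggregate MV
     FROM   sales
     GROUP  BY date_of_day;

-- Apply some DML to the master table, then take its tablespace offline ...
ALTER TABLESPACE fact_data OFFLINE;

-- ... and the fast refresh still succeeds, because it reads only the
-- MV log and the MV itself, not the master table.
EXECUTE DBMS_MVIEW.REFRESH('SALES_SUMMARY', 'F');
```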

The MV log table serves as a change-capture mechanism for its master table. As for the materialized views themselves: some of them represent conventional summaries of the fact table. Some of them represent lists of distinct values of particular attribute columns in the table (such as Date_Of_Day, for example), in order to support fast retrieval of such values for the user interface. Some of these materialized views could logically be based on others; for example, a list of Date_Of_Day values could be based on a summary table that stored metrics at a dimensionality of (Date_Of_Day, Location_Cd, Product_Cd).
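As a sketch of the "distinct values" flavour (again with hypothetical names), a fast-refreshable single-table aggregate MV needs little more than the grouping column plus the mandatory COUNT(*):

```sql
-- List of distinct Date_Of_Day values, maintained by fast refresh
-- from the same MV log as the summaries (hypothetical names).
CREATE MATERIALIZED VIEW date_of_day_list
  REFRESH FAST ON DEMAND
  AS SELECT date_of_day,
            COUNT(*) AS cnt   -- mandatory for a fast-refresh aggregate MV
     FROM   sales
     GROUP  BY date_of_day;
```

Basing that list on the (Date_Of_Day, Location_Cd, Product_Cd) summary instead would make it a nested MV, which in turn requires a materialized view log on the summary MV itself.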

I suppose that we had also best define what we mean by "minimizing work". Again for my own selfish reasons I'm talking here about minimizing disk reads and writes, and in making those that occur as efficient as possible.

The issue here is in the multiple scans of a materialized view log that are required in order to support the maintenance of materialized views. If you have seven materialized views based directly on a fact table then I don't see that Oracle can avoid seven consecutive scans of the log in order to maintain them.

Reducing The Workload

In approaching this problem I'm dividing the approaches up into a number of categories.

  1. More efficient modifications to the materialized view log.
  2. More efficient reads of the materialized view log.
  3. Fewer reads of the materialized view log.
  4. More efficient update of the materialized views.
  5. Fewer materialized views.
  6. Erm …
  7. That's it.

Now it seems to me that any one of these categories, with the possible exception of the last two, holds promise for more efficient MV maintenance — and when they are combined we ought to be able to do something worthwhile.

Methodology

I think that I can say without any fear of contradiction that if you want to make a process more efficient then you have to start by fully understanding it. And to my knowledge there are few tools more powerful for rummaging in the internals of an obscure Oracle process than a SQL trace.

So here's the plan.

  1. Set up a representative scenario involving conventional fast refresh of multiple materialized views against a single master table, in the form of a SQL script.
  2. Execute the script with SQL tracing turned on.
  3. Extract from the trace file any statement that touches the materialized view log or the materialized views.
  4. Look for methodologies for optimizing each statement.
  5. Look for opportunities for reducing the number of statements executed.
  6. Measure.
  7. Change.
  8. Re-execute.
  9. Re-measure.
  10. Compare benefits
  11. Et cetera
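
For step 2, a minimal tracing sketch, using event 10046 at level 8 so that wait events are captured along with the SQL (the MV name is the hypothetical one from earlier; DBMS_MONITOR would do equally well in 10g):

```sql
-- Tag the trace file so it is easy to find in user_dump_dest
ALTER SESSION SET tracefile_identifier = 'mv_refresh';

-- Level 8 = SQL statements plus wait events
ALTER SESSION SET events '10046 trace name context forever, level 8';

EXECUTE DBMS_MVIEW.REFRESH('SALES_SUMMARY', 'F');

ALTER SESSION SET events '10046 trace name context off';
```

Running the resulting trace file through tkprof, then searching for MLOG$_ references, is one way of isolating the statements that touch the materialized view log.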

Well, as I mentioned previously these blogs have a habit of growing out-of-control, and personally I have the attention span of a goldfish so I'm going to split this up into a number of blogs. Hopefully this will also give anyone who spots an error or bad assumption the chance to leap in before I plow on down the Path of Wrongness.

So in the next blog I'll set up the representative scenario, run a SQL trace on it, and carve out the most tender parts of the internals for our later consumption.

continued here

* This rule is based on a mixed bag of Democrat, Republican, and Anarcho-syndicalist technicians with varying numbers of children, wives, previous work commitments and deadlines. Your mileage may vary.


6 thoughts on “Optimizing Materialized Views Part I: Principles For Analysis”

  1. Nobody talking to you about this, David?

    Which version of Oracle will you be using for your tests?
    I know that fast refresh of partitioned MVs worked differently in Oracle 9i than in 10g. A case of delete in 9i against truncate partition in 10g! Guess which was slow?
    I am looking at how fast refresh works with dimensional changes with a ‘report as now’ DW model – I’ll blog something when I get a round tuit (as my engineer friends say)

  2. I think we’re paddling a quiet backwater of the databasing world here Pete.

    That’s an interesting thing about the fast refresh … are you referring to PCT fast refresh there?

  3. Forgot the other question … I think it’ll be 10.1 — my 10.1 database has the Sales History “SH” sample schema on it, and I think I’ll dabble with using that.

    Because I’m concentrating on conventional fast refresh at the moment there ought to be fewer grounds for inter-version variation, and I’m anticipating that most of the techniques will be transferable to other versions.

  4. Oh how easy it is to become an expert on things nobody else knows about!

    Yes, I was talking about PCT on fact summary MV.

    I guess you should look at two types of refresh – the ones where a load of new fact is added (daily batch) and the ones where a dimension key changes for some rows (hierarchy moves)

  5. Well I could look at those, but at the moment I’m just looking at the simpler case of an MV based on a single master (fact) table. That’s not to say I can’t extend the case later of course, or that MV’s on multiple masters are a bad thing, but again I’m invoking the “selfish blogger” clause that allows me to just write about what I need to know right now :)
