Archive for the ‘Materialized Views’ Category
Posted by David Aldridge on 2008-03-17
I interviewed a couple of senior ETL developer candidates last week, so I shook the dust off of some of my tried-and-true probes for technical knowledge.
I’ve always had a liking for asking for the differences between “truncate” and “delete” ever since I was asked it myself some years ago. I think it allows candidates to demonstrate a nice range of knowledge.
Here are some answers in no particular order.
-
“Truncate is fast, delete is slow”. Hmmm, well that’s mostly true I guess. The most basic or all answers.
-
“Truncate is DDL, delete is DML”. Is this just so obvious that people don’t mention it?
-
“You can’t rollback a truncate”. Indeed. See 2 above, of course. And 1.
-
“Truncate is implicitly a commit”. A better answer than 3 I think. I seem to recall that someone once mentioned that there are actually two commits in a truncate, one before and one after … but maybe I dreamed that. I should test it really.
-
“You can’t grant permission to truncate a table”. Ah, practical experience shines through. If you don’t like your ETL process to connect as the owner of the schema then this is a challenge that has to be overcome with stored procedures or something sophisticated like that. You really don’t want to grant “DROP ANY TABLE” to your ETL user.
-
“You can delete any subset of rows, but you can only truncate the complete table, or a partition or subpartition of it”. Is this also so obvious that nobody mentions it?
-
“Truncate makes unusable indexes usable again”. A real gotcha for the unwary. If you attempt to optimise a data load by rendering indexes unusable and truncating a table (possibly followed by an index rebuild and a partition exchange) then be careful of the order.
-
“Truncate can’t maintain foreign keys”. It’s “cascading delete”, not “cascading truncate”. That would be an interesting feature though — point 4 above would make it a little trickier. Truncating an index cluster is pretty close to a “cascading truncate” to a limited extent though. In any case no truncate is permitted on a table referenced by foreign keys.
-
“You can’t flashback a truncate”. This is an oddity to me. We can flashback a “drop table”, rollback uncommited deletes, or use flashback to recover pre-commit deleted data, but a truncate is a barrier across which we cannot flashback.
-
“Truncate deallocates space, delete doesn’t”. Unless you want it not to, using the “reuse storage” clause. However the high water mark is reset in either case so maybe that’s a better answer …
-
“Truncate resets the high water mark, delete doesn’t”. And on the indexes, also. See 7 above also.
-
“Truncate allows special treatment of materialized view logs”. More of that practical experience and/or documentation-reading skill shows through.
-
“DML triggers do not fire on a truncate”. Because … um … it’s DDL not DML.
I suppose there are other issues as well, but this shows that it’s a complex business. I really wish we could grant “Truncate table” to a user or role … enhancement request time I think.
Isn’t it interesting that in some circumstances only one method can be used to remove all rows from a table in a single step, sometimes both methods can. In other circumstances both methods can remove a subset of rows from a table, and sometimes both can.
Oh well, back to work.
Posted in Data Warehousing, Materialized Views, Oracle, Partitioning, Performance | 22 Comments »
Posted by David Aldridge on 2007-09-09
If we browse the Oracle Data Warehousing Guide we find a number of interesting database structures and features that can make our warehouses easier to load, faster to query, more simple to manage etc.. Partitioning, parallelsim, bitmap indexes, data segment compression, materialized views and query rewrite, for example. It has been a long, long time since I implemented anything of any size without at least two or three of these.
So let us imagine, gentle reader, a return to the days of simplicity when we had none of these — how would that look? How would it perform? Well it turns out that it would look a lot like the data warehouse that comes out of the box with Siebel Marketing Analytics. No partitioning? Check! Only the most ordinary, uncompressed b-tree indexes? Check! Etc.. You get the idea. And how will it perform? Our lucky band of adventurers is about to find out, as we implement it with the bare minimum of modifications to add some custom data elements from our Siebel CRM system.
Some current areas of concern:
- A fact table with 50 indexes that in development consume 1.5Gb of space for every 1Gb of fact table data. How is the drop-all-indexes-and-rebuild-after-every-load strategy going to turn out with production sized data?
- The use of updates against the largest fact table. No data segment compression!
- No foreign keys and bitmap indexes. Do star transformations even work with that?
On the plus side we do have a performance testing environment sized at around 80% of production data volumes, so we will have an opportunity to tweak the worst offenders. We do have parallelsim and the licensing for everything that we moght want to throw at the system, and we do have Quest Performance Analysis for Oracle already prepped on the test and production instances.
In preparation for performance testing I have a little mental checklist of what can be implemented quickly and quietly to tackle certain problems, such as replacing b-tree indexes with bitmaps, or a little partitioning, but it’s complexified by the presence of the ”Datawarehouse Administration Console” (DAC), an apparantly essential component of the architecture that is responsible for executing the 400+ Informatica workflows that load the staging and warehouse tables. The philosophy around it seems to be that anything platform specific, such as dropping and creating indexes, is defined and controlled through DAC rather than through Informatica (where it is generally a little kludgy, evn if it is serviceable). So we have some exploration to do on that product to see how it handles customisation of indexes (maybe it recreates them the way it found them, but I suspect some customisation is required to make it deal with creating “exotics” such as local bitmap indexes).
We go into integration testing this coming week, with performance testing to follow. Should be an interesting ride.
Posted in Data Warehousing, Materialized Views, Oracle, Parallelism, Partitioning, Performance | 13 Comments »
Posted by David Aldridge on 2007-01-03
First, a disclaimer: the techniques herein represent “blue-sky” thought on an abstract problem. I have not implemented these ideas, although I am very willing to do so given the chance and if nobody can postulate a fatal objection to them. Or a near-fatal one. If you disagree with any of these ideas then you are not contradicting my experience and I’m not going to get all defensive over the matter.
And so to the substance … Read the rest of this entry »
Posted in Business Objects, Data Warehousing, Materialized Views, Oracle, Partitioning, Performance | 2 Comments »
Posted by David Aldridge on 2006-09-25
I submitted an abstract for a presentation at the Rocky Mountain OUG Training Days 2007, on the subject of materialized views, summaries, fast refresh and whatnot. Obviously a lot of the substance will come from the material I’ve been posting here for the past year-and-a-bit, emphasising conceptual understanding, analysing the internals of the refresh process, a caveat here and there, the ongoing search for optimisation techniques, and with other material to tie it together as a coherent topic.
Since this is aimed at all levels of experience, from “what is a materialized view” through to “Ah, but in the case of partitioning of a HOLAP-based MV …” what issues would you like to see covered in the talk?
Here is what I have so far, in no particular order …
- When and what to summarise.
- Components of a materialized view (table, metadata etc).
- Query rewrite.
- Complete and fast refresh.
- Fast refresh following conventional path, direct path, and partition DDL operations.
- Propagating changes to the MV data.
- Performance issues.
- Alternatives to MV use and tuning manual refresh of summaries.
- DBMS_Advanced_Rewrite
- Caveats (32-column limitation on GROUP BY, ORA-12034, PCT and prebuilt tables etc)
A lot of ground to cover, but I’ve optimistically applied for a 90 minute slot — I think I could manage double that at a pinch!
What would you emphasise out of that, and what would you add?
Posted in Materialized Views | 10 Comments »
Posted by David Aldridge on 2006-08-28
pranavgovind hit this problem, as described on the DBA Support.com forum, wherein Oracle ’s attempt to index all the columns of an aggregating materialized view fails when more than 32 columns are included in the GROUP BY, and the error “ORA-01793: maximum number of index columns is 32″ is raised.
I added my own thoughts there, but this is symptomatic of some more general failings in the area of materialized views in which the creation and refresh codes do not take as much advantage of chances to optimise as a human could.
For example, consider a materialized view built against a fact table joined to a date hierarchy, so as to define the MV query like this. In such a case only the date_of_month column needs to be indexed — the inclusion of any other columns represents a waste of resources. In fact in this case the index ought to be declared to be UNIQUE, whether or not the Sys_Op_Map_NonNull() function is used on the column in order to be compatible with Oracle’s recursive SQL for refreshing materialized views. However Oracle would wastefully index all four of the GROUP BY columns.
Posted in Materialized Views, Oracle, Performance | Leave a Comment »
Posted by David Aldridge on 2006-08-08
Here is a thought that is brought on by this posting at the Dizwell forum.
The question seem to me to boil down to this essential issue: “Can a large memory allocation compensate for poor I/O performance in a data warehouse?”. It also seems to me that the answer to this question is a partly-qualified “No”, for the following reasons. Read the rest of this entry »
Posted in Data Warehousing, Materialized Views, Parallelism, Performance | 13 Comments »
Posted by David Aldridge on 2006-05-18
Introduction
This is a basic topic for Oracle data warehousing beginners, based on some ideas that I’m hoping will stop buzzing round in my head if I commit them to virtual paper.
There are three Oracle features that provide a foundation for successful data warehousing:
- Partitioning
- Parallelism
- Psummary Tables (the “p” is silent)
Here are the benefits that they bring to the system. Read the rest of this entry »
Posted in Data Warehousing, Materialized Views, Parallelism, Partitioning, The Best of The Oracle Sponge | 14 Comments »
Posted by David Aldridge on 2006-04-12
I was asked in an email this morning about how to improve materialized view refresh performance, and it’s something that appears in Google hits for the blog quite frequently. So I thought I’d write a quick-and-dirty posting to summarise some of the research (and the resulting practical applications) that I’ve been fiddling with recently.
Here are some bullet points:
- The materialized view fast refresh mechanism is a one-size-fits-all solution, and is probably not efficient for 99% of summary table maintenance operations.
- The join of the aggregated change data to the MV is function-based, as the columns of both relations are wrapped in the Sys_Op_Map_NonNull() function that allows “null = null” joins. I think that it is extremely unlikely that anyone has nullable attribute columns in their fact or summary tables, so this (and the composite function-based index required to support it) are a waste of resources.
- Because of the nature of the join it seems to be extremely unlikely that partition pruning of the summary table could take place.
- The join mechanism promotes nested loop joins, where a hash join is probably more efficient (that’s technically an outer join in the merge, of course).
- The refresh mechanism assumes that a merge will be required, when sometimes an insert is not only possible but is very much more efficient.
If performance (and robustness, IMHO) are an issue for you then I would advise that you do the following:
- Use materialized views only for enabling query rewrite (which means creating them on a prebuilt summary table, and unless you are loading to the summary by partition exchange then you have to drop the MV, refresh the summary table, and recreate the MV). In 10g it is much more easy to use the DBMS_Advanced_Rewrite package instead of MV’s.
- Write your own refresh code, based on the usual principles of writing good SQL. If you don’t need a merge then don’t use it. If you don’t need to join to dimension tables to get higher attributes then don’t do it.
- Leverage different levels of aggregation to help produce higher levels. For a series of hierarchical summaries, multi-level aggregations can be extremely beneficial.
- Consider storing the refresh and MV definition SQL in CLOB columns of a summary management table, so they can be tuned and edited without needing to open up package code to do so.
- Consider using a complete refresh, either through MV’s or manually, for higher aggregation levels, particularly when you can reference another summary table to do so.
My practical experience of taking this path delivered a reduction in the refresh time of a set of seven materialized views based on a single fact table, from a couple of hours down to six minutes. The publication of this is an example of what my client’s technical lead calls “polishing the crown” :D
Anyway, this is all a sort-of abstract from a whitepaper that I’m working on right now, which will include all sorts of proofs and demonstrations of the above, plus some process and risk management advice, but it doesn’t look like it’ll be ready very soon. Feel free to comment or ask questions on the points raised above though — I don’t want anyone to be kept in suspenders ’till the paper is finished.
Posted in Materialized Views | 17 Comments »
Posted by David Aldridge on 2006-04-05
An article by Jonathan Lewis on constraint problems with materialized views set off a little bell in my head about a metalink document on the subject, Note 284101.1.
The note protests the use of unique, foreign key and primary key constraints on materialized views, and includes the following interesting phrases …
“It is not a good idea to create unique (unique indexes ) constraints , primary key or Foreign key constraints on the snapshot base table … There should not be any Unique/PK constraints at Mview base table … Drop the Unique/PK constriants on Mview Base Tables.”
Then one of the workarounds is “You can have Deferred Constraints“. The document doesn’t identify why having deferred constraints might be a bad idea, or why not having integrity constraints on the table at all is preferable to having them deferred .. I can’t think of a reason why that might be so, and I can think of several reasons why it might not be so (query optimization being the prime one).
All this misdirection and ambiguity, combined with the abbreviation of an error code from “ORA-00001″ to “ORA-1″ in the title of a help document (does that make sense to anyone?) make this a pretty poor piece of work. It’s not the Most Useless Metalink Article Ever, but it’s close.
By the way, if you read the title and the first few words of this article, and thought “Dave’s going to say that Jonathan Lewis gave bad advice!” before realising I wasn’t … the effect was intentional :D If you didn’t think that then my plan failed.
Posted in Materialized Views | 2 Comments »
Posted by David Aldridge on 2006-02-21
I wrote a while ago that I didn't think that Partition Change Tracking (PCT) fast refresh was worth using, because eventually something will cause the refresh to fail and then, for various silly reasons, you can only get PCT FR back by completely rebuilding your MV in one shot. This is not a practical endeavour in many cases, hence PCT is not worth so much.
However, it has now occured to me that if one kept a backup copy of the data in the PCT-based MV, even if it is only as a regular table, then in the event of disaster such a table could be brought up to dat with respect to the master tables(s) and with the correct MV definition placed over it temporarily the complete rebuild of the PCT-based MV could be rewritten to effectively be a full table scan of this backup. that ought to be more achievable I suppose.
It still seems like a lot of trouble though.
Posted in Materialized Views | 4 Comments »