Interviewing: The difference between Delete and Truncate in Oracle

I interviewed a couple of senior ETL developer candidates last week, so I shook the dust off of some of my tried-and-true probes for technical knowledge.

I’ve always had a liking for asking for the differences between “truncate” and “delete” ever since I was asked it myself some years ago. I think it allows candidates to demonstrate a nice range of knowledge.

Here are some answers in no particular order.

  1. “Truncate is fast, delete is slow”. Hmmm, well that’s mostly true I guess. The most basic of all answers.
  2. “Truncate is DDL, delete is DML”. Is this just so obvious that people don’t mention it?
  3. “You can’t rollback a truncate”. Indeed. See 2 above, of course. And 1.
  4. “Truncate is implicitly a commit”. A better answer than 3 I think. I seem to recall that someone once mentioned that there are actually two commits in a truncate, one before and one after … but maybe I dreamed that. I should test it really. (A quick demonstration of this and point 3 appears after this list.)
  5. “You can’t grant permission to truncate a table”. Ah, practical experience shines through. If you don’t like your ETL process to connect as the owner of the schema then this is a challenge that has to be overcome with stored procedures or something sophisticated like that. You really don’t want to grant “DROP ANY TABLE” to your ETL user.
  6. “You can delete any subset of rows, but you can only truncate the complete table, or a partition or subpartition of it”. Is this also so obvious that nobody mentions it?
  7. “Truncate makes unusable indexes usable again”. A real gotcha for the unwary. If you attempt to optimise a data load by rendering indexes unusable and truncating a table (possibly followed by an index rebuild and a partition exchange) then be careful of the order.
  8. “Truncate can’t maintain foreign keys”. It’s “cascading delete”, not “cascading truncate”. That would be an interesting feature though — point 4 above would make it a little trickier. Truncating an index cluster is pretty close to a “cascading truncate” to a limited extent though. In any case no truncate is permitted on a table referenced by enabled foreign keys.
  9. “You can’t flashback a truncate”. This is an oddity to me. We can flashback a “drop table”, rollback uncommitted deletes, or use flashback to recover pre-commit deleted data, but a truncate is a barrier across which we cannot flashback.
  10. “Truncate deallocates space, delete doesn’t”. Unless you want it not to, using the “reuse storage” clause. However the high water mark is reset in either case so maybe that’s a better answer …
  11. “Truncate resets the high water mark, delete doesn’t”. And the same goes for the indexes; see 7 above.
  12. “Truncate allows special treatment of materialized view logs”. More of that practical experience and/or documentation-reading skill shows through.
  13. “DML triggers do not fire on a truncate”. Because … um … it’s DDL not DML.
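
A quick sandbox demonstration of points 3 and 4 (with the “reuse storage” clause from point 10 thrown in), assuming a scratch table T that we are free to empty:

    -- T is a scratch copy we can safely empty
    create table t as select * from all_objects;

    delete from t;
    rollback;                 -- the deleted rows reappear

    select count(*) from t;   -- back to the original row count

    truncate table t;         -- DDL: commits implicitly and cannot be rolled back
    rollback;                 -- has no effect; the rows are gone

    -- point 10: keep the allocated extents, though the high water mark is still reset
    truncate table t reuse storage;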

I suppose there are other issues as well, but this shows that it’s a complex business.  I really wish we could grant “Truncate table” to a user or role … enhancement request time I think.
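
For what it’s worth, the usual workaround for point 5 is a definer’s-rights procedure owned by the schema owner, with execute granted to the ETL account. A minimal sketch, with invented names:

    -- owned by the schema owner; definer's rights, so the caller needs neither
    -- ownership nor DROP ANY TABLE (names invented for illustration)
    create or replace procedure truncate_stage_sales
    as
    begin
       execute immediate 'truncate table stage_sales';
    end;
    /

    grant execute on truncate_stage_sales to etl_user;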

Isn’t it interesting that in some circumstances only one of these methods can remove all rows from a table in a single step, while in others both can; and that in some circumstances only one method can remove a subset of rows from a table, while in others (a whole partition, say) both can.

Oh well, back to work.

Back to a More Simple Time

If we browse the Oracle Data Warehousing Guide we find a number of interesting database structures and features that can make our warehouses easier to load, faster to query, more simple to manage etc. Partitioning, parallelism, bitmap indexes, data segment compression, materialized views and query rewrite, for example. It has been a long, long time since I implemented anything of any size without at least two or three of these.

So let us imagine, gentle reader, a return to the days of simplicity when we had none of these — how would that look? How would it perform? Well it turns out that it would look a lot like the data warehouse that comes out of the box with Siebel Marketing Analytics. No partitioning? Check! Only the most ordinary, uncompressed b-tree indexes? Check! Etc. You get the idea. And how will it perform? Our lucky band of adventurers is about to find out, as we implement it with the bare minimum of modifications to add some custom data elements from our Siebel CRM system.

Some current areas of concern:

  • A fact table with 50 indexes that in development consume 1.5Gb of space for every 1Gb of fact table data. How is the drop-all-indexes-and-rebuild-after-every-load strategy going to turn out with production-sized data?
  • The use of updates against the largest fact table. No data segment compression!
  • No foreign keys and no bitmap indexes. Do star transformations even work with that? (A sketch of the usual prerequisites follows this list.)
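
For reference, a star transformation generally wants to see bitmap indexes on the fact table’s dimension key columns and the feature switched on; a sketch, with invented names:

    -- bitmap indexes on the fact table's dimension key columns (names invented) ...
    create bitmap index fact_sales_cust_bx on fact_sales (customer_key);
    create bitmap index fact_sales_prod_bx on fact_sales (product_key);
    create bitmap index fact_sales_date_bx on fact_sales (date_key);

    -- ... and the feature enabled at session or system level
    alter session set star_transformation_enabled = true;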

On the plus side we do have a performance testing environment sized at around 80% of production data volumes, so we will have an opportunity to tweak the worst offenders. We do have parallelism and the licensing for everything that we might want to throw at the system, and we do have Quest Performance Analysis for Oracle already prepped on the test and production instances.

In preparation for performance testing I have a little mental checklist of what can be implemented quickly and quietly to tackle certain problems, such as replacing b-tree indexes with bitmaps, or a little partitioning, but it’s complexified by the presence of the “Datawarehouse Administration Console” (DAC), an apparently essential component of the architecture that is responsible for executing the 400+ Informatica workflows that load the staging and warehouse tables. The philosophy around it seems to be that anything platform specific, such as dropping and creating indexes, is defined and controlled through DAC rather than through Informatica (where it is generally a little kludgy, even if it is serviceable). So we have some exploration to do on that product to see how it handles customisation of indexes (maybe it recreates them the way it found them, but I suspect some customisation is required to make it deal with creating “exotics” such as local bitmap indexes).
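
By an “exotic” I mean something along these lines, which DAC would need to drop and recreate faithfully around each load (table and column names invented):

    -- a local bitmap index on a partitioned fact table (names invented)
    create bitmap index w_sales_f_status_bx
       on w_sales_f (order_status)
       local
       nologging
       parallel;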

We go into integration testing this coming week, with performance testing to follow. Should be an interesting ride.

Thoughts on Range-Range Composite Partitioning

As of 10gR2 there is No Such Feature as Range/Range composite partitioning, of course. Composite partitioning is limited to Range/Hash or Range/List, and of these only the latter have I made much use — which is not to say that Range/Hash does not have its place, just that its place and my place have not yet intersected.

However the Word On The Street is that 11g comes with Range/Range included, and that prompted some thought, because I’ve been making much use of multicolumn range partitioning. So my question is this: if we already have the ability to use multicolumn range partitioning, then what is the benefit of Range/Range composite partitioning? Continue reading
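
For context, the sort of multicolumn range partitioning I have been using looks like this (a made-up example, with the partition key spread over two columns):

    -- multicolumn range partitioning on (fiscal_year, fiscal_month)
    create table sales_history
    (  fiscal_year   number(4) not null,
       fiscal_month  number(2) not null,
       sale_amount   number    not null
    )
    partition by range (fiscal_year, fiscal_month)
    (  partition p_2006_q1 values less than (2006,  4),
       partition p_2006_q2 values less than (2006,  7),
       partition p_2006_q3 values less than (2006, 10),
       partition p_2006_q4 values less than (2007,  1)
    );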

Data Warehouse Architecture Decisions for Ad hoc and Predefined Reports

First, a disclaimer: the techniques herein represent “blue-sky” thought on an abstract problem. I have not implemented these ideas, although I am very willing to do so given the chance and if nobody can postulate a fatal objection to them. Or a near-fatal one. If you disagree with any of these ideas then you are not contradicting my experience and I’m not going to get all defensive over the matter.

And so to the substance … Continue reading

More on Partition Key Statistics, and an Oddly Harmful Constraint

As a followup to my previous post on the importance of gathering partition-level statistics on partition key columns, it occurred to me that if a table is list partitioned with a single value per partition then Oracle theoretically does not need statistics on that column to infer that only a single value is present. Continue reading
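
The shape of table I have in mind is something like this, with exactly one partition key value per partition (names invented):

    -- list partitioning with a single value per partition (names invented)
    create table orders_by_region
    (  region_code  varchar2(2) not null,
       order_id     number      not null
    )
    partition by list (region_code)
    (  partition p_east  values ('E'),
       partition p_west  values ('W'),
       partition p_north values ('N'),
       partition p_south values ('S')
    );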

Partition Pruning Prevented by Functions on Predicates

A rather straightforward issue here, prompted by a question on the forum at DBA Support.

When you place a predicate on a function of a partition key column then the optimizer is prevented from pruning unless there is a check constraint to indicate logical equivalence between the column and the function of the column. Continue reading
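
A made-up example of the shape of the problem: the table is range partitioned on a date column, the predicate is placed on TRUNC() of that column, and a check constraint is what tells the optimizer the two are equivalent.

    -- TXN_DATE is the partition key (names invented)
    create table txn
    (  txn_date  date   not null,
       amount    number not null
    )
    partition by range (txn_date)
    (  partition p200701 values less than (to_date('2007-02-01','YYYY-MM-DD')),
       partition p200702 values less than (to_date('2007-03-01','YYYY-MM-DD'))
    );

    -- a predicate on a function of the key defeats pruning ...
    select sum(amount) from txn
    where  trunc(txn_date) = to_date('2007-01-15','YYYY-MM-DD');

    -- ... unless a check constraint declares the column and the function
    -- of the column to be logically equivalent
    alter table txn add constraint txn_date_trunc_chk
       check (txn_date = trunc(txn_date));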

Data Type of Partition Key Column Cannot Be Inferred Through Foreign Key Reference

I remember coming across this issue a few years ago and as I just hit it again I thought I’d “pass the note round the class”.

When you create a table that has foreign keys then a nicely robust way of specifying the data type for the foreign key columns is by allowing Oracle to infer it. Try the following script … Continue reading

Partition Pruning and Bind Variables

Prompted by a question at the Dizwell Forum, here is a script to demonstrate that using bind variables in partition key predicates causes Oracle to use global (table) statistics instead of partition (or subpartition) statistics.

Script

Result on 9.2.0.4

This is similar behaviour to Oracle’s use of global statistics in other circumstances, such as when joining on a partition key to a smaller table on which a predicate is placed. Anyway, see the forum post for other comments :D
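
For anyone reading this without the script to hand, the gist of the demonstration is roughly as follows; this is not the original script, and the names are invented.

    -- T is range partitioned on PART_KEY, with statistics gathered at both
    -- table and partition level (not the original script; names invented)
    variable pk number
    exec :pk := 42

    -- literal predicate: the partition is identified at parse time and its
    -- partition-level statistics are used to cost the plan
    select count(*) from t where part_key = 42;

    -- bind variable predicate: pruning still occurs at execution time, but the
    -- plan is costed using the global (table-level) statistics
    select count(*) from t where part_key = :pk;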

Histograms For Join Predicates (or “Hooray for technical content!”)

Someone sent me a scenario the other day similar to this:

A small dimension table, for example of US State names, has a synthetic primary key (say STATE#) and a unique key on the real-world value (STATE_NAME), and a much larger fact-type table has an indexed foreign key to this lookup table. The distribution of values in the fact table is very skewed. Is it possible for the optimizer to take into account the skew of STATE# in the fact table when it is joined to the dimension table and a predicate is placed on STATE_NAME?
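
In schema terms the scenario looks roughly like this (names invented):

    -- the small dimension and the skewed fact table (names invented)
    create table states
    (  state#      number       not null primary key,  -- synthetic key
       state_name  varchar2(30) not null unique        -- real-world value
    );

    create table sales_fact
    (  state#  number not null references states (state#),
       amount  number not null
    );

    create index sales_fact_state_ix on sales_fact (state#);

    -- the skew on SALES_FACT.STATE# is recorded in a histogram
    begin
       dbms_stats.gather_table_stats
       (  ownname    => user,
          tabname    => 'SALES_FACT',
          method_opt => 'for columns state# size 254'
       );
    end;
    /

    -- the question: is that histogram used when the predicate is on
    -- STATES.STATE_NAME and the join is on STATE#?
    select sum(f.amount)
    from   sales_fact f, states s
    where  f.state# = s.state#
    and    s.state_name = 'California';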

Well, a tricky issue. Continue reading