A List: Ten Reasons Not To Use An External ETL Tool

It has been a while since we had a nice list, so here is one on a subject I rarely touch: Extraction, Transformation and Load technology.


A List: Moving Data Between Databases

A simple one, this. But there are a lot of ways out there.

  1. Spool to Flat File, Load with SQL*Loader or External Tables: SQL*Plus does a reasonable job of unloading. Tom Kyte has a C utility for unloading data also.
  2. SQL*Plus COPY command: Sadly deprecated, and without support for all data types, but a handy tool for basic data transfers. Allows arbitrary SQL on the data selection.
  3. Database Links: Direct movement from one database to another. Aside from the network latency, this is basically the same as reading data from your local database. If you combine database links and materialized views, then you can replicate remote data to your local database through the database link.
  4. DBMS_PIPE: I don't recall reading of anyone using this, but I expect that it's a theoretical possibility to send data from one instance to another through a pipe. (Scratch this one: see TK's comment.)
  5. Export/Import: Sometimes mistaken for part of a backup and recovery strategy!
  6. Transportable Tablespaces: Closely related to Export/Import, allowing movement of entire tablespaces without the need for unloading and reloading data.
  7. Data Pump: Export/Import for the new millennium :) and with many new features.
  8. Oracle Streams: Powerful functionality here. Non-trivial, but probably the modern tool of choice for the regular propagation of data around your corporate network.
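As a sketch of option 3, here is a database link combined with a materialized view. All names (the link, schema, and table) are hypothetical, and fast refresh assumes a materialized view log exists on the remote master table.

```sql
-- Hypothetical names throughout: remote_db, hr, employees.
CREATE DATABASE LINK remote_db
  CONNECT TO hr IDENTIFIED BY hr_password
  USING 'remote_db';

-- Query remote data directly over the link...
SELECT COUNT(*) FROM employees@remote_db;

-- ...or replicate it locally with a materialized view.
-- REFRESH FAST requires a materialized view log on the master table.
CREATE MATERIALIZED VIEW employees_local
  REFRESH FAST ON DEMAND
  AS SELECT * FROM employees@remote_db;
```

A scheduled call to DBMS_MVIEW.REFRESH then keeps the local copy current without re-pulling the whole table each time.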

Finally, here is an introduction to the sharing of information between database systems.

A List: Things That Materialized Views Can Be (Philosophically) Similar To

When is a materialized view like a … ?:

  1. Constraint: When you use it to enforce a multirow constraint, such as "SUM(allocation_pct) per Site = 100", which is not supported through regular constraints. Better than triggers, because MVs refresh on commit, not on the DML operations themselves.
  2. Index: When you use it to speed a full scan of a subset of a table's columns.
  3. Partition: When a materialized view selects a subset of a table's rows, thus making the subset accessible through multiblock reads instead of single block index-based access, and without requiring the whole of the original table to be scanned.
  4. New Set Of Unbelievably Fast Disks: When a materialized view with query rewrite allows a large data set to be pre-aggregated and subsequently queried in almost no time at all.
  5. Trigger: When you stop trying to enforce multirow constraints with triggers and start using materialized views instead.
  6. Synonym: When it allows you to give an alternative name to a table.
  7. Time Machine: When a stale materialized view shows you the previous state of a table's data. A materialized view log then gives you an audit trail of sorts for changes since the last refresh.*

1, 3 and 4 seem to be the most interesting ones there. I don't think I'd use 2, unless it was on a single column with nulls, and a bitmap index was not appropriate, and the query had to consider nulls. 6 seems like a real stretch.

* Thanks Gary
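Item 1 can be sketched as follows. Table and column names are hypothetical; the pattern needs a materialized view log so the MV can fast refresh on commit, and the COUNT columns in the select list are there because Oracle requires them for a fast-refreshable SUM.

```sql
-- Hypothetical table: per-site allocations that must total 100.
CREATE TABLE site_allocations (
  site_id        NUMBER NOT NULL,
  allocation_pct NUMBER NOT NULL
);

-- Log required for fast refresh on commit.
CREATE MATERIALIZED VIEW LOG ON site_allocations
  WITH ROWID (site_id, allocation_pct) INCLUDING NEW VALUES;

-- Aggregate MV refreshed as part of the committing transaction.
CREATE MATERIALIZED VIEW site_allocation_check
  REFRESH FAST ON COMMIT
  AS SELECT site_id,
            SUM(allocation_pct) AS total_pct,
            COUNT(allocation_pct) AS cnt_pct,
            COUNT(*) AS cnt
     FROM   site_allocations
     GROUP  BY site_id;

-- The constraint is checked at commit time, when the MV refreshes:
-- any site whose allocations do not sum to 100 fails the commit.
ALTER TABLE site_allocation_check
  ADD CONSTRAINT total_must_be_100 CHECK (total_pct = 100);
```

The effect is a multirow, cross-row constraint that ordinary declarative constraints cannot express, enforced at commit rather than per-statement.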

A List: Ways to Scan a Table Faster

So you need to read an entire table? Here are some features and techniques that may help.

  1. Employ Parallelism: Ah, the sweetest technique of all. Brute force. If you have the resources, then let Oracle divide the table into chunks and use all your system resources to scan it. Unfriendly in an OLTP environment, mind you.
  2. Compress Data Segments: Make the table smaller! Block-level compression can shrink that data like a plunge into cold water. Very OLTP-unfriendly.
  3. Reduce Free Space Percent: PCTFREE = very small => more rows per block => smaller table. And potentially a higher chance of row migration, of course.
  4. Increase Percent Used: PCTUSED = large => blocks are less likely to retain free space following deletes => more rows per block => smaller table.
  5. Use a Larger Block Size: For significantly long rows you may get reduced empty space, thus a smaller table to scan.
  6. Reorder The Columns: If you are commonly interested in just a subset of columns (for example, metrics in a fact table) then consider making them the first columns of the table definition. For tables with lots of columns there is measurable overhead in finding the end columns (I'm not talking about row chaining here). Hmmm, maybe I'll post something else about this.
  7. Index Columns of Interest: An index can be treated as a skinny table, and your query might be satisfied by a fast full or full index scan. The usual comments about NULL values in indexes apply here. Don't neglect consideration of index size either: index key compression and the use of bitmap indexes provide smaller structures to scan.
  8. Materialized Views: Is there anything they're not good for? This could be a genuine skinny table, or an aggregation of selected columns.
  9. Ensure the Table Is Spread Over Available Devices: With a consequent reduced likelihood of encountering an I/O choke point.

There is not a single feature listed here that carries with it no disadvantages, and depending on your circumstances the severity of the disadvantages may range from the insignificant to the devastating.

Do not implement any of these without understanding how the feature works, the mechanism by which it provides the advantage, how it brings disadvantages, and how these all interact in your particular situation.
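The first two techniques can be sketched like so, assuming a hypothetical sales_history table. The degree of parallelism (8 here) is an arbitrary illustration, and MOVE rebuilds the segment, invalidating its indexes until they are rebuilt.

```sql
-- Brute force: request a parallel full scan of the table.
SELECT /*+ FULL(s) PARALLEL(s, 8) */ SUM(amount)
FROM   sales_history s;

-- Shrink the segment with basic block compression; best suited to
-- bulk-loaded, rarely-updated data. Indexes must be rebuilt afterwards.
ALTER TABLE sales_history MOVE COMPRESS;
```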

A List: SQL Features You’ve Probably Never Used In Production

By popular demand (one anonymous person — hey, it's all relative), another list.

OK, you may have played with them, found a hypothetical circumstance for them, but you've probably never found a real situation in which to use them, or if you have you've probably backed out at the last minute. Or the DBA won't let you use them. Or you found there was a bug that spoiled it.

More likely you will not even recognize them — unless you're one of those losers who enjoys browsing documentation of course. * ahem *

  • The Subquery Factoring Clause: OK, a bit of a softball to start with. I bet that someone has used this. Not first time, but as part of a tuning exercise. And to show off.
  • The Partitioning Clause of a Query Table Expression: Actually hardcoding a partition name in a query? Wellll, OK maybe.
  • The Model Clause: This is great! What's it for?
  • Multiset Operators: Or nested tables in general
  • Submultiset: OK, that's cheating. Let's just say "nested tables"
  • The CORR_K() Function: Oh wait, there was that time when … nah not really. Never used it.
  • PowerMultiset_By_Cardinality: "…takes as input a nested table and a cardinality and returns a nested table of nested tables containing all nonempty subsets (called submultisets) of the nested table of the specified cardinality". * wipes-eyes * OK seriously, that's the last time I'm going to mention nested tables. Or nested tables of nested tables.
  • Alter Sequence: Admit it … you thought you were going to be able to change the current value with it, but then found you just had to drop the sequence and recreate it, so that's what you did.
  • Associate/Dissociate Statistics: What?

Well, I pride myself on the high education level of my readership, and also its high regard for risk-taking. If you have actually used one of these features in a production system, and it has actually made it past the DBA/watchdog, then please reply under the heading "I have an unusual affinity for Shiny New Things, and have successfully used the following features in a production system:"
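For the record, the softball at the top of the list, subquery factoring, looks something like this. Table and column names are hypothetical; the point is that dept_costs is defined once and referenced twice.

```sql
-- Factor out the aggregate once, then reuse it in the main query
-- and in the filter subquery.
WITH dept_costs AS (
  SELECT department_id, SUM(salary) AS dept_total
  FROM   employees
  GROUP  BY department_id
)
SELECT department_id, dept_total
FROM   dept_costs
WHERE  dept_total > (SELECT AVG(dept_total) FROM dept_costs);
```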

A List: Accessing Part Of A Table Quickly

I like lists. In particular, I like to make little lists of Oracle features that have some commonality in the way they work, or in what they do.

Today's list is Oracle Features For Accessing Part Of A Table Quickly. Quicker than scanning the whole table anyway, and assuming that the table is not of trivial size. Here's what I have so far:

  1. Creating an Index: obviously.
  2. Making the Table Index-Organized: Is that too similar to Option 1 to justify a new heading, do you think?
  3. Placing The Table In A Cluster: Hash or indexed.
  4. Partitioning The Table: Partition pruning is the key feature here, or you can use the PARTITION or SUBPARTITION clauses of the query table expression.
  5. Creating a Materialized View: By precomputing the result set for a query that requires only part of the full table. Query rewrite optional, I suppose.

Off the top of my head, that's all I can think of right now. Other suggestions welcome.
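Option 4 might be sketched like this, with a hypothetical orders table range-partitioned by date, so that a date-bounded query touches one partition instead of the whole segment.

```sql
-- Hypothetical range-partitioned table.
CREATE TABLE orders (
  order_id   NUMBER,
  order_date DATE
)
PARTITION BY RANGE (order_date) (
  PARTITION p2005 VALUES LESS THAN (DATE '2006-01-01'),
  PARTITION p2006 VALUES LESS THAN (DATE '2007-01-01')
);

-- Pruned automatically by the predicate on the partition key...
SELECT COUNT(*)
FROM   orders
WHERE  order_date >= DATE '2006-01-01';

-- ...or named explicitly via the PARTITION clause of the
-- query table expression.
SELECT COUNT(*) FROM orders PARTITION (p2006);
```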