Someone sent me a scenario the other day similar to this:
A small dimension table, for example of US State names, has a synthetic primary key (say STATE#) and a unique key on the real-world value (STATE_NAME), and a much larger fact-type table has an indexed foreign key to this lookup table. The distribution of values in the fact table is very skewed. Is it possible for the optimizer to take into account the skew of STATE# in the fact table when it is joined to the dimension table and a predicate is placed on STATE_NAME?
Well, a tricky issue. Oracle has to resolve the predicate on dimension.STATE_NAME into a predicate on dimension.STATE# and thence into a predicate on fact.STATE#. The optimizer will do a very similar, and very effective, operation that I’ve written about a couple of times before through dynamic partition pruning based on joins, so my first reaction was to suggest implementing a list partitoning, or list subpartitioning, scheme on the table. For me that would be the gold standard of approaches, but it might not be possible, or might be not completely effective, if the organisation doesn’t want to pay for the partitioning option, or the logical limits of composite partitioning have already been reached, or if a table rebuild is too large a hammer to take to this nut.
A related thought was that if the partitioning option was in place then a summary table using a partitioning scheme more sympathetic to this aim could be implemented. For sure it’s a big step to take, especially if it means duplicating the entire data set of the current fact table, but it would extend the range of queries that can benefit from partition pruning very handily. The summary table could of course include the join to the dimension table and/or aggregate the result set, giving further benefits.
Another thought that occured was to create a bitmap join index, effectively allowing predicates to be placed on a hidden column of fact.STATE_NAME. I ran a couple of tests on 10.2.0.2 and it appeared to be ineffective in respect of histogram recognition. The estimated cardinality was always 1% of the total number of rows in the (non-partitioned) sample fact table even with hidden columns analyzed. Maybe I missed something there, and if I did then please enlighten me as it’s a pretty easy implementation.
Here’s a funky technique that actually did work — create a function to accept STATE# and return STATE_NAME, either through querying the dimension table or through embedding the logic in a PL/SQL CASE statement. Once indexed columns had been analyzed a predicate placed on this function (instead of the dimension table) did result in correct estimation of cardinality. The list of objections to this technique is pretty broad, including the possibility of returning the wrong result once the dimension table changes and a few other issues, but whether the risk is worth adopting or not is up to the user. One might choose to use DBMS_Advanced_Rewrite in 10g+ to have predicates against dimension.STATE_NAME rewritten to the hidden column, but the feasibility of that was not part of the test I ran/
There is one last suggestion of course — rebuild the table to use natural keys instead of synthetic keys. This would yield the same problem if the lookup table contained a commonly predicated additional level in the hierarchy, (eg. CONUS_Y_N, TOO_HUMID_IN_SUMMER or HAS_HELMET_LAW) and we’d be back to “square uno” once more.
To sum up, here are the solutions I thought of:
- Partition or subpartition the fact table on STATE#. (preferred option)
- Create a summary table with partitioning or subpartitoning on STATE#. (uses most space and slows data load, but very flexible and powerful)
- Create a function-based index on fact to perform the lookup, and query that value instead. (a bit flaky, but it works without major system impact)
- Rebuild the fact table based on the STATE_NAME instead. (still limited in multi-level hierarchies)
All other suggestions gratefully accepted. I’m really puzzled by not be able to achieve this with a bitmap join index, I must say. I’m sure I must have missed something simple