One of the bedrock concepts in dimensional modelling is that of “Slowly Changing Dimensions”, which are structures and loading techniques that allow the system to account for time-variant relationships between attributes in a hierarchy. For example, tracking changes in the address of a customer as his wife’s employer forces them to move from Colorado Springs to Arlington VA. Ahem.
There are three commonly recognised types:
- Type 1: Dishonestly pretend that he always lived in Arlington by overwriting the “Colorado Springs” attribute in his customer record.
- Type 2: Recognise that this is now the third time that said customer has had to move by preserving records of his previous residences in London, Dayton OH, Colorado Springs and Arlington. A different customer line is created in the dimension table for each residence.
- Type 3: Hide part of the truth by acknowleding his previous residence in Colorado Springs as a different attribute from the current address, but not acknowledging previous inconveniences.
This characteristic is not applicable solely at the dimension level. Individual attributes within a dimension can have different SCD types, so you might choose to make “Marital Status” a Type 1 attribute and “City of Residence” a Type 3.
While SCD Type 2 is the one that preserves the highest level of historical accuracy it has one failing unique among the three types — retrieval of complete set of historical facts history for customers currently resident in Arlington requires a complex query structure and generally leads to poor performance.
This is a situation handled by a hybrid type of SCD that I have been using for the past eight years, and which I was very happy to find documented by Margy Ross and Ralph Kimball in a typically useful article at Intelligent Enterprise Magazine. This hybrid technique, which you will find in that article under the heading Mini Dimension with “Current” Overwrite, breaks out rapidly changing attributes into a new dimension in which we identify the set of attribute values in effect at the time that the fact event took place (sales transaction date, for example).
So in the case of our inconvenienced customer we would store the current “City of Residence” in the customer dimension table and create another “Customer City of Residence” dimension joined as a new attribute directly to the fact table. The latter dimension would indicate the city of residence for the customer at the time that the event (eg. a sales transaction) took place.
I’ve used this in the context of preserving the history of who is responsible for managing items (and for various other changeable attributes of the 200,000-ish items catalogued), and the “Current Manager” attribute of the item dimension and the independent “Historical Manager” dimension are aimed at solving different business problems.
In one case an item manager can view the sales (or inventory, or backorder etc) history of the items that they currently manage. This is an important tool for forecasting future trends by extrapolating from historical values, and also tends to support tactical analysis.
In the other case the sales and inventory valuation can be historically assigned to cost centers to which the item managers belong — more of a strategic organisation-level view.
The latter tends to be a more performant report, interestingly enough, because the lower cardinality of the foreign key column in the fact table (“item manager”, rather than “item”) makes precalculated result sets (materialized views, if you like that sort of thing) more compact and thus more likely to be implemented, to be honest.