Here's a question: when deciding whether to use a date datatype as a key value, why does there seem to be some consensus that it is OK to do so, even in situations where the same person would not consider using a country code, for example? Now I'm not talking about whether people should use a date datatype or should instead represent a date as a character string or number, because for me that is not an issue. If you want to store a date then you use a date datatype.
But how about a situation where you are using a date value as a key, for example in a data warehouse fact table? Is it "OK" to use the date, or should you be using a synthetic key? This has always been a bit of a philosophical conundrum for me, because I always think, "Yeah, dates are OK as key values — no need for a synthetic there" but have not put my finger on the reason why.
But recent discussions have firmed up some thoughts on the "true nature" of a synthetic key, and I had a little epiphany on the subject in response to a question by Pando at DBASupport.com.
I started with two points:
- The difference between a natural key and a synthetic key is that a natural key has some meaning to the naked eye in isolation, and therefore can be exposed directly to the user and "generated" by users. However a synthetic key's meaning is only revealed through translation to a natural value.
- If you change the value of a natural key, you are generally not changing what that key value represents, eg. "Finland", you are just changing it's means of representation — ie. some visible code value.
Now what is a date value in Oracle? It is merely some internal representation that correlates one-to-one with that particular date. Let me float the idea that a date column is actually a synthetic value, but appears to be natural because Oracle uses one of a great many possible conversions to represent it as such. So if your boss came in one morning and said, "Bad news lads, the business is changing the standard date format to be Julian", or you wanted to suddenly start representing month names in … um … Urdu, then just as in the case where a country changes its code we're not saying that the date itself has changed, only that its "human-friendly" representation has. You can do this pretty simply through changing NLS settings — if you have quality code that always uses a date format to say such things as:
Where date_of_transaction = To_Date('01-02-2005','DD-MM-YYYY') *
… then you generally don't even have to change that.
So here's my proposition: date-based keys are really synthetic, and that is why representing them with a regular sequence-generated synthetic value makes no sense — it's the representation of them, with year-month-day or whatever, that is natural, and Oracle is effectively maintaining an internal and very flexible two-way translation table for you, accessed through the to_date() and to_char() functions.
Now one difference between dates as synthetic values and other synthetics is that every Oracle database is using the same internal representation for the same date — in a sense the translation table is global withing the Oracle world. So when moving these raw internal values between different Oracle databases (through tablespace transport, for example) there is not the usual trouble with synthetics of needing a translation to the local system of representation. On the other hand, I'd be willing to lay a couple of $'s on the line that converting from Oracle to other RDBMS's does indeed require a translation of the internal value, possibly by converting to and from a natural representation.
* There was a formatting error in the original post that dropped the RHS of this predicate.