Marketing of Silver Bullets

My morning news shot brings me product marketing for Fast Search & Transfer ASA’s “Adaptive Information Warehouse”. They allege that the product “… improves on the benefits of traditional data warehouses while cutting down on implementation time, cost and complexity”.

Maybe it does, but based on the corporate news release on which this article is apparently based, it is difficult to see why. Here are some quotes, followed by my own comments.

“The Data Cleansing Solution tool allows customers to create a single master index of corporate data, regardless of where it is located: databases, business applications, content management systems, CRM software, intranets and the like”

How does it know which data is good and which is bad? Is JS Smith the same as John Smith? How does it know how often and what to read? How does it know what history to preserve?

Setting up an AIW system will take significantly less time than building a data warehouse, cost less and provide more agile access to corporate data, Sutija said. An AIW system can be set up in eight to 10 weeks, while a data warehouse can take 18 months or longer, he said.

On the other hand, a data warehouse might take TEN YEARS to set up! On the other other hand, it might be set up by competent practitioners in a bus architecture that will deliver results very much faster. Eight to ten weeks versus 18 months … which of those numbers do you think represents the best-case scenario and which represents the worst case? Hmmm.

…AIW indexes data that is much more granular than is usually available in a data warehouse, where data is usually consolidated into weekly or monthly results, he added

On the other hand, the data warehouse might be set up competently to preserve the granularity of the system of record, as is recognised best practice.

Fast may have an easier time convincing fast-growing, medium-size companies that haven’t yet invested in data warehouses, Creese added, because the cost and complexity of building a data warehouse are well known.

Or because they are more prone to the spread of Fear, Uncertainty and Doubt, and have no in-house expertise to filter the reality from the marketing?

I wonder if they have a technology for simplifying the repackaging of corporate press releases into “news articles” as well?

Data Warehouse Architect Position Available — Dayton OH

As I was mentioning earlier, I am leaving my current client and moving on to new pastures. Apparently I have been with them full-time for five years, though when I started it was just for a quick two weeks to get them up to speed on some issues. Time flies, it seems.

Here are the highlights of the requirements for my replacement:

  • In-depth knowledge and prior experience of Oracle Data Warehousing concepts and features — partitioning, parallelism, materialized views and query rewrite, bitmap indexing, data segment compression.
  • Strong SQL and PL/SQL development and tuning experience.
  • 9i and 10g experience.
  • Instance tuning for data warehouse environments.
  • Knowledge of ETL tools (Informatica preferred).
  • Knowledge of BI tools (Business Objects v6 and/or XI preferred).
  • Prior experience of dimensional and 3NF modeling.
  • Experience in supply chain management & accounting & finance preferred.
  • Incumbent will be responsible for full life-cycle development, from requirement analysis through schema design and strong input into ETL and BI configuration.
  • Must be U.S. citizen and be able to obtain a Department of Defense Security clearance.

Here is a link to apply for the job or to get more details, or you can email me directly at daaguard-jobs@yahoo.com.

Leaning Towards Application Express

I’ve been thinking recently about the intersection between Oracle’s Application Express technology and my own field of data warehousing. Not so much from the technical side, although I do see ApEx as having potential for knocking up a pretty reasonable and low-cost executive dashboard against a data warehouse, but more strategically.

I’m reminded of reading stories about EDS, who got into deep waters when they won an enormous contract to manage the US Navy’s IT infrastructure, and who found after signing on the dotted line that the number of applications to be supported had been underestimated by an order of magnitude in the pre-contract work. I seem to recall 3,000 being the estimate and 30,000 being the actuality. Of course the vast majority of those are the little skunk-works applications that seem to sprout up like blades of grass in the cracks of the sidewalk as soon as the heavy tread of configuration management is falling elsewhere.

There must be thousands of companies like that though — little spreadsheets being used to hoard planning numbers, cost allocations, all the things that the enterprise applications couldn’t integrate in time. Integrating data from such desktop-based applications and spreadsheets into a data warehouse is an enormous effort on the ETL side, very prone to error and requiring much manual coordination. This seems to be an ideal breeding environment for ApEx applications. Once the data is tucked up snugly in the Oracle database then we can read it at our leisure, and the data volumes are almost always trivial. A few hundred or thousand rows, maybe. Oracle have released a beta of the Application Express Application Migration Workshop which seems promising. I don’t have anything to hand to test it with, though the way that it “absorbs” spreadsheets is very neat.

Well ApEx is something that I have a look at now and then, poke around the demonstration applications and maybe try some of the walk-throughs, but it has never clicked for me. It must be too intuitive or something, and I’m trying to out-think it. I’m glad to see from John Scott’s blog that he and Scott Spendolini have a book coming out on the topic. From the publisher’s description it looks like it goes into the conceptual stuff that I and at least one other have been struggling with so I’ll put that on my shopping list “toot sweet”. Maybe this will become one of the essential applications for future Oracle data warehousing environments.

Interviewing

Everyone seems to be doing it at the moment. Howard Rogers has a nice series about a search for a junior DBA, and … um … someone else was as well … can’t remember who though. So not really “everyone”, I suppose.

Moving along, I have been both interviewer and interviewee recently. I handed in my notice to my current client, not from any great need to do so but more to give myself a kick up the backside to take my own job search more seriously. It’s very easy to become disillusioned with the soul-destroying business of transmogrifying your prior experience into yet another company’s recruitment web-site, only for it to disappear after hitting “submit”, or to see that you are not in consideration for a position for which you think you’d be just right and to have no means of contacting the recruiters involved (Oy, Raytheon!).

Links Updated

A few additions to note.

I’ve been a follower of Ralph Kimball’s articles and books for a few years now. Speaking from my own experience, his formalisation of the technique of delivering conformed data marts to build an enterprise-level data warehouse (Bus Architecture) simply works, and works well. I’ve been using it as the basis for a lot of design work over the past eight years, and my most recent client has an extraordinary record of receiving 100% award fees on their contract, based largely (I think) on the rapid development that it enables.

Although proponents of a unified (3NF) Enterprise Data Warehouse have much to say in criticism of it, I haven’t seen any arguments that stand up in practice. They either focus on problems associated with not following proper design procedures (eg. not conforming the dimensions) or on problems that are even less tractable in an EDW/CIF system (having three source systems that give different values for revenue, for example).

I recently finished reading Kimball and Caserta’s Data Warehouse ETL Toolkit book, which was on the whole a great basis for robust development of the ETL subsystem. Where I did have issues with it was in some database-specific errors (Redo and Undo are not the same thing) and in what I felt to be insufficient caveats on the differences between feature implementations across different vendors (the ability, or inability, to drop indexes associated with a single table partition, for example). However the procedural descriptions and the emphasis on the planning and documenting side are beyond reproach, and the book ought to be required reading for all data warehousers.

On the same lines I have added a link to Intelligent Enterprise, which I like for the less vendor-specific focus on Business Intelligence. Sometimes it does us good to get out for a breath of fresh air.

Lastly, as a fan of extremely long blog posts I have to give a nod to Nuno Souto’s blog, the contents of which speak for themselves. Good stuff, Noons.

Extensions to Slowly Changing Dimension Types

One of the bedrock concepts in dimensional modelling is that of “Slowly Changing Dimensions”, which are structures and loading techniques that allow the system to account for time-variant relationships between attributes in a hierarchy. For example, tracking changes in the address of a customer as his wife’s employer forces them to move from Colorado Springs to Arlington VA. Ahem.
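To make the customer-address example concrete, here is a minimal sketch of the common "Type 2" approach, in which a change expires the current dimension row and adds a new one so that history is preserved. The class name, column names, customer identifier and dates are all hypothetical, chosen for illustration only; a real warehouse would do this in the ETL layer against database tables rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    """One version of a customer in the dimension (hypothetical layout)."""
    surrogate_key: int
    customer_id: str                  # natural key from the source system
    address: str
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_type2_change(rows: List[CustomerRow], customer_id: str,
                       new_address: str, change_date: date) -> CustomerRow:
    """Expire the current row for the customer and append a new version."""
    current = next((r for r in rows
                    if r.customer_id == customer_id and r.valid_to is None),
                   None)
    if current is not None:
        if current.address == new_address:
            return current            # nothing changed, keep the current row
        current.valid_to = change_date
    new_row = CustomerRow(
        surrogate_key=max((r.surrogate_key for r in rows), default=0) + 1,
        customer_id=customer_id,
        address=new_address,
        valid_from=change_date,
    )
    rows.append(new_row)
    return new_row


# Illustrative data only: the dates and customer id are made up.
dimension: List[CustomerRow] = []
apply_type2_change(dimension, "C1", "Colorado Springs, CO", date(2001, 1, 1))
apply_type2_change(dimension, "C1", "Arlington, VA", date(2006, 6, 1))
```

After the second call the dimension holds two rows for the same customer: the Colorado Springs row closed off at the change date, and an open-ended Arlington row with a new surrogate key, so facts loaded before the move still join to the old address.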

Thoughts on Range-Range Composite Partitioning

As of 10gR2 there is No Such Feature as Range/Range composite partitioning, of course. Composite partitioning is limited to Range/Hash or Range/List, only the latter of which have I made much use of — which is not to say that Range/Hash does not have its place, just that its place and my place have not yet intersected.

However the Word On The Street is that 11g comes with Range/Range included, and that prompted a question, because I’ve been making much use of multicolumn range partitioning: if we already have the ability to use multicolumn range partitioning then what is the benefit of Range-Range composite partitioning?
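One way to see the difference is to model how a row is mapped to a partition in each scheme. The sketch below is a simplified Python model, not Oracle code: it assumes that multicolumn range bounds are compared as an ordered vector (the second column only matters when the first column ties with a bound), and that a composite scheme would apply an independent range test to each column. The function names and the year/month bounds are hypothetical.

```python
from bisect import bisect_right


def multicolumn_partition(bounds, key):
    """Index of the first partition whose VALUES LESS THAN vector exceeds key.

    bounds is a sorted list of tuples; tuples compare column by column,
    so later columns are consulted only when earlier ones are equal.
    """
    idx = bisect_right(bounds, key)
    if idx == len(bounds):
        raise ValueError("key maps to no partition")
    return idx


def composite_partition(part_bounds, subpart_bounds, key):
    """(partition, subpartition) indexes, each column ranged independently."""
    first, second = key
    p = bisect_right(part_bounds, first)
    s = bisect_right(subpart_bounds, second)
    if p == len(part_bounds) or s == len(subpart_bounds):
        raise ValueError("key maps to no partition")
    return p, s


# Hypothetical (year, month) bounds, each meaning "values less than".
multi_bounds = [(2006, 7), (2007, 1), (2007, 7)]
print(multicolumn_partition(multi_bounds, (2005, 12)))    # month is ignored
print(multicolumn_partition(multi_bounds, (2006, 12)))    # tie on year, month decides
print(composite_partition([2007, 2008], [7, 13], (2005, 12)))
```

In the multicolumn model a row for December 2005 lands in the first partition alongside early 2006, because 2005 is already below the first bound and the month is never examined. In the composite model the month is always ranged on its own, so the same row is separated into the second subpartition, which is the kind of behaviour that pruning on the second column alone would depend on.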

Under the Wire

I just slipped my white paper for the 2007 Rocky Mountain Oracle User Group Training Days in yesterday, which was coincidentally the deadline for getting them included on the CD that attendees will be receiving. Lovely.

The title is Linux 2.6 I/O Schedulers For Oracle Data Warehousing, so it will be an esoteric little 30 minutes of talk about head movement, elevators, and a cheap (free) way to get 60% better read performance. Certainly not mainstream stuff, but anyone looking for an economical test and/or development platform would be doing well to consider Linux nowadays, even if they do insist on some monstrous HP-UX RISC + SAN architecture for their production machine. When your sub-$10,000 machine outperforms your “real” hardware at disk bandwidth then you have something of interest to talk about over the water cooler at least.
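For anyone who wants to see which elevator a disk is using, 2.6 kernels expose it through sysfs at /sys/block/<device>/queue/scheduler, with the active scheduler shown in square brackets. The sketch below parses that format; the function names and the sample string are assumptions for illustration, and it says nothing about which scheduler the paper recommends.

```python
def parse_scheduler(text):
    """Split a sysfs scheduler line into (active, available).

    The kernel lists every available scheduler on one line and wraps
    the active one in square brackets, e.g. "noop deadline [cfq]".
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(device):
    """Read the scheduler line for a block device such as "sda" (Linux only)."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return parse_scheduler(f.read())


# Sample line in the format used by 2.6-era kernels.
active, available = parse_scheduler("noop anticipatory deadline [cfq]\n")
print(active, available)
```

Writing one of the listed names back to the same file as root switches the scheduler for that device at run time, which makes it cheap to benchmark the alternatives on a test machine.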

Looking through the list of presenters there are a fair few that I know/bumped into at Hotsos 2006 or have corresponded with since — discretion and amnesia prevent me from recalling to whom I owe drinks, but if you say I do then I’ll probably believe you.

Unfortunately socialising will be curtailed by the 70 mile drive home every evening down I25. In the event of yet another flippin’ snow storm, and they do seem to be popular this year, I’d better make sure I have a bag packed all the same.

New Laptop Required: Suggestions?

I think my laptop is finally ready for the boneyard. Dead battery, dodgy USB ports, overheat issues (to install a new o/s I need to use a “cryogenic install” method of running the install on my deck on a cold winter’s night, preferably subzero temperatures).

I could go browsing around Best Buy and such like but I’m sure that a reader has recently got themselves a great deal. Portability is high on the wishlist, as is a ton of memory.

Any ideas? Brands to avoid?