Originally published 18 December 2008
For the past decade or so, I have had the honor and privilege of helping people around the world with their data warehouses. The work that it has been my privilege to do has ranged from initial design to operations to monitoring to ease of and speed of end user access. In addition, I have seen data warehouses around the world. A recent check of my passport showed that work has been done in 56 countries.
Yet, just when you think you have seen it all, something new crops up. A jackrabbit jumps out of a hole in the ground right in front of you.
The other day, I was consulting at a bank. We were looking at their data warehouse when something different cropped up. The bank was loading data into their data warehouse. Then, once a month, they were reading their data warehouse and recalculating a figure. The figure was a risk rating for each loan customer. Each month, they recalculated the bank’s assessment of the likelihood of a loan being repaid.
At first glance, this seems to violate the cardinal rule of a data warehouse never being updated. After all, at least one field is being read, recalculated, and written over.
But further investigation showed that perhaps this practice was OK.
From a business standpoint, it was necessary to have a very current assessment of risk. Recalculation on a monthly basis was a genuine business requirement. Secondly, the criteria used to assess risk were changing all the time. Over a year’s time, there was considerable change in the basic risk calculation. Each change in the basic risk calculation invalidated the previous month’s calculation. So, once again, there was a business requirement for a recalculation.
A third factor was that the bank had been operating their risk assessment this way for many years. This meant that when a bank officer examined this month’s record, there was no cry of, “My data does not match last month.”
A fourth factor was that the calculation was done for all loans. While it is true that the assessment value may stay the same for a given loan throughout a stretch of months, the calculation will have been done differently each month.
So from all of these standpoints, the recalculation of data within the data warehouse seemed like an OK thing to do.
There was one nagging question that arose. That question was, “Is the bank ever going to need to compare one month’s set of calculations against a previous month’s set of calculations?” Was there ever going to be a need to look at this month’s data compared to the same month a year ago, for example?
The bank officers assured me that such a calculation was never done. And as long as a calculation of this variety is never done, then there is no need to store a historical record of each month’s calculations. But had such a comparison needed to be done (or even if someone thought that it might need to be done sometime in the future), it would have been simple enough to store the historical record of each month’s calculation.
Thus it was that the good people of the bank got me to agree that violation of one of the basic tenets of a data warehouse was OK.
Does all of this mean that the rules for data warehouse design really aren’t cast in stone? The answer is that people violate the rules of good data warehouse design all the time. And, in each case, something is gained and something is lost.
The point of good design is to make sure you know exactly what is gained and what is lost. As long as the tradeoff is well thought out, as long as the tradeoff has both sides of the equation considered, then the decision to violate the rules of data warehouse design may be OK.
All too often, people make design decisions that violate the good practices of data warehouse design without having a clue that a tradeoff is being made. All too often, people make design decisions looking at just one side of the coin. And this is where people get in trouble.
One such decision – for example – is the conscious decision to do wholesale transaction processing in a data warehouse. From the standpoint of technology, can such a design decision be accommodated? The answer is yes, there are technologies that will allow you to update a data warehouse. And this may seem to be a dandy thing to do.
But the downside of doing wholesale transaction processing in a data warehouse is quite large. When you do wholesale update in a data warehouse:
And the list goes on.
Changing the basic tenets of good data warehouse design must be done carefully and with full inspection of both the positive and the negative cases of the database design tradeoffs.
SOURCE: Calculations Made Outside of the Data Warehouse
Recent articles by Bill Inmon
Comments
Want to post a comment? Login or become a member today!
Be the first to comment!