We have solved the problem of storing and accessing large amounts of data. Traditional database vendors upgraded their offerings to work with terabyte-scale data, and we saw the advent of data warehousing appliances such as Teradata that can store and access hundreds of terabytes with relative ease.
While we have solved the problem of storing and accessing such data volumes, processing the data and looking for patterns in it remains a challenge. For example, a large bank in the US took up to 14 hours to run a SAS AML (anti-money laundering) algorithm in spite of having Teradata. If we had to look for a pattern such as "A calls B and hangs up, and immediately after that B calls A" in a month's worth of call data, it could take hours if not days.
Why? We still have to read the data and pull it onto one or more servers away from the data warehouse before we can perform the computation or search.
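To make the call pattern concrete, here is a minimal sketch of the kind of search that would run on a separate server once the call records have been pulled out of the warehouse. The `Call` record layout, the `find_callbacks` name, and the 60-second definition of "immediately" are all assumptions made for illustration, not details from the bank or telecom examples.

```python
from collections import namedtuple

# Hypothetical record layout; start and end are timestamps in seconds.
Call = namedtuple("Call", "caller callee start end")

def find_callbacks(calls, max_gap=60):
    """Find pairs where A calls B, hangs up, and B calls A within max_gap seconds."""
    # Index every call by its (caller, callee) pair so reverse calls are easy to look up.
    by_pair = {}
    for call in calls:
        by_pair.setdefault((call.caller, call.callee), []).append(call)

    matches = []
    for call in calls:
        # Look only at calls going in the opposite direction.
        for reply in by_pair.get((call.callee, call.caller), []):
            gap = reply.start - call.end
            if 0 <= gap <= max_gap:
                matches.append((call, reply))
    return matches

calls = [
    Call("A", "B", 0, 5),       # A calls B and hangs up after 5 seconds
    Call("B", "A", 10, 100),    # B calls A back 5 seconds later: a match
    Call("C", "D", 0, 30),
    Call("D", "C", 500, 520),   # too late to count as "immediately"
]
matches = find_callbacks(calls)
```

The logic itself is simple. The cost in this approach comes from the step that is not shown: every call record has to leave the warehouse and cross the network before `find_callbacks` can look at it.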
The same process at the bank above took 3 minutes when the SAS APIs built into Teradata were used.
In this case, the data did not have to be pulled away to be examined. All of the work happened close to where the data was stored. The appropriate APIs (in this case the SAS APIs) resided inside each of Teradata's parallel processing units, making the run nearly 300 times faster (14 hours down to 3 minutes).
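The difference between the two approaches can be sketched in a few lines. This example uses Python's built-in `sqlite3` module as a stand-in for the warehouse, which is purely an assumption for illustration: it is not Teradata and says nothing about how the SAS in-database APIs work. The table layout, function names, and 60-second gap are likewise hypothetical. The point is only where the matching logic runs.

```python
import sqlite3

# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (caller TEXT, callee TEXT, start_ts INTEGER, end_ts INTEGER)"
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [("A", "B", 0, 5), ("B", "A", 10, 100), ("C", "D", 0, 30), ("D", "C", 500, 520)],
)

def pull_then_compute(conn, max_gap=60):
    """Pull every row out of the database, then search for the pattern in the client."""
    rows = conn.execute(
        "SELECT caller, callee, start_ts, end_ts FROM calls"
    ).fetchall()
    return sorted(
        (a[0], a[1])
        for a in rows
        for b in rows
        if b[0] == a[1] and b[1] == a[0] and 0 <= b[2] - a[3] <= max_gap
    )

def compute_in_database(conn, max_gap=60):
    """Send the pattern to the database as a self-join; only matches come back."""
    query = """
        SELECT a.caller, a.callee
        FROM calls AS a
        JOIN calls AS b
          ON b.caller = a.callee AND b.callee = a.caller
        WHERE b.start_ts - a.end_ts BETWEEN 0 AND ?
        ORDER BY a.caller, a.callee
    """
    return conn.execute(query, (max_gap,)).fetchall()

pulled = pull_then_compute(conn)
pushed = compute_in_database(conn)
```

Both functions return the same matches. The first moves the whole table to the client before doing any work, while the second moves only the answer, which is the idea behind taking processing closer to the data.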
Greenplum and Aster Data are two other data warehousing appliances that have impressed me with their ground-up implementations of "taking processing closer to data". Both products are worth a good look: each has unique features and much better price/performance.
To take advantage of these technologies, think through your processing needs as well as your data model and data volumes.
Posted June 9, 2010 6:05 AM



