Choosing the Best Platform for Big Data Projects: A Spotlight Q&A with Richard Winter

Originally published 11 November 2013

This BeyeNETWORK spotlight features Ron Powell's interview with Richard Winter, President of WinterCorp. Richard describes the new metric he developed: TCOD, the total cost of data. Richard also talks about ways that this TCOD metric can be used to choose the best platform for big data projects.

Richard, you've invented a new metric: TCOD, the total cost of data. Can you define for our audience what that is?


Richard Winter: Yes, that is the sum of all of the information technology-related costs in getting from A to B in a business sense. If your company, for example, wants to implement a major new marketing initiative using big data, you could apply this metric to help estimate the total cost of accomplishing your objective and using your marketing data over a period of time.

What elements comprise the TCOD?


Richard Winter: Many people are familiar with the term TCO, total cost of operation. The new TCOD metric includes TCO (the cost of the system and the cost of operating and managing it), but it also includes related personnel costs, particularly those for developing the solution. And the solution in the analytic data space usually means developing analytic applications, developing queries, developing analytics, ad hoc analytics and sometimes ETL, that is, preparing the data for analysis. It turns out that in many cases, those costs are larger than the system costs.
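As an illustration of the arithmetic described in this answer, the sketch below adds the cost components Richard names into a single total. It is a minimal sketch, not the model used in the WinterCorp report: the class name `TcodEstimate`, its field names, and every figure are hypothetical and chosen only to show how the pieces combine.

```python
from dataclasses import dataclass


@dataclass
class TcodEstimate:
    """Cost components named in the interview, all in the same currency unit.

    The structure and names are assumptions made for illustration.
    """
    system: float        # cost of the system itself
    operations: float    # cost of operating and managing the system
    applications: float  # developing analytic applications
    queries: float       # developing queries and ad hoc analytics
    etl: float           # preparing the data for analysis

    def tco(self) -> float:
        """System cost plus the cost of operating and managing it."""
        return self.system + self.operations

    def development(self) -> float:
        """Personnel costs for developing the solution."""
        return self.applications + self.queries + self.etl

    def tcod(self) -> float:
        """Total cost of data: TCO plus development costs."""
        return self.tco() + self.development()


# Hypothetical figures (say, millions over the project lifetime).
estimate = TcodEstimate(system=2.0, operations=1.0,
                        applications=3.0, queries=1.5, etl=0.5)
print(estimate.tco())          # 3.0
print(estimate.development())  # 5.0
print(estimate.tcod())         # 8.0
```

With these made-up inputs the development costs exceed the system costs, which is the situation Richard says occurs in many cases.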

It doesn't surprise me. A lot of people say personnel are incredibly expensive and those costs grow over the years. In your paper on the TCOD, you had a data refining example on turbines, which I found very interesting. Could you talk about that?

Richard Winter: Yes, this example fascinates me. It turns out that there are a lot of industrial turbines running in the world. GE estimates over 2 million large industrial turbines are in operation, and they're in airplanes, ships, boats, windmills, power plants and other facilities. A lot of things you never think about have large turbines spinning all the time. Now, all modern large turbines have many sensors embedded in them that emit data. So, a large commercial jet engine used on a typical commercial airline flight emits 10 terabytes of sensor data per hour. That's an example. What is the problem? Let's say you are an operator of jet aircraft, or you're a manufacturer of jet engines or airplanes, or you're responsible for maintaining jet engines. You need the results of analyzing the sensor data in order to make important decisions. It turns out there's so much sensor data that you couldn't keep it all forever, no matter where you tried to put it.

The challenge is to analyze the data quickly after it arrives, pick out the data that's really valuable so that it can be saved and further analyzed, and throw away the data that's not interesting. That's the data refining challenge. The data refining challenge exists in this jet engine example and the turbine example, and it also exists in an increasing number of applications across industries. If you face such a data refining problem, you could use the TCOD metric and framework to estimate which data platform or which type of data platform would give you the best economics overall in solving your data refining problem.
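The refining step described above (analyze data as it arrives, keep what is valuable, discard the rest) can be sketched as a simple streaming filter. This is only an illustration under stated assumptions: the rule "a reading is interesting when it falls outside a normal band", the function name `refine`, and the sample values are all hypothetical and do not come from the interview or the report.

```python
from typing import Iterable, Iterator


def refine(readings: Iterable[float], low: float, high: float) -> Iterator[float]:
    """Yield only readings outside the [low, high] band; drop the rest.

    Treating out-of-band readings as the valuable ones is an assumed rule
    used purely for illustration.
    """
    for value in readings:
        if value < low or value > high:
            yield value


# Hypothetical sensor values and band, for illustration only.
stream = [500.0, 502.5, 640.0, 499.0, 310.0, 501.0]
kept = list(refine(stream, low=450.0, high=550.0))
print(kept)  # [640.0, 310.0]
```

Because `refine` is a generator, each reading is examined once as it arrives and nothing is stored unless it passes the filter, which mirrors the point that the raw data cannot all be kept.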

That's intriguing. An airplane could have four jet engines, so that would be 40 terabytes of data per hour. That's amazing.

Richard Winter: Yes, for every airplane flying.

The other comparison Ė the enterprise data warehouse versus Hadoop. The metrics on that were very interesting. Could you elaborate on that?

Richard Winter: Let me just comment for a minute on the first example. My research shows that if you use the TCOD metric, you discover with the data refining problem that the economics of Hadoop are terrific: the overall cost of solving the problem is roughly a third of what it would be on a data warehouse appliance in the scenario I came up with, which I think is fairly representative. The TCOD metric is helpful there because it allows you to see all the major components of the cost and how they add up to a whole that drives the economics of your business.

In the enterprise data warehouse example, the data requirements are very different from those in the refining example. In the refining example, the raw data lives for a few hours or a few days before it gets refined and most of it gets thrown away. In the enterprise data warehouse example, data lives for decades and is reused in thousands of different ways. What the data platform needs to do is provide you with the facilities to manage that data, to integrate it, to enforce its data integrity, to perform queries efficiently, to perform analytics efficiently, and to be able to develop queries and analytics rapidly. And that creates a different set of economics. If you apply the TCOD metric to that case, it comes out the other way. It turns out that although the Hadoop platform costs less than the data warehouse platform, the total cost of the project is much less with the data warehouse than with Hadoop.

The metric is useful in illuminating these two examples and many other cases of data management requirements.
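To make the reasoning in this answer concrete, the sketch below compares two platforms on total cost rather than on system cost alone. It is a hedged illustration only: the function names, the two-component cost breakdown, and all numbers are hypothetical, and they are not results from Richard's research.

```python
from typing import Dict


def total_cost(costs: Dict[str, float]) -> float:
    """Sum every cost component for one platform."""
    return sum(costs.values())


def cheapest_platform(options: Dict[str, Dict[str, float]]) -> str:
    """Return the name of the platform with the lowest total cost."""
    return min(options, key=lambda name: total_cost(options[name]))


# Hypothetical costs for a long-lived, heavily reused data set.
# The platform with the cheaper system has the higher development cost.
options = {
    "hadoop": {"system": 1.0, "development": 9.0},
    "data_warehouse": {"system": 4.0, "development": 3.0},
}

print(total_cost(options["hadoop"]))          # 10.0
print(total_cost(options["data_warehouse"]))  # 7.0
print(cheapest_platform(options))             # data_warehouse
```

With different inputs, such as a workload where development costs are low on both platforms, the same comparison can favor the other platform, which is why the answer stresses evaluating each case separately.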

Based on all of the research you have done, what conclusions can we draw regarding the TCOD?

Richard Winter: It's worth driving these decisions from a business point of view with the total cost metric. In applying that metric, you want to use an approach something like the one I've used where you take into account the cost of developing queries and analytics because they're so important in these cases. I think if you do that, you'll discover that you have both Hadoop cases and data warehouse technology cases in your business. You'll need both types of platforms, but you would benefit greatly from exercising care in fitting each application to the platform that provides the best economics.

Richard, thank you so much for introducing us to the TCOD metric.

Note: If you would like to read Richard's full research report on the TCOD metric, it is available at no charge at www.wintercorp.com/tcod-report.


  • Ron Powell
    Ron is an independent analyst, consultant and editorial expert with extensive knowledge and experience in business intelligence, big data, analytics and data warehousing. Currently president of Powell Interactive Media, which specializes in consulting and podcast services, he is also Executive Producer of The World Transformed Fast Forward series. In 2004, Ron founded the BeyeNETWORK, which was acquired by TechTarget in 2010. Prior to the founding of the BeyeNETWORK, Ron was cofounder, publisher and editorial director of DM Review (now Information Management). He maintains an expert channel and blog on the BeyeNETWORK and may be contacted by email at rpowell@powellinteractivemedia.com.

    More articles and Ron's blog can be found in his BeyeNETWORK expert channel.
