Data Streaming Essential for Real-Time Big Data Applications: A Q&A with Neil McGovern of SAP by Ron Powell - BeyeNETWORK

Originally published 25 August 2014

This BeyeNETWORK article features Ron Powell’s interview with Neil McGovern, senior director of marketing at SAP. Neil and Ron talk about data and event streaming and what they mean for real-time big data analytics.
Neil, what are data streams and why are they becoming so important?

Neil McGovern:
Great question. Data streams are series of similar events, such as a stream of point-of-sale transactions from cash registers, clickstreams from websites, ticker streams from stock exchanges, or even the data that streams in from sensors – external sensors or network equipment. Much of this data has been around for a long time, but it was traditionally batched up into a file and consumed in one gulp. Streaming data, by contrast, comes into the organization and is handled as it is created or detected, the moment it arrives. And it's becoming important because as we move to a real-time world, we need to be able to analyze these events as they happen, not the next morning after the overnight batch run. A stream of events is continuous, and stream processing is the analysis of those events as they happen, as they're streaming into your organization.

You mentioned event stream processing. Is it the same as time series analysis? If not, what is the difference?

Neil McGovern:
Those two areas are similar, but they are different. Time series analysis is a type of analytics that can be performed on time-stamped data such as events, but there are other types of time-stamped data as well. Let me give you an example that might make this clear. We can calculate an average value for a stream of events, but that doesn't tell us when those events happened. If we calculate the average value of a stream of events over 30 minutes, for example, that gives us a snapshot of what the data is. Time series analysis tells us when it happened – did all of these events occur in the last 30 seconds, or were they spread over the whole 30 minutes? So it's the difference between understanding the values of the data and understanding the makeup of that data over time – the latter is what time series analytics gives you. That's why time series analytics is very common in event stream processing. It can be used in other places too, and event stream processing is more than just time series analytics. So the two are intertwined, but not the same.
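
To make that distinction concrete, here is a minimal Python sketch (purely illustrative, not SAP code; the timestamps and values are invented) contrasting a plain aggregate over a 30-minute batch of events with a time series view that buckets the same events by minute:

```python
from collections import Counter
from statistics import mean

# Hypothetical events as (timestamp_in_seconds, value) pairs over a 30-minute window.
# Burst stream: all 30 events arrive in the final 30 seconds of the window.
burst = [(1770 + i, 100) for i in range(30)]
# Spread stream: one event per minute across the whole window.
spread = [(60 * i, 100) for i in range(30)]

for name, events in [("burst", burst), ("spread", spread)]:
    # Plain aggregate: identical for both streams, so it hides the timing entirely.
    avg = mean(value for _, value in events)
    # Time series view: count events per one-minute bucket to see the makeup.
    per_minute = Counter(ts // 60 for ts, _ in events)
    print(f"{name}: average={avg}, busiest minute holds {max(per_minute.values())} events")
```

Both streams report the same average, but the per-minute buckets reveal whether the events were bursty or evenly spread – exactly the makeup that time series analysis exposes.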

How is processing data in an event stream different from processing data in a database?

Neil McGovern: In a database, you have to wait until you have a complete data set and then run a query over it to determine the result. This is the standard model we're used to: we ask one question of a large amount of data and get the answer. In event stream processing, we flip that on its head. Instead of asking one question of a complete data set, we trigger the queries associated with each new piece of data as it comes in. When an event happens, it triggers the query, and we get a result. We can then build up those results over time. Let me give you a simple example. If you want to know how many men and women there are in an audience of 5,000 people, the traditional way would be to wait until everybody is in the room, stand up on the stage, and ask them to raise their hands if they're a man. Then you would count the number of hands in the air, and you'd know the number of men versus the number of women in the audience. That's your standard database model. With event streaming, you'd count the number of men and women as they walk in the door. In other words, you query each person coming in as to whether they are a man or a woman, and build up the results as people walk in, so that you get an idea of what's happening even before you have a full data set. As soon as the last person is in the door, you know the complete answer. So the key distinguisher between the two is an event-driven analysis model rather than a polling model.
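
As a minimal illustration of the two models Neil describes (a toy Python sketch, not the SAP engine; the arrival data is invented):

```python
from collections import Counter

# Hypothetical arrival stream: each event records whether a man or a woman walked in.
arrivals = ["woman", "man", "man", "woman", "man"]

# Database (polling) model: wait for the complete data set, then run one query over it.
def batch_count(events):
    return Counter(events)  # a single query over all accumulated rows

# Event-driven model: each arrival triggers the query and updates the running result.
running = Counter()
for person in arrivals:
    running[person] += 1  # the event fires the query; no waiting for a full data set
    print(f"after {sum(running.values())} arrivals: {dict(running)}")

# The final incremental result matches the one-shot batch query.
assert running == batch_count(arrivals)
```

The event-driven loop yields a partial answer after every arrival, while the batch query says nothing until the room is full – which is exactly the latency difference discussed next.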

So there really isn’t any latency when you’re doing event stream processing because you know the results as soon as the last person walks into the room, rather than launching the query once everybody is in the room. Right?

Neil McGovern: That’s exactly right. The event triggers the query, so when the event happens, the query is fired off automatically rather than waiting for someone to run a query against the database. That’s the event-driven model versus the polling model, and there is a lot less latency in the system.

What problems are businesses solving and what challenges are they addressing via event stream processing?

Neil McGovern: Event stream processing is used in many industries today. A lot of the earlier uses were in the financial services market, where it was used to keep indexes like the Dow Jones up to date: as soon as a transaction happens, the index is recalculated automatically. It is also used to monitor for patterns of events that may indicate illegal activity. In fact, several stock exchanges around the world – including one in Europe that we work with – use our system to do exactly that, because one of the things you can do is identify complex events across multiple streams of data. You can say, for example, that if these three things happen within a five-minute window, take an action. As people became more aware of the technology, the number of uses proliferated, and we now see telecommunications firms using event processing to monitor networks in real time and know if a cell is acting up. It’s used in manufacturing to monitor equipment and detect faults as they’re developing, so you can do proactive maintenance rather than waiting for the machinery to fail. Smart roads, smart grids and smart bridges will all rely on streaming analytics. Already in the Tokyo prefecture, the bridges and the sensors in those bridges are connected to a central data center running our event processing software, which monitors the state of the bridges – how structurally sound they are in case of an earthquake – so they know instantaneously what the pressures were on any particular bridge.
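
The "three things within a five-minute window" idea can be sketched in a few lines of Python. This is a toy illustration with invented event names, not exchange surveillance code or SAP's product:

```python
from collections import deque

WINDOW = 300  # five minutes, in seconds

class PatternWatcher:
    """Raise an alert when all required event types occur within a sliding time window."""

    def __init__(self, required):
        self.required = set(required)
        self.recent = deque()  # (timestamp, event_type) pairs still inside the window

    def on_event(self, ts, event_type):
        self.recent.append((ts, event_type))
        # Evict events that have fallen out of the five-minute window.
        while self.recent and ts - self.recent[0][0] > WINDOW:
            self.recent.popleft()
        # Take an action if every required event type is present in the window.
        if self.required <= {etype for _, etype in self.recent}:
            print(f"ALERT at t={ts}: {sorted(self.required)} all seen within {WINDOW}s")

# Hypothetical merged feed from three streams; the pattern completes at t=250.
watcher = PatternWatcher({"large_order", "quote_spike", "cancel"})
for ts, etype in [(0, "large_order"), (120, "quote_spike"), (250, "cancel")]:
    watcher.on_event(ts, etype)
```

Each incoming event triggers the check, so the alert fires the moment the pattern completes rather than after a later batch scan.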

The number of uses is proliferating. An interesting one I saw recently was the use of event processing to monitor hygiene in hospitals to make sure everyone is following the rules and procedures that are in place. So it’s real-time monitoring of people’s activities to make sure they’re adhering to best hygiene practices.

So there’s a wide range of areas where we’re seeing event processing used today.

You seem to be doing a lot with your customers. What do you see next for stream processing in the future?

Neil McGovern: Event processing, or stream processing, is going to be key for future applications. It is essential for real-time applications and for handling huge volumes of data from, for instance, the Internet of Things – two of the key developing trends in the industry. Event processing can be used as what we call a data shock absorber: if you get a huge amount of data coming in, these event-processing systems are incredibly performant and can handle millions of events per second, far more than a database could handle. So they can absorb the huge volumes of data we’re expecting from the Internet of Things, and they can do it in real time, which makes them an essential tool for real-time applications. Real-time analytics and the Internet of Things are two of the largest trends in application development today, and they are going to rely on event processing to be successful.

We’ll see event processing happen more and more. For instance, if you type a search into Google for an elliptical machine, you’ll notice that for the next day or two you’ll see ads for elliptical machines appearing on other websites, because Google is using the stream of data coming in, plus what it knows about you, to customize the advertising you see in real time. With real-time analytics and handling streams of data, we expect to see more and more of these use cases in a widening range of industries. I can’t tell you every use of streaming data, but I can tell you that most industries will be as familiar with event processing as they are with batch processing concepts today.

Well, that makes a lot of sense. Neil, I want to thank you for providing this great insight into how SAP is providing the right platform for stream processing and highlighting the many benefits your customers have achieved.


  • Ron Powell
    Ron is an independent analyst, consultant and editorial expert with extensive knowledge and experience in business intelligence, big data, analytics and data warehousing. Currently president of Powell Interactive Media, which specializes in consulting and podcast services, he is also Executive Producer of The World Transformed Fast Forward series. In 2004, Ron founded the BeyeNETWORK, which was acquired by TechTarget in 2010. Prior to the founding of the BeyeNETWORK, Ron was cofounder, publisher and editorial director of DM Review (now Information Management). He maintains an expert channel and blog on the BeyeNETWORK and may be contacted by email at rpowell@powellinteractivemedia.com.

    More articles and Ron's blog can be found in his BeyeNETWORK expert channel.
