This BeyeNETWORK spotlight features Ron Powell's interview with David Loshin, President of Knowledge Integrity. David discusses the various challenges presented by big data and the steps that can be taken to tackle those challenges.
David, you recently wrote a white paper entitled “Big Data in Government: Business Drivers and Best Practices.” Can you talk about the business drivers?
David Loshin:
Business drivers are sometimes masked by the desire to adopt new technology. But, in reality, if we boil it down to some of the key points, government agencies are significant consumers of information. They need to manage that data internally, coordinate programs within an agency, and often share data between agencies. Big data introduces new opportunities for scalability in managing, sharing and moving information. And the kinds of systems we have had in the past need to be upgraded, managed and maintained in a way that allows that scalability to be extended across the agencies.
Government agencies and the government are very different from the commercial world. Can you give us specific use cases?
David Loshin:
Yes, there actually are a number of really interesting use cases. A large number of them hinge on things like making better use of data, making better informed decisions from actionable intelligence, and publishing data for greater transparency. As you can tell when you look at websites like data.gov, there is significant caching of government information. Those data sets need to be produced in some way, and a lot of them are very large. A particularly interesting use case is identifying and eliminating fraud, waste and abuse. Today in a large number of agencies, the analysis of fraud, waste and abuse is really about recovering the monies from fraudulent activities that have already taken place. We want to show how organizations can use scalable, high-performance technologies to transition from trying to recover what has already been lost to identifying fraud as it is happening and preventing the negative impacts of those activities before they occur, so you don’t have to worry about trying to track it down and find the money. This is well suited to big data applications because it requires absorbing and analyzing massive amounts of data. A lot of these data sets have structured sets of transactions as well as a lot of unstructured data coming from a number of different sources. The algorithms for developing these predictive models are computationally intensive. They need a lot of computational resources and access to a lot of the data. That’s one good example.
Another example involves homeland security. Cybersecurity is one aspect of that, and it is a growing issue in both the private and the public sector. The idea there is that you want to be able to monitor network activities and behaviors as they are taking place to identify both known patterns of risk and suspicious patterns of access that may be indicative of some kind of breach, some kind of data leakage, some kind of denial-of-service attack or some attempt at pulling information from government agencies. This big risk is growing in scope, and it’s well suited to a big data application because of the numerous data streams and the significant variety of structures, formats and content: DNS data, DHCP data, packet data, header information, NetFlow information, web logs and alerts. These different types of data also include, again, unstructured data such as information that is in emails, plus the header information that links individuals sending emails to each other. Those are some good examples of the combination of structured and unstructured data. With massive amounts of data coming from all these different sources, the performance of these applications and algorithms is, again, impacted by the need to capture, organize and analyze all of it. So it’s something that, again, would benefit from high-performance parallel systems.
What’s the next step for these agencies?
David Loshin:
One of the challenges is that the apparent simplicity of implementing a big data application by downloading open source software and running it on desktop machines belies the complexity of doing this in a way that not only provides the level of scalability and performance expected for handling these massive amounts of data, but also has a lowered or manageable total cost of operations. That, coupled with the fact that many government agencies have already invested hundreds of thousands, millions or tens of millions of dollars in existing infrastructure for doing analyses for these same kinds of challenges, means they’re not about to rip and replace those for something that potentially is unsecured, developed by external parties, and isn’t really hardened for production use. So the next steps would be to assess the suitability of some of these applications to transition incrementally. That means selecting certain kinds of value propositions for certain kinds of applications and piloting those applications on a variety of different high-performance big data platforms (including analytical appliances, networks of machines and networks of servers running Hadoop or other types of platforms). It also means identifying clear objectives for performance and scalability, and creating governance around the adoption of those technologies so that it blends and dovetails with the existing infrastructure and investment.
So if our audience wanted to read this white paper, should they just type the title of the report into a search engine?
David Loshin:
I know for a fact that they can directly search for it. It’s called “Big Data and Government: Business Drivers and Best Practices.” It's also available from Teradata.
Great. David, thank you so much for helping our audience understand some of the big data challenges within the government.
SOURCE: Government Big Data Drivers and Best Practices: A Spotlight Q&A with David Loshin