Originally published 25 July 2013
Analysts often seek to derive or infer actionable knowledge about a single entity based on that entity’s context. For example, we often use demographic data associated with locations to make inferences about the individuals who live there, such as the home ownership, educational attainment, and median income data collected and then disseminated as a result of a nationwide census. Inference through context is rampant – there are many scenarios in which aggregate demographics are used to describe cohorts. Along the same lines, people’s preferences are often categorized by age group (“most popular shows for male television viewers between the ages of 18 and 34”), level of education (“most popular websites for college graduates”) or income (“most popular automobile model for those with average income greater than $100,000”).
In all of these cases, we are effectively using “median data” taken from a swath of individuals as a way of classifying any one individual. In essence, we are relying on aggregate information collected at some discrete point in time and then subjected to some process of analysis and reporting. But as time marches on, these aggregate profiles are not continually synchronized with reality; in the worst case, they become stale as changes take place.
For example, a location that might have been popular with recent retirees in the year 2000 may have transitioned to an older, poorer and less healthy community by 2013. This may not be a result of transience in the community, but rather a byproduct of the aging of those recent retirees – they were in their mid-60s in 2000, but by 2013 they are largely in their late 70s or early 80s. So a community that might have been ripe for selling golf clubs in 2000 might be more of a target for assisted health care by 2013.
This is an extreme example; but as the velocity of business increases, there is a corresponding need for consistent and current information about the contexts we use for analysis and categorization. Yet relying on comprehensive surveys that are both costly and unwieldy, such as a decennial census, as the basis of our context analysis means that the profiles are static and tend to degrade in value over time.
Currency: In the survey model, you are limited to the data collected at the time the survey took place. The model degrades over time if it is not continually updated. For example, the US census is performed once every ten years, so halfway between the events, the data is already stale. On the other hand, as long as there is access to updated information, the crowdsourced model can be updated virtually in real time. Flight data transactions are always updated and can be accessed at any time, providing the updates as requested.
Breadth: In the survey model, you essentially have one major chance to collect the data that is needed to develop the profiles, so there is a focus on absorbing data from across a broad space. Again, the US census approaches every household in the country. Alternatively, the crowdsourced model can focus on specific smaller regions as necessary. The collection of flight data transactions can be limited to those individuals flying out of and into one specific metropolitan area or a single airport, or limited to certain time bands.
Precision: With the survey model, you can see changes over time, but trending is mapped over long durations. With the US census, the time frame is ten years, so any trends are effectively “macro-trends.” With the crowdsourced model, you can analyze the information associated with a particular location over a particular time frame with relatively small periods between times of collection (such as week by week, day by day, or even hour by hour). This allows you to see emerging patterns or trends within very short time frames, providing additional insight.
In my opinion, there are some key value drivers associated with the crowdsourced model. The next question centers on how that crowdsourced model can be designed and implemented. In my next column, I intend to explore the requirements for implementing this crowdsourced model and then consider the tools and techniques that could make it happen.
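To make the breadth and precision points concrete, here is a minimal Python sketch of narrowing a stream of transactions to one location and summarizing it week by week. The record layout (a date, an airport code, and a numeric value), the sample figures, and the function name `weekly_median_by_location` are all hypothetical and chosen only for illustration; they are not drawn from any real flight data source.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical transaction records: (date, airport code, numeric value).
# All names and figures here are invented for illustration only.
TRANSACTIONS = [
    (date(2013, 7, 1), "BWI", 220.0),
    (date(2013, 7, 2), "BWI", 180.0),
    (date(2013, 7, 3), "DCA", 310.0),
    (date(2013, 7, 8), "BWI", 260.0),
    (date(2013, 7, 9), "BWI", 300.0),
    (date(2013, 7, 10), "BWI", 340.0),
]


def weekly_median_by_location(transactions, location):
    """Return {(iso_year, iso_week): median value} for a single location."""
    buckets = defaultdict(list)
    for day, loc, value in transactions:
        if loc != location:
            continue  # breadth: keep only the location of interest
        iso = day.isocalendar()
        # precision: group by ISO week rather than by a ten-year span
        buckets[(iso[0], iso[1])].append(value)
    return {week: median(values) for week, values in sorted(buckets.items())}


profile = weekly_median_by_location(TRANSACTIONS, "BWI")
for (year, week), value in profile.items():
    print(f"{year}-W{week:02d}: median {value}")
```

Swapping the ISO week key for the calendar date, or for a date and hour, would give the day-by-day or hour-by-hour view; rerunning the function as new records arrive is what keeps the profile current.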
Recent articles by David Loshin