Originally published 1 May 2014
In the first part of this series we introduced the concept of corporate data. There are three distinct types of corporate data – structured data, unstructured repetitive data and unstructured nonrepetitive data. Each type of corporate data has its own unique characteristics.
Let’s look at corporate data from the perspective of business relevancy.
Starting with structured data, almost every unit of it has business value (or at least potential value). It is hard to find a unit of structured data that could not satisfy some business need. Take a person’s bank balance. At any moment, that person may want to check it. So structured online data has the potential of being used for business value, and it is hard to dismiss almost any of it as having little or no business relevance.
Now let’s consider unstructured repetitive data. Unstructured repetitive data exists in large volumes and is highly repetitive. Examples include telephone call records and calibrated analog measurements from manufacturing. Expressed as a percentage of the total, very little of this class of data is business relevant. For example, in measuring the quality of steel, maybe only one out of 1,000,000 records is actually useful. So, unlike structured data, most of the records found in unstructured repetitive data are not business relevant. Some records may be potentially useful, but nearly all of them have little or no business use.
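The idea that only a sliver of repetitive records matters can be sketched in a few lines of Python. This is only an illustration: the `Measurement` record, the tensile strength field and the tolerance limits are all hypothetical, and treating "out of tolerance" as the test for business relevance is an assumption, not something the article specifies.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Measurement:
    """One hypothetical calibration reading from a steel production line."""
    batch_id: str
    tensile_strength_mpa: float


# Assumed tolerance band; real limits would come from a quality specification.
LOWER_LIMIT_MPA = 400.0
UPPER_LIMIT_MPA = 550.0


def business_relevant(readings: Iterable[Measurement]) -> Iterator[Measurement]:
    """Yield only the readings that fall outside the assumed tolerance band."""
    for reading in readings:
        if not (LOWER_LIMIT_MPA <= reading.tensile_strength_mpa <= UPPER_LIMIT_MPA):
            yield reading


readings = [
    Measurement("A-001", 480.0),
    Measurement("A-002", 479.5),
    Measurement("A-003", 612.3),  # out of tolerance, so worth a closer look
    Measurement("A-004", 481.2),
]

flagged = list(business_relevant(readings))
relevant_share = len(flagged) / len(readings)
```

In this toy sample one reading in four is flagged. In a real feed of repetitive data the share would be far smaller, which is the point of the steel example above.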
Then there is the third classification of data: the unstructured nonrepetitive class. Typical of this class are emails, call center records, warranty claim records, health care records and many more types of records. When it comes to business relevance, the unstructured nonrepetitive class has different characteristics from both the structured world and the unstructured repetitive class.
In the unstructured nonrepetitive class of data, a small percentage of the data is not business relevant. This class includes email spam and blather, which have very little, if any, business value. It also includes stop words, words that are necessary for proper conversation but that are essentially extraneous to the topic being discussed.
However, all the remaining text in the unstructured nonrepetitive category has business value (or at least potential business value).
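As a rough illustration of setting stop words aside, the Python sketch below drops them from a sentence and keeps the rest. The stop word list and the sample warranty claim are made up for this example, and the function name `strip_stop_words` is hypothetical. A real system would use a much fuller list.

```python
import re

# A tiny, hand-picked stop word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "is", "was", "it", "i", "my", "in", "on"}


def strip_stop_words(text: str) -> list[str]:
    """Lowercase the text, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


claim = "The compressor in my refrigerator failed and it is leaking coolant on the floor"
kept = strip_stop_words(claim)
```

What remains in `kept` are the words that carry the topic of the claim, which is the text the article treats as having business value.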
There is a tremendous difference in business relevancy when looking at the different types of data found in the corporation. With structured data, nearly all data is business relevant – or at least potentially relevant. With unstructured repetitive data, hardly any data is business relevant. And with unstructured nonrepetitive data, some moderate percentage of data is business relevant.
This difference in business relevancy among the types of data matters a great deal when it comes to finding business value, as shall be seen in the next article in this series.
Recent articles by Bill Inmon