Originally published 5 May 2011
We often use the terms "category" and "categorization" in data management, perhaps without thinking too deeply about what a category is. But just what is a category, and what is special about it? And is it really important to answer these questions, or are they simply pedantic and academic? Perhaps there is nothing much to understand about categories.
In the past there was less of a need to worry about categories. But today, semantics and ontology are becoming increasingly important in data management, and categories are important in these disciplines. It is therefore worthwhile to take a brief look at what categories are and what their significance might be.
The word "category" is an ancient one that owes its origin to Aristotle. Aristotle drew up a finite list of all the different ways in which things could exist. His list consisted of ten ultimate aspects of reality, and Aristotle called them "the categories." Most of them are self-explanatory, and in their conventional English names they are as follows: Substance, Quantity, Quality, Relation, Place, Time, Position, State, Action, and Passion (being acted upon).
The categories were formed in part with the idea that every material individual thing must be fully determined by each of the ten categories, and this helps us understand its place in nature. For nearly twenty-four centuries, the Aristotelian Categories were taught as a central part of traditional logic. They were also called the "summa genera" and the "predicaments." We often use the phrase "in a predicament" without understanding that "predicament" is a technical medieval Latin word that means exactly the same as "category."
This may all be very interesting, but how does it apply to modern data management? One answer is that the Aristotelian Categories keep turning up over and over again. For instance, the Zachman Framework has six columns that bear an uncanny resemblance to the traditional categories, as shown in Table 1.
|Zachman Framework Column||Corresponding Aristotelian Category|
|What||Substance, Quality, Relation|
Table 1: Possible Mapping of the Zachman Framework Columns to the Aristotelian Categories
Of course, Table 1 is my interpretation of the mapping, but there is at least a prima facie case that such a mapping exists. This may help explain the general feeling that something fundamental lies behind the Zachman Framework: that something may be its correspondence to the traditional categories.
This correspondence also allows us to ask why certain Aristotelian Categories are not in the Zachman Framework. For instance, "Quantity" is not in the Framework, and the Framework has been criticized for not having a column called "How Much." Also, we see from Table 1 that "Why" is in the Framework but not in the Aristotelian Categories. Perhaps "Why" relates to purpose, which is not something that Aristotle thought could be used to describe things in the universe.
Therefore, an analyst might think about using the Aristotelian Categories in addition to asking questions about "Who, What, Where, When, Why, and How." Unfortunately, the literature on the traditional categories is rather far removed from their practical application. For an analyst, perhaps it is best to use the categories as a checklist to make sure that all aspects of a particular problem have been covered.
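As a rough illustration of the checklist idea, the sketch below lists the ten categories under their conventional English names and reports which ones an analyst's notes have not yet touched. The function name, the structure of the notes, and the "order" example are all hypothetical and are not taken from any established method.

```python
# The ten traditional categories, under their conventional English names.
ARISTOTELIAN_CATEGORIES = (
    "Substance", "Quantity", "Quality", "Relation", "Place",
    "Time", "Position", "State", "Action", "Passion",
)


def uncovered_categories(notes):
    """Return the categories for which no findings have been recorded.

    `notes` maps a category name to a list of free-text findings
    (a hypothetical structure chosen for this sketch).
    """
    return [c for c in ARISTOTELIAN_CATEGORIES if not notes.get(c)]


# Hypothetical analysis notes for an "order" problem domain.
order_notes = {
    "Substance": ["customer order"],
    "Quantity": ["units ordered", "order total"],
    "Time": ["order date", "promised delivery date"],
    "Place": ["ship-to address"],
}

# Categories the analyst has not yet asked about.
print(uncovered_categories(order_notes))
```

The point of the sketch is only that the list is fixed and finite, so gaps in an analysis become visible mechanically; deciding what each category means for a given problem remains the analyst's job.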
A second application of the idea of categories is in the increasingly popular upper ontologies. An upper ontology is a set of categories within a particular domain of human experience. Such ontologies are very common. For instance, the Hollywood attitude to life might be summarized as "Sex, Drugs, and Rock and Roll." Whatever anyone might think of this, it is a valid upper ontology, with its three categories forming one particular distillation of the purpose of life.
Upper ontologies are also important for IT. Today, the upper ontology "People, Process, and Technology" is very much in vogue. Innumerable IT departments and advising consultants devise transformational programs based on this upper ontology. It is not just a framework for organizing artifacts; it is a real view of what IT is. This brings us to two interesting questions that can be asked of all upper ontologies: What is missing from them, and why are these concepts missing? If everything that IT is concerned with can be distilled into "People, Process, and Technology," what has become of information and data? In the terms used by the philosophers, why have information and data been eliminated from these categories? If it cannot be shown how information and data can be placed under "People, Process, and Technology," then it is not an upper ontology but a mere slogan: a phrase used more to elicit an emotional reaction than to deliver any understanding.
Upper ontologies do not have to be termed as such. For instance, they include subject area models, lists of master data entities, and the high-level concepts in canonical exchange formats. What can be done with an upper ontology remains a little fuzzy. The categories in such an ontology should be the general concepts (supertypes) from which all the specific concepts (subtypes) can be derived. This should help in data models. Also, the establishment of categories should help in interoperability. However, general concepts are broad rather than deep and have fewer attributes than specific concepts. Also, just how does a list of top-level concepts help with interoperability? These are more than simple questions; they are complex problems that will require a great deal of thought. So it seems we lack a robust methodological approach both for creating upper ontologies and for applying them.
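The supertype and subtype relationship described above can be sketched in a few lines. The example below is only an illustration: the class names (Party, Person, Organization) and every attribute are assumptions made for the sketch, not part of any particular upper ontology.

```python
from dataclasses import dataclass, fields


@dataclass
class Party:
    """Hypothetical category (supertype): broad, with few attributes."""
    party_id: str
    name: str


@dataclass
class Person(Party):
    """Hypothetical subtype: inherits Party's attributes and adds its own."""
    date_of_birth: str


@dataclass
class Organization(Party):
    """Hypothetical subtype with more specific attributes."""
    registration_number: str
    legal_form: str


def attribute_names(concept):
    """List the attribute names declared for a concept, inherited ones first."""
    return [f.name for f in fields(concept)]


print(attribute_names(Party))
print(attribute_names(Organization))
```

The supertype carries only what every subtype shares, which is why it is broad rather than deep. The sketch shows how subtypes derive from a category; it does not answer the harder question of how such a category helps two systems interoperate.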
Today, "categorization" is often used interchangeably with other terms such as "classification." Such usage does not distinguish between true bottom-up classification and top-down logical division. These are separate activities. A data entry operator may need to classify a particular customer as one of many possible customer types, but this is a different problem from the one an architect faces when trying to decide how many different kinds of parties exist in the enterprise.
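The difference between the two activities can be made concrete with a small sketch. Everything in it is hypothetical: the kinds of party, the customer types, and the spending thresholds are invented purely to contrast the architect's top-down division with the operator's bottom-up classification.

```python
# Top-down logical division: the architect decides which kinds of party exist
# by dividing a general concept into more specific ones (hypothetical kinds).
PARTY_DIVISION = {
    "Party": ["Person", "Organization"],
    "Organization": ["Company", "Government Agency"],
}


def leaf_kinds(kind, division=PARTY_DIVISION):
    """Return the most specific kinds reachable by dividing `kind`."""
    children = division.get(kind, [])
    if not children:
        return [kind]
    leaves = []
    for child in children:
        leaves.extend(leaf_kinds(child, division))
    return leaves


# Bottom-up classification: the operator assigns one particular customer
# to one of several already-defined types (thresholds are invented).
def classify_customer(annual_spend):
    """Assign a single customer to a predefined customer type."""
    if annual_spend >= 100_000:
        return "Key Account"
    if annual_spend >= 10_000:
        return "Standard"
    return "Occasional"


print(leaf_kinds("Party"))
print(classify_customer(25_000))
```

The first function works on the concepts themselves and never looks at an individual record; the second takes the types as given and looks only at one record at a time.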
My view is that the use of categories in upper ontologies is important and needs a distinct approach. Such categories form frameworks for understanding information and data in more specific ways. Also, the same set of data may be seen from a number of different viewpoints at a detailed level, but the upper ontology may be constant from viewpoint to viewpoint. As such, the set of categories that forms the upper ontology has a good deal of value as a starting point of common reference for the different viewpoints. It may also assist in the detailed transformations between the different viewpoints. Some of this is future work for semantics and ontology, but it would be helpful to gain general agreement on the rather special nature of categories, as opposed to applying the term "category" to an overly broad set of meanings that have rather little in common.
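One way to picture the upper ontology as a common point of reference is sketched below. Two hypothetical viewpoints use different detailed terms, but each term is mapped to a shared category, and that shared category is used to suggest candidate correspondences between the viewpoints. The viewpoint names, terms, and categories are all invented for the illustration.

```python
# Each hypothetical viewpoint maps its own detailed terms to a shared category.
SALES_VIEW = {"Prospect": "Party", "Account": "Party", "Deal": "Agreement"}
BILLING_VIEW = {"Payer": "Party", "Contract": "Agreement", "Invoice": "Document"}


def candidate_terms(term, source_view, target_view):
    """Return the target-view terms that share the source term's category."""
    category = source_view[term]
    return sorted(t for t, c in target_view.items() if c == category)


print(candidate_terms("Deal", SALES_VIEW, BILLING_VIEW))
print(candidate_terms("Account", SALES_VIEW, BILLING_VIEW))
```

The shared category only narrows the search: it says which terms could correspond, not how to transform one into the other. That detailed transformation is the part that remains future work.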
SOURCE: What Is a Category?
Recent articles by Malcolm Chisholm