Saturday, January 13, 2007

Taxonomy, Ontology and Facets

I have a confession to make. For the last 5 years or so, I have been working with Taxonomies and Ontologies without fully understanding what they are. Fortunately, in my last job as a web developer, they were peripheral to what I did, and there were APIs that hid the underlying structure, so my lack of understanding was not as damaging as it now seems in retrospect. At my current job, however, I work with the company's information assets much more closely, so this understanding is now a requirement. This article describes my current understanding. I am by no means an expert on these concepts, but I think the information here may be useful to a web developer who is curious about Taxonomies, Ontologies and Facets, and how they can be used to surface information on web pages.

Before I started reading up, I would have described a Taxonomy as a classification of entities according to some preferred classification property, and an Ontology as a re-classification of the entities in the Taxonomy according to some other property (or properties) that we want to show the users of our web pages. For example, the Taxonomy for an electronics store may have the various suppliers at the top, followed by the different types of items each supplier manufactures, followed by the items themselves. This classification may have been natural because it reflected how the store got the specifications for its products (by supplier). Down the road, however, the store would probably want to show the customer all the HDTVs it sells together, regardless of who manufactured them. So it would create another classification of its products by category. There may be many such classifications, so the Ontology may end up as a forest of trees.

Surprisingly, the above is largely correct. However, notice the relative lack of specifics. Notice also how much hand-waving the explanation requires, and how it does not delve into how any of this can be implemented. Both are sure signs that the person explaining it needs to refine his understanding of the subject.

I found the clearest and most concise (to me, anyway) explanation of how a Taxonomy and an Ontology differ here. As this explanation states, a Taxonomy is generally modelled as a tree that may or may not support multiple inheritance. The only relationships between entities that can be inferred are therefore Broader (the parent or parents of the current entity), Narrower (the children of the current entity) and Related (the siblings of the current entity). Ontologies, on the other hand, are modelled as triples (source entity, relation, target entity). So the "Dog LivesIn House" relationship would be the triple (Dog, livesIn, House).
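To make the contrast concrete, here is a minimal sketch in plain Python. The class and function names and the sample entities (Mammal, Cat, and so on) are hypothetical illustrations. The taxonomy is a tree that allows multiple parents, so it can only answer Broader, Narrower and Related questions. The ontology is just a set of (source, relation, target) triples, so it can hold any named relation.

```python
from collections import defaultdict

class Taxonomy:
    """Hypothetical tree-shaped taxonomy; a node may have several parents."""

    def __init__(self):
        self.parents = defaultdict(set)   # child -> its parents
        self.children = defaultdict(set)  # parent -> its children

    def add(self, child, parent):
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def broader(self, node):
        return set(self.parents[node])

    def narrower(self, node):
        return set(self.children[node])

    def related(self, node):
        # Siblings: other children of any of this node's parents.
        siblings = set()
        for parent in self.parents[node]:
            siblings |= self.children[parent]
        siblings.discard(node)
        return siblings

tax = Taxonomy()
tax.add("Dog", "Mammal")
tax.add("Cat", "Mammal")
tax.add("Mammal", "Animal")

# An ontology: arbitrary (source, relation, target) triples.
ontology = {("Dog", "livesIn", "House"), ("Cat", "livesIn", "House")}

def targets(triples, source, relation):
    """Return every target reachable from source via the named relation."""
    return {t for (s, r, t) in triples if s == source and r == relation}

print(tax.broader("Dog"))                   # {'Mammal'}
print(tax.related("Dog"))                   # {'Cat'}
print(targets(ontology, "Dog", "livesIn"))  # {'House'}
```

The taxonomy has no way to express "lives in". The triple set can, but it carries no built-in notion of parent and child.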

A Facet is a property that can be applied to all or some of the entities in the taxonomy. It is a one-dimensional cross-cut, and does not specify a relation by itself. A good introduction to Facets for a web developer is William Denton's article "How to make a Faceted Classification and put it on the Web". The closest parallel to Facets I could think of was the dimensions of the star schemas used in Data Warehousing.
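The following minimal sketch shows facets as independent, one-dimensional properties. It assumes each entity is a Python dict of facet names to values. The product data and the helper names `filter_by` and `facet_counts` are hypothetical. Selecting a facet value narrows the result set, and counting the values of another facet over that set gives the numbers a faceted web page typically shows next to each link.

```python
from collections import Counter

# Hypothetical products; "supplier" and "category" are facets.
products = [
    {"name": "TV-100", "supplier": "Acme", "category": "HDTV"},
    {"name": "TV-200", "supplier": "Globex", "category": "HDTV"},
    {"name": "DVD-1", "supplier": "Acme", "category": "DVD Player"},
]

def filter_by(items, **facets):
    """Keep items whose value matches every selected facet."""
    return [i for i in items if all(i.get(f) == v for f, v in facets.items())]

def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(i[facet] for i in items if facet in i)

hdtvs = filter_by(products, category="HDTV")
print([p["name"] for p in hdtvs])      # ['TV-100', 'TV-200']
print(facet_counts(hdtvs, "supplier"))
```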

Ontologies can be modelled in various ways. If the Ontology is used to recategorize the entities, then it may make sense to build a set of parallel tree structures based on the Taxonomy. With this approach, however, the Ontology becomes as difficult to change and enhance as the Taxonomy. Fortunately, this is not necessary most of the time. Most Ontology trees that back web applications are shallow and broad, because you want to give the client as much relevant data as possible through the simplest possible interface. So in most cases, all we need to do is enrich the Taxonomy with additional relations, and Facets are generally sufficient and well-suited for this purpose.
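Here is a small sketch of the "enrich with a Facet" approach, reusing the electronics store example. The supplier-first taxonomy and the `category` facet values are hypothetical. Instead of maintaining a parallel category tree, a shallow one-level view is derived on demand from the existing taxonomy plus one facet.

```python
from collections import defaultdict

# Hypothetical supplier-first taxonomy: supplier -> items it supplies.
by_supplier = {
    "Acme": ["TV-100", "DVD-1"],
    "Globex": ["TV-200"],
}
# A single "category" facet attached to each leaf item.
category = {"TV-100": "HDTV", "TV-200": "HDTV", "DVD-1": "DVD Player"}

def regroup(taxonomy, facet):
    """Build a one-level view keyed by facet value instead of by supplier."""
    view = defaultdict(list)
    for items in taxonomy.values():
        for item in items:
            if item in facet:  # items without this facet are simply skipped
                view[facet[item]].append(item)
    return dict(view)

print(regroup(by_supplier, category))
# {'HDTV': ['TV-100', 'TV-200'], 'DVD Player': ['DVD-1']}
```

Adding another recategorization then means adding another facet dictionary, not restructuring a tree.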

In a computer system, you would typically model the tree structure of a Taxonomy in a relational database. To model an Ontology relation, you could create a separate table for each relation, with each row holding a pair of foreign keys back to the entity table of the taxonomy. This gives a many-to-many relationship between entities.

entity (id, entity_details...)
livesIn (lhs_entity_id, rhs_entity_id)
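The one-table-per-relation layout can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch that assumes a `name` column standing in for `entity_details...`. The sample rows (including "Kennel") are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE  -- stands in for entity_details...
    );
    -- One table per relation; each row is a (lhs, rhs) pair of entity ids.
    CREATE TABLE livesIn (
        lhs_entity_id INTEGER NOT NULL REFERENCES entity(id),
        rhs_entity_id INTEGER NOT NULL REFERENCES entity(id),
        PRIMARY KEY (lhs_entity_id, rhs_entity_id)
    );
""")
conn.executemany("INSERT INTO entity (id, name) VALUES (?, ?)",
                 [(1, "Dog"), (2, "House"), (3, "Kennel")])
conn.executemany("INSERT INTO livesIn VALUES (?, ?)", [(1, 2), (1, 3)])

# Where can a Dog live? Join the relation table to entity on both sides.
rows = conn.execute("""
    SELECT r.name
      FROM livesIn li
      JOIN entity l ON l.id = li.lhs_entity_id
      JOIN entity r ON r.id = li.rhs_entity_id
     WHERE l.name = 'Dog'
     ORDER BY r.name
""").fetchall()
print([name for (name,) in rows])  # ['House', 'Kennel']
```

The drawback is that every new kind of relation needs a new table and new queries.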

If there are many different relationships, it may make more sense to combine them into a single table, with a relationship_id column that references a relationships table.

entity (id, entity_details...)
relationships (id, relationship_details...)
entity_relations (lhs_entity_id, relationship_id, rhs_entity_id)
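A minimal `sqlite3` sketch of the combined layout follows. It assumes `name` columns in place of the `..._details` placeholders. The "eats" relationship and the "Bone" entity are hypothetical additions. The point it illustrates is that a new relation type becomes a new row in `relationships`, with no schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE relationships (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE entity_relations (
        lhs_entity_id   INTEGER NOT NULL REFERENCES entity(id),
        relationship_id INTEGER NOT NULL REFERENCES relationships(id),
        rhs_entity_id   INTEGER NOT NULL REFERENCES entity(id),
        PRIMARY KEY (lhs_entity_id, relationship_id, rhs_entity_id)
    );
""")
conn.executemany("INSERT INTO entity VALUES (?, ?)",
                 [(1, "Dog"), (2, "House"), (3, "Bone")])
# A new relation type is a new row here, not a new table.
conn.executemany("INSERT INTO relationships VALUES (?, ?)",
                 [(1, "livesIn"), (2, "eats")])
conn.executemany("INSERT INTO entity_relations VALUES (?, ?, ?)",
                 [(1, 1, 2), (1, 2, 3)])

# Count the stored triples for each relationship type.
counts = dict(conn.execute("""
    SELECT r.name, COUNT(*)
      FROM entity_relations er
      JOIN relationships r ON r.id = er.relationship_id
     GROUP BY r.name
""").fetchall())
print(counts)
```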

So to find the possible places the entity "Dog" can live in, we would look up the id of "Dog" in the entity table and the id of "livesIn" in the relationships table. We would then find the entities related to Dog in this manner in the entity_relations table.

-- assumes entity and relationships each have a name column
select er.*
  from entity el
  join entity_relations re on el.id = re.lhs_entity_id
  join relationships r on r.id = re.relationship_id
  join entity er on er.id = re.rhs_entity_id
  where el.name = 'Dog'
  and r.name = 'livesIn';
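The same lookup can be wrapped in a small helper that uses bound parameters instead of string literals, which avoids quoting problems. This is a sketch against an in-memory SQLite database. The function name `related`, the `name` columns, and the sample rows are hypothetical.

```python
import sqlite3

def related(conn, lhs_name, relationship_name):
    """Return names of entities X such that (lhs_name, relationship_name, X) holds."""
    sql = """
        SELECT er.name
          FROM entity el
          JOIN entity_relations re ON el.id = re.lhs_entity_id
          JOIN relationships r     ON r.id = re.relationship_id
          JOIN entity er           ON er.id = re.rhs_entity_id
         WHERE el.name = ? AND r.name = ?
         ORDER BY er.name
    """
    return [name for (name,) in conn.execute(sql, (lhs_name, relationship_name))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE relationships (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE entity_relations (lhs_entity_id INTEGER,
                                   relationship_id INTEGER,
                                   rhs_entity_id INTEGER);
    INSERT INTO entity VALUES (1, 'Dog'), (2, 'House'), (3, 'Kennel'), (4, 'Cat');
    INSERT INTO relationships VALUES (1, 'livesIn');
    INSERT INTO entity_relations VALUES (1, 1, 2), (1, 1, 3), (4, 1, 2);
""")
print(related(conn, "Dog", "livesIn"))  # ['House', 'Kennel']
```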

In the above example, we modelled our triple with the entity_relations table. A Facet is technically a property, which corresponds to the relation part of our triple. However, it is associated with an entity, which gives us the left-hand side entity of our triple, and it has a value. The value could be anything. In cases where we control the creation of the Facet and its values, though, the value could very well point to an entity in our dataset, which would correspond to the right-hand side entity of our triple.
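The correspondence can be sketched in a few lines. The facet assignments below, including the hypothetical literal-valued `weightKg` facet, are stored as (entity, facet, value) tuples. Only those whose value is itself a known entity qualify as full entity-relation-entity triples. The helper name `as_triples` is an illustrative assumption.

```python
# Hypothetical facet assignments: (entity, facet, value).
entities = {"Dog", "House", "Kennel"}

facet_values = [
    ("Dog", "livesIn", "House"),  # value is an entity: a full triple
    ("Dog", "weightKg", 30),      # value is a literal: just a property
]

def as_triples(assignments, known_entities):
    """Keep only facet assignments whose value is itself a known entity."""
    return [(s, p, o) for (s, p, o) in assignments if o in known_entities]

print(as_triples(facet_values, entities))  # [('Dog', 'livesIn', 'House')]
```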

6 comments (moderated to prevent spam):

Mohit Agrawal said...

Can we say that a Taxonomy defines entities, whereas an Ontology describes the relations between entities (and their attributes)?

Sujit Pal said...

Probably not. The way I think of it, a taxonomy is a tree structure. So the only relations that can be supported are (B)roader (parents of a node), (N)arrower (children of a node) and (R)elated (siblings of a node). An ontology is when you take multiple trees and join nodes in the trees via relationships, so the end result looks like a forest or graph.

Mohit Agrawal said...

Could you briefly explain this structure in database terms? I.e., what would be the difference in the actual data in the DB tables for an Ontology versus a Taxonomy?

Sujit Pal said...

If you wanted to go with databases, ideally you would store each of your taxonomies in a separate database, and your ontology in yet another database. I have seen this done at a previous company. The problem is with the relationships: they are hard to distinguish when you have them all together, since an ontology exposes more facts (entity-relation-entity triples) than a taxonomy (only BNR). However, if your application can distinguish them, you could store them all in a single database. In either case, there should be no difference in how you model entities and relationships.

Unknown said...

How do I store the class name and subclass name from an .owl file in variables?

Sujit Pal said...

Hi Sanju, class and subclass are examples of a parent-child or IS-A relationship, and they can be treated the same as instances. In general, in the case of taxonomies, classes and subclasses are the non-leaf nodes and instances are the leaf nodes. As an example, consider the Product.owl file: there is a single class with multiple instances Product1, Product2 and Product3. The Product node represents a class and is at the root of the taxonomy, and Product1 is connected to Product via an IS-A relationship, i.e. Product1 -(IS-A)-> Product. It's fuzzier for Ontologies. Consider, for example, the Shakespeare.owl file here. There are 4 classes defined here: Play, Place, Person and Author. Instances of Play are HenryV, LovesLaborLost, etc. Both Play and HenryV are nodes in the ontology and are connected by IS-A relationships, i.e. HenryV -(IS-A)-> Play. If you need to distinguish class vs instance, one possibility may be to use an attribute.
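As a rough illustration of that last suggestion, here is a plain-Python sketch. It does not parse OWL; that would need an RDF/OWL library, which is outside this sketch. It assumes the IS-A edges have already been loaded as triples, and the ("Shakespeare", "IS-A", "Author") triple and the `is_class` attribute map are hypothetical. Classes and instances are both ordinary nodes, and an explicit attribute tells them apart.

```python
# IS-A triples in the spirit of the Shakespeare example (Shakespeare/Author is hypothetical).
triples = [
    ("HenryV", "IS-A", "Play"),
    ("LovesLaborLost", "IS-A", "Play"),
    ("Shakespeare", "IS-A", "Author"),
]
# Hypothetical attribute marking which nodes are classes.
is_class = {"Play": True, "Place": True, "Person": True, "Author": True}

def instances_of(cls, triples):
    """Return the nodes connected to cls by an IS-A edge."""
    return sorted(s for (s, p, o) in triples if p == "IS-A" and o == cls)

def kind(node):
    """Classify a node using the attribute; unmarked nodes count as instances."""
    return "class" if is_class.get(node, False) else "instance"

print(instances_of("Play", triples))  # ['HenryV', 'LovesLaborLost']
print(kind("Play"), kind("HenryV"))   # class instance
```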