Domain Modeling and Ontology Engineering

The semantic web is poised to influence us in ways that will be as radical as the early days of the Internet and World Wide Web. For software developers it will involve a paradigm shift, bringing new ways of thinking about the problems that we solve and, more importantly, new bags of tricks to play with.

One of the current favourite ways to add value to an existing system is through the application of data mining. Amazon is a great example of the power of data mining; it can offer you pretty accurate recommendations based on a statistical model of purchasing behaviour. It looks at what the other purchasers of a book bought, and uses that as a guide to make further recommendations.

What if it were able to make suggestions like this: We recommend that you also buy book XYZ because it discusses the same topics but in more depth. That kind of recommendation would be incredible. You would have faith in a recommendation like that, because it wasn’t tainted by the thermal noise of purchaser behaviour. I don’t know why, but every time I go shopping for books on computer science, Amazon keeps recommending that I buy Star Trek books. It just so happens that programmers are suckers for schlock sci-fi books, so there is always at least one offering amongst the CompSci selections.

The kind of domain understanding I described above is made possible through the application of Ontology Engineering. Ontology Engineering is nothing new – it has been around for years in one form or another. What makes it new and exciting for me is the work being done by the W3C on semantic web technologies. Tim Berners-Lee has not been resting on his laurels since he invented the World Wide Web. He and his team have been producing a connected set of specifications for the representation, exchange and use of domain models and rules (plus a lot else besides). This excites me, not least because I first got into Computer Science through an interest in philosophy. About 22 years ago, in a Sunday supplement newspaper a correspondent wrote about the wonderful new science of Artificial Intelligence. He described it as a playground of philosophers where for the first time hypotheses about the nature of mind and reality could be made manifest and subjected to the rigours of scientific investigation. That blew my mind – and I have never looked back.

Which brings us to the present day. Ontology engineering involves the production of ontologies, each of which is an abstract model of some domain. This is exactly what software developers do for a living, but with a difference. The Resource Description Framework (RDF) and the Web Ontology Language (OWL) are designed to be published and consumed across the web. They are not procedural languages – they describe a domain and its rules in such a way that inference engines can reason about the domain and draw conclusions. In essence the semantic web brings a clean, standardised, web-enabled and rich language in which we can share expert systems. The magnitude of what this means is not clear yet, but I suspect that it will change everything.

The same imperatives that drove the formulation of standards like OWL and RDF are at work in the object domain. A class definition is only meaningful in the sense that it carries data and its name has some meaning to a programmer. There is no inherent meaning in an object graph that can allow an independent software system to draw conclusions from it. Even the natural language labels we apply to classes can be vague or ambiguous. Large systems in complex industries need a way to add meaning to an existing system without breaking backwards compatibility. Semantic web applications will be of great value to the developer community because they will allow us to inject intelligence into our systems.

The current Web2.0 drive to add value to the user experience will eventually call for more intelligence than can practically be got from our massive OO systems. A market-driven search for competitiveness will drive the software development community to more fully embrace the semantic web as the only easy way to add intelligence to unwieldy systems.

In many systems the sheer complexity of the problem domain has led software designers to throw up their hands in disgust, and opt for data structures that are catch-all buckets of data. Previously, I have referred to them as untyped associative containers because more often than not the data ends up in a hash table or equivalent data structure. For the developer, the untyped associative container is pure evil on many levels – not least from performance, readability, and type-safety angles. Early attempts to create industry mark-up languages foundered on the same rocks. What was lacking was a common conceptual framework in which to describe an industry. That problem is addressed by ontologies.

In future, we will produce our relational and object-oriented models as a side effect of the production of an ontology – the ontology may well be the repository of the intellectual property of an enterprise, and will be stored and processed by dedicated reasoners able to gather insights about users and their needs. Semantically aware systems will inevitably out-compete the inflexible systems that we are currently working with because they will be able to react to the user in a way that seems natural.

I’m currently working on an extended article about using semantic web technologies with .NET. As part of that effort I produced a little ontology in the N3 notation to model what makes people tick. The ontology will be used by a reasoner in the travel and itinerary planning domain.

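# Prefix declarations the listing needs in order to parse; the default namespace is a placeholder.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix : <#> .

# Core classes: people, their needs, and the things that satisfy them.
:Person a owl:Class .
:Need a owl:Class .
:PeriodicNeed rdfs:subClassOf :Need .
:Satisfier a owl:Class .

# A person has needs.
:need rdfs:domain :Person ;
    rdfs:range :Need .

# Kinds of need.
:Rest rdfs:subClassOf :Need .
:Hunger rdfs:subClassOf :Need .
:StimulusHunger rdfs:subClassOf :Need .

# A satisfier satisfies a need.
:satisfies rdfs:domain :Satisfier ;
    rdfs:range :Need .

# Concrete satisfiers and the needs they address.
:Sleep a owl:Class ;
    rdfs:subClassOf :Satisfier ;
    :satisfies :Rest .
:Eating a owl:Class ;
    rdfs:subClassOf :Satisfier ;
    :satisfies :Hunger .
:Tourism a owl:Class ;
    rdfs:subClassOf :Satisfier ;
    :satisfies :StimulusHunger .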
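
To give a feel for what a reasoner does with declarations like these, here is a minimal Python sketch. It is not a real OWL reasoner: it hand-copies a few of the triples above into tuples and applies a single RDFS rule, the transitivity of rdfs:subClassOf. The :Nap class and the function name subclass_closure are made up for illustration and are not part of the ontology.

```python
# A few triples copied by hand from the ontology above.
# ":Nap" is a hypothetical addition, used only to show a two-step chain.
TRIPLES = {
    (":Rest", "rdfs:subClassOf", ":Need"),
    (":Hunger", "rdfs:subClassOf", ":Need"),
    (":Sleep", "rdfs:subClassOf", ":Satisfier"),
    (":Eating", "rdfs:subClassOf", ":Satisfier"),
    (":Nap", "rdfs:subClassOf", ":Sleep"),
}


def subclass_closure(triples):
    """Apply 'A subClassOf B, B subClassOf C => A subClassOf C' until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(inferred):
            for (b2, p2, c) in list(inferred):
                if p1 == p2 == "rdfs:subClassOf" and b == b2:
                    new = (a, "rdfs:subClassOf", c)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


closure = subclass_closure(TRIPLES)
# Everything that is directly or indirectly a kind of Satisfier.
satisfiers = sorted(a for (a, p, c) in closure if c == ":Satisfier")
print(satisfiers)
```

Nobody ever stated that :Nap is a :Satisfier; the rule derived it. A real inference engine does the same thing with a much larger rule set.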

In the travel industry, all travel agents – even online ones – are routed through centralised bureaus that give flight times, take bookings etc. The only way that an online travel agency can distinguish itself is by being smarter and easier to use. They are tackling the latter problem these days with AJAX, but they have yet to find effective ways to be smarter. An ontology that understands people a bit better is going to help them target their offerings more ‘delicately’. I don’t know about you, but I hate portal sites that provide you with countless sales pitches on the one page: endless checkboxes for extra services, and links to product partners that you might need something from. As the web becomes more interconnected, this is going to become more and more irritating. The systems must be able to understand that the last thing a user wants after a 28 hour flight is a guided tour of London, or tickets to the planetarium.

The example ontology above is a simple kind of upper ontology. It describes the world in the abstract to provide a kind of foundation on which to build more specific lower ontologies. This one just happens to model a kind of Freudian drive mechanism to describe how people’s wants and desires change over time (although the changing over time bit isn’t included in this example). Services can be tied to this upper ontology easily – restaurants provide Eating, which is a satisfier for hunger. Garfunkle’s restaurant (a type of Restaurant) is less than 200 metres from the Cecil Hotel (a type of Hotel that provides sleeping facilities, a satisfier of the need to rest) where you have a booking. Because all of these facts are subject to rules of inference, the inference engines can deduce that you may want to make a booking to eat at the hotel when you arrive, since it will have been 5 hours since you last satisfied your hunger.
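
Here is a rough Python sketch of the chain of reasoning just described. It is a hand-rolled rule, not an inference engine. The restaurant, the hotel, the 200 metre radius and the 5 hour figure come from the example above; everything else (the Service class, the suggest function, the recurrence thresholds, the exact distances and the hours since last sleep) is an assumption made for illustration.

```python
from dataclasses import dataclass

# Satisfier -> the need it satisfies, as in the ontology above.
SATISFIES = {"Eating": "Hunger", "Sleep": "Rest"}

# Assumed thresholds: hours after which a periodic need becomes active again.
RECURS_AFTER_HOURS = {"Hunger": 4, "Rest": 16}


@dataclass(frozen=True)
class Service:
    name: str
    provides: str      # a satisfier, e.g. "Eating"
    distance_m: int    # distance from where the traveller will be (assumed values)


def active_needs(hours_since_satisfied):
    """Return the needs whose recurrence threshold has been reached."""
    return {
        need
        for need, hours in hours_since_satisfied.items()
        if hours >= RECURS_AFTER_HOURS[need]
    }


def suggest(services, hours_since_satisfied, max_distance_m=200):
    """Suggest nearby services whose satisfier matches an active need."""
    needs = active_needs(hours_since_satisfied)
    return [
        s.name
        for s in services
        if SATISFIES[s.provides] in needs and s.distance_m <= max_distance_m
    ]


services = [
    Service("Garfunkle's", provides="Eating", distance_m=150),
    Service("Cecil Hotel", provides="Sleep", distance_m=0),
]

# On arrival: 5 hours since the last meal, 2 hours since the last sleep (assumed).
print(suggest(services, {"Hunger": 5, "Rest": 2}))
```

With these inputs only the restaurant is suggested, because hunger is the only need that has come round again. In a semantic web system the facts and the rule would live in the ontology and be evaluated by a reasoner rather than being hard-coded like this.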

The design of upper ontologies is frowned upon mightily in the relational and object-oriented worlds – it smacks of over-engineering. For the first time we are seeing a new paradigm that will reward deeper analysis. I look forward to that day.


Do I say anything worthwhile in this blog? « Alec the Geek

Alec recently agonized about whether he says anything worthwhile in his blog, or whether a blog was the right medium for the kind of things he wanted to say:

Do I say anything worthwhile in this blog? « Alec the Geek

Well, the short answer is an unqualified ‘yes‘.

The long answer (based on our own experiences) is that simplistic mainstream trolls or opinion pieces will attract more passing traffic than carefully thought-out, meticulously planned, thrice-proofread epistles. Keeping your posts short and sweet, and simple enough for the lowest common denominator in the audience, would seem to be the way to go.

That’s if all you cared about was site hits. But that misses the point. If your interests are niche interests, and if they are too technically demanding to draw a large crowd, does that make them any the less worthy? Of course not. In fact the more niche they are, and the more care you put into their production, the higher the information content. The web needs a higher signal to noise ratio, and it can only get that if well-informed and thoughtful people write in depth about whatever they want.

Keep up the good work, dude. 

I Crossed the 200 Hits/Day Threshold! Yayyy!!!!

Today I had record stats on my blog. I passed the 200 hits per day threshold.

Since Mitch Denny told me the kind of visitor numbers he got on his blog, I’ve been operating with a bit of an inferiority complex. At the time I was peaking at about 20 hits per day, and I was pleased with those levels. Since then (about 4 months ago) I have been trying to attract more visitors to the blog. I get a lot of fun out of writing the posts (and my mum reads some of them back in the UK) but in the end there’s not much point keeping a public weblog unless the public reads it.

I started watching which posts brought the most traffic, and unsurprisingly it was the ones that told readers in advance what they were going to read about. Obscure or humorous titles got nowhere. People want to know the topic before they expend the time and mental effort visiting the page. Choosing my titles and topics more wisely, and creating cross-links to well-known sites (such as Mitch’s) has helped a lot, as have search engine registrations (especially Reddit, which I had never heard of before). I also found that controversial-but-lite content (such as my Anti-Agile Gripe) got way more traffic than the more painstaking articles in the configuration and LINQ series. I don’t know whether the LINQ series will get more traffic as Orcas gets closer to release.

I haven’t gotten much in the way of comments which has been disappointing – I can’t tell whether readers are just skimming through or reading the posts! I’m not sure what to do about that. Any ideas?

Boo.com to reincorporate: one for you Aggy

It appears that Boo has reincarnated to drag us down, once again. Of course, it won’t be possible without the help of Aggy Finn who was with them at the time*.

This is how he escaped last time:

If the contractor wages go down to 10GBP/hr again, Aggy, I shall be blaming you!!!!!

* Needless to say, most of this is a pack of lies. Especially the bit about Aggy being responsible.

Photosynth CTP now on display

Live Labs has now put Photosynth on display to the public. You can’t add your own photos yet, but the interface is everything that they promised. I can’t wait to be able to use this on my own photos.

I think it will change the way I use my camera. With this interface I will still have to compose pictures that stand alone, since each picture is rendered alone on the screen. But I will be much more inclined to randomly shoot around me to try to ensure I have an overlapping montage ready for when Photosynth is available on the desktop (assuming it will be). Currently it is only available as an ActiveX control for use in a browser. Does that mean that the control is there to link you to images stored centrally? Is this the product that MS hopes will draw people away from Flickr to a service of their own? I hope not. I want this to replace Canon ZoomBrowser EX as the way I get to my images in future.