Month: April 2007

Using RDF and C# to create an MP3 Manager – Part 1

This article follows on from the previous post about semantic web applications in C#. I'll be using the SemWeb framework again, but this time I've chosen to demonstrate the capabilities of RDF by producing a simple MP3 file manager. I haven't completed it yet; I'll be working on it over the next few days to show you just how easy RDF and OWL are to work with in C# these days.

The program is pretty simple – I was inspired to write it by the RDF-izers web site, where you can find conversion tools for producing RDF from a variety of different data sources. While I was playing with LINQ I produced a simple file tagging system – I scanned the file system, extracting as much metadata as I could from the files I found and adding it to a tag database that I kept in SQL Server. Well, this isn't much different. I just extract ID3 metadata tags from the MP3 files I find and store them in Track objects. I then wrote a simple conversion system to generate RDF URIs from the objects I'd created, for insertion into an in-memory triple store. All-up it took about 3-4 hours, including finding a suitable API for ID3 reading. I won't show (unless demanded to) the code for the test harness or for iterating the file system. Instead I'll show you the code I wrote for persisting objects to an RDF store.

First up, we have the Track class. I’ve removed the vanilla implementation of the properties for the sake of brevity.

[OntologyBaseUri("file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/")]
[OwlClass("Track", true)]
public class Track : OwlInstanceSupertype
{
[OwlProperty("title", true)]
public string Title /* ... */
[OwlProperty("artistName", true)]
public string ArtistName /* ... */
[OwlProperty("albumName", true)]
public string AlbumName /* ... */
[OwlProperty("year", true)]
public string Year /* ... */
[OwlProperty("genreName", true)]
public string GenreName /* ... */
[OwlProperty("comment", true)]
public string Comment /* ... */
[OwlProperty("fileLocation", true)]
public string FileLocation /* ... */

private string title;
private string artistName;
private string albumName;
private string year;
private string genreName;
private string comment;
private string fileLocation;

public Track(TagHandler th, string fileLocation)
{
this.fileLocation = fileLocation;
title = th.Track;
artistName = th.Artist;
albumName = th.Album;
year = th.Year;
genreName = th.Genere;
comment = th.Comment;
}
}

Nothing of note here except for the presence of a few all-important attributes, which are used to give the persistence engine clues about how to generate URIs for the class, its properties and their values. Obviously this is a rudimentary implementation, so we don't have lots of extra information about XSD types, versions and so on. But for the sake of this illustration I'm sure you get the idea: we can do for RDF pretty much what LINQ to SQL does for relational databases.

The attribute classes are also very simple:

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct | AttributeTargets.Property)]
public class OwlResourceSupertypeAttribute : Attribute
{
public string Uri
{
get { return uri; }
}
private readonly string uri;
public bool IsRelativeUri
{
get { return isRelativeUri; }
}
private readonly bool isRelativeUri;
public OwlResourceSupertypeAttribute(string uri)
: this(uri, false){}
public OwlResourceSupertypeAttribute(string uri, bool isRelativeUri)
{
this.uri = uri;
this.isRelativeUri = isRelativeUri;
}
}

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct | AttributeTargets.Property)]
public class OwlClassAttribute : OwlResourceSupertypeAttribute
{
public OwlClassAttribute(string uri)
: base(uri, false){}
public OwlClassAttribute(string uri, bool isRelativeUri)
: base(uri, isRelativeUri){}
}

[AttributeUsage(AttributeTargets.Property)]
public class OwlPropertyAttribute : OwlResourceSupertypeAttribute
{
public OwlPropertyAttribute(string uri)
: base(uri, false){}
public OwlPropertyAttribute(string uri, bool isRelativeUri)
: base(uri, isRelativeUri){}
}

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct)]
public class OntologyBaseUriAttribute : Attribute
{
public string BaseUri
{
get { return baseUri; }
}
private string baseUri;
public OntologyBaseUriAttribute(string baseUri)
{
this.baseUri = baseUri;
}
}

OwlResourceSupertypeAttribute is the supertype of any attribute that can be related to a resource in an ontology – that is, anything that has a URI. As such it has a Uri property, plus an IsRelativeUri property which indicates whether the URI is relative to a base URI defined elsewhere. Although I haven't implemented my solution that way yet, this is intended to allow the resources to reference a base namespace definition in the triple store or in an RDF file. The OwlClassAttribute extends OwlResourceSupertypeAttribute, restricting its usage to classes or structs. You use this (or the parent type if you want) to indicate the OWL class URI that the type will be persisted to. So for the Track class we have an OWL class of "Track". In an ontology that "Track" will be relative to some base URI, which I have defined using the OntologyBaseUriAttribute. That attribute defines the URI of the ontology that the class and property URIs are relative to in this example (i.e. "file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/").
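I don't show the attribute-reading code in this post, but the mechanics are plain reflection. A minimal sketch of the kind of resolution the attributes imply – assuming relative URIs are simply concatenated onto the base URI, and with OwlUriHelper being an illustrative name rather than part of the framework – might look like this:

using System;
using System.Reflection;

// Illustrative helper only: shows how the Owl* attributes above could be
// resolved into full URIs via reflection and simple concatenation.
public static class OwlUriHelper
{
    public static string GetClassUri(Type type)
    {
        OwlClassAttribute cls =
            (OwlClassAttribute)Attribute.GetCustomAttribute(type, typeof(OwlClassAttribute));
        if (cls == null)
            return null;
        return cls.IsRelativeUri ? GetBaseUri(type) + cls.Uri : cls.Uri;
    }

    public static string GetPropertyUri(Type type, string propertyName)
    {
        PropertyInfo pi = type.GetProperty(propertyName);
        OwlPropertyAttribute prop =
            (OwlPropertyAttribute)Attribute.GetCustomAttribute(pi, typeof(OwlPropertyAttribute));
        if (prop == null)
            return null;
        return prop.IsRelativeUri ? GetBaseUri(type) + prop.Uri : prop.Uri;
    }

    // The base URI comes from the OntologyBaseUriAttribute on the class.
    private static string GetBaseUri(Type type)
    {
        OntologyBaseUriAttribute baseAttr =
            (OntologyBaseUriAttribute)Attribute.GetCustomAttribute(type, typeof(OntologyBaseUriAttribute));
        return baseAttr != null ? baseAttr.BaseUri : string.Empty;
    }
}

With the Track class above, GetClassUri(typeof(Track)) would yield file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/Track, which matches the shape of the URIs that turn up in the N3 output further down.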

For each of the properties of the Track class I have used another subclass of OwlResourceSupertypeAttribute, called OwlPropertyAttribute, that is restricted solely to property members. Another simplification I have introduced is that I am not distinguishing between ObjectProperties and DatatypeProperties, as OWL does. That would not be hard to add, and I'm sure I'll have to over the next few days.

So, I have now annotated my class to tell the persistence engine how to produce statements that I can add to the triple store. These annotations can be read by the engine and used to construct appropriate URIs for statements. We still need a way to construct instances in the ontology. I've done that in a very simple way – I just keep a counter in the scanner, and I create an instance URI out of the class URI by appending the counter to the end. So the first instance will be "file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/Track_1" and so on. This is simple, but it would need to be improved upon for any serious application.

Next I need to reflect over an instance of class Track to get a set of statements that I can add to the triple store. For this I have exploited the extension method feature of C# 3.0 (May CTP), which allows me to write code like this:
foreach (Track t in GetAllTracks(txtFrom.Text))
{
t.InstanceUri = GenTrackName(t);
store.Add(t);
}
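Neither GetAllTracks nor GenTrackName appears in the post; a minimal sketch of the pair, assuming TagHandler can be constructed straight from a file path and using the counter-based naming scheme described above, might look something like this:

// Sketch only (requires System.IO and System.Collections.Generic):
// recursively find MP3 files and turn each one into a Track.
// Assumes TagHandler has a constructor taking a file path; the real
// ID3 library may differ.
private static int trackCounter;

private static IEnumerable<Track> GetAllTracks(string rootDirectory)
{
    foreach (string file in Directory.GetFiles(rootDirectory, "*.mp3", SearchOption.AllDirectories))
    {
        yield return new Track(new TagHandler(file), file);
    }
}

// Instance URIs are just the class URI with a counter appended,
// e.g. .../Mp3ToRdf/Track_1, .../Mp3ToRdf/Track_2 and so on.
// The Track parameter would only matter for a hash-based scheme.
private static string GenTrackName(Track t)
{
    trackCounter++;
    return "file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/Track_" + trackCounter;
}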

The triple store is called store, and GetAllTracks is an iterator that filters the files under the directory indicated by whatever is in txtFrom.Text. GenTrackName creates the instance URI for the track instance. I could have used a more sophisticated scheme using hashes from the track location or somesuch, but I was in a rush ;-). The code for the persistence engine is easy as well:

public static class MemoryStoreExtensions
{
// Persists an annotated object to the triple store: one statement per
// property that carries an OwlPropertyAttribute.
public static void Add(this MemoryStore ms, OwlInstanceSupertype oc)
{
Debug.WriteLine(oc.ToString());
Type t = oc.GetType();
PropertyInfo[] pia = t.GetProperties();
foreach (PropertyInfo pi in pia)
{
if(IsPersistentProperty(pi))
{
AddPropertyToStore(oc, pi, ms);
}
}
}
private static bool IsPersistentProperty(PropertyInfo pi)
{
return pi.GetCustomAttributes(typeof (OwlPropertyAttribute), true).Length > 0;
}
private static void AddPropertyToStore(OwlInstanceSupertype track, PropertyInfo pi, MemoryStore ms)
{
// Guard against null property values before converting them to literals.
object value = pi.GetValue(track, null);
if (value != null)
Add(track.InstanceUri, track.GetPropertyUri(pi.Name), value.ToString(), ms);
}
public static void Add(string s, string p, string o, MemoryStore ms)
{
// Subject and predicate are resources; the object is stored as a string literal.
if(!Empty(s) && !Empty(p) && !Empty(o))
ms.Add(new Statement(new Entity(s), new Entity(p), new Literal(o)));
}
private static bool Empty(string s)
{
return string.IsNullOrEmpty(s);
}
}

Add is the extension method, which iterates the properties on the OwlInstanceSupertype instance. OwlInstanceSupertype is the supertype of all classes that can be persisted to the store. As you can see, it gets all of the properties and checks each one to see whether it carries the OwlPropertyAttribute. If it does, then it gets persisted using AddPropertyToStore. AddPropertyToStore creates URIs for the subject (the track instance in the store), the predicate (taken from the OwlPropertyAttribute on the property) and the object (a string literal containing the value of the property). That statement gets added by the string-based Add overload, which just mirrors the Add API on the MemoryStore itself.
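Getting the store back out as N3 (for the output shown further down) is a one-liner. A minimal sketch, assuming SemWeb's N3Writer and the Select(StatementSink) overload on the store, looks something like this:

// Sketch: stream every statement in the store out to an N3 file.
// Assumes N3Writer accepts a TextWriter and implements StatementSink,
// so the store's statements can be streamed straight into it.
using (TextWriter output = new StreamWriter("tracks.n3"))
{
    N3Writer writer = new N3Writer(output);
    store.Select(writer);
    writer.Close();
}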

And that’s it. Almost. The quick and dirty ontology I defined for the music tracks looks like this:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix daml: <http://www.daml.org/2001/03/daml+oil#>.
@prefix log: <http://www.w3.org/2000/10/swap/log#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix xsdt: <http://www.w3.org/2001/XMLSchema#>.
@prefix : <file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/>.

:ProducerOfMusic a owl:Class.
:SellerOfMusic a owl:Class.
:NamedThing a owl:Class.
:TemporalThing a owl:Class.
:Person a owl:Class;
rdfs:subClassOf :NamedThing.
:Musician rdfs:subClassOf :ProducerOfMusic, :Person.
:Band a :ProducerOfMusic.
:Studio a :SellerOfMusic, :NamedThing.
:Label = :Studio.
:Music a owl:Class.
:Album a :NamedThing.
:Track a :NamedThing.
:Song a :NamedThing.
:Mp3File a owl:Class.
:Genre a :NamedThing.
:Style = :Genre.
:title
rdfs:domain :Track;
rdfs:range xsdt:string.
:artistName
rdfs:domain :Track;
rdfs:range xsdt:string.
:albumName
rdfs:domain :Track;
rdfs:range xsdt:string.
:year
rdfs:domain :Album;
rdfs:range xsdt:integer.
:genreName
rdfs:domain :Track;
rdfs:range xsdt:string.
:comment
rdfs:domain :Track;
rdfs:range xsdt:string.
:isTrackOn
rdfs:domain :Track;
rdfs:range :Album.
:fileLocation
rdfs:domain :Track;
rdfs:range xsdt:string.

When I run it over my podcasts, the output persisted to N3 looks like this:

<file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/Track_1>
<file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/title> "History 5 | Fall 2006 | UC Berkeley" ;
<file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/artistName> "Thomas Laqueur" ;
<file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/albumName> "History 5 | Fall 2006 | UC Berkeley" ;
<file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/year> "2006" ;
<file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/genreName> "History 5 | Fall 2006 | UC Berkeley" ;
<file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/comment> " (C) Copyright 2006, UC Regents" ;
<file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/fileLocation> "C:\\Users\\andrew.matthews\\Music\\hist5_20060829.mp3" .

You can see how the URIs have been constructed from the base URI, and the properties are all attached to instance Track_1. Next on the list will probably be a bit of cleaning up to use a prefix rather than this longhand URI notation, then I’ll show you how to query the store to slice and dice your music collections every which way.
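As a small taste of what's to come, the Select API on the MemoryStore can already answer simple questions. A minimal sketch – assuming a null subject acts as a wildcard in a Select template, just as a null object does – finds every track by a given artist:

// Sketch: find the titles of all tracks by a particular artist,
// using the same longhand URIs that appear in the N3 output above.
string baseUri = "file:///C:/dev/prototypes/semantic-web/src/Mp3ToRdf/";
Entity artistName = new Entity(baseUri + "artistName");
Entity title = new Entity(baseUri + "title");

// First find every track whose artistName literal matches...
foreach (Statement s in store.Select(new Statement(null, artistName, new Literal("Thomas Laqueur"))).ToArray())
{
    // ...then pull out each matching track's title.
    foreach (Statement t in store.Select(new Statement(s.Subject, title, null)).ToArray())
    {
        Console.WriteLine("{0}: {1}", s.Subject.Uri, t.Object);
    }
}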

Keeping up

It used to be that you could keep abreast of the news by reading the Sunday morning newspapers, or at least that's how it seemed. Now I find that I scan several hundred RSS feeds to track all the news I care about. The birth of the blogosphere is a mixed blessing for anyone with a life. You have to fit your surfing and professional development in around everything else. According to Google's analysis of my reading patterns, I seem to be doing most of it after dark.

It's an unhealthy pattern, which I picked up at university – I feel guilty if I'm not doing something useful. After I left university I tried to shed the habit, but I never seem to have managed it for very long. I have to force myself to sit in front of the box. Last century, anthropologists remarked on how driven we all seem in comparison to the leisurely lifestyle of indigenous peoples. Well, I wonder what they would make of a pattern like this? People feel that in order to keep up they must give up on sleep. There doesn't seem to be much room for plain old fun in this pattern, eh? At Readify, they classify people in this category as night-programmers – and they regard them as more recruitable, for obvious reasons. I can't help thinking it's a euphemism for hopeless workaholics. Not that I mind that much – I get more of a sense of achievement from an evening programming than I do from a night in front of the TV, but I still wonder.

I gotta do something about this. Something’s gotta give – but what?

A simple semantic web application in C#

The latest update of the SemWeb library from Josh Tauberer includes a C# implementation of the Euler reasoner. This reasoner goes beyond simplistic RDFS reasoning – navigating the class and property relationships – to make use of rules. The ontology I've been using to get used to coding in the framework models a simple state machine, and it couldn't be simpler. Here's an N3 file that declares the classes and their relationships.


@prefix daml: <http://www.daml.org/2001/03/daml+oil#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix : <http://aabs.purl.org/ontologies/2007/04/states/states.rdf#>.

#Classes
:State a owl:Class;
daml:comment "states the system can be in";
daml:disjointUnionOf ( :S1 :S2 :S3 ).

:InputToken a owl:Class;
daml:comment "inputs to the system";
daml:disjointUnionOf ( :INil :I1 :I2 ).

:Machine a owl:Class.
:System a owl:Class.

#properties
:isInState
rdfs:domain :Machine;
rdfs:range :State;
owl:cardinality "1".

:hasInput
rdfs:domain :System;
rdfs:range :InputToken;
owl:cardinality "1".

#Instances
:Machine1
a :Machine;
:isInState :S1.

:This a :System;
:hasInput :INil.

As with any deterministic finite state machine, there are two key classes at work here: :State and :InputToken. :State is declared as a disjoint union of :S1, :S2 and :S3, which means that an :S1 is not an :S2 or an :S3. If you don't specify such a disjunction, the reasoner cannot assume it – just because Machine1 is in state S1 doesn't mean it isn't potentially in state S2 as well. You have to tell it that an S1 is not an S2. Pedantry is all-important in ontology design, and while I have gained a fair measure of it over the years as a defensive programmer, I was shocked at how little semantic support I get from the programming languages I use by comparison. OWL provides you with a much richer palette to work with, albeit with less conventional tooling support. It is kind of liberating to be designing class libraries in OWL rather than in OO languages – kind of like when you go from DOS to Bash.

Anyway, the rules for this little ontology define a transition table for the state machine:


@prefix log: <http://www.w3.org/2000/10/swap/log#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix : <http://aabs.purl.org/ontologies/2007/04/states/states.rdf#>.

# (S1, I1) ~> (S1, INil)
{ :Machine1 :isInState :S1. :This :hasInput :I1. }
=>
{ :Machine1 :isInState :S1. :This :hasInput :INil. }.

# (S1, I2) ~> (S2, INil)
{ :Machine1 :isInState :S1. :This :hasInput :I2. }
=>
{ :Machine1 :isInState :S2. :This :hasInput :INil. }.

# (S1, I3) ~> (S3, INil)
{ :Machine1 :isInState :S1. :This :hasInput :I3. }
=>
{ :Machine1 :isInState :S3. :This :hasInput :INil.}.

I ran into problems initially because I thought about the problem from an imperative programming perspective – I designed it as if I were assigning values to variables. That's the wrong approach: treat this as adding facts to what is already known. So, rather than saying "if X, then do Y", think of it as "if I know X, then I also know Y". The program to work with it looks like this:


internal class Program
{
private static readonly string ontologyLocation =
@"C:\dev\prototypes\semantic-web\ontologies\20074\states\";

private static string baseUri = @"file:///C:/dev/prototypes/semantic-web/ontologies/2007/04/states/states.rdf#";
private static MemoryStore store = new MemoryStore();
private static Entity Machine1 = new Entity(baseUri + "Machine1");
private static Entity Input1 = new Entity(baseUri + "I1");
private static Entity Input2 = new Entity(baseUri + "I2");
private static Entity theSystem = new Entity(baseUri + "This");
private static string hasInput = baseUri + "hasInput";
private static string isInState = baseUri + "isInState";

private static void Main(string[] args)
{
InitialiseStore();
DisplayCurrentStates();
SetNewInput(Input2);
DisplayCurrentStates();
}

private static void DisplayCurrentStates()
{
SelectResult ra = store.Select(new Statement(Machine1, new Entity(isInState), null));
Debug.Write("Current states: ");
foreach (Statement resource in ra.ToArray())
{
Debug.Write(resource.Object.Uri);
}
Debug.WriteLine("");
}

private static void InitialiseStore()
{
// Load the transition rules into the Euler reasoner, import the base facts,
// then attach the reasoner to the store so that queries can use the rules.
string statesLocation = Path.Combine(ontologyLocation, "states.n3");
string rulesLocation = Path.Combine(ontologyLocation, "rules.n3");
Euler engine = new Euler(new N3Reader(File.OpenText(rulesLocation)));
store.Import(new N3Reader(File.OpenText(statesLocation)));
store.AddReasoner(engine);
}

private static void SetNewInput(Entity newInput)
{
// Swap the old input fact for the new one...
Resource[] currentInput = store.SelectObjects(theSystem, hasInput);
store.Remove(new Statement(theSystem, hasInput, currentInput[0]));
store.Add(new Statement(theSystem, hasInput, newInput));
// ...then ask the store (and hence the reasoner) which state follows,
// and record that inferred state as the machine's current state.
Resource[] subsequentState = store.SelectObjects(Machine1, isInState);
Statement newState = new Statement(Machine1, isInState, subsequentState[0]);
store.Replace(new Statement(Machine1, isInState, null), newState);
}
}

The task was simple – I wanted to set the state machine up in state :S1 with input :INil, then feed in input :I2, and see the state change from :S1 to :S2. In doing this I am trying to do something that is a little at odds with the expressed intention of ontologies. They are static declarations of a body of knowledge rather than specifications of a dynamically changing set of facts. What that means is that they are additive – the frameworks and reasoners allow you to add to a body of knowledge. That is what makes reuse and trust possible on the semantic web [1, 2]. If I can take your ontology and change it to mean something other than what you intended, then no guarantees can be made about the result. The ontology should stand alone – if you want to base some data on it, that's up to you, and you will have to manage it. In practical terms, that means you have to manually change the entries for the input and the states as they change. What the ontology adds is a framework for representing the data, and a set of rules for working out what the next state should be. That's still powerful, but I wonder how well it would scale.

What to notice

There are a few things in here that you should pay very close attention to if you want to write a semantic web application of your own using SemWeb. Firstly, there is the default namespace definition in the ontology and rules files. Generally, the examples of N3 files on the W3C site use the following format to specify the default namespace of a file:

@prefix : <#>.

Unfortunately that leaves a little too much room for manoeuvring within SemWeb, and the actual URIs that it will use are a little unpredictable. Generally they are based upon the location that SemWeb got the files from. Instead, choose a URL like so:

@prefix : <http://aabs.purl.org/ontologies/2007/04/states/states.rdf#>.

This was not the location of the file – it was just a URL I was using before I switched to N3 format. The point is that you need to give an unambiguous URL so that SemWeb and its reasoner can distinguish resources properly when you ask questions. I used the same URL in the rules.n3 file, since most of what I was referring to was defined in that namespace. I could just as easily have defined a new prefix for states.n3 and prepended all the elements in the rules with that prefix. What matters is to have a non-default URL so that SemWeb is in no doubt about the URIs of the resources you are referring to.

Next, remember that you will have to dip into the instances in the store to make manual changes to their state – this is no different from any relational database application. I was disconcerted at first, though, because I had hoped that the reasoner would make any changes I needed for me. Alas, that is not in the spirit of the semantic web, apparently, so be prepared to manage the system closely.

I think that OWL ontologies would work very well for question-answering applications, but there may be a little more work required to place a reasoner at the core of your system. Of course, I could be wrong about that, and it could be my incorrigible procedural mindset that is to blame – I will keep you posted. I'll be posting more on this over the next few weeks, so if there are things you want me to write about, or questions you want answered, please drop me a line or comment here, and I'll try to come up with answers.

2000 Year Old Computer Recreated

Archaeologists from the University of Cardiff have finally worked out how to put together a complex mechanism that was found in the 2,000-year-old wreck of a sunken ship off the coast of Greece. The announcement is impressive because this mechanical calculating device had defied researchers' attempts to understand it for over a hundred years. The device was used to calculate the positions of the moon and the sun in their orbits, and to predict eclipses with remarkable accuracy. It is also believed that the device was able to predict the positions of the planets. This mechanism is orders of magnitude more complicated than anything we thought the ancients were capable of. We now know that the Greeks were not just great philosophers but technologists as well – they were able to turn their cosmological ideas into working models of the solar system. That's no mean feat in both scientific and engineering terms.

The findings have forced researchers to reassess their ideas about just how technically advanced the ancient Greeks were. If they were able to calculate orbits with such accuracy, it implies a level of sophistication that has only been rediscovered in the last few centuries. It makes me consider just how far we fell in the intervening Christian period. The dark ages were not just a period in which civilisation took a step back into ignorance – they were an age so obsessed with religious orthodoxy that no possibility existed for societies to raise themselves out of their benighted state. This discovery highlights just how much was lost during that period. The Greeks had a better understanding of how the world worked than anyone up to the time of Newton and the Enlightenment. What could have caused us to lose what must have been a fantastically useful body of knowledge?

The lack of a printing press made it hard to disseminate ideas, but knowledge that valuable wouldn't have existed in a vacuum. The techniques employed would have been arrived at by groups of artisans or engineers over a long period of research – the knowledge would have been in the possession of many people, not just a few. So how was it lost? We'll probably never know. My guess would be that it was at odds with some piece of religious orthodoxy, and was therefore actively ignored or suppressed. But one thing is sure – orthodoxy of any kind (especially the religious kind) is anathema to progress, and can easily lead to a dark age lasting thousands of years.

We should not be complacent about our secular society’s stability and progress. This ancient computer makes it all too clear that what has been gained can even more easily be lost.