Month: May 2007

LinqToRdf – SPARQL Query Generation

Progress with the LINQ to RDF Query Provider is continuing apace. I have been pretty tightly focused for the last few days, so I haven’t had time to post an update. I’ve lately been working on SPARQL query generation, which will allow me a much richer palette to play with. Here’s the current version of the LINQ query I’m using to test the RdfSparqlQuery class.

[TestMethod]
public void SparqlQuery()
{
    string urlToRemoteSparqlEndpoint = @"http://someUri";
    TripleStore ts = new TripleStore();
    ts.EndpointUri = urlToRemoteSparqlEndpoint;
    ts.QueryType = QueryType.RemoteSparqlStore;
    IRdfQuery<Track> qry = new RDF(ts).ForType<Track>(); 
    var q = from t in qry
        where t.Year == 2006 &&
        t.GenreName == "History 5 | Fall 2006 | UC Berkeley" 
        select new {t.Title, t.FileLocation};
    foreach(var track in q){
        Trace.WriteLine(track.Title + ": " + track.FileLocation);
    }        
}

And here’s what it’s producing as output:

@prefix a: <http://aabs.purl.org/ontologies/2007/04/music#> .

SELECT ?Title, ?FileLocation
WHERE {
?Track <http://aabs.purl.org/ontologies/2007/04/music#title> ?Title .
?Track <http://aabs.purl.org/ontologies/2007/04/music#artistName> ?ArtistName .
?Track <http://aabs.purl.org/ontologies/2007/04/music#albumName> ?AlbumName .
?Track <http://aabs.purl.org/ontologies/2007/04/music#year> ?Year .
?Track <http://aabs.purl.org/ontologies/2007/04/music#genreName> ?GenreName .
?Track <http://aabs.purl.org/ontologies/2007/04/music#comment> ?Comment .
?Track <http://aabs.purl.org/ontologies/2007/04/music#fileLocation> ?FileLocation .
?Track <http://aabs.purl.org/ontologies/2007/04/music#rating> ?Rating .
 FILTER {
    ((t.Year)=(2006^^xsdt:int))&&((t.GenreName)=("History 5 | Fall 2006 | UC Berkeley"^^xsdt:string))
  }
}

As you can see, the query is fairly close. I just have to:

  • Convert the property names in the filters to use the free variable name chosen for the object of a property assertion.
  • Restrict the variables enumerated in the graph definition to just those required for the projection (if there is a projection)
  • Use the UNION operator to allow disjunctions (OrElse expression types)
  • Add the XSDT (XML Schema Datatypes) namespace to the prefixes
  • Use standard SPARQL variable syntax for the ?Track
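To make the target concrete, here is a minimal sketch of the query text those fixes are aiming at. It is not LinqToRdf code: SparqlSketch, Build and CamelCase are hypothetical names, the rule for turning a property name into a predicate (lower-casing the first letter) is an assumption inferred from the URIs in the output above, and the UNION case is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of LinqToRdf.
public static class SparqlSketch
{
    const string Music = "http://aabs.purl.org/ontologies/2007/04/music#";
    const string Xsd = "http://www.w3.org/2001/XMLSchema#";

    // projection: property names to return.
    // filters: property name -> literal it must equal, already in SPARQL syntax.
    public static string Build(IList<string> projection,
                               IDictionary<string, string> filters)
    {
        // Bind only the variables that the projection or the filter needs.
        List<string> needed = projection.Concat(filters.Keys).Distinct().ToList();

        var sb = new StringBuilder();
        sb.AppendLine("PREFIX a: <" + Music + ">");
        sb.AppendLine("PREFIX xsd: <" + Xsd + ">");
        sb.AppendLine("SELECT " + string.Join(" ", projection.Select(p => "?" + p)));
        sb.AppendLine("WHERE {");
        foreach (string name in needed)
            sb.AppendLine("  ?track a:" + CamelCase(name) + " ?" + name + " .");

        // The filter refers to the bound variables, not to C# member accesses.
        if (filters.Count > 0)
            sb.AppendLine("  FILTER(" + string.Join(" && ",
                filters.Select(f => "?" + f.Key + " = " + f.Value)) + ")");
        sb.AppendLine("}");
        return sb.ToString();
    }

    // Assumed naming rule: property "GenreName" maps to predicate "genreName".
    static string CamelCase(string name)
    {
        return char.ToLowerInvariant(name[0]) + name.Substring(1);
    }
}
```

Calling Build with the projection Title and FileLocation and a single filter on Year binds three variables instead of eight, and the FILTER clause compares ?Year rather than t.Year.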

I also need to find a fast back-end SPARQL-enabled triple store that I can start to run these queries against. A month or two ago I did a .NET conversion of Jena using IKVM. I may end up going back to that. Any other suggestions would be welcome.

If you want to have a play with it, or take a look at how to produce a LINQ Query provider, then you can use Subversion to get the code from Google. Use the command below:

svn checkout http://linqtordf.googlecode.com/svn/trunk/ linqtordf

Windows Live Writer Beta 2, VS Paste and WordPress

Here’s an example I pasted from LinqToRdf.

private void GenerateBinaryExpression(Expression e, string op)
{
    if (e == null)
        throw new ArgumentNullException("e");
    if (op == null)
        throw new ArgumentNullException("op");
    if (op.Length == 0)
        throw new ArgumentException("op was empty", "op");
    BinaryExpression be = e as BinaryExpression;
    if (be != null)
    {
        QueryAppend("(");
        Dispatch(be.Left);
        QueryAppend(")"+op+"(");
        Dispatch(be.Right);
        QueryAppend(")");
        Log("+ :{0} Handled", e.NodeType);
    }
}

If this displays properly for you, then I am a happy man.

Creating A LINQ Query Provider

As promised last time, I have extended the query mechanism of my little application with a LINQ Query Provider. I based my initial design on the method published by Bart De Smet, but have extended that framework, cleaned it up and tied it in with the original object deserialiser for SemWeb (a semantic web library by Joshua Tauberer).

In this post I’ll give you some edited highlights of what was involved. You may recall that in the last post I provided some unit tests that I was working with. For the sake of initial simplicity (and to make it easy to produce queries with SemWeb’s GraphMatch algorithm) I restricted my query language to Conjunction and Equality. What I produced last time was a simple scanner that went through my podcasts extracting metadata and creating objects of type Track. Here’s the unit test that I worked with to drive the development process.

[TestMethod]
public void QueryWithProjection()
{
    CreateMemoryStore();
    IRdfQuery<Track> qry = new RdfContext(store).ForType<Track>();
    var q = from t in qry
        where t.Year == 2006 &&
        t.GenreName == "History 5 | Fall 2006 | UC Berkeley"
        select new {t.Title, t.FileLocation};
    foreach(var track in q){
        Trace.WriteLine(track.Title + ": " + track.FileLocation);
    }
}

This method queries the Tracks collection in an in-memory triple store loaded from a file in N3 format. It searches for any UC Berkeley podcasts produced in 2006, and performs a projection to create a new anonymous type containing the title and location of the files.

I took a leaf from the book of LINQ to SQL to create the query object. In LINQ to SQL you indicate the type you are working with using a Table<T> class. In my query context class, you identify the type you are working with using a ForType<T>() method. This method instantiates a query object for you, and (in future) will act as an object registry to keep track of object updates.

The RdfContext class is very simple:

public class RdfContext : IRdfContext
{
    public Store Store
    {
        get { return store; }
        set { store = value; }
    }
    protected Store store;
    public RdfContext(Store store)
    {
        this.store = store;
    }
    public void AcceptChanges()
    {
        throw new NotImplementedException();
    }
    public IRdfQuery<T> ForType<T>()
    {
        return new RdfN3Query<T>(store);
    }
}

As you can see, it is pretty bare at the moment. It maintains a reference to the store, and instantiates query objects for you. But in future this would be the place to create transactional support, and perhaps maintain connections to triple stores. By and large, though, this class will be pretty simple in comparison to the query class that is to follow.

I won’t repeat all of what Bart De Smet said in his excellent series of articles on the production of LINQ to LDAP. I’ll confine myself to this implementation, and how it works. So we have to start by creating our Query Object:

public class RdfN3Query<T> : IRdfQuery<T>
{
    public RdfN3Query(Store store)
    {
        this.store = store;
        this.originalType = typeof (T);
        parser = new ExpressionNodeParser<T>();
    }

First it stores a reference to the triple store for later use. In a more real-world implementation this might be a URL or connection string. But for the sake of this implementation, we can be happy with the Memory Store that is used in the unit test. Next we keep a record of the original type that is being queried against. This is important because later on you may also be dealing with a new anonymous type that will be created by the projection. This will not have any of the Owl*Attribute classes with which to work out URLs for properties and to perform deserialisation.

The two most important methods in IQueryable<T> are CreateQuery and GetEnumerator. CreateQuery is the place where LINQ feeds you the expression tree that it has built from your initial query. You must parse this expression tree and store the resultant query somewhere for later use. I created a string called query to keep that in, and created a class called ExpressionNodeParser to walk the expression tree to build the query string. This is equivalent to the stage where the SQL SELECT query gets created in DLINQ. My CreateQuery looks like this:

public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
{
    RdfN3Query<TElement> newQuery = new RdfN3Query<TElement>(store);
    newQuery.OriginalType = originalType;
    newQuery.Project = project;
    newQuery.Properties = properties;
    newQuery.Query = Query;
    newQuery.Logger = logger;
    newQuery.Parser = new ExpressionNodeParser<TElement>(
        new StringBuilder(parser.StringBuilder.ToString()));
    MethodCallExpression call = expression as MethodCallExpression;
    if (call != null)
    {
        switch (call.Method.Name)
        {
            case "Where":
                Log("Processing the where expression");
                newQuery.BuildQuery(call.Parameters[1]);
                break;
            case "Select":
                Log("Processing the select expression");
                newQuery.BuildProjection(call);
                break;
        }
    }
    return newQuery;
}

You create a new query because you may be doing a projection, in which case the type you are enumerating over will not be the original type that you put into ForType<T>(). Instead it may be the anonymous type from the projection. You transfer the vital information over to the new Query object, and then handle the expression that has been passed in. I am handling two methods here: Where and Select. There are others I could handle, such as OrderBy or Take, but that will have to be for a future post.

Where is the part where the expression representing the query is passed in. Select is passed the tree representing the projection (if there is one). The work is passed off to BuildQuery and BuildProjection accordingly. These names were gratefully stolen from LINQ to LDAP.

BuildQuery in LINQ to LDAP is a fairly complicated affair, but in LINQ to RDF I have pared it right down to the bone.

private void BuildQuery(Expression q)
{
    StringBuilder sb = new StringBuilder();
    ParseQuery(q, sb);
    Query = Parser.StringBuilder.ToString();
    Trace.WriteLine(Query);
}

We create a StringBuilder that can be passed down into the recursive descent tree walker to gather the fragments of the query as each expression gets parsed. The result is then stored in the Query property of the Query object. BuildProjection looks like this:

private void BuildProjection(Expression expression)
{
    LambdaExpression le =
        ((MethodCallExpression)expression).Parameters[1] as LambdaExpression;
    if (le == null)
        throw new ApplicationException("Incompatible expression type found when building a projection");
    project = le.Compile();
    MemberInitExpression mie = le.Body as MemberInitExpression;
    if (mie != null)
        foreach (Binding b in mie.Bindings)
            FindProperties(b);
    else
        foreach (PropertyInfo i in originalType.GetProperties())
            properties.Add(i.Name, null);
}

Much of it is taken directly from LINQ to LDAP. I have adapted it slightly because I am targeting the May 2006 CTP of LINQ. I’ve done this only because I have to use VS 2005 during the day, so I can’t use the March 2007 version of Orcas.
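For comparison, here is a minimal sketch of the same two projection steps (compiling the lambda, then collecting the property names it reads) written against the released System.Linq.Expressions API rather than the CTP. ProjectionSketch, Project, PropertiesOf and Song are hypothetical names used only for illustration. In the released API an anonymous-type projection arrives as a NewExpression, so that is what the sketch inspects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical stand-in for the Track class.
public class Song
{
    public string Title { get; set; }
    public string FileLocation { get; set; }
    public int Year { get; set; }
}

// Hypothetical helper, not part of LinqToRdf.
public static class ProjectionSketch
{
    // Step 1: compile the projection so it can be run against each result object.
    public static object Project(LambdaExpression projection, object source)
    {
        Delegate compiled = projection.Compile();
        return compiled.DynamicInvoke(source);
    }

    // Step 2: list the properties the projection reads, so the query only
    // needs to fetch those.
    public static List<string> PropertiesOf<T, R>(Expression<Func<T, R>> projection)
    {
        var names = new List<string>();
        var ne = projection.Body as NewExpression;
        if (ne != null)
        {
            // new { s.Title, s.FileLocation }: one argument per member access.
            foreach (Expression arg in ne.Arguments)
            {
                var me = arg as MemberExpression;
                if (me != null)
                    names.Add(me.Member.Name);
            }
        }
        else
        {
            // Not an anonymous type: fall back to every property of the source.
            foreach (PropertyInfo pi in typeof(T).GetProperties())
                names.Add(pi.Name);
        }
        return names;
    }
}
```

With these definitions, PropertiesOf((Song s) => new { s.Title, s.FileLocation }) yields the two names Title and FileLocation, which is the information the query generator needs in order to restrict the variables it binds.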

ParseQuery is used by BuildQuery to handle the walking of the expression tree. Again that is very simple since most of the work is now done in ExpressionNodeParser. It looks like this:

private void ParseQuery(Expression expression, StringBuilder sb)
{
    Parser.Dispatch(expression);
}

Parser.Dispatch is a gigantic switch statement that passes off the expression tree to handler methods:

public void Dispatch(Expression expression)
{
    switch (expression.NodeType)
    {
        case ExpressionType.Add:
            Add(expression);
            break;
        case ExpressionType.AddChecked:
            AddChecked(expression);
            break;
        case ExpressionType.And:
            And(expression);
            break;
        case ExpressionType.AndAlso:
            AndAlso(expression);
            //...

Each handler method then handles the root of the expression tree, breaking it up and passing on what it can’t handle itself. For example, the method AndAlso just takes the left and right side of the operator and recursively dispatches them:

public void AndAlso(Expression e)
{
    BinaryExpression be = e as BinaryExpression;
    if (be != null)
    {
        Dispatch(be.Left);
        Dispatch(be.Right);
    }
}

The equality operator is the only operator that currently gets any special effort.

public void EQ(Expression e)
{
    BinaryExpression be = e as BinaryExpression;
    if (be != null)
    {
        MemberExpression me = be.Left as MemberExpression;
        ConstantExpression ce = be.Right as ConstantExpression;
        QueryAppend(tripleFormatStringLiteral,
            InstancePlaceholderName,
            OwlClassSupertype.GetPropertyUri(typeof(T), me.Member.Name),
            ce.Value.ToString());
    }
    MethodCallExpression mce = e as MethodCallExpression;
    if (mce != null && mce.Method.Name == "op_Equality")
    {
        MemberExpression me = mce.Parameters[0] as MemberExpression;
        ConstantExpression ce = mce.Parameters[1] as ConstantExpression;
        QueryAppend(tripleFormatStringLiteral,
            InstancePlaceholderName,
            OwlClassSupertype.GetPropertyUri(typeof(T), me.Member.Name),
            ce.Value.ToString());
    }
}

The equality expression can be formed either through the use of a binary expression with NodeType.EQ or as a MethodCallExpression on op_Equality for type string. If the handler for the MethodCallExpression spots op_Equality it passes the expression off to the EQ method for it to render instead. EQ therefore needs to spot which type of Node it’s dealing with to know how to get the left and right sides of the operation. In a BinaryExpression there are Right and Left properties, whereas in a MethodCallExpression these will be found in a Parameters collection. In our example they get the same treatment.

You’ll note that we assume that the left operand is a MemberExpression and the right is a ConstantExpression. That allows us to form clauses like this:

where t.Year == 2006

but it would fail on all of the following:

where t.Name.ToUpper() == "SOME STRING"
where t.Name == t.Other
where t.Year.ToString() == "2006"

Each of these cases will have to be handled individually, so the number of cases we need to handle can grow. As Bart De Smet pointed out, some of the operations might have to be performed after retrieval of the results since semantic web query languages are unlikely to have complex string manipulation and arithmetic functions. Or at least, not yet.
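The restriction is easier to see in a self-contained form. The following is a minimal sketch, not the real ExpressionNodeParser: TripleWalker and its Track class are hypothetical, it is written against the released System.Linq.Expressions API (where string equality is also a BinaryExpression of type Equal), and it uses the bare property name where the real code would look up the ontology URI with GetPropertyUri.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical stand-in for the Track class.
public class Track
{
    public int Year { get; set; }
    public string GenreName { get; set; }
}

// Hypothetical walker handling only conjunction and equality.
public static class TripleWalker
{
    public static List<string> Translate<T>(Expression<Func<T, bool>> predicate)
    {
        var triples = new List<string>();
        Walk(predicate.Body, triples);
        return triples;
    }

    static void Walk(Expression e, List<string> triples)
    {
        var be = e as BinaryExpression;

        // Conjunction: recurse into both sides, as AndAlso does above.
        if (be != null && be.NodeType == ExpressionType.AndAlso)
        {
            Walk(be.Left, triples);
            Walk(be.Right, triples);
            return;
        }

        // Equality: only <member> == <constant> can become a triple pattern.
        if (be != null && be.NodeType == ExpressionType.Equal)
        {
            var me = be.Left as MemberExpression;
            var ce = be.Right as ConstantExpression;
            if (me == null || ce == null)
                throw new NotSupportedException(
                    "Only <member> == <constant> is handled: " + e);
            // Assumption: the property name stands in for the predicate URI.
            triples.Add("?track <" + me.Member.Name + "> \"" + ce.Value + "\" .");
            return;
        }

        throw new NotSupportedException("Unhandled node type: " + e.NodeType);
    }
}
```

Translating t => t.Year == 2006 && t.GenreName == "Chillout" produces two patterns, while t => t.GenreName.ToUpper() == "CHILLOUT" throws NotSupportedException because the left operand is a method call rather than a member access. Failing loudly like this is a design choice of the sketch; the handlers above would instead hit a null reference.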

The QueryAppend method forms an N3 Triple out of its parameters and appends it to the StringBuilder that was passed to the Parser initially. At the end of the recursive tree walk, this string builder is harvested and preprocessed to make it ready to pass to the triple store. In my previous post I described an ObjectDeserialiserQuerySink that was passed to SemWeb during the query process to harvest the results. This has been reused to gather the results of the query from within our query.

I mentioned earlier that the GetEnumerator method was important to IQueryable. An IQueryable is a class that can defer execution of its query till someone attempts to enumerate its results. Since that’s done using GetEnumerator the query must be performed in GetEnumerator. My implementation of GetEnumerator looks like this:

IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
    if (result != null)
        return result.GetEnumerator();
    query = ConstructQuery();
    PrepareQueryAndConnection();
    PresentQuery(query);
    return result.GetEnumerator();
}

result is the List<TElement> variable where I cache the results for later use. What that means is that the query only gets run once. Next time the GetEnumerator gets called, result is returned directly. This reduces the cost of repeatedly enumerating the same query. Currently the methods ConstructQuery, PrepareQueryAndConnection, and PresentQuery are all fairly simple affairs that exist more as placeholders so that I can reuse much of this code for a LINQ to SPARQL implementation that is to follow.
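The run-once behaviour can be shown in isolation. This is a minimal sketch under stated assumptions: CachedQuery and ExecutionCount are hypothetical names, and the delegate passed to the constructor stands in for the ConstructQuery, PrepareQueryAndConnection and PresentQuery steps.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical illustration of deferred, cached execution. The real query
// class also implements IQueryable<T>; only the enumeration side is shown.
public class CachedQuery<T> : IEnumerable<T>
{
    private readonly Func<List<T>> runQuery;
    private List<T> result;

    // Counts how often the underlying query actually ran.
    public int ExecutionCount { get; private set; }

    public CachedQuery(Func<List<T>> runQuery)
    {
        this.runQuery = runQuery;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Nothing runs until someone enumerates, and then only once.
        if (result == null)
        {
            result = runQuery();
            ExecutionCount++;
        }
        return result.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Constructing a CachedQuery does no work. The first foreach runs the delegate and later loops reuse the cached list, so ExecutionCount stays at 1. The trade-off is that later changes in the store are not seen by an already enumerated query.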

As you’ve probably worked out, there is a huge amount of detail that has to be attended to, but the basic concepts are simple. The reason why more people haven’t written LINQ query providers before now is simply the fact that there is no documentation about how to do it. When you try though, you may find it easier than you thought.

There is a great deal more to do to LINQ to RDF before it is ready for production use, but as a proof of concept that semantic web technologies can be brought into the mainstream it serves well. The reason why we use ORM systems such as LINQ to SQL is to help us overcome the Impedance Mismatch that exists between the object and relational domains. An equally large mismatch exists between the Object and Semantic domains. Tools like LINQ to RDF will have to overcome the mismatch in order for them to be used outside of basic domain models.

Using RDF and C# to create an MP3 Manager – Part 3

Last time I hurriedly showed you how you can perform the next step of converting a triple store into an ORM system of sorts. The purpose of all this activity, and the reason I left off blogging about LINQ, is that I am working on a system to allow me to use LINQ with a triple store and reasoner. The benefit of doing so is that we should have a lot of the benefits of the modern relational world with the power and connectivity of the new world of the semantic web. In this post I shall outline the steps required to start working with LINQ to RDF (as I’ve chosen to call it through lack of imagination).

I’ve been using test driven development throughout, so I already have a few ‘integration’ unit tests to show you:

[TestMethod]
public void Query()
{
  string urlToRemoteSparqlEndpoint = "http://localhost/MyMusicService/SparqlQuery.ashx";
  RdfContext<Track> ctx = new RdfSparqlContext<Track>(urlToRemoteSparqlEndpoint);
  var titles = from t in ctx
    where t.Year > 1998 &&
    t.GenreName == "Ambient Techno" ||
    t.GenreName == "Chillout"
    select t.Title;
  foreach(string title in titles)
    Console.WriteLine(title);
}

In English, this means that rather than manipulating a local triple store, I want the RdfSparqlContext to compose a SPARQL query and present it to the query endpoint found at location urlToRemoteSparqlEndpoint. I then want it to deserialise the results returned and store them in titles. This is a nice mixture of new features from .NET 3.5, combined with some of the features I’ve already developed for object deserialisation.

With the example above I am concerned more with the querying aspect of LINQ (it being a query language n’ all!) but that is not much use to me in the world of transactional web development where I find myself mired for the moment, so we need full CRUD behaviour from this system. Here’s a unit test for object update.

[TestMethod]
public void Update()
{
  string urlToRemoteSparqlEndpoint = @"http://localhost/MyMusicService/SparqlQuery.ashx";
  RdfContext<Track> ctx = new RdfSparqlContext<Track>(urlToRemoteSparqlEndpoint);
  var q = from t in ctx
    where t.Year > 1998 &&
    t.GenreName == "Ambient Techno" ||
    t.GenreName == "Chillout"
    select t;
  foreach (Track t in q)
    t.Rating = 5;
  ctx.AcceptChanges();
}

Here, I’m getting a bunch of objects back from the triple store, modifying their Rating property and then asking for those changes to be stored. This follows the usage patterns for LINQ to SQL.

Satisfying these unit tests (plus ones for the rest of the CRUD behaviour), with support for N3, in-memory and RDBMS-based local triple stores, is what I’m aiming to complete – eventually. Here’s the general scheme for implementing querying using LINQ. It assumes that RDF data structures are taken unmodified from the SemWeb library.

  • Create a Query Object (I guess this would be our RdfContext class above)
  • Implement IQueryable<T> on it
  • When the enumerable is requested, convert the stored expression tree into the target language
  • Present it to whatever store or endpoint is available
  • Deserialise the results into objects
  • Yield the results (probably as they are being deserialised)

This is a very broad outline of the tasks. I’ll explain in a lot more depth in subsequent posts, as I tackle each step.

Using RDF and C# to create an MP3 Manager – Part 2

I’ve been off the air for a week or two – I’ve been hard at work on the final stages of a project at work that will go live next week. I’ve been on this project for almost 6 months now, and next week I’ll get a well earned rest. What that means is I get to do some dedicated Professional Development (PD) time which I have opted to devote to Semantic Web technologies. That was a hard sell to the folks at Readify, what with Silverlight and .NET 3 there to be worked on. I think I persuaded them that consultancies without SW skills will be at a disadvantage in years to come.

Anyway, enough of that – onto the subject of the post, which is the next stage of my mini-series about using semantic web technologies in the production of a little MP3 file manager.

At the end of the last post we had a simple mechanism for serialising objects into a triple store, with a set of services for extracting relevant information out of an object and tying it to predicates defined in an ontology. In this post I will show you the other end of the process. We need to be able to query against the triple store and get a collection of objects back.

The query I’ll show you is very simple, since the main task for this post is object deserialisation. Once we can shuttle objects in and out of the triple store then we can focus on beefing up the query process.

Querying the triple store

For this example I just got a list of artists for the user and allowed them to select one. That artist was then fed into a graph match query in SemWeb, to bring back all of the tracks whose artist matches the one chosen.

The query works in the usual way – get a connection to the data store, create a query, present it and reap the result for conversion to objects:

private IList<Track> DoSearch()
{
  MemoryStore ms = Store.TripleStore;
  ObjectDeserialiserQuerySink<Track> sink = new ObjectDeserialiserQuerySink<Track>();
  string qry = CreateQueryForArtist(artists[0].Trim());
  Query query = new GraphMatch(new N3Reader(new StringReader(qry)));
  query.Run(ms, sink);
  return tracksFound = sink.DeserialisedObjects;
}

We’ll get on to the ObjectDeserialiserQuerySink in a short while. The process of creating the query is really easy, given the simple reflection facilities I created last time. I’m using the N3 format for the queries, for the sake of simplicity – we could just as easily have used SPARQL. We start with a prefix string to give us a namespace to work with, and we then enumerate the persistent properties of the Track type. For each property we then insert a triple meaning “whatever track is selected – get its property as well”. Lastly, we add the artist name as a known fact, allowing us to specify exactly what tracks we are talking about.

private static string CreateQueryForArtist(string artistName)
{
  string queryFmt = "@prefix m: <http://aabs.purl.org/ontologies/2007/04/music#> .\n";
  foreach (PropertyInfo info in OwlClassSupertype.GetAllPersistentProperties(typeof(Track)))
  {
    queryFmt += string.Format("?track <{0}> ?{1} .\n", OwlClassSupertype.GetPropertyUri(typeof(Track), info.Name), info.Name);
  }
  queryFmt += string.Format("?track <{0}> \"{1}\" .\n", OwlClassSupertype.GetPropertyUri(typeof(Track), "ArtistName"), artistName);
  return queryFmt;
}

Having created a string representation of the query we’re after, we pass it to a GraphMatch object, which is a kind of query where you specify a graph that is a kind of prototype for the structure of the results desired. I also created a simple class called ObjectDeserialiserQuerySink:

public class ObjectDeserialiserQuerySink<T> : QueryResultSink where T : OwlClassSupertype, new()
{
  public List<T> DeserialisedObjects
  {
    get { return deserialisedObjects; }
  }
  private List<T> deserialisedObjects = new List<T>();
  public ObjectDeserialiserQuerySink()
  {
  }
  public override bool Add(VariableBindings result)
  {
    T t = new T();
    foreach (PropertyInfo pi in OwlClassSupertype.GetAllPersistentProperties(typeof(T)))
    {
      try
      {
        string vn = OwlClassSupertype.GetPropertyUri(typeof (T), pi.Name).Split('#')[1];
        string vVal = result[pi.Name].ToString();
        pi.SetValue(t, Convert.ChangeType(vVal, pi.PropertyType), null);
      }
      catch (Exception e)
      {
        Debug.WriteLine(e);
        return false;
      }
    }
    DeserialisedObjects.Add(t);
    return true;
  }
}

For each match that the reasoner is able to find, a call gets made to the Add method of the deserialiser with a set of VariableBindings. Each of the variable bindings corresponds to solutions of the free variables defined in the query. Since we generated the query out of the persistent properties on the Track type the free variables matched will also correspond to the persistent properties of the type. What that means is that it is a straightforward job to deserialise a set of VariableBindings into an object.
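The mapping step can be tried without SemWeb. This is a minimal sketch, assuming a plain dictionary of strings in place of VariableBindings; BindingMapper, FromBindings and TrackRecord are hypothetical names. Unlike the sink above, it skips properties that have no binding rather than rejecting the whole result.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

// Hypothetical stand-in for the Track class.
public class TrackRecord
{
    public string Title { get; set; }
    public int Year { get; set; }
}

// Hypothetical helper, not part of SemWeb or LinqToRdf.
public static class BindingMapper
{
    // bindings: variable name -> value as text, standing in for VariableBindings.
    public static T FromBindings<T>(IDictionary<string, string> bindings)
        where T : class, new()
    {
        T target = new T();
        foreach (PropertyInfo pi in typeof(T).GetProperties())
        {
            string raw;
            if (!bindings.TryGetValue(pi.Name, out raw))
                continue; // no variable was bound for this property

            // Convert the text to the property's declared type, e.g. "2006" -> int.
            object value = Convert.ChangeType(
                raw, pi.PropertyType, CultureInfo.InvariantCulture);
            pi.SetValue(target, value, null);
        }
        return target;
    }
}
```

Because the variable names were generated from the property names in the first place, the lookup by pi.Name is all the mapping that is needed.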

That’s it. We now have a simple triple store that we can serialise objects into and out of, with an easy persistence mechanism. But there’s a lot more that we need to do. Of the full CRUD behaviour I have implemented Create and Retrieve. That leaves Update and Delete. As we saw in one of my previous posts, that will be a mainly manual programmatic task since semantic web ontologies are to a certain extent static. What that means is that they model a domain as a never-changing body of knowledge about which we may deduce more facts, but where we can’t unmake (delete) knowledge.

The static nature of ontologies seems like a bit of a handicap to one who deals more often than not with transactional data, since it means we need more than one mechanism for dealing with data: deductive reasoning and transactional processing. With the examples I have given up till now I have been dealing with in-memory triple stores where the SemWeb API is the only easy means of updating and deleting data. When we are dealing with a relational database as our triple store, we will have the option to exploit SQL as another tool for managing data.


An Ominous Blog Post

Normally I have a very optimistic outlook, especially when it comes to technological breakthroughs. But this morning I was given pause for thought. MAKE magazine carried a news article today about a highly accurate DNA replicator for $10. I am fully convinced that such breakthroughs can be used to tackle the issues of world poverty, but I’ve just finished reading Tomorrow’s War by David Shukman. It was written over 10 years ago but was gloomy, even then, about our chances of controlling the proliferation of expertise in the production of WMD, and it was written before 9/11.

At this rate, the techniques and the resources for biological weapons development will be freely available, but the skills needed to combat them will not. I just think of the irresponsibility of computer hackers and virus writers – who often wreak havoc without any thought of the costs or consequences. If such power can be unleashed in the real world, then we are in way more danger than we ever were during the cold war.

That is a doomsday scenario, if you ask me.