Knowledge Graphs 6 – Semantics

With this installment we finally get to the part of knowledge graphs that I personally find really exciting: semantics. I will introduce some of the simple rules of entailment that are part of the RDFS standard.

This is part 6 of an ongoing series providing a little background on ‘knowledge graphs’. The aim is to let software developers get up to speed as fast as possible. No theory, no digressions, and no history. Just practical knowledge.

This is the barest tip of the iceberg of reasoning about data that is possible with RDF. I hope to convey some of the power of this technology, a power that is so hard to find anywhere else. I describe how this secret sauce allows us to build meaning into our data incrementally, as our understanding of it grows, which is another thing that is hard to do with many other popular technologies.

Properties Redux

Remember from last time how to define a class and some properties on it:

:Player a rdfs:Class .
:Team a rdfs:Class .

:playsFor a rdf:Property;
    rdfs:domain :Player;
    rdfs:range  :Team .

And how we can define property hierarchies if we want to:

:worksFor rdfs:domain :Person;
    rdfs:range :Organisation .
:playsFor rdfs:subPropertyOf :worksFor .

And how to define instances of those classes and properties:

<http://dbpedia.org/resource/George_Best> :playsFor <http://dbpedia.org/page/Manchester_United_F.C.> .

Let’s unpack some of what I said. There are two classes in the default namespace, :Player and :Team, that can be related using the :playsFor property.

I then defined a new property called :worksFor that links people to organisations, and said that if a player plays for a team, then they also work for that team. Yes, I know there are exceptions to this in the real world, but you get the idea, right? There are people who work for the team without playing for it, such as coaches and medics, so :worksFor is a super-property of :playsFor.

I then used the :playsFor property to link two new resources in our graph: http://dbpedia.org/resource/George_Best and http://dbpedia.org/page/Manchester_United_F.C., both taken from DBpedia, the RDF graph derived from Wikipedia.

RDFS Entailment

While I was able to capture a little microcosm of the world of soccer, I’m sure you can see that there is more in there, if only we had the means to get at it. RDFS provides some of those means: it defines an Entailment Regime for the new property and class building blocks.

An entailment regime is, in essence, a set of rules for what additional conclusions are valid given some basic initial statements. Often those rules follow one of the familiar syllogism structures: All A are B, x is A, therefore x is B.

Here’s an example of one of the rules, called rdfs11, that describes the transitivity of subclass relationships:

if ( xxx rdfs:subClassOf yyy && yyy rdfs:subClassOf zzz) 
then (xxx rdfs:subClassOf zzz)

Which is another way of saying that it is logically correct to add xxx rdfs:subClassOf zzz to your graph whenever you see xxx rdfs:subClassOf yyy and yyy rdfs:subClassOf zzz in your triple store.
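To make the rule concrete outside of a triple store, here is a minimal Python sketch of rdfs11. It assumes triples are held in memory as plain (subject, predicate, object) tuples of strings, and it invents a hypothetical :Striker class purely so that there is a two-step chain to follow. It is an illustration of the rule, not how a real inference engine is built.

```python
# Triples are modelled as plain (subject, predicate, object) string tuples.
# This representation is an assumption made for illustration only.
SUBCLASS = "rdfs:subClassOf"


def rdfs11(triples):
    """Return the subclass triples that rdfs11 entails but the graph lacks."""
    subclass_pairs = [(s, o) for (s, p, o) in triples if p == SUBCLASS]
    inferred = set()
    for xxx, yyy in subclass_pairs:
        for middle, zzz in subclass_pairs:
            if yyy == middle:  # xxx -> yyy and yyy -> zzz
                inferred.add((xxx, SUBCLASS, zzz))
    return inferred - set(triples)


graph = {
    (":Striker", SUBCLASS, ":Player"),  # hypothetical class, not in the schema above
    (":Player", SUBCLASS, ":Person"),
}

new_triples = rdfs11(graph)
print(new_triples)
```

A single pass only follows chains of length two; longer chains need the rule to be applied repeatedly until nothing new appears.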

Conveniently, these rules can be converted to SPARQL update statements:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

INSERT
{
    GRAPH <http://industrialinference.com/inferred/> {
        ?xxx rdfs:subClassOf ?zzz .
    }
}
WHERE
{
    ?xxx rdfs:subClassOf ?yyy .
    ?yyy rdfs:subClassOf ?zzz .
}

The beauty of this is that you can query for data that you never anticipated. Here’s a trivial little example to demonstrate. Imagine that on day one of your project you stored some data matching the simple schema above.

:bob a :Player .

Initially, you can’t do much with the data other than store and retrieve it. But once you start to annotate your data with further relationships, things can get interesting. Imagine we say later on that a Player is a kind of Person. We don’t need to modify any of our data, just add another triple:

:Player rdfs:subClassOf :Person .

Now, whenever we query for all people, we get back the players as well (the rule doing the work here is rdfs9, listed in the appendix). I can’t overstate how important this is! Suddenly, we are getting different, fuller results because entailment allowed us to deduce new facts from our data.

We didn’t change our data, nor our applications or data access code to get this. All we had to do was supply more details about our types and their relationships.
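The :bob example can be replayed in a few lines of Python. This is only a sketch under the assumption that triples are in-memory string tuples. The rule applied is rdfs9 from the appendix (an instance of a subclass is also an instance of the superclass), and instances_of is a hypothetical helper standing in for a "give me all people" query.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs9(triples):
    """rdfs9: xxx subClassOf yyy, zzz type xxx  =>  zzz type yyy."""
    inferred = set()
    for (xxx, p1, yyy) in triples:
        if p1 != SUBCLASS:
            continue
        for (zzz, p2, cls) in triples:
            if p2 == RDF_TYPE and cls == xxx:
                inferred.add((zzz, RDF_TYPE, yyy))
    return inferred - set(triples)


def instances_of(triples, cls):
    """Hypothetical stand-in for a query returning all instances of a class."""
    return sorted(s for (s, p, o) in triples if p == RDF_TYPE and o == cls)


# Day one: only the raw instance data.
graph = {(":bob", RDF_TYPE, ":Player")}
people_before = instances_of(graph, ":Person")

# Later: one extra schema triple, then materialise what rdfs9 entails.
graph.add((":Player", SUBCLASS, ":Person"))
graph |= rdfs9(graph)
people_after = instances_of(graph, ":Person")

print(people_before, people_after)
```

The instance data for :bob is never edited; the only change is the added schema triple and the rule application.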

There is a similar rule to rdfs11 called rdfs5 that applies the same transitive reasoning to property relationships:

if (xxx rdfs:subPropertyOf yyy && yyy rdfs:subPropertyOf zzz )
then (xxx rdfs:subPropertyOf zzz)

which translates in SPARQL like so:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

INSERT
{ 
    GRAPH <http://industrialinference.com/inferred/> {
        ?xxx rdfs:subPropertyOf ?zzz .
    }
}
WHERE
{
    ?xxx rdfs:subPropertyOf ?yyy .
    ?yyy rdfs:subPropertyOf ?zzz .
}

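Note that rdfs5 only reasons about the property hierarchy itself. It is its companion rule, rdfs7 (see the appendix), that carries instance data up that hierarchy: with it in place, when we query who :worksFor Manchester United, we also get back players like George Best, in addition to the coaching, management and medical staff.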
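Here is a small Python sketch of how a statement made with a sub-property also holds for its super-property, which is rule rdfs7 in the appendix. The short names :George_Best and :Manchester_United are stand-ins for the DBpedia IRIs used earlier, and :a_coach is a hypothetical staff member added so the query has a non-player to return. Triples as string tuples are, again, an assumption for illustration.

```python
SUBPROPERTY = "rdfs:subPropertyOf"


def rdfs7(triples):
    """rdfs7: aaa subPropertyOf bbb, xxx aaa yyy  =>  xxx bbb yyy."""
    sub_props = [(s, o) for (s, p, o) in triples if p == SUBPROPERTY]
    inferred = set()
    for aaa, bbb in sub_props:
        for (xxx, p, yyy) in triples:
            if p == aaa:
                inferred.add((xxx, bbb, yyy))
    return inferred - set(triples)


graph = {
    (":playsFor", SUBPROPERTY, ":worksFor"),
    (":George_Best", ":playsFor", ":Manchester_United"),
    (":a_coach", ":worksFor", ":Manchester_United"),  # hypothetical non-player
}

materialised = graph | rdfs7(graph)

# "Who works for Manchester United?" now includes the player too.
staff = sorted(
    s for (s, p, o) in materialised
    if p == ":worksFor" and o == ":Manchester_United"
)
print(staff)
```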

Another vital pair of rules are rdfs2 and rdfs3. They look like this:

if (aaa rdfs:domain xxx && yyy aaa zzz )
then (yyy rdf:type xxx)

if (aaa rdfs:range xxx && yyy aaa zzz )
then (zzz rdf:type xxx)

Which means that if you know the definitions of the types of resources at either end of a property, you can assign them to the resources mentioned in actual instance data. Remember where I said:

<http://dbpedia.org/resource/George_Best> :playsFor <http://dbpedia.org/page/Manchester_United_F.C.> .

since I defined the property :playsFor like this:

:playsFor a rdf:Property;
    rdfs:domain :Player;
    rdfs:range  :Team .

Then rdfs2/3 allows me to add the following triples to my store:

<http://dbpedia.org/resource/George_Best> a :Player .
<http://dbpedia.org/page/Manchester_United_F.C.> a :Team .
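The same deduction can be sketched in Python. The snippet below assumes triples are in-memory string tuples and applies rdfs2 and rdfs3 in one function, rdfs2_rdfs3, which is a name made up for this example. It ignores the detail that a literal in the object position should not be typed by the range rule.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

GEORGE_BEST = "<http://dbpedia.org/resource/George_Best>"
MAN_UNITED = "<http://dbpedia.org/page/Manchester_United_F.C.>"


def rdfs2_rdfs3(triples):
    """rdfs2 types the subject from the domain, rdfs3 the object from the range."""
    inferred = set()
    for (aaa, schema_pred, xxx) in triples:
        if schema_pred not in (DOMAIN, RANGE):
            continue
        for (yyy, used_pred, zzz) in triples:
            if used_pred == aaa:
                target = yyy if schema_pred == DOMAIN else zzz
                inferred.add((target, RDF_TYPE, xxx))
    return inferred - set(triples)


graph = {
    (":playsFor", RDF_TYPE, "rdf:Property"),
    (":playsFor", DOMAIN, ":Player"),
    (":playsFor", RANGE, ":Team"),
    (GEORGE_BEST, ":playsFor", MAN_UNITED),
}

new_types = rdfs2_rdfs3(graph)
for triple in sorted(new_types):
    print(triple)
```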

Not only that, but because we declared :playsFor to be an rdfs:subPropertyOf :worksFor, rule rdfs7 allows me to add the first triple below, and rdfs2/3 applied to the domain and range of :worksFor then give me the other two:

<http://dbpedia.org/resource/George_Best> :worksFor
    <http://dbpedia.org/page/Manchester_United_F.C.> .
<http://dbpedia.org/resource/George_Best> a :Person .
<http://dbpedia.org/page/Manchester_United_F.C.> a :Organisation .
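Because one rule's output can feed another rule's input, the rules have to be applied repeatedly until nothing new turns up. The following Python sketch does that for just three rules (rdfs7, rdfs2 and rdfs3) over in-memory string tuples. The function names step and materialise are invented for this example, and a real engine would be far less naive about how it searches for matches.

```python
GB = "<http://dbpedia.org/resource/George_Best>"
MU = "<http://dbpedia.org/page/Manchester_United_F.C.>"


def step(triples):
    """Apply rdfs7, rdfs2 and rdfs3 once; return only triples not yet known."""
    inferred = set()
    for (s, p, o) in triples:          # candidate schema triple
        for (s2, p2, o2) in triples:   # candidate statement using property s
            if p2 != s:
                continue
            if p == "rdfs:subPropertyOf":
                inferred.add((s2, o, o2))            # rdfs7
            elif p == "rdfs:domain":
                inferred.add((s2, "rdf:type", o))    # rdfs2
            elif p == "rdfs:range":
                inferred.add((o2, "rdf:type", o))    # rdfs3
    return inferred - triples


def materialise(triples):
    """Keep applying the rules until a pass adds nothing (a fixpoint)."""
    closure = set(triples)
    while True:
        new = step(closure)
        if not new:
            return closure
        closure |= new


asserted = {
    (":playsFor", "rdfs:domain", ":Player"),
    (":playsFor", "rdfs:range", ":Team"),
    (":worksFor", "rdfs:domain", ":Person"),
    (":worksFor", "rdfs:range", ":Organisation"),
    (":playsFor", "rdfs:subPropertyOf", ":worksFor"),
    (GB, ":playsFor", MU),
}

entailed = materialise(asserted) - asserted
for triple in sorted(entailed):
    print(triple)
```

The :Person and :Organisation types only appear on the second pass, because they depend on the :worksFor triple that rdfs7 produced on the first.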

So you see, there is a lot we can deduce from a bit of schema and a little bit of data. More importantly, you might choose to store just the raw triple saying that George Best plays for Manchester United, without any other metadata about what it means.

Later on, you can incrementally add this extra information. As you go, you will find more and more insights start to come out of your data, and you can answer more and more questions. For example, with just the initial raw data, I couldn’t say that George Best was a Person and not a Car or Engine. After defining the meaning of :playsFor I will know all this and more.

Summary

This is the briefest possible introduction to entailment I could provide. I hope it has shown you that the rules give meaning to relationships, and that those rules, if applied judiciously, allow you to get data out that you didn’t put in. They allow you to answer questions that were unanticipated when you put your data in. They allow you to declaratively adorn your raw data with metadata later on, and use that metadata with entailment rules to enrich your data in unforeseen ways.

As I mentioned, this is but the merest whiff of what is possible, and as this series progresses I hope to cover some of the awesomeness that is OWL2, as well as introduce you to inference engines – the systems that can sit in the background applying the rules of entailment for you.

For now, if you want to understand the rules and see how they might be applied, take a look at this little project I knocked up in my spare time. It’s a poor man’s inference engine, but hopefully it shows how you might periodically materialise entailments in your database.

Appendix A – RDFS Entailment Rules

Here’s the full list of entailments for RDFS.

ID | If S contains: | then S RDFS entails recognizing D:
rdfs1 | any IRI aaa in D | aaa rdf:type rdfs:Datatype .
rdfs2 | aaa rdfs:domain xxx . yyy aaa zzz . | yyy rdf:type xxx .
rdfs3 | aaa rdfs:range xxx . yyy aaa zzz . | zzz rdf:type xxx .
rdfs4a | xxx aaa yyy . | xxx rdf:type rdfs:Resource .
rdfs4b | xxx aaa yyy . | yyy rdf:type rdfs:Resource .
rdfs5 | xxx rdfs:subPropertyOf yyy . yyy rdfs:subPropertyOf zzz . | xxx rdfs:subPropertyOf zzz .
rdfs6 | xxx rdf:type rdf:Property . | xxx rdfs:subPropertyOf xxx .
rdfs7 | aaa rdfs:subPropertyOf bbb . xxx aaa yyy . | xxx bbb yyy .
rdfs8 | xxx rdf:type rdfs:Class . | xxx rdfs:subClassOf rdfs:Resource .
rdfs9 | xxx rdfs:subClassOf yyy . zzz rdf:type xxx . | zzz rdf:type yyy .
rdfs10 | xxx rdf:type rdfs:Class . | xxx rdfs:subClassOf xxx .
rdfs11 | xxx rdfs:subClassOf yyy . yyy rdfs:subClassOf zzz . | xxx rdfs:subClassOf zzz .
rdfs12 | xxx rdf:type rdfs:ContainerMembershipProperty . | xxx rdfs:subPropertyOf rdfs:member .
rdfs13 | xxx rdf:type rdfs:Datatype . | xxx rdfs:subClassOf rdfs:Literal .