semweb

Quantum Reasoners Hold Key to Future Web

Last year, a company called DWave Systems announced their quantum computer (the ‘Orion’) – another milestone on the road to practical quantum computing. Their controversial claims seem worthy in their own right but they are particularly important to the semantic web (SW) community. The significance to the SW community was that their quantum computer solved problems akin to Grover’s Algorithm speeding up queries of disorderly databases.

Semantic web databases are not (completely) disorderly and there are many ways to optimize the search for matching triples to a graph pattern. What strikes me is that the larger the triple store, the more compelling the case for using some kind of quantum search algorithm to find matches. DWave are currently trialing 128qbit processors, and they claim their systems can scale, so I (as a layman) can see no reason why such computers couldn’t be used to help improve the performance of queries in massive triple stores.

 What I wonder is:

  1. what kind of indexing schemes can be used to impose structure on the triples in a store?
  2. how can one adapt a B-tree to index each element of a triple rather than just a single primary key – three indexes seems extravagant.
  3. are there quantum algorithms that can beat the best of these schemes?
  4. is there is a place for quantum superposition in a graph matching algorithm (to simultaneously find matching triples then cancel out any that don’t match all the basic graph patterns?)
  5. if DWave’s machines could solve NP-Complete problems, does that mean that we would then just use OWL-Full?
  6. would the speed-ups then be great enough to consider linking everyday app data to large scale-upper ontologies?
  7. is a contradiction in a ‘quantum reasoner’ (i.e. a reasoner that uses a quantum search engine) something that can never occur because it just cancels out and never appears in the returned triples? Would any returned conclusion be necessarily true (relative to the axioms of the ontology?)

Any thoughts?

UPDATE
DWave are now working with Google to help them improve some of their machine learning algorithms. I wonder whether there will be other research into the practicality of using DWave quantum computing systems in conjunction with inference engines? This could, of course, open up whole new vistas of services that could be provided by Google (or their competitors). Either way, it gives me a warm feeling to know that every time I do a search, I’m getting the results from a quantum computer (no matter how indirectly). Nice.

Relational Modeling? Not as we know it!

Marcello Cantos commented on my recent post about the ways in which RDF can transcend the object-oriented model. He posed the question of what things RDF can represent more easily than the relational model. I know Marcello is a very high calibre software engineer, so it’s not just an idle question from a relational dinosaur, but a serious question from someone who can push the envelope far with a relational database.

Since an ontology is most frequently defined (in compsci) as a specification of a conceptualization, a relational model is a kind of ontology. That means a relational model is by definition a knowledge representation system. That’d be my answer if I just wanted to sidestep the real thrust of his question; Is the relational model adequate to do what can be done by RDF?

That’s a more interesting question, and I’d be inclined to say everything I said in my previous post about the shortcomings of object oriented programming languages applies equally to the relational model. But lets take another look at the design features of RDF that make it useful for representation of ‘knowledge’.

○ URI based
○ Triple format
○ Extensible
○ Layered
○ Class based
○ Meta-model

URI Based

By using URIs as a token of identification and definition, and by making identifications and definitions readable, interchangeable and reusable the designers of RDF exposed the conceptualisation of the ontology to the world at large. Could you imagine defining a customer in your database as ‘everything in XYZ company’s CRM’s definition of a customer, plus a few special fields of our own‘. It is not practical. Perhaps you might want to say, everything in their database less some fields that we’re not interested in. Again – not possible. Relational models are not as flexible as the concepts that they need to represent. That is also the real reason why interchange formats never caught on – they were just not able to adapt to the ways that people needed to use them. RDF is designed from the outset to be malleable.

Triple Format

At their foundation, all representations make statements about the structure or characteristics of things. All statements must have the form (or can be transformed into that format). The relational model strictly defines the set of triples that can be expressed about a thing. For example, imagine a table ‘Star’ that has some fields:

Star (
	StarId INT,
	CommonName nvarchar(256),
	Magnitude decimal NOT NULL,
	RA decimal NOT NULL,
	DEC decimal NOT NULL,
	Distance decimal NOT NULL,
	SpectralType nvarchar(64)
	)

Now if we had a row

(123, 'Deneb', 1.25, 300.8, 45.2, 440, 'A2la')

That would be equivalent to a set of triples represented in N3 like this:

[]
  StartId 123;
  CommonName "Deneb";
  Magnitude 1.25^xsd:decimal;
  RA 300.8^xsd:decimal;
  DEC 45.2^xsd:decimal;
  Distance 440^xsd:decimal;
  SpectralType "A2la" .

Clearly there’s a great deal of overlap between these two systems and the one is convertible into the other. But what happens when we launch a new space probe capable of measuring some new feature of the star that was never measurable before? Or what happens when we realise that to plot our star very far into the future we need to store radial velocity, proper motion and absolute magnitude. We don’t have fields for that, and there’s no way in the database to add them without extensive modifications to the database.

RDF triple stores (or runtime models or files for that matter) have no particular dependence on the data conforming to a prescribed format. More importantly class membership and instance-hood are more decoupled so that a ‘thing’ can exist without automatically being in a class. In OO languages you MUST have a type, just as in RDBMSs, a row MUST come from some table. We can define an instance that has all of the properties defined in table ‘Star’ plus a few others gained from the Hipparchos catalog and a few more gleaned from the Tycho-1 catalog. It does not break the model nor invalidate the ‘Star’ class-hood to have this extra information, it just happens that we know more about Deneb in our database than some other stars.

This independent, extensible, free-form, standards-based language is capable of accommodating any knowledge that you can gather about a thing. If you add meta-data about the thing then more deductions can be made about it, but its absence doesn’t stop you from adding or using the data in queries.

Extensible, Layered, Class Based with Meta-model

Being extensible, in the case of RDF, means a few things. It means that RDF supports OO-style multiple inheritance relationships. See my previous post to see that this is the tip of the iceberg for RDF class membership. That post went into more detail about how class membership was not based on some immutable Type property that once assigned can never by removed. Instead it, can be based on more or less flexible criteria.

Extensibility in RDF also means providing a way to make complex statements about the modelling language itself. For example once the structure of triples is defined (plus URIs that can be in subjects, predicates or objects) in the base RDF language, then RDF has a way to define complex relationships. The language was extended with RDF Schema which in turn was extended with several layers in OWL, which will in turn be extended by yet more abstract layers.

Is there a mechanism for self reference in SQL? I can’t think of a way of defining one structure in a DB in terms of the structure of another. There’s no way that I can think of of being explicit about the nature of the relationship between two entities. Is there a way for you to state in your relational model facts like this:

{?s CommonName ?c.} => {?s Magnitude ?m. ?m greaterThan 6.}

i.e. if it has a common name then it must be visible to the naked eye. I guess you’d do that with a relational view so that you could query whether the view ‘nakedEyeStars’ contains star 123. Of course CommonName could apply to botanical entities (plants) as well as to stars, but I imagine you’d struggle to create a view that merged data from the plant table and the star table.

So, in conclusion, there’s plenty of ways that RDF specifically addresses the problems it seeks to address – data interchange, standards definition, KR, mashups – in a distributed web-wide way. RDBMSs address the problems faced by programmers at the coal face in the 60s and 70s – efficient, standardized, platform-independent data storage and retrieval. The imperative that created a need for RDBMSs in the 60s is not going away, so I doubt databases will be going away any time soon either. In fact they can be exposed to the world as triples without too much trouble. The problem is that developers need more than just data storage and retrieval. They need intelligent data storage and retrieval.

New Resources for LinqToRdf

John Mueller recently sent through a link to a series of articles on working with RDF. As well as being a useful introduction to working with RDF, they use LinqToRdf for code examples.

They provide information on hosting RDF files as well as querying them using LinqToRdf. They show how easy it is to get semantic web applications up and running on .NET in no time at all. Please read the articles and share the links around.

John also told me about his new book LINQ for Dummies, which has a section on LinqToRdf. I’ve not had a chance to read it yet. I would welcome any feedback, which I’ll pass through to John. I understand that the content is broadly similar to the articles on DevSource.com, placing more emphasis on LINQ than RDF.  Again, please take a look and let me know what you think.

Not another mapping markup language!

Kingsley Idehen has again graciously given LinqToRdf some much needed link-love. He mentioned it in a post that was primarily concerned with the issues of mapping between the ontology, relational and object domains. His assertion is that LinqtoRdf, being an offshoot of an ORM related initiative, is reversing the natural order of mappings. He believes that in the world of ORM systems, the emphasis should be in mapping from the relational to the object domain.

I think that he has a point, but not for the reason he’s putting forward. I think that the natural direction of mapping stems from the relative richness of the domains being mapped. The impedence mismatch between the relational and object domains stems from (1) the implicitness of meaning in the relationships of relational systems and (2) the representation of relationships and (3) type mismatches.

If the object domain has great expressiveness and explicit meaning in relationships it has a ‘larger’ language than that expressible using relational databases. Relationships are still representable, but their meaning is implicit. For that reason you would have to confine your mappings to those that can be represented in the target (relational) domain. In that sense you get a priority inversion that forces the lowest common denominator language to control what gets mapped.

The same form of inversion occurs between the ontological and object domains, only this time it is the object domain that is the lowest common denominator. OWL is able to represent such things as restriction classes and multiple inheritance and sub-properties that are hard or impossible to represent in languages like C# or Java. When I heard of the RDF2RDB working group at the W3C, I suggested (to thunderous silence) that they direct their attentions to coming up with a general purpose mapping ontology that could be used for performing any kind of mapping.

I felt that it would have been extremely valuable to have a standard language for defining mappings. Just off the top of my head I can think of the following places where it would be useful:

  1. Object/Relational Mapping Systems (O/R or ORM)
  2. Ontology/Object Mappings (such as in LinqToRdf)
  3. Mashups (merging disparate data sources)
  4. Ontology Reconciliation – finding intersects between two sets of concepts
  5. Data cleansing
  6. General purpose data access layer automation
  7. Data export systems
  8. Synchronization Systems (i.e. keeping systems like CRM and AD in sync)
  9. mapping objects/tables onto UIs
  10. etc

You can see that most of these are perennial real-world problems that programmers are ALWAYS having to contend with. Having a standard language (and API?) would really help with all of these cases.

I think such an ontology would be a nice addition to OWL or RDF Schema, allowing a much richer definition of equivalence between classes (or groups or parts of classes). Right now one can define a one-to-one relationship using the owl:equivalentClass property. It’s easy to imagine that two ontology designers might approach a domain from such orthogonal directions that they find it hard to define any conceptual overlap between entities in their ontologies. A much more complex language is required to allow the reconciliation of widely divergent models.

I understand that by focusing their attentions on a single domain they increase their chances of success, but what the world needs from an organization like the W3C is the kind of abstract thinking that gave rise to RDF, not another mapping markup language!


Here’s a nice picture of how LinqToRdf interacts with Virtuoso (thanks to Kingsley’s blog).

How LINQ uses LinqToRdf to talk to SPARQL stores

How LINQ uses LinqToRdf to talk to SPARQL stores

Announcing LinqToRdf v0.8

I’m very pleased to announce the release of version 0.8 of LinqToRdf. This release is significant for a couple of reasons. Firstly, because it provides a preview release of RdfMetal and secondly because it is the first release containing changes contributed by someone other than yours truly. The changes in this instance being provided by Carl Blakeley of OpenLink Software.

LinqToRdf v0.8 has received a few major chunks of work:

  • New installers for both the designer and the whole framework
    WIX was proving to be a pain, so I downgraded to the integrated installer generator in Visual Studio.
  • A preview release of RdfMetal. I brought this release forward a little, on Carl Blakeley’s request, to coincide with a post he’s preparing on using OpenLink Virtuoso with LinqToRdf, so RdfMetal is not as fully baked as I’d planned. But it’s still worth a look. Expect a minor release in the next few weeks with additional fixes/enhancements.

I’d like to extend a very big thank-you to Carl for the the work he’s done in recent weeks to help extend and improve the mechanisms LinqToRdf uses to represent and traverse relationships. His contributions also include improvements in representing default graphs, and referencing multiple ontologies within a single .NET class. He also provided fixes around the quoting of URIs and some other fixes in the ways LinqToRdf generates SPARQL for default graphs. Carl also provided an interesting example application using OpenLink Virtuoso’s hosted version of Musicbrainz that is significantly richer than the test ontology I created for the unit tests and manuals.

I hope that Carl’s contributions represent an acknowledgement by OpenLink that not only does LinqToRdf support Virtuoso, but that there is precious little else in the .NET space that stands a chance of attracting developers to the semantic web. .NET is a huge untapped market for semantic web product vendors. LinqToRdf is, right now, the best way to get into semantic web development on .NET.

Look out for blog posts from Carl in the next day or two, about using LinqToRdf with OpenLink Virtuoso.

Announcing LinqToRdf 0.3 and LinqToRdf Designer 0.3

The third release of LinqToRdf has been uploaded to GoogleCode. Go to the project web site for links to the latest release.

LinqToRdf Changes:
– support for SPARQL type casting
– numerous bug fixes
– better support for identity projections
– more SPARQL relational operators
– latest versions of SemWeb & SPARQL Engine, incorporating recent bug
fixes and enhancements of each of them

I have also released a new graphical designer to auto-generate C# entity models as well as N3 ontology specifications from UML-like designs. This new download is an extension to Visual Studio 2008 beta 2, and should make working with LinqToRdf easier for those who are not that familiar with the W3 Semantic Web specifications.

Please let me know how you get on with them.