Semantic Web

Quantum Reasoners Hold Key to Future Web

Last year, a company called DWave Systems announced their quantum computer (the ‘Orion’) – another milestone on the road to practical quantum computing. Their controversial claims seem worthy of attention in their own right, but they are particularly important to the semantic web (SW) community. The significance for the SW community is that their quantum computer was said to solve problems akin to those targeted by Grover’s algorithm, which speeds up queries of disorderly databases.

Semantic web databases are not (completely) disorderly and there are many ways to optimize the search for matching triples to a graph pattern. What strikes me is that the larger the triple store, the more compelling the case for using some kind of quantum search algorithm to find matches. DWave are currently trialing 128-qubit processors, and they claim their systems can scale, so I (as a layman) can see no reason why such computers couldn’t be used to help improve the performance of queries in massive triple stores.
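To put a rough number on "more compelling as the store grows", here is a small back-of-the-envelope sketch. It assumes an idealized quantum computer that can actually run Grover’s algorithm over an unstructured collection with a single matching item, which is an assumption on my part and not something established for DWave’s hardware. The function names are made up for illustration.

```python
import math


def classical_expected_lookups(n: int) -> float:
    """Average number of probes a linear scan needs to find one marked item among n."""
    return (n + 1) / 2


def grover_iterations(n: int) -> int:
    """Approximately optimal number of Grover iterations for one marked item among n.

    Uses the standard estimate floor(pi/4 * sqrt(n)); this is an idealized count
    of oracle calls, not a measurement of any real machine.
    """
    return math.floor(math.pi / 4 * math.sqrt(n))


# Compare the two counts for increasingly large (hypothetical) triple stores.
for n in (10**6, 10**9, 10**12):
    print(n, classical_expected_lookups(n), grover_iterations(n))
```

The gap widens with the square root of the store size, which is why the argument only starts to bite for very large, poorly indexed stores.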

 What I wonder is:

  1. what kind of indexing schemes can be used to impose structure on the triples in a store?
  2. how can one adapt a B-tree to index each element of a triple rather than just a single primary key – three indexes seems extravagant.
  3. are there quantum algorithms that can beat the best of these schemes?
  4. is there a place for quantum superposition in a graph matching algorithm (to simultaneously find matching triples, then cancel out any that don’t match all the basic graph patterns)?
  5. if DWave’s machines could solve NP-Complete problems, does that mean that we would then just use OWL-Full?
  6. would the speed-ups then be great enough to consider linking everyday app data to large-scale upper ontologies?
  7. is a contradiction in a ‘quantum reasoner’ (i.e. a reasoner that uses a quantum search engine) something that can never occur because it just cancels out and never appears in the returned triples? Would any returned conclusion be necessarily true (relative to the axioms of the ontology?)
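For questions 1 and 2, one classical answer is to keep several permutations of each triple so that any pattern with bound terms can start from an index instead of a scan. The following is a minimal in-memory sketch of that idea using three nested dictionaries (SPO, POS, OSP) rather than B-trees. The class name, method names and sample data are all hypothetical, and it says nothing about how any particular triple store is implemented.

```python
from collections import defaultdict


class TripleStore:
    """Toy triple store with three permutation indexes: SPO, POS and OSP."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        # Every triple is written three times, once per index.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        if s is not None and p is not None:
            for obj in self.spo.get(s, {}).get(p, ()):
                if o is None or obj == o:
                    yield (s, p, obj)
        elif p is not None and o is not None:
            for subj in self.pos.get(p, {}).get(o, ()):
                yield (subj, p, o)
        elif o is not None and s is not None:
            for pred in self.osp.get(o, {}).get(s, ()):
                yield (s, pred, o)
        elif s is not None:
            for pred, objs in self.spo.get(s, {}).items():
                for obj in objs:
                    yield (s, pred, obj)
        elif p is not None:
            for obj, subjs in self.pos.get(p, {}).items():
                for subj in subjs:
                    yield (subj, p, obj)
        elif o is not None:
            for subj, preds in self.osp.get(o, {}).items():
                for pred in preds:
                    yield (subj, pred, o)
        else:
            # Nothing bound: fall back to a full scan.
            for subj, preds in self.spo.items():
                for pred, objs in preds.items():
                    for obj in objs:
                        yield (subj, pred, obj)


store = TripleStore()
store.add(":Deneb", "rdf:type", ":Star")
store.add(":Deneb", ":CommonName", "Deneb")
store.add(":Vega", "rdf:type", ":Star")

stars = sorted(s for s, _, _ in store.match(p="rdf:type", o=":Star"))
print(stars)
```

The cost is the one the question worries about: three copies of every triple. The benefit is that only the fully unbound pattern needs a scan.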

Any thoughts?

UPDATE
DWave are now working with Google to help them improve some of their machine learning algorithms. I wonder whether there will be other research into the practicality of using DWave quantum computing systems in conjunction with inference engines? This could, of course, open up whole new vistas of services that could be provided by Google (or their competitors). Either way, it gives me a warm feeling to know that every time I do a search, I’m getting the results from a quantum computer (no matter how indirectly). Nice.

Semantic Overflow Highlights I

Semantic Overflow has been active for a couple of weeks. We now have 155 users and 53 questions. We’ve already had some very interesting questions and some excellent detailed and thoughtful responses. I thought, on Egon’s instigation, to bring together, from the site’s BI stats, some of the highlights of last week.

The best loved question this week came from Jerven Bolleman who wanted to know whether there was a “Simple CLI useable OWL Reasoner“. The most popular answer (and highest voted answer) came from Ivan Herman who provided tool suggestions, guidance and insights into the current research directions.

The most viewed question was from Akshat Shrivastava who asked “How do I make my homepage/personal blog semantic ?“. This question garnered 5 very good answers, including a particularly good one from Bill Roberts, who provided a lot of detail on his own experiences of doing just that.

The highest voted answer was from Egon Willighagen in answer to a question by ‘dusoft‘ as to “Why launching SemanticOverflow with too little users with zero reputation sucks?”. He very helpfully explained how to bootstrap your reputation and make a success of the site. I’m glad to say that dusoft’s was the only negative comment so far, and that the general response of other site users has been very positive. Mike McClintock even told me that “SemanticOverflow is my new favorite thing“. Thanks, Mike, I hope it stays that way!

In terms of gathering reputation, Ian Davis of Talis is the clear front runner with the following questions:

He also provided 17 very good answers. Thanks, Ian, and thanks to all the others who are already making Semantic Overflow a great site!

www.SemanticOverflow.com – the Web 2.0 Q&A site for all things Web 3.0.

www.SemanticOverflow.com is a new site based on the hugely popular StackOverflow.com, devoted to Q&A on anything related to the semantic web. The site is very new (created today) and I’m trying to get as many people to visit as I can, so please come and post your questions and together we’ll create a thriving community dedicated just to the semantic web. It’s free to join, free to ask questions, and most important of all – free to see the answers. You don’t even have to sign up to view, ask or answer questions. If you do join, all you get is kudos.
StackOverflow has been an overnight sensation because it combines many of the traditional features of wikis, newsgroups and social media platforms with a reputation system that promotes community involvement and high quality discussion. I’m sure that as the Semantic Web starts to go exponential, Semantic Overflow will be a useful forum for more than just technical questions. I hope you’ll use it for speculation about the directions of the field itself. Unlike StackOverflow, non-technical discussions won’t be moderated out, provided they are on-topic. I think that in such an interesting and emerging space, discussion and dissemination of knowledge is critical. If you agree, please come and join us for a question or two.
Please also tell your friends, colleagues, relatives, acquaintances, pets and neighbors and ask them all to visit the site. If you have any suggestions about other places I could promote the site in – feel free to provide an answer here.

Relational Modeling? Not as we know it!

Marcello Cantos commented on my recent post about the ways in which RDF can transcend the object-oriented model. He posed the question of what things RDF can represent more easily than the relational model. I know Marcello is a very high calibre software engineer, so it’s not just an idle question from a relational dinosaur, but a serious question from someone who can push the envelope far with a relational database.

Since an ontology is most frequently defined (in compsci) as a specification of a conceptualization, a relational model is a kind of ontology. That means a relational model is by definition a knowledge representation system. That’d be my answer if I just wanted to sidestep the real thrust of his question: is the relational model adequate to do what can be done by RDF?

That’s a more interesting question, and I’d be inclined to say everything I said in my previous post about the shortcomings of object oriented programming languages applies equally to the relational model. But let’s take another look at the design features of RDF that make it useful for representation of ‘knowledge’.

○ URI based
○ Triple format
○ Extensible
○ Layered
○ Class based
○ Meta-model

URI Based

By using URIs as tokens of identification and definition, and by making identifications and definitions readable, interchangeable and reusable, the designers of RDF exposed the conceptualisation of the ontology to the world at large. Could you imagine defining a customer in your database as ‘everything in XYZ company’s CRM’s definition of a customer, plus a few special fields of our own’? It is not practical. Perhaps you might want to say, everything in their database less some fields that we’re not interested in. Again – not possible. Relational models are not as flexible as the concepts that they need to represent. That is also the real reason why interchange formats never caught on – they were just not able to adapt to the ways that people needed to use them. RDF is designed from the outset to be malleable.

Triple Format

At their foundation, all representations make statements about the structure or characteristics of things. All statements must have the form <subject, predicate, object> (or can be transformed into that format). The relational model strictly defines the set of triples that can be expressed about a thing. For example, imagine a table ‘Star’ that has some fields:

CREATE TABLE Star (
	StarId INT,
	CommonName nvarchar(256),
	Magnitude decimal NOT NULL,
	RA decimal NOT NULL,
	DEC decimal NOT NULL,
	Distance decimal NOT NULL,
	SpectralType nvarchar(64)
	);

Now if we had a row

(123, 'Deneb', 1.25, 300.8, 45.2, 440, 'A2Ia')

That would be equivalent to a set of triples represented in N3 like this:

[]
  :StarId 123;
  :CommonName "Deneb";
  :Magnitude "1.25"^^xsd:decimal;
  :RA "300.8"^^xsd:decimal;
  :DEC "45.2"^^xsd:decimal;
  :Distance "440"^^xsd:decimal;
  :SpectralType "A2Ia" .

Clearly there’s a great deal of overlap between these two systems and the one is convertible into the other. But what happens when we launch a new space probe capable of measuring some new feature of the star that was never measurable before? Or what happens when we realise that to plot our star very far into the future we need to store radial velocity, proper motion and absolute magnitude? We don’t have fields for that, and there’s no way to add them without extensive modifications to the database.

RDF triple stores (or runtime models or files for that matter) have no particular dependence on the data conforming to a prescribed format. More importantly, class membership and instance-hood are more decoupled so that a ‘thing’ can exist without automatically being in a class. In OO languages you MUST have a type, just as in RDBMSs, a row MUST come from some table. We can define an instance that has all of the properties defined in table ‘Star’ plus a few others gained from the Hipparcos catalog and a few more gleaned from the Tycho-1 catalog. It does not break the model nor invalidate the ‘Star’ class-hood to have this extra information, it just happens that we know more about Deneb in our database than some other stars.

This independent, extensible, free-form, standards-based language is capable of accommodating any knowledge that you can gather about a thing. If you add meta-data about the thing then more deductions can be made about it, but its absence doesn’t stop you from adding or using the data in queries.
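As a rough illustration of that open-endedness, here is a sketch that treats a graph as nothing more than a set of (subject, predicate, object) tuples. The identifiers, the RadialVelocity predicate and the numeric values are illustrative placeholders of mine, not catalog data.

```python
# A graph is just a set of triples; there is no table definition to alter.
triples = {
    ("star:123", "CommonName", "Deneb"),
    ("star:123", "Magnitude", 1.25),
    ("star:124", "Magnitude", 7.9),  # a second, unnamed star (made-up value)
}


def describe(subject, graph):
    """Collect whatever properties happen to be known about a subject."""
    props = {}
    for s, p, o in graph:
        if s == subject:
            props.setdefault(p, []).append(o)
    return props


# A new kind of measurement arrives: simply add a triple (placeholder value).
triples.add(("star:123", "RadialVelocity", -4.5))

print(describe("star:123", triples))
print(describe("star:124", triples))
```

The second star has no RadialVelocity at all, and queries over it keep working; the relational equivalent would need a new nullable column or a new table.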

Extensible, Layered, Class Based with Meta-model

Being extensible, in the case of RDF, means a few things. It means that RDF supports OO-style multiple inheritance relationships. See my previous post to see that this is the tip of the iceberg for RDF class membership. That post went into more detail about how class membership was not based on some immutable Type property that once assigned can never be removed. Instead, it can be based on more or less flexible criteria.

Extensibility in RDF also means providing a way to make complex statements about the modelling language itself. For example once the structure of triples is defined (plus URIs that can be in subjects, predicates or objects) in the base RDF language, then RDF has a way to define complex relationships. The language was extended with RDF Schema which in turn was extended with several layers in OWL, which will in turn be extended by yet more abstract layers.

Is there a mechanism for self reference in SQL? I can’t think of a way of defining one structure in a DB in terms of the structure of another. Nor can I think of a way of being explicit about the nature of the relationship between two entities. Is there a way for you to state in your relational model facts like this:

{ ?s :CommonName ?c . } => { ?s :Magnitude ?m . ?m math:lessThan 6 . }

i.e. if it has a common name then it must be visible to the naked eye. I guess you’d do that with a relational view so that you could query whether the view ‘nakedEyeStars’ contains star 123. Of course CommonName could apply to botanical entities (plants) as well as to stars, but I imagine you’d struggle to create a view that merged data from the plant table and the star table.
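To make the contrast concrete, here is a small sketch of the same idea over a flat list of triples. It reads the rule as "anything with a common name should be brighter than roughly magnitude 6" (smaller magnitudes are brighter). The subjects, the faint star ‘Faintie’ and the threshold constant are hypothetical and exist only to exercise the check.

```python
NAKED_EYE_LIMIT = 6.0  # assumed approximate naked-eye limit

graph = [
    ("star:123", "CommonName", "Deneb"),
    ("star:123", "Magnitude", 1.25),
    ("star:999", "CommonName", "Faintie"),  # made-up star that breaks the rule
    ("star:999", "Magnitude", 9.0),
    ("plant:7", "CommonName", "Foxglove"),  # same predicate, different kind of thing
]


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def named_things(graph):
    """Everything with a CommonName, whatever kind of thing it is."""
    return sorted({s for s, p, _ in graph if p == "CommonName"})


def rule_violations(graph):
    """Subjects with a CommonName whose known Magnitude is not below the limit."""
    bad = []
    for s in named_things(graph):
        for m in objects(graph, s, "Magnitude"):
            if not m < NAKED_EYE_LIMIT:
                bad.append(s)
    return bad


print(named_things(graph))
print(rule_violations(graph))
```

Note that named_things merges stars and plants without any join or union of tables, because the predicate is shared rather than owned by a table.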

So, in conclusion, there are plenty of ways that RDF specifically addresses the problems it seeks to address – data interchange, standards definition, KR, mashups – in a distributed web-wide way. RDBMSs address the problems faced by programmers at the coal face in the 60s and 70s – efficient, standardized, platform-independent data storage and retrieval. The imperative that created a need for RDBMSs in the 60s is not going away, so I doubt databases will be going away any time soon either. In fact they can be exposed to the world as triples without too much trouble. The problem is that developers need more than just data storage and retrieval. They need intelligent data storage and retrieval.

Object Orientation? Not as we know it.

I thought I’d start with a lyric:

That one’s my mother and
That one’s my father and
The one in the hat, that’s me.

You could be forgiven for wondering what Ani Difranco has to do with this blog’s usual themes, but rest assured, I won’t stray too far. My theme today is the limitations of the object oriented paradigm that I alluded to in my post about mapping ontologies. I mentioned in my previous post that RDF Schema and OWL were more expressive than the likes of C# and C++ in terms of the relationships they could express. This time, I’ll take the opportunity to show you what I mean by trying to emulate the English language as used by Ani Difranco.

I vividly remember the light bulb that came on over my head when I first learnt object orientation. There was a period there where I (and most of my cohorts, I should add) wandered around viewing everyday things and interactions through the prism of message passing and attributes.  It was all hopelessly nerdy, but it underlines for me the rightness of the object oriented paradigm that so much of what I saw around me fitted into the new way of looking at things.

The glamour with which object orientation bewitched us blinded us to all those things that object orientation was not good at. Clearly, the Functional and Logic Programming paradigms represent computation in ways that surpass the imperative paradigm, but in some ways they are only either complementary or inferior.  Object Orientation not only incorporates imperative programming, but also knowledge representation. The resurgence of functional and logic programming demonstrates that OO does not have all the answers in respect of computation. With the advances of RDF Schema and OWL, object orientation is now clearly lacking in the knowledge representation department.

Considering the genesis of C++ out of C, you see a language that adds (among other things) the notion of inheritance to the User Defined Data Type (structs) that were already present in C.  Inheritance in C++ allows composition of structures to build up new structures (Closure) having all of the properties of the parents, plus all the properties particular to the child. Language designers have kept the time-honoured ‘record‘ approach despite the fact that a record really doesn’t do justice to how we manipulate classes or sets in our head. And that’s where Ani Difranco comes in.

That one’s my mother and
that one’s my father

This is a simple example of identification. Ani is referring to a depiction of people in a photograph. She then states (implicitly) that each of them is a real person and that one of them is related to her by the ‘isMotherOf’ relationship and that the other is related via the ‘isFatherOf’ relationship. That’s the bread and butter of object orientation. Just instantiate a couple of instances of the Person class and store them in the Mother and Father properties of Me.

The one in the hat, that’s me.

This next line is a little more difficult to handle in a language like C#. What it does is identify an instance by virtue of its properties (wearing a hat). I could retrieve instances based on the value of properties using LINQ: 

var q = (from entity in collection where entity.hat != null select entity).Single();

But, let’s think about it for a moment. In this case it’s OK to define an instance like that, since she is referring to a specific instance of type “Person” in the song. But what if she’d been referring to a type of Person? The C# Type of the collection is defined elsewhere by a class (in the programming sense) specifying the properties that define the class (in the philosophical sense). What if I need to define the class of those entities that wear hats? We’re getting into a kind of impedance mismatch between the world of object orientation and the world of philosophy.

Set definitions in mathematics are infinitely more malleable than those of the object oriented world. A ‘thing’ can exist in multiple classes at the same time. For example, the integer ‘5’ is in the set ‘Odd Numbers’ as well as the sets ‘Prime Numbers’ and ‘Numbers less than ten’. Its identity is not dependent on the sets or classes to which it belongs. That is how class definitions work in the world of the semantic web, but not in the world of object orientation.
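A quick sketch of that point, with each set defined by a membership test rather than by a declared type. The dictionary and function names are my own and purely illustrative.

```python
def is_odd(n):
    return n % 2 == 1


def is_prime(n):
    # Trial division is enough for a small illustration.
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def less_than_ten(n):
    return n < 10


# Each "class" is just a name attached to a membership test.
CLASSES = {
    "OddNumbers": is_odd,
    "PrimeNumbers": is_prime,
    "NumbersLessThanTen": less_than_ten,
}


def classes_of(n):
    """Names of every set that n belongs to; n itself carries no type tag."""
    return sorted(name for name, test in CLASSES.items() if test(n))


print(classes_of(5))
print(classes_of(9))
```

Adding a fourth set later changes nothing about the integer 5; it simply turns out to be a member or not.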

Back in the world of object orientation, I might derive a class from “Person” called “PersonWearingHat”, but I’d quickly run into problems since a Person wears hats optionally – that is, a person is not defined by the set membership of PersonWearingHat-ness. I could instead provide a property “hat” to be populated with a value or not. If not, then it would not be retrieved by our query above. So far, I’ve managed to define a C# collection, but it is not a C# Type, and therefore cannot be used by the compiler or runtime for type checking and validation. Clearly, we need something either more dynamic or static and implicit but more sophisticated.

From a philosophical or mathematical standpoint, there is little difference between a class and a set. We normally treat them as synonymous. We define each using a language describing what is in the class/set and what is not. That’s not so different from how I define the collection of people wearing hats above. It’s the richness of THAT class definition language which varies between OO and OWL. In OO, you can only define a class as something that can or must have a given set of properties. In OWL you can define a class in the same way, or by saying that the class is those entities with such and such properties and with values for the properties defined in an expression. In fact you can use expressions to define classes using a special idiom that I’ll demonstrate shortly.

One requirement of the RDF framework is that it provides an open ended model for describing the things you know about an entity. By open ended, I mean limitless and unconstrained. If you define a C# class in terms of the properties that it has, then you effectively limit that class to have only those properties. Clearly that won’t work for a system that wants to provide a distributed representation of knowledge about entities. But what does it mean to be a member of a class that is not defined in terms of attributes or methods?

Being a class in RDF Schema simply means defining a triple of the form

<instance> rdf:type <class URI> .

Or defining it with predicate rdfs:subClassOf with an object that is a class (again we find closure at work). In other words you are either a class because you are of type class or because you are derived from a thing that is of type class. That’s a very simple way to represent classhood. You still define properties of a class in much the same way as in object oriented languages.
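The closure mentioned here can be sketched in a few lines: an instance belongs to its asserted classes and to everything reachable from them through rdfs:subClassOf. The class names and the two dictionaries below are hypothetical stand-ins for triples in a store, not part of any RDF library.

```python
def superclasses(cls, sub_class_of):
    """All classes reachable from cls via rdfs:subClassOf, followed transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def types_of(instance, rdf_type, sub_class_of):
    """Asserted rdf:type values plus everything implied by subclassing."""
    result = set()
    for cls in rdf_type.get(instance, ()):
        result.add(cls)
        result |= superclasses(cls, sub_class_of)
    return result


# Hypothetical data standing in for rdfs:subClassOf and rdf:type triples.
sub_class_of = {":Supergiant": {":Star"}, ":Star": {":CelestialBody"}}
rdf_type = {":Deneb": {":Supergiant"}}

print(sorted(types_of(":Deneb", rdf_type, sub_class_of)))
```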

 OWL provides a restriction mechanism for complex type definitions using the contents of properties. Here’s an example made using the N3 format of RDF.

:V8Car
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (
            :Car
            [ a owl:Restriction ;
              owl:onProperty :cylinders ;
              owl:hasValue 8 ]
        )
    ] .

I’ve defined the class V8Car as anything of type Car that also has 8 cylinders. This is impossible in a mainstream record-oriented language. Once the type is assigned it cannot be unassigned. In this example an entity of type V6Car could have its engine replaced with one that had 8 cylinders and would immediately become an instance of the V8Car class while leaving class V6Car.
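A loose emulation of that behaviour, assuming class definitions are just predicates evaluated over an entity’s current property values. This is not how an OWL reasoner works internally; the dictionary keys and class definitions are invented for the example.

```python
# Each class is defined by a condition on property values, not by a stored tag.
CLASS_DEFINITIONS = {
    "V6Car": lambda e: e.get("kind") == "Car" and e.get("cylinders") == 6,
    "V8Car": lambda e: e.get("kind") == "Car" and e.get("cylinders") == 8,
}


def classify(entity):
    """Return the names of all classes whose definition the entity satisfies now."""
    return sorted(name for name, test in CLASS_DEFINITIONS.items() if test(entity))


car = {"kind": "Car", "cylinders": 6}
before = classify(car)

car["cylinders"] = 8  # engine swap
after = classify(car)

print(before, after)
```

Nothing was reassigned or re-instantiated; the membership simply follows the data.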

That’s just one example of class specifications that are not available to us in object-oriented languages. Remember Figure and Ground by MC Escher? Or Necker cubes? Your mind can flip flop between seeing the image in one way or the other. In OWL, you can define classes in that way too. To paraphrase the lyric at the top: the one whose head is not bare, that’s me. Here’s an example of using a negative class definition.

:Elsewhere
    rdfs:subClassOf :Place ;
    rdfs:subClassOf [
        owl:complementOf [
            a owl:Restriction ;
            owl:onProperty :isLocationOf ;
            owl:hasValue :Me
        ]
    ] .

Elsewhere, for me, is anything where I am not. Class membership for everything else may change as I move around. Tell me of any programming language that can do that! There are plenty of other examples that I can bring forth from the OWL and RDF Schema specs, but I imagine you get the point now. Type membership is entirely static in the OO world, and it needn’t be. The question is, how could you implement a programming language based on such protean class definitions? Obviously the old C record approach will not do. I’ll save that for a future discussion, since it’s getting late.
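As a very small taste of what such a language feature might feel like, here is a sketch in which a class is an object wrapping a membership test, with complement and intersection as operations on classes. Everything here (DynamicClass, the place names, the world dictionary) is hypothetical, and it sidesteps the hard parts such as static checking.

```python
class DynamicClass:
    """A class defined by a membership predicate evaluated at query time."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate

    def __contains__(self, thing):
        return bool(self.predicate(thing))

    def complement(self, name):
        # Everything that is NOT in this class.
        return DynamicClass(name, lambda t: not self.predicate(t))

    def intersect(self, other, name):
        # Everything that is in both classes.
        return DynamicClass(name, lambda t: self.predicate(t) and other.predicate(t))


PLACES = {"London", "Paris", "Sydney"}
world = {"me_at": "London"}  # mutable state: where I currently am

Place = DynamicClass("Place", lambda t: t in PLACES)
WhereIAm = DynamicClass("WhereIAm", lambda t: t == world["me_at"])
Elsewhere = Place.intersect(WhereIAm.complement("NotWhereIAm"), "Elsewhere")

print("Paris" in Elsewhere, "London" in Elsewhere)
world["me_at"] = "Paris"  # I move, and membership changes for both cities
print("Paris" in Elsewhere, "London" in Elsewhere)
```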

New Resources for LinqToRdf

John Mueller recently sent through a link to a series of articles on working with RDF. As well as being a useful introduction to working with RDF, they use LinqToRdf for code examples.

They provide information on hosting RDF files as well as querying them using LinqToRdf. They show how easy it is to get semantic web applications up and running on .NET in no time at all. Please read the articles and share the links around.

John also told me about his new book LINQ for Dummies, which has a section on LinqToRdf. I’ve not had a chance to read it yet. I would welcome any feedback, which I’ll pass through to John. I understand that the content is broadly similar to the articles on DevSource.com, placing more emphasis on LINQ than RDF.  Again, please take a look and let me know what you think.

Not another mapping markup language!

Kingsley Idehen has again graciously given LinqToRdf some much needed link-love. He mentioned it in a post that was primarily concerned with the issues of mapping between the ontology, relational and object domains. His assertion is that LinqToRdf, being an offshoot of an ORM related initiative, is reversing the natural order of mappings. He believes that in the world of ORM systems, the emphasis should be in mapping from the relational to the object domain.

I think that he has a point, but not for the reason he’s putting forward. I think that the natural direction of mapping stems from the relative richness of the domains being mapped. The impedance mismatch between the relational and object domains stems from (1) the implicitness of meaning in the relationships of relational systems and (2) the representation of relationships and (3) type mismatches.

If the object domain has great expressiveness and explicit meaning in relationships it has a ‘larger’ language than that expressible using relational databases. Relationships are still representable, but their meaning is implicit. For that reason you would have to confine your mappings to those that can be represented in the target (relational) domain. In that sense you get a priority inversion that forces the lowest common denominator language to control what gets mapped.

The same form of inversion occurs between the ontological and object domains, only this time it is the object domain that is the lowest common denominator. OWL is able to represent such things as restriction classes and multiple inheritance and sub-properties that are hard or impossible to represent in languages like C# or Java. When I heard of the RDB2RDF working group at the W3C, I suggested (to thunderous silence) that they direct their attentions to coming up with a general purpose mapping ontology that could be used for performing any kind of mapping.

I felt that it would have been extremely valuable to have a standard language for defining mappings. Just off the top of my head I can think of the following places where it would be useful:

  1. Object/Relational Mapping Systems (O/R or ORM)
  2. Ontology/Object Mappings (such as in LinqToRdf)
  3. Mashups (merging disparate data sources)
  4. Ontology Reconciliation – finding intersects between two sets of concepts
  5. Data cleansing
  6. General purpose data access layer automation
  7. Data export systems
  8. Synchronization Systems (i.e. keeping systems like CRM and AD in sync)
  9. Mapping objects/tables onto UIs
  10. etc

You can see that most of these are perennial real-world problems that programmers are ALWAYS having to contend with. Having a standard language (and API?) would really help with all of these cases.

I think such an ontology would be a nice addition to OWL or RDF Schema, allowing a much richer definition of equivalence between classes (or groups or parts of classes). Right now one can define a one-to-one relationship using the owl:equivalentClass property. It’s easy to imagine that two ontology designers might approach a domain from such orthogonal directions that they find it hard to define any conceptual overlap between entities in their ontologies. A much more complex language is required to allow the reconciliation of widely divergent models.

I understand that by focusing their attentions on a single domain they increase their chances of success, but what the world needs from an organization like the W3C is the kind of abstract thinking that gave rise to RDF, not another mapping markup language!


Here’s a nice picture of how LinqToRdf interacts with Virtuoso (thanks to Kingsley’s blog).

How LINQ uses LinqToRdf to talk to SPARQL stores


Semantic Development Environments

The semantic web is a GOOD THING by definition – anything that enables us to create smarter software without also having to create Byzantine application software must be a step in the right direction. The problem is – many people have trouble translating the generic term “smarter” into a concrete idea of what they would have to do to achieve that palladian dream. I think a few concrete ideas might help to firm up people’s understanding of how the semantic web can help to deliver smarter products.

Software Development as knowledge based activity

In this post I thought it might be nice to share a few ideas I had about how OWL and SWRL could help to produce smarter software development environments. If you want to use the ideas to make money, feel free to do so, just consider them as released under the Creative Commons Attribution license. Software development is the quintessential knowledge based activity. In the process of producing a modern application a typical developer will burn through knowledge at a colossal rate. Frequently, we will not reserve headspace for a lot of the knowledge we acquire to solve a task. Frequently, we bring together the ideas, facts, standards, API skills and problem requirements needed to solve a problem then just as quickly forget it all. The unique combination is never likely to arise again.

I’m sure we could make a few comments about how it’s more important to know where the information is than to know what it is – a fact driven home to me by my Computer Science lecturer John English, who seemed to be able to remember the contents page of every copy of the Proceedings of the ACM back to the ’60s. You might also be forgiven for thinking this wasn’t true, given the current obsession with certifications. We could also comment about how some information is more lasting than others, but my point is that every project these days seems to combine a mixture of ephemera, timeless principles and those bits that lie somewhere between the two (called ‘Best Practice’ in current parlance ;).

Requires cognitive assistance

Software development, then, is a knowledge intensive activity that brings together a variety of structured and unstructured information to allow the developer to produce a system that they endeavor to show is equivalent to a set of requirements, guidelines, nuggets of wisdom and cultural mores that are defined or mandated at the beginning of the project. Doesn’t this sound to you like exactly the environment for which the semantic web technology stack was designed?

Incidentally, the following applications don’t have much to do with the web, so perhaps they demonstrate that the term ‘Web 3.0’ is limiting and misleading. It’s the synergy of the complementary standards in the semantic web stack that makes it possible to deliver smarter products and to boost your viability in an increasingly competitive market place.

Documentation

OK, so the extended disclaimer/apology is now out of the way and I can start to talk about how the semantic web could offer help to improve the lives of developers. The first place I’ll look is at documentation. There are many types of documentation that are used in software development. In fact, there is a different form of documentation defined for each specific stage of the software lifecycle from conception of an idea through to its realization in code (and beyond). Each of these forms of documentation is more or less formally structured with different kinds of information related to documents and other deliverables that came before and after. This kind of documentation is frequently ambiguous, verbose and often gets written for the sake of compliance and then gets filed away and never sees the light of day again. Documentation for software projects needs to be precise, terse, rich and most of all useful.

Suggestion 1.

Use ontologies (perhaps standardised by the OMG) for the production of requirements. Automated tools could be used to convert these ontologies into human-readable reports or tools could be used to answer questions about specific requirements. A reasoner might be able to deduce conflicts or contradictions from a set of requirements. It might also be able to offer suggestions about implementations that have been shown to fulfill similar requirements in other projects. Clearly, the sky’s the limit in how useful an ontology, reasoner and rules language could be. It should also help documentation to be much more precise and less verbose. There is also scope for documentation reuse, specialization and for there to be diagramming and code generation driven off of documentation.

Documentation is used heavily inside the source code used by developers to write software too. It serves to provide an explanation for the purpose of a software component, to explain how to use it, to provide change notes, to generate API documentation web-sites, and to even store to-do list items or apologies for later reference. In .NET and Java, and now many other programming languages, it is common to use formal languages (like XML markup) to provide commonly used information. An ontology might be helpful in providing a rich and extensible language for representing code documentation. The use of URIs to represent unique entities means that the documentation can be the subject of other documents and can reach out to the wider ecology of data about the system.

Suggestion 2.

Provide an extensible ontology to allow the linkage of code documentation with the rest of the documentation produced for a software system. Since all parts of the software documentation process (being documented in RDF) will have unique URIs, it should be easy to link the documentation for a component to the requirements, specifications, plans, elaborations, discussions, blog posts and other miscellanea generated. Providing semantic web URIs to individual code elements helps to integrate the code itself into other semantic systems like change management and issue tracking systems. Use of URIs and ontologies within source code helps to provide a firm, rich linkage between source code and the documentation that gave rise to it.
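A toy sketch of what such linkage might look like once everything has a URI: a handful of triples connecting a code element to a requirement, an issue and a blog post, plus a function that follows the links outward. All of the URIs and predicate names (doc:fulfils, doc:discussedIn, doc:trackedBy) are invented for illustration and do not belong to any existing ontology.

```python
# Hypothetical links between a code element and its surrounding documentation.
links = [
    ("code:OrderService.Submit", "doc:fulfils", "req:42"),
    ("req:42", "doc:discussedIn", "blog:2008/checkout"),
    ("code:OrderService.Submit", "doc:trackedBy", "issue:17"),
]


def related(uri, graph, max_hops=2):
    """Follow outgoing links from uri and record how many hops away each resource is."""
    found = {}
    frontier = [uri]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for s, _, o in graph:
            if s in frontier and o not in found and o != uri:
                found[o] = hop
                next_frontier.append(o)
        frontier = next_frontier
    return found


print(related("code:OrderService.Submit", links))
```

An IDE could run a query of this shape when you hover over a symbol, which is the kind of lookup the next suggestion builds on.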

Suggestion 3.

Richer, extensible markup that represents the meaning and wider documentation environment of the code means that traditional IntelliSense can be augmented with browsers that provide access to all other pertinent documentation related to a piece of code. Imagine hovering over an object reference and getting links not only to a web site generated from the code commentary but to all the requirements that the code fulfills, to automated proofs demonstrating that the code matches the requirements, to blog posts written by the dev team and to MP3s taken during the brainstorming and design sessions during which this component was conceived.

It doesn’t take much imagination to see that some simple enhancements like these can provide a ramp for the continued integration of the IDE, allowing smoother cooperation between teams and their stakeholders. Making documentation more useful to all involved would probably increase the chances that people would give up Agile in favour of something less like the emperor’s clothes.

Suggestion 4.

Here are some other suggestions about how documentation in the IDE could be enriched.
  • Guidelines on where devs should focus their attention when learning a new API
  • A SPARQL endpoint could be exposed by the code publisher
      • This could provide a means to publish documentation online
  • Automatic publishing of DOAP documents to an enterprise or online registry, making software registries possible.
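
To give a feel for what automatic DOAP publishing might emit, here is a small Python sketch that renders a project description as Turtle using the DOAP vocabulary. The doap_turtle function and the example.org project URI are assumptions made for this sketch, and the string escaping only covers simple single-line values; how the document reaches a registry is left out entirely.

```python
def doap_turtle(project_uri, name, homepage, language):
    """Render a minimal DOAP description of a project as Turtle."""
    def lit(text):
        # Escape backslashes and double quotes for a Turtle string literal.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return "\n".join([
        "@prefix doap: <http://usefulinc.com/ns/doap#> .",
        "",
        f"<{project_uri}> a doap:Project ;",
        f"    doap:name {lit(name)} ;",
        f"    doap:homepage <{homepage}> ;",
        f"    doap:programming-language {lit(language)} .",
    ])


print(doap_turtle(
    "http://example.org/projects/linqtordf",  # hypothetical project URI
    "LinqToRdf",
    "http://code.google.com/p/linqtordf",
    "C#",
))
```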

Dynamic Systems

Augmenting the source code of a system with URIs that can be referenced from anywhere opens the semantic artifacts inside an application to analysis and reference from outside. Companies like Microsoft have already described their visions for the production of documentation systems that allow architects to describe how a system hangs together. This information can be used by other systems to deploy, monitor, control and scale systems in production environments.

I think that their vision barely glimpses what could be achieved through the use of automated inference systems, rich structured machine readable design documentation, and systems that are for the first time white boxes. I think that DSI-style declarative architecture documents are a good example of what might be achieved through the use of smart documentation. There is more though.

Suggestion 5.

Reflection and other analysis tools can gather information about the structure, inter-relationships and external dependencies of a software system. Such data can be fed to an inference engine to allow it to make comparisons with the runtime behavior of a production system. Rules of inference can help it to determine what the consequences would be of violating a rule derived from the architect's or developer's documentation. Perhaps it could detect when the system is misconfigured or configured in a way that will force it to struggle under load. Perhaps it can find explanations for errors and failures. Rich documentation systems should allow developers to indicate deployment guidelines (e.g. this component is thread safe, or is location independent and scalable). Such documentation can be used to predict failure modes, to direct testing regimes and to predict optimal deployment patterns for specific load profiles.
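
The idea of checking a running system against its documented deployment guidelines can be sketched in a few lines of Python. This is only hand-written rule checking, not a real inference engine, and the Component and Deployment types, their fields and the two rules are all hypothetical, chosen to mirror the "thread safe" and "scalable" guidelines mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    thread_safe: bool    # declared by the developer in the documentation
    max_instances: int   # declared scaling limit


@dataclass(frozen=True)
class Deployment:
    component: str
    worker_threads: int  # observed in the running system
    instances: int       # observed in the running system


def violations(components, deployments):
    """Compare each observed deployment with the documented guidelines
    and return a readable message for every rule that is broken."""
    declared = {c.name: c for c in components}
    problems = []
    for d in deployments:
        c = declared.get(d.component)
        if c is None:
            problems.append(f"{d.component}: no documentation found")
            continue
        if d.worker_threads > 1 and not c.thread_safe:
            problems.append(
                f"{d.component}: not thread safe but runs {d.worker_threads} threads")
        if d.instances > c.max_instances:
            problems.append(
                f"{d.component}: {d.instances} instances exceeds documented limit of {c.max_instances}")
    return problems
```

A reasoner working over an ontology could derive such rules from the documentation itself rather than having them coded by hand, which is where the real payoff would lie.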

Conclusions

I wrote this post because I know I’ll never have time to pursue these ideas, but I would dearly love to see them come to pass. Why don’t you get a copy of LinqToRdf, crack open a copy of Coco/R and see whether you can implement some of these suggestions? And if you find a way to get rich doing it, then please remember me in your will.

Wanted: Volunteers for .NET semantic web framework project

 LinqToRdf* is a full-featured LINQ** query provider for .NET written in C#. It provides developers with an intuitive way to make queries on semantic web databases. The project has been going for over a year and it’s starting to be noticed by semantic web early adopters and semantic web product vendors***. LINQ provides a standardised query language and a platform enabling any developer to understand systems using semantic web technologies via LinqToRdf. It will help those who don’t have the time to ascend the semantic web learning curve to become productive quickly.

The project’s progress and momentum need to be sustained to help it become the standard API for semantic web development on the .NET platform. For that reason I’m appealing for volunteers to help with the development, testing, documentation and promotion of the project.

Please don’t be concerned that all the best parts of the project are done. Far from it! It’s more like the foundations are in place, and now the system can be used as a platform to add new features. There are many cool things that you could take on. Here are just a few:

Reverse engineering tool
This tool will use SPARQL to interrogate a remote store to get metadata to build an entity model.
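
To make the idea concrete, here is a hedged Python sketch of the kind of metadata query such a tool might send. It is not how any existing LinqToRdf tool works: the query shape assumes the remote store describes its schema with owl:Class and rdfs:domain, and the metadata_request_url helper and the example.org endpoint are invented for illustration. The only protocol detail relied on is that a SPARQL endpoint accepts the query text in a `query` parameter of a GET request.

```python
from urllib.parse import urlencode

# Ask the store which classes it declares and which properties
# name each class as their domain.
METADATA_QUERY = """\
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?class ?property
WHERE {
  ?class rdf:type owl:Class .
  OPTIONAL { ?property rdfs:domain ?class . }
}
ORDER BY ?class ?property
"""


def metadata_request_url(endpoint):
    """Build a SPARQL protocol GET URL that carries the metadata query."""
    return endpoint + "?" + urlencode({"query": METADATA_QUERY})


# Hypothetical endpoint; nothing is sent over the network here.
print(metadata_request_url("http://example.org/sparql"))
```

Each class in the result set would map naturally to a generated .NET class, and each property to a member of it.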

Tutorials and Documentation
The documentation desperately needs the work of a skilled technical writer. I’ve worked hard to make LinqToRdf an easy tool to work with, but the semantic web is not a simple field. If it were, there’d be no need for LinqToRdf after all. This task will require an understanding of the LINQ, ASP.NET, C#, SPARQL, RDF, Turtle, and SemWeb.NET systems. It won’t be a walk in the park.

 

Supporting SQL Server
The SemWeb.NET API has recently added support for SQL Server, which has not yet been exploited inside LinqToRdf (although it may be easy to do). This task would also involve thinking about robust, scalable architectures for semantic web applications in the .NET space.

Porting LinqToRdf to Mono
LINQ and C# 3.0 support in Mono is now mature enough to make this a desirable prospect. Nobody’s had the courage yet to tackle it. Clearly, this would massively extend the reach of LinqToRdf, and it would be helped by the fact that some of the underlying components are developed for Mono by default.

 

SPARQL Update (SPARUL) Support
LinqToRdf provides round-tripping only for locally stored RDF. Support of SPARQL Update would allow data round-tripping on remote stores. This is not a fully ratified standard, but it’s only a matter of time.
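
For a sense of what a provider would have to generate, here is a minimal Python sketch that renders a batch of changes as a single INSERT DATA request, the form used by SPARQL Update. The insert_data helper is hypothetical and deliberately limited: it only handles URI subjects and predicates with simple single-line string objects.

```python
def insert_data(triples):
    """Render (subject_uri, predicate_uri, literal) tuples as one
    INSERT DATA request. Objects are treated as plain string literals."""
    def lit(text):
        # Escape backslashes and double quotes; newlines are not handled.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [f"  <{s}> <{p}> {lit(o)} ." for s, p, o in triples]
    return "INSERT DATA {\n" + "\n".join(lines) + "\n}"


print(insert_data([
    # Hypothetical subject URI; dc:title is the Dublin Core title property.
    ("http://example.org/track/1",
     "http://purl.org/dc/elements/1.1/title",
     "Example Track"),
]))
```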

 

Demonstrators using large scale web endpoints
There are now quite a few large-scale systems on the web with SPARQL endpoints. It would be a good demonstration of LinqToRdf to be able to mine them for useful data.

These are just some of the things that need to be done on the project. I’ve been hoping to tackle them all for some time, but there’s just too much for one man to do alone. If you have some free time and you want to learn more about LINQ or the Semantic Web, there is no better project on the web for you to join. If you’re interested, reply to this message, letting me know how you could contribute, or what you want to tackle. Alternatively, join the LinqToRdf discussion group and reply to this message there.

Thanks,

Andrew Matthews

* http://code.google.com/p/linqtordf

** http://msdn.microsoft.com/en-us/netframework/aa904594.aspx

*** http://virtuoso.openlinksw.com/Whitepapers/html/linqtordf/linqtordf1.htm

Announcing LinqToRdf v0.8

I’m very pleased to announce the release of version 0.8 of LinqToRdf. This release is significant for a couple of reasons: firstly, because it provides a preview release of RdfMetal, and secondly, because it is the first release containing changes contributed by someone other than yours truly. The changes in this instance were provided by Carl Blakeley of OpenLink Software.

LinqToRdf v0.8 has received a few major chunks of work:

  • New installers for both the designer and the whole framework.
    WiX was proving to be a pain, so I downgraded to the integrated installer generator in Visual Studio.
  • A preview release of RdfMetal. I brought this release forward a little, at Carl Blakeley’s request, to coincide with a post he’s preparing on using OpenLink Virtuoso with LinqToRdf, so RdfMetal is not as fully baked as I’d planned. But it’s still worth a look. Expect a minor release in the next few weeks with additional fixes/enhancements.

I’d like to extend a very big thank-you to Carl for the work he’s done in recent weeks to help extend and improve the mechanisms LinqToRdf uses to represent and traverse relationships. His contributions also include improvements in representing default graphs, and referencing multiple ontologies within a single .NET class. He also provided fixes around the quoting of URIs and some other fixes in the ways LinqToRdf generates SPARQL for default graphs. Carl also provided an interesting example application using OpenLink Virtuoso’s hosted version of Musicbrainz that is significantly richer than the test ontology I created for the unit tests and manuals.

I hope that Carl’s contributions represent an acknowledgement by OpenLink that not only does LinqToRdf support Virtuoso, but that there is precious little else in the .NET space that stands a chance of attracting developers to the semantic web. .NET is a huge untapped market for semantic web product vendors. LinqToRdf is, right now, the best way to get into semantic web development on .NET.

Look out for blog posts from Carl in the next day or two, about using LinqToRdf with OpenLink Virtuoso.