programming

Master category for all IT or CompSci related content

Automata-Based Programming With Petri Nets – Part 1

Petri Nets are extremely powerful and expressive, but they are not as widely used in the software development community as deterministic state machines. That’s a pity – they allow us to solve problems beyond the reach of conventional state machines. This is the first in a mini-series on software development with Petri Nets. All of the code for a feature-complete Petri Net library is available online at GitHub. You’re welcome to take a copy, play with it and use it in your own projects. Code for this and subsequent articles can be found at http://github.com/aabs/PetriNets.


Less Intrusive Visitors

Forgive the recent silence – I’ve been in my shed.

Frequently, I need some variation on the Visitor or HierarchicalVisitor patterns
to analyse or transform an object graph. Recent work on a query builder
for an old-skool query API sent my thoughts once again to the Visitor pattern. I
normally hand roll these frameworks based on my experiences with recursive
descent compilers, but this time I thought I’d produce a more GoF-compliant
implementation.

The standard implementation of the Visitor pattern looks a lot like the first code example below. First you
define some sort of domain model (often following the composite pattern).
This first illustration doesn’t bother with a composite; I’ll show one later on, with an
accompanying HierarchicalVisitor implementation.

abstract class BaseElement {
  public abstract void Accept(IVisitor v);
}
class T1 : BaseElement {
  public override void Accept(IVisitor v) {
    v.Visit(this);
  }
}
class T2 : BaseElement {
  public override void Accept(IVisitor v) {
    v.Visit(this);
  }
}
class T3 : BaseElement {
  public override void Accept(IVisitor v) {
    v.Visit(this);
  }
}
interface IVisitor {
  void Visit(T1 t1);
  void Visit(T2 t2);
  void Visit(T3 t3);
}

Here’s an implementation of the visitor; normally you’d give default
implementations via an abstract base class. I’ll show how that’s done later.

class MyVisitor : IVisitor {
  public void Visit(T1 t1) {
    // do something here
  }
  public void Visit(T2 t2) {
    // do something here
  }
  public void Visit(T3 t3) {
    // do something here
  }
}

The Accept methods live on the domain model entities themselves. What if I have a
composite graph of objects that are not conveniently derived from some abstract
class or interface? What if I want to iterate or navigate
the structures in alternate ways? What if I don’t want to (or can’t) pollute
my domain model with visitation code?

I thought it might be cleaner to factor out the responsibility for the
dispatching into another class – a Dispatcher. I provide the Dispatcher from my
client code and am still able to visit each element in turn. Surprisingly, the
result is slightly cleaner than the standard implementation of the pattern,
sacrificing nothing, but gaining a small increment in applicability.

Let’s contrast this canonical implementation with one that uses anemic objects
for the domain model. First we need to define a little composite structure to
iterate over. This time, I’ll give abstract base classes for both the entities
and the visitors.

abstract class AbstractBase {
  public string Name {get;set;}
}
class Composite : AbstractBase {
  public string NonTerminalIdentifier { get; set; }
  public Composite(string nonTerminalIdentifier) {
    Name = "Composite";
    NonTerminalIdentifier = nonTerminalIdentifier;
  }
  public List<AbstractBase> SubParts = new List<AbstractBase>();
}
class Primitive1 : AbstractBase {
  public Primitive1() {
    Name = "Primitive1";
  }
}
class Primitive2 : AbstractBase {
  public Primitive2() {
    Name = "Primitive2";
  }
}

A composite class plus a couple of primitives. Next, let’s look at the visitor
interface.

interface IVisitor {
  void Visit(Primitive1 p1);
  void Visit(Primitive2 p2);
  bool StartVisit(Composite c);
  void EndVisit(Composite c);
}

According to the discussions at the Portland pattern repository, this could be
called the HierarchicalVisitor pattern, but I suspect most applications of
visitor are over hierarchical object graphs, and they mostly end up like this so
I won’t dwell on the name too much. True to form, it provides mechanisms to
visit each type of element allowed in our object graph. Next, the Dispatcher that
controls the navigation over the object graph. This is the departure from the
canonical model. A conventional implementation of visitor places this code in
the composite model itself, which seems unnecessary. Accept overloads are
provided for each type of the domain model.

class Dispatcher {
  public static void Accept<TV>(Primitive1 p1, TV visitor)
    where TV : IVisitor {
    visitor.Visit(p1);
  }
  public static void Accept<TV>(Primitive2 p2, TV visitor)
    where TV : IVisitor {
    visitor.Visit(p2);
  }
  public static void Accept<TV>(Composite c, TV visitor)
    where TV : IVisitor {
    if (visitor.StartVisit(c)) {
      foreach (var subpart in c.SubParts) {
        if (subpart is Primitive1) {
          Accept(subpart as Primitive1, visitor);
        }
        else if (subpart is Primitive2) {
          Accept(subpart as Primitive2, visitor);
        }
        else if (subpart is Composite) {
          Accept(subpart as Composite, visitor);
        }
      }
      visitor.EndVisit(c);
    }
  }
}

The dispatcher’s first parameter is the object graph element
itself. This provides the context that was implicit in the conventional
implementation. This is a trade-off: on the one hand you cannot access any
private state of the element inside the dispatch code; on the other hand you can
have multiple different dispatchers for different tasks. Another drawback of
an ‘external’ dispatcher is the need for old-fashioned type-dispatch switch
statements in the Composite acceptor. The Composite stores its sub-parts as
references to the AbstractBase class, so the dispatcher has to work out for
itself which Accept overload should handle each sub-part.

Implementing a visitor is much the same as in the conventional pattern.
A default implementation of the visit methods is given that
does nothing. To implement a HierarchicalVisitor, the
default StartVisit must return true to allow iteration of the
subparts of a Composite to proceed.

class BaseVisitor : IVisitor {
  public virtual void Visit(Primitive1 p1) { }
  public virtual void Visit(Primitive2 p2) { }
  public virtual bool StartVisit(Composite c) {
    return true;
  }
  public virtual void EndVisit(Composite c) { }
}

Here’s a Visitor that simply records the name of who gets visited.

class Visitor : BaseVisitor {
  public override void Visit(Primitive1 p1) {
    Debug.WriteLine("p1");
  }
  public override void Visit(Primitive2 p2) {
    Debug.WriteLine("p2");
  }
  public override bool StartVisit(Composite c) {
    Debug.WriteLine("->c");
    return true;
  }
  public override void EndVisit(Composite c) {
    Debug.WriteLine("<-c");
  }
}

Given an object graph of type Composite, it is simple to use this little framework.

Dispatcher.Accept(objGraph, new Visitor());
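
For instance, objGraph might be built from the composite classes above like this (the shape of the graph is purely illustrative):

var objGraph = new Composite("root");
objGraph.SubParts.Add(new Primitive1());
objGraph.SubParts.Add(new Primitive2());

var nested = new Composite("nested");
nested.SubParts.Add(new Primitive1());
objGraph.SubParts.Add(nested);

// Visiting objGraph with the Visitor above writes:
//   ->c  p1  p2  ->c  p1  <-c  <-c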

I like this way of working with visitors more than the conventional
implementation – it makes it possible to provide a good visitor implementation
over third-party frameworks (yes, I’m thinking of LINQ expression trees). It is no
more expensive to extend with new visitors, and it has the virtue that you can
navigate an object graph in any fashion you like.
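
To make the point about third-party types concrete, here is a rough sketch of the same external-dispatcher idea applied to System.Linq.Expressions trees. IExpressionVisitor and ExpressionDispatcher are names I’ve invented for illustration, and only a handful of node types are handled – it’s a sketch of the approach, not a complete implementation:

using System.Linq.Expressions;

interface IExpressionVisitor {
  void Visit(ConstantExpression c);
  void Visit(ParameterExpression p);
  bool StartVisit(BinaryExpression b);
  void EndVisit(BinaryExpression b);
}

class ExpressionDispatcher {
  public static void Accept<TV>(Expression e, TV visitor)
    where TV : IExpressionVisitor {
    if (e is ConstantExpression) {
      visitor.Visit(e as ConstantExpression);
    }
    else if (e is ParameterExpression) {
      visitor.Visit(e as ParameterExpression);
    }
    else if (e is BinaryExpression) {
      var b = e as BinaryExpression;
      // depth-first, just like the Composite dispatcher above
      if (visitor.StartVisit(b)) {
        Accept(b.Left, visitor);
        Accept(b.Right, visitor);
        visitor.EndVisit(b);
      }
    }
  }
}

Given an expression like x => x * 2 + 1, calling ExpressionDispatcher.Accept(expr.Body, myVisitor) walks the tree without the Expression classes knowing anything about visitation.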

Sequential script loading on demand

This little script uses the jQuery getScript command, enforcing sequential loading order to ensure script dependencies are honoured:

function LoadScriptsSequentially(scriptUrls, callback)
{
    if (typeof scriptUrls == 'undefined') throw "Argument Error: URL array is unusable";
    if (scriptUrls.length == 0) {
        // all scripts loaded - fire the completion callback and stop recursing
        if (typeof callback == 'function') callback();
        return;
    }
    $.getScript(scriptUrls.shift(), function() { LoadScriptsSequentially(scriptUrls, callback); });
}

Here’s how you use it:

function InitialiseQueryFramework(runTests)
{
    LoadScriptsSequentially([
        "/js/inheritance.js",
        "/js/expressions.js",
        "/js/refData.js",
        "/js/queryRenderer.js",
        "/js/sha1.js",
        "/js/queryCache.js",
        "/js/unit-tests.js"],
        function()
        {
            queryFramework = new QueryManager("#query");
            if (runTests) RunTestSuite();
        });
}

I love JavaScript now and can’t understand why I avoided it for years. I particularly love the hybrid fusion of functional and procedural paradigms that’s possible in JS. You can see that at work in the parameters being passed into the recursive call to LoadScriptsSequentially.

What do you think? Is there a better/nicer way to do this?

Can AOP help fix bad architectures?

I recently posted a question on Stack Overflow on the feasibility of using IL rewriting frameworks to rectify bad design after the fact. The confines of the answer comment area were too small to give the subject proper treatment, so I thought a new blog post was in order. Here’s the original question:

I’ve recently been playing around with PostSharp, and it brought to mind a problem I faced a few years back: A client’s developer had produced a web application, but they had not given a lot of thought to how they managed state information – storing it (don’t ask me why) statically on the Application instance in IIS. Needless to say the system didn’t scale and was deeply flawed and unstable. But it was a big and very complex system and thus the cost of redeveloping it was prohibitive. My brief at the time was to try to refactor the code-base to impose proper decoupling between the components.

At the time I tried using some kind of abstraction mechanism to allow me to intercept all calls to the static resource and redirect them to a component that would properly manage the state data. The problem was that there were about 1000 complex references to be redirected (and I didn’t have a lot of time to do it in). Manual coding (even with R#) proved to be just too time-consuming – we scrapped the code base and rewrote it properly. It took over a year to rewrite.

What I wonder now is – had I had access to an assembly rewriter and/or an aspect-oriented programming system (such as PostSharp), could I have easily automated the refactoring process of finding the direct references and converting them to interface references that could be redirected automatically and supplied by factories?

Has anyone out there used PostSharp or similar systems to rehabilitate pathological legacy systems? How successful were the projects? Did you find after the fact that the effort was worth it? Would you do it again?

I subsequently got into an interesting (though possibly irrelevant) discussion with Ira Baxter on program transformation systems, AOP and the kind of intelligence a program needs in order to refactor a system, preserving the business logic whilst rectifying any design flaws. There wasn’t space there to discuss the ideas properly, so I want to expand on them here.

The system I was referring to had a few major flaws:

  1. The user session information (of which there was a lot) was stored statically on a specific instance of IIS. This necessitated the use of sticky sessions to ensure the relevant info was around when user requests came along.
  2. Session information was lost every time IIS recycled the app pool, thus causing the users to lose call control (the app was phone-related).
  3. State information was glommed into a tight bolus of data that could not be teased apart, so refactoring the app was an all-or-nothing thing.

As you can guess, tight coupling/lack of abstraction and direct resource dispensation were key flaws that prevented the system from being able to implement fail-over, disaster recovery, scaling, extension and maintenance.

This product is in a very competitive market and needs to be able to innovate to stay ahead, so the client could ill afford to waste time rewriting code while others might be catching up. My question was directed in hindsight to the problem of whether one could retroactively fix the problems, without having to track down, analyse and rectify each tightly coupled reference between client code and state information and within the state information itself.

What I needed at the time was some kind of declarative mechanism whereby I could confer the following properties on a component:

  1. location independence
  2. intercepted object creation
  3. transactional persistence

Imagine that we could do it with a mechanism like PostSharp’s multicast attributes:

[assembly: DecoupledAndSerialized(    
    AspectPriority = -1,
    AttributeTargetAssemblies = "MyMainAssembly",
    AttributeTargetTypes = "MainAssembly.MyStateData",
    AttributeTargetMembers = "*")]

What would this thing have to do to be able to untie the knots that people get themselves into?

  1. It would have to be able to intercept every reference to MainAssembly.MyStateData and replace the interaction with one that got the instance from some class factory that could go off to a database or some distant server instead – that is, introduce an abstraction layer and some IoC framework.
  2. It must ensure that the component managing object persistence checked data into and out of the database appropriately (atomically), and that all contention over that data was properly managed.
  3. It must ensure that all session-specific data was properly retrieved and disposed of for each request.

This is not a pipe dream by any means – there are frameworks out there that are designed to place ‘shims’ between layers of a system, allowing the shim to be expanded out into a full-blown proxy that can communicate through some inter-machine protocol, or just devolve to plain old in-proc communication while in a dev environment. The question is, can you create an IL rewriter tool that is smart enough to work out how to insert the shims based on its own knowledge of good design principles? Could I load it up with a set of commandments graven in stone, like “never store user session data in HttpContext.Application”? If it found a violation of such a cardinal sin, could it insert a proxy that would redirect the flow away from the violated resource, exploiting some kind of resource allocation mechanism that wasn’t so unscalable?

From my own experience, these systems require you to be able to define service boundaries to allow the system to know what parts to make indirect and which parts to leave as-is. Juval Lowy made a strong case for the idea that every object should be treated as a service, and all we lack is a language that will seamlessly allow us to work with services as though they were objects. Imagine if we could do that, providing an ‘abstractor tool’ as part of the build cycle. While I think he made a strong case, my experience of WCF (which was the context of his assertions) is that it would be more onerous to configure the blasted thing than it would be to refactor all those references. But if I could just instruct my IL rewriter to do the heavy lifting, then I might be more likely to consider the idea.

Perhaps PostSharp has the wherewithal to address this problem without us having to resort to extremes or to refactor a single line? PostSharp has two incredible features that make it a plausible candidate for such a job. The first is the multicast attribute feature, which would allow me to designate a group of APIs declaratively as the service boundary of a component. The original code is never touched, but by using an OnMethodInvocationAspect you could intercept every call to the API, turning them into remote service invocations. The second part of the puzzle is compile-time instantiation of an aspect – in this system your aspects are instantiated at compile time, given an opportunity to perform some analysis and then to serialize the results of that analysis for runtime use when the aspect is invoked as part of a call-chain. The combination of the two allows you to perform an arbitrary amount of compile-time analysis prior to generating a runtime proxy object that could intercept those method calls necessary to enforce the rules designated by the multicast attribute. The aspect could perform reflective analysis and comparison against a rules base (just like FxCop), but with an added collection of refactorings it could take to alleviate the problem. The user might still have to provide hints about where to get and store data, or how the app was to be deployed, but given high-level information, perhaps the aspect could configure itself?
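
To give a flavour of what I mean – and this is only a sketch, not a working rehabilitation tool – an aspect along the lines of the hypothetical DecoupledAndSerialized attribute above might look something like this. IStateStore and its Invoke method are invented names standing in for whatever factory or service layer would actually supply the state:

[Serializable]
public class DecoupledAndSerializedAttribute : OnMethodInvocationAspect
{
    // Hypothetical abstraction standing in for "a database or some distant server".
    public static IStateStore Store { get; set; }

    public override void OnInvocation(MethodInvocationEventArgs eventArgs)
    {
        if (Store == null)
        {
            // Dev environment: fall back to plain old in-proc behaviour.
            eventArgs.Proceed();
            return;
        }
        // Otherwise forward the call across the service boundary instead of
        // letting it touch the statically held state directly.
        eventArgs.ReturnValue = Store.Invoke(eventArgs.Method.Name, eventArgs.GetArgumentArray());
    }
}

public interface IStateStore
{
    object Invoke(string operation, object[] args);
}

All the interesting work – the analysis and the decision about which calls to reroute – would still live in the compile-time half of the aspect; the runtime half would be little more than the redirection shown here.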

Now that would be a useful app – a definite must-have addition to any consultant’s toolkit.

Fowler’s Technical Debt Quadrant – Giving the co-ordinates where Agile is contraindicated.

Martin Fowler’s bliki has a very good post on what he calls the ‘technical debt quadrant’. This post succinctly sums up the difference between those who produce poor designs in ways that are contrary to their best interests, and those who do so knowingly and reluctantly.

In the past I’ve noted that many of those who profess to be agile are really just defaulters in the bank of technical debt. They confuse incurring inadvertent, reckless technical debt with being Agile. The YAGNI (You Ain’t Gonna Need It) maxim is an example of something that, when used prudently and knowingly, is a sensible (temporary) acceptance of short-term technical debt. In the hands of those who recklessly and inadvertently incur technical debt, it becomes a waiver for all sorts of abuses of good software development practice. It’s not uncommon in small cash-strapped start-ups to hear people invoke the YAGNI maxim as they contemplate key architectural issues, confusing cost-cutting with architectural parsimony. I think it’s safe to say that those who live in the dark waters of the bottom left of the quadrant need the formality of a more ‘regulated’ process to support them as they grope their way into the light.

Does formality have to mean expensive? I don’t think so. I think in most cases it’s a matter of perception – people assume that ‘formality’ implies back-breaking bureaucracy. It needn’t, it just means creating a framework that helps people to stop, take a breath and look around them. Without that, people often lose their way and drive headlong into crippling technical debt. I’ve seen large companies that consciously avoided using up-front design subsequently paying the cost of that debt for years afterwards in maintenance. One notable example would regularly burn through its OpEx budget within the first 6 months of each year – they had a product that was so badly designed, inconsistent and hard to maintain that they lost all forward momentum, threw away their monopoly and effectively lost their place in the market. All through lack of up-front design and architecture.

For all companies where stakeholders are either so stressed or so inexperienced that they opt for the lower left of Fowler’s quadrant, Agile is contraindicated.

PostSharp Laos – Beautiful AOP.

I’ve recently been using PostSharp 1.5 (Laos) to implement various features such as logging, tracing, API performance counter recording, and repeatability on the softphone app I’ve been developing. Previously, we’d either been using hand-rolled code generation systems to augment the APIs with IDisposable-style wrappers, or hand-coding the wrappers within the implementation code. The problem was that by the time we’d added all of the above, there were hundreds of lines of code to maintain around the few lines of code that actually provided a business benefit.

Several years ago, when I worked for Avanade, I worked on a very large scale project that used the Avanade Connected Architecture (ACA.NET) – a proprietary competitor for PostSharp. We found Aspect Oriented Programming (AOP) to be a great way to focus on the job at hand and reliably dot all the ‘i’s and cross all the ‘t’s in a single step at a later date.

ACA.NET, at that time, used a precursor of the P&P Configuration Application Block and performed a form of post-build step to create external wrappers that instantiated the aspect call chain prior to invoking your service method. That was a very neat step that allowed configurable specifications of applicable aspects. It allowed us to develop the code in a very naive in-proc way, and then augment the code with top-level exception handlers, transactionality etc. at the same time that we changed the physical deployment architecture. Since that time, I’ve felt the lack of such a tool, so it was a pleasure to finally acquaint myself with PostSharp.

I’d always been intending to introduce PostSharp here, but I’d just never had time to do it. Well, I finally found the time in recent weeks and was able to do that most gratifying thing – remove and simplify code, improve performance and code quality, reduce maintenance costs and increase the ease with which I can introduce new code policies, all in a single step. And all without even scratching the surface of what PostSharp is capable of.

Here’s a little example of the power of AOP using PostSharp, inspired by Elevate’s memoize extension method. We try to classify as many of our APIs as possible as Pure or Impure. Those that are impure get database locks, retry handlers etc. Those that are pure in a functional sense can be cached, or memoized. Those that are not pure in a functional sense are the ones that, while not saving any data, are still not one-to-one between arguments and result – sadly, that’s most of mine (it’s a distributed, event-driven app).

[Serializable]
public class PureAttribute : OnMethodInvocationAspect
{
    // Cache of previous results, keyed on a hash of the method and its arguments.
    Dictionary<int, object> PreviousResults = new Dictionary<int, object>();

    public override void OnInvocation(MethodInvocationEventArgs eventArgs)
    {
        int hashcode = GetArgumentArrayHashcode(eventArgs.Method, eventArgs.GetArgumentArray());
        if (PreviousResults.ContainsKey(hashcode))
            eventArgs.ReturnValue = PreviousResults[hashcode]; // cache hit: skip the real call
        else
        {
            eventArgs.Proceed(); // cache miss: invoke the real method and remember the result
            PreviousResults[hashcode] = eventArgs.ReturnValue;
        }
    }

    public int GetArgumentArrayHashcode(MethodInfo method, params object[] args)
    {
        StringBuilder result = new StringBuilder(method.GetHashCode().ToString());

        foreach (object item in args)
            result.Append(item);
        return result.ToString().GetHashCode();
    }
}
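
Applied directly, in ordinary attribute style, it looks like this (PricingService and CalculateTax are just made-up examples):

public class PricingService
{
    // Pure in the functional sense: the same inputs always yield the same result,
    // so the aspect can safely serve repeat calls from its cache.
    [Pure]
    public decimal CalculateTax(decimal net, decimal rate)
    {
        return net * rate;
    }
}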

I love what I achieved here, not least for the fact that it took me no more than about 20 lines of code to do it. But that’s not the real killer feature, for me. It’s the fact that PostSharp Laos has MulticastAttributes, which allow me to apply the advice to numerous methods in a single instruction – or to numerous classes, or even to every method of every class in an assembly. I can specify what to attach the aspects to by using regular expressions or wildcards. Here’s an example that applies an aspect to all public methods in class MyServiceClass.

[assembly: Pure(
    AspectPriority = 2,
    AttributeTargetAssemblies = "MyAssembly",
    AttributeTargetTypes = "UiFacade.MyServiceClass",
    AttributeTargetMemberAttributes = MulticastAttributes.Public,
    AttributeTargetMembers = "*")]

Here’s an example that uses a wildcard to NOT apply the aspect to those methods that end in “Impl”.

[assembly: Pure(
    AspectPriority = 2,
    AttributeTargetAssemblies = "MyAssembly",
    AttributeTargetTypes = "UiFacade.MyServiceClass",
    AttributeTargetMemberAttributes = MulticastAttributes.Public,
    AttributeExclude = true,
    AttributeTargetMembers = "*Impl")]
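
For completeness, the ‘usual suspects’ mentioned at the top of this post look much the same. A minimal tracing aspect, for example, is only a few lines (TraceAttribute is purely illustrative):

[Serializable]
public class TraceAttribute : OnMethodInvocationAspect
{
    public override void OnInvocation(MethodInvocationEventArgs eventArgs)
    {
        Debug.WriteLine("Entering " + eventArgs.Method.Name);
        try
        {
            eventArgs.Proceed();
        }
        finally
        {
            Debug.WriteLine("Leaving " + eventArgs.Method.Name);
        }
    }
}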

Do you use AOP? What aspects do you use, other than the usual suspects above?

Does this seem nice to you?

After years of recoiling at the sight of code like this, am I supposed now to embrace it in a spirit of reconciliation?

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            dynamic blah = GetTheBlah();
            Console.WriteLine(blah);
        }

        private static dynamic GetTheBlah()
        {
            if (DateTime.Now.Millisecond % 3 == 0)
                return 0;
            else
                return "hello world!";
        }
    }
}

I need to wash my hands.

Relational Modeling? Not as we know it!

Marcello Cantos commented on my recent post about the ways in which RDF can transcend the object-oriented model. He posed the question of what things RDF can represent more easily than the relational model. I know Marcello is a very high calibre software engineer, so it’s not just an idle question from a relational dinosaur, but a serious question from someone who can push the envelope far with a relational database.

Since an ontology is most frequently defined (in compsci) as a specification of a conceptualization, a relational model is a kind of ontology. That means a relational model is by definition a knowledge representation system. That’d be my answer if I just wanted to sidestep the real thrust of his question: is the relational model adequate to do what can be done by RDF?

That’s a more interesting question, and I’d be inclined to say that everything I said in my previous post about the shortcomings of object-oriented programming languages applies equally to the relational model. But let’s take another look at the design features of RDF that make it useful for representation of ‘knowledge’.

○ URI based
○ Triple format
○ Extensible
○ Layered
○ Class based
○ Meta-model

URI Based

By using URIs as a token of identification and definition, and by making identifications and definitions readable, interchangeable and reusable, the designers of RDF exposed the conceptualisation of the ontology to the world at large. Could you imagine defining a customer in your database as ‘everything in XYZ company’s CRM’s definition of a customer, plus a few special fields of our own’? It is not practical. Perhaps you might want to say ‘everything in their database, less some fields that we’re not interested in’. Again – not possible. Relational models are not as flexible as the concepts that they need to represent. That is also the real reason why interchange formats never caught on – they were just not able to adapt to the ways that people needed to use them. RDF is designed from the outset to be malleable.

Triple Format

At their foundation, all representations make statements about the structure or characteristics of things. All statements must have the form (subject, predicate, object), or must be transformable into that form. The relational model strictly defines the set of triples that can be expressed about a thing. For example, imagine a table ‘Star’ that has some fields:

Star (
	StarId INT,
	CommonName nvarchar(256),
	Magnitude decimal NOT NULL,
	RA decimal NOT NULL,
	DEC decimal NOT NULL,
	Distance decimal NOT NULL,
	SpectralType nvarchar(64)
	)

Now if we had a row

(123, 'Deneb', 1.25, 300.8, 45.2, 440, 'A2la')

That would be equivalent to a set of triples represented in N3 like this:

[]
  StarId 123;
  CommonName "Deneb";
  Magnitude 1.25^^xsd:decimal;
  RA 300.8^^xsd:decimal;
  DEC 45.2^^xsd:decimal;
  Distance 440^^xsd:decimal;
  SpectralType "A2la" .

Clearly there’s a great deal of overlap between these two systems and the one is convertible into the other. But what happens when we launch a new space probe capable of measuring some new feature of the star that was never measurable before? Or what happens when we realise that to plot our star very far into the future we need to store radial velocity, proper motion and absolute magnitude. We don’t have fields for that, and there’s no way in the database to add them without extensive modifications to the database.

RDF triple stores (or runtime models or files, for that matter) have no particular dependence on the data conforming to a prescribed format. More importantly, class membership and instance-hood are decoupled, so that a ‘thing’ can exist without automatically being in a class. In OO languages you MUST have a type, just as in RDBMSs a row MUST come from some table. We can define an instance that has all of the properties defined in table ‘Star’, plus a few others gained from the Hipparcos catalog and a few more gleaned from the Tycho-1 catalog. It does not break the model nor invalidate the ‘Star’ class-hood to have this extra information; it just happens that we know more about Deneb in our database than about some other stars.

This independent, extensible, free-form, standards-based language is capable of accommodating any knowledge that you can gather about a thing. If you add meta-data about the thing then more deductions can be made about it, but its absence doesn’t stop you from adding or using the data in queries.

Extensible, Layered, Class Based with Meta-model

Being extensible, in the case of RDF, means a few things. It means that RDF supports OO-style multiple inheritance relationships. See my previous post to see that this is the tip of the iceberg for RDF class membership. That post went into more detail about how class membership is not based on some immutable Type property that once assigned can never be removed. Instead, it can be based on more or less flexible criteria.

Extensibility in RDF also means providing a way to make complex statements about the modelling language itself. For example, once the structure of triples (plus the URIs that can appear in subjects, predicates or objects) is defined in the base RDF language, RDF has a way to define more complex relationships. The language was extended with RDF Schema, which in turn was extended with several layers in OWL, which will in turn be extended by yet more abstract layers.

Is there a mechanism for self-reference in SQL? I can’t think of a way of defining one structure in a DB in terms of the structure of another. There’s no way that I can think of to be explicit about the nature of the relationship between two entities. Is there a way for you to state in your relational model facts like this:

{?s CommonName ?c.} => {?s Magnitude ?m. ?m lessThan 6.}

i.e. if it has a common name then it must be visible to the naked eye. I guess you’d do that with a relational view so that you could query whether the view ‘nakedEyeStars’ contains star 123. Of course CommonName could apply to botanical entities (plants) as well as to stars, but I imagine you’d struggle to create a view that merged data from the plant table and the star table.

So, in conclusion, there are plenty of ways in which RDF specifically addresses the problems it was designed for – data interchange, standards definition, KR, mashups – in a distributed, web-wide way. RDBMSs address the problems faced by programmers at the coal face in the 60s and 70s – efficient, standardized, platform-independent data storage and retrieval. The imperative that created a need for RDBMSs in the 60s is not going away, so I doubt databases will be going away any time soon either. In fact they can be exposed to the world as triples without too much trouble. The problem is that developers need more than just data storage and retrieval. They need intelligent data storage and retrieval.

New Resources for LinqToRdf

John Mueller recently sent through a link to a series of articles on working with RDF. As well as being a useful introduction to the subject, they use LinqToRdf for code examples.

They provide information on hosting RDF files as well as querying them using LinqToRdf. They show how easy it is to get semantic web applications up and running on .NET in no time at all. Please read the articles and share the links around.

John also told me about his new book LINQ for Dummies, which has a section on LinqToRdf. I’ve not had a chance to read it yet. I would welcome any feedback, which I’ll pass through to John. I understand that the content is broadly similar to the articles on DevSource.com, placing more emphasis on LINQ than RDF.  Again, please take a look and let me know what you think.

Not another mapping markup language!

Kingsley Idehen has again graciously given LinqToRdf some much needed link-love. He mentioned it in a post that was primarily concerned with the issues of mapping between the ontology, relational and object domains. His assertion is that LinqToRdf, being an offshoot of an ORM-related initiative, is reversing the natural order of mappings. He believes that in the world of ORM systems, the emphasis should be on mapping from the relational to the object domain.

I think that he has a point, but not for the reason he’s putting forward. I think that the natural direction of mapping stems from the relative richness of the domains being mapped. The impedance mismatch between the relational and object domains stems from (1) the implicitness of meaning in the relationships of relational systems, (2) the representation of relationships and (3) type mismatches.

If the object domain has greater expressiveness and explicit meaning in its relationships, it has a ‘larger’ language than can be expressed using relational databases. Relationships are still representable, but their meaning is implicit. For that reason you would have to confine your mappings to those that can be represented in the target (relational) domain. In that sense you get a priority inversion that forces the lowest-common-denominator language to control what gets mapped.

The same form of inversion occurs between the ontological and object domains, only this time it is the object domain that is the lowest common denominator. OWL is able to represent such things as restriction classes and multiple inheritance and sub-properties that are hard or impossible to represent in languages like C# or Java. When I heard of the RDF2RDB working group at the W3C, I suggested (to thunderous silence) that they direct their attentions to coming up with a general purpose mapping ontology that could be used for performing any kind of mapping.

I felt that it would have been extremely valuable to have a standard language for defining mappings. Just off the top of my head I can think of the following places where it would be useful:

  1. Object/Relational Mapping Systems (O/R or ORM)
  2. Ontology/Object Mappings (such as in LinqToRdf)
  3. Mashups (merging disparate data sources)
  4. Ontology Reconciliation – finding intersects between two sets of concepts
  5. Data cleansing
  6. General purpose data access layer automation
  7. Data export systems
  8. Synchronization Systems (i.e. keeping systems like CRM and AD in sync)
  9. mapping objects/tables onto UIs
  10. etc

You can see that most of these are perennial real-world problems that programmers are ALWAYS having to contend with. Having a standard language (and API?) would really help with all of these cases.

I think such an ontology would be a nice addition to OWL or RDF Schema, allowing a much richer definition of equivalence between classes (or groups or parts of classes). Right now one can define a one-to-one relationship using the owl:equivalentClass property. It’s easy to imagine that two ontology designers might approach a domain from such orthogonal directions that they find it hard to define any conceptual overlap between entities in their ontologies. A much more complex language is required to allow the reconciliation of widely divergent models.

I understand that by focusing their attentions on a single domain they increase their chances of success, but what the world needs from an organization like the W3C is the kind of abstract thinking that gave rise to RDF, not another mapping markup language!


Here’s a nice picture of how LinqToRdf interacts with Virtuoso (thanks to Kingsley’s blog).

How LINQ uses LinqToRdf to talk to SPARQL stores