South America? Innocent victims in the war on terror!

Are you kidding me? "Tectonic causes for South American earthquakes" was a total sham! Think about it! Everyone knows that there's a war on. And have you noticed that the CIA has started to act very strangely? They obviously don't want this story getting out. I mean, what would happen if people began asking "Why are they all near the USA?" Well, they may be able to fool the sheeple, but the members of the USGS aren't swallowing their story. Look, don't take it from me; General Colin Powell is convinced as well. But we have to act fast, because Yellowstone will vaporize the USA. I just wanted you to be aware of this, in case I disappear.


Less Intrusive Visitors

Forgive the recent silence – I’ve been in my shed.

Frequently, I need some variation on the Visitor or HierarchicalVisitor patterns
to analyse or transform an object graph. Recent work on a query builder
for an old-skool query API sent my thoughts once again to the Visitor pattern. I
normally hand roll these frameworks based on my experiences with recursive
descent compilers, but this time I thought I’d produce a more GoF-compliant
implementation.

The standard implementation of the Visitor pattern looks a lot like the code below. First you
define some sort of domain model (often following the composite pattern).
This illustration doesn't bother with a composite; I'll show one later on, with an
accompanying HierarchicalVisitor implementation.

abstract class BaseElement {
  public abstract void Accept(IVisitor v);
}
class T1 : BaseElement {
  public override void Accept(IVisitor v) {
    v.Visit(this); // double dispatch: overload resolution picks Visit(T1)
  }
}
class T2 : BaseElement {
  public override void Accept(IVisitor v) {
    v.Visit(this);
  }
}
class T3 : BaseElement {
  public override void Accept(IVisitor v) {
    v.Visit(this);
  }
}
interface IVisitor {
  void Visit(T1 t1);
  void Visit(T2 t2);
  void Visit(T3 t3);
}

Here's an implementation of the visitor. Normally you'd give default
implementations via an abstract base class; I'll show how that's done later.

class MyVisitor : IVisitor {
  public void Visit(T1 t1) {
    // do something here
  }
  public void Visit(T2 t2) {
    // do something here
  }
  public void Visit(T3 t3) {
    // do something here
  }
}

The Accept methods sit on the domain model entities themselves. What if I have a
composite graph of objects that don't conveniently derive from some abstract
class or interface? What if I want to iterate or navigate the structures in
alternate ways? What if I don't want to (or can't) pollute my domain model with
visitation code?

I thought it might be cleaner to factor the responsibility for dispatching out
into another class – a Dispatcher. I provide the Dispatcher from my client code
and am still able to visit each element in turn. Surprisingly, the result is
slightly cleaner than the standard implementation of the pattern, sacrificing
nothing but gaining a small increment in applicability.

Let's contrast this canonical implementation with one that uses anemic objects
for the domain model. First we need to define a little composite structure to
iterate over. This time, I'll give abstract base classes for both the entities
and the visitors, and show a proper composite as well.

// assumes: using System.Collections.Generic;
abstract class AbstractBase {
  public string Name { get; set; }
}
class Composite : AbstractBase {
  public string NonTerminalIdentifier { get; set; }
  public Composite(string nonTerminalIdentifier) {
    Name = "Composite";
    NonTerminalIdentifier = nonTerminalIdentifier;
  }
  public List<AbstractBase> SubParts = new List<AbstractBase>();
}
class Primitive1 : AbstractBase {
  public Primitive1() {
    Name = "Primitive1";
  }
}
class Primitive2 : AbstractBase {
  public Primitive2() {
    Name = "Primitive2";
  }
}

A composite class plus a couple of primitives. Next, let's look at the visitor
interface.

interface IVisitor {
  void Visit(Primitive1 p1);
  void Visit(Primitive2 p2);
  bool StartVisit(Composite c);
  void EndVisit(Composite c);
}

According to the discussions at the Portland pattern repository, this could be
called the HierarchicalVisitor pattern; but I suspect most applications of
visitor are over hierarchical object graphs, and they mostly end up like this,
so I won't dwell on the name too much. True to form, it provides mechanisms to
visit each type of element allowed in our object graph. Next comes the Dispatcher
that controls the navigation over the object graph. This is the departure from
the canonical model: a conventional implementation of visitor places this code in
the composite model itself, which seems unnecessary. Accept overloads are
provided for each type in the domain model.

class Dispatcher {
  public static void Accept<TV>(Primitive1 p1, TV visitor)
    where TV : IVisitor {
    visitor.Visit(p1);
  }
  public static void Accept<TV>(Primitive2 p2, TV visitor)
    where TV : IVisitor {
    visitor.Visit(p2);
  }
  public static void Accept<TV>(Composite c, TV visitor)
    where TV : IVisitor {
    if (visitor.StartVisit(c)) {
      foreach (var subpart in c.SubParts) {
        if (subpart is Primitive1) {
          Accept(subpart as Primitive1, visitor);
        }
        else if (subpart is Primitive2) {
          Accept(subpart as Primitive2, visitor);
        }
        else if (subpart is Composite) {
          Accept(subpart as Composite, visitor);
        }
      }
      visitor.EndVisit(c);
    }
  }
}

The dispatcher's first parameter is the object graph element itself. This
provides the context that was implicit in the conventional implementation. This
is a trade-off: on the one hand you cannot access any private object state
inside the dispatch code; on the other hand you can have multiple different
dispatchers for different tasks. Another drawback of an 'external' dispatcher is
the need for old-fashioned type-switch statements in the Composite acceptor. The
Composite stores its sub-parts as references to the AbstractBase class, so the
dispatcher must decide manually which Accept overload should handle each
sub-part.
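
As an aside, if you can use C# 4's 'dynamic' keyword, the type-switch can be dispensed with entirely by letting the runtime binder pick the overload. This is a sketch of mine, not part of the original design:

class DynamicDispatcher {
  public static void Accept<TV>(Primitive1 p1, TV visitor)
    where TV : IVisitor {
    visitor.Visit(p1);
  }
  public static void Accept<TV>(Primitive2 p2, TV visitor)
    where TV : IVisitor {
    visitor.Visit(p2);
  }
  public static void Accept<TV>(Composite c, TV visitor)
    where TV : IVisitor {
    if (visitor.StartVisit(c)) {
      foreach (var subpart in c.SubParts) {
        // Casting to 'dynamic' defers overload resolution to runtime,
        // so the correct Accept overload is chosen per element type.
        Accept((dynamic)subpart, visitor);
      }
      visitor.EndVisit(c);
    }
  }
}

The trade-off is a small per-call binding cost and the loss of compile-time checking of the dispatch.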

The implementation of a visitor is much the same as in the conventional
pattern. A default implementation of the Visit functions is given that
does nothing. To implement a HierarchicalVisitor, the
default StartVisit must return true to allow iteration of the
subparts of a Composite to proceed.

class BaseVisitor : IVisitor {
  public virtual void Visit(Primitive1 p1) { }
  public virtual void Visit(Primitive2 p2) { }
  public virtual bool StartVisit(Composite c) {
    return true;
  }
  public virtual void EndVisit(Composite c) { }
}

Here’s a Visitor that simply records the name of who gets visited.

class Visitor : BaseVisitor {
  public override void Visit(Primitive1 p1) {
    Debug.WriteLine("p1");
  }
  public override void Visit(Primitive2 p2) {
    Debug.WriteLine("p2");
  }
  public override bool StartVisit(Composite c) {
    Debug.WriteLine("->c");
    return true;
  }
  public override void EndVisit(Composite c) {
    Debug.WriteLine("<-c");
  }
}

Given an object graph of type Composite, it is simple to use this little framework.

Dispatcher.Accept(objGraph, new Visitor());
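
To make that concrete, here's a quick sketch of my own (the identifiers are made up) that assembles a small graph and walks it:

var objGraph = new Composite("root");
objGraph.SubParts.Add(new Primitive1());
objGraph.SubParts.Add(new Primitive2());

var inner = new Composite("inner");
inner.SubParts.Add(new Primitive1());
objGraph.SubParts.Add(inner);

// Depth-first traversal; the Debug output is:
// ->c  p1  p2  ->c  p1  <-c  <-c
Dispatcher.Accept(objGraph, new Visitor());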

I like this way of working with visitors more than the conventional
implementation – it makes it possible to provide a good visitor implementation
over third-party frameworks (yes, I'm thinking of LINQ expression trees). It is
no more expensive to extend with new visitors, and it has the virtue that you
can navigate an object graph in any fashion you like.

Sequential script loading on demand

This little script uses the jQuery getScript command, enforcing sequential loading order to ensure script dependencies are honoured:

function LoadScriptsSequentially(scriptUrls, callback)
{
    if (typeof scriptUrls == 'undefined') throw "Argument Error: URL array is unusable";
    if (scriptUrls.length == 0)
    {
        // all scripts loaded - fire the callback and stop recursing
        if (typeof callback == 'function') callback();
        return;
    }
    $.getScript(scriptUrls.shift(), function() { LoadScriptsSequentially(scriptUrls, callback); });
}

Here’s how you use it:

function InitialiseQueryFramework(runTests)
{
    LoadScriptsSequentially([
        "/js/inheritance.js",
        "/js/expressions.js",
        "/js/refData.js",
        "/js/queryRenderer.js",
        "/js/sha1.js",
        "/js/queryCache.js",
        "/js/unit-tests.js"],
        function()
        {
            queryFramework = new QueryManager("#query");
            if (runTests) RunTestSuite();
        });
}

I love JavaScript now and can't understand why I avoided it for years. I particularly love the hybrid fusion of functional and procedural paradigms that is possible in JS. You can see that at work in the parameters being passed into the recursive call to LoadScriptsSequentially.

What do you think? Is there a better/nicer way to do this?

Quantum Reasoners Hold Key to Future Web

Last year, a company called DWave Systems announced their quantum computer (the 'Orion') – another milestone on the road to practical quantum computing. Their controversial claims seem noteworthy in their own right, but they are particularly important to the semantic web (SW) community: their quantum computer solved problems akin to those addressed by Grover's algorithm, which speeds up searches of disorderly databases.

Semantic web databases are not (completely) disorderly, and there are many ways to optimize the search for triples matching a graph pattern. What strikes me is that the larger the triple store, the more compelling the case for using some kind of quantum search algorithm to find matches. DWave are currently trialing 128-qubit processors, and they claim their systems can scale, so I (as a layman) can see no reason why such computers couldn't be used to help improve the performance of queries in massive triple stores.
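
To put rough numbers on the scaling argument (my own back-of-the-envelope arithmetic, not DWave's): Grover's algorithm searches N unstructured entries in O(√N) oracle queries, where a classical scan needs O(N). For a billion-triple store:

T_classical(N) = O(N)   →  up to 1,000,000,000 probes for N = 10^9
T_Grover(N)    = O(√N)  →  roughly 32,000 iterations for N = 10^9

That gap only widens as stores grow, which is the heart of the case above.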

What I wonder is:

  1. What kind of indexing schemes can be used to impose structure on the triples in a store?
  2. How can one adapt a B-tree to index each element of a triple rather than just a single primary key? Three indexes seem extravagant.
  3. Are there quantum algorithms that can beat the best of these schemes?
  4. Is there a place for quantum superposition in a graph-matching algorithm (to simultaneously find matching triples, then cancel out any that don't match all the basic graph patterns)?
  5. If DWave's machines could solve NP-Complete problems, does that mean we would then just use OWL-Full?
  6. Would the speed-ups then be great enough to consider linking everyday app data to large-scale upper ontologies?
  7. Is a contradiction in a 'quantum reasoner' (i.e. a reasoner that uses a quantum search engine) something that can never occur, because it just cancels out and never appears in the returned triples? Would any returned conclusion be necessarily true (relative to the axioms of the ontology)?

Any thoughts?

UPDATE
DWave are now working with Google to help them improve some of their machine learning algorithms. I wonder whether there will be other research into the practicality of using DWave quantum computing systems in conjunction with inference engines. This could, of course, open up whole new vistas of services that could be provided by Google (or their competitors). Either way, it gives me a warm feeling to know that every time I do a search, I'm getting the results from a quantum computer (no matter how indirectly). Nice.

Semantic Overflow Highlights I

Semantic Overflow has been active for a couple of weeks. We now have 155 users and 53 questions. We've already had some very interesting questions and some excellent, detailed and thoughtful responses. So, at Egon's instigation, I thought I'd bring together some of last week's highlights from the site's BI stats.

The best-loved question this week came from Jerven Bolleman, who wanted to know whether there was a "Simple CLI useable OWL Reasoner". The most popular (and highest-voted) answer to it came from Ivan Herman, who provided tool suggestions, guidance and insights into the current research directions.

The most viewed question was from Akshat Shrivastava, who asked "How do I make my homepage/personal blog semantic?". This question garnered five very good answers, including a particularly good one from Bill Roberts, who provided a lot of detail on his own experiences of doing just that.

The highest voted answer was from Egon Willighagen in answer to a question by ‘dusoft‘ as to “Why launching SemanticOverflow with too little users with zero reputation sucks?”. He very helpfully explained how to bootstrap your reputation and make a success of the site. I’m glad to say that dusoft’s was the only negative comment so far, and that the general response of other site users has been very positive. Mike McClintock even told me that “SemanticOverflow is my new favorite thing“. Thanks, Mike, I hope it stays that way!

In terms of gathering reputation, Ian Davis of Talis is the clear front runner, with a string of well-received questions. He also provided 17 very good answers. Thanks, Ian, and thanks to all the others who are already making Semantic Overflow a great site!

www.SemanticOverflow.com – the Web 2.0 Q&A site for all things Web 3.0.

www.SemanticOverflow.com is a new site based on the hugely popular StackOverflow.com, devoted to Q&A on anything related to the semantic web. The site is very new (created today) and I’m trying to get as many people to visit as I can, so please come and post your questions and together we’ll create a thriving community dedicated just to the semantic web. It’s free to join, free to ask questions, and most important of all – free to see the answers. You don’t even have to sign up to view, ask or answer questions. If you do join, all you get is kudos.
StackOverflow has been an overnight sensation because it combines many of the traditional features of wikis, newsgroups and social media platforms with a reputation system that promotes community involvement and high-quality discussion. I'm sure that as the Semantic Web starts to go exponential, Semantic Overflow will be a useful forum for more than just technical questions. I hope you'll use it for speculation about the directions of the field itself. Unlike StackOverflow, non-technical discussions won't be moderated out, provided they are on-topic. I think that in such an interesting and emerging space, discussion and dissemination of knowledge is critical. If you agree, please come and join us for a question or two.
Please also tell your friends, colleagues, relatives, acquaintances, pets and neighbors, and ask them all to visit the site. If you have any suggestions about other places where I could promote the site, feel free to provide an answer here.

Can AOP help fix bad architectures?

I recently posted a question on Stack Overflow about the feasibility of using IL-rewriting frameworks to rectify bad design after the fact. The confines of the answer comment area were too small to give the subject proper treatment, so I thought a new blog post was in order. Here's the original question:

I’ve recently been playing around with PostSharp, and it brought to mind a problem I faced a few years back: A client’s developer had produced a web application, but they had not given a lot of thought to how they managed state information – storing it (don’t ask me why) statically on the Application instance in IIS. Needless to say the system didn’t scale and was deeply flawed and unstable. But it was a big and very complex system and thus the cost of redeveloping it was prohibitive. My brief at the time was to try to refactor the code-base to impose proper decoupling between the components.

At the time I tried using some kind of abstraction mechanism to allow me to intercept all calls to the static resource and redirect them to a component that would properly manage the state data. The problem was that there were about 1000 complex references to be redirected (and I didn't have a lot of time to do it in). Manual coding (even with R#) proved to be just too time consuming – we scrapped the code base and rewrote it properly. It took over a year to rewrite.

What I wonder now is: had I had access to an assembly rewriter and/or an aspect-oriented programming system (such as PostSharp), could I have easily automated the refactoring by finding the direct references and converting them to interface references that could be redirected automatically and supplied by factories?

Has anyone out there used PostSharp or similar systems to rehabilitate pathological legacy systems? How successful were the projects? Did you find after the fact that the effort was worth it? Would you do it again?

I subsequently got into an interesting (though possibly irrelevant) discussion with Ira Baxter on program transformation systems, AOP, and the kind of intelligence a program needs in order to refactor a system, preserving the business logic whilst rectifying any design flaws. There wasn't space there to discuss the ideas, so I want to expand the discussion here.

The system I was referring to had a few major flaws:

  1. The user session information (of which there was a lot) was stored statically on a specific instance of IIS. This necessitated the use of sticky sessions to ensure the relevant info was around when user requests came along.
  2. Session information was lost every time IIS recycled the app pool, thus causing the users to lose call control (the app was phone-related).
  3. State information was glommed into a tight bolus of data that could not be teased apart, so refactoring the app was an all-or-nothing thing.

As you can guess, tight coupling/lack of abstraction and direct resource dispensation were key flaws that prevented the system from being able to implement fail-over, disaster recovery, scaling, extension and maintenance.

This product is in a very competitive market and needs to be able to innovate to stay ahead, so the client could ill afford to waste time rewriting code while others might be catching up. My question asked, in hindsight, whether one could retroactively fix the problems without having to track down, analyse and rectify each tightly coupled reference between client code and state information, and within the state information itself.

What I needed at the time was some kind of declarative mechanism whereby I could confer the following properties on a component:

  1. location independence
  2. intercepted object creation
  3. transactional persistence

Imagine that we could do it with a mechanism like PostSharp's multicast attributes:

[assembly: DecoupledAndSerialized(
    AspectPriority = -1,
    AttributeTargetAssemblies = "MyMainAssembly",
    AttributeTargetTypes = "MainAssembly.MyStateData",
    AttributeTargetMembers = "*")]

What would this thing have to do to be able to untie the knots that people get themselves into?

  1. It would have to be able to intercept every reference to MainAssembly.MyStateData and replace the interaction with one that got the instance from some class factory that could go off to a database or some distant server instead – that is, introduce an abstraction layer and some IoC framework (see the sketch after this list).
  2. It must ensure that the component managing object persistence checked into and out of the database appropriately (atomically) and all contention over that data was properly managed.
  3. It must ensure that all session specific data was properly retrieved and disposed for each request.
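
To make point 1 above less abstract, here's the kind of shape the rewriter would have to introduce. This is a sketch of mine with entirely hypothetical names, assuming the usual System and System.Collections.Generic imports:

// The abstraction the rewriter would introduce over the static state.
interface IStateStore {
  T Get<T>(string key);
  void Put<T>(string key, T value);
}

// Naive in-proc implementation for development; a production version
// would check state in and out of a database or remote service.
class InProcStateStore : IStateStore {
  private readonly Dictionary<string, object> store =
    new Dictionary<string, object>();
  public T Get<T>(string key) { return (T)store[key]; }
  public void Put<T>(string key, T value) { store[key] = value; }
}

static class StateStoreFactory {
  // Swappable at deployment time: in-proc for dev, remote for production.
  public static Func<IStateStore> Create = () => new InProcStateStore();
}

// Before rewriting, client code reaches straight into static state, e.g.
//   var cart = MyStateData.Current.ShoppingCart;
// After rewriting, the same interaction is location-independent:
//   var cart = StateStoreFactory.Create().Get<Cart>("ShoppingCart");

The rewriter's job would be to find every reference of the first kind and mechanically replace it with the second.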

This is not a pipe dream by any means – there are frameworks out there that are designed to place 'shims' between layers of a system, allowing the shim to be expanded out into a full-blown proxy that can communicate through some inter-machine protocol, or just devolve to plain old in-proc communication in a dev environment. The question is: can you create an IL rewriter tool that is smart enough to work out how to insert the shims based on its own knowledge of good design principles? Could I load it up with a set of commandments graven in stone, like "never store user session data in HttpContext.Application"? If it found a violation of such a cardinal sin, could it insert a proxy that would redirect the flow away from the violated resource, exploiting some kind of resource allocation mechanism that wasn't so unscalable?

From my own experience, these systems require you to be able to define service boundaries to allow the system to know what parts to make indirect and which parts to leave as-is. Juval Lowy made a strong case for the idea that every object should be treated as a service, and all we lack is a language that will seamlessly allow us to work with services as though they were objects. Imagine if we could do that, providing an ‘abstractor tool’ as part of the build cycle. While I think he made a strong case, my experience of WCF (which was the context of his assertions) is that it would be more onerous to configure the blasted thing than it would be to refactor all those references. But if I could just instruct my IL rewriter to do the heavy lifting, then I might be more likely to consider the idea.

Perhaps PostSharp has the wherewithal to address this problem without us having to resort to extremes or refactor a single line? PostSharp has two incredible features that make it a plausible candidate for such a job. The first is attribute multicasting, which would allow me to designate a group of APIs declaratively as the service boundary of a component. The original code is never touched, but by using an OnMethodInvocationAspect you could intercept every call to the API, turning the calls into remote service invocations. The second part of the puzzle is compile-time instantiation of aspects: in this system your aspects are instantiated at compile time, given an opportunity to perform some analysis, and then serialized, along with the results of that analysis, for runtime use when the aspect is invoked as part of a call-chain. The combination of these two allows you to perform an arbitrary amount of compile-time analysis prior to generating a runtime proxy object that could intercept those method calls necessary to enforce the rules designated by the multicast attributes. The aspect could perform reflective analysis and comparison against a rules base (just like FxCop), but with an added collection of refactorings it could take to alleviate the problem. The user might still have to provide hints about where to get and store data, or how the app was to be deployed, but given high-level information, perhaps the aspect could configure itself?
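
For flavour, here's roughly what such a boundary-interception aspect might look like. I'm recalling PostSharp 1.x's Laos API from memory here, so treat the exact type and member names as assumptions rather than gospel:

// Hypothetical sketch - PostSharp.Laos names recalled from memory.
[Serializable]
public class ServiceBoundaryAttribute : OnMethodInvocationAspect
{
    // Build time: analyse the intercepted method and serialize the
    // results into the aspect instance for use at runtime.
    public override void CompileTimeInitialize(System.Reflection.MethodBase method)
    {
        // e.g. record which state-holding members this method touches
    }

    // Runtime: called in place of the original method.
    public override void OnInvocation(MethodInvocationEventArgs eventArgs)
    {
        // Redirect across the service boundary here (placeholder);
        // falling through runs the original in-proc implementation.
        eventArgs.Proceed();
    }
}

Multicast that attribute over the state type, as in the example earlier, and every call into MyStateData becomes interceptable without touching the original source.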

Now that would be a useful app – a definite must-have addition to any consultant’s toolkit.