Automata-Based Programming With Petri Nets – Part 1

Petri Nets are extremely powerful and expressive, but they are not as widely used in the software development community as deterministic state machines. That’s a pity – they allow us to solve problems beyond the reach of conventional state machines. This is the first in a mini-series on software development with Petri Nets. All of the code for a full feature-complete Petri Net library is available online at GitHub. You’re welcome to take a copy, play with it and use it in your own projects. Code for this and subsequent articles can be found at

Semantic Development Environments

The semantic web is a GOOD THING by definition – anything that enables us to create smarter software without also having to create Byzantine application software must be a step in the right direction. The problem is – many people have trouble translating the generic term “smarter” into a concrete idea of what they would have to do to achieve that palladian dream. I think a few concrete ideas might help to firm up people’s understanding of how the semantic web can help to deliver smarter products.

Software Development as a knowledge-based activity

In this post I thought it might be nice to share a few ideas I had about how OWL and SWRL could help to produce smarter software development environments. If you want to use the ideas to make money, feel free to do so; just consider them as released under the Creative Commons Attribution license. Software development is the quintessential knowledge-based activity. In the process of producing a modern application a typical developer will burn through knowledge at a colossal rate. Frequently, we will not reserve headspace for a lot of the knowledge we acquire to solve a task. Frequently, we bring together the ideas, facts, standards, API skills and problem requirements needed to solve a problem then just as quickly forget it all. The unique combination is never likely to arise again.

I’m sure we could make a few comments about how it’s more important to know where the information is than to know what it is – a fact driven home to me by my Computer Science lecturer John English, who seemed to be able to remember the contents page of every copy of the Proceedings of the ACM back to the ’60s. You might also be forgiven for thinking this wasn’t true, given the current obsession with certifications. We could also comment about how some information is more lasting than others, but my point is that every project these days seems to combine a mixture of ephemera, timeless principles and those bits that lie somewhere between the two (called ‘Best Practice’ in current parlance ;).

Requires cognitive assistance

Software development, then, is a knowledge intensive activity that brings together a variety of structured and unstructured information to allow the developer to produce a system that they endeavor to show is equivalent to a set of requirements, guidelines, nuggets of wisdom and cultural mores that are defined or mandated at the beginning of the project. Doesn’t this sound to you like exactly the environment for which the semantic web technology stack was designed?

Incidentally, the following applications don’t have much to do with the web, so perhaps they demonstrate that the term ‘Web 3.0’ is limiting and misleading. It’s the synergy of the complementary standards in the semantic web stack that makes it possible to deliver smarter products and to boost your viability in an increasingly competitive market place.


OK, so the extended disclaimer/apology is now out of the way and I can start to talk about how the semantic web could offer help to improve the lives of developers. The first place I’ll look is at documentation. There are many types of documentation that are used in software development. In fact, there is a different form of documentation defined for each specific stage of the software lifecycle from conception of an idea through to its realization in code (and beyond). Each of these forms of documentation is more or less formally structured with different kinds of information related to documents and other deliverables that came before and after. This kind of documentation is frequently ambiguous, verbose and often gets written for the sake of compliance and then gets filed away and never sees the light of day again. Documentation for software projects needs to be precise, terse, rich and most of all useful.

Suggestion 1.

Use ontologies (perhaps standardised by the OMG) for the production of requirements. Automated tools could be used to convert these ontologies into human-readable reports or tools could be used to answer questions about specific requirements. A reasoner might be able to deduce conflicts or contradictions from a set of requirements. It might also be able to offer suggestions about implementations that have been shown to fulfill similar requirements in other projects. Clearly, the sky’s the limit in how useful an ontology, reasoner and rules language could be. It should also help documentation to be much more precise and less verbose. There is also scope for documentation reuse, specialization and for there to be diagramming and code generation driven off of documentation.

Documentation is used heavily inside the source code used by developers to write software too. It serves to provide an explanation for the purpose of a software component, to explain how to use it, to provide change notes, to generate API documentation web-sites, and even to store to-do list items or apologies for later reference. In .NET and Java, and now many other programming languages, it is common to use formal languages (like XML markup) to provide commonly used information. An ontology might be helpful in providing a rich and extensible language for representing code documentation. The use of URIs to represent unique entities means that the documentation can be the subject of other documents and can reach out to the wider ecology of data about the system.

Suggestion 2.

Provide an extensible ontology to allow the linkage of code documentation with the rest of the documentation produced for a software system. Since all parts of the software documentation process (being documented in RDF) will have unique URIs, it should be easy to link the documentation for a component to the requirements, specifications, plans, elaborations, discussions, blog posts and other miscellanea generated. Providing semantic web URIs to individual code elements helps to integrate the code itself into other semantic systems like change management and issue tracking systems. Use of URIs and ontologies within source code helps to provide a firm, rich linkage between source code and the documentation that gave rise to it.

Suggestion 3.

Boosting code with richer, extensible markup that represents its meaning and its wider documentation environment means that traditional IntelliSense can be augmented with browsers that provide access to all other pertinent documentation related to a piece of code. Imagine hovering over an object reference and getting links not only to a web site generated from the code commentary but to all the requirements that the code fulfills, to automated proofs demonstrating that the code matches the requirements, to blog posts written by the dev team and to MP3s taken during the brainstorming and design sessions during which this component was conceived.

It doesn’t take much imagination to see that some simple enhancements like these can provide a ramp for the continued integration of the IDE, allowing smoother cooperation between teams and their stakeholders. Making documentation more useful to all involved would probably increase the chances that people would give up Agile in favour of something less like the emperor’s clothes.

Suggestion 4.

Here are some other suggestions about how documentation in the IDE could be enriched.
○ Guidelines on where devs should focus their attention when learning a new API
○ A SPARQL endpoint could be exposed by the code publisher
    § This could provide a means to publish documentation online
○ Automatic publishing of DOAP documents to an enterprise or online registry, allowing software registries.

Dynamic Systems

Augmenting the source code of a system with URIs that can be referenced from anywhere opens the semantic artifacts inside an application to analysis and reference from outside. Companies like Microsoft have already described their visions for the production of documentation systems that allow architects to describe how a system hangs together. This information can be used by other systems to deploy, monitor, control and scale systems in production environments.

I think that their vision barely glimpses what could be achieved through the use of automated inference systems, rich structured machine readable design documentation, and systems that are for the first time white boxes. I think that DSI-style declarative architecture documents are a good example of what might be achieved through the use of smart documentation. There is more though.

Suggestion 5.

Reflection and other analysis tools can gather information about the structure, inter-relationships and external dependencies of a software system. Such data can be fed to an inference engine to allow it to make comparisons about the runtime behavior of a production system. Rules of inference can help it to determine the consequences of violating a rule derived from the architect’s or developers’ documentation. Perhaps it could detect when the system is misconfigured or configured in a way that will force it to struggle under load. Perhaps it can find explanations for errors and failures. Rich documentation systems should allow developers to indicate deployment guidelines (e.g. this component is thread safe, or is location independent and scalable). Such documentation can be used to predict failure modes, to direct testing regimes and to predict optimal deployment patterns for specific load profiles.
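As a rough illustration of the first step, here is a minimal sketch of using reflection to turn an assembly's external dependencies into simple subject/predicate/object statements that could later be handed to an inference engine. The class name DependencyFacts and the predicate name dependsOn are made up for this example; a real system would presumably emit RDF against an agreed ontology instead of plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper: not part of any library mentioned in the post.
public static class DependencyFacts
{
    // Returns one "subject dependsOn object" statement per referenced assembly.
    public static List<string> Describe(Assembly assembly)
    {
        List<string> facts = new List<string>();
        string subject = assembly.GetName().Name;

        // GetReferencedAssemblies lists the assemblies this one was compiled against.
        foreach (AssemblyName reference in assembly.GetReferencedAssemblies())
        {
            facts.Add(subject + " dependsOn " + reference.Name);
        }
        return facts;
    }
}
```

Calling DependencyFacts.Describe(typeof(SomeType).Assembly) yields statements that a rules engine could compare against what the architect's documentation says the component is allowed to depend on.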


I wrote this post because I know I’ll never have time to pursue these ideas, but I would dearly love to see them come to pass. Why don’t you get a copy of LinqToRdf, crack open a copy of Coco/R and see whether you can implement some of these suggestions. And if you find a way to get rich doing it, then please remember me in your will.

Object Modeling is Vocabulary Design

Andrew Cantos raised some interesting philosophical points in reply to my partially tongue in cheek post The Great Domain Model Debate – Solved the other day. As ever, my short reply turned into a blog post and this is it. Andrew’s point was that there is a metaphorical link between objects in a domain model and elementary particles in some physical system. The ability of these elements to take part in the wider system is often a function of their sheer simplicity rather than being loaded with complex properties. He used oxygen as an example of something that can take part in many reactions, but which does not define or characterize those reactions. I extended the metaphor to observe that the same holds true when comparing Anemic Domain Models with their Rich brethren.

I like his metaphor. The metaphor I tend to use when I think about this issue is related to human languages. Words are poor carriers of meaning on their own, in the same sense that rich objects are poor carriers of business functionality. A word’s specific value comes within the dynamic context of a sentence. That is, its exact meaning and value can only be resolved when it is composed with other words in a richer context.

Likewise, the same happens in an OO system – the analogue of the ‘sentence’ here is the thread of execution, or the transaction script or what have you. They give meaning to the data carried by the anemic object. Without that context the object is worthless. What an RDM seeks to do is carry with the object the full set of possible contexts. It also seeks to restrict that set of contexts to a manageable set.

I can sympathize with that aim – ADMs do little to guarantee that they get used right. RDMs do. However, I think that as with a science, an enterprise system needs to have a commonly agreed shared vocabulary. With that, a greater richness of communication becomes possible. If, however, you were restricted in the ways you could use these words, you might have greater precision, but communication would become a chore, and you probably wouldn’t bother.

You can extend this whole ‘enterprise vocabulary’ metaphor even further. If you look at a typical, poorly coded or governed system, you will often see a situation where there are a multitude of little DTOs all of which contain roughly the same data, but just those fields that are needed to service a given page or screen. This situation is analogous to the situation in an immature science where there are many words freighted with slightly different meanings. Confusion results when a speaker intends different things from the listener. So too in software, where the lack of a commonly agreed object model serves to add confusion to the development process, and to increase the likelihood of errors creeping into a system.

What does this imply? It seems to me that the right approach (assuming the metaphor holds true) is that there ought to be a well defined, definitive and widely shared object model within an organization. All systems should use it, and the organization should mint new classes with great care and forethought. This of course ties in with the efforts of various groups in the Semantic Web area, who are attempting to do just that in domains as widely flung as life sciences and disaster relief. The fact that the efforts are inter-organizational means that there will be less tolerance for poorly advised ‘pragmatism’.

Which can only be a good thing in the long run. Right?

Dynamic Strongly-Typed Configuration in C#

    I’ve written at great length in the past about the perils of configuration, and I thought I’d written as much as I was willing on the topic. But I thought it was worth describing this solution, since it was so neat, and easy, and had most of the benefits of text based configuration and strongly typed inline configuration. I was recently messing about with some WCF P2P code, and the setup code had some configuration that looked like a likely candidate for a strongly typed configuration object that wouldn’t change frequently. I think this solution neatly addresses one of the main objections to hard coded configuration, which is that we do sometimes need to change configuration data at runtime without having to take down the servers or recompile them.

    The idea behind this solution stems from the use of a plug-in architecture such as the forthcoming System.AddIn namespace to arrive in VS2008. With that you get the option to load an assembly from a designated directory and make use of the types found inside it. Why not use the same approach with configuration? We can dynamically load configuration assemblies and then use a single configuration setting to specify which type from those assemblies should be used as the new configuration. This has all the benefits normally reserved for text based dynamic configuration such as System.Configuration.ConfigurationManager, but with the added benefits of strong typing, inheritance, calculated configuration settings and the added performance of POCOs.

    My WCF program was a simple chat client that I hope to be able to use between members of my family. Typical configuration settings were MeshAddress and CredentialType, which are unlikely to change frequently. Each of these configuration settings was defined on an interface called IChatClientConfig. Implementing that full interface was my default configuration class called DefaultChatConfig. That provided all of my defaults, and is perfectly usable. I then specialized that class with some others, for example with a different mesh address for chatting with people at work. A class diagram for the configuration objects is shown below.


    Each class just provides a new implementation for the field that it provides a different value for.
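A minimal sketch of what such a hierarchy might look like. The names IChatClientConfig, DefaultChatConfig, TechChatConfig, MeshAddress and MaxReceivedMessageSize come from the post; the property types, the addresses and the sizes are placeholder assumptions, and the WCF-typed settings (CredentialType, PeerResolverMode) are left out so the sketch compiles without a reference to System.ServiceModel.

```csharp
using System;

// Placeholder reconstruction: the real interface has more members.
public interface IChatClientConfig
{
    string MeshAddress { get; }
    long MaxReceivedMessageSize { get; }
}

public class DefaultChatConfig : IChatClientConfig
{
    // Virtual so that a specialised configuration can override a single setting.
    public virtual string MeshAddress
    {
        get { return "net.p2p://example.invalid/family"; }   // made-up address
    }

    public virtual long MaxReceivedMessageSize
    {
        get { return 65536; }                                // made-up default
    }
}

public class TechChatConfig : DefaultChatConfig
{
    // Only the setting that differs is overridden; everything else is inherited.
    public override string MeshAddress
    {
        get { return "net.p2p://example.invalid/tech"; }     // made-up address
    }
}
```

Because the settings are ordinary properties, a derived class can also compute a value from other settings, which is one of the benefits over flat key/value configuration.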

    Loading the configuration is extremely simple. First you have to say which one of those classes you want to use for your configuration.

        <add key="P2PConfigSettings" value="ChatClient.Configuration.TechChatConfig, ChatClient.Configuration, Version="/>

    This simple app setting is the fully qualified type name of the TechChatConfig class on the bottom right of the diagram above, which is a default chat configuration with the tech chat settings added. Those are all the prerequisites for loading configuration. Now all I need to do to load the configuration is this:

    private static IChatClientConfig GetConfigObject()
    {
        string configType = ConfigurationManager.AppSettings["P2PConfigSettings"];
        Type t = Type.GetType(configType);
        return Activator.CreateInstance(t) as IChatClientConfig;
    }

    Get whatever type I specified as a string from the configuration file, get the type specified by that string, create an instance and return it. Simple. That configuration could then be stored as a singleton or whatever you need to do.
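The loader in the post returns null silently if the type name is misspelled or the type does not implement the interface. Here is a hedged sketch of a slightly more defensive variant. ConfigLoader, IGreetingConfig and EnglishGreetingConfig are hypothetical names used only to keep the example self-contained, and the type name is passed in as a parameter rather than read from the AppSettings.

```csharp
using System;

// Hypothetical configuration contract and implementation, for illustration only.
public interface IGreetingConfig
{
    string Greeting { get; }
}

public class EnglishGreetingConfig : IGreetingConfig
{
    public string Greeting { get { return "hello"; } }
}

public static class ConfigLoader
{
    // Resolves a type name to an instance of TConfig, failing with a clear message.
    public static TConfig Load<TConfig>(string typeName) where TConfig : class
    {
        // With throwOnError = false an unknown type yields null instead of an exception.
        Type t = Type.GetType(typeName, false);
        if (t == null)
            throw new InvalidOperationException("Configuration type not found: " + typeName);

        // Requires a public parameterless constructor on the configuration class.
        TConfig config = Activator.CreateInstance(t) as TConfig;
        if (config == null)
            throw new InvalidOperationException(typeName + " does not implement " + typeof(TConfig).Name);

        return config;
    }
}
```

With this in place, the body of a method like GetConfigObject() could reduce to a single call such as ConfigLoader.Load<IChatClientConfig>(ConfigurationManager.AppSettings["P2PConfigSettings"]).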

    [ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
    public partial class Window1 : IPeerChat
    {
        // Named config to match its use in CreateBindingForMesh().
        private IChatClientConfig config;
    }

    In my case I just stored it in the window object I was using it for – my chat client only has one window! Now I can just use it, whenever I do any comms.

    private NetPeerTcpBinding CreateBindingForMesh()
    {
        NetPeerTcpBinding binding = new NetPeerTcpBinding();
        binding.Resolver.Mode = config.PeerResolverMode;
        binding.Security.Transport.CredentialType = config.CredentialType;
        binding.MaxReceivedMessageSize = config.MaxReceivedMessageSize;
        return binding;
    }

    So you see that the process is very simple. With the addition of an AddIn model we could use a file system monitor to watch the configuration file, detect changes and reload the configuration object singleton using the mechanism described above. That fulfils most of the requirements that we have for type safety, performance, dynamism, intelligence, and object orientation. Very few configuration scenarios that fall outside of the bounds of this solution should be solved using local configuration settings anyway – in those cases you really ought to be looking at an administration console and database.
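The file-watching idea can be sketched roughly as follows. ReloadableConfig<T> is a hypothetical helper, and the factory delegate stands in for whatever method builds the configuration object (for example one that reads the app setting and instantiates the named type). The sketch does not handle loading newly dropped assemblies, and it assumes the directory containing the watched file already exists.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical holder that rebuilds its configuration object when a file changes.
public sealed class ReloadableConfig<T> : IDisposable where T : class
{
    private readonly Func<T> factory;
    private readonly FileSystemWatcher watcher;
    private T current;

    public ReloadableConfig(string configFilePath, Func<T> factory)
    {
        this.factory = factory;
        current = factory();

        string fullPath = Path.GetFullPath(configFilePath);
        // Watch only the one file, and only for writes to it.
        watcher = new FileSystemWatcher(Path.GetDirectoryName(fullPath), Path.GetFileName(fullPath));
        watcher.NotifyFilter = NotifyFilters.LastWrite;
        watcher.Changed += delegate { Reload(); };
        watcher.EnableRaisingEvents = true;
    }

    // Readers always see a fully constructed configuration object.
    public T Current
    {
        get { return Volatile.Read(ref current); }
    }

    // Builds a fresh configuration object and swaps it in atomically.
    public void Reload()
    {
        Interlocked.Exchange(ref current, factory());
    }

    public void Dispose()
    {
        watcher.Dispose();
    }
}
```

One thing to keep in mind is that FileSystemWatcher can raise Changed more than once for a single save, so the factory should be cheap and safe to call repeatedly.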

Using Mock Objects When Testing LINQ Code

I was wondering the other day whether LINQ could be used with NMock easily. One problem with testing code that has not been written to work with unit tests is that if you test business logic, you often end up making multiple round-trips to the database for each test run. With a very large test suite that can turn a few minutes’ work into hours. The best approach to this is to use mock data access components to dispense canned results, rather than going all the way through to the database.

After a little thought it became clear that all you have to do is override the IOrderedQueryable<T>.GetEnumerator() method to return an enumerator to a set of canned results and you could pretty much impersonate a LINQ to SQL Table (which is the IOrderedQueryable implementation for LINQ to SQL). I had a spare few minutes the other day while the kids were going to sleep and I decided to give it a go, to see what was involved.

I’m a great believer in the medicinal uses of mock objects. Making your classes testable using mocking enforces a level of encapsulation that adds good structure to your code. I find that the end results are often much cleaner if you design your systems with mocking in mind.

Let’s start with a class that you were querying over in your code. This is the type that you are expecting to get back from your query.

public class MyEntity
{
    public string Name
    {
        get { return name; }
        set { name = value; }
    }

    public int Age
    {
        get { return age; }
        set { age = value; }
    }

    public string Desc
    {
        get { return desc; }
        set { desc = value; }
    }

    private string name;
    private int age;
    private string desc;
}

Now you need to create a new context object derived from the DLINQ DataContext class, but providing a new constructor function. You can create other ways to insert the data you want your query to return, but the constructor is all that is necessary for this simple example.

public class MockContext : DataContext
{
    #region constructors

    public MockContext(IEnumerable col):base("")
    {
        User = new MockQuery<MyEntity>(col);
    }
    // other constructors removed for readability
    #endregion

    public MockQuery<MyEntity> User;
}

Note that you are passing in an untyped IEnumerable rather than an IEnumerable<T> or a concrete collection class. The reason is that when you make use of projections in LINQ, the type gets transformed along the way. Consider the following:

var q = from u in db.User
        where u.Name.Contains("Andrew") && u.Age < 40
        select new {u.Age};

The result of db.User is an IOrderedQueryable<User> query class which is derived from IEnumerable<User>. But the result that goes into q is an IEnumerable of some anonymous type created specially for the occasion. There is a step along the way when the IQueryable<User> gets replaced with an IQueryable<AnonType>. If I set the type on the enumerator of the canned results, I would have to keep track of them with each call to CreateQuery in my Mock Query class. By using IEnumerable, I can just pass it around till I need it, then just enumerate the collection with a custom iterator, casting the types to what I ultimately need as I go.

The query object also has a constructor that takes an IEnumerable, and it keeps that till GetEnumerator() gets called later on. CreateQuery and CloneQueryForNewType just pass the IEnumerable around till the time is right. GetEnumerator just iterates the collection in the cannedResponse iterator casting them to the return type expected for the resulting query.

public class MockQuery<T> : IOrderedQueryable<T>
{
    private readonly IEnumerable cannedResponse;

    public MockQuery(IEnumerable cannedResponse)
    {
        this.cannedResponse = cannedResponse;
    }

    private Expression expression;

    #region IQueryable<T> Members

    // Each query operator produces a new query that carries the same canned results.
    IQueryable<S> IQueryable<T>.CreateQuery<S>(Expression expression)
    {
        MockQuery<S> newQuery = CloneQueryForNewType<S>();
        newQuery.expression = expression;
        return newQuery;
    }

    private MockQuery<S> CloneQueryForNewType<S>()
    {
        return new MockQuery<S>(cannedResponse);
    }

    #endregion

    #region IEnumerable<T> Members

    // Casts each canned result to the element type of the final query.
    IEnumerator<T> IEnumerable<T>.GetEnumerator()
    {
        foreach (T t in cannedResponse)
            yield return t;
    }

    #endregion

    #region IQueryable Members

    Expression IQueryable.Expression
    {
        get { return System.Expressions.Expression.Constant(this); }
    }

    Type IQueryable.ElementType
    {
        // The original returned a field that was never assigned.
        get { return typeof(T); }
    }

    #endregion
}

For the sake of readability I have left out the required interface methods that were not implemented, since they play no part in this solution. Now let’s look at a little test harness:

class Program
{
    static void Main(string[] args)
    {
        MockContext db = new MockContext(GetMockResults());

        var q = from u in db.User
                where u.Name.Contains("Andrew") && u.Age < 40
                select u;
        foreach (MyEntity u in q)
            Debug.WriteLine(string.Format("entity {0}, {1}, {2}", u.Name, u.Age, u.Desc));
    }

    private static IEnumerable GetMockResults()
    {
        for (int i = 0; i < 20; i++)
        {
            MyEntity r = new MyEntity();
            r.Name = "name " + i;
            r.Age = 30 + i;
            r.Desc = "desc " + i;
            yield return r;
        }
    }
}

The only intrusion here is the explicit use of MockContext. In the production code that is to be tested, you can’t just go inserting MockContext where you would have used the SqlMetal generated context. You need to use a class factory that will allow you to provide the MockContext on demand in a unit test, but dispense a true LINQ to SQL context when in production. That way, all client code will just use mock data without knowing it.

Here’s the pattern that I generally follow. I got it from the Java community, but I can’t remember where:

class DbContextClassFactory
{
    // Public so that test code can reach the switches from outside the factory.
    public class Environment
    {
        private static bool inUnitTest = false;

        public static bool InUnitTest
        {
            get { return Environment.inUnitTest; }
            set { Environment.inUnitTest = value; }
        }

        private static DataContext objectToDispense = null;

        public static DataContext ObjectToDispense
        {
            get { return Environment.objectToDispense; }
            set { Environment.objectToDispense = value; }
        }
    }

    // Static so that callers can write DbContextClassFactory.GetDB().
    public static DataContext GetDB()
    {
        if (Environment.InUnitTest)
            return Environment.ObjectToDispense;
        return new TheRealContext();
    }
}

Now you can create your query like this:

DbContextClassFactory.Environment.InUnitTest = true;
DbContextClassFactory.Environment.ObjectToDispense = new MockContext(GetMockResults());
var q = from u in DbContextClassFactory.GetDB() where ...

And your client code will use the MockContext if there is one, otherwise it will use a LINQ to SQL context to talk to the real database. Perhaps we should call this Mockeries rather than Mock Queries. What do you think?
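The code above targets the pre-release LINQ interfaces. In the released API, CreateQuery lives on IQueryProvider rather than on IQueryable<T>, so the mock would need reworking. A simpler way to get canned results there, sketched below, is to wrap the untyped collection with Cast<T>() and AsQueryable(). Person and CannedQueries are hypothetical names standing in for MyEntity and the mock query. One behavioural difference to be aware of: this version really evaluates the where clause against the canned data using LINQ to Objects, whereas the mock above hands back every canned result regardless of the query.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the MyEntity class.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class CannedQueries
{
    // Wraps untyped canned results as an IQueryable<T> backed by LINQ to Objects.
    public static IQueryable<T> From<T>(IEnumerable cannedResults)
    {
        return cannedResults.Cast<T>().AsQueryable();
    }

    // Builds twenty canned people and runs a filtered, projected query over them.
    public static List<string> Demo()
    {
        ArrayList canned = new ArrayList();
        for (int i = 0; i < 20; i++)
            canned.Add(new Person { Name = "name " + i, Age = 30 + i });

        var q = from p in From<Person>(canned)
                where p.Name.Contains("1") && p.Age < 40
                select p.Name;
        return q.ToList();
    }
}
```

Whether real filtering is desirable depends on the test: it keeps the query honest, but it also means the canned data has to contain rows that satisfy the predicate.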

GroupJoins in LINQ

OWL defines two types of property: DatatypeProperty and ObjectProperty. An object property links instances from two Classes, just like a reference in .NET between two objects. In OWL you define it like this:

<owl:ObjectProperty rdf:ID="isOnAlbum">
  <rdfs:domain rdf:resource="#Track"/>
  <rdfs:range rdf:resource="#Album"/>
</owl:ObjectProperty>

A DatatypeProperty is similar to a .NET property that stores some kind of primitive type like a string or an int. In OWL it looks like this:

<owl:DatatypeProperty rdf:ID="fileLocation">
  <rdfs:domain rdf:resource="#Track"/>
  <rdfs:range rdf:resource="&xsd;string"/>
</owl:DatatypeProperty>

The format is very much the same, but the task of querying for primitive types in LINQ and SPARQL is easy compared to performing a one to many query like a SQL Join. So far, I have confined my efforts to DatatypeProperties, and tried not to think about ObjectProperties too much. But the time of reckoning has come – I’ve not got much else left to do on LinqToRdf except ObjectProperties.

Here’s the kind of LINQ join I plan to implement:

public void TestJoin()
{
    TestContext db = new TestContext(CreateSparqlTripleStore());
    var q = from a in db.Album
            join t in db.Track on a.Name equals t.AlbumName into tracks
            select new Album{Name = a.Name, Tracks = tracks};
    foreach(var album in q){
        foreach (Track track in album.Tracks)
        {
            // ...
        }
    }
}

This uses a GroupJoin to let me collect matching tracks and store them in a temporary variable called tracks. I then insert the tracks into the Tracks property on the album I’m newing up in the projection. I need to come up with a SPARQL equivalent syntax, and convert the expression passed for the join into that. SPARQL is a graph based query language, so I am going to be converting my requests into the usual SPARQL triple format, and then using the details from the NewExpression on the query to work out where to put the data when I get it back.

With the non-join queries I have been testing my query provider on, I have observed that for each syntactical component of the query I was passed an Expression tree, representing its contents. With a GroupJoin, you get one, and it contains everything you need to perform the query. My first quandary is over the process of converting this new expression structure into a format that my existing framework can understand. Below is a snapshot of the expression tree created for the join I showed above.

GroupJoin Expression contents

There are five parameters in the expression:

  1. The query object on the Album. That’s the “a in db.Album” part.
  2. The query object on the Track. The “t in db.Track” part.
  3. A lambda function from an album to its Name.
  4. A lambda function from a track to its AlbumName.
  5. A projection creating a new Album, and assigning the tracks collected to the Tracks collection on the newly created Album.

Parameters 1 & 2 are LinqToRdf queries that don’t need to be parsed and converted. I can’t just ask them to render a query for me, since they don’t have any information of value to offer me other than the OriginalType that they were created with. They have received no expressions filtering the kind of data that they return, and they’ll never have their results enumerated. These query objects are just a kind of clue for the GroupJoin about how to compose the query. They can tell it where the data that it’s looking for is to be found.
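For readers who have not seen the method form, here is a minimal LINQ to Objects sketch showing the same five pieces as arguments to GroupJoin. The Album and Track classes are simplified stand-ins that borrow the property names Name, AlbumName and Tracks from the query above; the Title field and the sample data are made up. A query provider such as LinqToRdf receives the equivalent Queryable.GroupJoin call as an expression tree instead of executing the lambdas directly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the domain classes used in the post.
public class Album
{
    public string Name;
    public IEnumerable<Track> Tracks;
}

public class Track
{
    public string Title;
    public string AlbumName;
}

public static class GroupJoinDemo
{
    public static List<Album> Run()
    {
        Album[] albums = { new Album { Name = "A" }, new Album { Name = "B" } };
        Track[] tracks =
        {
            new Track { Title = "a1", AlbumName = "A" },
            new Track { Title = "a2", AlbumName = "A" },
            new Track { Title = "b1", AlbumName = "B" }
        };

        return albums.GroupJoin(          // 1. the outer sequence ("a in db.Album")
            tracks,                       // 2. the inner sequence ("t in db.Track")
            a => a.Name,                  // 3. key selector for an album
            t => t.AlbumName,             // 4. key selector for a track
            (a, ts) => new Album          // 5. projection receiving the grouped tracks
            {
                Name = a.Name,
                Tracks = ts.ToList()
            }).ToList();
    }
}
```

Each album appears once in the result, with its Tracks collection holding only the tracks whose AlbumName matched, which is the shape the SPARQL translation has to reproduce.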

Here’s how I would guess the SPARQL query would look:

SELECT ?Name ?Title ?GenreName <snip>
WHERE {
    _:a a a:Album .
    _:t a a:Track .
    _:a a:name ?Name .
    _:t a:albumName ?Name .
    OPTIONAL {_:t a: ?Title}
    OPTIONAL {_:t a: ?GenreName}
}

We can get the names for blank nodes _:a and _:t from the parameter collections of the GroupJoin’s parameters 3 and 4 respectively. We know that we will be equating ?Name on _:a and ?Name on _:t since those are the lambda functions provided and that’s the format of the join. The rest of the properties are included in optional sections so that if they are not present it won’t stop the details of the OWL instance coming back. By using

    _:a a:name ?Name.
    _:t a:albumName ?Name .

we achieve the same as equality, since two things that are equal to the same thing are equal to each other. That restricts the tracks to those that are part of an album at the same time.

I’m not sure yet what I will do about the projection, since there is an intermediate task that needs to be performed: to insert the temporary variable ‘tracks’ into the Album object after it has been instantiated. More on that once I’ve found out more.

Easily Bored?

Darren Neimke posted some interesting thoughts today about the way developers lose their drive on a project, and how it’s reflected in SCRUM meetings. He thought that it might be due to the SCRUM meetings themselves. Daniel Crowley-Wilson has another idea – the developers are just bored.

Developers relish challenges and opportunities to do new things, and solve novel problems. As Daniel says, about midway through a project, there is little novelty in the problems left to be solved. At the end there are just the soul-destroying finishing touches, which we all know have to be done, but which we hate doing.

I think that ‘good’ developers (Daniel’s phrase) are a particular breed. They are stimulus hungry people. They tend to quickly become immune to the initial piquancy of stimuli, entertainments or whatever interests them this week. They are not likely to remain interested in a domain or technology for long.

Evidence of this trait can be seen in the amount of staff turnover that most software vendors suffer, or the amount of technological churn that developers tend to create. Those are the negatives – there are also positives, like the rampant pace of forward progress. Developers can (given practice & solitude) sustain a high level of attention on a topic for long periods of time. I think this just exacerbates the problem of their easy boredom in the long run.

Because of the two characteristics of easy boredom and manic singlemindedness, Darren’s solution will probably not solve the problem either – the problem is that they require fresh inputs, both in the job and in the SCRUM meetings. Perhaps your best bet, Darren, is to periodically change the format of the SCRUM meetings, and mix up the teams if you can.

Here’s a simple test to see whether a team member falls into this category – ask them some of the following:

  • how many hobbies have they had
  • how often do they change their desktop backgrounds
  • how frequently do they change jobs
  • how many projects do they have on the go, or up their sleeves
  • how many ideas for killer apps have they had (and not followed up)