
Not another mapping markup language!

Kingsley Idehen has again graciously given LinqToRdf some much-needed link-love. He mentioned it in a post that was primarily concerned with the issues of mapping between the ontology, relational and object domains. His assertion is that LinqToRdf, being an offshoot of an ORM-related initiative, is reversing the natural order of mappings. He believes that in the world of ORM systems, the emphasis should be on mapping from the relational to the object domain.

I think that he has a point, but not for the reason he’s putting forward. I think that the natural direction of mapping stems from the relative richness of the domains being mapped. The impedance mismatch between the relational and object domains stems from (1) the implicitness of meaning in the relationships of relational systems, (2) the representation of relationships, and (3) type mismatches.

If the object domain has greater expressiveness and explicit meaning in relationships, it has a ‘larger’ language than that expressible using relational databases. Relationships are still representable in the relational domain, but their meaning is implicit. For that reason you would have to confine your mappings to those that can be represented in the target (relational) domain. In that sense you get a priority inversion that forces the lowest-common-denominator language to control what gets mapped.

The same form of inversion occurs between the ontological and object domains, only this time it is the object domain that is the lowest common denominator. OWL is able to represent such things as restriction classes and multiple inheritance and sub-properties that are hard or impossible to represent in languages like C# or Java. When I heard of the RDF2RDB working group at the W3C, I suggested (to thunderous silence) that they direct their attentions to coming up with a general purpose mapping ontology that could be used for performing any kind of mapping.

I felt that it would have been extremely valuable to have a standard language for defining mappings. Just off the top of my head I can think of the following places where it would be useful:

  1. Object/Relational Mapping Systems (O/R or ORM)
  2. Ontology/Object Mappings (such as in LinqToRdf)
  3. Mashups (merging disparate data sources)
  4. Ontology Reconciliation – finding intersects between two sets of concepts
  5. Data cleansing
  6. General purpose data access layer automation
  7. Data export systems
  8. Synchronization Systems (i.e. keeping systems like CRM and AD in sync)
  9. mapping objects/tables onto UIs
  10. etc

You can see that most of these are perennial real-world problems that programmers are ALWAYS having to contend with. Having a standard language (and API?) would really help with all of these cases.

I think such an ontology would be a nice addition to OWL or RDF Schema, allowing a much richer definition of equivalence between classes (or groups or parts of classes). Right now one can define a one-to-one relationship using the owl:equivalentClass property. It’s easy to imagine that two ontology designers might approach a domain from such orthogonal directions that they find it hard to define any conceptual overlap between entities in their ontologies. A much more complex language is required to allow the reconciliation of widely divergent models.

I understand that by focusing their attentions on a single domain they increase their chances of success, but what the world needs from an organization like the W3C is the kind of abstract thinking that gave rise to RDF, not another mapping markup language!


Here’s a nice picture of how LinqToRdf interacts with Virtuoso (thanks to Kingsley’s blog).

How LINQ uses LinqToRdf to talk to SPARQL stores

Announcing LinqToRdf v0.8

I’m very pleased to announce the release of version 0.8 of LinqToRdf. This release is significant for a couple of reasons: firstly, because it provides a preview release of RdfMetal, and secondly because it is the first release containing changes contributed by someone other than yours truly. The changes in this instance were provided by Carl Blakeley of OpenLink Software.

LinqToRdf v0.8 has received a few major chunks of work:

  • New installers for both the designer and the whole framework
    WIX was proving to be a pain, so I downgraded to the integrated installer generator in Visual Studio.
  • A preview release of RdfMetal. I brought this release forward a little, on Carl Blakeley’s request, to coincide with a post he’s preparing on using OpenLink Virtuoso with LinqToRdf, so RdfMetal is not as fully baked as I’d planned. But it’s still worth a look. Expect a minor release in the next few weeks with additional fixes/enhancements.

I’d like to extend a very big thank-you to Carl for the work he’s done in recent weeks to help extend and improve the mechanisms LinqToRdf uses to represent and traverse relationships. His contributions also include improvements in representing default graphs and in referencing multiple ontologies within a single .NET class, as well as fixes around the quoting of URIs and the SPARQL that LinqToRdf generates for default graphs. Carl also provided an interesting example application using OpenLink Virtuoso’s hosted version of MusicBrainz that is significantly richer than the test ontology I created for the unit tests and manuals.

I hope that Carl’s contributions represent an acknowledgement by OpenLink that not only does LinqToRdf support Virtuoso, but that there is precious little else in the .NET space that stands a chance of attracting developers to the semantic web. .NET is a huge untapped market for semantic web product vendors. LinqToRdf is, right now, the best way to get into semantic web development on .NET.

Look out for blog posts from Carl in the next day or two, about using LinqToRdf with OpenLink Virtuoso.

State Machines in C# 3.0 using T4 Templates

UPDATE: The original code for this post, that used to be available via a link on this page, is no longer available. I’m afraid that if you want to try this one out, you’ll have to piece it together using the snippets contained in this post. Sorry for the inconvenience – blame it on ISP churn.


Some time back I wrote about techniques for implementing non-deterministic finite automata (NDFAs) using some of the new features of C# 3.0. Recently I’ve had a need to revisit that work to provide a client with a means to generate a bunch of really complex state machines in a lightweight, extensible and easily understood model. VS 2008 and C# 3.0 are pretty much the perfect platform for the job – they combine partial classes and methods, lambda functions and T4 templates, making it a total walk in the park. This post will look at the prototype system I put together. This is a very code-intensive post – sorry about that, but it’s late and apparently my eyes are very red, puffy and panda-like.

State machines are the core of many applications – yet we often find people hand-coding them with nested switch statements and grisly mixtures of state control and business logic. It’s a nightmare scenario that makes code completely unmaintainable for anything but the most trivial applications.

The key objective for a dedicated application framework that manages a state machine is to provide a clean way to break out the code that manages the state machine from the code that implements the activities performed as part of the state machine. C# 3.0 has a nice solution for this – partial types and methods.

Partial types and methods

A partial type is a type whose definition is not confined to a single code module – it can have multiple modules. Some of those can be written by you, others can be written by a code generator. Here’s an example of a partial class definition:

public partial class MyPartialClass{}

By declaring the class to be partial, you say that other files may contain parts of the class definition. The point of this kind of structure is that you might have pieces of code that you want to write by hand, and others that you want driven from a code generator – stuff that gets overwritten every time the generator runs. If your code got erased every time you ran the generator, you’d get bored very quickly. You need a way to chop out the bits that don’t change. Typically, these will be framework or infrastructure code.

Partial classes can also have partial methods. Partial methods allow you to define a method signature in case someone wants to define it in another part of the partial class. This might seem pointless, but wait and see – it’s nice. Here’s how you declare a partial method:

// the code-generated part
public partial class MyPartialClass
{
    partial void DoIt(int x);
}

You can then implement it in another file like so:

// the hand-written part
partial class MyPartialClass
{
    partial void DoIt(int x)
    {
        throw new NotImplementedException();
    }
}

This is all a little abstract, right now, so let’s see how we can use this to implement a state machine framework. First we need a way to define a state machine. I’m going to use a simple XML file for this:

<?xml version="1.0" encoding="utf-8" ?>
<StateModels>
  <StateModel ID="My" start="defcon1">
    <States>
      <State ID="defcon1" name="defcon1"/>
      <State ID="defcon2" name="defcon2"/>
      <State ID="defcon3" name="defcon3"/>
    </States>
    <Inputs>
      <Input ID="diplomaticIncident" name="diplomaticIncident"/>
      <Input ID="assassination" name="assassination"/>
      <Input ID="coup" name="coup"/>
    </Inputs>
    <Transitions>
      <Transition from="defcon1" to="defcon2" on="diplomaticIncident"/>
      <Transition from="defcon2" to="defcon3" on="assassination"/>
      <Transition from="defcon3" to="defcon1" on="coup"/>
    </Transitions>
  </StateModel>
</StateModels>

Here we have a really simple state machine with three states (defcon1, defcon2 and defcon3) as well as three kinds of input (diplomaticIncident, assassination and coup). Please excuse the militarism – I just finished watching a season of 24, so I’m all hyped up. This simple model also defines three transitions, forming a cycle: a diplomaticIncident moves the model from defcon1 to defcon2, an assassination from defcon2 to defcon3, and a coup from defcon3 back to defcon1.

Microsoft released the Text Template Transformation Toolkit (T4) system with Visual Studio 2008. This toolkit has been part of GAT and DSL tools in the past, but this is the first time it has been available by default in VS. It uses an ASP.NET-like syntax for defining templates. Here’s a snippet from the T4 template that generates the state machine:

<#@ template language="C#" #>
<#@ assembly name="System.Xml.dll" #>
<#@ import namespace="System.Xml" #>

<#
    XmlDocument doc = new XmlDocument();
    doc.Load(@"TestHarness\model.xml");
    XmlElement xnModel = (XmlElement)doc.SelectSingleNode("/StateModels/StateModel");
    string ns = xnModel.GetAttribute("ID");
    XmlNodeList states = xnModel.SelectNodes("descendant::State");
    XmlNodeList inputs = xnModel.SelectNodes("descendant::Input");
    XmlNodeList trns = xnModel.SelectNodes("descendant::Transition");
    #>
using System;
using System.Collections.Generic;

namespace <#=ns#> {
public enum <#=ns#>States : int{
<#
string sep = "";
foreach(XmlElement s in states)
    {
    Write(sep + s.GetAttribute("ID"));
    WriteLine(@"// " + s.GetAttribute("name"));
    sep = ",";
    }

#>
} // end enum <#=ns#>States

public enum <#=ns#>Inputs : int{
<#
sep = "";
foreach(XmlElement s in inputs)
    {
    Write(sep + s.GetAttribute("ID"));
    WriteLine(@"// " + s.GetAttribute("name"));
    sep = ",";
    }

#>
} // end enum <#=ns#>Inputs

public partial class <#=ns#>StateModel{

        public <#=ns#>StateModel()
        {
            SetupStates();
            SetupTransitions();
            SetStartState();
        }
...

Naturally, there’s a lot in the template, but we’ll get to that later. First we need a representation of a state. You’ll see from the template that an enum gets generated, called <#=ns#>States. Here’s what it looks like for the defcon model:

public enum MyStates : int {
defcon1// defcon1
,defcon2// defcon2
,defcon3// defcon3
} // end enum MyStates

This is still a bit too bare for my liking. I can’t attach an event model to these states, so here’s a class that can carry around one of these values:

public class State {
    public int Identifier { get; set; }

    public delegate void OnEntryEventHandler(object sender, OnEntryEventArgs e);
    // ...
    public event OnEntryEventHandler OnEntryEvent;
    // ...
}

There’s a lot left out of this, but the point is that as well as storing an identifier for a state, it has events for both entry into and exit from the state. This can be used by the event framework of the state machine to provide hooks for your custom state transition and entry code. The same model is used for transitions:

public class StateTransition {
    public State FromState { get; set; }
    public State ToState { get; set; }

    public event OnStateTransitioningEventHandler OnStateTransitioningEvent;
    public event OnStateTransitionedEventHandler OnStateTransitionedEvent;
}
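Neither snippet shows the event plumbing it relies on. Here’s a minimal sketch of what that might look like – the names OnEntryEventArgs and the transition delegate types are taken from the snippets above, but their definitions (and the OnExitEvent elided from State) are my assumption:

// A sketch of the supporting event types assumed by State and StateTransition.
// Only the type names come from the snippets; the bodies are guesswork.
public class OnEntryEventArgs : EventArgs
{
    // Whatever context you want handlers to see; contents are an assumption.
    public int StateIdentifier { get; set; }
}

public delegate void OnExitEventHandler(object sender, EventArgs e);
public delegate void OnStateTransitioningEventHandler(object sender, EventArgs e);
public delegate void OnStateTransitionedEventHandler(object sender, EventArgs e);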

Here’s the list of inputs that can trigger a transition between states:

public enum MyInputs : int {
diplomaticIncident// diplomaticIncident
,assassination// assassination
,coup// coup
} // end enum MyInputs

The template helps to define storage for the states and transitions of the model:

public static Dictionary<<#= ns#>States, State> states = new Dictionary<<#= ns#>States, State>();
public static Dictionary<string, StateTransition> arcs = new Dictionary<string, StateTransition>();
public State CurrentState { get; set; }

which for the model earlier, will yield the following:

public static Dictionary<MyStates, State> states = new Dictionary<MyStates, State>();
public static Dictionary<string, StateTransition> arcs = new Dictionary<string, StateTransition>();
public State CurrentState { get; set; }

Now we can create entries in these tables for the transitions in the model:

private void SetStartState()
{
    CurrentState = states[<#= ns#>States.<#=xnModel.GetAttribute("start")#>];
}

private void SetupStates()
{
<#
foreach(XmlElement s in states)
    {
    WriteLine("states[" + ns + "States."+s.GetAttribute("ID")+"] =               new State { Identifier = (int)"+ns+"States."+s.GetAttribute("ID")+" };");
    WriteLine("states[" + ns + "States."+s.GetAttribute("ID")+"].OnEntryEvent               += (x, y) => OnEnter_"+s.GetAttribute("ID")+"();");
    WriteLine("states[" + ns + "States."+s.GetAttribute("ID")+"].OnExitEvent               += (x, y) => OnLeave_"+s.GetAttribute("ID")+"(); ;");
    }
#>
}
private void SetupTransitions()
{
<#
foreach(XmlElement s in trns)
    {
    #>
    arcs["<#=s.GetAttribute("from")#>_<#=s.GetAttribute("on")#>"] = new StateTransition
    {
        FromState = states[<#= ns#>States.<#=s.GetAttribute("from")#>],
        ToState = states[<#= ns#>States.<#=s.GetAttribute("to")#>]
    };
    arcs["<#=s.GetAttribute("from")#>_<#=s.GetAttribute("on")#>"].OnStateTransitioningEvent              += (x,y)=>MovingFrom_<#=s.GetAttribute("from")#>_To_<#=s.GetAttribute("to")#>;
    arcs["<#=s.GetAttribute("from")#>_<#=s.GetAttribute("on")#>"].OnStateTransitionedEvent              += (x,y)=>MovedFrom_<#=s.GetAttribute("from")#>_To_<#=s.GetAttribute("to")#>;
    <#
    }
    #>
}

which is where the fun starts. First notice that we create a new state for each state in the model and attach a lambda to the entry and exit events of each state. For our model that would look like this:

private void SetupStates()
{
    states[MyStates.defcon1] = new State {Identifier = (int) MyStates.defcon1};
    states[MyStates.defcon1].OnEntryEvent += (x, y) => OnEnter_defcon1();
    states[MyStates.defcon1].OnExitEvent += (x, y) => OnLeave_defcon1();

    states[MyStates.defcon2] = new State {Identifier = (int) MyStates.defcon2};
    states[MyStates.defcon2].OnEntryEvent += (x, y) => OnEnter_defcon2();
    states[MyStates.defcon2].OnExitEvent += (x, y) => OnLeave_defcon2();

    states[MyStates.defcon3] = new State {Identifier = (int) MyStates.defcon3};
    states[MyStates.defcon3].OnEntryEvent += (x, y) => OnEnter_defcon3();
    states[MyStates.defcon3].OnExitEvent += (x, y) => OnLeave_defcon3();
}

For the transitions the same sort of code gets generated, except that we do some simple work to generate a string key for a specific <state, input> pair. Here’s what comes out:

private void SetupTransitions()
{
    arcs["defcon1_diplomaticIncident"] = new StateTransition {
                 FromState = states[MyStates.defcon1],
                 ToState = states[MyStates.defcon2]
             };
    arcs["defcon1_diplomaticIncident"].OnStateTransitioningEvent                  += (x, y) => MovingFrom_defcon1_To_defcon2;
    arcs["defcon1_diplomaticIncident"].OnStateTransitionedEvent                 += (x, y) => MovedFrom_defcon1_To_defcon2;
    arcs["defcon2_assassination"] = new StateTransition {
                 FromState = states[MyStates.defcon2],
                 ToState = states[MyStates.defcon3]
            };
    arcs["defcon2_assassination"].OnStateTransitioningEvent                += (x, y) => MovingFrom_defcon2_To_defcon3;
    arcs["defcon2_assassination"].OnStateTransitionedEvent                += (x, y) => MovedFrom_defcon2_To_defcon3;
    arcs["defcon3_coup"] = new StateTransition {
                 FromState = states[MyStates.defcon3],
                 ToState = states[MyStates.defcon1]
           };
    arcs["defcon3_coup"].OnStateTransitioningEvent                += (x, y) => MovingFrom_defcon3_To_defcon1;
    arcs["defcon3_coup"].OnStateTransitionedEvent                += (x, y) => MovedFrom_defcon3_To_defcon1;
}

You can see that for each state and transition event I’m adding lambdas that invoke methods that are also being code generated. These are the partial methods described earlier. Here’s the generator:

foreach(XmlElement s in states)
    {
    WriteLine("partial void OnLeave_"+s.GetAttribute("ID")+"();");
    WriteLine("partial void OnEnter_"+s.GetAttribute("ID")+"();");
    }
foreach(XmlElement s in trns)
    {
    WriteLine("partial void MovingFrom_"+s.GetAttribute("from")+"_To_"+s.GetAttribute("to")+"();");
    WriteLine("partial void MovedFrom_"+s.GetAttribute("from")+"_To_"+s.GetAttribute("to")+"();");
    }

Which gives us:

partial void OnLeave_defcon1();
partial void OnEnter_defcon1();
partial void OnLeave_defcon2();
partial void OnEnter_defcon2();
partial void OnLeave_defcon3();
partial void OnEnter_defcon3();
partial void MovingFrom_defcon1_To_defcon2();
partial void MovedFrom_defcon1_To_defcon2();
partial void MovingFrom_defcon2_To_defcon3();
partial void MovedFrom_defcon2_To_defcon3();
partial void MovingFrom_defcon3_To_defcon1();
partial void MovedFrom_defcon3_To_defcon1();

The C# 3.0 spec states that if you don’t choose to implement one of these partial methods then the effect is similar to attaching a ConditionalAttribute to it – it gets taken out and no trace is left of it ever having been declared. That’s nice, because for some state models you may not want to do anything other than make the transition.

We now have a working state machine with masses of extensibility points that we can use as we see fit. Say we decided to implement a few of these methods like so:

public partial class MyStateModel {
    partial void OnEnter_defcon1()
    {
        Debug.WriteLine("Going Into defcon1.");
    }
    partial void OnEnter_defcon2()
    {
        Debug.WriteLine("Going Into defcon2.");
    }
    partial void OnEnter_defcon3()
    {
        Debug.WriteLine("Going Into defcon3.");
    }
}

Here’s how you’d invoke it:

MyStateModel dfa = new MyStateModel();
dfa.ProcessInput((int) MyInputs.diplomaticIncident);
dfa.ProcessInput((int) MyInputs.assassination);
dfa.ProcessInput((int) MyInputs.coup);

And here’s what you’d get:

Going Into defcon2.
Going Into defcon3.
Going Into defcon1.
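One piece the post doesn’t show is ProcessInput itself. Here’s a minimal sketch of how it might work for our model, assuming small helper methods on State and StateTransition that raise their events:

// A sketch of ProcessInput – not shown above, so this is my assumption.
// It looks up the arc for the current <state, input> pair, raises the
// events in order, and moves CurrentState along the arc.
public void ProcessInput(int input)
{
    // Build the same <state>_<input> key used when the arcs table was populated.
    string key = ((MyStates)CurrentState.Identifier).ToString() + "_" + ((MyInputs)input).ToString();
    StateTransition arc = arcs[key]; // throws KeyNotFoundException for an undefined transition
    arc.RaiseTransitioning();        // assumed helper: fires OnStateTransitioningEvent
    CurrentState.RaiseExit();        // assumed helper: fires OnExitEvent
    CurrentState = arc.ToState;
    CurrentState.RaiseEntry();       // assumed helper: fires OnEntryEvent
    arc.RaiseTransitioned();         // assumed helper: fires OnStateTransitionedEvent
}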

There’s a lot you can do to improve the model I’ve presented (like passing context info into the event handlers, and allowing some of the event handlers to veto state transitions). But I hope it shows how partial-method support in conjunction with T4 templates makes light work of this perennial problem. This could easily save you from writing thousands of lines of tedious and error-prone boilerplate code. That, for me, is a complete no-brainer.

What I like about this model is the ease with which I was able to get code generation working. I just added a file with the extension ‘.tt’ to VS 2008 and it immediately started generating C# from it. All I needed to do at that point was load up my XML file and feed it into the template. I like the fact that the system is lightweight: there isn’t a mass of framework taking over the state management, it’s infinitely extensible, and it allows a very quick turnaround time on state model changes.

What do you think? How would you tackle this problem?

Announcing LinqToRdf v0.6

I’ve just uploaded LinqToRdf v0.6 with improved designer support for Visual Studio .NET 2008.

The release includes the following high-points:

  • LinqToRdf Designer and VS.NET 2008 extension completely rewritten
  • LinqToRdf Installer now includes the installer of LinqToRdf Designer (at no extra cost)
  • Project and Item templates now installed as part of LinqToRdf Designer
  • Generated object and data properties now get their own EntitySet or EntityRef.
  • Generates LINQ to SQL-style DataContext objects to hide query creation. Much Cleaner.

The user experience for LinqToRdf should be greatly improved in this release. I focused on getting project and item templates set up that allow you either to create a dedicated LinqToRdf project with all the assembly references set up for you, or to create a new LinqToRdf designer file that generates C# code based on the new attribute model introduced a few versions back.

The VS.NET extensions are not installed by default; instead they are placed in the LinqToRdf directory. If you do install them, you will find that Visual Studio now has a new LinqToRdf project type.

[Screenshot: the new LinqToRdf project type in the New Project dialog]

You also have the LinqToRdf designer file type, which has been around for a version or two:

[Screenshot: the LinqToRdf designer file item template]

The Solution view is like this:

[Screenshot: Solution Explorer showing a LinqToRdf project]

The designer view is the same as ever:

[Screenshot: the LinqToRdf designer surface]

Things are coming along, and the download stats for version 0.4 were actually quite healthy (at least I think they were), so I expect this version to be the most popular yet.

Expect to see the lazy-loading relationship representation process fully documented in the coming days.

Enjoy.

Functional Programming in C# – Higher-Order Functions

  1. Functional Programming – Is it worth your time?
  2. Functional Programming in C# – Higher-Order Functions

This is the second in a series on the basics of functional programming using C#. My topic today is one I touched on last time, when I described the rights and privileges of a function as a first class citizen. I’m going to explore Higher-Order Functions this time. Higher-Order Functions are functions that themselves take or return functions. Meta-functions, if you like.

As I explained last time, my programming heritage is firmly in the object-oriented camp. For me, the construction, composition and manipulation of composite data structures is second nature. A higher-order function is the equivalent from the functional paradigm. You can compose, order and recurse a tree of functions in just the same way as you manipulate your data. I’m going to describe a few of the techniques for doing that using an example of pretty printing some source code for display on a web site.

I’ve just finished a little project at Readify allowing us to conduct code reviews whenever an interesting code change gets checked into our TFS servers. A key feature of that is pretty-printing the source before rendering it. Obviously, if you’re displaying XHTML on an XHTML page, your browser will get confused pretty quickly unless you take steps to HTML-escape anything that might be interpreted as markup. The examples I’ll show will highlight the difference between the procedural and functional approaches.

This example shows a fairly typical implementation that takes a file that’s been split into lines:

public static string[] RenderLinesProcedural(string[] lines)
{
    for (int i = 0; i < lines.Count(); i++)
    {
        lines[i] = EscapeLine(lines[i]);
    }
    return lines;
}

public static string EscapeLine(string line)
{
    Debug.WriteLine("converting " + line);
    return line.Replace("  ", "&nbsp;&nbsp;")
        .Replace("\t", "&nbsp;&nbsp;")
        .Replace("<", "&lt;")
        .Replace(">", "&gt;");
}

There are a few things worth noticing here. In C#, strings are immutable. That means that whenever you think you are changing a string, you’re not. In the background, the CLR is constructing a modified copy of the string for you. The array of strings, on the other hand, is not immutable, so a legitimate procedural approach is to make an in-place modification of the original collection and pass that back. The EscapeLine method repeatedly makes modified copies of the line string, passing back the last copy.

Despite C# not being a pure functional programming language[1], it’s still doing a lot of copying in this little example. My early impression was that pure functional programming (where all values are immutable) would be inefficient because of all the copying going on. Yet here is a common-or-garden object-oriented language that uses exactly the same approach to managing data, and we all use it without a qualm. In case you didn’t know, StringBuilder is what you should be using if you need to make in-place modifications to strings.
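For instance, a quick sketch of the same escaping done with StringBuilder’s in-place Replace (it lives in System.Text):

// StringBuilder.Replace mutates the underlying buffer rather than
// allocating a new string for every call.
StringBuilder sb = new StringBuilder(line);
sb.Replace("  ", "&nbsp;&nbsp;")
  .Replace("\t", "&nbsp;&nbsp;")
  .Replace("<", "&lt;")
  .Replace(">", "&gt;");
string escaped = sb.ToString();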

Let’s run the procedural code and record what happens:

private static void TestProcedural()
{
    string[] originalLines = new string[] { "<head>", "</head>" };
    Debug.WriteLine("Converting the lines");
    IEnumerable<string> convertedStrings = RenderLinesProcedural(originalLines);
    Debug.WriteLine("Converted the lines?");

    foreach (string s in convertedStrings)
    {
        Debug.WriteLine(s);
    }
}

Here’s the output:

[Debug output: every “converting …” line appears before the “Converted the lines?” marker – conversion happened eagerly]

As you can see, the lines all got converted before we even got to the “Converted the lines?” statement. That’s called ‘eager evaluation’, and it certainly has its place in some applications. Now let’s use higher-order functions:

public static IEnumerable<string> RenderLinesFunctional(IEnumerable<string> lines)
{
    return lines.Map(s => EscapeString(s));
}

static IEnumerable<R> Map<T, R>(this IEnumerable<T> seq, Func<T, R> f)
{
    foreach (var t in seq)
        yield return f(t);
}

static string EscapeString(string s)
{
    Debug.WriteLine("converting " + s);
    return s.Replace("  ", "&nbsp;&nbsp;")
        .Replace("\t", "&nbsp;&nbsp;")
        .Replace("<", "&lt;")
        .Replace(">", "&gt;");
}

private static void TestFunctional()
{
    string[] originalLines = new string[] { "<head>", "</head>" };
    Debug.WriteLine("Converting the lines");
    IEnumerable<string> convertedStrings = RenderLinesFunctional(originalLines);
    Debug.WriteLine("Converted the lines?");

    foreach (string s in convertedStrings)
    {
        Debug.WriteLine(s);
    }
}

This time the output looks different:

[Debug output: the “converting …” lines now appear after the “Converted the lines?” marker, interleaved with the printed results – conversion happened lazily]

At the time the “Converted the lines?” statement runs, the lines have not yet been converted. This is called ‘lazy evaluation’[2], and it’s a powerful weapon in the functional armamentarium. For the simple array I’m showing here, the technique looks like overkill, but imagine you were using a paged control on a big TFS installation like Readify’s TFSNow. You might have countless code reviews going on. If you rendered every line of code in all the files being viewed, you would waste both processor and bandwidth resources needlessly.

So what did I do to change the way this program worked so fundamentally? The main thing was to opt to use the IEnumerable interface, which gave me the scope to provide an alternative implementation of the collection. In the procedural example, the return type was a string array, so I was bound to create and populate the array before returning from the function. That’s a point worth highlighting: use iterators as return types where possible – they allow you to mix paradigms. Converting to IEnumerables is not enough, though. I could change the signature of RenderLinesProcedural to use iterators, but it would still use eager evaluation.

The next thing I did was use the Map function to return a functional iterator rather than a concrete object graph, as was done in the procedural example. I created Map here to demonstrate that there was no funny LINQ business going on in the background. In most cases I would use the Enumerable.Select() extension method from LINQ to do the same thing. Map is a function that is common in functional programming; it allows the lazy transformation of a stream or collection into something more useful. Map is the crux of the transformation – it allows you to insert a function into the simple process of iterating a collection.
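For comparison, here’s the same render function written against LINQ’s built-in operator – behaviourally identical to the Map version:

// Select is LINQ's name for Map; requires 'using System.Linq;'.
public static IEnumerable<string> RenderLinesLinq(IEnumerable<string> lines)
{
    return lines.Select(s => EscapeString(s));
}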

Map is a Higher-Order Function, it accepts a function as a parameter and applies it to a collection on demand. Eventually you will need to deal with raw data – such as when you bind it to a GridView. Till that point you can hold off on committing resources that may not get used. Map is not the only HOF that we can use in this scenario. We’re repeatedly calling String.Replace in our functions. Perhaps we can generalise the idea of repeatedly calling a function with different parameters.

static Func<T, T> On<T>(this Func<T, T> f, Func<T, T> g)
{
    return t => g(f(t));
}

This method encapsulates the idea of composing functions. I’m creating a function that returns the result of applying the inner function to an input value of type T, and then applying the outer function to the result. In normal mathematical notation this would be represented by “g ∘ f”, meaning g applied after f. Composition is a key way of building up more complex functions. It’s the linked list of the functional world – well, it would be if the functional world were denied normal data structures, which it isn’t. :P

Notice that I’m using an extension method here, to make it nicer to deal with functions in your code. The next example is just a test method to introduce the new technique.

private static void TestComposition()
{
    var seq1 = new int[] { 1, 3, 5, 7, 11, 13, 19 };
    var g = ((Func<int, int>)(a => a + 2)).On(b => b * b).On(c => c + 1);
    foreach (var i in seq1.Map(g))
    {
      Debug.WriteLine(i);
    }
}

TestComposition uses the ‘On’ extension to compose functions into more complex functions. The actual function is not really that important, the point is that I packaged up a group of functions to be applied in order to an input value and then stored that function for later use. You might think that that’s no big deal, since the function could be achieved by even the most trivial procedure. But this is dynamically composing functions – think about what you could do with dynamically composable functions that don’t require complex control logic to make them work properly. Our next example shows how this can be applied to escaping strings for display on a web page.

void TestComposition2()
{
    var strTest = @"<html><body>hello world</body></html>";
    string[][] replacements = new[]
        {
            new[]{"&", "&amp;"},
            new[]{"  ", "&nbsp;&nbsp;"},
            new[]{"\t", "&nbsp;&nbsp;"},
            new[]{"<", "&lt;"},
            new[]{">", "&gt;"}
        };

    Func<string, string> f = x => x;
    foreach (string[] strings in replacements)
    {
        var s0 = strings[0];
        var s1 = strings[1];
        f = f.On(s => s.Replace(s0, s1));
    }

    Debug.WriteLine(f(strTest));
}

This procedure is again doing something quite significant – it’s taking a data structure and using it to guide the construction of a function that performs data-driven processing on other data. Imagine that you took this from config data or a database somewhere. The function that gets composed is a fast, directly executable, encapsulated, interface-free, type-safe, dynamically generated unit of functionality. It has many of the benefits of the Gang of Four Strategy pattern[3].

The techniques I’ve shown in this post demonstrate some of the power of the functional paradigm. I described how you can combine higher order functions with iterators to give a form of lazy evaluation. I also showed how you can compose functions to build up fast customised functions that can be data-driven. I’ve also shown a simple implementation of the common Map method that allows a function to be applied to each of the elements of a collection. Lastly I provided a generic implementation of a function composition mechanism that allows you to build up complex functions within a domain.

Next time I’ll introduce the concept of closure, which we’ve seen here at work in the ‘On’ composition function.

Some references:

1. Wikipedia: Pure Functions

2. Wikipedia: Lazy Evaluation

3. Wikipedia: Strategy Pattern

Functional programming – Is it worth your time?

Short Answer: Yes!

Regular readers of The Wandering Glitch know I’ve focused lots of attention on LINQ and the new wave of language innovation in C# 3.0. I’m intrigued by functional programming in C#. At university, I focused on languages like C, C++, Eiffel and Ada. I never since needed to learn functional programming techniques – who uses them, after all? Functional programming had always seemed like a distant offshoot of some Bourbakiste school of mathematical programming, unconcerned with practical issues of software development. Don’t get me wrong – I find that attractive, but it was always hard to justify the time when there was so much else of practical worth that I needed to study. So the years passed, and I never came near. Functional programming was suffering from bad PR. But times change.

A fundamental change is under way in how we develop software. Declarative, functional, model-driven, aspect-oriented and logic programming are all examples where new ways of representing and solving problems can pay huge dividends in programmer productivity and system maintainability. Suddenly, it no longer seems that functional programming is a means to try out obscure new forms of lambda calculus. Now it seems that there are fast, powerful, easy-to-understand techniques to be learnt that will make my systems more robust and smaller.

I regretted not learning functional programming – I felt that there were ideas I was missing out on, and that made me envious. So now is as good a time as any to address that deficiency. Another deficiency I want to address is the dearth of posts on the Glitch. I got tied up producing a SPARQL tutorial for IBM, which swallowed up my evenings. After that I had in mind to pursue an idea for a blog post on the relationships between LINQ and meta-mathematical structures like groups and categories. I got a major dose of intellectual indigestion, which stopped me from producing anything. The only way I’ll get productive again is to break the topics I want to cover into bite-sized chunks. That’s enough apologia – here’s the post.

Functional programming is probably simpler than you think. It’s based on the idea that there is often very little distinction between programs and data. Consider this function ‘f’:

f(x): x + 5

This function ‘f’ adds five to whatever you pass into it. What do I mean when I say ‘f’? I’m talking about the function itself, not using it. It came completely naturally for you to go along with me and describe the function ‘f’ as a thing. Here’s what I mean:

  g(f, x): f(x) + 7

This function ‘g’ adds 7 to the result of calling ‘f’ on x. So the final result would be ‘(x + 5) + 7’. You see, that wasn’t really a complex concept at all. Yet that’s the essence of functional programming. To put it another way:

Functions are first class citizens.

Which means that:

  • They can be named by variables.
  • They can be passed as arguments to procedures.
  • They can be returned as values of procedures.
  • They can be incorporated into data structures. [1]

It should also mean that you can compose your own functions, as I did with ‘f’ and ‘g’ earlier. Another, possibly less vital, feature needed to empower this charter for the rights and privileges of functions is the ‘lambda’ (or λ) function. A lambda function is simply a way to create a function on the fly, without having to give it a name. Compare this C# function:

int f(int x){return x + 5;}

With this one:

int f(int x)
{
  int c = 5;
  return x + c;
}

They both perform the same function, but the second one pointlessly created a name for the value ‘5’. The first example got by perfectly well without having to give a name to the value it was working with. Well, the same principle applies to lambda functions. Here’s a C# example that does what ‘g’ did above:

int g(Func<int, int> f, int x){return f(x) + 7;}

The ‘Func<int, int> f’ syntax is a new piece of C#, used to indicate that f is a function that takes a single int and returns an int. You can probably already see that this function ‘g’ could be used with many different functions, but sometimes we don’t want to exercise our right to name those functions with variables. To just create a function without naming it (to use an ‘anonymous function’, in .NET parlance) you use the new lambda syntax in C# 3.0:

int x = 3;
int z = g(y => y + 5, x);

‘g’ gets an anonymous function and an integer as parameters, runs the function with the parameter, adds 7 to what comes out of the function, and then returns the result. Pretty cool. We’ve exercised our second right – to pass functions into procedures. What about the first right? Well, we sort of already had that with parameter ‘f’ in the function ‘g’ earlier. Let’s look at another example:

int Foo()
{
    Func<int, int> bar = y => y + 5;
    // …
    return bar(56);
}

We’ve kept our function around in a format that is very flexible. It hovers in a middle ground between program and data. If, like me, you have a procedural and imperative heritage – you regard anything that you can store, return and pass around as data. But when you can run that data as code, then the lines begin to get a little blurred.

The next right that we need to claim is the ability to return functions as values. We have all the machinery needed to do that now. If we can pass something into a function, then we could pass it straight out again. If we can create lambdas we can return them rather than use them or pass them into other functions. Here’s an example based on the function ‘g’ earlier:

Func<int, int> H()
{
    return (int a) => a + 7;
}

This is powerful – rather than give you the result of adding a number to some value you pass in, this function gives you a function that you can run when you need it. You don’t need to know what the function does, just how to invoke it. Sounds like a perfect recipe for business rules. Obviously, adding numbers like that is trivial, but the principle can be applied to functions of great complexity. This can be lazy too – you can provide a function that calculates the result when you need it and not before. Think of LINQ to SQL queries, which don’t incur the expense of hitting the DB until necessary.
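Using ‘H’ is then just a matter of storing the returned function and invoking it when you finally want the answer:

// Nothing is computed until the returned function is invoked.
Func<int, int> addSeven = H();
int result = addSeven(10); // 17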

The last right needed to be a first class functional citizen is also achieved through the capabilities that have been explained already (in the case of C# at least). If we can create a function and assign it to a variable, then we can do the same to a compound data structure. Here’s a slightly more elaborate example (thanks to Paul Stovell for the idea):

public class MySwitcher<T, R>
{
    Func<T, bool> Pred { get; set; }
    Func<T, R> Iffer { get; set; }
    Func<T, R> Elser { get; set; }

    public MySwitcher(Func<T, bool> pred,
        Func<T, R> iffer,
        Func<T, R> elser)
    {
        Pred = pred;
        Iffer = iffer;
        Elser = elser;
    }

    public R Run(T input)
    {
        if (Pred(input))
            return Iffer(input);
        return Elser(input);
    }
}

This class keeps two functions around for later use, along with a predicate function (a function that returns a yes/no answer) that decides which of the two to apply to a given piece of data. This could be used, for example, in a UI to decide between different ways to filter or render data based on some criteria.
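Here’s a hypothetical usage – the predicate and the two rendering functions are made up purely for illustration:

// Choose between two rendering strategies based on a predicate.
var switcher = new MySwitcher<int, string>(
    n => n < 1000,                        // Pred: small enough to show as-is?
    n => n.ToString(),                    // Iffer: plain rendering
    n => string.Format("{0:N0}", n));     // Elser: add thousands separators

string small = switcher.Run(42);      // "42"
string large = switcher.Run(42000);   // "42,000"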

I hope this very simple introduction shows you that not only does C# (and .NET 3.5 generally) now support functional programming, but that the arsenal of the functional programmer is very small and easy to learn. Next time around I hope to show you just how powerful these simple techniques can be.

[1] Abelson & Sussman: the structure and interpretation of computer programs. 2ed. MIT Press. 1998.

Announcing LinqToRdf 0.3 and LinqToRdf Designer 0.3

The third release of LinqToRdf has been uploaded to GoogleCode. Go to the project web site for links to the latest release.

LinqToRdf Changes:
– support for SPARQL type casting
– numerous bug fixes
– better support for identity projections
– more SPARQL relational operators
– latest versions of SemWeb & SPARQL Engine, incorporating recent bug fixes and enhancements of each of them

I have also released a new graphical designer to auto-generate C# entity models as well as N3 ontology specifications from UML-like designs. This new download is an extension to Visual Studio 2008 beta 2, and should make working with LinqToRdf easier for those who are not that familiar with the W3 Semantic Web specifications.

Please let me know how you get on with them.

Continue if not null operator? Yes please!

Olmo made a very worthwhile suggestion on the LINQ forums recently. His suggestion was for a new operator to be added to the C# language to allow us to do away with the following kind of pesky construct:

string x;
if(a != null && a.Address != null && a.Address.FirstLine != null)
    x = a.Address.FirstLine;

Instead he suggested a new ?. operator, so that we could produce something like this:

x = a?.Address?.FirstLine;

and at the first sign of a null it would just yield null, or the equivalent. I like this suggestion: it’s elegant, and it provides a similar meaning to the null-coalescing operator. The C# team have been adding lots of these little bits of syntactic sugar recently, so why not add another that will save us lots of thrown exceptions and grisly if statements?
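In the meantime, one common way to approximate the idea is with a small extension method – a sketch only, not part of the BCL:

// A null-propagating helper: yields null as soon as any link in the chain is null.
public static TResult IfNotNull<T, TResult>(this T t, Func<T, TResult> f)
    where T : class
    where TResult : class
{
    return t == null ? null : f(t);
}

// usage:
// x = a.IfNotNull(v => v.Address).IfNotNull(v => v.FirstLine);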

LinqToRdf now works on Visual Studio 2008 Beta 2

I should have brought the code up to date weeks back – but other things got in the way. Still – all the unit tests are in the green, and the code has been minimally converted over to the new .NET 3.5 framework. I say ‘minimally’ because with the introduction of Beta 2 there is now an IQueryProvider interface that seems to be a dispenser for objects that support IQueryable. I suspect that with IQueryProvider there is now a canonical architecture recommended by the LINQ team. Probably that will mean moving more responsibility into the RDF<T> class, away from the QuerySupertype. Time (and more documentation from MS) will tell.

There are several new expression types that are not yet supported (such as the coalescing operator on nullable types) – it remains to be seen whether they are supportable in SPARQL at all. Further research required. The solution doesn’t currently support WIX – I’m not sure whether WIX 3 will work with VS 2008 yet. Again, more research required. What that means is that there will not be any MSI releases produced till WIX supports the latest drop of VS.NET.

Enjoy – and don’t forget to give us plenty of feedback on your experiences.

To go to the Google Code project, click here.

Dynamic Strongly-Typed Configuration in C#

    I’ve written at great length in the past about the perils of configuration, and I thought I’d written as much as I was willing on the topic. But this solution seemed worth describing, since it was so neat and easy, and had most of the benefits of text-based configuration and strongly typed inline configuration. I was recently messing about with some WCF P2P code, and the setup code had some configuration that looked like a likely candidate for a strongly typed configuration object that wouldn’t change frequently. I think this solution neatly addresses one of the main objections to hard-coded configuration, which is that we do sometimes need to change configuration data at runtime without having to take down the servers or recompile them.

    The idea behind this solution stems from the use of a plug-in architecture, such as the forthcoming System.AddIn namespace due to arrive in VS2008. With that, you get the option to load an assembly from a designated directory and make use of types found inside it. Why not use the same approach with configuration? We can dynamically load configuration assemblies and then use a single configuration setting to specify which type from those assemblies should be used as the new configuration. This has all the benefits normally reserved for text-based dynamic configuration, such as System.Configuration.ConfigurationManager, but with the added benefits of strong typing, inheritance, calculated configuration settings and the performance of POCOs.

    My WCF program was a simple chat client that I hope to be able to use between members of my family. Typical configuration settings were MeshAddress and CredentialType, which are unlikely to change frequently. Each of these settings was defined on an interface called IChatClientConfig. Implementing that full interface was my default configuration class, DefaultChatConfig. That provided all of my defaults, and is perfectly usable. I then specialized that class with some others, for example with a different mesh address for chatting with people at work. A class diagram for the configuration objects is shown below.

    [Class diagram: DefaultChatConfig implements IChatClientConfig, with subclasses such as TechChatConfig overriding individual settings]

    Each subclass just provides a new implementation for any property whose value it needs to change.

    Loading the configuration is extremely simple. First you have to say which one of those classes you want to use for your configuration.

    <appSettings>
        <add key="P2PConfigSettings" value="ChatClient.Configuration.TechChatConfig, ChatClient.Configuration, Version=1.0.0.0"/>
    </appSettings>
    

    This simple app setting is the fully qualified type name of the TechChatConfig class on the bottom right of the diagram above, which will be the default chat configuration with whatever tech-chat configuration is added. That’s all the prerequisites for loading configuration. Now all I need to do to load the configuration is this:

    private static IChatClientConfig GetConfigObject()
    {
        string configType = ConfigurationManager.AppSettings["P2PConfigSettings"];
        Type t = Type.GetType(configType);
        return Activator.CreateInstance(t) as IChatClientConfig;
    }
    

    Get whatever type I specified as a string from the configuration file, get the type specified by that string, create an instance, and return it. Simple. That configuration object could then be stored as a singleton or whatever you need to do.
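    If you did want a singleton, a minimal sketch might look like this (the ConfigSingleton class is my own invention, wrapping the GetConfigObject method above and assuming it’s made accessible):

    public static class ConfigSingleton
    {
        private static IChatClientConfig instance;

        // Lazily load the configuration on first access.
        public static IChatClientConfig Instance
        {
            get { return instance ?? (instance = GetConfigObject()); }
        }

        // Drop the cached instance so the next access reloads it.
        public static void Reset() { instance = null; }
    }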

    [ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
    public partial class Window1 : IPeerChat
    {
        private IChatClientConfig config;

    In my case I just stored it in the window object I was using it for – my chat client only has one window! Now I can just use it whenever I do any comms.

    private NetPeerTcpBinding CreateBindingForMesh()
    {
        NetPeerTcpBinding binding = new NetPeerTcpBinding();
        binding.Resolver.Mode = config.PeerResolverMode;
        binding.Security.Transport.CredentialType = config.CredentialType;
        binding.MaxReceivedMessageSize = config.MaxReceivedMessageSize;
        return binding;
    }
    

    So you see that the process is very simple. With the addition of an AddIn model we could use a file system monitor to watch the configuration file, detect changes, and reload the configuration object singleton using the mechanism described above. That fulfils most of the requirements we have for type safety, performance, dynamism, intelligence and object orientation. Very few configuration scenarios that fall outside the bounds of this solution should be solved using local configuration settings anyway – in those cases you really ought to be looking at an administration console and a database.
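    A sketch of that watcher idea, building on the hypothetical ConfigSingleton above:

    // Reload configuration when the config file changes. A sketch only –
    // production code would debounce duplicate events and guard against partial writes.
    FileSystemWatcher watcher = new FileSystemWatcher(
        AppDomain.CurrentDomain.BaseDirectory, "*.config");
    watcher.Changed += (s, e) => ConfigSingleton.Reset();
    watcher.EnableRaisingEvents = true;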