Code Generation Patterns – Part 1

This time I'm going to show you the design pattern that I use for code generation. Generally I use this for analysis of hierarchical structures, such as assemblies or database metadata. You could use it in other scenarios, where data is less structured. If you are familiar with the operation of a SAX parser and .NET events then you'll have no trouble understanding the design.

SAX-like Parser

A SAX parser works by iterating through its material (in this case XML, but that's immaterial) firing off events for everything that it finds. It doesn't do anything with what it finds, it just acts as an event generator that embodies some knowledge of the internal structure of the incoming material. It is up to whoever is listening to the events to do something with them. In the case of a SAX parser an XML DOM tree is probably built up, but not necessary, and that is the point. You have the option to tailor your processing of the sequence of events according to your requirements.

event scanner & generator

In my design pattern the objective is to decouple event generation from event handling. As far as the event handler is concerned, it is the order of events and their content that matters, not where they originated from. Likewise for the scanner it is not important how the data is handled, only that it navigates through the data properly. So we decouple and encapsulate the two aspects of the process and enable the enhancement and extension of the system easily.

multicast events

The use of events means that you can multicast events frm the scanner to as many handlers as you need.

typed events vs overloaded events

A natural question that occurs to me when I liken the pattern I use for code generation with that of a SAX parser is why I use a plethora of events with typed parameters rather than an overloaded interface with a single event and event parameter type. With SAX events come in many shapes and sizes and are sent out through a single event interface. In my design I have chosen an interface and a set of events that represent each of the possible entities that we are interested in. The interface for the assembly scanner looks like this:

In a previous project I used the delegate/event interface as a common language between a set of components of an ORM system. The data initially made its way into the system through a reverse engineering database metadata schema reader. Apart from being a mouthful to say, it read the structure of a SQL Server database, and generated a set of events that were multicast in the ways I've described above. One of the listeners built an in-memory model of the schema. Later on we needed to persist this model to disk as a configuration file that would be used at runtime as the class map of the ORM (if this intrigues you, goto Scott Ambler's web site for some very influential papers). So when the ORM service started up it loaded the configuration file and read through that firing off exactly the same events. The same listener was stationed on the other end of the event interface, so I was able to guarantee that the class map model was brought into memory in the same way as was used for the code generation cycle. My point is that this typed event interface acts as a common language between components participating in the data's lifecycle. If provided a natural depth first search model for passing data about in a way that allowed on the fly processing. It also allowed a complete decoupling of the data traversal mechanism from the data handling, the sequence of events was all that mattered to my code generation system, not the format on disk or even in memory – for all it cared the data could be stored as metadata in an RDBMS or nodes in an XML file or DOM. The decoupling is total, but not at the expense of vagueness as can be the case with catch all interfaces.

public class AssemblyScannerEventNotifier : MarshalByRefObject { public delegate void NewAssemblyHandler(object sender,   NewAssemblyEventArgs e); public event NewAssemblyHandler newAssembly; public void NotifyNewAssembly(Assembly assembly) { NewAssemblyEventArgs e = new NewAssemblyEventArgs(assembly); if (newAssembly != null) newAssembly(this, e); } }

The EventArg object simply carries the assembly that has been found.

public class NewAssemblyEventArgs : EventArgs { private Assembly theAssembly = null; public Assembly TheAssembly { get{return theAssembly;} } public NewAssemblyEventArgs(Assembly assembly) { theAssembly = assembly; } }

What this allows is for us to multicast the event to a number of different interested parties. Generally that will be a small set including the code generator or some intermediate party, plus the test harness to provide a progress bar.

Joe Satriani


I have never seen anybody do such unnatural things with a guitar. I mean he was fiddling with it in ways that seemed bound to produce an awful cacophony, but somehow in his hands, the guitar produced a gorgeous sweet complex sound, that complemented the rest of the band perfectly. Wow.

I have to say, though, that if we were playing Star Qualities, he would definitely be “50% Vin Diesel, 50% Gerard O’Donnell” For those of you in the know, that is a pretty weird combination.

My brother and I once watched a busker play the guitar in Covent Garden. He had been a graduate of the Royal College of Arts, and said that he had been a classical guitarist but the technique for the guitar was too restrictive. He started to play the electric guitar in a way that allowed him to use both hands to make notes. The guitar was turned upside down so that the head of the guitar sat in his lap, and the pickups rested against his shoulder. The main body of the guitar was removed, so he was left with the strings and the pickups and not a lot else. He played the guitar by tapping the fretboard with his fingertips. I remember that he was able to play delicate Bach Fugues with ease. I remember this guy, because I remember that he had taken the guitar beyond its limitations to achieve the sound that he wanted to make. Joe Satriani does the same, but without having to abandon standard guitar technique, when he needed it. Wow.

Which codegen technique?

Now that I’ve come up with a few requirements, and shown you how I am going to partition the problem, I think it might be about time to discuss some of our technological options.

I said previously that I had a serviceable NVelocity wrapper, that I could use for the code generation. I have other options too, though, and some of them might be better in the long run, than sticking with NVelocity. So I weighed them up, and found that there were many conflicting factors.

My codegen options are:

  • Write code to a text file using Stream.WriteLine
  • Templating language (such as NVelocity)
  • XSLT transformation
  • CodeDOM code emission

I’m sure there are more options than these, but these are my main four. I judged them using the following criteria:

  • Speed
  • Ease of use
  • Flexibility
  • Readability
  • Debugability

My comparison is here. You can see that, as with most design decisions, there is a trade off to be made between speed and ease of development versus power and flexibility. With these sorts of decisions (especially when I know there are architectural factors involved) I tend to defer them until the last moment, or until something else places the casting vote. Of course, in the wider world of work, you normally get this sort of decision made for you by a pointy-haired marketing manager, who uses an etch-a-sketch as a laptop. Still, here I have the freedom to make my own decisions, and coolness and power will definitely win out over speed to market, or understandability.

Rant Over.

Apart from the CodeDOM option, I have used all of these techniques many times over the years. I’ve even used VIM macros to generated code. I’ve transformed XSD & DTD files into C++ serialization wrappers, and with Velocity and NVelocity I have done tons of code generation and generated literally millions of lines of code. The point is that the first three are tried and tested, with templates being my personal favourite. CodeDOM is not so dissimilar to systems I have used for HTML generation and XML tree navigation, so I guess it to is just another form of something familiar. Does that mean it doesn’t win on the cool front? Fraid so. When it comes down to it, there are a few optimizations that can mitigate the startup, processing and resource management costs of using a template interpreter language such as NVelocity. The power and flexibility that we get in return is hard to refuse.

As you probably guessed I am going to opt for NVelocity, since I have a perfectly good system that I don’t want to write again. I would like to have a framework to place it in that I can use with other code generation techniques if they seem worthwhile. Next time I will describe my code generation patterns and show how we can develop a limitless array of code generation handlers to allow people to use my system with XSLT or whatever. Later on I may even need to use this in conjunction with CodeDOM to perform dynamic proxy generation in a way similar to ServicedObjects in COM+.

What to expect from a DBC framework

Last time I said I’d explore the requirements on a DBC solution in a little more detail. I first experienced design by contract when I was studying object oriented design at Brighton University, UK. I was studying it under Richard Mitchell, author of “Design by Contract, by Example”. The curriculum was taught using Eiffel. Eiffel was (and is still is, as far as I know) the only programming language that has native support for design by contract. A little googling brings up quite a few links on DBC. One thing is clear from a cursory glance; DBC means different things to different people. To some it means prepending and appending your methods with guard statements like this:

public void foo() { if(this.Bar < 1) throw new ApplicationException("Bar < 1 failed"); }

This is certainly a step in the right direction. It focuses your attention on what the environment of a component is like, and what the right “operating envelope” is for the correct function of a method. You do this when you define unit tests to explore the limits of a method, seeking ways to break it. Some argue that if you have done your job properly during the testing phase, then DBC wouldn’t be necessary.

Others argue that such guard statements are against the tenets of defensive programming, but of course your part in the contract is to make sure that the code never reaches my component unless you are sure that the guards will not be triggered. So a defensive programmer would put guards around the method invocation. Naturally, I can’t let my guards down, since as a developer I can’t assume that you have done everything that you ought to do to use my component. So the checks are to stay in place, as a defensive measure against people who don’t practice defensive programming themselves. Generally you can’t handle argument errors at the point of invocation, so it makes sense to send the exception straight back to the caller.

Design By Contract is more than just a way of trapping and reporting errors in component usage. I want to be able to make a declaration like this on an interface, and have all implementers and users of the interface abide by the rules.

[Invariant(Balance >= OverdraftLimit)] interface IBankAccount { public float Balance{...} public float OverdraftLimit{...} }

This sort of programming is more about making declarations about what you want to happen – declarative programming. It’s about having declarations that have some weight. Sadly, in languages like C#, there is no intrinsic mechanism that enforces contracts like the one above, so the purpose of this project is to find a way to enforce such rules without intruding too much into the everyday routines of a programmer. It should just work.

If you look closely at my interface definition above, you’ll see that there is more going on. The interface is setting some semantic rules. Normally, an interface is a syntactic contract between component developers and consumers. But here we are able to go a step beyond that, into the realm of semantics If I produce a contract like this:

interface IBankAccount { [Require(val > 0.0)] [Ensure($after(Balance) == $before(Balance) + val)] public float CreditAccount{float val}  public float Balance{...} }

I’m not just declaring how you make calls into my method. I’m telling you what the call achieves. Previously I was only able to say what the method was called, what parameters it took, what order they were in, and what type the result was. Vital stuff, to be sure, but not useful without a specification or some sort of API documentation. With DBC I can tell you what happens when you call my method, what kinds of data should be in the parameters, and what changes in the state data after the method call. I’m describing the effects of the process, without describing the algorithm. That’s exactly what we’re after with an interface. We have managed to extend the interface to carry the sort of information that it should have been carrying to begin with! Wow! Now you should be able to see why I’m writing a blog about DBC!

In Summary…

You can see from the examples above, that we need to be able to make statements about the content of Properties, Fields and Parameters. We also need to be able to take snapshots of their value before and after the method invocation, and see whether the rules have been followed. We now have some requirements to be going on with.

Domain Analysis (kinda)

I’m blogging my progress from mid-way through the project. I have a working prototype, which is a crude implementation of what I want. It has a few limitations which require me to go back and productise it. It may seem a little arbitrary for me to describe a domain split on my second blog post, before even telling you what the requirements are. It’s not. It’s just that that was what I was thinking about, when I started this blog.

My motivation for this project is long standing. I used Eiffel at university (in 1993!), and since then have always wanted to see declarative predicates attached to interfaces and classes. Nothing ever came in the languages I use, so I always end up doing something like this. I wrote a version of this framework for C++ and Java some years ago, and I thought it’s time I did the same for C#. I also want to explore some of the more obscure (and uniquely powerful) language features of C#, and new features coming with C# v2.0.

I’ve split up the broad levels of responsibility into the following:

1. Assertion testing.
2. Assertion representation
3. Assertion code generation
4. Assertion handler assembly management
5. Assertion failure management
6. Configuration
7. Third party code integration

Some of these areas I will automatically descope, on the basis that they are done better elsewhere. For example assertion failure management is done best using a top level exception management system, or handlers in the code from which the method invocations came. The Microsoft Enterprise System Exception Handler Application Block system will allow you to handle exceptions in a variety of ways, is extensible, and configurable. Lets leave that out. Assertion testing is probably just an if/then statement. But experience tells me that all result codes are not equal. HRESULTs needed a specialist test, as do old style C methods that have zero as a success code, and everything else as failure.

Configuration I have decided to offload onto the new Microsoft Configuration Application Block. Both came from the ACA.NET framework from Avanade. Kudos to them, and no bias intended at all from me.

Assertion representation is fairly simple. I have a set of Attributes for each type of assertion that I wish to make about a program. I have taken these from Eiffel: Invariant, Require and Ensure. These represent invariant assertions that must be true in all places and at all times. Ensure and Require are pre- and post-conditions that apply to whatever they are attached to only. These attributes store a string representation of the predicate they must enforce. It is up to this program to turn those string based predicates into lightweight code that can be applied on every invocation of the method they are attached to.

What I have already for code generation is a simple NVelocity based code generation template system. It is a dumbed down version of NVelocity, but has served me well over the years, and has been used in a production environment for ORM code generation. The system is simple enough to initialise in two lines, and allows repeated use of the same template with different parameters, so is very good for doing lots of code generation.

I currently use the CodeDOM framework for the compilation of the generated source code, and am as yet undecided about what to do with the generated assemblies. Should I save them to a DLL, and keep them around? Perhaps I could save myself the code generation step on future runs. I could also use a Just-In-Time assembly generator and add assertion handlers as they are encountered.

I am also undecided about whether to generate all of the code for the assembly inline as proxies, or use some sort of Layer Supertype pattern to outsource the assertion handling work, and then have a really barebones system emitted dynamically as MSIL to invoke method on the supertype as necessary.

Assertion handler assembly management is another area where I need to make some decisions. That will come out of how I solve the code generation issues described above.

Third party code integration is that part of the design process that makes sure that the framework is usable with a variety of implementations. For example I can imagine that this should work with Dynamic and Static proxies. Static are easier, but are more intrusive potentially. It should also fit into the more common Aspect Oriented Programming Systems. Again here I am inspired by Avanade’s ACA.NET, which has a very well designed and implemented Aspect Oriented system, which I’m surprised is not in the enterprise Library Application Blocks.

So, to recap, I plan to write about how I solve the following problems in the coming weeks/months:

1. Assertion testing.
2. Assertion representation
3. Assertion code generation
4. Assertion handler assembly management
5. Third party code integration

I’ll also describe some simple usage scenarios, to put all of this into context, which I guess will send me back to my university days and my first introduction to Eiffel. Eiffel is not a .NET compatible language, so you could say I am wasting my time re-inventing the wheel, when I could just program in Eiffel. My only excuse is that I first learned to program in C, and I bonded to the syntax. Anything else seems clunky or sloppy. I know that Eiffel is neither of these things, but I’ve never found a contract for Eiffel either, so I continue to trade on my C++, Java and C# skills.

There is another, better, reason: I’m working as a Solution Architect in Australia, where I don’t get to write the programs I’m ‘designing’. I’m doing this project in my private time. I need to keep my skills alive until I get a real project. So I’m not in any hurry to get this out, but if at some stage it gets robust enough to show the world, perhaps you would like to join me in a GPLd project? There are countless people out there who could do a better job of the third party framework integrations than me. Are you one of them? Are you single and have no triplets on the way? (unlike me!) Let me know, and when the time comes, I’ll set up the project.