Preparing a Project Gutenberg ebook for use on a 6″ ereader

For a while I’ve been trying to find a nice way to convert Project Gutenberg books so that they look pleasant on a BeBook One. I’ve finally hit on the perfect combination of tools, one that produces documents ideally suited to 6″ eInk ebook readers like my BeBook. The tool chain uses GutenMark to convert the text file into LaTeX, TeXworks to adjust the geometry and typography of the LaTeX file to suit the BeBook’s small screen, and then MiKTeX to convert the resulting LaTeX into PDF (using pdfLaTeX). Download GutenMark (plus its GUI front end, GUItenMark) for Windows and MiKTeX (which includes the powerful TeX editor TeXworks), install them, and make sure they are on your PATH.

Here’s an example of the usual LaTeX output from GUItenMark. Note that it is configured for double-sided printed output.

\geometry{verbose,paperwidth=5.5in,paperheight=8.5in, tmargin=0.75in,bmargin=0.75in, lmargin=1in,rmargin=1in}
\evensidemargin = -0.25in
\oddsidemargin = 0.25in

We don’t need the margins to be so large, and we don’t need a difference in the odd and even side margins, since all pages on an ereader need to look the same. Modify the geometry of the page to the following:

\geometry{verbose,paperwidth=3.5in,paperheight=4.72in, tmargin=0.5in,bmargin=0in, lmargin=0.2in,rmargin=0.2in}

This has the added benefit of slightly increasing the perceived size of the text when displayed on the screen. Comment out the odd and even side margins like so:

%\evensidemargin = -0.25in
%\oddsidemargin = 0.25in
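I never wrote down how 3.5in × 4.72in was chosen. One plausible route, assuming the 6″ figure is the screen diagonal and the screen has a 3:4 portrait aspect ratio, is plain Pythagoras. The sketch below computes that and builds a \geometry line in the same shape as the one above; page_size and geometry_line are hypothetical helper names, not part of any of the tools mentioned.

```python
import math


def page_size(diagonal_in, aspect_w=3, aspect_h=4):
    """Return (width, height) in inches for a screen with the given diagonal.

    Assumes the diagonal is the hypotenuse of an aspect_w:aspect_h rectangle.
    """
    hyp = math.hypot(aspect_w, aspect_h)  # 5.0 for a 3:4 screen
    return diagonal_in * aspect_w / hyp, diagonal_in * aspect_h / hyp


def geometry_line(width, height, tmargin=0.5, bmargin=0, side=0.2):
    """Build a \\geometry command shaped like the ereader one above (hypothetical helper)."""
    return (
        f"\\geometry{{verbose,paperwidth={width:.2f}in,paperheight={height:.2f}in, "
        f"tmargin={tmargin}in,bmargin={bmargin}in, lmargin={side}in,rmargin={side}in}}"
    )


w, h = page_size(6)
print(w, h)  # 3.6 4.8
print(geometry_line(w, h))
```

For a 6″ screen this gives 3.6in × 4.8in, a touch larger than the 3.5in × 4.72in used above, so treat the result as a starting point and trim to taste.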

And here is what you get:

The finished product

Since both GutenMark and pdflatex are command-line tools, we can script the whole conversion. The editing is done with sed (the stream editor). I get mine from Cygwin, though there are plenty of ways to get the GNU toolset onto a Windows machine these days.

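#!/bin/bash
# Usage: pass the book name without extension (reads NAME.txt, writes NAME.bebook.pdf)
/c/Program\ Files/GutenMark/binary/GutenMark.exe --config="C:\Program Files\GutenMark\GutConfigs\GutenMark.cfg" --ron --latex "$1.txt" "$1.tex" || exit 1

# Apply the full ereader geometry and comment out both side-margin settings
sed 's/\\geometry{.*}/\\geometry{verbose,paperwidth=3.5in,paperheight=4.72in, tmargin=0.5in,bmargin=0in, lmargin=0.2in,rmargin=0.2in}/
s/\\evensidemargin/%&/
s/\\oddsidemargin/%&/' <"$1.tex" >"$1.bebook.tex"

# A second pass fills in the table of contents, if there is one
pdflatex -interaction=nonstopmode "$1.bebook.tex"
pdflatex -interaction=nonstopmode "$1.bebook.tex"

# Delete only this book's intermediate files, not every .tex in the directory
rm -f "$1.tex" "$1".bebook.{tex,aux,log,toc}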

Now all you need to do is invoke this bash script with the (extensionless) name of the Gutenberg text file, and it will give you a PDF file (with a .bebook.pdf suffix) in return. Nice.
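The sed command is the only step that edits the LaTeX itself. If you’d rather not depend on sed, the same edit (swap in the ereader geometry, comment out both side margins) can be sketched in Python. retarget_for_ereader and EREADER_GEOMETRY are hypothetical names, and the function assumes the \geometry command sits on a single line, as in the GutenMark output shown above.

```python
import re

# Geometry values taken from the ereader settings above
EREADER_GEOMETRY = (
    r"\geometry{verbose,paperwidth=3.5in,paperheight=4.72in, "
    r"tmargin=0.5in,bmargin=0in, lmargin=0.2in,rmargin=0.2in}"
)


def retarget_for_ereader(tex: str) -> str:
    """Apply the single-sided ereader layout to GutenMark's LaTeX output."""
    # Replace the whole \geometry{...} command (assumed to be on one line).
    # A function replacement stops re.sub from interpreting the backslash.
    tex = re.sub(r"\\geometry\{.*\}", lambda _m: EREADER_GEOMETRY, tex)
    # Comment out the double-sided margins; lines already starting with % are skipped.
    tex = re.sub(r"(?m)^(\\(?:even|odd)sidemargin)", r"%\1", tex)
    return tex
```

To use it, read book.tex, pass its text through retarget_for_ereader and write the result to book.bebook.tex before running pdflatex (book is a placeholder name). Running it twice leaves the file unchanged.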

Some pictures of Carlton Gardens


Carlton Gardens, a set on Flickr.

This was my first outing with the Pentax K-x that I got recently. In these pictures, I’m trying to get to grips with the camera, so I didn’t have any particular objective other than to take pictures.

The light was so harsh it was very difficult for me to gauge whether the exposures were working – I couldn’t see the live views or previews at all! All in all I was very surprised that any of them were worth looking at.

Note to Self: Convert UTF-8 w/ BOM to ASCII (WiX + DB) using ICU uconv

This one took me a long time to work out, and it took a non-Latin alphabet user (Russian) to point me at the right tools. Yet again, I’m guilty of being a complacent anglophone.

I was producing a database installer project using WiX 3.5 and ran into all sorts of inexplicable problems, which I finally tracked down to the Byte Order Mark (BOM) at the start of the SQL update files that I was importing into my MSI file.

I discovered that the ‘varied’ toolset used in our dev environments (e.g. VS 2010, Cygwin, Vim, Git, SVN, NAnt, MSBuild, R#) meant that the update scripts had steadily diffused out into Unicode space. You can find out (approximately) what the encodings are for a directory of files using the file command. Here’s a selection of files that I was including in my installer:

$ file *
01.sql:          ASCII text, with CRLF line terminators
02.sql:          Little-endian UTF-16 Unicode text, with very long lines, with CRLF, CR line terminator
03.sql:          UTF-8 Unicode (with BOM) text, with CRLF line terminators
05.sql:          ASCII English text, with CRLF line terminators
06.sql:          UTF-8 Unicode (with BOM) text, with CRLF line terminators
11.sql:          ASCII C program text, with CRLF line terminators
12.sql:          UTF-8 Unicode (with BOM) text, with CRLF line terminators
23.sql:          ASCII text, with CRLF line terminators
24.sql:          UTF-8 Unicode (with BOM) text, with CRLF line terminators
25.sql:          UTF-8 Unicode (with BOM) text, with CRLF line terminators
26.sql:          ASCII text, with CRLF line terminators
27.sql:          UTF-8 Unicode (with BOM) text, with CRLF line terminators
28.sql:          UTF-8 Unicode (with BOM) text, with CRLF line terminators
29.sql:          Little-endian UTF-16 Unicode C program text, with very long lines, with CRLF, CR line
30.sql:          UTF-8 Unicode (with BOM) C program text, with very long lines, with CRLF line terminat
37.sql:          UTF-8 Unicode (with BOM) English text, with CRLF line terminators
38.sql:          Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
39.sql:          Little-endian UTF-16 Unicode text, with CRLF line terminators
44.sql:          UTF-8 Unicode (with BOM) text, with CRLF line terminators
AlwaysRun0001.sql: ASCII C program text, with CRLF line terminators
AlwaysRun0002.sql: UTF-8 Unicode (with BOM) C program text, with CRLF line terminators
TestData0001.sql:        UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators
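The labels from file are heuristic (it calls some of these SQL scripts "C program text"). For the narrower question of whether a file starts with a BOM, the leading bytes settle it. A small Python sketch using only the standard codecs constants; sniff_bom and survey are hypothetical helper names.

```python
import codecs
from pathlib import Path

# UTF-32 marks come first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0) if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def survey(directory="."):
    """Yield (file name, BOM encoding or None) for each .sql file in a directory."""
    for path in sorted(Path(directory).glob("*.sql")):
        with path.open("rb") as f:
            yield path.name, sniff_bom(f.read(4))[0]
```

A file without a BOM comes back as None, which covers both the plain ASCII scripts and any BOM-less UTF-8; telling those apart needs a look at the rest of the bytes.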

You can see that there is a variety of encodings. I initially assumed that a quick run through d2u or u2d would fix them up, but that did nothing to change the encoding or remove the BOM. In the end I found the uconv command from IBM’s ICU project, which has the handy --remove-signature option that was the key to the solution. Don’t confuse it with GNU iconv, which doesn’t let you strip the BOM from the front of your files.

$ uconv --remove-signature -t ASCII TestData0001.sql > TestData0001.sql2 && mv TestData0001.sql2 TestData0001.sql

After that, the WiX installer worked OK, and all was right with the world. I hope this helps if you run into the same problem.

I can’t answer the question of why WiX/MSI fails to work with non-ASCII files (other than to say that Unicode blindness is a common problem in software written by Anglophones).
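uconv is what fixed it for me. If ICU isn’t installed, the same clean-up (strip the BOM, re-encode as ASCII, keep the CRLF line endings) can be sketched with Python’s standard library. This is an alternative, not what I used; it only recognises UTF-8 and UTF-16 BOMs, the cases in the listing above, and to_ascii and convert_in_place are hypothetical names.

```python
import codecs
import os
from pathlib import Path


def to_ascii(data: bytes) -> bytes:
    """Drop a UTF-8 or UTF-16 BOM, decode, and re-encode as strict ASCII.

    Line endings are left alone. Non-ASCII text raises UnicodeError instead of
    being silently substituted.
    """
    if data.startswith(codecs.BOM_UTF8):
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        text = data.decode("ascii")  # no BOM: must already be plain ASCII
    return text.encode("ascii")


def convert_in_place(path) -> None:
    """Rewrite one file as ASCII, replacing the original only after conversion succeeded."""
    p = Path(path)
    converted = to_ascii(p.read_bytes())  # raises before anything is written
    tmp = p.with_name(p.name + ".tmp")
    tmp.write_bytes(converted)
    os.replace(tmp, p)  # move the converted copy over the original
```

Failing loudly on non-ASCII content is a deliberate choice here: a script that quietly swapped characters would be a worse surprise in a database update than a stopped build.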

Automata-Based Programming With Petri Nets – Part 1

Petri Nets are extremely powerful and expressive, but they are not as widely used in the software development community as deterministic state machines. That’s a pity – they allow us to solve problems beyond the reach of conventional state machines. This is the first in a mini-series on software development with Petri Nets. All of the code for a full feature-complete Petri Net library is available online at GitHub. You’re welcome to take a copy, play with it and use it in your own projects. Code for this and subsequent articles can be found at


Surreal Graham Norton Moment

Today I got my first experience of being caught in a storm of golf-ball-sized hailstones. We’d decided to take the day out and visit the Moomba Festival in the city centre. All was well, and the kids were in a cub scouts arena trying their hands at rock climbing. Suddenly, we heard an ominous rumbling, and within seconds we were pelted with inch-thick hailstones.

We quickly caught the kids as they fell off the climbing wall, and nobly covered them with our bodies as we hurried them into the canvas awnings that the scout leaders had erected earlier in the day. We were amazed at the size of the hailstones and all started to chat amongst ourselves about the unseasonal weather, and how we’d never seen the like.

The amazement turned to nervousness as the inch-wide hailstones gave way to 1.5- or 2-inch-wide, golf-ball-sized stones. Gradually, one of the sturdy canvas roofs was torn to shreds, as were all of the European broadleaf trees in the park (next to the botanical gardens). Visibility dropped to 20 m, mostly because of the hailstones, rain and remnants of shredded leaves.

The rain fell so fast, there was no chance that it could soak away, so inevitably as the minutes passed, the water levels started to rise. Soon we were up to our ankles in freezing water. I mean ice cold meltwater. By the time 15 mins had passed, we were pretty wet, cold, and pissed off.

The hailstones went back to their original 1″ dimensions and the scouts sprang into action. First, one intrepid lad leapt out into it and did a dance for us all. That raised our spirits, and inspired what can only be known as ‘das uber scout’ to show us a glimpse of Baden-Powell’s vision.

Baden-Powell must have had a vision of a Father Ted episode. He must have, because das uber scout stepped out into the hail and started singing gung-ho campfire songs. I, naturally, thought that das uber scout was about to embark on a solo effort, like his compadre previously. But, almost against their will, the other scouts gathered in the tattered awning started to sing along. Quietly at first, as if they were embarrassed to be seen by non-scouts. But clearly, being seen scouting in public is a liberating experience, because they soon started to throw themselves into it with gusto. Kerry and I were mortified. Were we expected to sing along? Were we expected to applaud? Were we expected to entrust our kids to psychos like these in a few years’ time? We opted for staring at the floor instead.

Several images leapt into my mind as I stood watching das uber scout launch into the 35th verse of ‘I’m singing in the rain’ (with dance moves and aboriginal refrains). The first image was of me in 3 or 4 years’ time, accompanying the twins on their first away trip, sitting around the campfire singing ‘he leapt out of the airplane, what a terrible way to die’ and having, for their sakes, to pretend I was enjoying it.

The next thought was of that episode of Father Ted where Graham Norton traps a reluctant group of teenagers in a caravan and forces them, through sheer force of cheerfulness, to Irish dance till the caravan collapses.

Das uber scout’s grim determination to make us have a ‘good time’ was creepy but memorable. Perhaps that’s what the Scouts are like?

South America? Innocent victims in the war on terror!

Are you kidding me? Tectonic causes for South American earthquakes were a total sham! Think about it! Everyone knows that there’s a war on. And have you noticed that the CIA has started to act very strangely? They obviously don’t want this story getting out. I mean, what would happen if people began asking Why are they all near the USA? Well, they may be able to fool the sheeple, but the members of the USGS aren’t swallowing their story. Look, don’t take it from me; General Colin Powell is convinced as well. But we have to act fast, because Yellowstone will vaporize the USA. I just wanted you to be aware of this, in case I disappear.


Less Intrusive Visitors

Forgive the recent silence – I’ve been in my shed.

Frequently, I need some variation on the Visitor or HierarchicalVisitor patterns
to analyse or transform an object graph. Recent work on a query builder
for an old-skool query API sent my thoughts once again to the Visitor pattern. I
normally hand roll these frameworks based on my experiences with recursive
descent compilers, but this time I thought I’d produce a more GoF-compliant implementation.

The standard implementation of the visitor looks a lot like the first code example. First you
define some sort of domain model (often following the composite pattern).
This illustration doesn’t bother with composite. I’ll show one later on, with an
accompanying HierarchicalVisitor implementation.

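abstract class BaseElement {
  public abstract void Accept(IVisitor v);
}
// Each element calls back the Visit overload for its own type (double dispatch)
class T1 : BaseElement {
  public override void Accept(IVisitor v) { v.Visit(this); }
}
class T2 : BaseElement {
  public override void Accept(IVisitor v) { v.Visit(this); }
}
class T3 : BaseElement {
  public override void Accept(IVisitor v) { v.Visit(this); }
}
interface IVisitor {
  void Visit(T1 t1);
  void Visit(T2 t2);
  void Visit(T3 t3);
}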

Here’s an implementation of the visitor. Normally you’d give default
implementations via an abstract base class; I’ll show how that’s done later.

class MyVisitor : IVisitor {
  public void Visit(T1 t1) {
    // do something here
  }
  public void Visit(T2 t2) {
    // do something here
  }
  public void Visit(T3 t3) {
    // do something here
  }
}
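Python has no method overloading, so a Python rendering of the same classic visitor has to encode the element type in the visitor’s method names. Otherwise the shape is identical, and it makes the coupling plain: every element class must carry its own accept method. A minimal sketch with two element types; the class and method names are illustrative only.

```python
class Element:
    def accept(self, visitor):
        raise NotImplementedError


class T1(Element):
    def accept(self, visitor):
        # Double dispatch: the element's own type decides which visit method runs
        return visitor.visit_t1(self)


class T2(Element):
    def accept(self, visitor):
        return visitor.visit_t2(self)


class TypeNameVisitor:
    """Returns the name of the element type it was dispatched on."""

    def visit_t1(self, t1):
        return "T1"

    def visit_t2(self, t2):
        return "T2"


elements = [T1(), T2(), T1()]
print([e.accept(TypeNameVisitor()) for e in elements])  # ['T1', 'T2', 'T1']
```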

The accept methods are on the domain model entities themselves. What if I have a
composite graph of objects that are not conveniently derived from some abstract
class or interface for my convenience? What if I want to iterate or navigate
the structures in alternate ways? What if I don’t want to (or can’t) pollute
my domain model with visitation code?

I thought it might be cleaner to factor out the responsibility for the
dispatching into another class – a Dispatcher. I provide the Dispatcher from my
client code and am still able to visit each element in turn. Surprisingly, the
result is slightly cleaner than the standard implementation of the pattern,
sacrificing little but gaining a small increment in applicability.

Let’s contrast this canonical implementation with one that uses anemic objects
for the domain model. First we need to define a little composite pattern to
iterate over. This time, I’ll give the abstract base class for the entities
and for the visitors and show a composite pattern as well.

using System.Collections.Generic;

abstract class AbstractBase {
  public string Name { get; set; }
}
class Composite : AbstractBase {
  public string NonTerminalIdentifier { get; set; }
  public Composite(string nonTerminalIdentifier) {
    Name = "Composite";
    NonTerminalIdentifier = nonTerminalIdentifier;
  }
  // Children are held as base-class references
  public List<AbstractBase> SubParts = new List<AbstractBase>();
}
class Primitive1 : AbstractBase {
  public Primitive1() {
    Name = "Primitive1";
  }
}
class Primitive2 : AbstractBase {
  public Primitive2() {
    Name = "Primitive2";
  }
}

A composite class plus a couple of primitives. Next, let’s look at the visitor interface:

interface IVisitor {
  void Visit(Primitive1 p1);
  void Visit(Primitive2 p2);
  bool StartVisit(Composite c);  // return false to skip the composite's sub-parts
  void EndVisit(Composite c);
}

According to the discussions at the Portland pattern repository, this could be
called the HierarchicalVisitor pattern, but I suspect most applications of
visitor are over hierarchical object graphs, and they mostly end up like this so
I won’t dwell on the name too much. True to form, it provides mechanisms to
visit each type of element allowed in our object graph. Next, the Dispatcher that
controls the navigation over the object graph. This is the departure from the
canonical model. A conventional implementation of visitor places this code in
the composite model itself, which seems unnecessary. Accept overloads are
provided for each type of the domain model.

class Dispatcher {
  public static void Accept<TV>(Primitive1 p1, TV visitor)
    where TV : IVisitor {
    visitor.Visit(p1);
  }
  public static void Accept<TV>(Primitive2 p2, TV visitor)
    where TV : IVisitor {
    visitor.Visit(p2);
  }
  public static void Accept<TV>(Composite c, TV visitor)
    where TV : IVisitor {
    if (visitor.StartVisit(c)) {
      foreach (var subpart in c.SubParts) {
        // SubParts are AbstractBase references, so select the overload by hand
        if (subpart is Primitive1) {
          Accept(subpart as Primitive1, visitor);
        } else if (subpart is Primitive2) {
          Accept(subpart as Primitive2, visitor);
        } else if (subpart is Composite) {
          Accept(subpart as Composite, visitor);
        }
      }
    }
    // EndVisit runs even when StartVisit declined the sub-parts
    visitor.EndVisit(c);
  }
}

The dispatcher’s first parameter is the object graph element
itself. This provides the context that was implicit with the conventional
implementation. This is a trade-off. On the one hand you cannot access any
private object information inside the dispatch code. On the other hand you can
have multiple different dispatchers for different tasks. Another drawback with
an ‘external’ dispatcher is the need for old-fashioned dispatcher switch
statements in the Composite acceptor. The Composite stores its sub-parts as
references to the AbstractBase class, so it must decide manually which Accept
overload should handle the sub-part in question.
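Python’s functools.singledispatch picks an implementation from the runtime type of its first argument, which is exactly the job the if/else chain in the Composite acceptor does by hand. The sketch below swaps that technique in for the manual switch: the model stays anemic (plain dataclasses with no accept methods) and the visitor keeps the StartVisit/EndVisit shape. The names are snake_case renderings of the C# ones, and RecordingVisitor is hypothetical.

```python
from dataclasses import dataclass, field
from functools import singledispatch


@dataclass
class Primitive1:
    name: str = "Primitive1"


@dataclass
class Primitive2:
    name: str = "Primitive2"


@dataclass
class Composite:
    non_terminal: str
    sub_parts: list = field(default_factory=list)  # mixed element types
    name: str = "Composite"


class RecordingVisitor:
    """Hierarchical visitor that logs what it sees (hypothetical helper)."""

    def __init__(self):
        self.log = []

    def visit_primitive1(self, p):
        self.log.append(p.name)

    def visit_primitive2(self, p):
        self.log.append(p.name)

    def start_visit(self, c):
        self.log.append("start " + c.non_terminal)
        return True  # True means: go on into the sub-parts

    def end_visit(self, c):
        self.log.append("end " + c.non_terminal)


@singledispatch
def accept(element, visitor):
    # Fallback for types with no registered implementation
    raise TypeError(f"no accept rule for {type(element).__name__}")


@accept.register
def _(element: Primitive1, visitor):
    visitor.visit_primitive1(element)


@accept.register
def _(element: Primitive2, visitor):
    visitor.visit_primitive2(element)


@accept.register
def _(element: Composite, visitor):
    if visitor.start_visit(element):
        for part in element.sub_parts:
            accept(part, visitor)  # runtime type of part selects the rule
    visitor.end_visit(element)


graph = Composite("root", [Primitive1(), Composite("inner", [Primitive2()]), Primitive2()])
v = RecordingVisitor()
accept(graph, v)
print(v.log)
# ['start root', 'Primitive1', 'start inner', 'Primitive2', 'end inner', 'Primitive2', 'end root']
```

Adding a new element type then means registering one more accept implementation, without touching the existing ones or the element classes.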

The implementation for a visitor is much the same as in a normal implementation.
A default implementation of the visit functions is given that
does nothing. To implement a HierarchicalVisitor, the
default StartVisit must return true to allow iteration of the
subparts of a Composite to proceed.

class BaseVisitor : IVisitor {
  public virtual void Visit(Primitive1 p1) { }
  public virtual void Visit(Primitive2 p2) { }
  // Default to true so the Dispatcher descends into the sub-parts
  public virtual bool StartVisit(Composite c) {
    return true;
  }
  public virtual void EndVisit(Composite c) { }
}

Here’s a Visitor that simply records the name of who gets visited.

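class Visitor : BaseVisitor {
  // Each override writes the visited element's name to the console
  public override void Visit(Primitive1 p1) { System.Console.WriteLine(p1.Name); }
  public override void Visit(Primitive2 p2) { System.Console.WriteLine(p2.Name); }
  public override bool StartVisit(Composite c) {
    System.Console.WriteLine("Start " + c.NonTerminalIdentifier);
    return true;
  }
  public override void EndVisit(Composite c) {
    System.Console.WriteLine("End " + c.NonTerminalIdentifier);
  }
}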

Given an object graph of type Composite, it is simple to use this little framework.

Dispatcher.Accept(objGraph, new Visitor());

I like this way of working with visitors more than the conventional
implementation – it makes it possible to provide a good visitor implementation on
third-party frameworks (yes, I’m thinking of LINQ expression trees). It is no
more expensive to extend with new visitors, and it has the virtue that you can
navigate an object graph in any fashion you like.
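The last point, that an external dispatcher lets you navigate the same graph in whatever order a task needs, is easy to see in a tiny sketch. Here two hypothetical traversal functions drive the same visiting callback over one anemic tree: one depth-first (the order the Dispatcher above uses) and one breadth-first. The Node class is illustrative only, and the "visitor" is reduced to a plain callable for brevity.

```python
from collections import deque


class Node:
    """A bare composite node: data only, no visitor plumbing (illustrative)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def depth_first(node, visit):
    """Pre-order: visit a node, then each of its subtrees in turn."""
    visit(node)
    for child in node.children:
        depth_first(child, visit)


def breadth_first(root, visit):
    """Level by level, using a FIFO queue."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        visit(node)
        queue.extend(node.children)


tree = Node("root", [Node("a", [Node("a1")]), Node("b")])
for walk in (depth_first, breadth_first):
    seen = []
    walk(tree, lambda n: seen.append(n.name))
    print(walk.__name__, seen)
# depth_first ['root', 'a', 'a1', 'b']
# breadth_first ['root', 'a', 'b', 'a1']
```

Neither Node nor the callback changes between the two walks; only the dispatcher does, which is the flexibility the canonical, Accept-on-the-model version gives up.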