Generic Algorithms in C#

This post describes generic programming and the capabilities of generics in C#. I conclude that C# Generics and C# generally are not able to solve issues that are easy in the Standard Template Library. Nothing would make me happier than if you were able to demonstrate to me that my conclusions are wrong. If thats true, please don’t keep it to yourself!!!!! 

Ever since moving to C# from C++, I’ve missed the expressive power and elegance of the Standard Template Library (STL). When generics were introduced into C# 2.0 I assumed that to gain runtime type safety, some of that expressive power had to be sacrificed. My initial impression was that it was good only for generic collections, but not for those idioms that revolutionized C++ when the STL was adopted, and after Alexandrescu published his eye-opening Modern C++ Design. I’d never found the time to confirm my misgivings, so I recently made time to explore the issues – I’ve been working on a .NET project based on legacy code which was – how can I put it delicately – based on the best practice of yesteryear! I have a pressing incentive to find clean, expressive and sexy ways to enhance the code-base and architecture. Obviously, I’m breathless with anticipation of Linq in all its forms, but I have a job to do now with the tools at hand.

I don’t think I’ve ever seen syntactic and idiomatic support in any framework to better STL as it stood a decade ago. Don’t get me wrong – I love C# – but I regret that I can’t combine the best of both worlds. Are generics in C# 2.0 too impoverished to be able to provide the power of STL? Are we able to provide the full power that has until now been the exclusive reserve of STL developers?

Algorithms + Data Structures = Programs

The STL stems from work done by Alex Stepanov and Meng Lee, and earlier work on generic programming by Stepanov and Musser. Their object was to produce a set of general purpose container classes and associated algorithms that could exist independently. In languages that don’t support generics, you have four choices when you want to provide standard algorithms on containers:

  • you implement every method that you need on each container class (The MFC approach).
  • you code all containers and algorithms in terms of some common data type (the .NET System.Collections approach)
  • you implement a code generation system that creates strongly typed containers and algorithms (the solution in GCC prior to the introduction of templates in C++)
  • you don’t bother implementing algorithms at all (The C5 approach).

None of these options is particularly desirable, so Stepanov & Co developed an idiom for coding in C++ that allowed algorithms to be developed independently of data structures. This is an important point – you can develop a sort method that will sort a linked list as easily as an array. It’s not easy. His approach was to employ iterators, a set of coding conventions, and type exports.

Iterators

Iterators are vital – they provide the topology independent mechanism for getting at the elements in a collection. The Iterators in STL are based on pointer semantics, and in the case of arrays may actually be implemented using a pointer. The key feature of a pointer is that it provides a way to dereference a collection element (using operator *) to move forward and backward through the collection (operator ++ and operator –). in .NET, IEnumerator<T> provides roughly the same functionality, and in general it is provided as a nested class within a collection. There’s nothing sacred about pointer syntax. We can implement an iterator without using operator overloading. The key thing is pointer semantics. Pointers treat a block of memory as ordered, contiguous and typed. By being typed you know how far to move forward to get to the beginning of the next object. In other words, you can easily move forward through the ordered collection of objects. In the case of C/C++ pointers by using the operator++ method, but it could just as easily be by using a MoveNext method.

Option 2 is what we are currently faced with in C#’s System.Collections – all container classes store references to System.Object. If you want to ensure type safety in your system you’ll probably find yourself with all accesses to the container using casts and try-catch blocks to ensure no foreign bodies are introduced by accident. It’s nasty, messy and these days unnecessary when you have generics. The burning question is – if you can have generic data structures does that mean you can also have generic algorithms?

The answer to that is a resounding NO! Which is the reason why a lot of ex C++ programmers were sceptical at the usefulness of generics in C#. Sure they can help you store stuff easier, but they alone can’t provide a rich language in which to define generally useful reusable functionality. In STL, iterators are hand-coded nested classes in a collection. Their behaviour is dictated by the iterator category to which they belong. Iterators are classified as forward, reverse, bi-directional, or random access. They can also be const or non-const.

It’s worth quickly noting at this point that we can’t tack iterators onto the collection classes using external methods. We can attach methods such as Begin and End to the collections, but the iterators can’t access the storage managed by those collections – they’re strictly limited to accessing public methods, properties and fields. They are limited by the accessors already provided by the collections. The iterators could potentially be limited to one form of iteration – the forward iterator of IEnumerator<T>. That isn’t a hard and fast rule, but it is a concern!

In C#, the standard collection classes define iterators by implementing the IEnumerable<T> interface. IEnumerable<T> provides a means for algorithms to get at the IEnumerator<T> interface which is the actual iterator. However, STL would classify the in-built iterators as ‘Trivial Iterators‘. With the advent of the yield keyword, we can now define more than one iterator on a collection. This overcomes one hurdle to generic algorithms – different algorithms will require different types of iterator – if the framework only supports one kind, then some algorithms may be ruled out.

The interface to IEnumerator<T> is very simple.

public interface IEnumerator<T>
{
  bool MoveNext();
  T Current { get; }
  void Reset();
}

It’s clear that we can’t do random access with this type of iterator, and we can’t move backwards through the collection either. Since yield allows only trivial iterators we will probably have to use an alternative mechanism to iterate collections. Perhaps we need to expand the capabilities of IEnumerator<T>? The iterators will need to be constrained by coding conventions (of which more later on). There is nothing in principle to stop us doing whatever was possible with iterators in C++, in the same way – after all there is no in-built runtime support for iterators in C++, yet it copes well.

Generic Algorithms

In generic methods you often define your method signatures in terms of iterators. If you defined (in C++) a vector of integers like so:

vector<int> vint = new vector<int>();

and you defined an algorithm called Double like so:
void Double<Iterator>(Iterator start, Iterator end)
{
  for(Iterator i = start; i != end; i++){
    i *= 2;
  }
}

Then you could double the elements in the collection like so:

Double(vint.begin(), vint.end());

This performs an in-place doubling of each element between start and end inclusive. Sadly, you just can’t do this in C#. It’s not possible. The C# compiler doesn’t know at compile time whether the elements returned by an Iterator are numerical and can support the ‘*=’ operator. We cannot define operators on an interface, and therefore we cannot constrain the iterators to those defined on collections that store objects that support the ‘*=’ operator. Numerical types don’t share a common base class that we could use to distinguish them from other structs. Besides, we couldn’t define a constraint against a type argument of another generic class. For example, say you defined a List of Integers, and you passed a pair of iterators to Double, Double would not know that they were iterators over a collection containing integers, because all it knows is that it has a type argument called Iterator that could be anything at all.

Lets summarise the list of problems with this simple example for later reference:

  1. Can’t define operators on an interface , thus you can’t constrain to types that support operators
  2. Can’t issue constraints against type arguments of type arguments
  3. No consistent last item
  4. No static type checking

It would be nice to write some code like this:

if(Iterator.T is Numeric){
  for(Iterator i = start; i != end; i++){
    i *= 2;
  }
}

Or, better still

void Double(Iterator start, Iterator end) where Iterator.T : Numeric
{
  // ...
}

But it is not to be – at least in this version. Even if we could define constraints that specified the operators, we would not be able to retroactively attach the interfaces to types in the framework. Oh, and another thing – IEnumerator<T> doesn’t support Equality comparison, that rules out iteration between ranges. Iterators in STL support the ‘one beyond the end’ concept that enables a collection to have a definite immutable value for the end of a collection that doesn’t get out of date if new elements are added to the end of a collection. Contrast that with the following implementation of System.CharEnumerator.Current. 

public char Current {
  get {
    if (this.index == -1) {
      throw new InvalidOperationException (Environment.GetResourceString( "InvalidOperation_EnumNotStarted"));
    }
    if (this.index >= this.str.Length) {
      throw new InvalidOperationException (Environment.GetResourceString( "InvalidOperation_EnumEnded"));
    }
    return this.currentElement;
  }
}

As soon as the iterator gets to the end of the collection it will not allow a dereference of collection elements. Instead it begins to throw exceptions. If this is typical of the framework, it doesn’t bode well for algorithms like Double above, since you can’t dereference one beyond the end of the collection without throwing exceptions.

Without a common, immutable end of collection marker that can be passed to an algorithm, we are going to have problems. Firstly, we will have to provide algorithms for both the whole collection and for a range within that collection. Perhaps that’s why the canonical generic algorithm in System.Collections.List is ForEach? Without a guaranteed end of collection iterator value, we can’t be sure of iterating the whole of a collection that is being inserted to while it is being iterated.

Shortcomings of C#

Using a convention about how collections expose their type parameters, STL allows a thorough introspection into data types used in a collection and its contained elements, without any prior knowledge of the types.  In a Genetic Algorithm, I never need to know how my Genomes are encoded to be able to evolve them. In C# I can get an iterator such as IEnumerable<T>, but if T has further detail inside it, I have no easy way to get at it, because there are no typedefs or their equivalent inside IEnumerable. The code in my example above is completely illegal. You can’t do Iterator.T because T is not accessible from Double. All it knows about is Iterator. It could define a constraint to say that its iterators must be IEnumerator<T> but even then, it still can’t work without explicitly specifying the types iterated over.

Some collection libraries use a pattern of defining T (the element type) on the generic method, and then define their interfaces to accept IList<T>or IEnumerator<T>. This is partially generic – it allows the iterated elements to be parameterised, but not the collection types or their iterators.

What’s so bad about that, you might ask. Well, the type parameters would contain redundant type definitions which would be messy and would make the code brittle. In addition, that approach would only be useful if you knew how many levels you would need to dig into the type definitions. If you wanted to look inside T of IEnumerator<T>, you would have to provide extra parameters in the cases where T was itself generic. Obviously, messing with the parameter lists of a generic type or method won’t help. So, we need some sort of type export mechanism.

So, how do I find out more about the type parameter of IEnumerable<T> without using reflection, or what elegant workarounds and/or conventions could I adopt to make such type access easy? These problems are not specific to the realm of evolutionary programming – I just chose that because it’s fun – there is currently work being done by Anders Hejlsberg and the rest of the C# design team to find (or bake in) ways to achieve this in future versions of C#.

Exporting Types

One feature of C++ that is used liberally in the STL is typedefs. typedef in C++ is useful for allowing type parameters to be inspected by algorithms. I’ll show a few examples of typedefs at work below. This is a critical difference between the template system of C++ and the generics system of C#. In C# there is very little information flow between the generic type argument and the type constraints. C++ uses static type checking to check for type errors at compile time. It does not do this by checking type arguments against a lookup table of permissible types as is the case with C#. Instead it attempts to find bindings for methods, fields and typedefs for each specific constructed type. Consider the following example. I’ve defined a class A with a public method Foo. I’ve then defined a template method Bar
that takes a vector of T. 

class A {public: int Foo(){/* ... */}}
// ...
template<typename T, typename R>
R Bar(vector<T> arg1){return arg1.Foo();}
// ...
int result = Bar<A, int>(someVector);

There is nothing about the definition of Bar
that tells it that it is dealing with objects of type A. A C#
compiler would produce a compile-time error for code like this. There is no constraint that it should restrict Bar to classes of type A or its descendants. C++ gets around this problem using static type checking. It can see that when we invoke Bar
with a vector<A>
we are OK, but with some other type we are not. It will only issue compilation errors in the case where you try to invoke Bar
with a vector of some other type. This static type checking is powerful. It allows generic methods to access typedefs in its type arguments. Imagine you were writing a genetic algorithm, and you wanted to do a random mutation on randomly chosen population members.

template<typename PopType>
PopType::GenomeType MutatePopulation(PopType population){
 for(PopType::iterator i = population.begin(); i != population.end(); i++){
  PopType::GenomeType gen = StochasticInvoke(Mutate(*i), PopType.MutationProbability);
  *i = gen;
 }

In the above example I have accessed type definitions to get iterators, and genome types used to declare local variables. Without this kind of thing we are forced to add internal types on generic parameter lists:

GenomeType MutatePopulation<PopulationType, PopulationIteratorType, GenomeType>( PopulationType pop){
 for(PopulationIteratorType i = pop.begin(); i != pop.end(); i.MoveNext()){
  GenomeType gen = StochasticInvoke(Mutate(i.Current), pop.MutationProbability);
  *i = gen;
 }
}

As explained earlier this is nasty. It means that if I were to find a different way to represent GenomeType, I would have to change every place that uses a genome, rather than just the one place that stores the genomes. With the aid of typedefs I can just modify the declaration of my population collection, and the type will flow out into every generic method that works with populations. I want to be able to define the data type of a variable once and thereafter not have to mention it again, since clients can get it from the parameters I pass around.

Type exporting using external methods

The lack of static type checking in C# means we can’t declare variables of types exported inside of type arguments. We can pass them around, though. Using extension methods (that are currently in pre-alpha release in the May CTP of C# 3.0) we can augment System.Type to provide access, with the aid of reflection.

namespace System{
  public class TypedefAttribute : Attribute { }
  public static class TypeExtensions {
    public static IDictionary<string, Type> typedefs;
    public static Type GetTypedef(this Type source, string typedef) {
      if (typedefs == null) {
        typedefs = ReflectTypeExports(source);
      }
      if (typedefs.ContainsKey(typedef))
        return typedefs[typedef];
      else
        throw new ApplicationException("No type export named '" + typedef + "'");
    }
    public static IDictionary<string, Type> ReflectTypeExports(Type source) {
      Dictionary<string, Type> result = new Dictionary<string, Type>();
      // Get all properties adorned with TypedefAttribute from source
      var q = from p in source.GetProperties()
              where p.GetCustomAttributes(typeof(TypedefAttribute), rue)
                 .Length > 0
              select p;
      foreach (System.Reflection.PropertyInfo pi in q) {
        result[pi.Name] = (Type)pi.GetValue(null, null);
      }
      return result;
    }
  }
}
namespace extensions {

public class A<T> {
    [Typedef] public static Type TType { get { return typeof(T); } }
  }
  public class B<T> {
    [Typedef] public static Type TType { get { return typeof(T); } }
  }  class Program {
    static void Main(string[] args) {
      Console.WriteLine(typeof(A<B<int>>).GetTypedef("TType").
      GetTypedef("TType").ToString());
      Console.Read();
    }
  }
}

This is fine for passing types around, but you can’t do much with it other than treat it as a variable. You can’t declare a variable like so

typeof(PopulationType).GetTypedef("GenomeType").GetTypedef("AlleleType") allele = GetRandomAllele(iterator.Current);

You could use the old

Activator.CreateInstance(typeof(PopulationType).GetTypedef("GenomeType").GetTypedef("AlleleType"));

trick, but then you’ve got a System.Object, and what do you cast it to? There’s no type safety.

Lets take a look at a good collection of generic algorithms. The PowerCollections library from Wintellect. Here’s a typical example from the generic methods class Algorithms:

public static IEnumerable<T> Rotate<T>(IList<T> source, int amountToRotate)
{
  if (source == null)
    throw new ArgumentNullException("source");
  int count = source.Count;
  if (count != 0) {
    amountToRotate = amountToRotate % count;
  if (amountToRotate < 0)
    amountToRotate += count;
  // Do it in two parts.
  for (int i = amountToRotate; i < count; ++i)
    yield return source[i];
  for (int i = 0; i < amountToRotate; ++i)
    yield return source[i];
  }
}

The method declares type parameters for the collected type, and then fixes the type for the collection. That is pretty much all you can do in C# without typedefs. What happens if you decide that your algorithm could be applied to any ordered type that can be iterated, rather than just ILists? Well, you could use iterators, but we know from the earlier discussion that currently, IEnumerator<T> doesn’t use a fixed end point for the end of a collection. We may run into consistency and thread safety issues, because the end of a collection could change continually, so it can’t be passed around.

Conclusions

We’ve seen that the standard template library is a product of convention and integrated language support. You can get a long way with just iterators. Do those iterators need to be stable in a multithreaded environment? Probably. I attended a talk at TechEd Sydney given by Joel Pobar a few weeks back, where he pointed out that for developers to exploit the coming multi-core machines, they would have to embrace multi-threading. That implies that our generic algorithms will have to be designed to work in a thread safe way. You can sacrifice flexibility in your choice of containers to allow iteration, as has been done in the PowerCollections library. You could try to augment the iterator mechanisms to allow the integrated language support of C# to dovetail with a better collections library design. This was done in the C5 library, where const iterators and bi-directional iterators have been grafted onto the existing collection iteration mechanism.

For truly generic algorithms (rather than shallow transformative operations such as the PowerCollections’ Rotate method given above) we need types to be flexible and informative. They are informative, but only about the type itself. Not about the relationships that the type has with other types. And what relationships the type does have, are not usable at compile time. That’s where the typedef mechanism in C++ comes in. In C#, we are hindered by the richness of the reflection mechanism. Types are runtime things passed around just like other objects. We need them to be as flexible and informative at compile time. Currently they aren’t. We may be able to do extraordinary things in future releases of C#, but currently we will never have the same degree of genericity as we had in C++. However, with a bit of good design and some compromise, we can produce some very flexible algorithms of lasting value.

If I could ask for one thing to be included in future versions of C# – it would be static type checking, allowing generic type arguments to carry more information than the minimum required to create a concrete type at runtime. I want typedefs.

About these ads

18 comments

  1. There have been numerous attempts to create something STL like in C#. However, none of the ones that I have seen get the iterator concept right (or even try). So …

    I have started yet another C# STL library: CSTL at http://sourceforge.net/projects/cstl/ . The key difference with my library is that I do have a concept of iterators. IEnumerator just doesn’t cut it. Here is an example


    int[] src = new int[] {1,2,3,4,5};
    List dest = new List();

    Algorithm.Copy(IteratorUtil.Begin(src), IteratorUtil.End(src), IteratorUtil.BackInsertIterator(dest));

    There is also a simpler overload:
    Algorithm.Copy(src, IteratorUtil.BackInsertIterator(dest));

    Writing algorithms is very C++ like:


    public static OutputIterator Copy(InputIterator begin, InputIterator end, OutputIterator target)
    {
    for(begin=IteratorUtil.Clone(begin), target=IteratorUtil.Clone(target);
    !begin.Equals(end);
    begin.MoveNext(), target.MoveNext())
    {
    target.Write(begin.Read());
    }
    return target;
    }

    CSTL is not ready for primetime, but I like how it is coming together.

    H^2

  2. Hi Harold,
    Thanks for reading and replying to this post.

    Excellent work!

    I like what you have achieved so far with CSTL. And I’m glad that you have made the time to do this work! It will really add to the framework, and I for one would be glad to adopt it. In fact I thought it might be the perfect tool to use in the next blog post that I have in the pipeline. I Started this blog post after revisiting some old C++ ATL/STL code for doing Genetic Algorithms I wrote back in the late 90s. The blog post started out exploring the way that I was able to mutate and manipulate complex data structures using STL in a GA. It quickly got huge and directionless so I narrowed the scope to exploring C#’s capabilities re STL. My motive is to apply the STL to GAs, though, so I need a lot of complex algorithms and a wide variety of container classes (for bit vectors, trees, lists etc). So, the next installment of the blog post is to address the application of STL style coding strategies in C#.
    At the risk of sounding inflamatory (which I really don’t mean to be) how easy would it be for you to combine your iterator and algorithm code with the work on collections already done by the C5 team? It seems that the least developed part of CSTL is the collections. Whereas, the least developed part of C5 is the iterators and algorithms. Perhaps if the two were brought together, then you might end up with something really compelling to a Common-or-garden C# developer? I know that the license would allow you to do such a thing…

    Either way, I was very encouraged to see what you’ve done.

  3. Hmm, actually all the generic type parameters dissapeared in that post, sorry about that. If you would give me an e-mail adress then I could send you the post I made. I think you will find it interesting as it solves your problem with constructing generic algorithms.

    Send me a message at: bilsaboob@hotmail.com

    Sorry for bloating your blog with that crap message above :(

  4. Nope, didn’t work that either. How do I write code so that the generic start/stop tags don’t dissapear ?

  5. sorry – blame wordpress.

    My recommendation. Wrap your code in a <code> tag, and convert all <’s and >’s to &lt; and &gt;.

    (I’ll be amazed if this comment comes out right!

  6. Hey guys!

    Very intresting articles :) and great work on the CSTL library!

    I have been working on something similar for my 3d engine. I have
    always thought that while the available .net collections in the
    framework are nice to work with they are also not as flexible
    as I would want them to be.

    Ok, so like you guys I tried to make a port of the c++ stl library,
    but soon noticed that the c# generics aren’t powerfull enough for
    expressing generic algorithms – so I thought at least.

    After several framework designs I came up with one that solves
    the most crucial part in creating a generic algorithm library.

    The most crucial problem I would say originates from the lack of
    generic parameter deduction from constraints.

    So anyhow, concider the following collections architecture:

    /*############### Start of Position navigation interfaces ################*/
    /* Just a base interface for abstraction purposes */
    public interface Position { }

    /*IForwardIterable<PosT> navigation "attribute" tells that a position
    *know how to move to the next position in a collection.
    */
    public interface IForwardIterable<PosT>
    : Position
    {
    bool HasNext { get; }
    PosT Next { get; }
    }
    /*IBackwardIterable<PosT> navigation "attribute" tells that a position
    *know how to move to the previous position in a collection.
    */
    public interface IBackwardIterable<PosT>
    : Position
    {
    bool HasPrev { get; }
    PosT Prev { get; }
    }
    /*IBidirectionIterable<PosT> navigation "attribute" tells that a position
    *know how to move both to the next and the previous position in a collection.
    */
    public interface IBidirectionIterable<PosT>
    : IForwardIterable<PosT>, IBackwardIterable<PosT>
    {
    }
    /*############### End of Position navigation interfaces ################*/

    /*############### Start of Position attribute interfaces ###############*/
    /* IReadable<T> "attribute" tells you can read from a position
    * in a collection
    */
    public interface IReadable<T>
    {
    T Value { get; }
    }
    /* IWriteable<T> "attribute" tells you can write from to a position
    * in a collection
    */
    public interface IWriteable<T>
    {
    T Value { set; }
    }
    /* IWriteable<T> "attribute" tells you can swap two positions
    * in a collection. This attribute is good for sorting algrithms
    */
    public interface ISwappable<PosT>
    {
    void swap(Position swapPosition);
    }
    public interface IReadableWriteable<T>
    : IReadable<T>, IWriteable<T>
    {
    new T Value { get; set; }
    }
    /*############### End of Position attribute interfaces ###############*/

    /*############### Start of Collections interfaces ###############*/
    public interface IReadableCollection<T>
    {
    IReadableCollection.Position<T> Begin { get; }
    IReadableCollection.Position<T> End { get; }
    }
    public static class IReadableCollection
    {
    /* The IReadableCollection positions shall be able to
    * navigate forward to the next position in the collection
    * and it shall be possible to read the value from this
    * position, hence the IReadable<T> inheritance.
    */
    public interface Position<T>
    : IForwardIterable<Position<T>>,
    IReadable<T>
    {
    }
    }

    public interface ICollection<T>
    : IReadableCollection<T>
    {
    new ICollection.Position<T> Begin { get; }
    new ICollection.Position<T> End { get; }
    }
    public static class ICollection
    {
    /* The ICollection positions can be navigated in the same
    * way the IReadableCollection positions, BUT they also
    * can be written to, hence the IWriteable<T> inheritance.
    * Note that the IForwardIterable<Position<T>> MUST be
    * inherited seemingly "again" for this NEW type of position,
    * namely ICollection.Position<T>
    */
    public interface Position<T>
    : IReadableCollection.Position<T>,
    IForwardIterable<Position<T>>,
    IWriteable<T>
    {
    }
    }

    public interface IReadableSequence<T>
    : IReadableCollection<T>
    {
    new IReadableSequence.Position<T> Begin { get; }
    new IReadableSequence.Position<T> End { get; }
    }
    public static class IReadableSequence
    {
    /* The IReadableSequence positions can be navigated in the same
    * way the IReadableCollection positions, AND they also
    * can navigate backwards to the previous position in the
    * collection, hence the IBidirectionIterable<Position<T>>
    */
    public interface Position<T>
    : IReadableCollection.Position<T>,
    IBidirectionIterable<Position<T>>
    {
    }
    }

    public interface ISequence<T>
    : IReadableSequence<T>
    {
    new ISequence.Position<T> Begin { get; }
    new ISequence.Position<T> End { get; }
    }
    public static class ISequence
    {
    /* You get the point...
    */
    public interface Position<T>
    : IReadableSequence.Position<T>,
    IBidirectionIterable<Position<T>>,
    IWriteable<T>
    {
    }
    }
    /*############### End of Collections interfaces ###############*/

    So basically, the achitecture is built on a kind of “Iterator” – the position
    navigation “attributes”. (btw, when I say “attributes” I don’t mean .net attributes
    but rather the meaning of attributes in natural language)

    The positions, know how to navigate forward/backward or in some other way like
    the case of Tree iterators. If you want you may think of them as “accessors”.

    The other crucial part of the framework are the attributes of the positions, like
    if the position can be written to, or read from or maybe if it can be swapped with
    another position in the collection it originates from.

    So that covers the first part of the framework.

    Now you might ask what, – so what? what does that help us?
    - Well the anwser lies in the “PosT” parameter of the navigational
    interfaces. You might have noticed from the collection interfaces
    that their corresponding position types each inherit from something
    like IForwardIterable<Position<T>>, IBackwardIterable<Position<T>>

    This Position<T> parameter makes that we don’t loose
    the actual type of the Position and its navigational attributes when
    converting them for navigation. For example:

    public static PosT getNextConservedPosition<PosT>(PosT nextOfThisPosition)
    where PosT : IForwardIterable<PosT>
    {
    return nextOfThisPosition.Next;
    }

    ISequence.Position pos = mySequence.Begin;

    ISequence.Position next = getNextConservedPosition(pos);

    In this simple example we see that we created a function which moves to the next
    position in a collection and returns that next position. The nifty part is that
    thanks to the above mentioned Position<T> parameter, we don’t need
    to loose any type information about what type of position we are moving forward
    from.

    In contrary to a simple implementation of the same concept where you don’t store
    any information about the real position type:

    interface ISimpleForwardIterable
    {
    ISimpleForwardIterable Next;
    }
    //or
    //Here T stands for the Value type of the instantiated collections generic parameter
    interface IOtherSimpleForwardIterable<T>
    {
    IOtherSimpleForwardIterable<T> Next;
    }

    In this simplier version we would loose all crucial information about the Implementation
    of the Position once we move to the “Next”.

    So, Lets take a look at some of the expressive power of this framework!

    We will take some simple generic algorithms like copy, copy from back to front and finding
    elements in a collection. So concider these algorithms:

    /*############### Start of Algoritms ###############*/
    public static partial class Algorithms<T>
    {
    /* Generic algorithms that copies the elements between "from" and "to" into the collection
    * to where "dest" points to.
    */
    public static void copy<SourcePosT, DestPosT>(SourcePosT from, SourcePosT to, DestPosT dest)
    where SourcePosT : IForwardIterable<SourcePosT>, IReadable<T>, IComparable<SourcePosT>
    where DestPosT : IForwardIterable<DestPosT>, IWriteable<T>
    {
    SourcePosT currentSourcePos = from;
    DestPosT currentDestPos = dest;

    do
    {
    currentDestPos.Value = currentSourcePos.Value;
    //copy while we havn't reached the destinated "to" position
    //and while we actually CAN move forward...
    } while ((currentSourcePos.CompareTo(to) != 0) &&
    currentSourcePos.HasNext &&
    currentDestPos.HasNext &&
    (currentSourcePos = currentSourcePos.Next) != null &&
    (currentDestPos = currentDestPos.Next) != null);
    }

    /* Generic algorithms that copies the elements between "from" and "to" into the collection
    * to where "dest" points to. The values are copied traversing backwards from the "from"
    * position.
    */
    public static void copyBackToFront<SourcePosT, DestPosT>(SourcePosT from, SourcePosT to, DestPosT dest)
    where SourcePosT : IBackwardIterable<SourcePosT>, IReadable<T>, IComparable<SourcePosT>
    where DestPosT : IForwardIterable<DestPosT>, IWriteable<T>
    {
    SourcePosT currentSourcePos = from;
    DestPosT currentDestPos = dest;

    do
    {
    currentDestPos.Value = currentSourcePos.Value;
    //copy while we havn't reached the destinated "to" position
    //and while we actually CAN move Backward in the source collection,
    //and while we actually CAN move Forward in the destination collection!
    } while ((currentSourcePos.CompareTo(to) != 0) &&
    currentSourcePos.HasPrev &&
    currentDestPos.HasNext &&
    (currentSourcePos = currentSourcePos.Prev) != null &&
    (currentDestPos = currentDestPos.Next) != null);
    }

    /* Generic algorithms that searches forward through a collection for the specified
    * "findValue"
    */
    public static PosT find<PosT>(PosT from, T findValue, IComparer<T> comparer)
    where PosT : IForwardIterable<PosT>, IReadable<T>
    {
    PosT currentPos = from;

    do
    {
    //Check if the current position points to a value equal to "findValue"
    if (comparer.Compare(currentPos.Value, findValue) == 0)
    return currentPos;

    //continue search while we havn't reached the destinated "to" position
    //and while we actually CAN move Forward in the collection!
    } while ( currentPos.HasNext &&
    (currentPos = currentPos.Next) != null);

    //The value was not found!
    return default(PosT);
    }
    }
    /*############### End of Algoritms ###############*/

    As you can see, in neither of the algorithms we loose any information about the Positions.
    The attributes follow us trhough the entire algorithm, and the same type of position
    can be returned.

    The crucial key elements of the algorithms are the constrants “where PosT :”. This is where
    you constraing the algorithms to a certain “iterator”.

    For example the copy algorithm, needs to be able to move forward through the source collection -
    hence the contraint IForwardIterable<SourcePosT> . But it also needs to be read
    from – hence the IReadable<T> constraint.
    The algorithm also need to be able to move forward through the destination collection – hence
    the IForwardIterable<DestPosT> constraint. And similarly the IWriteable<T> constraint.

    So, as you see. This framework shows you how to overcome the “inadequate” expressive power of
    generics, where we to some degree simulate the expressive power of c++ templates – to the degree
    of beeing able to express a fair amount of generic algorithms.

    However, there are some drawbacks with this framework. The main problem is that each collection
    type needs to hide the inherited functions and properties that work with Positions. In my examples
    only the Begin/End properties need to be hidden. This drawback can however be worked around fairly
    easy by introducing abstract classes for each of the collection types.

    The reason for the abstract classes is to hide the implementation of the inherited collections
    properties and functions that work with Positions.

    Please post your thoughts if there is something you don’t understand :P

    Finally, If anyone would decide on using this framework for a collection library or some other
    utilisation – I would appreciate some credits going to me for comming up with this sollution.

  7. Oops, I forgot to give you an example of the usage:

    public class Program
    {
    static void Main(string[] args)
    {
    IReadableCollection<int> rc = null;
    //Search for the value 10 in the collection
    IReadableCollection.Position<int> foundPos = Algorithms<int>.find(rc.Begin, 10, null);

    ISequence<int> s = null;
    //Copy the values from "rc" collection to "s" sequence
    Algorithms<int>.copy(rc.Begin, rc.End, s.Begin);

    ICollection<int> c = null;
    //Copy the values back to front from the "s" sequence to "c" collection
    Algorithms<int>.copyBackToFront(s.Begin, s.End, c.Begin);
    }
    }

    Just copy this code along with the code sections in the previous post into Visual Studio .net and you can
    try it out yourself :)

  8. Lets see if this gets formated correctly:


    public class Program
    {
    static void Main(string[] args)
    {
    IReadableCollection<int> rc = null;
    IReadableCollection.Position<int> foundPos = Algorithms<int>.find(rc.Begin, 10, null);

    ISequence<int> s = null;
    Algorithms<int>.copy(rc.Begin, rc.End, s.Begin);

    ICollection<int> c = null;
    Algorithms<int>.copyBackToFront(s.Begin, s.End, c.Begin);
    }
    }

  9. Ok, I’ll try with preformatted text instead:

    public class Program
    {
    static void Main(string[] args)
    {
    IReadableCollection<int> rc = null;
    IReadableCollection.Position<int> foundPos = Algorithms<int>.find(rc.Begin, 10, null);

    ISequence<int> s = null;
    Algorithms<int>.copy(rc.Begin, rc.End, s.Begin);

    ICollection<int> c = null;
    Algorithms<int>.copyBackToFront(s.Begin, s.End, c.Begin);
    }
    }

  10. I have done some work on the C5 Collections library to improve support for standard .NET collection interfaces. This has spawned off into the newly created NHive project. One of the things I would be interested in is integrating CSTL with C5 to get the best of both worlds.

    Regards,
    Gerke.

  11. Gerke,

    That would be wonderful news! The two combined would be an awesome framework, and I would certainly adopt it. I’m not sure what the current status of CSTL is. I can’t help wondering about the following:

    * how compatible are the two breeds of iterators in CSTL and C5?
    * If you create whole new types of iterator, would you lose the ability to use the LINQ to objects extension methods, which will add another way to do generic methods…?

    Either way, it would certainly be pleasant to put such collections to agressive use in some of my applications.

    Andrew

  12. Andrew,

    We are still in the design stage for NHive. Our intention is to create a new collection library. We will use the latest C5 library as departing point, but will not aim to maintain backward compatibility. Current thinking is to simplify the core collection classes and allow additional functionality to be layered on top/plugged in.

    When it comes to iterators, I’d consider adopting the CSTL approach. Regarding LINQ, as long as we support IEnumerable wherever possible, the LINQ extension methods should work.

    Regards,
    Gerke.

  13. I wonder how your API design efforts will proceed. I imagine that you will be targeting .NET 3.5, since if the project is a long one, that will be the latest version you’ll use by the time NHive comes to maturity. I’m just imagining how different my designs might be if I were to design a collections library now, with the use of Extension methods and LINQ, rather than the traditional ‘hierarchy of topologies’ that seems to be the norm for any large scale collections library. Maybe, the differences are ultimately pretty superficial (it is syntactic sugar after all), but in terms of readbility and genericity, you might be able to combine generic algorithms into the collections library really smoothly and without side effects using such a mechanism. It would decouple the collections cleanly from the algorithms too, which would I imagine be a major focus of your design efforts.

    What are your priorities?

  14. My first priority is to create a core set of collection classes with as little baggage (added functionality) as possible. I want to keep the generic algorithms and additional functionality (such as views and snapshots in the current C5) as decoupled as possible from the core collections. The .NET 3.5 extension methods are interesting in this regard, though it would be nice to have a decent solution for .NET 2.0 too.

    My current thinking is to provide the core collection implementations with sufficient functionality to introduce an arbitrary number of interceptors that can be used to layer additional functionality on the collections. I suspect we will also need to provide an introspection interface on the core collections to allow for example optimisation of generic algorithms. In the C5 collections a lot of the collection semantics are captured on the interfaces supported by collections and additionally on some properties. My gut feeling is that we must be able to that in a cleaner way that does not clutter the core collection API as much.

    If you would be interested in reviewing/contributing to the design of the collections, I’d be more than happy to add you to the newly created nhive-development google group. That will be the right place to post and discuss ideas at this point in time. No strings attached, I’m pretty busy otherwise myself.

  15. Hi Gerke,

    I’d be more than happy to contribute to design discussions etc. Please feel free to add me.

    My Email address is: matthews DOT andrew AT gmail DOT com

Comments are closed.