C++ Extension Methods

Here’s another wish for C++14:  Extension Methods.

I’m a huge fan of these in C#.  In case you’re not familiar, I’ll give you a quick introduction.  An extension method is simply a free function that permits the interface of a class to be extended outside the class definition.  This is achieved through a magical bit of syntactic sugar.  In C#, extension methods are declared as static methods of a static class, with the first parameter representing the object to be extended by the method.  Intuitively, the keyword ‘this’ is used to mark this parameter.  Here is an example:

    
public static class StringExtensions
{
    public static bool IsNullOrEmpty(this string s)
    {
        return string.IsNullOrEmpty(s);
    }

    public static int ParseInt(this string s)
    {
        return s == null ? 0 : int.Parse(s);
    }
}

void UsingExtensionMethods()
{
    string s = "42";

    // without extension methods
    var isNull = string.IsNullOrEmpty(s);
    var intVal = int.Parse(s);

    // with extension methods
    isNull = s.IsNullOrEmpty();
    intVal = s.ParseInt();

    // the equivalent explicit static call, also permitted by the compiler
    isNull = StringExtensions.IsNullOrEmpty(s);
    intVal = StringExtensions.ParseInt(s);
}

Okay, so there’s a marginal improvement in compactness and readability, so what’s the big deal? I’ll tell you. Extension methods are all about tooling support. Here’s what the editing experience looked like as I typed the code above into my IDE:

[Screenshot “ExtMethods”: the IDE’s autocomplete list for the string variable]

Here, all I could remember was that I needed a method on the string class that had the word “null” in it somewhere.  And that’s all I needed to remember:  any modern IDE offers autocomplete, which is a great productivity boost.  It saves typing and it saves remembering.  The really clever thing about extension methods in .NET is that they afford an expando-like capability to strongly-typed compiled languages – something usually associated solely with dynamic languages, like JavaScript or Ruby.

Koenig to the Rescue?

Some have argued that C++ does not need this feature, and that Argument-Dependent Lookup (ADL, also known as Koenig lookup) is sufficient, but this misses the point of extension methods.  When we program in OO languages, the starting point for every new line of code is almost always an object.  We type in an “argument” (a value/reference/pointer to an instance of a type), usually with some help from autocomplete.  Then, we type in a member (method/property/event), again with help from autocomplete.  On the other hand, if I begin by typing the function, everything’s backwards.  Rather than scoping to the argument in question, I am considering everything that is in scope at the moment.  And there is little the IDE can do to help me.  According to Anders Hejlsberg, that is precisely the reason LINQ queries buck the SQL approach and begin with the “from” keyword, rather than the “select” keyword – to support Visual Studio’s autocomplete feature (IntelliSense).
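To make the ADL argument concrete, here is a minimal sketch.  The namespace text, the type Name and the function left are hypothetical names I made up for illustration.  It shows that the unqualified call does resolve, but that the line still has to start with the function name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace text {                 // hypothetical namespace, for illustration only
    struct Name {                // hypothetical user-defined type
        std::string value;
    };

    // A free function that "extends" Name without touching its definition.
    std::string left(const Name& n, std::size_t count) {
        return n.value.substr(0, count);
    }
}

std::string adl_demo() {
    text::Name n{"hello world"};
    // No text:: qualifier is needed: ADL searches the namespace of the
    // argument's type. The call still begins with the function name,
    // so the editor has no object to narrow its suggestions by.
    return left(n, 5);
}
```

So ADL solves the lookup problem, not the discovery problem: I still have to know that left exists before I can type it.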

Subclassing, Anyone?

Others have suggested the tried and true OO solution: subclassing.  However, this technique does not scale at all.  First, it cannot work with built-in primitive types like int.  Second, it doesn’t work for many UDTs (e.g., those with private constructors, non-virtual destructors, etc.).  Third, it doesn’t support “optional” (nullable) arguments.  Fourth, it interacts poorly with code maintenance.  Given an existing body of code using a given type, say std::string, if I want to inject a new method foo() on that type, I have to subclass it, creating stringex.  Now, I have to chase all references to std::string through the codebase, replacing them with stringex.  At least, I’d have to do this as deep and far as I’d ever expect to need to call foo (and who can anticipate that?).  All remaining instances would simply be sliced back down to std::string.  Now imagine iterating that process for future maintenance cycles.  Hmmm.  Much has been written on the topic of abusing subclassing, by Herb Sutter, Scott Meyers and others.  In the end, it’s not a viable implementation strategy for extension methods.
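The maintenance problem can be sketched in a few lines.  The body of foo() and the function legacy_identity are hypothetical stand-ins, chosen only to show where the new method stops being reachable.

```cpp
#include <cassert>
#include <string>

// Hypothetical subclass that adds one helper, foo(), to std::string.
struct stringex : std::string {
    stringex(const char* s) : std::string(s) {}
    bool foo() const { return !empty() && front() == 'h'; }
};

// Existing code that was never migrated still traffics in std::string.
std::string legacy_identity(std::string s) { return s; }

bool subclass_demo() {
    stringex a("hello");
    bool ok = a.foo();                   // fine: a is a stringex

    std::string b = legacy_identity(a);  // sliced back down to std::string
    // b.foo();                          // would not compile: no foo() here
    return ok && b == "hello";
}
```

The characters survive the round trip, but foo() does not.  Every function signature between the two call sites would have to change to keep it available.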

Operator, Give Me Long Distance

Interestingly, C++ already comes very close to this capability when defining free binary operator functions, such as ostream & operator<<(ostream & os, const string & s).  In this case, the compiler automatically permits a more natural infix syntax for such operators.  When combined with returning the first argument back out of the operator, chained fluent-style programming can be achieved, like so:  cout << str1 << str2 << str3.  The standard library’s IO manipulators go one step further, defining << operators that take a free function, such as endl, permitting call chains like cout << str1 << endl.
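As a rough illustration of both ideas, here is a free operator<< for a hypothetical Point type, plus a hypothetical manipulator named sep.  Neither is part of the standard library; only the mechanism is.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Point { int x; int y; };    // hypothetical type, for illustration only

// Free binary operator: returning the stream is what makes chaining work.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

// Hypothetical manipulator: a plain free function with the same shape as
// endl, so the stream's existing operator<< overload will invoke it.
std::ostream& sep(std::ostream& os) {
    return os << " | ";
}

std::string chain_demo() {
    std::ostringstream out;
    out << Point{1, 2} << sep << Point{3, 4};
    return out.str();
}
```

Note that the stream class itself was never modified.  Both Point output and sep were bolted on from the outside, which is exactly the spirit of an extension method.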

Some (like Boost) have put all of this behavior to use by designating an arbitrary, infrequently-used operator to do the work of invoking arbitrary extension method functors.  In a nutshell, here’s how it looks:

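#include <functional>
#include <iostream>
#include <string>
using namespace std;

// Pipe a string into any callable and return whatever the callable returns.
// (Returning s itself would silently discard the callable's result.)
template<typename extension>
auto operator|(const string & s, extension method) -> decltype(method(s))
{
	return method(s);
}

// Returns a functor that yields the leftmost `count` characters.
function<string(const string &)>
Left(int count)
{
	return [=](const string & s)
	{
		return s.substr(0, count);
	};
}

void TestExtensionMethod()
{
	string s = "hello world";
	string s2 = s | Left(5);	// "hello"
	cout << s << s2;

	// operator<< binds tighter than operator|, so parens are needed
	cout << (s | Left(5));
}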

It’s a testament to the power of C++, but it quickly breaks down in real-world scenarios.  For example, operator precedence is not natural:  operator| binds more loosely than operator<<, hence the parentheses above.  Also, we’re really back where we started.  The IDE doesn’t know we’re invoking a function on the right side of the operator, so it’s silent as the grave.  And without IDE support, there’s no gain in productivity.

The Request

In the end, there doesn’t appear to be a way to achieve extension methods with the language as it stands.  Therefore, I would propose that C++ be extended to permit an alternate calling syntax for any free (non-member) function, effectively eliding the first argument from the argument list and passing it implicitly, much like the ‘this’ pointer.  The behavior should work closely with ADL rules to determine when and where scoping is necessary.  Of course, this feature would introduce ambiguities which the programmer would need to resolve.  An example that springs to mind immediately would be iterator begin and end methods.  Binding preference should be given to “regular” instance methods, over extension methods.  Unlike C#, I would want to permit calling an extension method from inside another method, again assuming all ambiguities are resolved.  This feature should also fully embrace template support.  Imagine generating families of helper methods on an arbitrary type (either primitive or UDT – it makes no difference).  Now we have something approximating a “mixin”.  Imagine combining that behavior in turn with template-based systems for parsing, serialization, etc.  This would be a powerful feature indeed.
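The begin/end ambiguity is easy to see with today’s language.  The sketch below is valid C++11 and uses only std::vector and std::begin.  The comment about how the proposed syntax would bind is my assumption about the proposal, not existing behavior.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

int first_element_demo() {
    std::vector<int> v{7, 8, 9};

    auto member  = v.begin();      // "regular" instance method
    auto free_fn = std::begin(v);  // free function taking v as first argument

    // Under the proposal, the spelling v.begin() could in principle name
    // either one; the suggested rule is that the instance method wins.
    assert(member == free_fn);     // today, both yield the same iterator
    return *member;
}
```

Here the two candidates happen to agree, so the preference rule is harmless.  The interesting cases are the ones where a free function and a member share a name but not a meaning.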

P.S.  After writing this, I stumbled over the D language’s Uniform Function Call Syntax feature.  I think it fits the bill rather nicely!