C++ Wish List

Ideas for improving C++

C++ Extension Methods


Here’s another wish for C++14:  Extension Methods.

I’m a huge fan of these in C#. In case you’re not familiar, I’ll give you a quick introduction. An extension method is simply a function, defined outside a class, that extends the interface of that class without touching its definition. This is achieved through a magical bit of syntactic sugar. In C#, extension methods are declared as static methods of a static class, with the first parameter representing the object to be extended by the method. Intuitively, the keyword ‘this’ is used to mark this parameter. Here is an example:

    
public static class StringExtensions
{
    public static bool IsNullOrEmpty(this string s)
    {
        return string.IsNullOrEmpty(s);
    }

    public static int ParseInt(this string s)
    {
        return s == null ? 0 : int.Parse(s);
    }
};

void UsingExtensionMethods()
{
    string s = "42";

    // without extension methods
    var isNull = string.IsNullOrEmpty(s);
    var intVal = int.Parse(s);

    // with extension methods
    isNull = s.IsNullOrEmpty();
    intVal = s.ParseInt();

    // the equivalent ordinary static calls, which remain available
    isNull = StringExtensions.IsNullOrEmpty(s);
    intVal = StringExtensions.ParseInt(s);
}

Okay, so there’s a marginal improvement in compactness and readability. So what’s the big deal? I’ll tell you. Extension methods are all about tooling support. Here’s what the editing experience looked like as I typed the code above into my IDE:

[Screenshot: the IDE’s autocomplete list while typing the code above]

Here, all I could remember was that I needed a method on the string class that had the word “null” in it somewhere. And that’s all I needed to remember: any modern IDE offers autocomplete, which is a great productivity boost. It saves typing and it saves remembering. The really clever thing about extension methods in .NET is that they afford an expando-like capability to strongly-typed compiled languages – something usually associated solely with dynamic languages, like JavaScript or Ruby.

Koenig to the Rescue?

Some have argued that C++ does not need this feature, and that argument-dependent lookup (ADL, also called Koenig lookup) is sufficient. That argument misses the point of extension methods. When we program in OO languages, the starting point for every new line of code is almost always an object. We type in an “argument” (a value/reference/pointer to an instance of a type), usually with some help from autocomplete. Then, we type in a member (method/property/event), again with help from autocomplete. On the other hand, if I begin by typing the function, everything’s backwards. Rather than scoping to the argument in question, I am considering everything that is in scope at the moment. And there is little the IDE can do to help me. According to Anders Hejlsberg, that is precisely the reason LINQ queries buck the SQL approach and begin with the “from” keyword, rather than the “select” keyword – to support Visual Studio’s autocomplete feature (Intellisense).
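For readers who haven’t met it, here is a minimal sketch of ADL. An unqualified call is resolved by also searching the namespaces associated with the argument types. The `text` namespace, the `Document` type and the `word_count` function are hypothetical. The call still begins with the function name, so the IDE has nothing to scope its suggestions to.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

namespace text {  // hypothetical namespace for illustration

struct Document {
    std::string body;
};

// A free function declared alongside the type it operates on.
std::size_t word_count(const Document& doc) {
    std::istringstream in(doc.body);
    std::string word;
    std::size_t count = 0;
    while (in >> word) {
        ++count;
    }
    return count;
}

}  // namespace text

std::size_t count_words_in(const text::Document& doc) {
    // No "text::" qualification needed: ADL searches namespace text
    // because the argument's type is declared there.
    return word_count(doc);
}
```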

Subclassing, Anyone?

Others have suggested the tried and true OO solution: subclassing. However, this technique does not scale at all. First, it cannot work with built-in primitive types like int. Second, it doesn’t work for many UDTs (e.g., classes declared final or whose constructors are all private). It is also risky for classes with non-virtual destructors, such as std::string, because a subclass object must then never be deleted through a base pointer. Third, it doesn’t support “optional” (nullable) arguments: a C# extension method can be invoked on a null reference, but a member function cannot be called through a null pointer. Fourth, it interacts poorly with code maintenance. Given an existing body of code using a given type, say std::string, if I want to inject a new method foo() on that type, I have to subclass it, creating stringex. Now, I have to chase all references to std::string through the codebase, replacing them with stringex. At least, I’d have to do this as deep and far as I’d ever expect to need to call foo (and who can anticipate that?). Any stringex copied into one of the remaining std::string instances would simply be sliced back down to std::string. Now imagine iterating that process for future maintenance cycles. Hmmm. Much has been written on the topic of abusing subclassing, by Herb Sutter, Scott Meyers and others. In the end, it’s not a viable implementation strategy for extension methods.
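Here is a minimal sketch of the subclassing approach. It assumes a hypothetical `stringex` that adds a `shout()` method to `std::string`. It shows two of the costs named above:

- Existing `std::string` values must be converted before the new method is available.
- Copying the result back into a `std::string` slices the extension away.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical subclass that "extends" std::string with one new method.
// std::string's destructor is not virtual, so a stringex must never be
// deleted through a std::string pointer.
struct stringex : std::string {
    explicit stringex(const std::string& s) : std::string(s) {}

    // The "extension": an upper-cased copy of the string.
    stringex shout() const {
        stringex upper(*this);
        std::transform(upper.begin(), upper.end(), upper.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        return upper;
    }
};

// Existing code written against std::string knows nothing about stringex.
std::string legacy_greeting() { return "hello"; }

std::string shout_legacy() {
    // legacy_greeting().shout();      // error: std::string has no member 'shout'
    stringex ex(legacy_greeting());    // every value must be converted first...
    std::string plain = ex.shout();    // ...and copying back slices the extension off
    return plain;
}
```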

Operator, Give Me Long Distance

Interestingly, C++ already comes very close to this capability when defining free binary operator functions, such as ostream & operator<<(ostream & os, const string & s). In this case, the compiler automatically permits a more natural infix syntax for such operators. When combined with returning the first argument back out of the operator, chained fluent-style programming can be achieved, like so: cout << str1 << str2 << str3. The standard library’s IO manipulators go one step further. The stream classes provide an operator<< overload that accepts a pointer to a function such as endl, permitting call chains like cout << str1 << endl.
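Here is a minimal sketch of both techniques, using a hypothetical `Point` type and a hypothetical `separator` manipulator:

- The free `operator<<` returns its stream argument, so calls chain.
- A function with the signature `std::ostream& (std::ostream&)` can be dropped into the chain just like `endl`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Point {  // hypothetical user-defined type
    int x;
    int y;
};

// Free binary operator: returning the stream enables chaining.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

// A custom manipulator: any function taking and returning std::ostream&
// is accepted by the stream's operator<< overload for function pointers.
std::ostream& separator(std::ostream& os) {
    return os << " | ";
}

std::string render(const Point& a, const Point& b) {
    std::ostringstream out;
    out << a << separator << b;  // chained, like cout << str1 << endl
    return out.str();
}
```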

Some libraries (Boost.Range’s adaptors, for instance) have put all of this behavior to use by designating an infrequently used operator, |, to do the work of invoking arbitrary extension method functors. In a nutshell, here’s how it looks:

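#include <functional>
#include <iostream>
#include <string>
using namespace std;

// Pipe a string into any "extension" functor and yield the functor's result.
template<typename Extension>
auto operator|(string & s, Extension method) -> decltype(method(s))
{
	return method(s);
}

// Returns a functor that yields the leftmost count characters.
function<string(string &)>
Left(int count)
{
	return [=](string & s)
	{
		return s.substr(0, count);
	};
}

void TestExtensionMethod()
{
	string s = "hello world";
	string s2 = s | Left(5);	// "hello"
	cout << s << ' ' << s2 << endl;

	// oops - operator precedence fails - parens needed,
	// because << binds more tightly than |
	cout << (s | Left(5)) << endl;
}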

It’s a testament to the power of C++, but it quickly breaks down in real-world scenarios. For one, operator precedence works against us, as the parenthesized call shows. Also, we’re really back where we started. The IDE doesn’t know we’re invoking a function on the right side of the operator, so it’s silent as the grave. And without IDE support, there’s no gain in productivity.

The Request

In the end, there doesn’t appear to be a way to achieve extension methods with the language as it stands. Therefore, I would propose that C++ be extended to permit an alternate calling syntax for any free function. This syntax would effectively elide the first argument and pass it implicitly, the way the ‘this’ pointer is passed to a member function. The behavior should work closely with ADL rules to determine when and where scoping is necessary. Of course, this feature would introduce ambiguities which the programmer would need to resolve. An example that springs to mind immediately is begin and end. C++11 adds the free functions std::begin and std::end alongside the member begin() and end() of the standard containers, so v.begin() could bind to either. Binding preference should be given to “regular” instance methods over extension methods. Unlike C#, I would want to permit calling an extension method from inside another method, again assuming all ambiguities are resolved. This feature should also fully embrace template support. Imagine generating families of helper methods on an arbitrary type (either primitive or UDT – it makes no difference). Now we have something approximating a “mixin”. Imagine combining that behavior in turn with template-based systems for parsing, serialization, etc. This would be a powerful feature indeed.
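Today the closest approximation is a family of free function templates. The sketch below assumes a hypothetical `helpers` namespace with a `doubled` template that serves a primitive type and a UDT alike. ADL cannot help in either case, so both calls must be qualified:

- `int` has no associated namespace.
- For `std::string`, ADL searches `std` rather than `helpers`.

Under the proposal, these calls could instead be written as `n.doubled()` and `s.doubled()`.

```cpp
#include <cassert>
#include <string>

namespace helpers {  // hypothetical helper namespace

// One template definition serves any type that supports operator+.
template <typename T>
T doubled(const T& value) {
    return value + value;
}

}  // namespace helpers

int doubled_int(int n) {
    return helpers::doubled(n);  // qualified: ADL finds nothing for int
}

std::string doubled_string(const std::string& s) {
    return helpers::doubled(s);  // qualified: ADL would search std, not helpers
}
```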

P.S. After writing this, I stumbled across the D language’s Uniform Function Call Syntax (UFCS), which lets a call written as a.f(b) resolve to a free function f(a, b). I think it fits the bill rather nicely!

Template Template Parameter Overloading

As the ISO C++ committee deliberates on what to include in the next iteration of the language, C++14, I’d submit the following for their consideration…

Often, the need arises to decorate one type with another (typically through derivation or some variation on CRTP). Template template parameters are a powerful tool for implementing this, as Alexandrescu demonstrates with his scattered and linear hierarchy generation (Modern C++ Design). The problem is that C++ currently demands a uniform interface for such parameters: the argument’s template parameter list must match the parameter’s exactly. As a result, a template with a defaulted parameter, such as std::vector<T, Allocator>, cannot bind to a parameter declared as template<typename> class. This is despite the fact that function templates and ordinary functions support overloading on parameter lists, and both templates and functions support default argument values.

There does not appear to be a sound reason for this limitation, as all ambiguity would necessarily be resolved at compile time. This is particularly onerous when passing third-party templates that make heavy use of defaulted parameters themselves (e.g., collections) as template template arguments. The referencing template cannot transparently take advantage of those defaults. One possible approach might be to use variadic templates. Another would be to use alias templates (template typedefs) to “hide” the optional parameters. These feel more like workarounds. My preference would be for the language to simply support this capability natively. C++ may never achieve the homoiconicity of functional languages, with the unification of compile-time and run-time features, but it can take steps to remove arbitrary distinctions.
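Both workarounds can be sketched as follows; `Wrapper`, `StrictWrapper` and `Vec` are hypothetical names.

- A template template parameter declared with a parameter pack (`template <typename...> class`) accepts `std::vector` together with its defaulted allocator.
- An alias template hides the defaulted parameter, so `std::vector` can be passed where a one-parameter template is expected.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Workaround 1: a variadic template template parameter matches templates
// with any number of (possibly defaulted) type parameters.
template <typename T, template <typename...> class Container>
struct Wrapper {
    Container<T> items;  // e.g. std::vector<T, std::allocator<T>>
    std::size_t size() const { return items.size(); }
};

// Workaround 2: an alias template hides the defaulted allocator parameter,
// so std::vector fits a parameter declared with exactly one type parameter.
template <typename T>
using Vec = std::vector<T>;

template <typename T, template <typename> class Container>
struct StrictWrapper {
    Container<T> items;
    std::size_t size() const { return items.size(); }
};
```

Usage looks like `Wrapper<int, std::vector>`, `Wrapper<int, std::list>` or `StrictWrapper<int, Vec>`. Neither form lets the decorated template expose its defaults transparently, which is the point of the request above.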

Generic Lambdas

When C++11 came to a compiler near me, one of the first features I was eager to try was lambdas. As a fan of highly-structured code, I had often factored out “local functions” in my designs. Because nested functions are prohibited in C++ (why?), I’d often rely on the technique of declaring a local struct (inside the function) with a public static (stateless) method. This achieves the same effect as a local function with a minimum of fuss. Some libraries (e.g., Boost) provide macros to fabricate local functions, hiding the “ugliness.” But it’s not that bad, and the struct name can be useful in providing some self-documenting organization. Also, I prefer to keep non-language subterfuge such as macros to a minimum; it creates clutter all its own. But I digress – I was talking about lambdas.
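The local-struct idiom described above looks roughly like this; `count_vowels` and `Vowels` are hypothetical names. Because the static member function is an ordinary function, it can also be passed by pointer to an algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

std::size_t count_vowels(const std::string& text) {
    // A struct declared inside the function stands in for the nested
    // function C++ does not allow; its static member carries no state.
    struct Vowels {
        static bool is_vowel(char c) {
            switch (c) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
            case 'A': case 'E': case 'I': case 'O': case 'U':
                return true;
            default:
                return false;
            }
        }
    };
    // The static member is an ordinary function, so it can be passed by pointer.
    return static_cast<std::size_t>(
        std::count_if(text.begin(), text.end(), &Vowels::is_vowel));
}
```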

I’ve used lambdas in C# since their inception as part of LINQ (a technological tour de force, by the way). They permit a “locality of reference” for the programmer, permitting anonymous function object arguments to be implemented at the point of their use. This greatly reduces both boilerplate code and the cognitive load on the programmer.

A Degrading Experience

So, when lambdas debuted in Visual C++, I eagerly put them to the first use I could find, and was… disappointed. I had in mind to replace some of my nested struct method uses with lambdas, specifically for calling Windows APIs that accept a C callback function. My scenario was thus aimed squarely at the “captureless lambda degrades to a function pointer” feature of lambdas. I can smile about it now, since this behavior is available in Visual Studio 2012. I mean, how cool is this?

EnumWindows([](HWND, LPARAM){return TRUE;}, 0L);
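`EnumWindows` is Windows-only, so here is a portable sketch of the same conversion that assumes only the standard library. `std::qsort` takes its comparator as a plain function pointer. `sort_ascending` and `square_ptr` are hypothetical names. The unary `+` idiom forces the conversion when no target type is spelled out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// The function-pointer type qsort expects for its comparator.
using Comparator = int (*)(const void*, const void*);

void sort_ascending(int* values, std::size_t count) {
    // A captureless lambda converts implicitly to a matching function pointer.
    Comparator cmp = [](const void* a, const void* b) {
        int lhs = *static_cast<const int*>(a);
        int rhs = *static_cast<const int*>(b);
        return (lhs > rhs) - (lhs < rhs);
    };
    std::qsort(values, count, sizeof(int), cmp);
}

// Unary + forces the same conversion; square_ptr has type int (*)(int).
auto square_ptr = +[](int x) { return x * x; };
```

A capturing lambda has no such conversion, so adding `[&]` or `[=]` to either lambda would make this fail to compile.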

I’ve since used lambdas in countless contexts and still regard them as my favorite C++11 feature (second only to “auto”). However, it wasn’t long until I discovered several more basic limitations of lambdas. One was the lack of support for contravariance. Now, I understand that C/C++ has never had this, so maybe just seeing it again in the new lambda light has made me wistful. But I’d really like to be able to do this:

struct ParamBase{};

struct ParamSub : ParamBase{};

void Bar( void(ParamSub & p) );

void CallBar()
{
	Bar([](ParamBase & p)
	{
		// I only care about ParamBase in my lambda,
		// so why can't I use contravariance?
	});
}
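One workaround is sketched below. It assumes `Bar` can be changed to take a `std::function` instead of a raw function type. A raw function parameter rejects the lambda because the lambda converts only to `void (*)(ParamBase &)`. By contrast, `std::function<void(ParamSub &)>` accepts any callable that can be invoked with a `ParamSub &`. A lambda taking `ParamBase &` therefore qualifies through the ordinary derived-to-base conversion. `BarWithFunction`, `CallBarWithFunction` and the `name` member are hypothetical additions.

```cpp
#include <cassert>
#include <functional>
#include <string>

struct ParamBase {
    std::string name = "base";  // hypothetical member, for illustration only
};

struct ParamSub : ParamBase {};

// Hypothetical variant of Bar: std::function only requires that its target
// be callable with ParamSub &, so a handler taking ParamBase & qualifies.
void BarWithFunction(const std::function<void(ParamSub &)>& callback) {
    ParamSub sub;
    sub.name = "sub";
    callback(sub);
}

std::string CallBarWithFunction() {
    std::string seen;
    BarWithFunction([&seen](ParamBase & p) {  // captures are allowed here too
        seen = p.name;
    });
    return seen;
}
```

The price is the type erasure inside `std::function`, which a raw function pointer does not incur.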

Deduction for the Win

Most of the limitations I’ve encountered, however, could be summarized as a lack of generic (polymorphic) support. In short, there are many situations in which type deduction would make lambdas more compact and readable. I’m excited that this is getting some attention. After co-authoring a proposal, Faisal Vali created a test implementation and made it publicly available. I ran across this on the official C++ website, which is calling for comment on Faisal’s work. As should be expected, the approach is thorough and thoughtful, and I’ll have a hard time waiting for the standards committee to do their thing. Still, assuming the proposal is adopted essentially without change, C++ haters are gonna have a field day with this syntax:

[]<>(){}

And it won’t be the Lispers.  They’ll love all those delimiters, bringing C++ asymptotically closer to FP enlightenment.  But pray, what will we do when we need the next pair?
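For comparison, the form standardized in C++14 (from a later revision of that proposal) drops the angle brackets and lets `auto` parameters imply the template. C++20 later added an explicit template parameter list as well. The sketch below assumes a C++14 or newer compiler; `add` and `total_of` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A generic lambda: each auto parameter becomes a template parameter
// of the closure type's function call operator.
auto add = [](auto a, auto b) { return a + b; };

// The same lambda object works for any element type supporting operator+.
template <typename T>
T total_of(const std::vector<T>& values, T init) {
    for (const auto& v : values) {
        init = add(init, v);
    }
    return init;
}
```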

A Place for Override, and Every Override in its Place

“I love C++!”, said an enthusiastic job candidate during an interview a year ago. That candidate has since become one of the best developers I ever hired. I too love C++, and especially so during its current renaissance. Prior to C++11, the last time I felt this way was at the 1997 Visual C++ Developers Conference in Orlando. There, C++98 improvements like explicit and mutable were unveiled, causing many a developer to run back to their ARM to review “const correctness”, which had more or less been ignored to that point. If C++98 was evolutionary, C++11 is revolutionary – a tour de force of new features. Some, like lambdas, are bold forays into the archrival territory of functional programming. The most profound new feature, from my perspective, is rvalue references. These enable move semantics (construction and assignment), which are potentially far more efficient than their copy counterparts. For a language that lacks access to the compiler’s parse tree, features like rvalue references permit surprisingly subtle expressions for optimal code generation.
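Here is a minimal sketch of what move semantics buy. It uses a hypothetical `Buffer` type whose constructors count how often they run. Copying an lvalue duplicates the elements. `std::move` casts to an rvalue reference, so overload resolution picks the move constructor, which merely transfers the vector’s storage.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type that counts how it was constructed.
struct Buffer {
    static int copies;
    static int moves;

    std::vector<int> data;

    explicit Buffer(std::size_t n) : data(n) {}

    // Copy constructor: duplicates every element, O(n).
    Buffer(const Buffer& other) : data(other.data) { ++copies; }

    // Move constructor: binds to rvalues and steals the storage, O(1).
    Buffer(Buffer&& other) noexcept : data(std::move(other.data)) { ++moves; }
};

int Buffer::copies = 0;
int Buffer::moves = 0;
```

With `Buffer a(1000);`, writing `Buffer b = a;` runs the copy constructor. Writing `Buffer c = std::move(a);` runs the move constructor.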

So the C++ language is alive and well, and its priesthood is openly solicitous of new ideas from the community.  I’ll oblige with a series of posts on features I’d like to see added to the language.  Each of these has been born of necessity, when in the throes of a coding session I lamented that I had to once again resort to workarounds.

To kick things off, let’s consider override. This new keyword (strictly, an “identifier with special meaning”, but I’ll use “keyword” for simplicity and clarity) allows the programmer to make a distinction between an override of an existing virtual method, and the introduction of a new virtual method. Previously, the keyword virtual was overloaded for this purpose and often led to errors when virtual dispatching would fail due to a refactoring of base or derived classes. Let’s see how this could happen.

Initially the code looks like this:

#include <iostream>
using namespace std;

struct Base
{
	virtual void foo()
	{
		cout << "Base::foo called" << endl;
	}
};

struct RiskyDerived : public Base
{
	// "virtual" is optional, making overrides difficult to track down
	void foo()
	{
		cout << "RiskyDerived::foo called" << endl;
	}
};

void DemoBeforeRefactor()
{
	RiskyDerived derived;
	Base & base = derived;
	base.foo();	// "RiskyDerived::foo called"
}

After a hasty refactor, we now have this:

struct Base
{
	virtual void foo(int = 0)
	{
		cout << "Base::foo called" << endl;
	}
};

// RiskyDerived is unchanged

void DemoAfterRefactor()
{
	RiskyDerived derived;
	Base & base = derived;
	base.foo(); // "Base::foo called"
}

The binding of RiskyDerived::foo to the vtable entry of Base::foo is now broken, but the code still compiles. The mistake therefore shows up only as wrong behavior at run time (the worst kind of error). With the override keyword in place, however, we are protected from situations like this, because the compiler will issue an error:

struct BetterDerived : public Base
{
	// Error: member function declared with 'override' does not override a base class member
	void foo() override;
};

The problem with override is that it’s only a half solution. My development team has been bitten by the situation above on a number of occasions. So naturally, when a new keyword like override is introduced, we eagerly embrace it and look for a way to systematically employ it throughout the codebase. That’s where override breaks down. As currently specified, systematic use is not feasible, because the keyword is optional.

What we have today is this behavior:

All uses of the override keyword must in fact override a virtual method. 

What we’re lacking is a way to enforce the converse (every override must say so), or equivalently:

If a virtual method lacks the override keyword, it must not override a virtual method. 

We might term the current behavior “weak” or “permissive”, and my ideal behavior “strong” or “strict”.  In “strict” mode, without the override keyword, a method declaration must either introduce:

  1. A new virtual method (if accompanied by the virtual keyword) or
  2. A new non-virtual method, possibly hiding or overloading another method of the same name.

In short, all virtual method overrides must use the override keyword in “strict” mode.

I understand that a “strict override” would be a breaking change to much existing code. But that is the whole point: I would like the compiler to tell me where I need to use override (or not). Because the virtual keyword is optional when declaring an overriding method, it is a tedious and error-prone exercise to manually track down all appropriate locations for override in an existing codebase. It’s true that some compilers will warn of method hiding if only the signature changes. But if the name of the method changes, the compiler is silent as the grave.

In “strict” mode, on the other hand, our hypothetical compiler could issue a warning such as this:

struct BestDerived : public Base
{
	// Warning: member function overrides a base class member, but is not declared with 'override'
	void foo();
};

Perhaps this is best left as a compiler extension, due to the concern over breaking changes.  But it surprises me that no compiler, to my knowledge, currently offers it.  At a minimum, as a user of Microsoft’s compiler, it seems to me that a third-party tool could be created that scans the Intellisense database to determine “strict override” violations.  Any takers?  Visual Assist team?
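In the meantime, a team can follow the “strict” convention by discipline. The sketch below uses hypothetical `Shape` and `Circle` types. Every override carries `override`, and a new virtual is introduced with `virtual` alone. Newer GCC and Clang releases offer a `-Wsuggest-override` warning that reports overriding functions not marked `override`, which approximates the “strict” mode described above. Check your compiler’s documentation for the version that introduced it.

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }  // new virtual: no override
};

struct Circle : Shape {
    // Overrides are always marked, so a base-class refactor breaks the build.
    std::string name() const override { return "circle"; }

    // A brand-new virtual in the derived class: virtual, but no override.
    virtual double radius() const { return 1.0; }
};

std::string describe(const Shape& s) {
    return s.name();  // dispatches through the vtable
}
```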