Monthly Archives: March 2013

Generic Lambdas

GenericLambdasWhen C++11 came to a compiler near me, one of the first features I was eager to try was lambdas.  As a fan of highly-structured code, I had often factored out “local functions” in my designs.  Because nested functions are prohibited in C++ (why?), I’d often rely on the technique of declaring a nested struct with a public static (stateless) method.  This achieves the same effect as a local function with a minimum of fuss.  Some libraries (e.g., Boost) provide macros to fabricate local functions, hiding the “ugliness.”  But it’s not that bad, and the struct name can be useful in providing some self-documenting organization.  Also, I tend to prefer keeping non-language subterfuge to a minimum.  It creates a clutter all its own.  But I digress – I was talking about lambdas.

I’ve used lambdas in C# since their inception as part of LINQ (a technological tour de force, by the way).  They permit a “locality of reference” for the programmer, permitting anonymous function object arguments to be implemented at the point of their use.  This reduces both a great deal of boilerplate code, and cognitive load on the programmer.

A Degrading Experience

So, when lambdas debuted in Visual C++, I eagerly put them to the first use I could find, and was… disappointed.  I had in mind to replace some of my nested struct method uses with lambdas, specifically for calling Windows APIs that accept a C callback function.  My scenario was thus aimed squarely at the “captureless lambda degrades to a function pointer” feature of lambdas. I can smile about it now, since this behavior is now available in Visual Studio 2012 .  I mean, how cool is this?

EnumWindows([](HWND, LPARAM){return TRUE;}, 0L);

I’ve since used lambdas in countless contexts and still regard this as my favorite C++11 feature (second only to “auto“).  However, it wasn’t long until I discovered several more basic limitations of lambdas.  Another was lack of support for contravariance.  Now, I understand that C/C++ has never had this, so maybe just seeing it again in the new lambda light has made me wistful.  But I’d really like to be able to do this:

struct ParamBase{};

struct ParamSub : ParamBase{};

void Bar( void(ParamSub & p) );

void CallBar()
{
	Bar([](ParamBase & p)
	{
		// I only care about ParamBase in my lambda,
		// so why can't I use contravariance?
	});
}

Deduction for the Win

Most of the limitations I’ve encountered, however, could be summarized as a lack of generic (polymorphic) support.  In short, there are many situations in which type deduction would make lambdas more compact and readable.  I’m excited that this is getting some attention.  After co-authoring a proposal, Faisal Vali has created a test implementation and made it available here.  I ran across this on the official C++ website, which is calling for comment on Faisal’s work.  As should be expected, the approach is thorough and thoughtful, and I’ll have a hard time waiting for the standards committee to do their thing.  Still, assuming the proposal is adopted essentially without change, C++ haters are gonna have a field day with this syntax:

[]<>(){}

And it won’t be the Lispers.  They’ll love all those delimiters, bringing C++ asymptotically closer to FP enlightenment.  But pray, what will we do when we need the next pair?

Naked Aggression with the Heap

MP900321198I’ll begin my Idiosyncratic C++ series with a discussion of the heap, its abuses, and its alternatives.

The Heap

The heap does not seem to be well understood among many C++ programmers.   I’m referring to the trade-offs involved with using it, not its underlying implementation.  Heap allocation should be a means of last resort, a necessary evil for getting the job done.   As a general pool of memory available for program use, it is slower and riskier to use than the stack.  And yet, many beginning C++ programmers are enamored with it.  Perhaps it’s a misappropriation of the ‘new’ keyword from managed languages like Java and C# – a conflation of the concepts of object creation and heap allocation.  In any case, it’s often the first tool the programmer reaches for in allocating memory.  And when it is deemed necessary, it is often used both “nakedly” and “aggressively”, leading to unnecessary risks for program safety.

The Stack

By contrast, the stack is the original, automatic, hardware-accelerated memory manager.  It’s impossible to leak stack memory, and difficult to double-delete it (or single-delete it, for that matter).  [The original purpose of the “auto” keyword in C/C++ was to designate an “automatic storage” variable with local block scope, in contrast to the “static” keyword.  But since the language provided “auto” as a default, the keyword found little use and has been re-purposed for type deduction.]  At its foundation, the computer is a stack-based machine.  Functional programmers know this and design their algorithms around it.  But it’s just as true for the object-oriented programmer.  Cultivating a “stack-based discipline” can vastly simplify designs, eliminate whole classes of bugs (heap-based), and improve efficiency and readability.

Aggressive

Many heap allocations are simply not necessary.  Examples include:

  • Locals
    Local variables newed and deleted entirely within a given block scope.   This must certainly be a holdover from managed languages.
  • Members
    Aggregate classes newing members in a constructor and delete-ing them in a destructor.  Sometimes this is necessary if a member does not have a public constructor (s obtained from a factory), is implementing the pimpl pattern, and so on.  However, I have seen many cases where member data is created on the heap with evidently no conscious thought.
  • Collections
    The (mistaken) belief that every collection must be implemented with a heap-based library template.  The corollary to this is that aggregate arrays are evil.  This is probably a misapplication of the good advice not to allocate raw arrays from the heap, preferring standard collections.  But this assumes that dynamic allocation is even needed in the first place.

Naked

Many heap allocations are made directly from client code.  Calls to new/malloc and delete/free are manually paired, and may be arbitrarily distant in code.  The risk of mismanaging these allocations is high.  Leaks can result from exceptions, or multiple function returns, or simply overlooking a terminal branch.  Double-deletes may result from pointer aliasing, or not having clear responsibility for ownership.

Avoiding the Heap

Now that we’ve identified a few risks of heap use, let’s discuss ways to avoid it.

  • Locals
    For local variables, the cure is simple – allocate the object directly on the stack, rather than through a pointer.
  • Members
    The same approach should be used for member data.  This does not guarantee the member will live on the stack.  Instead, it ties the allocation policy of the member to its containing class, so that it may live on the stack or on the heap.
  • Collections
    When dealing with simple, static collections, use an aggregate-initialized array.  There are some restrictions to this technique, when dealing with user-defined types.  However, these can usually be worked around, and should improve as the rules for POD types are revised and loosened.  There is virtually no advantage to dynamically allocating a collection of static (compile-time constant) objects, when a fixed array will do.
  • Algorithms
    Some heap allocations may indicate a flawed algorithm.  An example would be the creation or collection of data at a deeper level in the stack than where it is consumed.  In this case, the data must clearly be placed on the heap so it can “swim upstream”, against the call stack current.  The solution is to invert the pattern and use a more “functional” approach, passing  data down the call stack to where it is needed.  This can be generalized to multiple data flows.
  • Recursion
    Similar to the preceding is to use a recursive approach, over a loop-based approach.  This can be especially helpful in avoiding the creation of short-lived heap-based collections, where an element may depend on the computation of prior elements (e.g., Fibonacci).  A map-reduce pattern can then move the reducing logic to the terminal condition of the recursion.
  • alloca
    Finally, consider that “dynamic allocation” != “heap allocation”.  The CRT function alloca enables fast, safe, dynamic allocation on the stack.  This can be very effective in performance critical situations where many small objects are frequently allocated and freed, such as with 3D mesh processing algorithms.  [Note: there are some challenges to the effective use of alloca and its ilk, malloca and freea.  The scoping requirements are tricky, foiling attempts to wrap calls in an RAII pattern.  But the utility of the function outweighs these concerns.]

Holding the Heap at Arm’s Length

In the event the heap cannot be avoided, it should be “treated with modesty”.  Direct calls to new/delete should be avoided if possible, or their exposure minimized.  Smart pointers should be used systematically and universally, to ensure exception-safe unwind behavior.  Locals and members which require heap allocation (factory-based, pimpl) should be wrapped in smart pointers.

  • For shared (aliased) reference-counted smart pointers, std::shared_ptr  should be used.  The std::make_shared template function should be used to allocate std::shared_ptr objects safely and efficiently.
  • For simple, unshared “singleton” smart pointers, either as class members or local variables, std::unique_ptr should be used to manage heap calls in an exception-safe manner.  As with std::make_shared, std::make_unique may be offered in the future for creating std::unique_ptr objects safely.
  • [Note that with C++11, legacy smart pointer templates, such as auto_ptr, are now considered obsolete.]

In summary, my advice is to use the heap as sparingly and indirectly as possible.  Your peers, testers, and customers will appreciate it.

 

Idiosyncratic C++

Happy St. Patrick’s Day!

Years ago I was in Florida courting a prospective customer.  In one of our initial biz-dev meetings, we were trying to get our arms around the opportunity, and were a bit paralyzed.  Finally, Jack, the customer’s colorful VP of Sales, said “what we need is a dead cat!”.  My partner and I looked at each other blankly.  Jack explained, “A dead cat is a brainstorming technique we use.  Someone puts a lousy idea on the table to get started.  Everyone takes a look at it and says, ‘We don’t know what we want, but we know it’s NOT THAT!’ ”

Sometimes, a good way to understand something is to identify what it is not.  With that technique in mind, I’m going to start a series of posts on idiomatic C++.  In each post, I’ll start with an anti-pattern or two of what Andrei Alexandrescu calls “C+”.  Rather than being idiomatic C++, we might call this style “idiosyncratic C++” (“idiotic C++” is probably too strong).  Many arrive at idiomatic C++ by way of idiosyncratic C++, particularly if they’ve come from other languages like C or Java.  They’ve learned the hard way how to write safe, efficient, maintainable, platform-independent C++ in a conventional style – in short, idiomatically.  Those trench experiences are gold <insert tenuous tie-in with St. Patrick’s Day pot-of-gold image>, and worth exploring and sharing.

My working definition of idiosyncratic C++ is:

Idiosyncratic C++:  Code that a compiler will accept, but a reviewer should not.

In this series, I’ll cover topics such as:

  • Using The Heap
  • Structured Code
  • Eager Evaluation
  • Naming Conventions

I’m open to requests and feedback.  Send them in and stay tuned…