04 August 2009

Review and Analysis of C# Part 1: Partial Definitions are an Anti-pattern

I've had to use C# a lot at work recently. Like everything else, I have very strong opinions about C#. So, I've decided to write an N-part series reviewing, criticizing, and sometimes even praising its design.

So, here's the first installment: Partial Definitions are an Anti-Pattern.

C# introduces the partial keyword. In short, the partial keyword allows a programmer to partially define a class at several places. According to MSDN, the rationale for this feature is:
  • When working on large projects, spreading a class over separate files enables multiple programmers to work on it at the same time.

  • When working with automatically generated source, code can be added to the class without having to recreate the source file. Visual Studio uses this approach when it creates Windows Forms, Web service wrapper code, and so on. You can create code that uses these classes without having to modify the file created by Visual Studio.
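To make the feature concrete, here's a minimal sketch (the class and file names are mine, purely for illustration) of one class split across two files that the compiler stitches back together:

```csharp
// Employee.Core.cs -- one half of the definition
public partial class Employee
{
    public string Name { get; set; }
    public void DoWork() { /* ... */ }
}

// Employee.Payroll.cs -- the other half, possibly in another file
public partial class Employee
{
    public decimal Salary { get; set; }
    public void ComputePay() { /* ... */ }
}

// At compile time the parts merge into a single Employee class
// containing all four members, as if written in one place.
```

Nothing in either file points at the other; the merge happens silently at compile time, which is exactly the property I'll be complaining about below.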

The first argument holds no water. Yes, projects sometimes grow large, and multiple programmers often need to work on them concurrently. But this problem is better handled by version control software.

To make this argument, let me introduce the notion of a section of code. Let's say that a section is a collection of lines of code such that a significant modification to any one of them requires modification to (or at least a critical evaluation of) the other lines within that section. For example, changing the type of a variable requires you to revisit each use of that variable; changing an invariant of a class requires you to revisit all lines which assume that invariant. It's a loose definition, but all the programmers out there know what I'm talking about.

Now suppose that two or more programmers are trying to modify some large class. At any given moment, either they work on the same sections of code, or they do not. If they work on different sections (which implies they are working on different lines), then common version control software will be able to automatically combine their efforts. On the other hand, if these two programmers are working on the same section of code, the programmers will need to coordinate their efforts---even if they are able to isolate their changes to disjoint lines of code or separate files. Said another way, the ability to split the class into two separate files does nothing to solve the original problem.

The second argument is no stronger. Suppose a code generator produces a large class definition. Further suppose that, across multiple runs of the code generator, much of the output is common. Why then, I ask, would the code generator repeatedly emit redundant code? Wouldn't it make more sense for the code generator to place the common code into a runtime library, or even a base class from which the situation-specific code may derive? The fact that it is a code generator is no excuse to emit code which ignores good software engineering practice.
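To illustrate the alternative I'm suggesting, here's a hypothetical sketch (the names are invented, not taken from any real generator): the common machinery lives in a hand-written base class, and the generator emits only the parts that actually vary:

```csharp
// Written once, shipped as a runtime library: the common machinery.
public abstract class FormBase
{
    protected void InitializeLayout()
    {
        // shared plumbing that every generated form needs
    }

    protected abstract void BindControls();
}

// The generator now emits only what varies per form.
public class CustomerForm : FormBase
{
    protected override void BindControls()
    {
        // generated, situation-specific code goes here
    }
}
```

The programmer's additions then go into `CustomerForm` (or a further subclass) through ordinary inheritance, with no need to merge files behind the programmer's back.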

There may be efficiency concerns pushing towards this style of code generation output. It could be that there is a significant difference in run time. I argue that this is a symptom of a problem elsewhere in the software stack. If two different implementations differ only on a cosmetic level, but the compiler exhibits a drastic performance difference, then it is the compiler's responsibility to decrease that performance difference. In a high level language---and especially in the case of a language stack as complicated as the CLR---these kinds of implementation details should be below the programmer's cognitive radar.

I have argued against the claimed benefits of partial classes, and I'll be happy to argue against any others that people suggest. Let me next argue that partial classes not only give no benefit, but that they also do harm.

The primary goal of software engineering is to establish practices that let programmers reason about software in a modular fashion. Classes, for instance, enable us to think about one algorithm at a time, and to ignore all of the other parts of the software. Imagine trying to prove that your implementation of a binary tree is correct if it depended upon the operation of your GUI. Yeah, that's ridiculous, and that's why we write modular programs.

Now suppose that you have written a module so complicated---so multi-faceted---that you believe that some parts of its definition should be expressed separately from the rest. This suggests you believe (1) that your module expresses at least two different aspects of the problem, (2) that a programmer can reason about each independently, and (3) that any attempt to reason about them as a single unit will lead to unnecessary confusion. All of these are reasons to place the second aspect of your module into a separate module, not a separate file. In fact, many design patterns describe interfaces which tackle precisely this sort of fine-grained interaction among modules. The only reason to keep them in one module is programmer laziness. In this case, the partial feature enables the programmer to defer good design; he may get the code to work faster, but may cause headaches for all of the programmers who must maintain the code.

Additionally, as a programmer, I find it valuable to see the entire definition of an algorithm (feel free to disagree; let me know why in the comments). I don't necessarily look at the entire definition, but I think it is critical to be able to find any code which contradicts my understanding of the code. In this case, a single definition (or at least a single file) enables me to quickly gather all factors which contribute to the operation of an algorithm. From my experience with C#, partial class definitions make it harder to gather all factors, and make it difficult to be sure I have found all of them.

Virtual methods might cause this problem as well. However, virtual methods are more restricted. First, they are marked virtual. Second, I know immediately where to find refinements of virtual methods: lower in the inheritance tree. Partial definitions have an arbitrary number of components---they can be anywhere provided they are combined at compile time.
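The contrast is easy to see in a sketch (again, the names are mine, purely for illustration). A virtual method's refinement must live in a subclass, so I know where to look; a partial definition's other pieces can be in any file compiled into the assembly:

```csharp
// Virtual: the refinement is constrained to the inheritance tree.
public class Base
{
    public virtual void Step() { }
}
public class Derived : Base
{
    public override void Step() { }  // must appear in a subclass of Base
}

// Partial: the other piece can live in any file in the assembly.
public partial class Widget
{
    partial void OnChanged();        // declaration here...
}
public partial class Widget
{
    partial void OnChanged() { }     // ...implementation anywhere else
}
```

With `virtual`, the keyword at the definition site warns me that refinements exist and tells me where to find them; with `partial`, I get no such guidance.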

Some may counter that these arguments all suggest a feature-rich integrated development environment. Sure. I'm in favor of good tools. But (1) I appreciate languages which don't require IDE support to read, because sometimes you read code on a website, a book, or other media, and (2) even the best of the best tools don't yet support this style of code browsing. I've been using Visual Studio 2008 at work, and it can't even list all of the subclasses of a base class for me! Enumerating all parts of a partial definition is equally challenging.

My recommendation: avoid the partial keyword, and instead decompose your software into modules sensibly.

1 comment:

Dennis Ferron said...

In regards to your first argument, Microsoft, at least at the time partial classes were introduced, was still stuck in the boneheaded mindset of source control by exclusive checkout of files. So problem #1 is they introduced a language feature to cover a bad design of their source control tools; problem #2 is they introduced it also to cover for a bad code generation design.

It's part of an "out of sight, out of mind" fallacy that Microsoft falls into time and again: they look at things that really are simple and see that simple things have an austere interface, and then try to make their overly complex things "simple" by hiding the complexity under a rug. But they're mistaking the effect for the cause, trying to emulate the effects of simplicity without actually making anything simpler at all. You can see the effect in the way that every redesign of the look and feel of Office or Media Player just hides important menu items without actually making them unneeded, or in the way that every Windows programming advance merely hides the complexity in the software stack underneath it.