Oliver Sturm's Blog - F# compiler considered too linear

Editor’s note: In case you’re reading this in 2019, like I recently did, it should be pointed out that this opinion piece was written in the early days of F#. The language has since evolved in a variety of ways, which is not to say that the problems described below are necessarily solved today — but the position of F# in the landscape of .NET languages is now clearer than it was and I don’t agree anymore with the proposals I made in 2008.

In my continuing efforts to make XPO work fully with F#, I found the next problem to deal with: the extremely linear way of thinking of the F# compiler.

Basically, the compiler seems to read each source code file from top to bottom. Generally, things that are defined below the current line can’t be referred to in the current line. Apart from the order of lines in source code files, the order of source code files in the project is equally important, since the compiler handles them in precisely that order.
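To make the top-to-bottom rule concrete, here is a minimal sketch of a single script file. The names add, early and total are made up for illustration; the commented-out line is the kind of forward reference the compiler rejects.

```fsharp
// Uncommenting the next line makes compilation fail, because 'add'
// is only defined further down in the file:
// let early = add 1 2

let add a b = a + b

// From this line onwards, 'add' is in scope and can be used.
let total = add 1 2
printfn "%d" total
```

The same rule applies across files: a definition is only visible to files that come after it in the project's file order.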

Hint for Visual Studio users: While the order of source code files in the project is represented correctly in the Visual Studio Solution Explorer, it can’t be changed from there. Instead, it is necessary to edit the project file and swap source code files around manually. Right-click the project in Visual Studio and select “Unload Project” from the context menu. Right-click the project again and select “Edit <projectname>.fsharpp”. When you’re done making your changes, use the context menu a third time to select “Reload Project”. You will see that the order of files in Solution Explorer has changed according to the changes you made in the project file.

In some schools of thought on programming, this linear view of things might seem quite normal, but in .NET it is not. In mainstream .NET programming, it was widely regarded as a great innovation when C# took care of the old problems that C and C++ had in that regard (no more header files!!!). Pascal wasn’t any better, and neither were many other languages: most of them had some sort of “pre-declaration” feature that had to be used when, for instance, a reference to a certain type needed to be created before the type itself was declared.

Nothing like that in C# — the compiler looks at all the types and namespaces declared somewhere in my current project and figures things out for me. Great, that’s how it should be. In all fairness, in F# there’s at least one very obvious reason why the compiler takes that linear approach: type augmentations. They basically mean that depending on the position in code, a class might have a certain member or not. If you’re not familiar with the feature, look at this example:

type MyClass() =
    let answer = 42

let mc1 = new MyClass()

// At this point Output can't be found on mc1
//mc1.Output()

let output x = printfn "%d" x

type MyClass with
    member x.Output() = output 52

let mc2 = new MyClass()
mc2.Output()

// Now Output is part of MyClass, so I can even call
// it on my "old" instance
mc1.Output()

Before I get to my particular use case — just generally speaking, the ordering requirements introduced by linear compiling seem like a great and quite unnecessary hassle in the vast majority of cases. F# has a very strong type inference system, because it is deemed to be unnecessary for the developer to mark all types explicitly in order to implement strong typing. In the same way, the compiler could automatically find types and namespaces in my current project regardless of their location, and it could detect those cases where types change through augmentation.

The particular case I’m dealing with is that of persistent business class hierarchies. These hierarchies are typically interrelated to the point where one or more networks of classes are formed. As an example, consider modelling a hospital. You’d have a whole bunch of different types of people to store, so you’d have classes for People and Addresses, Employees, which might be Nurses, Doctors and cleaning and housekeeping personnel, Patients with relationships to the Nurses and Doctors, Rooms, Floors and OperatingTheatres which are assigned to Doctors or Teams of Doctors. Visitors, CarParks, the whole Accounting and Booking business… the list is endless. It is quite clear that many of these types have references to many other types, and typically a one-to-many relationship is modelled with a collection property on one end and a simple reference on the other end, so as soon as there’s a relationship there, it will result in two classes interrelating.

Sure, not all classes interrelate, so it might be possible, taking a lot of time and great care, to separate the classes into groups that are hopelessly tangled internally, but have only unidirectional references to classes outside the group. Then again, it might make sense in the example above for almost all classes to have a reference to the Hospital type, since that is important if there’s ever more than one hospital being handled at once. There might be other such “special”, high-level objects that make the grouping approach really complicated. In any case the task of sorting the classes into such groups is extremely tedious, and the grouping breaks easily as soon as any class is changed to include or exclude a property that refers to another class.

You might wonder why I’m going on about the grouping thing at all — well, read on, that’s what F# wants me to do. Have I mentioned that persistent business class hierarchies can be large? Apart from having private fields and public members for each and every piece of data that is associated with the various entities, the classes will typically also contain certain parts of business logic functionality. Depending on the architectural approach that is used, validation logic might live in these classes, as well as a lot of the state handling that many entities need. To mention some numbers, a C# project I’ve worked on myself — really just a medium size application — has 75 persistent classes and a total of 11263 lines of code in these classes.

Now, why am I going on about these interrelated networks of classes? Quite simple: because F# requires me to declare all interrelated classes in one block of code! Yes, that’s right. I can’t put some of the classes into other files. I can’t put them in different namespaces. The only valid syntax to declare interrelated classes in F# is this:

type ClassA() =
    let foo = new ClassB()

and ClassB() =
    let foo = new ClassA()

As you can see, this uses the and keyword to concatenate the two type declarations. This doesn’t hold true for classes only, but for all types. One of my first thoughts about this was that it shouldn’t be that much of a problem if my application made use of lots of interfaces and dependency injection throughout to remove the need for direct references from one class to another. But in the end this approach only shifts the problem to the interfaces: at a rough count, that class hierarchy from my old project would require me to declare 75 interfaces with 520 properties and around 300 other members. For those declarations, the problem is still the same, and while the volume may be smaller, it’s still significant. Plus, of course, it requires my application architecture to work in a very specific way, and I would need to create all those interfaces for no real reason whatsoever. That doesn’t sound like a very good idea.
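Here is a rough sketch of what the interface approach might look like for two types from the hospital example. All names (IDoctor, IPatient, Doctor, Patient and their members) are hypothetical and not taken from any real project. Note that the two interfaces refer to each other, so they still have to be joined with and; only the implementing classes can be declared separately.

```fsharp
// Hypothetical interfaces for the hospital example. They refer to
// each other, so they must still be declared in one 'and' group.
type IDoctor =
    abstract Name : string
    abstract Patients : seq<IPatient>

and IPatient =
    abstract Name : string
    abstract Doctor : IDoctor

// The classes below mention only the interfaces, so they no longer
// need to be declared together with each other.
type Patient(name : string, doctor : IDoctor) =
    interface IPatient with
        member this.Name = name
        member this.Doctor = doctor

type Doctor(name : string) =
    let patients = System.Collections.Generic.List<IPatient>()
    member this.Add(p : IPatient) = patients.Add(p)
    interface IDoctor with
        member this.Name = name
        member this.Patients = patients :> seq<IPatient>

// Small usage example: wire up one doctor and one patient.
let doctor = Doctor("Dr. Example")
let patient = Patient("Pat Example", doctor :> IDoctor)
doctor.Add(patient :> IPatient)

let asDoctor = doctor :> IDoctor
printfn "%s treats %d patient(s)" asDoctor.Name (Seq.length asDoctor.Patients)
```

Even in this tiny case the and block has only moved from the classes to the interfaces, which is exactly the shift described above.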

In the end I don’t think that this problem is entirely particular to my use case. In other class hierarchies, dependencies might typically be somewhat more linear than they are in those hierarchies I’ve described, but interrelations are still rather common. So here are the important points I want to make:

1. For this particular use case, we need a change that allows us to declare interrelated classes separately. There’s a very similar problem for namespaces, perhaps not quite as severe, but that’s just because there aren’t going to be as many namespaces as there are classes. To solve these issues, I guess a “pre-declaration” feature like I described above could do the job (perhaps as an attribute), but what we really need is … see (2).
2. The F# compiler should handle all type resolution matters automatically, independent of declaration order, apart from those cases where order is important due to type augmentation. It is my belief that the compiler could detect such “significant-order” cases automatically, so there shouldn’t be a need for any new keywords or decorations to make this work. This intelligent implementation is what I expect from a language compiler in the year 2008, and with the ambitions F# has as a multi-paradigm language, we should expect no less.