Creating a Lazy Sequence of Directory Descendants in F#

It seems to turn into a shoot-out: listing directories is suddenly en-vogue! 🙂

So, Craig Andera posted this code in written in Clojure, which lists some directories lazily, and Chris Sells thought he could do that in C#, too.

I thought these code examples all look rather verbose — in the case of Clojure because in that way rather typical for Java, the APIs are pretty verbose to use, and in the case of C# because of all the syntactic, well, ahem, necessities, as well as the fact that there’s no language feature for integrating nested sequences seamlessly.

Keeping it nice and simple, in F# that example can look like this:

open System.IO

let rec GetDirectoryDescendants(path) =
    seq {
        yield! Directory.GetFiles(path)
        for subdir in Directory.GetDirectories(path) do
            yield! GetDirectoryDescendants(subdir)
        }

For one thing, this code only contains what’s needed, like the Clojure code does — no namespaces, class declarations, loads of curlies, …. nice. Second, the “yield!” statement. There’s also “yield”, without the “!”, in F#, and the difference is that while yield returns a single element for inclusion in the sequence, yield! takes an entire sequence and returns it one by one into the “upper level” sequence.

You could argue that’s the same thing the C# code does as well, and you would be right, in a way. But the F# way is much shorter, more concise, and the code doesn’t break down again to the level of the sequence element. Functional languages like these things — by using map instead of a for(each) loop, you have a well-known pattern to apply, and the reader of your code has to read less to see what exactly is going on. In the same way, by using yield! it is obvious what I’m doing without finding the code inside the foreach loop and confirming that all it does is, again, yield.

7 Comments on Creating a Lazy Sequence of Directory Descendants in F#

  1. The syntactic noise is a valid complaint. However, the yield! argument is much weaker. Much the same thing can be accomplished in C# without the foreach loop by using the familiar map concepts.

    public static IEnumerable<string> GetDirectoryDescendants(string path)
    {
    return Directory.GetFiles(path).Concat(
    Directory.GetDirectories(path).SelectMany(p => GetDirectoryDescendants(p)));
    }

    Like

  2. Well, I don’t agree 🙂 The yield! argument isn’t weak – it’s a matter of using a standardized pattern over implementing the mechanism manually every time it’s needed. Your code is nice and yes, I agree I might even do it like that myself. But it has three disadvantages:

    1. It uses library features as opposed to language ones — arguably the Clojure sample does that as well, but my own post was mainly about how things look more concise in F# due to the language support.

    2. I have to read the entire code and probably think about it for a second to understand what it does. I claim this is easier with the F# solution.

    3. You break the idea of laziness that was part of the original challenge at least partly, because your algorithm gets both the files *and all the subdirectories* in a given path, before returning the first file. Of course one could say that with the help of the standard functions GetFiles and GetDirectories, complete laziness isn’t possible anyway, but Chris’ C# sample as well as mine in F# take it that one step further at least.

    Like

  3. 1. Language vs. library is, IMHO, a bit pointless. The library features are built on top of language features, after all.

    2. That’s simply because you’re familiar with the language. Anyone not familiar with the concept would have a hard enough time understanding yield, much less the yield! variant. In contrast, the LINQ expressions should be fairly understandable to a wider audience (though SelectMany may confuse folks). All of this is dependent on the POV of the reader, and I’m not sure any universal claims can be made.

    3. I didn’t break the idea of laziness. LINQ expressions are lazy. They are built on top of the C# yield statement and thus enjoy lazy evaluation. You obviously don’t know anything about LINQ, but what I posted is functionally equivalent to what Chris posted. My algorithm does NOT get both files *and all subdirectories* in a given path before returning the first file. (BTW, in .NET 4 it’s now possible to lazily enumerate files/directories in a path, so in .NET 4 it’s possible to do this in a completely lazy fashion.)

    Like

  4. My friend… you know, I’m not in the mood to take pointless abuse on my own blog. I reserve the right to define myself what the point was of my own blog post. So regarding (1) and (2), it’s really that simple — my post was about features on the language level, and in my opinion, which may or may not be the same as yours, the yield! feature results in greater readability because it is shorter and it relies on standard patterns. I’m sure if you look around, you will find that I’m not alone with that opinion, and any further comments aren’t going to change it either.

    Regarding your (3) — I know a great deal about LINQ, thank you very much. You are of course generally correct that iterators in C# are lazily evaluated, and LINQ benefits from that. Nevertheless, the way you wrote your algorithm means that both files and directories are listed on each level before the first file for that level is returned — it’s quite obvious if you look at your own code, because both the calls to GetDirectories and GetFiles are necessarily executed before the return statement pushes control back to the caller. If you don’t believe me, try to insert some comments or use helper functions in place of GetFiles and GetDirectories, and you will see.

    Like

  5. Whoa. Calm down. I didn’t abuse you.

    I understood your point. I’m disagreeing with it. The yield! version isn’t shorter than the LINQ version (barring the C# syntax argument that I gave you credit for), and they both use the same standard pattern (aka map).

    As for (3), I have to appologize, because there was a misunderstanding. I read what you wrote to mean that my solution was entirely non-lazy. While GetFiles() and GetDirectories() are not lazy, Concat() and SelectMany() are. So the significance of the fact that my solution must get the first level of files and directories before iterating, as opposed to your (and Chris’) variations that only have to get the first level of files is, to me, a bit of a “who cares” distinction. Especially considering the fact that with .NET 4 my solution can be entirely lazy.

    public static IEnumerable<string> GetDirectoryDescendants(string path)
    {
    return Directory.EnumerateFiles(path).Concat(
    Directory.EnumerateDirectories(path).SelectMany(p => GetDirectoryDescendants(p)));
    }

    Like

  6. Hi,

    nice missunderstanding(?) – but after all what’s the problem?
    You can do the C# ver. with F# or use F#’s Seq.{whatever} functors do get the very same “LINQ”-feeling.

    And of course there is even a better version in C# … because you can write
    […].SelectMany(GetDirectoryDescendants));
    (or at least I guess so because I don’t really want to copy this code as C# offers me no “C#-interactive” 😉 )

    PS: Your “antispam test” might be a bit tricky for younger folks or folks like me that can’t spell the answer to “Stairway to …” right 😛

    Like

  7. I am just learning C# now in a college course. Your articles are great and help me learn. thx

    Like

Leave a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s