As a continuation of my forays in interesting and less industrially-oriented
programming languages, I decided to compare
F Sharp
against Clojure for a relatively simple programming
problem, and to compare how the two felt in terms of programming ease,
friendliness, and how they each viewed the problem.
The Problem
The problem is a relatively one from
Reddit’s “Daily Programmer” subreddit, called
JSON Treasure Hunt:
given a random, unstructured JSON object, traverse the object looking for
a specific terminal value (in this case, a string “dailyprogrammer”).
This is, of course, an unstructured tree traversal question.
Anyone who’s taken a basic data structures or algorithms course can immediately
recognize that this problem lends itself naturally to a recursive solution.
(Of course, it’s possible to traverse trees using stacks or queues, but where’s
the fun in that?)
What makes this problem interesting is that the tree, as valid JSON, can
be of any JSON type:
- a number (whether integers are supported is language-specific, but both
of our languages in question do support integers)
- a string
- a boolean (true/false)
- null
- a list of any valid JSON type
- an object mapping any of the first four types to any valid JSON type
Because the JSON is unstructured, it’s not possible to know the types of each
value in advance, other than that it can be any of these. Therefore, we ideally
need a recursive way to dispatch based upon type.
There is a minor complication here because we need to maintain a path to the specific value in question.
This is tricky because the path can consist of both strings and integers:
- Any object on the way to the specified value is generally keyed by a string
- Any array on the way to the specified value is keyed by an integer
We will need a return type that can represent both strings and integers in
a collection seamlessly.
F Sharp Solution
Although I didn’t write the F Sharp solution until after I had already written
the Clojure solution, I actually found it easier to model this problem using
F Sharp’s native data types. Therefore, I’ll begin with my treatment of F Sharp.
When I first saw this problem, I immediately thought of how I could represent
it in a typed language like F Sharp. F Sharp, like other statically typed
functional languages, supports
discriminated unions,
which allow flexibility of typing in a constrained and deterministic manner.
As it happens, the
F Sharp Data
library already implements this type.
Given a compound type for each of the valid JSON types, I can
pattern match
each of the types in turn to ensure that every type of JSON data is
handled appropriately.
To solve the earlier problem of needing to represent both strings and integers
in our return type (our solution), we can write our own simple union type:
type TreasurePathCrumb =
| Index of int
| Key of string
This crumb type will help us remember the path to the specified value
(although, since this is a recursive solution, it technically would be more
accurate to say it will help us to remember the path from the specified
value).
Now, once we find the value in question, we can simply begin returning values
by wrapping them in this union type. When we’ve captured the sequence of union
types in our main function, we can just unpack them and convert them to strings:
let buildTreasurePathString (path:TreasurePathCrumb list) =
let crumbToString crumb =
match crumb with
| TreasurePathCrumb.Index i -> string i
| TreasurePathCrumb.Key k -> k
let stringCrumbs = List.map crumbToString path
stringCrumbs |> String.concat " -> "
This function preserves strings while converting the integers to strings,
so that we can cleanly concatenate all the values at once.
(I’m not fluent in F#, so this may be completely unnecessary, but it’s
also a pleasure how straightforward this is to do!)
However, there’s still one unresolved type question: what do we do when we
don’t find a value? This is obviously going to be happening far more often than
finding the specified value.
As it happens, F# (and its brethren) have an answer to this as well:
the Option type:
// Taken from http://fsharpforfunandprofit.com/posts/the-option-type/
type Option<'a> = // use a generic definition
| Some of 'a // valid value
| None // missing
An option type responsibly answers the question, “What if there is not
necessarily a solution to this question (function)?” In our case,
the data type [1, 2, null, false, "hi"]
does not contain the value
"dailyprogrammer"
. If our function was passed this array, it would return
the None
type. On the other hand, if it was passed
[1, 2, null, false, "dailyprogrammer"]
, we could return a
Some(TreasurePathCrumb.Index(4))
!
With all these pieces in place, we now know which types our recursive solution
requires. We just need to write the recursive function which can do a match
on the terminal types and can recur on the collection types. For clarity,
I split the function on the collection types out into their own functions.
(I also implemented my own Option type, because I forgot F# has it built in.)
type TreasurePathCrumb =
| Index of int
| Key of string
type HuntResult =
| Null
| TreasurePath of TreasurePathCrumb list
let rec findMapTreasure (json:(string * JsonValue) list) =
match json with
| [] -> HuntResult.Null
| (key, value) :: rest ->
match findTreasure value with
| HuntResult.Null -> findMapTreasure rest
| HuntResult.TreasurePath (path) -> HuntResult.TreasurePath (TreasurePathCrumb.Key(key)::path)
and findListTreasure (list:JsonValue list) (index:int) =
match list with
| [] -> HuntResult.Null
| first :: rest ->
match findTreasure first with
| HuntResult.Null -> findListTreasure rest (1 + index)
| HuntResult.TreasurePath (path) -> HuntResult.TreasurePath (TreasurePathCrumb.Index(index)::path)
and findTreasure (json:JsonValue) =
match json with
| JsonValue.Null -> HuntResult.Null
| JsonValue.Boolean b -> HuntResult.Null
| JsonValue.Float f -> HuntResult.Null
| JsonValue.Number n -> HuntResult.Null
| JsonValue.String s ->
if s = "dailyprogrammer" then HuntResult.TreasurePath([])
else HuntResult.Null
| JsonValue.Array a -> findListTreasure (Array.toList a) 0
| JsonValue.Record map -> findMapTreasure (Array.toList map)
The findTreasure
function is trivial to understand: if given a terminal
type, return the option, and if given a compound type, refer to the
appropriate function implementation for more information.
findMapTreasure
and findListTreasure
work with the same three modes:
- If the collection is empty, the value isn’t here, and return None.
- If the value of the next item in the collection isn’t the treasure,
recur with the rest of the collection
- If the value of the next item is the treasure, return it! (With a
crumb, of course.)
Clojure Solution
As mentioned above, I needed to meditate on the types of this solution like I
would with F# before I could even begin to think about implementing it in
Clojure. This was true even though I wrote the Clojure solution first!
Although Clojure doesn’t require being as contractually formal with types as
F# does, it doesn’t obviate the need for clearly establishing the types
relationships. (Maybe this is part of what Rich Hickey refers to as
Hammock-driven development?)
Ultimately, however, I found myself confronting the following problems with the
Clojure implementation:
- How do I accurately capture what type(s) the return type can be?
- If I can’t pattern match on union types, what other ways can I pattern match on each of the 6 types?
- Since Clojure doesn’t have (effective) mutual recursion,
how can I have all recur calls jump back to the same function?
For anyone who has programmed in a dynamically typed language before, the
solution to question 1 is trivial: don’t express the types,
express the values, and just make sure you can handle the types correctly
wherever you catch them.
Okay, so we just return e.g. a list of either integers or strings. That’s not
too bad at all. But how should we solve questions 2 and 3?
As it turns out, Clojure has a solution for both of those, and I didn’t believe
it would work until I tried it myself!
Although Clojure doesn’t implement classes, it does implement
something akin to methods, using
a clever technique dubbed protocols.
Protocols essentially permit dynamically defining methods on object types,
even if you don’t have the source for those object types, and even if they
aren’t aware of your code.
(In other words, they solve the expression problem.)
Protocols consist of two parts:
- Define the protocol, analogous to defining an interface.
- Implement the protocol for a specific type. This is like implementing an
interface, with the important distinction that it allows the implementation
retroactively.
(defprotocol ContainsTreasure
(find-treasure [obj]))
(extend-protocol ContainsTreasure
nil
(find-treasure [obj]
false)
java.lang.Integer
(find-treasure [obj]
false)
java.lang.Boolean
(find-treasure [obj]
false)
java.lang.Double
(find-treasure [obj]
false)
java.lang.String [obj]
(= "dailyprogrammer" obj)))
Protocols solve our earlier problem 2 using traditional polymorphism. We can
simply implement the method for each of the 6 types
(although only the terminal types are shown here), and we will have the
appropriate type-based dispatching. Pretty cool!
I was surprised to discover that, in this case, protocols also permit
self-recursion! Because the interface of the protocol was uniform across
all 6 types, each protocol implementation was able to dive back into
the same function defined on another type:
(extend-protocol ContainsTreasure
; iterate over vector, tracking index, and see if any values
; recursively satisfy the value check. If so, return the accumulated
; path back.
clojure.lang.IPersistentVector
(find-treasure [obj]
(loop [i 0
elems (seq obj)]
(let [f (first elems)
treasure (find-treasure f)]
(cond
treasure (if (seq? treasure)
(cons i treasure)
(list i))
(nil? (next elems)) false
:else (recur (inc i) (rest elems))))))
; iterate over map, tracking keys, and see if any values
; recursively satisfy the value check.
clojure.lang.IPersistentMap
(find-treasure [obj]
(loop [kvpairs (seq obj)]
(let [kv (first kvpairs)
treasure (find-treasure (second kv))]
(cond
(nil? kv) false
treasure (if (seq? treasure)
(cons (first kv) treasure)
(list (first kv)))
:else (recur (rest kvpairs)))))))
There are many keywords here, which perhaps make this a bit tricker to follow
than the equivalent F# function, but they both do the same action.
- Store the next item in the sequence in
f
/kvpairs
. Check whether it’s
the specified value. - If it is, then return the crumb to the value.
- If it is not, keep recurring.
Comparison of Solutions
In my unscientific ad-hoc benchmarking, the Clojure solution ran on the hardest
data set in about 140ms in the REPL (Ubuntu), while the F# solution ran in
about 200ms in F# Interactive (Windows 7).
I was actually surprised by how much quicker Clojure seemed, although I’m
probably missing several optimizations in each language.
Feel of Languages
F# definitely felt more natural to express this problem. I think its powerful
and flexible static type system allows for expressing any desired combination
of input and output types, and having confidence that they work consistently.
(In contrast, I spent a few hours working out kinks in the recursion of the
Clojure code, necessitated by not having a compiler tracing the types of the
parameters through the code execution.) Union types worked to great effect
here, and the recursive nature of the problem really lent itself to the
functional paradigm.
On the other hand, I was thoroughly impressed by how well Clojure used
protocol-based polymorphism to handle this trivial case of seemingly
heterogeneous polymorphism. Although I do suspect that a more complex
data model would require Clojure to start implementing dynamic dispatch
using lookup tables (which seems somewhat gross to me), I would not
rule Clojure out just for that. (After all, what does C do?)
Type dispatching aside, I found that the Clojure code was more of a joy
to write when implementing scaffolding code around the core algorithm.
This seems reasonable, as I don’t know many people who like wrestling
with types when reading or writing files (unless they really do need
to catch all sorts of exceptions). I didn’t find Clojure to be as strong
at expressing the actual algorithm as I found F#, as mentioned above.
My takeaway from this would probably be that, depending on whether
flexible and rapid system design or strong algorithmic modeling is more
important, Clojure or F# may be a more natural fit.
Do you have any thoughts on which language you prefer? Leave your
thoughts below!
References
My F# solution
My Clojure solution