June 18, 2017

Clojure and the Esoteric Mysteries of Namespaces


If you’ve ever been programming in Clojure and encountered an error which looks something like, IllegalStateException("Can't change/establish root binding of: *ns* with set"), read on!

Preface

I recently had the drive/opportunity to deep-dive on how Clojure’s namespaces function and how they provide a simple abstraction using the concept of Clojure’s “Vars”. Here is a deep-dive on how they work. This is a two-part series. The previous part of the series is available at Clojure and the Esoteric Mysteries of Vars.

As a fair warning, this requires a fair bit of gory Clojure compiler internals to really understand. I’m going to attempt to walk through the relevant bits, but it may not make much sense without also reading the relevant portions of source code. Thankfully, Lisps are simple to understand, so it should only take a few hours (instead of days or weeks). However, if you just want the short version/the spoiler, here’s the TL;DR:

;; In Clojure, this works:
(ns namespace-1)
(println *ns*)
;; #namespace[namespace-1]

;; This also works
(in-ns 'namespace-2)
(clojure.core/println clojure.core/*ns*)
;; #namespace[namespace-2]
;; you could use
(refer-clojure)
;; to use the shorthand syntax available in the previous example

;; However, this fails miserably at runtime, and there is no good documentation
;; available which explains why this is so:
(defn -main
  [& args]
  (in-ns 'namespace-3)
  (refer-clojure)
  (println *ns*))
;; Will shout and complain that you can't set a Var that's not locally bound

What’s in a Name?

Since the earliest days of Computer Science, programmers have struggled to confine subsets of their routines into global compartments, both to minimize the cognitive load of juggling a multitude of routine and variable names and to differentiate between public and private routines (APIs). Of course, to a certain extent, these distinctions mean little to a computer.

  • In binary or assembly languages, all symbols are global and essentially public.
  • In C, which is itself a low-level abstraction over common computer hardware (but without requiring writing to specific CPU instructions), all variables and routines are still essentially global. There is, however, a limited concept of privacy in that a library developer need only expose certain routines and data structures.

Since then, different approaches have been taken to try and tackle the concept of compartmentalizing units of code. C++ bolted on namespaces a few years after its release, but they are primarily oriented towards avoiding a global quagmire of potentially-conflicting names. (During compilation all namespace distinctions are essentially erased.) Java (and I assume C#) has something similar to namespaces in that “packages” uniquely and globally distinguish names (similarly to C++). Although Java itself does not provide an easy facility for first-class manipulation of namespaces, there are ways of discovering things about them. This is not the same as the language/platform itself providing that facility.

Clojure (and languages like it) provide first-class namespace support. There are two distinct aspects to namespaces in Clojure:

  1. Namespaces are globally-available Clojure objects which both contain as well as name public (as well as private) objects. Accessing a Clojure namespace is as simple as requesting it from the runtime. There is not necessarily any relationship between Clojure namespaces and files of Clojure source code, although for practical purposes they’re usually kept one-to-one.
  2. Namespaces, in particular through the “current” namespace, are used to write programs without fully qualifying references. When the compiler sees references which are not fully-qualified, it will fall back upon the “current” namespace to resolve fully-qualified Var references. This (like in other languages e.g. Java/C++) is not a strict necessity, but it is ubiquitous in practice, and has some startling implications due to Clojure’s highly dynamic nature.

Namespaces as Maps

At their simplest, namespaces are global lookup maps within Clojure’s runtime which provide a level of indirection similar to that of a filesystem:

  • Namespaces operate as maps of namespace-names to namespace-objects. The names just uniquely identify the namespace objects.
  • Namespace objects are containers for mapping names of Vars to Vars. The indicated Var instances need not always be stored in the namespace from which they are found; it is common to alias some Var objects from multiple namespaces, especially the functions from within clojure.core (which is the core library of Clojure).

Much like a filesystem can have multiple distinct fully-qualified filenames which ultimately reference the same files through the use of symbolic links, Clojure allows aliases such that multiple fully-qualified references name the same Var. Unlike file systems, Clojure Var objects generally know within which namespace they were originally bound.
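Both levels of lookup can be sketched using only functions from clojure.core. The namespace demo.maps and the Var answer are hypothetical names made up for this illustration; treat it as a rough sketch rather than a tour of the actual implementation.

```clojure
;; Level 1: the runtime maps namespace names to namespace objects.
(create-ns 'demo.maps)                        ; hypothetical namespace
(def demo-ns (find-ns 'demo.maps))            ; look it up again by name

;; Level 2: a namespace object maps symbols to Vars.
(intern 'demo.maps 'answer 42)                ; hypothetical Var
(def answer-var (get (ns-interns demo-ns) 'answer))

;; A mapping may also point at a Var that lives in another namespace.
;; `refer` acts on the current namespace, so bind *ns* for the call.
(binding [*ns* demo-ns]
  (refer 'clojure.core :only '[map]))
(def map-var (get (ns-map demo-ns) 'map))

(println (ns-name demo-ns))
(println answer-var)
(println @answer-var)
;; The referred Var still reports the namespace it was originally bound in.
(println (ns-name (:ns (meta map-var))))
```

The last line is the symbolic-link analogy in miniature: the name map is reachable through demo.maps, but the Var it leads to belongs to clojure.core.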

To borrow my samples from the previous article, let’s create a Var named my-variable within the user namespace.

user=> (def my-variable 5)
#'user/my-variable

The fully qualified name for this Var is #'user/my-variable, which I previously explained means that the Var knows it is named my-variable and that it is rooted in the user namespace.

If we were to then switch namespaces and reference the variable, the reference would become locally available in an unqualified manner:

user=> (ns other-ns)
nil
other-ns=> (refer 'user :only '[my-variable])
nil
other-ns=> user/my-variable
5
other-ns=> my-variable
5
other-ns=> #'my-variable
#'user/my-variable

We do not need the fully-qualified name if we choose to omit it (and in fact this can enable certain programming patterns by dynamically replacing certain utilities with wrappers for convenience/performance). As already described, the Var is not fooled: it knows where it is bound. (Compare this to, say, Python, which does not know where variables actually live, and in which it is not conventionally possible to globally change the definition of all uses of a certain import retroactively.)

Namespaces as Compilation Contexts

It is traditional in most programming languages which support local references to omit the fully (namespace) qualified names for variables. As indicated above, one would omit the fully qualified name user/my-variable and instead simply use my-variable in scope. For performance reasons, however, Clojure does not look up the reference to a Var every single time a function is called or a variable is referenced. Instead, generally speaking, whenever any Clojure code is defined, all unqualified references will be resolved to their fully-qualified references, and those references will be embedded into the compiled code. In this way, a program can change the root binding using the already-discussed alter-var-root function and it will be seen globally by all code using that Var, because the references are fully-resolved by the compiler when parsing code. How does this occur?
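Before getting to the how, the effect itself can be sketched in a few lines. The functions greeting and greet are hypothetical, and the sketch assumes direct linking is not enabled (it is off by default for user code); with direct linking, the call site would not go through the Var.

```clojure
;; Hypothetical functions for illustration.
(defn greeting [] "hello")

;; When this definition is compiled, the unqualified symbol `greeting`
;; is resolved to its Var, and the compiled code holds on to that Var.
(defn greet [] (str (greeting) ", world"))

(def before (greet))

;; Swap the root binding of the Var that `greet` already refers to.
(alter-var-root #'greeting (constantly (fn [] "goodbye")))

;; `greet` was not recompiled, yet it sees the new root binding.
(def after (greet))

(println before)
(println after)
```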

Compilation as Batched Code Definitions

Within any Lisp, not just Clojure, the distinction between code which is compiled in advance of program execution, and code which is interpreted (and then compiled) on the fly, is blurry at best. These runtimes typically bootstrap a core language implementation and then successively load and compile units of code until an entire program has been defined; these runtimes are then coerced to dump a copy of their working memory to some sort of file (see unexec for Emacs/Emacs Lisp or images for Smalltalk). Indeed, the fundamental property of systems like Lisps and Smalltalk (which was inspired by Lisp) is that the “final” program is one where it is no longer necessary to define additional routines. As such, some programs (*cough* Emacs) are never considered finished, because more functionality can be added in at any time.

Thus, the process of adding new functionality (code) to a Lisp (in this case Clojure) is mostly indistinguishable from compiling code in advance and restoring the compiled code ex post facto.

Code Loading Occurs Within Namespaces

We already discussed above that all Vars live within namespaces. How does Clojure decide into which namespace to assign a newly defined Var? Although the machinery is available such that a programmer can attempt to directly create and intern a Var into an arbitrary namespace, it is far more common to simply create unspecified “definitions” which default to the current namespace. When authoring Clojure source code, developers will create new namespaces and add functionality into them until complete, and then repeat:

  1. Enter a namespace and import the relevant Clojure machinery. This is typically done in source code using the ns macro, which creates and enters that namespace, and then optionally does things like copy Vars from other namespaces into the current namespace. (As already discussed, there’s not too much overhead from this, because all the references point to the same Var object.)
  2. Declare and define functions and objects, without explicitly declaring the namespace. The namespace within which the definitions occur magically becomes the namespace within which those variables are interned.
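The two steps above, plus the less common explicit route, might look something like the following. The namespaces demo.loading and demo.elsewhere and both Vars are hypothetical names, and the sketch assumes it is run somewhere *ns* may be changed, such as the REPL or a loaded file.

```clojure
;; Step 1: create and enter a (hypothetical) namespace.
(ns demo.loading)

;; Step 2: define something without naming a namespace anywhere.
(def local-value 1)

;; The explicit route: intern a Var directly into some other namespace.
(intern (create-ns 'demo.elsewhere) 'remote-value 2)

;; The first Var landed in the current namespace, the second where we put it.
(println #'local-value)
(println (ns-resolve 'demo.elsewhere 'remote-value))
```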

What is the “Current Namespace”?

Ay, there’s the rub! The current namespace in Clojure is stored in the Var #'clojure.core/*ns*. (The book-ended asterisks in Clojure are called “earmuffs”, and imply (but do not promise) that the thusly-named Var is dynamic.) As already discussed, Vars usually have a single, global root, and can optionally have thread-local bindings. When the current namespace is changed through the use of the ns macro or the in-ns function, the value of *ns* is altered. Is this done by swapping the Var root, or by changing a thread-local binding?

To answer this question, I had to dive deep into the source code for the Clojure runtime (which is distinct from the clojure.core library, although clojure.core surfaces most of the runtime through public APIs). Let’s take a tour.

Bootstrapping the Clojure Runtime

First, let’s poke around the Clojure runtime (which is written in Java for the canonical implementation). Looking in the initial declarations of clojure.lang.RT, we see that *ns* is secretly defined in Java and declared as a dynamic Var. It also happens to have a default value of clojure.core. This means that the global root of *ns* everywhere is really clojure.core. But how do we enter new namespaces if *ns* is globally clojure.core?

Enter in-ns, the Clojure function which changes the current namespace, *ns*. The source for this function is just a few lines further down in the same file! in-ns in Clojure is really a Var (not dynamic) in Java which references the Java function inNamespace, which in turn calls .set(ns) on the CURRENT_NS Var. To see how this works, we must read the source in turn for Var in clojure.lang.Var.

How Does Setting a Var Work?

Calling .set(...) on a Var happens here. This function checks for a “thread binding” and throws an exception if one cannot be found. The thread binding is defined here and here and here. What this essentially says is, “If and only if a thread-local binding already exists for this variable in the current thread, perform the set operation, otherwise you are attempting to make a thread-local modification to a global Var and that’s forbidden so have an exception.”
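The same rule can be observed from Clojure with any dynamic Var, since set! on a Var ends up in that same set method. This is a minimal sketch; *setting* and try-set! are hypothetical names invented for the example.

```clojure
(def ^:dynamic *setting* :root)               ; hypothetical dynamic Var

(defn try-set!
  "Attempts to set! *setting*; returns the new value, or the error message
  if the set is refused."
  [value]
  (try
    (set! *setting* value)
    (catch IllegalStateException e
      (.getMessage e))))

;; No thread-local binding exists yet, so the set is refused.
(def outside (try-set! :changed))

;; Inside `binding` a thread-local binding exists, so the set succeeds.
(def inside (binding [*setting* :bound]
              (try-set! :changed)))

(println outside)
(println inside)
(println *setting*)                           ; the root value is untouched
```

The refused attempt yields the same family of error message quoted at the top of this post, only naming *setting* instead of *ns*.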

The Current Namespace is a Phantasm

This is where things get to be a bit head-spinning, but also very cool and powerful. The current namespace, at least globally, is always just clojure.core! (Of course you could monkey with the root binding of the Var, but I would not recommend that, given this information.) Whenever a Clojure source program claims to be changing the namespace, it’s really just changing the current namespace within the current thread of execution of the program! In other words, the current namespace is illusory, a convenient fiction maintained by large parts of the Clojure runtime for expedience and convenience.
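One way to peek behind the curtain is to compare the root value of the Var with the value the current thread sees. This sketch leans on getRawRoot, a public method of the Java class clojure.lang.Var rather than part of clojure.core, and it assumes nobody has altered the root binding of *ns*.

```clojure
;; The global root value, read straight from the Var object.
(def root-ns (.getRawRoot #'clojure.core/*ns*))

;; What the current thread sees when it dereferences *ns*.
(def current-ns *ns*)

(println (ns-name root-ns))
(println (ns-name current-ns))

;; True when the current thread has its own binding for *ns*,
;; as is the case at the REPL or while a file is being loaded.
(println (thread-bound? #'clojure.core/*ns*))
```

At a freshly started REPL I would expect the first line to name clojure.core and the second to name user, but the second depends entirely on where the code is run.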

“How has the wool been pulled over our eyes this entire time!?” you and I both ask. In order for this to happen, everywhere we might have believed that we were changing the namespace globally, we must have only been permuting it within a single thread. The two main places this illusion arises are:

  1. During REPL use.
  2. During compilation.

I will show that in both of these locations, the Clojure runtime actually creates a thread-local binding for *ns*, effectively isolating the global namespace for the purpose of defining new code.

The Clojure REPL Binds the Namespace

There’s not much to say here. The relevant lines of the Clojure REPL source code are defined here. The interactive Read-Evaluate-Print-Loop with which we are so familiar is actually concealing a thread-local binding (override) to *ns* under the hood. Because the binding underpins the entire REPL, within the REPL one could be forgiven for thinking that the current namespace actually exists. In reality, it exists only within that thread. It’s fairly uncommon to start spawning off new threads from the REPL which attempt to read the current namespace, so this could easily be missed. Here’s a nice counterexample showing that the current namespace is not as real as you might think.
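A small experiment along these lines can be run from the REPL itself. The helper ns-seen-by-bare-thread is a hypothetical name; it uses a plain java.lang.Thread, which starts out with no thread-local Var bindings.

```clojure
(defn ns-seen-by-bare-thread
  "Starts a plain java.lang.Thread and returns the name of *ns* as seen there."
  []
  (let [seen   (promise)
        worker (Thread. (fn [] (deliver seen (ns-name *ns*))))]
    (.start worker)
    (.join worker)
    @seen))

(println (ns-name *ns*))               ; the namespace this thread believes in
(println (ns-seen-by-bare-thread))     ; the namespace a fresh thread sees
```

Note that future and agent actions convey the calling thread’s bindings to the worker thread, so they would hide the difference; a raw Thread does not.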

The Clojure Compiler Binds the Namespace

I’ll be honest, I’m not yet quite adept at reading compiler source code, even for a Lisp. However, it’s fairly easy to spot what we’re looking for, now that we know what that is. Compilation happens inside this large compile function. Notice that, at the top of the function, it creates a thread-local binding for #'clojure.core/*ns*, based on the current value. (Tracking exactly where in the compiler it evaluates the in-ns function is a bit tricky. It appears to do that while it’s parsing and emitting bytecode, but I haven’t gotten that far into the code base.)
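The effect of such a binding is easy to observe from the outside. The sketch below uses load-string, which goes through the compiler’s load path rather than the compile function itself; as far as I can tell it sets up the same kind of thread-local binding for *ns*. The namespace demo.loaded and its Var are hypothetical.

```clojure
(def ns-before (ns-name *ns*))

;; The loaded text switches namespaces and defines a Var there.
(load-string "(ns demo.loaded) (def loaded-value 1)")

;; The switch happened inside the loader's own binding of *ns*,
;; so it did not leak out into the caller.
(def ns-after (ns-name *ns*))

(println ns-before)
(println ns-after)
(println @(resolve 'demo.loaded/loaded-value))
```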

Implications

The fact that the machinery for creating namespaces, defining Vars, and sticking those Vars inside namespaces uses thread-local bindings means that, for the most part, Clojure code can be added at runtime from any number of threads, relatively safely. (Relatively is the operative term: if different threads are trying to load data with the same name, and set them as root bindings, trampling can and probably will still occur. See this ancient thread with Rich Hickey about the lack of safety of changing root bindings dynamically.)

Although this may seem strange, it’s actually quite liberating. There is nothing particularly special about the Clojure compiler or REPL; they just happen to have the local bindings set up correctly. If you need to do runtime code loading (via eval or the like), you could similarly set up new namespaces for that code. (Technically Clojure does not guard against malicious actors, so custom classloaders may be needed if you’re loading code from an untrusted party.)
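As a minimal sketch of that idea, the helper below binds *ns* itself before evaluating a form. The names eval-in-ns, demo.sandbox, and loaded-value are hypothetical, and this is only one plausible way to arrange the binding, with no sandboxing of any kind.

```clojure
(defn eval-in-ns
  "Evaluates form with *ns* thread-locally bound to the namespace named
  ns-sym, creating that namespace if necessary."
  [ns-sym form]
  (binding [*ns* (create-ns ns-sym)]
    (refer-clojure)            ; make clojure.core available unqualified
    (eval form)))

;; The definition lands in demo.sandbox, not in the caller's namespace.
(eval-in-ns 'demo.sandbox '(def loaded-value (+ 1 2)))

(println (ns-resolve 'demo.sandbox 'loaded-value))
(println @(ns-resolve 'demo.sandbox 'loaded-value))
```

Because binding establishes its own thread-local value for *ns*, this works whether or not the calling thread already had one.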

Bringing it All Together

To save you some scrolling, I’ll repeat my example from above, down here:

;; In Clojure, this works:
(ns namespace-1)
(println *ns*)
;; #namespace[namespace-1]

;; This also works
(in-ns 'namespace-2)
(clojure.core/println clojure.core/*ns*)
;; #namespace[namespace-2]
;; you could use
(refer-clojure)
;; to use the shorthand syntax available in the previous example

;; However, this fails miserably at runtime, and there is no good documentation
;; available which explains why this is so:
(defn -main
  [& args]
  (in-ns 'namespace-3)
  (refer-clojure)
  (println *ns*))
;; Will shout and complain that you can't set a Var that's not locally bound

Why is it that the first two examples work (during compilation and the REPL) and the last one does not (at runtime)? I’ve already hinted why above, but I’ll give the long-form explanation.

In the interest of efficiency and expediency, the compiler and the REPL both pretend to be within an interactive user session, and allow the global *ns* variable to be manipulated fairly freely. Every def and defn call (e.g. every declaration of a global variable, whether that variable is a function or not) translates into something like, “Take this object and shove it into the named box within the compartment named by *ns*.” In order to ensure that it’s easy to assign those boxes into those compartments, the compiler and REPL both take pains to ensure that *ns* looks like what a user would expect it to, such that compilation proceeds in an orderly fashion.

This does not hold for the Clojure runtime itself! When someone calls a -main function in Clojure, there is no magic call of with-bindings like those which occur in clojure.main and clojure.lang.RT to allow *ns* to be freely tweaked. As such, this will fail with the error IllegalStateException("Can't change/establish root binding of: *ns* with set"). Fuzzy and unclear before, it is now obvious: before, the compiler and REPL were establishing a thread-local binding for *ns*, so that we would never see this error! Now, we’re on our own, and we’ll be blindsided by a fastball if we aren’t either deeply engrossed in the language runtime fundamentals, or previously warned of this quirk!
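If you do need in-ns inside a function that may run without the REPL’s or compiler’s help, one workaround is to establish the thread-local binding yourself. In the sketch below, run-in-bare-thread, switch-ns-unguarded, and switch-ns-guarded are hypothetical names; the bare thread stands in for a context with no binding for *ns*, which is roughly the situation a gen-classed -main finds itself in.

```clojure
(defn run-in-bare-thread
  "Calls f on a plain java.lang.Thread, which starts with no thread-local
  bindings. Returns f's result, or the exception it threw."
  [f]
  (let [result (promise)
        worker (Thread. (fn []
                          (deliver result (try (f) (catch Throwable t t)))))]
    (.start worker)
    (.join worker)
    @result))

(defn switch-ns-unguarded []
  (in-ns 'namespace-3)
  (ns-name *ns*))

(defn switch-ns-guarded []
  (binding [*ns* *ns*]          ; establish a thread-local binding first
    (in-ns 'namespace-3)
    (ns-name *ns*)))

(def unguarded (run-in-bare-thread switch-ns-unguarded))
(def guarded   (run-in-bare-thread switch-ns-guarded))

(println (.getMessage ^Throwable unguarded))
(println guarded)
```

Wrapping the body of -main in the same (binding [*ns* *ns*] ...) form should have the same effect, and the change of namespace is discarded again when the binding form exits.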

Should Clojure be nicer about this?

After I explained all of the above to a colleague at work (related to this GitHub issue), he was justifiably upset. He submitted a ticket on the Clojure mailing list and eventually received a response from Alex Miller himself. (I chimed in toward the end but the conversation was essentially concluded by the time I added my two cents.)

The debate essentially goes as follows:

Clojure User: Since *ns* and in-ns work consistently during compilation and in the REPL, they ought to behave the same way even at runtime. To do otherwise is to violate the principle of least surprise. Please provide a shim to the runtime such that even when launching gen-classed Clojure from the Java command line, or when invoking the runtime via the Java API, *ns* and in-ns will have received the same treatment as they get in other contexts.

Clojure Maintainer: There isn’t any canonical or deliberately consistent behavior of *ns* or in-ns in the manner you perceive. The compiler happens to work the way it does. clojure.main/Leiningen/Boot are by no means the canonical implementations; they provide the same setup for the convenience of the REPL, and happen to have chosen consistent conventions, but for us to impose those conventions unilaterally would be to take an actual stance and dictate that there is a canonical implementation.

Clojure User: It’s apparent that the community has consolidated around a canonical implementation, and to deny that is to consign developers to stumble upon this every few months, until this happens again. Although theoretically it was not necessary to provide this shim to achieve the language runtime, it is needed to avoid shocking users.

Clojure Maintainer: Sure, open a ticket!

Clojure User: Done.

Update

After the initial publication of this post, I got some pushback from friends and colleagues that it did not sufficiently motivate why namespaces have counter-intuitive behavior and what can be done about it. Therefore I’ve taken the approach (from which I normally refrain) of updating this post with more information and content, specifically the sections on “Bringing it All Together”, “Should Clojure be nicer about this?”, and the TL;DR at the top.

© Jeff Rabinowitz, 2019