From Impure to Pure Code

By Tommi Kaikkonen in 2017

How do purely functional languages facilitate inherently impure logic?

This article answers that question by listing common impure code constructs and their pure counterparts.

It's written in JavaScript (ES6), but the ideas are generally applicable to languages with first class functions and closures.

I'm not going to delve too deeply into when these patterns should be used, which largely depends on context. Sometimes refactoring to pure code makes it easier to test, reason about, and understand. Other times it makes the code slower, harder to grasp, and harder to debug. Either way, it is beneficial to be aware of these patterns.

Pure code is composed out of pure functions. Let's start by defining what they are.

Defining a Pure Function

We define a pure function by two criteria.

1. Given the same arguments, the return value will always be the same.

In other words, the function maps its domain, a set of input values, to its codomain, a set of output values. Each input value is associated with exactly one output value.

This means that the function body may not depend on variables outside its arguments, unless they are constant for the lifetime of the program. If the value of one of these variables changes between two invocations of a function, it could return two different output values for a single input value.

Here are some examples of functions that violate this criterion:

let mutableVariable = 1

// This function is impure
// if 'mutableVariable' ever changes.
function fn() {
    return mutableVariable;
}

Math.random() // output depends on system state, an outside variable
Date.now() // output depends on system time, an outside variable

2. Evaluating the function doesn't result in observable side effects.

We can distinguish two types of side effects. The first type consists of effects bounded in the JavaScript runtime, such as reassigning a variable defined outside the function or mutating a value in place.

These two functions perform such side effects and violate the criterion:

let currentUserId = 2;

function setCurrentUser(userId) {
    currentUserId = userId; // modifies a global variable
    return userId;
}

function addToArray(arr, element) {
    arr.push(element); // modifies array in-place
}

The second type consists of effects extending outside the JavaScript runtime, such as making a network request, writing to the console, or modifying the DOM.

These functions violate the criterion with the second type of effects:

function getWebpage(url) {
    return fetch(url); // makes a network request
}

console.log('Hello world') // causes an observable side effect in the console

function modifyDOM() {
    window.document.write('Hello world'); // modifies the DOM
}

We could say that if none of these side effects occur during a function execution, it must be pure, but that doesn't hold for long. Everything becomes a side effect when you peel back the layers of abstraction and get closer to the hardware. Those side effects exist to modify state, which is inherent in the physical world. A bit is flipped in-place because you can't conjure new bits out of thin air, just as you can't create physical matter out of nothing. Purity is a man-made abstraction that hides the state and side effects underneath, leading us to stronger reasoning about our code.

A notable corollary is that if a system already has few states and side effects, purity doesn't offer much benefit and only serves to restrict us. Thus, purity is best used to manage complexity arising from state.

While our criteria may seem strict, they allow the first type of side effects within a function invocation as long as they are not observable by the caller. Here's an example of such a function:

function copyArray(array) {
    const copiedArray = []; // modifying local scope
    for (let i = 0; i < array.length; i++) { // modifying local scope
        copiedArray.push(array[i]); // modifies `copiedArray` in-place
    }
    return copiedArray;
}

In this case, the return value is initialized to an empty array, appended to in-place to achieve the final value, and returned. No side-effects are visible to the caller.

Finally, here are examples of pure functions in the JavaScript standard library:

Math.floor // Will always return the same value for the same input
String.prototype.toLowerCase // returns a new string in lower case
Array.prototype.map // assuming that the `mapFn` argument is pure

Now that we've agreed on the definition of a pure function, let's dive into the patterns. While not needed in practice, we will take the time to turn low-level side-effects such as language statements, modifying local scope and mutating values to pure code. This serves an important purpose, because these patterns are the building blocks used to turn more complex side-effects pure.

Global Variables

Instead of referring to global variables inside a function, pass the global environment or a part of it as an argument to the function.

Impure

This function violates the first function purity criterion by depending on a value outside of its arguments.

function getDocument() {
    return global.window.document;
}
getDocument()

Pure

function getDocument(environment) {
    return environment.window.document;
}
getDocument(global);

Using this pattern can significantly simplify your tests as you don't have to mock or monkeypatch global variables for tests—just supply your own value for the environment. However, it may become cumbersome if the globals are needed in only a few places deep in the call stack, as you need to pass the environment to each nested function call if they depend on it.
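
As a minimal sketch of the testing benefit, the snippet below calls the pure getDocument with a hand-built stand-in for the environment. The fakeEnvironment object and its title value are made up for illustration; a real test would supply whatever shape the code under test needs.

```javascript
function getDocument(environment) {
    return environment.window.document;
}

// Hypothetical stand-in for the global environment, built only for this test.
const fakeEnvironment = {
    window: {
        document: { title: 'Test document' }
    }
};

// No mocking or monkeypatching of globals is needed.
const doc = getDocument(fakeEnvironment);
console.log(doc.title);
```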

Variable Assignment

Instead of mutating local scope with a variable assignment, move the rest of the statements in the function body to a new function with a single argument, and call it with the value of the variable.

Impure

function doubleAndAddTen(x) {
    const doubled = x * 2;
    return doubled + 10;
}

Pure

function doubleAndAddTen(x) {
    return (doubled => doubled + 10)(x * 2);
}

The doubled binding is now a function parameter, which doesn't modify the local scope for doubleAndAddTen.

This pattern is of little practical use by itself, but it's an elementary building block used to turn loops pure, as well as sequencing side effects.

Loops

Instead of mutating local scope during a loop, turn the loop body into a recursive function, here called loop, that accepts all loop variables as arguments. Call it with the initial values; it calls itself until the loop is finished and then returns the final values of all the loop variables.

Impure

let i = 0;
while (i < 5) {
    i += 1;
}

Pure

function loop(i) {
    if (i < 5) {
        return loop(i + 1);
    }
    return i;
}
const i = loop(0);

Do not use this pattern in a language without tail call optimization, as each loop iteration will add a call to the stack, leading to stack overflow on long loops. Note that although ES6 specifies proper tail calls, most JavaScript engines do not implement them.
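
The example above has a single loop variable. As a sketch of how the same pattern might look with two, the hypothetical sumTo function below carries both a counter and a running total through the recursion and returns the final values of both.

```javascript
// Imperative version, for comparison:
//   let i = 1, total = 0;
//   while (i <= n) { total += i; i += 1; }

function sumTo(n) {
    // `loop` takes every loop variable as an argument
    // and returns their final values as a pair.
    function loop(i, total) {
        if (i <= n) {
            return loop(i + 1, total + i);
        }
        return [i, total];
    }
    return loop(1, 0);
}

const [finalI, finalTotal] = sumTo(4);
// finalI is 5, finalTotal is 10
```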

Exceptions

Instead of throwing exceptions, indicate exceptional states in the return value. Here, safeDivide performs division, but returns 0 in case the division throws an error.

Impure

function divide(dividend, divisor) {
    if (divisor === 0) throw new Error("Can't divide by 0.");
    return dividend / divisor;
}

function safeDivide(dividend, divisor) {
    try {
        return divide(dividend, divisor);
    } catch (e) {
        return 0;
    }
}

We could think of exceptions as another possible return value that is implicitly propagated up the call stack unless the return value is explicitly handled with a try/catch construct. As such, they may be considered pure.

Pure

function divide(dividend, divisor) {
    if (divisor === 0) {
        return {
            ok: false,
            value: new Error("Can't divide by 0")
        };
    }
    return {
        ok: true,
        value: dividend / divisor
    };
}

function safeDivide(dividend, divisor) {
    const result = divide(dividend, divisor);
    if (result.ok) {
        return result.value;
    }
    return 0;
}

In this pattern, the function caller must know how to distinguish a normal return value from an error value. This is done using the ok property. Unless we add another layer of abstraction, all calls to functions that produce this kind of return value must explicitly handle both cases, which adds boilerplate.

Mutating In-Place

Instead of mutating state in-place, return a new, updated value.

Impure

The setFirst and setSecond methods mutate the internal state of the instance.

class MutablePair {
    constructor() {
        this.first = undefined;
        this.second = undefined;
    }

    setFirst(value) {
        this.first = value;
        return this;
    }

    setSecond(value) {
        this.second = value;
        return this;
    }
}

const pair = new MutablePair();
pair.setFirst(1).setSecond(2);
// MutablePair { first: 1, second: 2 }

Pure

class ImmutablePair {
    constructor(first = undefined, second = undefined) {
        this.first = first;
        this.second = second;
    }

    setFirst(value) {
        return new ImmutablePair(value, this.second);
    }

    setSecond(value) {
        return new ImmutablePair(this.first, value);
    }
}

const pair = new ImmutablePair();
const finalPair = pair.setFirst(1).setSecond(2);
// ImmutablePair { first: 1, second: 2 }

Some types of values are commonly implemented as immutable in programming languages, such as dates, times, numbers, and strings, because they map well to our understanding. For example, a date does not have state to mutate; it is what it is and cannot be changed. We can calculate new dates based on a starting point, but that has no effect on the initial date. As a result, this pattern is common in both imperative and functional languages.

Stateful Computations

Instead of holding state in variables that change value after executing statements, pass computation state as an argument to a function, and return the computation result and a new computation state.

Impure

This function, used to generate sequential integer identifiers, violates both of our function purity criteria.

let state = 0;
function getNextId() {
    return state++;
}

getNextId();
// 0
getNextId();
// 1

Pure

function getNextId(state) {
    return [
        state /* return value */,
        state + 1 /* next state */
    ];
}

const initialState = 0;
const [firstId, secondState] = getNextId(initialState);
// [0, 1]

const [secondId, thirdState] = getNextId(secondState);
// [1, 2]

The pure implementation makes getNextId very predictable on each call, but adds a lot of boilerplate to manage all the intermediary states (secondState, thirdState). It is common in purely functional languages to add another layer of abstraction to stateful computations that manages them implicitly, akin to imperative languages.
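
As a rough sketch of what such a layer of abstraction could look like, the snippet below threads the state through two calls without naming the intermediary states. The helpers chainState and pureValue are hypothetical names chosen for this illustration, not part of any library.

```javascript
function getNextId(state) {
    return [state /* return value */, state + 1 /* next state */];
}

// Hypothetical helper: runs `first`, passes its value to `decideSecond`
// to obtain the next stateful computation, and runs that with the new state.
const chainState = (first, decideSecond) => state => {
    const [value, nextState] = first(state);
    return decideSecond(value)(nextState);
};

// A stateful computation that leaves the state alone and yields `value`.
const pureValue = value => state => [value, state];

const getTwoIds = chainState(getNextId, firstId =>
    chainState(getNextId, secondId =>
        pureValue([firstId, secondId])));

const [ids, finalState] = getTwoIds(0);
// ids is [0, 1], finalState is 2
```

The intermediary states still exist, but only chainState sees them; the calling code just describes which computations follow each other.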

Managing Inherent Impurity

The forms of impurity we've looked at so far can be fully eliminated from our code. However, side effects that extend outside the language runtime cannot be eliminated from an application. Further, all applications must have side effects in order to be useful: if a program doesn't write anything to an output stream or perform any other observable side effect, you might as well not run the program. Instead of letting impurity permeate our business logic, we can hold on to purity for as long as possible and perform side effects at the edge of our application where they won't affect our ability to reason about code.

A practical edge of our application is the main function that initiates program execution, and runs all the side effects we have carefully prepared.

Side Effects

Instead of performing a side effect in a function, return enough information to allow you to perform the side effect later.

Impure

This function call violates our second criterion for function purity: the caller may observe a side-effect of output in the console.

console.log('Logging to the console is a side effect.');

Pure

function pureLog(...args) {
    return {
        fn: console.log,
        args: args,
    };
}

const sideEffect = pureLog('Logging to the console is a side effect.');

// At the periphery of your program,
// perhaps in your 'main' function.
sideEffect.fn(...sideEffect.args);

The function call is encoded in an object with properties fn and args, and executed later based on those values. Note that this is just one way to encode a side effect. Another simple approach is to return a thunk, a function with zero arguments, that holds the data in a function closure and allows you to call it to perform the side effect:

function pureLog(msg) {
    return () => console.log(msg);
}

const sideEffect = pureLog('Logging to the console is a side effect.');

// At the periphery...
sideEffect();

Since this encoding is the simplest, we'll use it in the following examples. However, in practice you want to encode your side effects explicitly. There are two main reasons for this. First, it makes it possible to change how side effects are executed at runtime. In tests you may want to mock network requests, or read from a mock database. Given this side effect:

// Returns a Promise for the JSON parsed response payload.
async function impureGetJSON(url) {
    const response = await fetch(url);
    return response.json();
}

function pureGetValueFromAPI(resourceId) {
    return {
        fn: impureGetJSON,
        args: ['https://example.com/' + resourceId]
    };
}

const sideEffect = pureGetValueFromAPI(1);

You can implement separate "runners":

function runNormal(effect) {
    return effect.fn(...effect.args);
}

function runTest(effect) {
    if (effect.fn === impureGetJSON) {
        // Mock a response payload for our fetch.
        // `effect.args` holds the request URL built by pureGetValueFromAPI.
        const [url] = effect.args;
        return Promise.resolve({ status: 'ok', url: url });
    }
    // If we don't provide a mock for the effect,
    // fall back to executing it.
    return effect.fn(...effect.args);
}

Second, it allows you to test your logic deterministically, assuming your side effects run correctly. A unit test for pureGetValueFromAPI would simply be:

expect(pureGetValueFromAPI(1)).to.deep.equal({
    fn: impureGetJSON,
    args: ['https://example.com/1']
})

This example is of course simple enough not to warrant much testing, but the utility quickly increases with complexity. For example, redux-saga uses this pattern to allow testing complex asynchronous flows with side effects.

Sequenced Side Effects

Instead of using two sequential statements to perform two side effects one after the other, force the sequence evaluation order using expression evaluation order: arguments in a function call must be evaluated before the function body.

Impure

console.log('One');
console.log('Two');
// Output:
// One
// Two

Pure

Here we use thunks with closures to encode side effects as data, like we saw in the previous section:

function pureLog(msg) {
    return () => console.log(msg);
}

const sideEffectOne = pureLog('One');
const sideEffectTwo = pureLog('Two');

Using statements, we would do:

sideEffectOne();
sideEffectTwo();
// Output:
// One
// Two

But we can turn this into a single expression:

const sideEffectSequence = (firstEffect, secondEffect) =>
    () => (ignoredReturnValue => secondEffect())(firstEffect());

const sequencedSideEffect = sideEffectSequence(pureLog('One'), pureLog('Two'));

sequencedSideEffect();
// Output:
// One
// Two
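
The same idea extends to more than two effects. As a sketch, a list of thunks can be folded into a single thunk with Array.prototype.reduce; sequenceAll and doNothing are hypothetical helper names introduced here.

```javascript
function pureLog(msg) {
    return () => console.log(msg);
}

const sideEffectSequence = (firstEffect, secondEffect) =>
    () => (ignoredReturnValue => secondEffect())(firstEffect());

// Hypothetical helpers: `doNothing` is the starting point of the fold,
// and `sequenceAll` combines a list of thunks into one thunk
// that runs them from left to right.
const doNothing = () => undefined;
const sequenceAll = effects => effects.reduce(sideEffectSequence, doNothing);

const countToThree = sequenceAll([
    pureLog('One'),
    pureLog('Two'),
    pureLog('Three')
]);

countToThree();
// Output:
// One
// Two
// Three
```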

Sequencing Dependent Side Effects

Instead of saving side effect results, such as lines from a file or user input from a stream, in a variable and using the variable value in the following statement, use expression evaluation order to force the sequence and delay determining the second side effect until the first side effect result is resolved.

This is a small but powerful adjustment to the previous pattern.

Impure

Here, we prompt the user for their name in a dialog, and output it to the console. The second effect depends on a value produced in the first.

const name = window.prompt('Please enter your name');
console.log(`Hey ${name}!`);

Pure

function purePrompt(message) {
    return () => window.prompt(message);
}

function pureGreet(name) {
    return () => console.log(`Hey ${name}!`);
}

const chainSideEffects = (firstEffect, decideSecondEffect) =>
    () => decideSecondEffect(firstEffect())();

const sideEffect = chainSideEffects(
    purePrompt('Please enter your name'),
    pureGreet
);

sideEffect();

Due to the expression evaluation order, firstEffect, or the prompt for the user name, is performed first. The return value of that function will be the name entered by the user. That value is passed to pureGreet, which is bound to the argument name decideSecondEffect. To fully understand the process, let's expand the definition for sideEffect:

// Starting point.
let sideEffect = chainSideEffects(
    purePrompt('Please enter your name'),
    pureGreet
);
// Expand the definition of chainSideEffects
sideEffect = () => pureGreet(purePrompt('Please enter your name')())();
// Expand purePrompt
sideEffect = () => pureGreet((() => window.prompt('Please enter your name'))())();
// Replace (() => X)() with X in the purePrompt definition
sideEffect = () => pureGreet(window.prompt('Please enter your name'))();
// Expand pureGreet
sideEffect = () =>
    (name => () => console.log(`Hey ${name}!`))(
        window.prompt('Please enter your name')
    )();
// Remove one redundant thunk layer
sideEffect = () =>
    (name => console.log(`Hey ${name}!`))(
        window.prompt('Please enter your name')
    );

As we can see from the expanded definition, it looks almost identical to the pure code in Variable Assignment, except that instead of binding a value returned from a pure function, we're binding a result from a side effect and wrapping the procedure in a thunk.

Statements

Whereas expressions evaluate to a value, statements don't evaluate to anything. They direct the runtime to perform a side effect. For example, a variable assignment directs the language runtime to bind a value to a name in the current scope. An assertion statement directs the language runtime to throw an exception if the guard value is falsy. The return statement directs the runtime to exit the function, assigning the return value of the function according to the statement.

These are all side effects. The order of the statements determines in which order those effects are performed. Often a statement depends on a side-effect performed by a previous statement, such as a variable being assigned before using it.

The two previous sections described how we can perform one side effect after the other. That's what statements are. We've already turned statements pure. chainSideEffects in the previous section implements the idea of a semicolon: first do this; then that. Instead of being restricted to how the language runtime decides to implement semicolons, we can decide for ourselves. Haskell, for example, provides syntactic sugar in the form of do notation: code written as a sequence of statements is turned into pure expressions, while the programmer decides what the semicolon means.

Composing Applications with Pure Functions

We have examined patterns that can turn the vast majority of application code pure. Applying these patterns is not hard; learning how to compose applications with pure functions is more difficult. It can definitely be done, but the tools are very different from control flow in imperative languages.

Normal function composition is straightforward. It is the glue that holds together pieces of a functional application. It combines two functions f and g into a function fg that pipes a value x through these two functions, first applying g to the value x to yield y and then applying f to y to yield the final return value z. Note that in this notation, the order of function evaluation goes from right to left, from g to f. In JavaScript:

function compose(f, g) {
    return function(x) {
        const y = g(x);
        const z = f(y);
        return z;
    }
}

// or more succinctly
const compose = (f, g) => x => f(g(x));

This type of composition works when the return type of g matches the argument type for f. For example, you can compose a function that returns a string with a function that accepts a string as an argument.
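
A small usage sketch: both helper functions below are made up for this example, and each takes and returns a string, so the types line up.

```javascript
const compose = (f, g) => x => f(g(x));

// Both helpers are hypothetical and map a string to a string.
const trim = text => text.trim();
const shout = text => text.toUpperCase() + '!';

// `trim` runs first, then `shout`: evaluation goes from right to left.
const trimAndShout = compose(shout, trim);

trimAndShout('  hello ');
// 'HELLO!'
```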

There is no requirement for the types to match in dynamic languages, but to compose functions in a sane manner, it is highly beneficial that returned values satisfy a common interface. For dynamic languages, you can read interface wherever this section says type.

However, when we turn impure code pure, we change the return type to add more information. Instead of a single value, we're returning a value and state information, or a value or error information, or something else altogether, like a thunk that can produce a value later. We call these functions effectful computations: they produce a result value and one or more other values that describe the effect. Effectful computations can't use normal function composition because the return and argument types don't match. Further, what a composition of two effectful computations means needs to be determined for each type of effect.

Say we have two computations that may fail, and that we want to compose. Both take a single argument of some type T, and both return a value of type ErrorOrOk<T>; that is, the return value includes two pieces of information: a success flag indicating whether a failure occurred, and a value that is either an Error or a normal value depending on the flag. Because ErrorOrOk<T> doesn't match T, we can't combine two functions of type T -> ErrorOrOk<T> with normal function composition.

The common way to define the composition of two functions which may both fail is as follows:

  1. Call the first computation g with the input argument x and go to step 2.
  2. If g(x) failed, return g(x). Otherwise, go to step 3.
  3. Let the successful return value of g(x) of type T be y and go to step 4.
  4. Call the second computation f with y and return the result.

Which looks like this in JavaScript:

function composeFailable(f, g) {
    return function(x) {
        const firstResult = g(x);

        if (firstResult.ok) {
            const y = firstResult.value;
            return f(y);
        }

        return firstResult;
    }
}
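
To see the short-circuiting in action, here is a usage sketch. The two failable computations, reciprocal and squareRoot, are hypothetical examples that follow the { ok, value } shape used earlier in the Exceptions section.

```javascript
function composeFailable(f, g) {
    return function(x) {
        const firstResult = g(x);

        if (firstResult.ok) {
            const y = firstResult.value;
            return f(y);
        }

        return firstResult;
    };
}

// Hypothetical failable computations of shape number -> { ok, value }.
const reciprocal = x =>
    x === 0
        ? { ok: false, value: new Error("Can't divide by 0") }
        : { ok: true, value: 1 / x };

const squareRoot = x =>
    x < 0
        ? { ok: false, value: new Error('Negative input') }
        : { ok: true, value: Math.sqrt(x) };

const rootOfReciprocal = composeFailable(squareRoot, reciprocal);

rootOfReciprocal(4);  // { ok: true, value: 0.5 }
rootOfReciprocal(0);  // fails in reciprocal; squareRoot is never called
rootOfReciprocal(-4); // reciprocal succeeds, squareRoot fails
```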

As each kind of effectful computation needs to implement its own composition, purely functional languages offer generic operations for composition that work on all computations. When an effectful computation implements these generic operations, it is said to form a monad. Because monad operations are commonly used to describe effectful computations in purely functional languages, they're ubiquitous in that paradigm. However, the monad as a concept is more abstract than just effectful computations, which makes it harder to understand intuitively and apply in practice.

Conclusion

We examined common impure code patterns, compared them to their pure counterparts, and discussed the consequences of building applications from smaller parts of pure code. Hopefully this document exposed some of the magic that allows functional languages to conduct imperative operations with pure grace.

I don't recommend fully converting your applications to pure code: use common sense and embrace the strengths of the language you are working in.