In computer science, functions are said to be either pure or impure. A pure function is one that doesn't act on data outside of its inputs, or modify data outside of its outputs. An impure function, on the other hand, is one that doesn't follow these rules.
Let's dive in and better understand the place for each of them.
A pure function, put simply, is a function that satisfies the following conditions.
- It acts on no data other than what it explicitly receives as arguments.
- It modifies no data other than what it explicitly returns as output.
That's it. It's a very simple yet profound concept. It's profound because once we adopt a style of programming revolving around pure functions - namely functional programming - the following good things happen.
- Functions become easier to reason about, because their boundaries are better defined, and we can rest assured that they are not silently and mysteriously changing data.
- Unit testing becomes easier, as writing tests for pure functions doesn't require mocking up any external data: everything is happening due to the inputs.
- Code becomes composable, meaning we can chain together smaller functions to produce bigger ones. Those bigger ones can in turn be chained together for even more sophisticated and specialized functions. Think Lego blocks.
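The composability point can be sketched as follows. This is only an illustration: the helpers double, increment and compose are hypothetical names chosen here, not part of any library.

```javascript
// Two small pure functions (hypothetical names, for illustration only).
const double = (n) => n * 2;
const increment = (n) => n + 1;

// compose applies functions right to left: compose(f, g)(x) === f(g(x)).
const compose = (...fns) => (input) =>
  fns.reduceRight((value, fn) => fn(value), input);

// A bigger function built purely out of smaller ones.
const doubleThenIncrement = compose(increment, double);

console.log(doubleThenIncrement(5)); // 11
```

Because every piece is pure, testing the composed function needs no mocks or setup: calling it with a known input and checking the output is enough.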
Here's a basic example.
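A minimal sketch of such a function, assuming the name multiply and the parameters x and y (illustrative choices):

```javascript
// Pure: the result depends only on the arguments x and y,
// and nothing outside the function is read or changed.
function multiply(x, y) {
  return x * y;
}

console.log(multiply(2, 3)); // 6
```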
Above we have a naive implementation of a function that multiplies two numbers. This implementation is pure, because it acts only on data received through arguments, and explicitly returns a value. It does nothing more. It does one thing, and does it well. We can also notice that this pure function - and any pure function by definition - will return the same value when given the same inputs, no matter how many times it runs.
Let's now compare this pure function with the following impure implementations of multiplier functions.
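A sketch of the first impure variant, assuming a function named multiplyByY (a hypothetical name) that reads an outer variable y:

```javascript
// y lives outside the function: a free (non-local) variable.
let y = 3;

// Impure: the result depends on y, which is not an argument.
function multiplyByY(x) {
  return x * y;
}

console.log(multiplyByY(2)); // 6
```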
In the example above, the function is acting on data from a free (non-local) variable outside of its scope: the variable y. It's not acting just on its arguments, so it's impure, even though the result is the same as the pure version. This is known as a side-cause: something that affects the function outside of its inputs.
Here's another example of impurity.
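A sketch of this kind of impurity, assuming a function named multiplyAndStore (a hypothetical name) that writes to an outer variable total:

```javascript
// total lives outside the function.
let total = 0;

// Impure: assigns to the non-local variable total, then returns it.
function multiplyAndStore(x, y) {
  total = x * y;
  return total;
}

console.log(multiplyAndStore(2, 3)); // 6
console.log(total); // 6
```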
In the above example, the function is modifying a non-local total variable, and returning that from inside the function body. This modification of non-local values makes it impure: these are called side-effects.
Let's look at this one now.
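A sketch of the idea, assuming a function named multiplyAndLog (a hypothetical name):

```javascript
// Impure: writes the product to the console instead of returning it.
function multiplyAndLog(x, y) {
  console.log(x * y);
}

multiplyAndLog(2, 3); // logs 6; the call itself evaluates to undefined
```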
Did you catch that above? The output is logged to the console - not returned - so it breaks the rules for purity. It's impure. Here's another one.
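A sketch of such a function, assuming the name multiplyWithDate (hypothetical) and a local variable rightNow:

```javascript
function multiplyWithDate(x, y) {
  // rightNow is never used, but creating it still reads the system clock.
  const rightNow = new Date();
  return x * y;
}

console.log(multiplyWithDate(2, 3)); // 6
```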
In this one above, we have a reference to the current date, but we never use it. It doesn't matter. Just interacting with the Date object makes the function impure. Furthermore, the value for rightNow will be different every time it runs, so we couldn't rely on the innards of the function working the same every time. Here's a last one.
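A sketch of such a function, assuming the name multiplyAfterFetch (hypothetical) and a runtime that provides a global fetch, such as a modern browser or Node.js 18 and later:

```javascript
// Impure: performs a network request before producing its result.
// The returned promise resolves to x * y only if the request succeeds.
function multiplyAfterFetch(x, y) {
  return fetch("https://www.google.com").then(() => x * y);
}
```

A caller would consume it with something like multiplyAfterFetch(2, 3).then(console.log), and would also need to handle the case where the request fails.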
In the silly example above, we're fetching the Google homepage, and then returning the results of the multiplication once it resolves. Why would anybody want to do that? There's no good reason, but it helps to illustrate the principle of impurity: the data returned could be different every single time.
We can then generalize and highlight the following operations as common causes for and symptoms of function impurity.
- Whenever a function references or acts on variables outside of inputs.
- Whenever a function modifies values outside of its outputs.
- Querying servers and databases. This is because the data might change between invocations, thus violating the rule that a pure function must always return the same output when given the same input.
- Most date operations, as these tend to be dynamic and time-dependent.
- Anything involving randomness.
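The last two items in the list can be sketched like this. The function names rollDie and isWeekend are hypothetical, chosen only for illustration.

```javascript
// Impure: the result depends on a random number generator,
// not just on the argument sides.
function rollDie(sides) {
  return Math.floor(Math.random() * sides) + 1;
}

// Impure: the result depends on the system clock.
function isWeekend() {
  const day = new Date().getDay(); // 0 = Sunday, 6 = Saturday
  return day === 0 || day === 6;
}

console.log(rollDie(6)); // an integer from 1 to 6, may differ on every call
console.log(isWeekend()); // depends on when the code runs
```

Neither function can promise the same output for the same input, which is exactly what the definition of purity requires.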
We can see then that impure functions are a necessary part of creating useful computer programs: after all, in the real world, we have to query databases that might have been modified since we last queried them. We have to work with times and date values, which are constantly changing. We have to deal with - and model - an unpredictable, stochastic and dynamic world.
It's this dynamism that makes computer programs interesting and useful.
At a higher level though, pure functions and impure functions can be broken out to - and fitted into - separate conceptual buckets: deterministic vs stochastic.
Deterministic vs stochastic
Even though pure functions form a central pillar of the functional programming paradigm, and even if they are currently trendy because of their use in React, Redux, and similar libraries, I think we can take a step back and consider them from a higher level of abstraction: determinism.
Pure functions are purely deterministic. They take an input and produce a perfectly predictable value. They are reliable and trusty.
Impure functions, on the other hand, are stochastic - or at the very least unpredictable. They are unreliable: we can't fully rely on their outputs, yet we need them to make our programs useful.
In large software systems, ones contributed to by large teams of programmers at varying levels of experience, predictability is your friend. Clarity is your friend. Ease of reasoning and ease of testing are your friends. Choose pure functions - and determinism - whenever possible. Introduce unpredictability - impure functions - only deliberately, when necessary and beautiful.