Julia Pipeline Operator (|>): Write Methods that Read Like Prose
Chaining methods lets us write more expressive functions in many scenarios. Although the pipeline operator in Julia (|>) offers Unix shell-style pipelining rather than traditional method chaining, we can still use it to write succinct functions.
I became interested in the fluent interface (method chaining) while working in the reactive programming paradigm in Java. As I grew more confident with the reactive paradigm, I started borrowing the fluent interface style from it wherever I saw fit. Now that I am writing more Python code, I appreciate any library that allows method chaining, especially when I need to define a chain of transformations whose intermediate objects are not needed.
Although many Python data-wrangling libraries, such as Pandas, Dask, and Polars, have leveraged this pattern’s expressiveness, it is unfortunately missing from the core Python data structures. I recently picked up the Julia programming language for a new project, where I discovered the pipeline operator (|>). This operator is not as feature-rich as its counterparts in some functional programming languages, but it can still be used to emulate the method chaining style.
As we explore the fluent interface and the pipeline operator, we will continue writing solutions to the following toy problem.
Problem Statement: Design a function that will take two lists of the same size and transform the data in this sequence:
1. Compute the element-wise differences
2. Square the differences
3. Keep only values that are greater than 100 and less than 200
4. Find the average of the elements
Making a Case for Fluent Interfaces
Python offers a very interesting feature for data manipulation, namely comprehensions. We can use list comprehensions to solve the given problem.
def foo(arr1, arr2):
    differences = [x-y for x,y in zip(arr1, arr2)]
    squared = [x**2 for x in differences]
    filtered = [x for x in squared if x>100 and x<200]
    average = sum(filtered) / len(filtered) # Ignoring division by 0 error
    return average
To the eyes of a Pythonista, this is readable enough code. However, we can be more pedantic and argue two points. First, the solution does not read like the problem statement. Second, finding well-fitting names for intermediate variables such as differences, squared, and filtered won’t always be possible. In fact, we often don’t even bother to give them proper names. We can support this statement by looking at a forward method as used in PyTorch.
def forward(self, x):
    x = self.layer1(x)
    x = self.relu(x)
    x = self.layer2(x)
    return x
Notice how all the intermediate variables are named x simply because the intermediate steps are not important to us. Leaving these intermediate values without distinct names did not reduce readability; if anything, it improved it. This brings us to the question of whether we can avoid the intermediate values altogether, and one answer lies in function composition. If we have two functions f(x) and g(x), and we want to compute g(x) first and then apply f to the returned value, we can express this mathematically with the composite function h(x) = f(g(x)). Accordingly, we can rewrite our foo(arr1, arr2) function as foo_composite(arr1, arr2), as defined below.
def differences(arr1, arr2):
    return list(map(lambda x: x[0]-x[1], zip(arr1,arr2)))

def squared(arr):
    return list(map(lambda x: x**2, arr))

def filtered(arr, low, high):
    return list(filter(lambda x: low<x<high, arr))

def averaged(arr):
    return sum(arr) / len(arr)

def foo_composite(arr1, arr2):
    return averaged(filtered(squared(differences(arr1, arr2)), 100, 200))
The foo_composite function now mirrors the mathematical way of composing functions and doesn’t require any intermediate variables. However, to understand how this function works, we have to read it right-to-left, which feels unnatural since we read most of our code left-to-right. If we follow the execution chain, differences(arr1, arr2), which is called before all the other functions, is the innermost and rightmost of the function names, and to find each function that executes next we have to keep reading toward the left. Fluent reading is also disrupted by the parameters `100` and `200`, which require eye-straining parenthesis matching to find the function they belong to.
The problems stated above could have been resolved if we could chain methods one after another with the assumption that each method passes its returned object to the next method in the chain.
# Pseudocode
def foo_pseudo_chained(arr1, arr2):
    return (
        zip(arr1,arr2)
        .map(differences)
        .map(squared)
        .filter(lambda x: 100<x<200)
        .map(averaged)
    )
Pseudocode is used here to demonstrate how methods could be chained, since there is no way to achieve this on the core Python data types without reaching for a library. Polars is an excellent library to showcase this style of method chaining.
import polars as pl

def foo_polars(csv_path):
    return (
        pl
        .read_csv(csv_path, infer_schema_length=1000)
        .select(
            pl.col("arr1"),
            pl.col("arr2"),
        )
        .select(
            # Element-wise difference written as a native expression
            (pl.col("arr1") - pl.col("arr2")).alias("difference")
        )
        .select(
            pl.first().pow(2)  # Apply power to the first column
        )
        .filter(
            # Combine both bounds into a single predicate
            (pl.first() > 100) & (pl.first() < 200)
        )
        .select(
            pl.first().mean()
        )
    )
To the trained eye, this code reads like prose. Reading this function top-to-bottom lets us read it the way we would describe it in English: we are reading a CSV file, selecting certain columns, selecting a new column of differences, selecting a new column of the squared values, filtering based on some condition, and finding the mean. We are reading the code top-to-bottom and left-to-right while avoiding intermediate variables.
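The same top-to-bottom style can be approximated on plain Python lists with a small wrapper class. The sketch below is only an illustration: `Chain`, its `map`, `filter`, and `mean` methods, and `foo_chained` are hypothetical names introduced here, not part of Python or any library mentioned in this article.

```python
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Chain(Generic[T]):
    """Hypothetical list wrapper: each step returns a new Chain so calls can be chained."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items = list(items)

    def map(self, fn: Callable[[T], U]) -> "Chain[U]":
        # Apply fn to every element and wrap the result again.
        return Chain(fn(item) for item in self._items)

    def filter(self, predicate: Callable[[T], bool]) -> "Chain[T]":
        # Keep only the elements for which predicate is true.
        return Chain(item for item in self._items if predicate(item))

    def mean(self) -> float:
        # Terminal step: leaves the chain and returns a plain number.
        if not self._items:
            raise ValueError("cannot average an empty chain")
        return sum(self._items) / len(self._items)


def foo_chained(arr1, arr2):
    return (
        Chain(zip(arr1, arr2))
        .map(lambda pair: pair[0] - pair[1])
        .map(lambda diff: diff ** 2)
        .filter(lambda sq: 100 < sq < 200)
        .mean()
    )
```

With the made-up inputs `[20, 15, 3]` and `[8, 2, 1]`, the differences are 12, 13, and 2, the squares are 144, 169, and 4, and the two values that survive the filter average to 156.5. Unlike the list-comprehension version, this sketch raises a `ValueError` when nothing survives the filter instead of dividing by zero.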
|> Pipeline Operator
The syntax for pipelining methods in Julia is similar to pipelining in bash.
result = input |> function1 |> function2 |> function3
Here, the result is obtained by passing the input through three functions, function1, function2, and function3, sequentially. We can’t create a method chain in Julia with the dot operator because, unlike in many object-oriented languages, types in Julia do not own behaviours. We can only define functions that operate on a certain type. Hence, if we want to achieve anything similar to the fluent interface, we need to use pipelines. Let’s look at another example.
data = 2;
f(x) = x + 5;
g(x) = x / 7;
h(x) = x * 8;
# The following two lines are equivalent
println(h(g(f(data)))) # right-to-left
data |> f |> g |> h |> println # left-to-right
If we had more functions, or longer function names, we could convert the chain to a top-to-bottom counterpart. Note that, as in Python, an extra pair of parentheses is required so that Julia parses the multi-line chain as a single expression.
function foo_julia(arr1, arr2)
    return (
        zip(arr1, arr2)
        |> list -> map(y -> y[1]-y[2], list)
        |> list -> map(y -> y^2, list)
        |> filter(y -> 100<y<200)  # one-argument filter returns a function (Julia 1.9+)
        |> list -> sum(list)/length(list)
    )
end
As a side note, in Julia the syntax x -> x^2 is used to define anonymous functions. In this case it is equivalent to lambda x: x**2 in Python.
The need to write list -> … repeatedly hinders the conciseness of this code. However, the metaprogramming capabilities of Julia have allowed the creation of packages like Pipe, which can help us regain some conciseness.
# Install in the Julia REPL: using Pkg; Pkg.add("Pipe")
using Pipe: @pipe

function foo_julia_pipe(arr1, arr2)
    return @pipe(
        zip(arr1, arr2)
        |> map(y -> y[1]-y[2], _)
        |> map(y -> y^2, _)
        |> filter(y -> 100<y<200)
        |> sum(_)/length(_)
    )
end
The macro @pipe replaces the underscores with the output of the previous step. In addition, it is no longer necessary to write x -> f(x) to indicate a function; instead, f(_) will do just that.
There are cases when most intermediate steps are already defined as functions that take a single parameter, in which case we can write this code even more concisely.
difference(arr1, arr2) = arr1 .- arr2
squared(arr) = arr .^ 2
mean(arr) = sum(arr) / length(arr)

function foo_julia_v2(arr1, arr2)
    return (
        difference(arr1, arr2)
        |> squared
        |> filter(y -> 100<y<200)
        |> mean
    )
end
In Julia, an operator preceded by a dot (.+ .- .^) indicates an element-wise operation on an array.
By contrasting this example with the previous ones, we notice that it is easier to deal with single-parameter functions with the pipeline operator. A function with two or more parameters necessitates either the use of an anonymous function or the use of the pipe macro.
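The same constraint appears when the pipeline is emulated in Python. The sketch below assumes a hypothetical `pipe` helper (not a standard library function) that threads a value through single-argument stages from left to right, and uses `functools.partial` to fix the extra parameters of a multi-parameter stage up front. The stage functions `differences`, `squared`, `between`, and `averaged` are illustrative names as well.

```python
from functools import partial, reduce


def pipe(value, *stages):
    """Pass value through each single-argument stage, left to right (hypothetical helper)."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


def differences(pairs):
    return [x - y for x, y in pairs]


def squared(values):
    return [v ** 2 for v in values]


def between(values, low, high):
    # Takes three parameters, so it cannot be used as a stage directly.
    return [v for v in values if low < v < high]


def averaged(values):
    return sum(values) / len(values)  # still ignores the empty case


def foo_piped(arr1, arr2):
    return pipe(
        zip(arr1, arr2),
        differences,
        squared,
        partial(between, low=100, high=200),  # fix the bounds, leaving one parameter
        averaged,
    )
```

Here `partial(between, low=100, high=200)` plays the role that the anonymous function or the `_` placeholder plays in the Julia versions: it turns a multi-parameter function into a single-parameter stage.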
Concerns
It is important to make sure that we don’t overengineer a solution in the name of code styling, or readability. The toy example we have been working on could be very easily solved in the following manner.
function foo_julia_no_pipeline(arr1, arr2)
    x = (arr1 .- arr2) .^ 2
    x = [i for i in x if 100<i<200]
    return sum(x) / length(x)
end
This version defines the intermediate variable x and is not written to reflect the problem statement. Nonetheless, we cannot dismiss the fact that this code is simple and understandable. That won’t always be true, especially as the number of transformations to be executed in sequence grows. In those scenarios, one should consider using method chaining.
At this point, it should also be disclosed that the pipeline doesn’t offer a concise way of error handling or branching of code, two features that exist in the Java reactive programming implementation. The current implementation of the Julia pipeline operator should not be taken as an alternative or exact copy of the fluent interface, as they are two distinct concepts. There are places where forcing pipelines may feel simply wrong. Hence, it should not be shoehorned into every single scenario.
In Conclusion
Using the Julia pipeline operator (|>) and fluent interfaces offers a compelling alternative to the traditional way of writing functions, providing a concise and expressive way to compose functions and perform sequences of transformations. The example problem illustrated the advantages of this approach, emphasizing readability and eliminating the need for explicit intermediate variables. Knowing about the pipeline and fluent interfaces can introduce a transformative paradigm shift in someone’s coding style.