I was suprised to find out about TSort, a standard Ruby module implementing Tarjan’s graph algorithm. When you come and think about it, it’s in fact no surprise that it is in the standard library. Dependency management is behind every corner in software engineering: from managing you gems, to implementing formulas or executing with priorities, … and the list goes on.
Using this beautiful abstraction, I was able to clean up a lot of code. As an added bonus there is an enormous speed advantage compared to doing it yourself in Ruby, since it’s part of the standard library implemented in C. The aim of this post is to demonstrate how it works, where it is used, and how you can take advantage of it in your own projects. We’re not going into the details of the algorithm itself, as the wikipedia article is a quite good resource for it.
The problem
Suppose we have a simple set of assignments:
Simple enough, you easily figured out that we have A = 6, B = 4, C = 2, D = 11
.
But consider this second set,
here we stumble upon a cyclic dependency, to calculate F you have to know E, but to calculate E you have to know G, but to calculate G you need F! This is just a linear system, so the solution E = -1, F = -2, G = 1
checks out, but in order to find this we had to manipulate the equations instead of changing the order of assignments.
Now imagine you have a very big list of equations, and you want to know if there are any cyclic dependencies present. You need some way to order these assignments so we are sure the needed variables are already calculated when you reach a certain equation in the list. In other words, it should warn you exactly for the situation described above.
The picture
Everything becomes clearer if you make a picture of it, in our case we’ll use a graph. To make a graph we need two main ingredients, nodes and relationships between them.
- Nodes: Our variables
- Relations: Let’s draw a directed arrow from B to A, because B is needed to calculate A. So in general, two variables X and Y are related if Y is a prerequiste for calculating X. One can read the arrows as: “… goes in to …”.
Upon drawing this picture we see the main difference between the two situations, we have the cycle F,G,E in the second graph. This is a so called strongly connected component, a part of the graph where each node is reachable from the others in the collection. As you’ll notice in the first graph, each strongly connected component is in fact size 1! This is in fact a sufficient condition guaranteeing we have no cycles. Tarjan’s algorithm figures out these strongly connected components, the order in which one has to calculate the equations is a nice side effect.
The library
This is where the our library comes into play. (If you haven’t done it yet, please read the excellent documentation for the TSort standard library). How does it work? We’ll make two classes Assignment
and AssignmentSet
, the AssignmentSet
is the one doing the sorting, so it’ll include our module TSort
.
The AssignmentSet
is simply a collection of Assignment
objects, an Assigment
object contains 2 strings, representing the right hand side in @variable
and the left hand side in the @equation
. The ultimate goal is to sort those assigments so that every term on the right hand side is already known before we get to a specific assignment in the set. Ideally we would like to call assignment_set.tsort_each
to iterate over our assignments while executing them, so that the right hand side variables needed in an assigment are already calculated once we get to it.
To make this method call the library requires you to implement 2 methods.
tsort_each_node(&block)
tsort_each_child(node, &block)
We basically have to implement our nodes and relations from the graph above, these methods are enumerators and so we have to keep in mind that they are receiving a block. With these two defined, our graph is fully defined and the internals of TSort can figure out the order.
The solution
So how do we implement these? Well our nodes are just all the variables that are assigned. Our children are all the variables that a certain parent node is required for.
In this example I kept the parsing of the equation basic on purpose, but if you want to do some advanced parsing be sure to check out TreeTop
At the end of the day we’re able to do this:
Giving us the desired order.
The examples
Without knowing it you’ve probably been using this standard library for a long time. It’s in the heart of bundler, the defacto package manager for gems. It used to be in the bundler core itself but now they have migrated to the Molinillo gem which is in fact just a fancy wrapper around the standard library, as can be seen in this file.
But we don’t even have to look that far, it’s used in the startup process of Rails itself!
In fact when defining an initializer in Rails we can specify a before and an after option, ensuring our initializer executes on the right place in our code.
Let us look at a piece of the source to see how this is implemented:
All the nodes are simply all the initializers, as can be seen by the aliasing of tsort_each_node
and the children are those nodes who specified to be before our initializer and that one initializer we specified to be executed after. As a child will always be ordered first in our topological sort, we are ensured that the correct order is respected when using tsort_each
later on on the collection. Makes sense, doesn’t it?
Diving deep into the internals of Rails or other open source gems teaches us a great deal, and this was again an excellent example. Go out there and see for yourself.
Thanks for reading this through, hope you learned something!