9 min

Ruby's TSort explained

Ruby's TSort explained

I was suprised to find out about TSort, a standard Ruby module implementing Tarjan’s graph algorithm. When you come and think about it, it’s in fact no surprise that it is in the standard library. Dependency management is behind every corner in software engineering: from managing you gems, to implementing formulas or executing with priorities, … and the list goes on.

Using this beautiful abstraction, I was able to clean up a lot of code. As an added bonus there is an enormous speed advantage compared to doing it yourself in Ruby, since it’s part of the standard library implemented in C. The aim of this post is to demonstrate how it works, where it is used, and how you can take advantage of it in your own projects. We’re not going into the details of the algorithm itself, as the wikipedia article is a quite good resource for it.

The problem

Suppose we have a simple set of assignments:

A = B + C
B = C + 2
C = 2
D = B + A + 1

Simple enough, you easily figured out that we have A = 6, B = 4, C = 2, D = 11.

But consider this second set,

E = F + G
F = E - 1
G = F + 3

here we stumble upon a cyclic dependency, to calculate F you have to know E, but to calculate E you have to know G, but to calculate G you need F! This is just a linear system, so the solution E = -1, F = -2, G = 1 checks out, but in order to find this we had to manipulate the equations instead of changing the order of assignments.

Now imagine you have a very big list of equations, and you want to know if there are any cyclic dependencies present. You need some way to order these assignments so we are sure the needed variables are already calculated when you reach a certain equation in the list. In other words, it should warn you exactly for the situation described above.

The picture

Everything becomes clearer if you make a picture of it, in our case we’ll use a graph. To make a graph we need two main ingredients, nodes and relationships between them.

A B C D E F G

Upon drawing this picture we see the main difference between the two situations, we have the cycle F,G,E in the second graph. This is a so called strongly connected component, a part of the graph where each node is reachable from the others in the collection. As you’ll notice in the first graph, each strongly connected component is in fact size 1! This is in fact a sufficient condition guaranteeing we have no cycles. Tarjan’s algorithm figures out these strongly connected components, the order in which one has to calculate the equations is a nice side effect.

The library

This is where the our library comes into play. (If you haven’t done it yet, please read the excellent documentation for the TSort standard library). How does it work? We’ll make two classes Assignment and AssignmentSet, the AssignmentSet is the one doing the sorting, so it’ll include our module TSort.

require 'tsort'

class Assignment
  attr_reader :variable, :equation

  def initialize(variable, equation)
    @variable = variable
    @equation = equation
  end
end

class AssignmentSet
  include TSort

  def initialize(assignments)
    @assignments = assignments
  end

  #...
end

The AssignmentSet is simply a collection of Assignment objects, an Assigment object contains 2 strings, representing the right hand side in @variable and the left hand side in the @equation. The ultimate goal is to sort those assigments so that every term on the right hand side is already known before we get to a specific assignment in the set. Ideally we would like to call assignment_set.tsort_each to iterate over our assignments while executing them, so that the right hand side variables needed in an assigment are already calculated once we get to it.

To make this method call the library requires you to implement 2 methods.

We basically have to implement our nodes and relations from the graph above, these methods are enumerators and so we have to keep in mind that they are receiving a block. With these two defined, our graph is fully defined and the internals of TSort can figure out the order.

The solution

So how do we implement these? Well our nodes are just all the variables that are assigned. Our children are all the variables that a certain parent node is required for.

class AssignmentSet
  include TSort

  def initialize(assignments)
    @assignments = assignments
  end

  def tsort_each_node(&block)
    @assignments.map(&:variable).each(&block)
  end

  def tsort_each_child(node, &block)
    @assignments.select do |assignment|
      assignment.equation.include? node
    end.map(&:variable).each(&block)
  end
end

In this example I kept the parsing of the equation basic on purpose, but if you want to do some advanced parsing be sure to check out TreeTop

At the end of the day we’re able to do this:

assignments = []
assignments.push Assignment.new('A','B + C')
assignments.push Assignment.new('B','C + 2')
assignments.push Assignment.new('C','2')
assignments.push Assignment.new('D','B + A + 1')

assignment_set = AssignmentSet.new(assignments)

assignment_set.tsort_each do |node|
  p node
end

Giving us the desired order.

The examples

Without knowing it you’ve probably been using this standard library for a long time. It’s in the heart of bundler, the defacto package manager for gems. It used to be in the bundler core itself but now they have migrated to the Molinillo gem which is in fact just a fancy wrapper around the standard library, as can be seen in this file.

But we don’t even have to look that far, it’s used in the startup process of Rails itself!

In fact when defining an initializer in Rails we can specify a before and an after option, ensuring our initializer executes on the right place in our code.

initializer 'ours', after: 'runs_after_this', before: 'runs_before_this' do |app|
  #...
end

Let us look at a piece of the source to see how this is implemented:

class Collection < Array
  include TSort

  alias :tsort_each_node :each
  def tsort_each_child(initializer, &block)
    select { |i| i.before == initializer.name || i.name == initializer.after }.each(&block)
  end

  def +(other)
    Collection.new(to_a + other.to_a)
  end
end

All the nodes are simply all the initializers, as can be seen by the aliasing of tsort_each_node and the children are those nodes who specified to be before our initializer and that one initializer we specified to be executed after. As a child will always be ordered first in our topological sort, we are ensured that the correct order is respected when using tsort_each later on on the collection. Makes sense, doesn’t it?

Diving deep into the internals of Rails or other open source gems teaches us a great deal, and this was again an excellent example. Go out there and see for yourself.

Thanks for reading this through, hope you learned something!

Latest of my written stuff

Ruby's TSort explained

I was suprised to find out about TSort, a standard Ruby module implementing Tarjan’s graph algorithm. When you come and think about it, it’s in fact no surprise that it is in the standard library. Dependency management is behind every corner in software engineering: from managing you gems, to implementing...