Human explanation and step-by-step visualisation of the Ford-Johnson algorithm

Ford-Johnson sorting algorithm, or merge-insertion sort, is not a very well known or popular algorithm, which is optimised for making the least amount of comparisons between elements. The reason on why this algorithm is not well known is that it's neither fast, nor the best minimum sort algorithm, which is why there are almost no good resources on this algorithm, between scientific papers, hard to read computer science books, or obscure threads on Reddit or Stack Overflow.

This algorithm has little to do with merge sort but is often confused with it. In the 42 school for cpp09 it's a common mistake, or sometimes a way to cheat, to implement much simpler hybrid sort between merge sort and binary insertion sort. This article aims to give an understanding of the algorithm using simple explanations and step-by-step execution of the algorithm on a set of 22 numbers.

My name is Eldar, and I'm a student at 42 Lyon. Hence, the article will also have the context of one of the school assignments. I decided to write an article on this algorithm because almost no one does this algorithm correctly and understandably since it is hard, and easily one of the more complex sorting algorithms out there.

Steps of the algorithm

I recommend to read first Wikipedia's description of the algorithm, as well as Knuth's. Both of those sources describe the steps of the algorithm a little bit differently. I intend to give my own explanation as well, as I believe that rewording helps to develop understanding; I will also later explain each step individually in depth. Hence, my explanation of the steps for sorting numbers using this algorithm in ascending order:

Recursively divide into pairs of numbers, pairs of pairs of numbers, pairs of pairs of pairs of numbers... etc, and sort them by the biggest number, until we can't form any pair anymore. If there is an unpaired odd element, leave it be.
- I will also refer to the units on which we operate on as elements. elements can be numbers, pairs of numbers, pairs of pairs of numbers etc...
- We will refer to the smallest element in each pair as b, and to the biggest as a. Depending on the index of the pair, we will call them b1, a1, b2, a2... bx, ax.
Create a sequence (also referred to as S or the main chain in Wikipedia and Knuth's book respectively) out of the smallest element of the smallest pair (b1) and the rest of as. If the first step was done correctly, this sequence will be sorted. Create the sequence that consists of the rest of bs (also referred to as the pend), and the odd element if there is any.
- I will refer to those sequences as the main and the pend from now on.
Binary insert the elements from the pend into the main, in the order based on Jacobsthal numbers. I will explain later why and how it works. If we can't insert elements into the main using Jacobsthal numbers anymore, we insert them in reverse order, using similar approach to the plain good ol' binary insertion.

Due to the fact that step 1 is recursive, it will create a bunch of recursion levels, to each of which steps 2 and 3 will be applied. It will look as something like this:

If everything is done correctly, the sequence will be sorted. Of course, I omitted some of the details, for the sake of giving the general description on what we will be doing. The algorithm isn't easy to understand, and you aren't expected to understand it from the first read. After you'll see the visualisation of the algorithm using the established nomenclature, you will have a much better idea.

Step 1: the division into the pairs & sorting

This step is pretty confusing to understand, and also tricky to implement in code, but good visualisation helps. The difficulty is that with each recursion call we operate on bigger and bigger groups of elements: pairs of numbers on the first call, pairs of pairs of numbers on the second call, pairs of pairs of pairs of numbers on the third etc, until we can no longer divide our original sequence into a single pair of elements.

On top of this, we also need to preserve the original pairing: we can't break pairs formed on the first recursion call on the fifth one. This is important for this algorithm, as it relies on certain guarantees about the elements: that the biggest element of b2 is smaller than the biggest element of a2, b3 is smaller than a3, b4 is smaller than a4, bx is smaller than ax. This is important for the algorithm's nature of a minimum comparison sort. The easiest way to achieve this is to simply swap entire groupes of numbers with each other when we sort them. It'll be much clearer in a minute.

For the visualisation I'll denote at the top, the recursion depth we are currently it. Because the algorithm requires recursive sorting of pairs before the insertion part, I can demonstrate this process in this section.

This level of recursion depth is easy to wrap head around: here we operate just on numbers. If the number in the pair is bigger than the other one, we simply swap them in place. After sorting the pairs, I put the b and a labels that I described earlier underneath each element in the new sequence: now it should be clear how the labels work.

This is where it starts to get funky: now our element is a pair of numbers instead of just a number. Also, we have an odd element that is left without its pair. We name it b6.
On top of the white borders denoting pairs, here there are red borders for the pairs of pairs. Notice also that I placed the labels against the last number of each element: this is no coincedence, the labels reside under the number that we actually use for comparison/pair sorting: 11 against 17, 16 against 15 etc, but the swapping happens on an entire subsequence of numbers. Green dotted border denotes an odd pair. Here, only one swap occures: of 8 16 and of 6 15.

Now we have a blue border to denote pairs of pairs of pairs... It's already mouthful to say. But we must go deeper.
Notice that there are numbers without any borders: I chose this way to separate numbers who can't even form an element. At this level, the smallest unit on which we operate consists out of 4 numbers, and the pairs are formed out of 8 numbers. What we are gonna do with those numbers? Simple: we ignore them.
At this level, two swaps occured: 2 11 0 17 and 6 15 8 16, 3 10 1 21 and 9 18 14 19.

Now we are at the level of pairs of pairs of pairs of pairs. We have 22 elements, but the element here is formed out of 8 numbers: hence only one pair of two elements.
17 is lesser than 21. Lucky us! No swaps happened at this level.

There is nothing to be done at the level 5 of recursion: because the amount of numbers in the element doubles with each recursion call, at one point we won't be able to form a pair of elements. This is where we break from this level. Because at level 5 one element consists of 16 numbers, and we have only 22 numbers overall, pair of 2 elements is impossible to form. From now on, we proceed with the steps starting with the level 4, then we go to level 3, 2, 1... until we got our sorted number sequence.

To really understand what's going on at the sorting of pairs level, I invite you to come up with a random sequence of numbers and try to sort them on paper. After doing it a couple of times, you will probably have some ideas on how to implement this step in code.

Steps 2 and 3: the initialization and insertion

We will talk about those two steps together, as due to the recursive nature of the algorithm, they will alternate between each other. Step 1 operated on its own and created a 4 recursion levels, which are for steps 2 and 3 to resolve. Those steps are applied to each of the levels.

Here is where the labels we defined earlier will come in handy.
This step is pretty easy:
The main is initialised with the elements {b1, a1} and then with the rest of as.
The pend is initialised with the rest of bs starting from b2.
The main at this point should be sorted, but not entirely. Like with comparing and swapping at the step 1, only the biggest number in each element counts for "being sorted". Let's demonstrate this with the example we already have. This is where we're at after continuing on the level 4 of recursion:

Not a lot of things to talk about at this recursion level: the pend is empty, as there are only two elements which go directly to the main, since b1 and a1 both start as part of the main. The reason for this is that we already know that b1 is smaller than a1, and because of the logic of the algorithm, and what you will see from further visualisations, the main is sorted, so a1 is smaller than any of the other as, hence no additional comparisons are needed. What counts as "sorted" are the biggest numbers in each element: in this case, in b1 and a1 those are 17 and 21 respectively. That is one of the optimisations of this algorithm to minimize comparisons.

The numbers in the dotted red border 5 12 4 20 7 13 are numbers which can't form even a single element. Just like in the step 1, they are still part of the sequence, they just don't participate in the insertion.

One important thing to understand is that on that step, when we initialise the pend and the main the labels change. Elements can and will change their positions compared to their step 1 position because of all the swapping and insertion. On this step, you can think about it as if the labels refer the position of the elements: for example, on this level element consists of 8 numbers, so b1 will always consist of numbers at positions of 0-7, while a1 of numbers at positions of 8-15, b2, and so on.

Nothing to say about step 3 at this recursion level, as there are no elements in the pend at all. No pend - no insertions. Move on. I will discuss insertion logic when we'll have something to insert.

We are done with level 4: nothing else to do here.

At this level our element consists of 4 numbers. From the start, we divide elements into bs and as. We initialize the pend and the main as I said before: the main is b1 and the rest of as, the pend is the rest of bs. Notice that bs and as are changed. Remember what I told you about relabeling? We change the labels at the step 2 of each recursion level. Notice, however, that the actual number pairings remained unbroken.

This time we have two elements in the pend: b2 + the odd element that didn't participate in step 1, but is a full participant here. There is something to insert, so we can discuss how step 3 works.

Step 3 can be pretty confusing. This is where binary insertion of our merge-insertion algorithm comes in. What do I mean by binary insertion? It means that we use binary search algorithm on our main, which is a sorted sequence, to find a place for insertion for our pend elements. Insertion itself works very similarly to how it's done in the binary insertion sort, but with a twist: the insertion order from the pend is dictated by mathematical sequence called Jacobsthal numbers.

Before we proceed further, let's spend a little bit of time to understand what's going on here. Why Jacobsthal numbers? Why bother with them, what do they do with sorting?

Why Jacobsthal numbers?

Ford-Johnson algorithm uses an important binary search algorithm optimisation that further cuts the number of times we compare our elements to each other. The maximal number of comparisons for the binary search is the same for 2^k as for 2^(k+1)-1. That means that the maximal number of comparisons is the same for 4(2^2) and 7(2^3-1). Same thing for 8 and 15, for 16 and 31... etc. This algorithm uses this property of the binary search during the insertion phase, if it can. What do I mean by "if it can"? Well, as you remember, we didn't used any Jacobsthal numbers during unrolling of our level 4 and 3 recursions. Wait, but how are those damn numbers related to this optimization? Well, turns out, that if you insert the pend elements into the main using a specific order dictated by the Jacobsthal numbers and we respect the bound elements, the surface area for insertions is 2^(k+1)-1 for (nearly) all elements, as opposed to 2^k for the first element and God knows what for next elements, since we would naively insert numbers as in regular binary insertion (like we did during the insertion at the recursion level 3).

The way Jacobsthal numbers dictate the order of insertion is like this: we start from the Jacobsthal number of 3. We start insertion from element b3. We insert elements in the reverse order starting from this element, until we hit b element of number of previous Jacobsthal number. In other words, the amount of inserted elements is current_jacobsthal - previous_jacobsthal.
For the Jacobsthal number of 3, we insert 2 elements (3 - 1), b3, b2.
For the Jacobsthal number of 5, we insert 2 elements (5 - 3), b5, b4.
For the Jacobsthal number of 11, we insert 6 elements (11 - 5), b11, b10, b9, b8, b7, b6.
I hope that you got the idea. And that you understand now that we can't always insert numbers this way. If there's not enough elemets to insert, for example, the Jacobsthal number is 11 (we should insert 6 elements), but we have only 3 elements in the pend, then we need to handle it somehow, and the simplest way to do this is to insert the pend elements in order, as we did in recursion level 3.

Ok, but now again, how exactly this order optimises the amount of comparisons for our binary insertion? Let's look at what happens if we insert pend elements in this specific order! Time for another visual example! Yay!

The elements are ordered by labels in these exapmples, but the actual order with actual numbers might be more mixed, so it's like this in the example for clarity.
As you can see, the order dictated by the Jacobsthal numbers is what allows us to use this optimisation in the algorithm: we can insert large amount pend elements into the main with binary search only on 2^(k+1)-1 elements. This works because due to how the algorithm works, we know that the bound for bx is ax. So when we insert, for example b5 instead of searching in {b1, a1, b2, a2, b3, a3, a4, a5, a6, a7, ...} we can search in {b1, a1, b2, a2, b3, a3, a4}, because we know that there is no point to include into the search range a5 and anything after that since they are guarenteed to be bigger. After we inserted b5 and it is time to insert b4, the bound is decreased by one (from a5 to a4), but the amount of elements is increased by one (we inserted b5), so the amount of elements in this range stayed the same.
Note: however, sometimes, as we will see soon, the Jacobsthal number insertions guarantee 2^(k+1)-1 in the worst case. In the best case the range of search may be even smaller, if inserted element is going to be bigger than any other number in the search range. Then, for the next number the search range will actually be lower. You will see this in the continuation of the execution.

Now that we understood what purpose the Jacobsthal numbers have here, I want to bring up that there is also an optimal way to calculate the n'th Jacobsthal number. Rounded result of (2^(n+1) + (-1)^n) / 3 will produce an offsetted for our use case (it normally starts from 0) Jacobsthal number.

Back to where we left

I will mark the current element for insertion from the pend with the blue border. The range of elements that we will try to insert this element into with the green border, and the "bound" element, ie corresponding a element, with the red border. "Bound" marks the end of the area of search.

The current Jacobsthal number is 3, which means that we start with b3 from the pend. The bound element for b3 is a3, and we should try to insert it into {b1, a1, a2}. Since there is no a3 element, because b3 is an odd numer, we compare it with the entire main.

In other words, we should compare 20, the biggest number in the element, to {16, 17, 21}, the biggest numbers in the elements of our search area, and then insert the whole element after that in it's corresponding place.

Since 20 is bigger than 16 or 17, we insert the entire element b2 (5 12 4 20) between a1 (2 11 0 17) and a2 (3 10 1 21).

The Jacobsthal number is still 3, and the next element for insertion is b2. Its bound element is a2: we already know that it's smaller that this element, so we try to insert it into {b1, a1, b3}.

So, at the moment, our sequence is this: 6 15 8 16 2 11 0 17 9 18 14 19 5 12 4 20 3 10 1 21 7 13. When we go down a recursion level, the main and the pend initialisation, as well as insertion, will go from this sequence.

We divide the resulting sequence into the elements again, and initialize our main and pend. The sequence has changed since the recursion level of 3, and the labels too: but the original pairs (like ((2 11) (0 17))) remained unbroken. So the guarantee that the (2, 11) (now b2) element is smaller than (0, 17) (now a2) is still valid.

So, we inserted our b3 into the main, but as you can see, it shrinks the search area for the b2, as this element turned out bigger than a2, so the area of search shrank. Remember the note from the "Why Jacobsthal numbers?" part? This is the example. It may happen sometimes, and it's another tricky thing of this algorithm that needs to be accounted for.

Oops! We run out of the Jacobsthal numbers. There is only one element left in the pend! What to do? We do almost the same thing as we do on the binary search: we insert the pend elements, but in the reverse order, meaning that we start from the end. We do it like that in order to keep the search area the same. Of course, you still have to respect the bound element: for example, if we insert b3 in this way, the area of search still doesn't include a3 and further. In this case, it's an odd element, so there is no corresponding bound element.

Ok, so now, I think, we are fully equipped to comprehend the rest of the algorithm, so let's proceed with its steps without any more yapping!

With this, the second level of the recursion was resolved. The resulting sequence after this level looks like this:

And with this, our sequence became sorted. This algorithm is not an easy one to understand and also to implement, but both swapping and insertion follow specific patterns, peculiarities of which you can calculate, if you do this algorithm on paper/in your favorite text editor again and again, which I encourage you to do, as doing is the only way to really cement theory.

Testing the algorithm implementation

This is a minimal comparisons sort, so your algorithm should perform no more than k comparisons for the input of n numbers. For example, for 21 numbers the maximal number of comparisons is 66, so if you run this algorithm 1000 times on 21 random numbers, it will never exceed 66 comparisons.

There is a mathematical way to calculate k given the n. The formula was described by Knuth in his book, which in turn was discovered by A. Hadian in 1969.

For people who aren't on friendly terms with math: this formula is actually very simple to implement in code. Math people just like terse scary symbols.
Using C++, F(n) can be written as:

#include <cmath>

int F(int n)
{
    int sum = 0;
    for (int k = 1; k <= n; ++k) {
        double value = (3.0 / 4.0) * k;
        sum += static_cast<int>(ceil(log2(value)));
    }
    return sum;
}

The next step after that is to implement in your code a comparison function, which goal is simply to increment some global/static variable each time its called. You use it every time you need to perform a comparison between numbers, including passing it to std::upper_bound or a similar function, if you are going to use that.

Run functions a lot of times for a given n, using different n's, and compare the biggest their worst in term of number of comparisons result to the expected maximal number of comparisons. If it exceeds this number, your implementation is not correct yet.

I also wrote a single-file testing script for benchmarking the algorithm. It requires your executable to print a sorted sequence and number of comparisons at the end of program.

This is an example of how I count the number of comparisons in my code:

//////////// ADD COMPARISON COUNTER ////////////
// PmergeMe.hpp
class PmergeMe
{
  public:
// ...
    static int nbr_of_comps;
// ...

template <typename T> bool _comp(T lv, T rv) {
    PmergeMe::nbr_of_comps++;
    return *lv < *rv;
}

// PmergeMe.cpp don't forget to initialise it!
int PmergeMe::nbr_of_comps = 0;

//////////// USE _comp FOR COMPARISONS ////////////
// PmergeMe.hpp
// ... at the step 1
        if (_comp(next_pair, this_pair))
        {
            _swap_pair(this_pair, pair_level);
        }
// ...

// ... at the step 3 whenever I search a place for a number to insert into
            typename std::vector<Iterator>::iterator idx =
                std::upper_bound(main.begin(), bound_it, *pend_it, _comp<Iterator>);
// ...

The code

I have implemented this algorithm in the context of doing the last C++ exercise of module 09, in the 42 school; and you may find my take on this algorithm here, in cpp09/ex02. There are many ways to do this algorithm, my way is optimised for vectors and benefits a lot from random access.

Last note

Thank you epolitze for proofreading the first version of the article, sponthus for proofreading the second edition, and eandre for helping me to update the diagrams.
This is a second version of an article, where I simplified some explanations (hopefully made them more clear) and fixed a mistake which was pointed out by bwerner.

Original description of the algorithm, when runs out of the Jacobsthal numbers, inserts elements from the pend in the reverse order, and in the previous version of the article I said that this doesn't make a difference according to my benchmarks; I was wrong! After writing the tester that rigorously tests ranges of numbers many-many times I found out that actually it matters whether it's in order or in reverse order. With this algorithm I had to run into unexplored waters, as there were no accessible articles or videos at the time of me doing the algorithm.

emuminov @emuminov