Fun With Linear Time: My Favorite Algorithm
Andrew Healey

Andrew Healey @healeycodes

About: 📚 🌵 🐕 Love blogging, open-source, and helping out. https://healeycodes.com

Location:
United Kingdom
Joined:
Jan 29, 2019

Fun With Linear Time: My Favorite Algorithm

Publish Date: Apr 30 '19
69 15

A Linear Time Majority Vote Algorithm

I love this algorithm because it's amazing and approachable. I first saw it on a LeetCode discuss thread, and it blew everyone away. Some people were, like, irate. 169. Majority Element. I'd love to hear about your favorite algos in the comments!

This problem, common on competitive coding sites, has a solution that was discovered in 1980 but went unpublished until 1991 because of its emphasis on Fortran and mechanical verification.

Problem: Given a list of votes with a majority (n/2 + 1), declare the leader. As in, the most frequently occurring vote. It is possible to get the result in linear time O(n) AND constant space O(1).

Examples:

[0, 1, 1, 0, 1, 0, 1, 1] => 1 is the majority element

['a', 'b', 'a', 'c', 'b', 'c', 'a', 'a'] => 'a' is the majority here

A naive solution might look like this. We'll use a Dictionary to keep track of all the votes as well as storing the highest number of votes we've seen.

def majority_vote(votes):
    leader = None
    max_votes = 0
    candidates = dict()

    for i in votes:
        # if seen before
        if i in candidates:
            # count their vote
            candidates[i] += 1
            # and check if they're leading
            if candidates[i] > max_votes:
                leader = i
                max_votes = candidates[i]
        else:
            candidates[i] = 1

    return leader
Enter fullscreen mode Exit fullscreen mode

The above accomplishes a correct solution in linear time O(n) using linear space O(n). We can do better. One pass, without counting every element.

MJRTY or A Fast Majority Vote Algorithm was discovered in the Computer Science Laboratory of SRI International in 1980 by Robert S. Boyer and J Strother Moore. They were assisting a colleague who was working on fault tolerance.

In their humorous paper, they imagined a convention center filled with voters, carrying placards boasting the name of their chosen candidate. Each voter representing an index of the list.

Suppose a floor fight ensues

They opined that the voters might knock each other out simultaneously, going only for the opposing team. After the mess, the voter/s left standing would represent the majority.

Here is a bloodless way the chairman can simulate the pairing phase.

Their algorithm improves on our naive solution by removing the data structure (the Dictionary). Converted here from Fortran to Python.

def majority_vote_improved(votes):
    # after one vote, we have a leader
    leader = votes[0]
    count = 1

    for i in range(1, len(votes)):
        # the lead may grow
        if votes[i] == leader:
            count += 1
        else:
            # or shrink
            count -= 1

        # and they may be replaced
        if count == 0:
            leader = votes[i]
            count = 1

    return leader
Enter fullscreen mode Exit fullscreen mode

a graph of this algorithm

Thus, the majority is found in linear time O(n) with constant space O(1).

Check out a step-by-step walkthrough on Moore's website. More on Wikipedia too.

The entire effort of specifying MJRTY and getting the 61 verification conditions proved required about 20 man hours. [...] about 55 minutes of computer time to prove the final list of 66 theorems.

MJRTY - A Fast Majority Vote Algorithm, with R.S. Boyer. In R.S. Boyer (ed.), Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning Series, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991, pp. 105-117.


Join 150+ people signed up to my newsletter on programming and personal growth!

I tweet about tech @healeycodes.

Comments 15 total

  • Guillaume Martigny
    Guillaume MartignyApr 30, 2019

    The fact that the sequence A, A, A, B, B, C output C as the leader is a bit underwhelming.
    I find the (n/2 + 1) majority limitation a little harsh, but indeed it's an elegant solution to a very common problem.

    • Andrew Healey
      Andrew HealeyMay 1, 2019

      People will sometimes get stuck on that point and disregard the algorithm. I love that someone out there posted this as a puzzle which is solvable in linear time BUT has this perfect unique solution.

  • Xing Wang
    Xing WangMay 1, 2019

    The leader election algorithms, who knew that have computers making seemingly simple decisions in a distributed setting could be so hard and fascinating.

    en.wikipedia.org/wiki/Leader_election

    • Andrew Healey
      Andrew HealeyMay 1, 2019

      Wow! Really fascinating stuff. I’m interested in some of the code behind this.

  • Ahmed Musallam
    Ahmed MusallamMay 1, 2019

    This is awesome!!

    Could not help but rewrite it in javascript to help me understand it better. It's not the syntax, but me trying to wrap my head around it. here is what I have:

    function findLeader(votes) {
      // after one vote, we have a leader
      let leader = votes[0];
      let count = 1;
    
      for (i = 1; i < votes.length; i++) {
        let vote = votes[i]
        if (vote === leader) // the lead may grow
          count++;
        else // or shrink
          count--; 
    
        // and they may be replaced
        if (count == 0) {
          leader = vote;
          count = 1;
        }
      }
      return leader;
    }
    
    console.log(findLeader(['a', 'b', 'a', 'c', 'b', 'c', 'a', 'a'])); // "a"
    console.log(findLeader([0, 1, 1, 0, 1, 0, 1, 1])); // "1"
    
    
  • Subramanian 😎
    Subramanian 😎May 1, 2019

    Took me quite some time to comprehend. Good one though!

    But I have a couple genuine questions.

    Is this algorithm practically applicable? If we needed to implement a voting system, would we use this algorithm?

    And, will this behavior reflect in real life where we have huge number of elements?

    The fact that the sequence A, A, A, B, B, C output C as the leader is a bit underwhelming.
    I find the (n/2 + 1) majority limitation a little harsh, but indeed it's an elegant solution to a very common problem.

    • Andrew Healey
      Andrew HealeyMay 2, 2019

      It seems pretty rare that you'd have a large data set and know that there is a majority without knowing what that majority is. But if that was the case, then this would be the optimal algorithm.

      It was more important when older computers had different restrictions:

      Note that the algorithm fetches the elements of A in linear order. Thus, the algorithm can be used efficiently when the number of votes is so large that they must be read from magnetic tape.

  • Aditya
    AdityaMay 1, 2019

    Your naive solution needs a bit of updating. You never update the max_votes and as such such will only remain 0. Should be

    if candidates[i] > max_votes:
        leader = i
        max_votes = candidates[i]
    
    • Andrew Healey
      Andrew HealeyMay 2, 2019

      You’re correct! I must have typoed when I copied it over. Thanks, updated.

  • Valentin Baca
    Valentin BacaMay 1, 2019

    Your code is missing an important element:

    "Even when the input sequence has no majority, the algorithm will report one of the sequence elements as its result....This second pass is needed, as it is not possible for a sublinear-space algorithm to determine whether there exists a majority element in a single pass through the input."

    Otherwise, [A, B, C] => C

    • Andrew Healey
      Andrew HealeyMay 2, 2019

      Hi Valentin, a majority element cannot be confirmed with one-pass but if there is a majority element (as per the problem statement) it will be found 🔍

Add comment