PHP Damarau levenstein on Longest Match SubString
Teddy Zugana

Teddy Zugana @kevinmel2000

About: vi veri veniversum vivus vici Noob Teams

Location:
Indonesia Jakarta
Joined:
Oct 8, 2019

PHP Damarau levenstein on Longest Match SubString

Publish Date: Jul 29 '21
6 2
function getLongestMatchingSubstring($str1, $str2)
{
    $len_1 = strlen($str1);
    $longest = '';
    for($i = 0; $i < $len_1; $i++){
        for($j = $len_1 - $i; $j > 0; $j--){
            $sub = substr($str1, $i, $j);
            if (strpos($str2, $sub) !== false && strlen($sub) > strlen($longest)){
                $longest = $sub;
                break;
            }
        }
    }
    return $longest;
}
Enter fullscreen mode Exit fullscreen mode

Comments 2 total

  • moay
    moayJul 29, 2021

    Great solution to the problem. I think there might be several improvements though.

    • First find which string is longer and start with the shorter strings length.
    • Dependending on the string length, this solution could take very very long. On the plus side you can stop on the first match, but especially when assuming that the longest common substring will be a combination of several words or a short sentence, you'll be comparing stuff for a long time on longer texts. This could be massively sped up by choosing a different approach on this. For example
      • Start with half the length of the shorter string. If there is a match raise the length by 50%. If there is not, divide the length by 2. This way you'll approach the string length way quicker, especially on long strings.
      • Start at 1. Search for the first letter appearing in both. When found, raise length. Search for first pair of letters appearing in both. When found one, raise length. And so on. This will also be much quicker on long strings.

    Love the challenge, I'll create an alternative and benchmark this later...

    • moay
      moayAug 1, 2021

      So, I played around a bit, sadly didn't have time for deeper look and a lot of tries. Here is my quick solution to this:

      public function getLongestMatchingSubstring(string $string1, string $string2): ?string
      {
          $shorterString = strlen($string1) > strlen($string2) ? $string1 : $string2;
          $longerString = strlen($string1) > strlen($string2) ? $string2 : $string1;
      
          $shorterStringLength = strlen($shorterString);
          $longestMatch = null;
          $matchingLength = 1;
          $index = 0;
      
          do {
              $possibleMatch = substr($shorterString, $index, $matchingLength);
              if (strpos($longerString, $possibleMatch) !== false) {
                  $longestMatch = $possibleMatch;
                  $matchingLength++;
              } else {
                  $index++;
              }
          } while ($shorterStringLength - $index - $matchingLength >= 0);
      
          return $longestMatch;
      }
      
      Enter fullscreen mode Exit fullscreen mode

      This seems to outperform the posted solution in most cases (by a huge factor up to 500 or more). For the extreme examples of having one very long string and one very short, both solutions seem to be pretty close, depending on the string order. While the order can easily be fixed, this still is way quicker in most "normal" circumstances.

      Let me know if you find even better solutions. :-)

Add comment