C# Search by multiple strings
Karen Payne


Publish Date: Nov 17 '24

Introduction

Usually when there is a need to determine if a string has multiple tokens/words a developer uses code like the following.

public static class Extensions
{
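    // IndexOf returns -1 when the token is absent and 0 when it starts the line,
    // so the "found" test is >= 0.
    public static bool Search(this string line) =>
        line.IndexOf("hello", StringComparison.OrdinalIgnoreCase) >= 0 && 
        line.IndexOf("world", StringComparison.OrdinalIgnoreCase) >= 0;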
}

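Starting with .NET 8, Microsoft provides the System.Buffers.SearchValues<T> class (learn about SearchValues in the Microsoft documentation). The SearchValues<string> overloads used in this article require .NET 9 or later.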

Which to use IndexOf or SearchValues?

SearchValues is a powerful type that improves the efficiency of search operations. By providing a dedicated, optimized method for lookups, it helps you write cleaner and more performant code, especially in scenarios where checking for multiple values is frequent.

SearchValues is not a replacement for IndexOf or IndexOfAny. SearchValues pays off over larger strings, which means that for smaller strings a developer can still use IndexOf && IndexOf and so on.
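
As a rough illustration of the two options, the sketch below checks the same line with chained IndexOf calls and with a SearchValues<string> instance that is created once and reused. The class and member names are hypothetical, and the string overload of SearchValues.Create assumes .NET 9 or later. Because ContainsAny answers "is any token present", the IndexOf version here uses || rather than && so that both methods answer the same question.

```csharp
using System;
using System.Buffers;

string line = "Hello there, welcome to the World of C#";

// Both approaches report whether the line contains at least one token.
Console.WriteLine(TokenSearch.ContainsAnyWithIndexOf(line));      // True
Console.WriteLine(TokenSearch.ContainsAnyWithSearchValues(line)); // True

// Hypothetical helper class, not part of the article's source code.
public static class TokenSearch
{
    // Created once and reused for every call.
    private static readonly SearchValues<string> Tokens =
        SearchValues.Create(["hello", "world"], StringComparison.OrdinalIgnoreCase);

    // Conventional approach: one IndexOf call per token.
    public static bool ContainsAnyWithIndexOf(string line) =>
        line.IndexOf("hello", StringComparison.OrdinalIgnoreCase) >= 0 ||
        line.IndexOf("world", StringComparison.OrdinalIgnoreCase) >= 0;

    // SearchValues approach: a single pass that looks for all tokens.
    public static bool ContainsAnyWithSearchValues(string line) =>
        line.AsSpan().ContainsAny(Tokens);
}
```

With only two tokens and a short line, either version is reasonable; the SearchValues version mainly avoids adding another IndexOf call for every new token.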

Examples for SearchValues

Does text contain spam?

The text can come from any source; in this case, to keep things simple, it is a text file.

TextBanned.txt

Hello Karen, I am writing to inform you that your account is now active.
This is not a spam message. Please click the link below

In the project, a JSON file holds the watched tokens/words.

bannedwords.json

[
  {
    "Id": "1",
    "Name": "spam"
  },
  {
    "Id": "2",
    "Name": "advertisement"
  },
  {
    "Id": "3",
    "Name": "clickbait"
  }
]

The following model is used to deserialize the file above.

public class BannedWord
{
    public string Id { get; set; }
    public string Name { get; set; }
}

Next, create a language extension method for SearchValues.

public static class GenericExtensions
{
    /// <summary>
    /// Determines whether the specified text contains any of the banned words.
    /// </summary>
    /// <param name="text">The text to be checked for banned words.</param>
    /// <param name="bannedWords">An array of banned words to search for within the text.</param>
    /// <returns>
    /// <c>true</c> if the text contains any of the banned words; otherwise, <c>false</c>.
    /// </returns>
    public static bool HasBannedWords(this string text, params string[] bannedWords) => 
        text.AsSpan().ContainsAny(SearchValues.Create(bannedWords, StringComparison.OrdinalIgnoreCase));
}

Note
The above extension method is case-insensitive. To also support case-sensitive searches, either add logic and a bool parameter that determines whether the search is case-insensitive, or create an overloaded method for matching case.
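
A minimal sketch of the bool-parameter option might look like the following. The class name and the ignoreCase parameter are hypothetical and not taken from the article's source code. As far as I know, SearchValues.Create for strings accepts only StringComparison.Ordinal and StringComparison.OrdinalIgnoreCase, so the bool maps onto those two values.

```csharp
using System;
using System.Buffers;

string text = "This is not a SPAM message.";

Console.WriteLine(text.HasBannedWords(true, "spam"));  // True: case is ignored
Console.WriteLine(text.HasBannedWords(false, "spam")); // False: "SPAM" differs from "spam"

// Hypothetical variant of the article's extension method.
public static class BannedWordExtensions
{
    public static bool HasBannedWords(this string text, bool ignoreCase, params string[] bannedWords)
    {
        // Map the flag onto the two comparison types used for string searches here.
        StringComparison comparison = ignoreCase
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;

        return text.AsSpan().ContainsAny(SearchValues.Create(bannedWords, comparison));
    }
}
```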

The following code first reads the words/tokens to search for by deserializing bannedwords.json, then reads TextBanned.txt, which is the file to scan for spam.

Note that the foreach statement uses Enumerable.Index, available in .NET 9, which allows deconstruction into the current index (zero-based) and the item, where the item is a line from the variable sentences.

Debug.WriteLine is used below because the source code was written in a Windows Forms project, where Console.WriteLine output is not visible by default.

[Image: code to search for spam using SearchValues]
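
Since the original listing is only available as an image, here is a rough sketch of what such code could look like. It is not the author's exact code: the SpamScanner class and the FindFlagged method are hypothetical, inline strings stand in for bannedwords.json and TextBanned.txt so the example is self-contained, and Console.WriteLine replaces Debug.WriteLine on the assumption of a console project. It requires .NET 9 for Enumerable.Index and SearchValues<string>.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Inline stand-ins for bannedwords.json and TextBanned.txt
// (assumption: the original code reads both from disk).
string json = """
    [
      { "Id": "1", "Name": "spam" },
      { "Id": "2", "Name": "advertisement" },
      { "Id": "3", "Name": "clickbait" }
    ]
    """;

string[] sentences =
[
    "Hello Karen, I am writing to inform you that your account is now active.",
    "This is not a spam message. Please click the link below"
];

// Only the second sentence (zero-based index 1) contains a banned word.
foreach (var (index, sentence) in SpamScanner.FindFlagged(json, sentences))
{
    Console.WriteLine($"{index}: {sentence}");
}

public class BannedWord
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
}

// Hypothetical helper, not part of the article's source code.
public static class SpamScanner
{
    public static List<(int Index, string Item)> FindFlagged(string json, IEnumerable<string> sentences)
    {
        // Deserialize the watched words and keep only their names.
        string[] words = JsonSerializer.Deserialize<BannedWord[]>(json)!
            .Select(word => word.Name)
            .ToArray();

        // Build the SearchValues instance once, then reuse it for every line.
        SearchValues<string> banned = SearchValues.Create(words, StringComparison.OrdinalIgnoreCase);

        var flagged = new List<(int Index, string Item)>();
        foreach (var (index, sentence) in sentences.Index())
        {
            if (sentence.AsSpan().ContainsAny(banned))
            {
                flagged.Add((index, sentence));
            }
        }

        return flagged;
    }
}
```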

Find errors/warnings in the Visual Studio log file

When Visual Studio encounters errors they can be written to a log file by starting Visual Studio with the following command.

devenv.exe /log

Opening ActivityLog.xml by clicking on the file works, but the file usually has thousands of lines, so finding errors/warnings can be tedious.

[Image: a small view of ActivityLog.xml]

Of the following extension methods, the first and second use SearchValues. For the following code sample, the second is used, as we are only interested in errors and warnings. The first extension method would be used for general-purpose searches. The last extension method is the conventional approach, which is less flexible.

public static class Extensions
{
    /// <summary>
    /// Searches the specified string for any of the provided tokens case-insensitive.
    /// </summary>
    /// <param name="sender">The string to search within.</param>
    /// <param name="tokens">An array of tokens to search for within the string.</param>
    /// <returns>
    /// <c>true</c> if any of the tokens are found within the string; otherwise, <c>false</c>.
    /// </returns>
    public static bool Search(this string sender, string[] tokens) 
        => sender.AsSpan().ContainsAny(
            SearchValues.Create(tokens, 
                StringComparison.OrdinalIgnoreCase));

    /// <summary>
    /// Determines whether the specified line contains a warning or error.
    /// </summary>
    /// <param name="line">The line of text to be checked for warnings or errors.</param>
    /// <returns>
    /// <c>true</c> if the line contains a warning or error; otherwise, <c>false</c>.
    /// </returns>
    public static bool LineHasWarningOrError(this string line)
    {
        ReadOnlySpan<string> tokens = ["<type>Error</type>", "<type>Warning</type>"];
        return line.AsSpan().ContainsAny(SearchValues.Create(tokens, StringComparison.OrdinalIgnoreCase));
    }

    /// <summary>
    /// Determines whether the specified line contains a warning or error using conventional string comparison.
    /// </summary>
    /// <param name="line">The line of text to be checked for warnings or errors.</param>
    /// <returns>
    /// <c>true</c> if the line contains a warning or error; otherwise, <c>false</c>.
    /// </returns>
    // A line qualifies when either token is present; IndexOf returns -1 when a token is absent.
    public static bool LineHasWarningOrErrorConventional(this string line) =>
        line.IndexOf("<type>Error</type>", StringComparison.OrdinalIgnoreCase) >= 0 || 
        line.IndexOf("<type>Warning</type>", StringComparison.OrdinalIgnoreCase) >= 0;
}

Executing code (full source is provided).

  • First, determine if the activity file exists; if so, read it.
  • Display the path and file name along with the line count.
  • Iterate over each line, searching for errors and warnings.

[Image: full executing code]
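
The steps above could be sketched roughly as follows. This is an illustration rather than the author's code: the ActivityLogSearch class is hypothetical, and the log path is taken from the first command-line argument (falling back to a file named ActivityLog.xml in the current folder) because the real location depends on the Visual Studio installation. The SearchValues instance is cached in a static field, as suggested in the comments on this article.

```csharp
using System;
using System.Buffers;
using System.IO;

// Assumption: the path to the activity log is supplied by the caller.
string path = args.Length > 0 ? args[0] : "ActivityLog.xml";

// Step 1: determine if the activity file exists; if so, read it.
if (!File.Exists(path))
{
    Console.WriteLine($"Activity log not found: {path}");
    return;
}

string[] lines = File.ReadAllLines(path);

// Step 2: display the path and file name along with the line count.
Console.WriteLine($"{Path.GetFullPath(path)} ({lines.Length} lines)");

// Step 3: iterate over each line, searching for errors and warnings.
for (int index = 0; index < lines.Length; index++)
{
    if (ActivityLogSearch.LineHasWarningOrError(lines[index]))
    {
        Console.WriteLine($"{index + 1}: {lines[index].Trim()}");
    }
}

// Hypothetical helper, not part of the article's source code.
public static class ActivityLogSearch
{
    // Created once and reused for every line.
    private static readonly SearchValues<string> WarningOrError = SearchValues.Create(
        ["<type>Error</type>", "<type>Warning</type>"],
        StringComparison.OrdinalIgnoreCase);

    public static bool LineHasWarningOrError(string line) =>
        line.AsSpan().ContainsAny(WarningOrError);
}
```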

Extra

Finding the activity log is not easy, and there may be more than one. To assist with finding the right activity log, the provided source code has a class dedicated to working with the activity file. It provides the path to the activity file, which can be helpful for developers who want to examine older activity files.

Source code

The two entries below point to two different GitHub repositories. For the spam source code, check out the new .NET 9 features.

  • Spam source code
  • Activity log source code

Summary

SearchValues provides a new way to search for words/tokens in a string that performs better than IndexOf for larger strings and is more flexible than IndexOf.

Comments 8 total

  • Peter Truchly, Nov 18, 2024

    How does it compare to System.Text.RegularExpressions, especially Compiled regular expression? (Performance scaling with 10, 100, ... search values and input text size.)

    • jshergal, Nov 19, 2024

      While I can't speak for the compiled regex, if you look at the code produced by using the regex source generator, especially in the case of word matches (i.e. spam|advertisement|clickbait), it will typically use SearchValues in the generated code.

      Based on that, I would expect the performance to be similar and to scale in a similar fashion.

      • Peter Truchly, Nov 19, 2024

        Tried this out of curiosity (.net 9) with:

        [GeneratedRegex(@" ...")]
        private static partial Regex CompiledRegex();
        //by doing 1M times
        for (int i = 0; i < repetitions; i++) { testRegex.IsMatch(testText); }
        

        and compared to SearchValues it is usually way slower. Worst case for regex is when the input does not contain any of the searched sequences.
        In some special cases, especially when the sequence is found at the beginning the compiled regex was quicker, but in other cases it was 2x - 20x slower.

        • jshergal, Nov 19, 2024

          I think it will be very dependent on the data set and what the regex is. I ran the following benchmark using Bogus to generate a large chunk of Lorem text and then tacking the word "Tuesday" at the beginning, at the end, and then not at all (a misspelled version however to keep the text basically identical in length). What I found is that, with the exception of the text where "Tuesday" was the first word, the performance was pretty similar between both:

          | Method                     | searchString        | Mean        | Error    | StdDev   |
          |--------------------------- |-------------------- |------------:|---------:|---------:|
          | FindStringWithRegex        | Con(...)day [20809] | 1,670.42 ms | 8.770 ms | 7.324 ms |
          | FindStringWithSearchValues | Con(...)day [20809] | 1,656.58 ms | 7.650 ms | 7.156 ms |
          | FindStringWithRegex        | Not(...)sya [20813] | 1,631.96 ms | 2.385 ms | 1.992 ms |
          | FindStringWithSearchValues | Not(...)sya [20813] | 1,613.41 ms | 2.734 ms | 2.283 ms |
          | FindStringWithRegex        | Tue(...)ins [20810] |    43.12 ms | 0.163 ms | 0.136 ms |
          | FindStringWithSearchValues | Tue(...)ins [20810] |    26.43 ms | 0.493 ms | 0.462 ms |

          For reference, here is the code:

          [SimpleJob]
          public partial class FindTextBenchmark
          {
              private const int Iterations = 1_000_000;
          
              [GeneratedRegex(@"Monday|Tuesday|Wednesday", RegexOptions.IgnoreCase)]
              private static partial Regex MyReg();
          
              private static SearchValues<string> MySearchValues =
                  SearchValues.Create(["Monday", "Tuesday", "Wednesday"], StringComparison.OrdinalIgnoreCase);
          
              private static readonly Lorem Data = new()
              {
                  Random = new Randomizer(42)
              };
          
              private static readonly string BaseText = Data.Paragraphs(100);
          
              public IEnumerable<object> ArgumentStrings()
              {
                  yield return "Contains" + BaseText + " Tuesday";
                  yield return "Tuesday " + BaseText + " Contains";
                  yield return "NotContains " + BaseText + " Teudsya";
              }
          
              [Benchmark]
              [ArgumentsSource(nameof(ArgumentStrings))]
              public int FindStringWithRegex(ReadOnlySpan<char> searchString)
              {
                  int foundCount = 0;
                  for (int i = 0; i < Iterations; ++i)
                  {
                      if (MyReg().IsMatch(searchString))
                          foundCount++;
                  }
          
                  return foundCount;
              }
          
              [Benchmark]
              [ArgumentsSource(nameof(ArgumentStrings))]
              public int FindStringWithSearchValues(ReadOnlySpan<char> searchString)
              {
                  int foundCount = 0;
                  for (int i = 0; i < Iterations; ++i)
                  {
                      if (searchString.ContainsAny(MySearchValues))
                          foundCount++;
                  }
          
                  return foundCount;
              }
          }
          
          • Peter Truchly, Nov 20, 2024

            I just had to try and compare this again. I used a shorter input text (~3300 chars) and 24 searched words. (Your input is tested by the methods ending in '2'.) In the methods suffixed '1', I used the same approach of placing one of the search words early or late in the sequence, or not at all (no match).

            • It seems that SearchValues are scaling better with more search keywords.
            BenchmarkDotNet v0.14.0, Windows 10
            AMD Ryzen 9 9950X, 1 CPU, 16 logical and 16 physical cores
            .NET SDK 9.0.100
            [Host] .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
            
            | Method              | N       | text                 | Mean            |
            |-------------------- |-------- |--------------------- |----------------:|
            | UsingSearchValues2  | 1000    | Con(...)day [20809]  |       509.02 us |
            | UsingCompiledRegex2 | 1000    | Con(...)day [20809]  |       527.17 us |
            | UsingSearchValues2  | 1000000 | Con(...)day [20809]  |   508,156.44 us |
            | UsingCompiledRegex2 | 1000000 | Con(...)day [20809]  |   536,759.98 us |
            | UsingSearchValues2  | 1000    | Not(...)sya [20813]  |       495.79 us |
            | UsingCompiledRegex2 | 1000    | Not(...)sya [20813]  |       507.24 us |
            | UsingSearchValues2  | 1000000 | Not(...)sya [20813]  |   492,306.42 us |
            | UsingCompiledRegex2 | 1000000 | Not(...)sya [20813]  |   510,228.88 us |
            | UsingSearchValues2  | 1000    | Tue(...)ins [20810]  |        20.20 us |
            | UsingCompiledRegex2 | 1000    | Tue(...)ins [20810]  |        28.29 us |
            | UsingSearchValues2  | 1000000 | Tue(...)ins [20810]  |    15,613.21 us |
            | UsingCompiledRegex2 | 1000000 | Tue(...)ins [20810]  |    27,636.95 us |
            | UsingSearchValues1  | 1000    | Earl(...)ien. [3243] |        60.78 us |
            | UsingCompiledRegex1 | 1000    | Earl(...)ien. [3243] |       197.24 us |
            | UsingSearchValues1  | 1000000 | Earl(...)ien. [3243] |    59,129.08 us |
            | UsingCompiledRegex1 | 1000000 | Earl(...)ien. [3243] |   197,564.64 us |
            | UsingSearchValues1  | 1000    | Late(...)ien. [3242] |     1,588.84 us |
            | UsingCompiledRegex1 | 1000    | Late(...)ien. [3242] |     7,135.61 us |
            | UsingSearchValues1  | 1000000 | Late(...)ien. [3242] | 1,212,064.31 us |
            | UsingCompiledRegex1 | 1000000 | Late(...)ien. [3242] | 7,153,609.21 us |
            | UsingSearchValues1  | 1000    | NoMa(...)ien. [3234] |     1,463.93 us |
            | UsingCompiledRegex1 | 1000    | NoMa(...)ien. [3234] |     7,150.13 us |
            | UsingSearchValues1  | 1000000 | NoMa(...)ien. [3234] | 1,682,654.02 us |
            | UsingCompiledRegex1 | 1000000 | NoMa(...)ien. [3234] | 7,153,832.79 us |
            
            • jshergal, Nov 20, 2024

              Interesting, and good to know. Thanks for running some more tests 😃

  • jshergal, Nov 19, 2024

    One thing that is important to note with SearchValues is that it is somewhat expensive to create, so the recommended usage is to create it once and reuse it.

    The code presented here is creating a new instance of SearchValues on each call. It is understandable for the example since we are allowing for custom values each time, but it should be pointed out that it is best practice to cache the instance and reuse it.

    For instance, this example:

        public static bool LineHasWarningOrError(this string line)
        {
            ReadOnlySpan<string> tokens = ["<type>Error</type>", "<type>Warning</type>"];
            return line.AsSpan().ContainsAny(SearchValues.Create(tokens, StringComparison.OrdinalIgnoreCase));
        }
    

    would be better written as:

        private static readonly SearchValues<string> WarningsOrErrorSearch = SearchValues.Create(["<type>Error</type>", "<type>Warning</type>"], StringComparison.OrdinalIgnoreCase);
    
        public static bool LineHasWarningOrError(this string line)
        {
            return line.AsSpan().ContainsAny(WarningsOrErrorSearch);
        }
    