DSA Pattern: A Clean Way to Parse Words from a String
Al Amin

Al Amin @dev-alamin

Publish Date: Jun 6

🚀 A Clean Pattern for Word-by-Word String Parsing in PHP

Recently, while solving problems like:

  • ✅ Counting segments in a sentence
  • ✅ Reversing words
  • ✅ Manual string tokenization

I discovered a powerful template for breaking down a string into individual words without using built-in functions like explode() or str_word_count().

Here's the simplified version of the logic:

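// $s holds the input string, e.g. "Hello,  my name is John"
$sentence = [];
$word = '';
$length = strlen($s); // Compute the length once, not on every iteration

for ($i = 0; $i < $length; $i++) {
    if ($s[$i] !== ' ') {
        // Keep building the word until a space is found
        $word .= $s[$i];
    } else {
        // Space hit = word finished, push it to the array
        // (skip empty words so repeated spaces add nothing)
        if ($word !== '') {
            $sentence[] = $word;
            $word = ''; // Reset word builder
        }
    }
}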

// After loop ends, push the last word (if any)
if ($word !== '') {
    $sentence[] = $word;
}

🧠 What’s happening here?

  1. We're looping through each character.
  2. If the character is not a space → it's part of a word → build it.
  3. If the character is a space → the word in progress (if there is one) is finished → store it → reset.
  4. At the end of the loop, if a word is still in progress, we save it.
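
To make the four steps concrete, here is a minimal sketch that wraps the same loop in a function and runs it on a messy input. The name splitWords() is just an illustrative choice, not something from a library.

```php
<?php
// Hypothetical helper name; it wraps the loop described in the steps above.
function splitWords(string $s): array
{
    $words = [];
    $word = '';
    $length = strlen($s);

    for ($i = 0; $i < $length; $i++) {
        if ($s[$i] !== ' ') {
            $word .= $s[$i];      // Step 2: still inside a word
        } else {
            if ($word !== '') {   // Step 3: a space closes the word in progress
                $words[] = $word;
                $word = '';
            }
        }
    }

    if ($word !== '') {           // Step 4: save a word still in progress
        $words[] = $word;
    }

    return $words;
}

$words = splitWords('  Hello,   my name is  John ');
// $words is ['Hello,', 'my', 'name', 'is', 'John']
echo count($words), PHP_EOL; // prints 5
```

Counting the segments in a sentence then comes down to count($words), and reversing the words is a matter of walking the array from the end.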

💥 Why is this helpful?

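  • Works even when there are multiple spaces between words, because empty words are never pushed. Tabs and newlines are not treated as separators, though; to handle them, extend the condition or normalize first with preg_replace('/\s+/', ' ', $s).
  • Doesn't rely on splitting built-ins like explode(), so you keep full control over what counts as a word.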
  • Can be adapted to parse custom delimiters or handle punctuation-sensitive input.
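
As a rough sketch of the custom-delimiter idea, the same loop can take a set of separator characters instead of a hard-coded space. The function name splitOn() and its $delimiters parameter are assumptions made for this example.

```php
<?php
// Hypothetical helper: same loop, but any character in $delimiters ends a word.
function splitOn(string $s, string $delimiters = " \t\n"): array
{
    $tokens = [];
    $token = '';
    $length = strlen($s);

    for ($i = 0; $i < $length; $i++) {
        // strpos() returns false when the character is not in the delimiter set
        $isDelimiter = strpos($delimiters, $s[$i]) !== false;

        if (!$isDelimiter) {
            $token .= $s[$i];
        } else {
            if ($token !== '') {
                $tokens[] = $token;
                $token = '';
            }
        }
    }

    if ($token !== '') {
        $tokens[] = $token; // Flush the last token
    }

    return $tokens;
}

$csvLike = splitOn('a,b;;c', ',;');        // ['a', 'b', 'c']
$whitespace = splitOn("one\ttwo\nthree");  // ['one', 'two', 'three']
```

With the default delimiter set, tabs and newlines separate words too, so no regex normalization step is needed.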

✨ Bonus Insight:

The final word in a string is usually not followed by a space, so it never hits the “else” block. That’s why the if ($word !== '') check after the loop is crucial. Without it, your last word would be lost whenever the input doesn't end with a space!
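
A small experiment shows the edge case in action. This is only a demo sketch: collectWords() and its $flushLast switch are made up here to turn the post-loop check on and off.

```php
<?php
// Hypothetical demo helper: $flushLast switches the post-loop check on or off.
function collectWords(string $s, bool $flushLast): array
{
    $words = [];
    $word = '';
    $length = strlen($s);

    for ($i = 0; $i < $length; $i++) {
        if ($s[$i] !== ' ') {
            $word .= $s[$i];
        } else {
            if ($word !== '') {
                $words[] = $word;
                $word = '';
            }
        }
    }

    // Without this check, a word still being built when the loop ends is dropped.
    if ($flushLast && $word !== '') {
        $words[] = $word;
    }

    return $words;
}

$without = collectWords('hello world', false); // ['hello']
$with    = collectWords('hello world', true);  // ['hello', 'world']
```

If the input ends with a space, as in 'hello world ', both calls return the same two words, because the trailing space already pushes the last word inside the loop.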


📌 Template takeaway:
If you're building tools that deal with sentence parsing, custom formatting, or you're preparing for string-related DSA problems, this small but powerful pattern will keep showing up!

Let me know what you think, or how you'd adapt this for other tasks!

Comments 9 total

  • Nevo David, Jun 6, 2025

    Pretty cool seeing someone ditch built-ins and actually walk through it - I always get a kick out of handling stuff character by character.

    • Al Amin, Jun 8, 2025

      Thanks! I’ve been trying to solve problems more manually lately to really understand the underlying logic.

      Built-ins are great, but walking through things character by character forces me to think deeper and improve my problem-solving muscle.

      Glad you appreciated it — means a lot! 🙌

  • Nathan Tarbert, Jun 7, 2025

    Pretty cool, I always end up forgetting that last-word edge case. This actually helps me when I want more control. Nice!

    • Al Amin, Jun 8, 2025

      Yes, this is a common mistake to make. Only the things we revise stay in memory longer.

  • Dotallio, Jun 8, 2025

    Love this approach, super clean and easy to adapt for tricky cases! How would you tweak this to handle punctuation or special characters inside words?

    • Al Amin, Jun 8, 2025

      Thank you so much! 😊
      Really appreciate your kind words.

      Great question — punctuation and special characters can definitely complicate parsing! In this post, I kept it simple by splitting only on spaces to stay focused on the DSA concept. But for trickier inputs (like "don't stop-believing!"), here are a couple of ways to handle it:

      Strategy Options:

      1. Character check with custom conditions:
      We can allow characters like ' or - if they're considered part of a word:

      if (ctype_alpha($char) || in_array($char, ["'", "-"])) {
          $word .= $char;
      }
      

      2. Regex-based splitting:
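      For complex rules, something like preg_split('/[^a-zA-Z\'-]+/', $str, -1, PREG_SPLIT_NO_EMPTY) can split on anything that's not a valid word character. The PREG_SPLIT_NO_EMPTY flag drops the empty string that trailing punctuation would otherwise leave at the end of the result.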

      3. Filter after building words:
      Build all chunks first, then clean/filter them based on your needs — great for modularity.

