String Manipulation of URLs is an Anti-Pattern.
Mike Stemle (@manchicken)

Publish Date: Aug 31 '21

Quick note before we get started: the examples in this piece are Node-centric, but the anti-pattern itself is polyglot. As with most anti-patterns, it isn't about the syntax; it's about the approach.

What's a URL, really?

A URL is a useful thing. It tells both humans and machines where to find resources on the internet. There's a lot of information packed into a URL, from protocol designations to document anchors, and when we treat it like a string we're steering into danger.

A URL is a packed value. It contains an awful lot of data:

  • Protocol scheme
  • Host name
  • Port number
  • Path
  • File name
  • Search parameters (a.k.a. query string parameters)
  • Anchor (which can also be used for parameters)
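
To make the list concrete, here is a minimal sketch that unpacks a URL with the WHATWG URL class, which is a global in current Node and in browsers. The URL itself is a made-up example, not one from the original article.

```javascript
// Hypothetical URL, chosen only so that every component in the list is present.
const url = new URL('https://example.com:8080/docs/guide.html?lang=en&page=2#intro');

const parts = {
  protocol: url.protocol,              // scheme, including the trailing ':'
  hostname: url.hostname,              // host name without the port
  port: url.port,                      // a string; empty when the scheme's default port is used
  pathname: url.pathname,              // path, including the file name
  lang: url.searchParams.get('lang'),  // one search parameter
  page: url.searchParams.get('page'),  // another search parameter
  hash: url.hash,                      // anchor, including the leading '#'
};

console.log(parts);
```

Each property is read from the parsed value, so nothing here depends on where a ? or # happens to sit in the original string.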

The Problem with String Manipulation

Depending on your specific needs, a URL may contain several reserved characters, including ?, #, =, &, %, :, and /. This is not an exhaustive list. Having these characters in the wrong place within your URL can change how the URL is parsed.

A good implementation should be flexible enough to deal with any reasonable inputs, and capable of failing predictably when inputs are not reasonable. Packed values, like a URL, need to be treated like packed values, and not handled using string manipulation.

Consider a URL with its own query string, say one with q and restrict_sr parameters, concatenated into the url parameter of another URL. When the result is parsed, q is seen as part of the embedded URL, but restrict_sr is interpreted as another URL parameter parallel to url. While it may be tempting to simply use a function to URL-encode the embedded value, I would like to encourage you to reconsider. General-purpose URL encoding functions don't escape every character that you might want to put in there, and they make assumptions about context that won't always hold.
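
A minimal sketch of this failure mode follows. The bookmark endpoint on example.com and the Reddit search URL are hypothetical stand-ins, and encodeURI is assumed here as the tempting encoding function; the article's original values may have differed.

```javascript
// Hypothetical inner URL that carries its own query string.
const bookmarkUrl = 'https://www.reddit.com/r/node/search?q=url&restrict_sr=1';

// Anti-pattern: paste the inner URL into the outer one as a string.
const naive = `https://example.com/bookmark?url=${bookmarkUrl}`;
const naiveParams = new URL(naive).searchParams;

console.log(naiveParams.get('url'));         // the inner URL, cut off at the first '&'
console.log(naiveParams.get('restrict_sr')); // '1': it leaked out to the outer URL

// encodeURI leaves reserved characters such as '?', '&' and '=' alone,
// so wrapping the inner URL in it changes nothing for this input.
const encoded = `https://example.com/bookmark?url=${encodeURI(bookmarkUrl)}`;
const encodedParams = new URL(encoded).searchParams;

console.log(encodedParams.get('restrict_sr')); // still '1'
```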

A Better Approach

Encoding the whole URL doesn't necessarily solve the problem either: a function such as encodeURI leaves reserved characters like ? and & untouched. Let's try a different approach: let's use the URL API.
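
As a sketch of that approach, assuming the built-in WHATWG URL class and the same hypothetical bookmark endpoint and Reddit search URL (neither is taken from the original article):

```javascript
// Hypothetical inner URL that carries its own query string.
const bookmarkUrl = 'https://www.reddit.com/r/node/search?q=url&restrict_sr=1';

// Build the outer URL as a structured value and set the parameter on it.
const outer = new URL('https://example.com/bookmark');
outer.searchParams.set('url', bookmarkUrl);

console.log(outer.href); // the inner URL appears percent-encoded as a single value

// Parse the result again to check that nothing leaked.
const roundTripped = new URL(outer.href).searchParams;
console.log(roundTripped.get('url') === bookmarkUrl); // true
console.log(roundTripped.has('restrict_sr'));         // false
```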

By using the URL API, the URL which is being used as a parameter is safely tucked away as a single encoded value, and you don't have to worry about it being confused with the parameters of the outer URL.

Why does this matter?

The two primary problems caused by the anti-pattern of string manipulation of URLs are bugs and URL injection vulnerabilities.

Poorly-encoded URLs make it difficult for web servers and applications to understand the parameters coming to them. If they cannot reliably understand their inputs, there may be unexpected or unwanted behavior.

URLs which are constructed using predictable string manipulation also pose a very real risk of URL injection. URL injection can lead to SQL injection, NoSQL injection, cross-site scripting (XSS), and a whole host of other security holes.
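
A small sketch of how such an injection can happen. The search endpoint, the role parameter, and the input value are all hypothetical.

```javascript
// Hypothetical attacker-controlled input that smuggles in a second parameter.
const userInput = 'widgets&role=admin';

// Anti-pattern: the input is interpolated straight into the query string.
const unsafe = new URL(`https://example.com/search?q=${userInput}`);
console.log(unsafe.searchParams.get('q'));    // 'widgets'
console.log(unsafe.searchParams.get('role')); // 'admin': an injected parameter

// Setting the value through the URL API keeps it a single parameter.
const safe = new URL('https://example.com/search');
safe.searchParams.set('q', userInput);
console.log(safe.searchParams.get('q'));      // 'widgets&role=admin'
console.log(safe.searchParams.get('role'));   // null
```

Whether an injected parameter is dangerous depends on what the receiving application does with it, but the structured version never gives the input a chance to add one.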

Conclusion

A URL isn't a string. Much like the packed bit fields of yore, it is a packed value. Don't treat it like a string; treat it like a first-class object or structure. And never write your own URL parser: nearly every language has a good URL library that you can use.

Comments 8 total

  • Keff
    Aug 31, 2021

    I really liked this, I have done this many times. But I've recently started using the good approach, not because I knew it was a bad practice or antipattern, but because it makes my life a lot easier and the code cleaner IMO. The hard work is done for you!

    • Mike Stemle
      Sep 1, 2021

      Yeah, there are a lot of areas of programming where the problems have already been solved for us, we need only take advantage of those already-existent solutions.

      Other problem domains which fit into this category (not an exhaustive list), IMO, are time zones and localization. I can't tell you how many times I've seen people try to roll their own solutions in those areas when there have been excellent (and portable) solutions in place for decades.

  • Steve Blundy
    Sep 1, 2021

    This is a very common problem. It’s good of you to call it out here. String URL manipulation is a subset of a broader anti-pattern called “string obsession” or “primitive obsession”. File system paths are another thing you do not want to be manipulating as a string. Any kind of structured or encoded data, really.

    • Mike Stemle
      Sep 1, 2021

      That's a good point!

    • hidden_dude
      Sep 2, 2021

      I'd generalize this and say that whenever you have a structured string like:

      • urls
      • emails
      • phone numbers
      • SQL queries
      • Mailing Addresses
      • ???

      That you need to extract or inject info into (not just store or copy), then you need some sort of Builder pattern or Editor pattern to do so. You can use a third party library or build your own if it doesn't exist.

      But the advantage of treating it as a library is that it can evolve over time and your code isn't riddled with N half baked implementations.

      About 16 years ago I was presented with a situation in which we had a major application that was building SQL statements on the fly, and creating my own SQLBuilder really was able to make the code far more maintainable, since so many parts of the code were editing SQL in different ways.

      SQL editing has since fallen out of favor, and ORMs basically provide that function now. But the principle remains.

  • coderdenver9
    Sep 1, 2021

    Instead of using a third party library, you could've used encodeURIComponent(bookmark_url), which behaves correctly.

    • Mike Stemle
      Sep 1, 2021

      There are two reasons I used the third-party library:

      1. Most folks that I have encountered do use a library.
      2. Even if the encoding function does work in that use case, the anti-pattern remains.

      I was trying to keep this article short, but I could have also gone into how so many times folks will try to do stuff like:

      some_url = `${first_url}${(first_url.indexOf('?') > -1) ? '&' : '?'}${param_list.join('&')}`

      Those are all really bad practices. The only real solution is to use URL libraries which treat a URL as the packed value that it is.
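
For contrast, a minimal sketch of the same task done with the built-in URL class. The helper name appendParams and the example URLs are hypothetical, not something from the thread.

```javascript
// Hypothetical helper: adds parameters to a URL that may or may not
// already have a query string or a fragment.
function appendParams(baseUrl, params) {
  const url = new URL(baseUrl);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.append(name, value); // values are encoded as they are added
  }
  return url.href;
}

const withQuery = appendParams('https://example.com/list?page=2', { sort: 'name', q: 'a&b' });
const withoutQuery = appendParams('https://example.com/list', { sort: 'name' });
const withFragment = appendParams('https://example.com/list#top', { sort: 'name' });

console.log(withQuery);
console.log(withoutQuery);
console.log(withFragment); // the query is placed before the fragment
```

There is no check for an existing ? and no manual join, and a URL that already ends in a fragment still comes out well-formed, which the template-string version does not handle.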

  • David Ongaro
    May 3, 2025

    URLs are defined as "a compact string representation for a resource available via the Internet" so they are literally strings by definition. That's what makes them so useful. They are not some kind of language specific data structure. Wrapping them in an object doesn't change but may obscure that.

    So certain kinds of string manipulation are OK; URLs are designed for that by various RFCs. That being said, you should rely on library functions/methods when it comes to parameter parsing, constructing, escaping and some more obscure aspects of URL parsing and construction.
