Can you regex this? A 10MB+ regex file of the entire Indonesian internet blocklist!
Shift / Reinhart Previano K.

Shift / Reinhart Previano K. @reinhart1010

Publish Date: Jan 8 '23

A while ago we decided to publish our new, gigantic regular expression files, which encode the whole Indonesian internet blocklist with 99.99% accuracy (as tested against the official list).

Well, why? We were bored and just wanted to experiment with graphs. Back to the old days of C... oh wait, it's another Go program!

Our experiments on an M1 MacBook Air show that even Go’s default regexp library can’t handle a pattern this big, so we had to switch to regexp2. And even that only works for the ~10MB regex-reversed.txt, not the larger regex.txt.
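If you want to try this at home, something like the following rough sketch is enough to see whether an engine can even compile the file. It uses only Go's standard `regexp` package (not regexp2) and assumes the file holds a single pattern, which may not match how you downloaded it. The `tryCompile` helper and the small fallback pattern are placeholders for illustration.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
	"time"
)

// tryCompile compiles pattern with the standard regexp package and
// reports how long the attempt took, whether or not it succeeded.
func tryCompile(pattern string) (*regexp.Regexp, time.Duration, error) {
	start := time.Now()
	re, err := regexp.Compile(pattern)
	return re, time.Since(start), err
}

func main() {
	// Small placeholder pattern, used when no file path is given.
	pattern := `^(example\.com|blocked\.example)$`

	// Optionally pass a path such as regex.txt or regex-reversed.txt.
	// Assumption: the whole file is one pattern.
	if len(os.Args) > 1 {
		data, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, "read failed:", err)
			os.Exit(1)
		}
		pattern = strings.TrimSpace(string(data))
	}

	re, elapsed, err := tryCompile(pattern)
	if err != nil {
		fmt.Printf("compile failed after %v: %v\n", elapsed, err)
		return
	}
	fmt.Printf("compiled %d bytes of pattern in %v\n", len(pattern), elapsed)
	fmt.Println("matches example.com:", re.MatchString("example.com"))
}
```

Run it with the path to one of the regex files as the first argument. Swapping in another engine only means replacing the body of `tryCompile`.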

So now, we have a challenge for you: can your favorite regular expression library handle this, 14+ MB of pure regex? I’m personally interested in Intel’s Hyperscan engine (optimized for their x86 platform, of course) to see whether it can handle something this big.

Because who knows, maybe we accidentally made a regex performance benchmark tool.
