Non-String Regular Expressions
NSRE (Non-String Regular Expressions) is a new spin on regular expressions. It's really abstract, even compared to regular expressions as you know them, but it's also pretty powerful for some uses.
Here's the twist: what if regular expressions could, instead of matching just character strings, match any sequence of anything?
from nsre import *
re = RegExp.from_ast(seq('hello, ') + (seq('foo') | seq('bar')))
assert re.match('hello, foo')
The main goal here is matching NLU grammars when there are several possible interpretations of a single word; however, there are a lot of other things that you could do. You just need to understand what NSRE is and apply it to something.
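To make "any sequence of anything" a bit more concrete, here is a minimal sketch of the idea in plain Python. It is not NSRE's actual code or API: the names Item, Seq, Alt, ends, full_match and tagged are all hypothetical and exist only for this illustration. Each token is a set of possible interpretations of a word, and the matcher tracks every position it could be at, which is roughly the spirit of a Thompson-style simulation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Set


# Hypothetical mini-AST for illustration only (not NSRE's real classes).
@dataclass
class Item:
    test: Callable[[Any], bool]  # predicate applied to a single element


@dataclass
class Seq:
    parts: tuple  # sub-patterns that must match one after another


@dataclass
class Alt:
    options: tuple  # sub-patterns of which any one may match


def ends(node, items: Sequence[Any], start: int) -> Set[int]:
    """Return every position where `node` can stop if it starts at `start`."""
    if isinstance(node, Item):
        ok = start < len(items) and node.test(items[start])
        return {start + 1} if ok else set()
    if isinstance(node, Seq):
        positions = {start}
        for part in node.parts:
            # Advance every live position through the next sub-pattern.
            positions = {e for p in positions for e in ends(part, items, p)}
        return positions
    if isinstance(node, Alt):
        return {e for option in node.options for e in ends(option, items, start)}
    raise TypeError(f"unknown node: {node!r}")


def full_match(node, items: Sequence[Any]) -> bool:
    """True if the pattern consumes the whole sequence."""
    return len(items) in ends(node, items, 0)


def tagged(tag: str) -> Item:
    """Match a token (a set of possible tags) that allows `tag`."""
    return Item(lambda token: tag in token)


# "det, then noun or adj, then verb", over tokens with ambiguous tags.
pattern = Seq((tagged("det"), Alt((tagged("noun"), tagged("adj"))), tagged("verb")))
tokens = [{"det"}, {"noun", "verb"}, {"verb"}]
assert full_match(pattern, tokens)
```

The elements here are sets of tags rather than characters, so a word that could be either a noun or a verb matches whichever reading the pattern needs at that position.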
Note — This is inspired by this article from Russ Cox which explains how Thompson NFA work, except that I…
I had a brief look at it, and whilst I'm predominantly a Java developer, I have done some Python scripting in the past so I can look at the code and feel comfortable with it.
I'll try and answer your questions:
Do I understand what this is?
I think so - I believe it's a way to search for a value in complicated objects, and those values and/or objects may or may not be strings.
It feels like it is making Regex more human readable.
Do I see applications for this?
Kind of. I saw right at the bottom that the performance was quoted as being "terrible", so it's not something I'd happily use in a production environment. I can imagine it would be super excellent for searching for a piece of data in a complex JSON or XML object, and that would be super dandy if I say so myself, but only if the performance is decent.
Does the API look nice?
This is where me being predominantly a Java developer will probably fail me. It actually reminds me a lot of old-school Java, in that it's very verbose to express something simple. What I imagine would be nice is something like
re.on(datatype).match(expression)
and have a limited set of expressions available for each datatype. But I expect that would take longer to code, and maintaining a codebase like that would be hell. Then again, I'm not an expert in Python.
What features would I want to see around that?
Mainly efficient, easy-to-understand regex formatting with various data types, like JSON, XML, CSS, perhaps even just a massive String which represents a text file.
What would I want before using this in production?
Mostly speed, to be honest, and possibly support for the above data types? But that might be a stretch. The API is a nice-to-have, but I'd rather not enforce that on any Python developer, as someone who is interested in the code but isn't an expert in the language.
Honestly, I think what you made is cool, even if it is in another language that I don't use! 😂 Oh well. It's a good start and I think it definitely has promise!
Keep up the good work! 👍