Comments on KAE Scripts: Negation in regular expressions
Feed updated 2023-10-16T11:16:53.502-07:00. 7 comments, newest first.

Pauan (https://www.blogger.com/profile/09141136258826180195), 2009-09-07T04:21:37.373-07:00

Regular expressions in general tend to be short and confusing, so I'm not terribly surprised. The regular expression I was using has worked flawlessly for some time now, so I never thought to question its validity. Indeed it only causes problems in very weird situations with uneven backslashes. Since I always make sure to properly backslash my code, the problem has simply not come up. That is probably the reason it has not been investigated further.

On the plus side, I can plug the new regular expression into BRUSH, and it will automagically filter down, so I can benefit from it with almost no effort whatsoever.

David Jones (http://drj11.wordpress.com/), 2009-09-07T04:10:39.189-07:00

"\
" Oh yukh (I never knew that!). Still, it's good to plan ahead.

When I said you discovered something new, I meant that a short regular expression to match string literals doesn't seem to be well known to the world, not just you and me; try googling for one, for example. And you'd think, what with all these syntax highlighters, that it would be a fairly well investigated problem.

Pauan, 2009-09-07T03:58:07.533-07:00

Ah, I spoke a bit too soon. Your regular expression matches one less character than mine. I will need to test it to verify that it works as expected.

Pauan, 2009-09-07T03:52:25.900-07:00

Technically JavaScript does not allow for "\
", but all the major browsers support it, and it will be supported (officially) in ECMAScript 5.

Indeed, I did learn some new things, notably about how the regular expression engine handles negative look-ahead, and also some interesting insights into backslash escaping.

Thank you for your input.

P.S. Your regular expression also matches "\\"". It produces the same output, as near as I can see.

P.P.S. Back references, as far as I know, can occur anywhere except inside character classes. So [\1] or [^\1] won't work, but a back reference will work anywhere else.

David Jones, 2009-09-07T03:35:13.048-07:00

Cute. You can put back references in the negative assertion (I never knew that), so your idea of using (?!\1) is sound.

However, your RE needs some tidying up to handle backslashes correctly. For example, your RE matches the whole of:

"\\""

but that is not a valid string literal. It's a valid string literal followed by a lone double quote. Might not matter depending on how your tokenizer works.

See this:

http://wordaligned.org/articles/string-literals-and-regular-expressions

Also, what language's string literal syntax are you trying to match? As far as I know, JavaScript does not permit newlines inside a string literal, not even if you backslash escape them, so "\
" is not a valid literal; you always have to use \n if you want newlines. So I don't know what the \\\n stuff is trying to achieve.

Soo... I think something like this will do:

/(["'])(?:\\.|(?!\1)[^\\])*\1/

I think you have discovered something new (namely, a reasonable RE to match string literals).

Pauan, 2009-09-07T02:31:01.545-07:00

You are quite right; that regular expression does solve the problem. I knew about the negative look-ahead, but did not think about using it in that way.

I'm curious about how well that would work in a more complicated expression. Sure, with the contrived example it was simple enough to solve. How about this, however:

I wish to devise a regular expression that will match either single OR double-quoted strings. It must only match valid strings, and must allow for escaping. This regular expression works, but is overly verbose and complicated:

/"(?:\\"|\\\n|[^"\n])*"|'(?:\\'|\\\n|[^'\n])*'/g

It would be great if I could write it like this:

/(["'])(?:\\\1|\\\n|[^\1\n])*\1/g

But that doesn't work. Using the syntax I proposed, it would be possible:

/(["'])(?:\\\1|\\\n|[^\n](?^\1))*\1/g

With a quick cursory test, it seems the following also works:

/(["'])(?:\\\1|\\\n|(?!\1)[^\n])*\1/g

I haven't put it under a great amount of testing, but it seems to work okay. If this construct does indeed do what I desire, then you are correct that new syntax is not required.

What I'm curious about is if there is any situation where my proposed syntax would work, but the negative look-ahead would not.

David Jones, 2009-09-07T02:08:17.278-07:00

Or we could learn more regular expression syntax:

/^(?!foo\b)\w+/

«(?!» introduces a "negative assertion". Basically it says "without advancing the match cursor, attempt to match the thing in brackets here; if it matches, then fail".

See ECMA-262 section 15.10.2.8 (no, don't really; the regular expression part of the JavaScript spec is the _worst_ place to find out what regular expressions mean).

It was inherited from Perl 5 via PCRE.

(and I only know about this because we made a mug with a regular expression cheat sheet on it)
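
For anyone who wants to try the patterns from this thread, here is a minimal sketch in JavaScript. The regular expressions are copied from the comments with the g flag dropped so that only the first match is returned; the helper `firstMatch` and the variable names are made up for illustration and are not part of anyone's actual code.

```javascript
// Negative look-ahead: match a leading word unless that word is exactly "foo".
const notFoo = /^(?!foo\b)\w+/;

// Attempt with the back reference inside a character class. Inside [...],
// \1 is not a back reference, so the class does not exclude the opening quote.
const classRe = /(["'])(?:\\\1|\\\n|[^\1\n])*\1/;

// Look-ahead version: (?!\1) refuses to consume the opening quote character.
const lookaheadRe = /(["'])(?:\\\1|\\\n|(?!\1)[^\n])*\1/;

// Tidied version: a backslash always consumes the character that follows it.
const tidyRe = /(["'])(?:\\.|(?!\1)[^\\])*\1/;

// Hypothetical helper: the text of the first match, or null if there is none.
function firstMatch(re, text) {
  const m = text.match(re);
  return m === null ? null : m[0];
}

console.log(notFoo.test("foo bar"));   // false
console.log(notFoo.test("food bar"));  // true

// Two separate string literals on one line.
const twoStrings = '"a" + "b"';
console.log(firstMatch(classRe, twoStrings));     // "a" + "b"  (runs past the first literal)
console.log(firstMatch(lookaheadRe, twoStrings)); // "a"

// The five characters  " \ \ " "  from the thread.
const tricky = '"\\\\""';
console.log(firstMatch(lookaheadRe, tricky)); // "\\""  (all five characters)
console.log(firstMatch(tidyRe, tricky));      // "\\"   (stops before the lone quote)
```

The last two lines show the one-character difference mentioned above: the look-ahead version reads the second backslash plus the quote as an escaped quote, while the tidied version pairs the two backslashes and then closes the literal.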