Re: Some regular-expression performance hacking - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Some regular-expression performance hacking
Msg-id b16bccc0-3f98-47ad-81aa-699a3e00630d@www.fastmail.com
In response to Re: Some regular-expression performance hacking  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Some regular-expression performance hacking  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Feb 18, 2021, at 19:53, Tom Lane wrote:
>(Having said that, I can't help noticing that a very large fraction
>of those usages look like, eg, "[\w\W]".  It seems to me that that's
>a very expensive and unwieldy way to spell ".".  Am I missing
>something about what that does in Javascript?)

This popular regex

    ^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$

comes from jQuery:

    // A simple way to check for HTML strings
    // Prioritize #id over <tag> to avoid XSS via location.hash (#9521)
    // Strict HTML recognition (#11290: must start with <)
    // Shortcut simple #id case for speed
    rquickExpr = /^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$/,


I think this is a non-POSIX hack to match any character, including newlines.
In JavaScript, "." does not match a newline unless the "s" flag is set.

JavaScript test:

    "foo\nbar".match(/(.+)/)[1];         // "foo"

    "foo\nbar".match(/(.+)/s)[1];        // "foo\nbar"

    "foo\nbar".match(/([\w\W]+)/)[1];    // "foo\nbar"
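
The same difference should show up with the jQuery pattern itself. Below is a minimal sketch, assuming a multi-line HTML string as input. Note that dotVariant is a hypothetical rewrite using "." for comparison only, not something taken from jQuery:

```javascript
// The jQuery pattern quoted above.
const rquickExpr = /^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$/;

// Hypothetical variant (assumption, for comparison only): "." instead of [\w\W].
const dotVariant = /^(?:\s*(<.+>)[^>]*|#([\w-]+))$/;

// Sample input containing newlines between the tags.
const html = "<div>\n<p>hi</p>\n</div>";

// [\w\W] matches every character, newlines included, so the whole string matches.
const withClass = rquickExpr.exec(html);

// Without the "s" flag, "." stops at the first newline, so no match is possible.
const withDot = dotVariant.exec(html);

console.log(withClass !== null); // true
console.log(withDot);            // null
```

So replacing [\w\W] with "." would change which inputs jQuery treats as HTML, unless the "s" flag were also added.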

/Joel
