> On Aug 9, 2021, at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Pushed, but while re-reading it before commit I noticed that there's
> some more fairly low-hanging fruit in regexp_replace(). As I had it
> in that patch, it never used REG_NOSUB because of the possibility
> that the replacement string uses "\N". However, we're already
> pre-checking the replacement string to see if it has backslashes
> at all, so while we're at it we can check for \N to discover if we
> actually need any subexpression match data or not. We do need to
> refactor a little to postpone calling pg_regcomp until after we
> know that, but I think that makes replace_text_regexp's API less
> ugly not more so.
>
> While I was at it, I changed the search-for-backslash loops to
> use memchr rather than handwritten looping. Their use of
> pg_mblen was pretty unnecessary given we only need to find
> backslashes, and we can assume the backend encoding is ASCII-safe.
>
> Using a bunch of random cases generated by your little perl
> script, I see maybe 10-15% speedup on test cases that don't
> use \N in the replacement string, while it's about a wash
> on cases that do. (If I'd been using a multibyte encoding,
> maybe the memchr change would have made a difference, but
> I didn't try that.)
I've been reviewing and testing this (let-regexp_replace-use-NOSUB.patch) since you sent it 4 hours ago, and I can't
seem to break it. There are pre-existing problems in the regex code, but this doesn't seem to add any new breakage.
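
For anyone following along, here is a rough standalone sketch of the pre-check idea described in the quoted text: use memchr to hop between backslashes in the replacement string and report whether any escape is \1 through \9. This is only an illustration, not the code from the patch; the function name replacement_needs_submatches is hypothetical, and it assumes an ASCII-safe encoding so that a backslash byte is never part of a multibyte character.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the patch): scan a replacement
 * string for backslash escapes and report whether any of them is
 * \1..\9, i.e. whether subexpression match data would be needed.
 * Assumes an ASCII-safe encoding, so searching bytewise is enough.
 */
static bool
replacement_needs_submatches(const char *s, size_t len)
{
    const char *p = s;
    const char *end = s + len;

    while (p < end)
    {
        /* Jump straight to the next backslash byte, if there is one. */
        p = memchr(p, '\\', (size_t) (end - p));
        if (p == NULL || p + 1 >= end)
            break;              /* no backslash left, or a trailing one */
        if (p[1] >= '1' && p[1] <= '9')
            return true;        /* \N: subexpression data is required */
        p += 2;                 /* skip other escapes such as \\ or \& */
    }
    return false;
}
```

A caller could use the result to decide, before compiling the regex, whether a no-subexpressions flag such as REG_NOSUB can be passed. Note that an escaped backslash followed by a digit (backslash, backslash, 1) is skipped as a unit and does not count as \1.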
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company