Re: Another regexp performance improvement: skip useless paren-captures - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Another regexp performance improvement: skip useless paren-captures
Date
Msg-id 08D3EB44-D38A-40A7-B297-B3E0A72771D2@enterprisedb.com
Whole thread Raw
In response to Re: Another regexp performance improvement: skip useless paren-captures  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Another regexp performance improvement: skip useless paren-captures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

> On Aug 9, 2021, at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Pushed, but while re-reading it before commit I noticed that there's
> some more fairly low-hanging fruit in regexp_replace().  As I had it
> in that patch, it never used REG_NOSUB because of the possibility
> that the replacement string uses "\N".  However, we're already
> pre-checking the replacement string to see if it has backslashes
> at all, so while we're at it we can check for \N to discover if we
> actually need any subexpression match data or not.  We do need to
> refactor a little to postpone calling pg_regcomp until after we
> know that, but I think that makes replace_text_regexp's API less
> ugly not more so.
>
> While I was at it, I changed the search-for-backslash loops to
> use memchr rather than handwritten looping.  Their use of
> pg_mblen was pretty unnecessary given we only need to find
> backslashes, and we can assume the backend encoding is ASCII-safe.
>
> Using a bunch of random cases generated by your little perl
> script, I see maybe 10-15% speedup on test cases that don't
> use \N in the replacement string, while it's about a wash
> on cases that do.  (If I'd been using a multibyte encoding,
> maybe the memchr change would have made a difference, but
> I didn't try that.)

I've been reviewing and testing this (let-regexp_replace-use-NOSUB.patch) since you sent it 4 hours ago, and I can't
seemto break it.  There are pre-existing problems in the regex code, but this doesn't seem to add any new breakage. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Another regexp performance improvement: skip useless paren-captures
Next
From: "Bossart, Nathan"
Date:
Subject: Re: Estimating HugePages Requirements?