Re: Another regexp performance improvement: skip useless paren-captures - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Another regexp performance improvement: skip useless paren-captures
Date
Msg-id 3730031.1628557874@sss.pgh.pa.us
In response to Re: Another regexp performance improvement: skip useless paren-captures  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: Another regexp performance improvement: skip useless paren-captures  (Mark Dilger <mark.dilger@enterprisedb.com>)
Re: Another regexp performance improvement: skip useless paren-captures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Mark Dilger <mark.dilger@enterprisedb.com> writes:
>> On Aug 9, 2021, at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There is a potentially interesting definitional question:
>> what exactly ought this regexp do?
>>     ((.)){0}\2
>> Because the capturing paren sets are zero-quantified, they will
>> never be matched to any characters, so the backref can never
>> have any defined referent.

> Perl regular expressions are not POSIX, but if there is a principled
> reason POSIX should differ from perl on this, we should be clear what
> that is:

>     if ('foo' =~ m/((.)(??{ die; })){0}(..)/)
>     {
>         print "captured 1 $1\n" if defined $1;
>         print "captured 2 $2\n" if defined $2;
>         print "captured 3 $3\n" if defined $3;
>         print "captured 4 $4\n" if defined $4;
>         print "match = $match\n" if defined $match;
>     }

Hm.  I'm not sure that this example proves anything about Perl's handling
of the situation, since you didn't use a backref.  I tried both

    if ('foo' =~ m/((.)){0}\1/)

    if ('foo' =~ m/((.)){0}\2/)

and while neither throws an error, they don't succeed either.
So AFAICS Perl is acting in the way I'm attributing to POSIX.
But maybe we should actually read POSIX ...
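
For anyone who wants to repeat that check, here is a minimal sketch.
The helper name try_match is made up for illustration and is not from
the thread. It wraps the match in eval so that a pattern Perl rejects
as an error can be told apart from one that merely fails to match.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): classify how Perl
# treats a pattern as 'match', 'no match', or 'error: ...'.
sub try_match {
    my ($pattern, $subject) = @_;

    # The pattern is compiled at run time because it is interpolated;
    # eval traps a compilation error so it is not confused with a non-match.
    my $matched = eval { $subject =~ /$pattern/ ? 1 : 0 };

    return "error: $@" unless defined $matched;
    return $matched ? 'match' : 'no match';
}

# The two backref cases tried above, against the same subject string.
for my $pattern ('((.)){0}\1', '((.)){0}\2') {
    printf "%-12s => %s\n", $pattern, try_match($pattern, 'foo');
}
```

If Perl behaves as described above, both patterns should come back as
"no match" rather than as an error.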

>> ... I guess Spencer did think about this to some extent -- he
>> just forgot about the possibility of nested parens.

> Ugg.  That means our code throws an error where perl does not, pretty
> well negating my point above.  If we're already throwing an error for
> this type of thing, I agree we should be consistent about it.  My
> personal preference would have been to do the same thing as perl, but it
> seems that ship has already sailed.

Removing an error case is usually an easier sell than adding one.
However, the fact that the simplest case (viz, '(.){0}\1') has always
thrown an error and nobody has complained in twenty-ish years suggests
that nobody much cares.

            regards, tom lane


