perl: unsafe empty pattern behavior - Mailing list pgsql-hackers

From Jeff Davis
Subject perl: unsafe empty pattern behavior
Date
Msg-id 4a1db3c8ea39156dbe4a4f9e166ca9453e05daaa.camel@j-davis.com
List pgsql-hackers
Moved from discussion on -committers:

https://postgr.es/m/0ef325fa06e7a1605c4e119c4ecb637c67e5fb4e.camel@j-davis.com

Summary:

Do not use perl empty patterns like //, qr//, or s//.../; the behavior
is too surprising for perl non-experts. There are a few such uses in
our tests; patch attached. Unfortunately, there is no obvious way to
detect them automatically, so I am just relying on grep. I'm sure there
are others here who know more about perl than I do, so
suggestions/corrections are welcome.
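
For anyone who wants something slightly more repeatable than an ad hoc grep, here is a rough sketch of a line-based scan. It is only a heuristic and does not parse perl: the function name find_empty_patterns and the sample input are made up for illustration, and it only covers the slash-delimited forms mentioned above, so both false positives and false negatives are likely.

```perl
use strict;
use warnings;

# Heuristic only: flags lines that look like they use an empty pattern.
# Other delimiters (m{}, s###, ...) and multi-line constructs are not covered.
my $EMPTY_PATTERN = qr{
      (?: =~ | !~ ) \s* (?: m \s* )? //   # binding operator, then an empty match
    | \b qr //                            # empty qr
    | \b s //                             # substitution with an empty pattern
}x;

# Hypothetical helper: returns the 1-based numbers of suspicious lines.
sub find_empty_patterns {
    my ($source) = @_;
    my @hits;
    my $lineno = 0;
    for my $line (split /\n/, $source) {
        $lineno++;
        push @hits, $lineno if $line =~ $EMPTY_PATTERN;
    }
    return @hits;
}

# Made-up sample input: three suspicious lines, then a defined-or
# operator and an ordinary qr/ok/ that should not be flagged.
my $sample = <<'EOT';
if ($out =~ //) { ... }
my $re = qr//;
$str =~ s//x/;
my $n = $count // 0;
like($out, qr/ok/, 'fine');
EOT

my @flagged = find_empty_patterns($sample);
print "suspicious lines: @flagged\n";
```

Requiring a binding operator (or the qr/s prefix) in front of the slashes is what keeps the defined-or operator out of the results; anything it reports still needs to be checked by eye.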

Long version:

Some may know this already, but we just discovered the dangers of using
empty patterns in perl:

"If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead... If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match)."

https://perldoc.perl.org/perlop#The-empty-pattern-//
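
Note that the documentation says "evaluates to", so the same rule applies to an interpolated variable that happens to be empty, not only to a literal //. A minimal sketch of that case follows; the variable $pattern and the helper matches_filter are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical: a filter pattern that was left empty.
my $pattern = '';

# A successful match earlier in the same scope.
my $earlier = ('abc' =~ /abc/) ? 1 : 0;

# /$pattern/ evaluates to the empty string, so perl reuses /abc/ here
# and the match against 'xyz' fails.
my $interpolated = ('xyz' =~ /$pattern/) ? 1 : 0;

# One way to sidestep the rule: never hand an empty pattern to the
# regex engine at all.
sub matches_filter {
    my ($string, $filter) = @_;
    return 1 if !defined $filter || $filter eq '';   # empty filter accepts everything
    return ($string =~ /$filter/) ? 1 : 0;
}

my $guarded = matches_filter('xyz', $pattern);

print "earlier=$earlier interpolated=$interpolated guarded=$guarded\n";
```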

In other words, if you have code like:

   if ('xyz' =~ //)
   {
       print "'xyz' matches //\n";
   }

The match will succeed and print, because there's no previous pattern,
so // is a "genuine" empty pattern, which matches the empty string at
the start of any input and therefore always succeeds. Then, if you add
some other code before it:

   if ('abc' =~ /abc/)
   {
       print "'abc' matches /abc/\n";
   }

   if ('xyz' =~ //)
   {
       print "'xyz' matches //\n";
   }

The first match will succeed, but the second match will fail, because
/abc/ is now the last successfully matched pattern, so // is treated
like /abc/.

On reflection, that does seem very perl-like. But it can cause
surprising action-at-a-distance if not used carefully, especially for
those who aren't experts in perl. It's much safer to just not use the
empty pattern.
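
If a pattern that always matches is really what's wanted, one option is to write it so that the pattern text is not the empty string, for example as an empty non-capturing group. The sketch below reuses the strings from the example above; the variable names are just for illustration.

```perl
use strict;
use warnings;

# A prior successful match, as in the example above.
my $first = ('abc' =~ /abc/) ? 1 : 0;

# The bare empty pattern reuses /abc/ here, so it does not match 'xyz'.
my $bare_empty = ('xyz' =~ //) ? 1 : 0;

# (?:) is an empty non-capturing group. The pattern text is not the
# empty string, so the last-successful-pattern rule does not apply and
# the match succeeds regardless of what ran before.
my $explicit_empty = ('xyz' =~ /(?:)/) ? 1 : 0;

print "first=$first bare_empty=$bare_empty explicit_empty=$explicit_empty\n";
```

The order of the two checks matters in this demo: once /(?:)/ has matched, it becomes the last successful pattern, so a bare // placed after it would match as well.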

If you use qr// instead:

https://perldoc.perl.org/perlop#qr/STRING/msixpodualn

like:

   if ('abc' =~ qr/abc/)
   {
       print "'abc' matches /abc/\n";
   }

   if ('xyz' =~ qr//)
   {
       print "'xyz' matches //\n";
   }

Then the second match may succeed or may fail, and it's not clear from
the documentation what precise circumstances matter. It seems to fail
on older versions of perl (like 5.16.3) and succeed on newer versions
(5.38.2). However, it may also depend on when the qr// is [re]compiled,
or regex flags, or locale, or may just be undefined.
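
Since the outcome seems to vary, a small probe like the following can at least report what a given perl does with the snippet above. It deliberately asserts nothing about the result, and it only covers this one arrangement of the code, so it says nothing about recompilation, flags, or locale.

```perl
use strict;
use warnings;

# Prime the "last successfully matched" pattern with a qr object.
my $primed = ('abc' =~ qr/abc/) ? 1 : 0;

# Record whether an empty qr object matches afterwards.
my $qr_empty = ('xyz' =~ qr//) ? 1 : 0;

printf "perl %vd: qr// after a successful match %s\n",
    $^V, $qr_empty ? 'matched' : 'did not match';
```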

Regards,
    Jeff Davis


