Re: BUG #5334: Version 2.22 of Perl Safe module breaks UTF8 PostgreSQL 8.4 - Mailing list pgsql-bugs

From Alex Hunsaker
Subject Re: BUG #5334: Version 2.22 of Perl Safe module breaks UTF8 PostgreSQL 8.4
Date
Msg-id 34d269d41002190818t3df89d49h15e056d3c95f310@mail.gmail.com
In response to Re: BUG #5334: Version 2.22 of Perl Safe module breaks UTF8 PostgreSQL 8.4  (Tim Bunce <Tim.Bunce@pobox.com>)
Responses Re: BUG #5334: Version 2.22 of Perl Safe module breaks UTF8 PostgreSQL 8.4  (Alex Hunsaker <badalex@gmail.com>)
List pgsql-bugs
On Fri, Feb 19, 2010 at 02:30, Tim Bunce <Tim.Bunce@pobox.com> wrote:
> On Thu, Feb 18, 2010 at 11:32:38AM -0700, Alex Hunsaker wrote:
> > On Thu, Feb 18, 2010 at 11:09, Tim Bunce <Tim.Bunce@pobox.com> wrote:
> >    *PLPerl::utf8::SWASHNEW = \&utf8::SWASHNEW;
> >
> > Hrm... It seems to work for me in HEAD and AFAICS we don't have that
> > line.  Did I just miss it?  Or did you happen to fix it in another way
> > with your refactoring?

> To be honest I'm not sure. I plan to look into that today.

My hunch is it has to do with the require strict;  require feature;
That's the only major difference I see (other than the require_op and
it being in its own package/file).

>> I did a few quick tests but it failed miserably for me...  I'm also not
>> fond of adding yet another closure. :)
>
> No amount of closure wrapping will fix the problem.

Yeah, brain fart... That's essentially what Safe.pm does now (and why
there is a problem :) )

>> Makes me think we might just be able to share some of utf8 package in the safe?
>
> I tried. The perl utf8.c code does a method lookup of SWASHNEW to decide
> if the utf8 module has been loaded. So if SWASHNEW is shared _before_
> utf8 is loaded *and used* then the method lookup works (it finds the
> shared stub) and the utf8 module never gets loaded.

Hrm...  That seems wrong to me. Let me see if I can explain why.  The
below is what you seem to be saying:

package utf8;
sub import {  # or maybe this is a BEGIN
  return if defined &{'utf8::SWASHNEW'}; # already loaded
  # ok, not loaded: open the Unicode database and do junk which will
  # 'trap' in safe
  do 'utf8_heavy.pl';
}

So if we define SWASHNEW without loading the unicode database how will
utf8/unicode work exactly?  I guess as long as it gets loaded at some
point it works.  So for postgres, because we do the utf8 fix after
Safe->new and at that point we can't have any 'bad' strings, it will
work (with your hack).  Sound right?
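
To make the ordering issue concrete, here is a toy sketch of a loader that decides "already loaded" purely by method lookup. Everything in it is made up for illustration (the Demo::* packages, ensure_loaded, %tables_loaded); it is not the real utf8.c or utf8.pm logic, just the shape of the check as I understand Tim's description:

```perl
use strict;
use warnings;

my %tables_loaded;    # hypothetical: package name => 1 once the "heavy" code ran

# Hypothetical loader: run the heavy code only if the method lookup fails.
sub ensure_loaded {
    my ($pkg) = @_;
    # The lookup cannot tell a real implementation from a shared stub.
    return 'skipped' if UNIVERSAL::can($pkg, 'SWASHNEW');
    $tables_loaded{$pkg} = 1;    # stands in for: do 'utf8_heavy.pl'
    no strict 'refs';
    *{"${pkg}::SWASHNEW"} = sub { 'real' };
    return 'loaded';
}

# Case 1: nothing defined beforehand, so the heavy code runs.
my $plain = ensure_loaded('Demo::Plain');

# Case 2: a stub is installed first, roughly what sharing SWASHNEW early would do.
{
    no strict 'refs';
    *{'Demo::Stubbed::SWASHNEW'} = sub { 'stub' };
}
my $stubbed = ensure_loaded('Demo::Stubbed');

print "plain:   $plain\n";
print "stubbed: $stubbed, tables loaded: ",
    ($tables_loaded{'Demo::Stubbed'} ? 'yes' : 'no'), "\n";
```

In the second case the lookup succeeds on the stub, so the heavy code never runs, which is the situation I am worried about if the sharing happens too early.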

It seems to me a more correct fix would be to require utf8; inside of
the safe like we do strict.  Sorry, that's a bit handwavy.  You have
obviously spent more time than me looking into this...

I'm thinking (in pseudo code):

#define SAFE_OK
....
sub ::mksafefunc {
   permit->(qw(caller require));
   reval->('require utf8; 1;');
   deny->(qw(caller require));
...
}
sub ::mk_strict_safefunc {
    ...reval->('use strict; require utf8;');

}

static void
plperl_safe_init(void)
{
    if (GetDatabaseEncoding() == PG_UTF8)
    {
        eval_pv("my $a=pack('U',0xC4); $a =~ /\\xE4\\d/i;", FALSE);
    }

    eval_pv(SAFE_MODULE, FALSE);
    eval_pv(SAFE_OK, FALSE);
}
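
For the Perl half of that, a standalone sketch of the permit / reval / deny dance using only the stock Safe API (new, permit, reval, deny). The variable names are mine and this is not plperl code. I am also not claiming the require utf8 itself succeeds on every Perl/Safe combination, so the script just reports what happened:

```perl
use strict;
use warnings;
use Safe ();

my $safe = Safe->new;

# Open the compartment just long enough to pull in utf8, then close it again.
$safe->permit(qw(caller require));
my $loaded     = $safe->reval('require utf8; 1;');
my $load_error = $@;
$safe->deny(qw(caller require));

print defined $loaded
    ? "require utf8 succeeded\n"
    : "require utf8 failed: $load_error\n";

# Code compiled in the compartment afterwards should have require trapped again.
my $later       = $safe->reval('require strict; 1;');
my $later_error = $@;
print defined $later
    ? "require still permitted\n"
    : "require trapped again: $later_error\n";
```

The second reval is only there to check that the window really is closed again after the deny.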

One thing that stinks is that while we might not do the utf8fix if we are
not PG_UTF8, we would always require utf8;.  And I don't see an easy way
around that in 8.4 :(  Also note that this is all entirely untested :(  If
you think it's sane (and it might not be) I'm happy to work up a patch.
I'd favor this approach, as if you have utf8 strings the likelihood
that you want ::upgrade, ::downgrade, ::encode, ::valid or ::is_utf8
is fairly high.  Then again, no one has complained thus far...  Maybe
that's just me :)

Thoughts?

Anywhoo, I can't reproduce this outside of postgres.  Maybe you can give
me a pointer?

use Safe();
binmode(STDOUT, ':utf8');
print $Safe::VERSION . "\n";
my $safe = Safe->new('t');
$safe->permit('print');
$safe->reval('sub { print "\x{263a}\n"; }')->();
print $@ ."\n" if($@);
-----
2.22
☺
