Thread: plperl and regexps with accented characters - incompatible?

plperl and regexps with accented characters - incompatible?

From
hubert depesz lubaczewski
Date:
hi,
i wrote this function:
#v+
CREATE OR REPLACE FUNCTION test(TEXT) RETURNS bool language plperl as $$
return (shift =~ /[a-ząćęłńóśźżĄĆĘŁŃŚÓŹŻ0-9_-]+/i) || 0;
$$;
#v-

what it does is not really relevant.

the important thing is that creating it fails:
psql:z.sql:25: ERROR:  creation of Perl function "texts_words_iu" failed:
'require' trapped by operation mask at line 15.

it looks strange - what "require"?

i mean - it is possible that perl itself loads something related to handling Polish characters.

if i remove the "i" flag from the regexp, it works ok.
so, i assume perl loads something like the "locale" or "utf8" module to handle
//i, but since the error message doesn't mention which module it tried to load,
it is quite hard to understand.
also - perhaps loading of this particular module should be allowed even in
plperl? otherwise it requires me to use plperlu for even the simple task of
regexp matching.

without the //i flag it works correctly, but then i have to change "a-z" to "a-zA-Z", which is not really nice.

any ideas what's wrong, and how can i fix it?

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - a blog for you (and my CV)

Re: plperl and regexps with accented characters - incompatible?

From
"Greg Sabino Mullane"
Date:



hubert depesz lubaczewski writes:
...
> return (shift =~ /[a-ząćęłńóśźżĄĆĘŁŃŚÓŹŻ0-9_-]+/i) || 0;
...
> 'require' trapped by operation mask at line 15.
>
> it looks strange - what "require"?

As you guessed, it's trying to load the utf8 pragma, and failing
because 'require' (and 'use') are not allowed by default: plperl uses the
Safe module to disallow things like 'require Module;'. Unfortunately, the
only way around it on your end is to use plperlu - something I recommend
anyway (for other reasons).
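
A minimal standalone sketch of that restriction, using Perl's core Safe module directly rather than plperl itself (an assumption: plperl's real container setup does more than this). A fresh compartment only permits the ':default' op set, which excludes the 'require' op, so even a require that would load nothing is refused. 'require 5.006' is used here purely because it needs no file on disk.

```perl
use strict;
use warnings;
use Safe;

# A fresh compartment permits only the ':default' op set, which does
# not include the 'require' op.
my $compartment = Safe->new;

# 'require VERSION' loads no file, so the operation mask is the only
# thing that can stop it.
my $result = $compartment->reval('require 5.006; 1');
my $error  = $@;

if (defined $result) {
    print "require was allowed\n";
}
else {
    # Same kind of message as in the original report:
    # 'require' trapped by operation mask at (eval N) line 1.
    print "refused: $error";
}
```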

> also - perhaps loading of this particular module should be allowed even in
> plperl? otherwise it requires me to use plperlu for even the simple task of
> regexp matching.

Yes, we might want to consider making utf8 come pre-loaded for plperl. There
is no direct or easy way to do it (we don't have finer-grained control than
the 'require' opcode), but we could probably dial back restrictions,
'use' it, and then reset the Safe container to its defaults. Not sure what
other problems that may cause, however. CCing to hackers for discussion
there.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200711121139
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8



Re: [HACKERS] plperl and regexps with accented characters - incompatible?

From
Andrew Dunstan
Date:


Greg Sabino Mullane wrote:
>
> Yes, we might want to consider making utf8 come pre-loaded for plperl. There
> is no direct or easy way to do it (we don't have finer-grained control than
> the 'require' opcode), but we could probably dial back restrictions,
> 'use' it, and then reset the Safe container to its defaults. Not sure what
> other problems that may cause, however. CCing to hackers for discussion
> there.
>
>
>

UTF8 is automatically on for strings passed to plperl if the db encoding
is UTF8. That includes the source text. Please be more precise about
what you want.

BTW, the perl docs say this about the utf8 pragma:

       Do not use this pragma for anything else than telling Perl that your
       script is written in UTF-8.

There should be no need to do that - we will have done it for you. So
any attempt to use the utf8 pragma in plperl code is probably broken anyway.
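
A standalone illustration of that distinction, outside plperl: decoding two UTF-8 bytes into one character is roughly the effect the pragma has on string literals, and roughly what is described above as already happening for text from a UTF8 database. It uses only the built-in utf8:: functions, which need no 'use utf8' (and therefore no require). The two byte values are the UTF-8 encoding of 'ą'.

```perl
use strict;
use warnings;

# 0xC4 0x85 is the UTF-8 encoding of U+0105 (LATIN SMALL LETTER A WITH OGONEK).
my $text = "\xc4\x85";
my $bytes_length = length $text;    # two bytes, not yet a character string

# Decode in place: roughly what 'use utf8' does to string literals in
# source code. utf8::decode is built in and needs no 'use utf8'.
utf8::decode($text);
my $chars_length = length $text;    # now a single character

# With real characters, //i treats U+0104 (capital A with ogonek) as
# the uppercase form of U+0105.
my $matches = ("\x{104}" =~ /\A\Q$text\E\z/i) ? 1 : 0;

print "bytes=$bytes_length chars=$chars_length match=$matches\n";
```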

cheers

andrew





Re: [HACKERS] plperl and regexps with accented characters - incompatible?

From
Andrew Dunstan
Date:

Andrew Dunstan wrote:
>
>
>
> Greg Sabino Mullane wrote:
>>
>> Yes, we might want to consider making utf8 come pre-loaded for
>> plperl. There is no direct or easy way to do it (we don't have
>> finer-grained control than the 'require' opcode), but we could
>> probably dial back restrictions, 'use' it, and then reset the Safe
>> container to its defaults. Not sure what other problems that may
>> cause, however. CCing to hackers for discussion there.
>>
>>
>>
>
> UTF8 is automatically on for strings passed to plperl if the db
> encoding is UTF8. That includes the source text. Please be more
> precise about what you want.
>
> BTW, the perl docs say this about the utf8 pragma:
>
>       Do not use this pragma for anything else than telling Perl that your
>       script is written in UTF-8.
>
> There should be no need to do that - we will have done it for you. So
> any attempt to use the utf8 pragma in plperl code is probably broken
> anyway.
>
>

Ugh, in testing I see some nastiness here without any explicit require.
It looks like there's an implicit require if the text contains certain
chars. I'll see what I can do to fix the bug, although I'm not sure if
it's possible.

cheers

andrew

Re: [HACKERS] plperl and regexps with accented characters - incompatible?

From
Andrew Dunstan
Date:

Andrew Dunstan wrote:
>
>
> Ugh, in testing I see some nastiness here without any explicit
> require. It looks like there's an implicit require if the text
> contains certain chars. I'll see what I can do to fix the bug,
> although I'm not sure if it's possible.
>
>

Looks like it's going to be very hard, unless someone has some brilliant
insight I'm missing :-(

Maybe we need to consult the perl coders.

cheers

andrew

Re: [HACKERS] plperl and regexps with accented characters - incompatible?

From
"Greg Sabino Mullane"
Date:


> Ugh, in testing I see some nastiness here without any explicit
> require. It looks like there's an implicit require if the text
> contains certain chars.

Exactly.

> Looks like it's going to be very hard, unless someone has some
> brilliant insight I'm missing :-(

The only way I see around it is to do:

$PLContainer->permit('require');
...
$PLContainer->reval('use utf8;');
...
$PLContainer->deny('require');

Not ideal. Part of me says we do this because something like //i
shouldn't suddenly fail just because you added an accented
character. The other part of me says to just have people use plperlu.
At the very least, we should probably mention it in the docs as
a gotcha.
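
A standalone sketch of the permit / reval / deny sequence above, again using Safe directly. `$compartment` stands in for plperl's `$PLContainer`, and the harmless 'require 5.006' is a hypothetical stand-in for `use utf8`, chosen so the example does not depend on what @INC looks like inside the compartment. It only shows that the mask can be opened and closed around trusted setup code; it says nothing about whether a later implicit require would then be satisfied, which is the part reported as failing in the next reply.

```perl
use strict;
use warnings;
use Safe;

my $compartment = Safe->new;            # stand-in for $PLContainer
my $code = 'require 5.006; 1';          # stand-in for 'use utf8;'

# 1. Default mask: the require op is refused.
my $before = $compartment->reval($code);

# 2. Open the mask just long enough to run trusted setup code...
$compartment->permit('require');
my $during = $compartment->reval($code);

# 3. ...then close it again for everything that follows.
$compartment->deny('require');
my $after = $compartment->reval($code);

for my $step ([before => $before], [during => $during], [after => $after]) {
    printf "%-6s %s\n", $step->[0], defined $step->[1] ? 'allowed' : 'refused';
}
```

It should report the require as refused, then allowed, then refused again.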

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200711132155
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8




Re: [HACKERS] plperl and regexps with accented characters - incompatible?

From
Andrew Dunstan
Date:

Greg Sabino Mullane wrote:
>> Ugh, in testing I see some nastiness here without any explicit
>> require. It looks like there's an implicit require if the text
>> contains certain chars.
>>
>
> Exactly.
>
>
>> Looks like it's going to be very hard, unless someone has some
>> brilliant insight I'm missing :-(
>>
>
> The only way I see around it is to do:
>
> $PLContainer->permit('require');
> ...
> $PLContainer->reval('use utf8;');
> ...
> $PLContainer->deny('require');
>
> Not ideal.

I tried something like that briefly and it failed. The trouble is, I
think, that since the engine tries a require it fails on the op test
before it even looks to see if the module is already loaded. If you have
made something work then please show me, no matter how grotty.
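
A small standalone check that is consistent with this explanation, assuming Safe behaves the same way outside plperl: the mask is applied while code is being compiled, so a require in a branch that can never run is still refused. No run-time step, such as checking whether a module is already loaded, gets a chance to happen first.

```perl
use strict;
use warnings;
use Safe;

my $compartment = Safe->new;

# The require sits in a branch that can never execute. If the mask were
# only consulted when a require actually ran, this would be accepted.
my $result = $compartment->reval('if (0) { require 5.006 } 1');
my $error  = $@;

if (defined $result) {
    print "accepted\n";
}
else {
    # Refused while the code is still being compiled.
    print "refused: $error";
}
```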

> Part of me says we do this because something like //i
> shouldn't suddenly fail just because you added an accented
> character. The other part of me says to just have people use plperlu.
> At the very least, we should probably mention it in the docs as
> a gotcha.
>
>

I think we should search harder for a solution, but I don't have time
right now. If you want to submit a warning for the docs in a patch we
can get that in.

cheers

andrew

Re: [HACKERS] plperl and regexps with accented characters - incompatible?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I tried something like that briefly and it failed. The trouble is, I
> think, that since the engine tries a require it fails on the op test
> before it even looks to see if the module is already loaded.

I think we have little choice but to report this as a Perl bug.  It
essentially means that a "safe" interpreter cannot decide to preload
modules that it thinks are safe; and to add insult to injury, the
engine is apparently trying to require utf8 in some very low-level,
hidden-behind-the-scenes place, yet using high-level trappable
operations to do that.  Maybe those are two different bugs.  Either
utf8 is part of the Perl core or it isn't; you can't have it both ways.

            regards, tom lane

Re: [HACKERS] plperl and regexps with accented characters - incompatible?

From
"Greg Sabino Mullane"
Date:


Just as a followup, I reported this as a bug and it is
being looked at and discussed:

http://rt.perl.org/rt3//Public/Bug/Display.html?id=47576

Appears there is no easy resolution yet.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200711281358
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8




Re: [HACKERS] plperl and regexps with accented characters - incompatible?

From
Andrew Dunstan
Date:

Greg Sabino Mullane wrote:
> Just as a followup, I reported this as a bug and it is
> being looked at and discussed:
>
> http://rt.perl.org/rt3//Public/Bug/Display.html?id=47576
>
> Appears there is no easy resolution yet.
>
>
>

We might be able to do something with the suggested workaround. I will
see what I can do, unless you have already tried.

cheers

andrew

Re: [HACKERS] plperl and regexps with accented characters - incompatible?

From
Andrew Dunstan
Date:

Andrew Dunstan wrote:
>
>
> Greg Sabino Mullane wrote:
>> Just as a followup, I reported this as a bug and it is being looked
>> at and discussed:
>>
>> http://rt.perl.org/rt3//Public/Bug/Display.html?id=47576
>>
>> Appears there is no easy resolution yet.
>>
>>
>>
>
> We might be able to do something with the suggested workaround. I will
> see what I can do, unless you have already tried.
>
>

OK, I have a fairly ugly manual workaround that I don't yet understand,
but that seems to work for me.

In your session, run the following code before you do anything else:

CREATE OR REPLACE FUNCTION test(text) RETURNS bool LANGUAGE plperl as $$
return shift =~ /\xa9/i ? 'true' : 'false';
$$;
SELECT test('a');
DROP FUNCTION test(text);

After that we seem to be good to go with any old UTF8 chars.

I'm looking at automating this so the workaround can be hidden, but I'd
rather understand it first.
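
A rough standalone analogue of the warm-up idea, under two assumptions: that Safe behaves here as it does inside plperl, and that using code points above 255 (rather than the \xa9 in the function above) is enough to push the match onto Perl's Unicode path outside a UTF8 database. The case-insensitive match runs first in the unrestricted interpreter, while 'require' is still allowed, and only then is the compartment created. On recent Perl releases the compartment match may well succeed without the warm-up, so this shows the ordering rather than proving the warm-up is needed.

```perl
use strict;
use warnings;
use Safe;

# Warm-up in the unrestricted interpreter (hypothetical stand-in for
# SELECT test('a')): a case-insensitive match on characters above 255,
# so anything Perl loads lazily for Unicode case folding is loaded
# while 'require' is still allowed.
my $warm = ("\x{104}" =~ /\x{105}/i) ? 1 : 0;

# Only now create the restricted compartment and run user-style code.
my $compartment = Safe->new;
my $result = $compartment->reval('"\x{104}" =~ /\x{105}/i ? 1 : 0');
my $error  = $@;

print defined $result ? "match=$result\n" : "failed: $error";
```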

(Core guys: If we can hold RC1 for a bit while I get this fixed that
would be good.)

cheers

andrew