Thread: Re: [GENERAL] plperl and regexps with accented characters - incompatible?
Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
"Greg Sabino Mullane"
Date:
hubert depesz lubaczewski writes:
...
> return (shift =~ /[a-z0-9_-]+/i) || 0;
...
> 'require' trapped by operation mask at line 15.
>
> it looks strange - what "require"?

As you guessed, it's trying to load the utf8 pragma, and failing because
'require' (and 'use') are not allowed by default: plperl uses the Safe
module to disallow things like 'require Module;'. Unfortunately, the only
way around it on your end is to use plperlu - something I recommend anyway
(for other reasons).

> also - perhaps loading of this particular module should be allowed even in
> plperl? otherwise it requires me to use plperlu for even the simple task of
> regexp matching.

Yes, we might want to consider making utf8 come pre-loaded for plperl. There
is no direct or easy way to do it (we don't have finer-grained control than
the 'require' opcode), but we could probably dial back restrictions,
'use' it, and then reset the Safe container to its defaults. Not sure what
other problems that may cause, however. CCing to hackers for discussion
there.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200711121139
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Greg Sabino Mullane wrote:
>
> Yes, we might want to consider making utf8 come pre-loaded for plperl. There
> is no direct or easy way to do it (we don't have finer-grained control than
> the 'require' opcode), but we could probably dial back restrictions,
> 'use' it, and then reset the Safe container to its defaults. Not sure what
> other problems that may cause, however. CCing to hackers for discussion
> there.
>

UTF8 is automatically on for strings passed to plperl if the db encoding
is UTF8. That includes the source text. Please be more precise about what
you want.

BTW, the perl docs say this about the utf8 pragma:

    Do not use this pragma for anything else than telling Perl that your
    script is written in UTF-8.

There should be no need to do that - we will have done it for you. So any
attempt to use the utf8 pragma in plperl code is probably broken anyway.

cheers

andrew
Andrew Dunstan wrote:
>
> Greg Sabino Mullane wrote:
>>
>> Yes, we might want to consider making utf8 come pre-loaded for
>> plperl. There is no direct or easy way to do it (we don't have
>> finer-grained control than the 'require' opcode), but we could
>> probably dial back restrictions, 'use' it, and then reset the Safe
>> container to its defaults. Not sure what other problems that may
>> cause, however. CCing to hackers for discussion there.
>>
>
> UTF8 is automatically on for strings passed to plperl if the db
> encoding is UTF8. That includes the source text. Please be more
> precise about what you want.
>
> BTW, the perl docs say this about the utf8 pragma:
>
>    Do not use this pragma for anything else than telling Perl that
>    your script is written in UTF-8.
>
> There should be no need to do that - we will have done it for you. So
> any attempt to use the utf8 pragma in plperl code is probably broken
> anyway.
>

Ugh, in testing I see some nastiness here without any explicit require.
It looks like there's an implicit require if the text contains certain
chars. I'll see what I can do to fix the bug, although I'm not sure if
it's possible.

cheers

andrew
Andrew Dunstan wrote:
>
> Ugh, in testing I see some nastiness here without any explicit
> require. It looks like there's an implicit require if the text
> contains certain chars. I'll see what I can do to fix the bug,
> although I'm not sure if it's possible.
>

Looks like it's going to be very hard, unless someone has some brilliant
insight I'm missing :-(

Maybe we need to consult the perl coders.

cheers

andrew
Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
"Greg Sabino Mullane"
Date:
> Ugh, in testing I see some nastiness here without any explicit
> require. It looks like there's an implicit require if the text
> contains certain chars.

Exactly.

> Looks like it's going to be very hard, unless someone has some
> brilliant insight I'm missing :-(

The only way I see around it is to do:

$PLContainer->permit('require');
...
$PLContainer->reval('use utf8;');
...
$PLContainer->deny('require');

Not ideal. Part of me says we do this because something like //i
shouldn't suddenly fail just because you added an accented
character. The other part of me says to just have people use plperlu.
At the very least, we should probably mention it in the docs as
a gotcha.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200711132155
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Greg Sabino Mullane wrote:
>> Ugh, in testing I see some nastiness here without any explicit
>> require. It looks like there's an implicit require if the text
>> contains certain chars.
>
> Exactly.
>
>> Looks like it's going to be very hard, unless someone has some
>> brilliant insight I'm missing :-(
>
> The only way I see around it is to do:
>
> $PLContainer->permit('require');
> ...
> $PLContainer->reval('use utf8;');
> ...
> $PLContainer->deny('require');
>
> Not ideal.

I tried something like that briefly and it failed. The trouble is, I
think, that since the engine tries a require it fails on the op test
before it even looks to see if the module is already loaded. If you have
made something work then please show me, no matter how grotty.

> Part of me says we do this because something like //i
> shouldn't suddenly fail just because you added an accented
> character. The other part of me says to just have people use plperlu.
> At the very least, we should probably mention it in the docs as
> a gotcha.
>

I think we should search harder for a solution, but I don't have time
right now. If you want to submit a warning for the docs in a patch we can
get that in.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> I tried something like that briefly and it failed. The trouble is, I
> think, that since the engine tries a require it fails on the op test
> before it even looks to see if the module is already loaded.

I think we have little choice but to report this as a Perl bug.
It essentially means that a "safe" interpreter cannot decide to preload
modules that it thinks are safe; and to add insult to injury, the engine
is apparently trying to require utf8 in some very low-level,
hidden-behind-the-scenes place, yet using high-level trappable
operations to do that. Maybe those are two different bugs. Either
utf8 is part of the Perl core or it isn't; you can't have it both ways.

regards, tom lane
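
The behaviour described above (the opcode check rejecting 'require' before Perl ever looks at whether the module is loaded) can probably be reproduced outside PostgreSQL with the Safe module alone. The following is only a minimal sketch, not plperl's code: $cpt is a hypothetical stand-in for plperl's $PLContainer, and it assumes a stock Safe compartment with its default opcode mask.

```perl
use strict;
use warnings;
use utf8;    # load the utf8 pragma in the outer interpreter first
use Safe;

# Hypothetical stand-in for plperl's $PLContainer.
my $cpt = Safe->new;

# 'require' is an opcode, and the default mask rejects it while the
# snippet is being compiled, i.e. before Perl would consult %INC to see
# that utf8 has already been loaded by the outer interpreter.
my $ok  = $cpt->reval('require utf8; 1');
my $err = $@;

print defined $ok ? "require allowed\n" : "require refused: $err";
```

With a default compartment this should report the same "'require' trapped by operation mask" error quoted at the start of the thread, even though utf8 was loaded before the compartment was created.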
Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
"Greg Sabino Mullane"
Date:
Just as a followup, I reported this as a bug and it is
being looked at and discussed:

http://rt.perl.org/rt3//Public/Bug/Display.html?id=47576

Appears there is no easy resolution yet.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200711281358
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Greg Sabino Mullane wrote:
> Just as a followup, I reported this as a bug and it is
> being looked at and discussed:
>
> http://rt.perl.org/rt3//Public/Bug/Display.html?id=47576
>
> Appears there is no easy resolution yet.
>

We might be able to do something with the suggested workaround. I will
see what I can do, unless you have already tried.

cheers

andrew
Andrew Dunstan wrote:
>
> Greg Sabino Mullane wrote:
>> Just as a followup, I reported this as a bug and it is being looked
>> at and discussed:
>>
>> http://rt.perl.org/rt3//Public/Bug/Display.html?id=47576
>>
>> Appears there is no easy resolution yet.
>>
>
> We might be able to do something with the suggested workaround. I will
> see what I can do, unless you have already tried.
>

OK, I have a fairly ugly manual workaround, that I don't yet understand,
but seems to work for me.

In your session, run the following code before you do anything else:

CREATE OR REPLACE FUNCTION test(text) RETURNS bool LANGUAGE plperl as $$
return shift =~ /\xa9/i ? 'true' : 'false';
$$;
SELECT test('a');
DROP FUNCTION test(text);

After that we seem to be good to go with any old UTF8 chars.

I'm looking at automating this so the workaround can be hidden, but I'd
rather understand it first.

(Core guys: If we can hold RC1 for a bit while I get this fixed that
would be good.)

cheers

andrew
Andrew Dunstan wrote:
>
> Andrew Dunstan wrote:
>>
>> Greg Sabino Mullane wrote:
>>> Just as a followup, I reported this as a bug and it is being looked
>>> at and discussed:
>>>
>>> http://rt.perl.org/rt3//Public/Bug/Display.html?id=47576
>>>
>>> Appears there is no easy resolution yet.
>>>
>>
>> We might be able to do something with the suggested workaround. I
>> will see what I can do, unless you have already tried.
>>
>
> OK, I have a fairly ugly manual workaround, that I don't yet
> understand, but seems to work for me.
>
> In your session, run the following code before you do anything else:
>
> CREATE OR REPLACE FUNCTION test(text) RETURNS bool LANGUAGE plperl as $$
> return shift =~ /\xa9/i ? 'true' : 'false';
> $$;
> SELECT test('a');
> DROP FUNCTION test(text);
>
> After that we seem to be good to go with any old UTF8 chars.
>
> I'm looking at automating this so the workaround can be hidden, but
> I'd rather understand it first.
>
> (Core guys: If we can hold RC1 for a bit while I get this fixed that
> would be good.)
>

The attached patch works for me to eliminate the errors. Please test ASAP.

cheers

andrew

Index: src/pl/plperl/plperl.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plperl/plperl.c,v
retrieving revision 1.132
diff -c -r1.132 plperl.c
*** src/pl/plperl/plperl.c	15 Nov 2007 22:25:17 -0000	1.132
--- src/pl/plperl/plperl.c	29 Nov 2007 05:32:22 -0000
***************
*** 149,154 ****
--- 149,156 ----
  static SV  *newSVstring(const char *str);
  static SV **hv_store_string(HV *hv, const char *key, SV *val);
  static SV **hv_fetch_string(HV *hv, const char *key);
+ static SV *plperl_create_sub(char *proname, char *s, bool trusted);
+ static SV *plperl_call_perl_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo);
  
  /*
   * This routine is a crock, and so is everyplace that calls it.  The problem
***************
*** 504,509 ****
--- 506,558 ----
  	else
  	{
  		eval_pv(SAFE_OK, FALSE);
+ 		if (GetDatabaseEncoding() == PG_UTF8)
+ 		{
+
+ 			/*
+ 			 * Fill in just enough information to set up this perl
+ 			 * function in the safe container and call it.
+ 			 * For some reason not entirely clear, it prevents errors that
+ 			 * can arise from the regex code later trying to load
+ 			 * utf8 modules.
+ 			 */
+
+ 			plperl_proc_desc desc;
+ 			FunctionCallInfoData fcinfo;
+ 			FmgrInfo outfunc;
+ 			HeapTuple   typeTup;
+ 			Form_pg_type typeStruct;
+ 			SV *ret;
+ 			SV *func;
+
+ 			/* make sure we don't call ourselves recursively */
+ 			plperl_safe_init_done = true;
+
+ 			/* compile the function */
+ 			func = plperl_create_sub(
+ 				"utf8fix",
+ 				"return shift =~ /\\xa9/i ? 'true' : 'false' ;",
+ 				true);
+
+
+ 			/* set up to call the function with a single text argument 'a' */
+ 			desc.reference = func;
+ 			desc.nargs = 1;
+ 			desc.arg_is_rowtype[0] = false;
+ 			fcinfo.argnull[0] = false;
+ 			fcinfo.arg[0] =
+ 				DatumGetTextP(DirectFunctionCall1(textin,
+ 												  CStringGetDatum("a")));
+ 			typeTup = SearchSysCache(TYPEOID,
+ 									 TEXTOID,
+ 									 0, 0, 0);
+ 			typeStruct = (Form_pg_type) GETSTRUCT(typeTup);
+ 			fmgr_info(typeStruct->typoutput,&(desc.arg_out_func[0]));
+ 			ReleaseSysCache(typeTup);
+
+ 			/* and make the call */
+ 			ret = plperl_call_perl_func(&desc,&fcinfo);
+ 		}
  	}
  
  	plperl_safe_init_done = true;
Re: [PATCHES] Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> + 			 * Fill in just enough information to set up this perl
> + 			 * function in the safe container and call it.
> + 			 * For some reason not entirely clear, it prevents errors that
> + 			 * can arise from the regex code later trying to load
> + 			 * utf8 modules.

How many versions of Perl have you tried this against?

regards, tom lane
Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
hubert depesz lubaczewski
Date:
On Thu, Nov 29, 2007 at 12:39:30AM -0500, Andrew Dunstan wrote:
> The attached patch works for me to eliminate the errors. Please test ASAP.

tested, works for me:

#v+
# CREATE OR REPLACE FUNCTION test(TEXT) RETURNS bool language plperl as $$
return (shift =~ /[a-ząćęłńóśźżĄĆĘŁŃŚÓŹŻ0-9_-]+/i) || 0;
$$;
CREATE FUNCTION
# select test('depesz');
 test
------
 t
(1 row)

# select test('depesząćęł');
 test
------
 t
(1 row)

# select test('depesząćęł$');
 test
------
 t
(1 row)

# select test('dePEsząĆęł$');
 test
------
 t
(1 row)
#v-

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - a blog for you (and my CV)
Re: [PATCHES] Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
Andrew Dunstan
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> + 			 * Fill in just enough information to set up this perl
>> + 			 * function in the safe container and call it.
>> + 			 * For some reason not entirely clear, it prevents errors that
>> + 			 * can arise from the regex code later trying to load
>> + 			 * utf8 modules.
>>
>
> How many versions of Perl have you tried this against?
>

Only one :-( I don't have a farm of perl versions hanging round. That's
one of the reasons I asked that people test it.

The version I tested against is 5.8.8 - the latest stable release. The
5.8 series started in 2003 from what I can see - if anyone has a
sufficiently old system that they can test on 5.6.2 that will be useful.
I spent an hour wrestling unsuccessfully with it this morning but I don't
have more time to spend on it. Systems older than 5.6 don't matter, as we
don't do any UTF8 mangling on those.

cheers

andrew
Re: [PATCHES] Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
Kris Jurka
Date:
On Thu, 29 Nov 2007, Andrew Dunstan wrote:

> The version I tested against is 5.8.8 - the latest stable release. The
> 5.8 series started in 2003 from what I can see - if anyone has a
> sufficiently old system that they can test on 5.6.2 that will be useful.

I've got a 5.6.1 perl here, but it wasn't built shared, so I can't test
plperl. I ran the test case Greg posted to the perl bug tracker and it
doesn't fail, so unless you're concerned that your change will break 5.6,
then it doesn't look like 5.6 needs a fix.

Kris Jurka
I wrote:
>
>> OK, I have a fairly ugly manual workaround, that I don't yet
>> understand, but seems to work for me.
>>
>> In your session, run the following code before you do anything else:
>>
>> CREATE OR REPLACE FUNCTION test(text) RETURNS bool LANGUAGE plperl as $$
>> return shift =~ /\xa9/i ? 'true' : 'false';
>> $$;
>> SELECT test('a');
>> DROP FUNCTION test(text);
>>
>> After that we seem to be good to go with any old UTF8 chars.
>>
>> I'm looking at automating this so the workaround can be hidden, but
>> I'd rather understand it first.
>>
>> (Core guys: If we can hold RC1 for a bit while I get this fixed that
>> would be good.)
>>
>
> The attached patch works for me to eliminate the errors. Please test
> ASAP.
>

Given our time constraints I intend to apply this to HEAD and backpatch
it to 8.2 and 8.1, unless there's a strenuous objection. That will give
us some buildfarm coverage on it, although we don't seem to have any perl
5.6.x on the buildfarm that I could see. We've had a positive test
report, no negative reports, and I'm fairly sure the patch is at worst
harmless.

cheers

andrew
Re: [PATCHES] Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> The version I tested against is 5.8.8 - the latest stable release. The
> 5.8 series started in 2003 from what I can see - if anyone has a
> sufficiently old system that they can test on 5.6.2 that will be useful.

I got around to trying it with a dusty 5.6.1 I have laying about on my
HPPA machine, and the news is not good: CREATE LANGUAGE plperl dumps
core deep inside libperl. With or without this patch.

As best I can tell at the moment, I have not tested 5.6.1 with anything
later than our 7.2 branch, so I don't know exactly where the breakage
slipped in. It may be of long standing.

Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.

warning: Can't find file postmaster referenced in dld_list.
Reading symbols from /usr/lib/libxnet.1...done.
Reading symbols from /usr/lib/libc.1...done.
Reading symbols from /usr/lib/libdld.1...done.
Reading symbols from /home/postgres/testversion/lib/plperl.sl...done.
Reading symbols from /opt/perl5.6.1/lib/5.6.1/PA-RISC2.0/CORE/libperl.sl...done.
Reading symbols from /usr/lib/libnsl_s.1...done.
Reading symbols from /usr/lib/libM.1...done.
Reading symbols from /usr/lib/libsec.1...done.
#0  0xc00a02fc in ?? () from /usr/lib/libc.1
(gdb) bt
#0  0xc00a02fc in ?? () from /usr/lib/libc.1
#1  0xc6fc3bb4 in ?? ()
   from /opt/perl5.6.1/lib/5.6.1/PA-RISC2.0/CORE/libperl.sl
#2  0xc6f5a99c in ?? ()
   from /opt/perl5.6.1/lib/5.6.1/PA-RISC2.0/CORE/libperl.sl
#3  0xc6f570a4 in ?? ()
   from /opt/perl5.6.1/lib/5.6.1/PA-RISC2.0/CORE/libperl.sl
#4  0xc6f56c88 in ?? ()
   from /opt/perl5.6.1/lib/5.6.1/PA-RISC2.0/CORE/libperl.sl
#5  0xc0d2b660 in plperl_init_interp () at plperl.c:429
#6  0xc0d2b2bc in _PG_init () at plperl.c:213
#7  0x3ce9f0 in internal_load_library (
    libname=0xf8 <Address 0xf8 out of bounds>) at dfmgr.c:296
#8  0x3ce4a4 in load_external_function (filename=0x4016086c " \037(",
    funcname=0x40062924 "", signalNotFound=1 '\001', filehandle=0x7b03bb8c)
    at dfmgr.c:110
#9  0x1af7fc in fmgr_c_validator (fcinfo=0x4016086c) at pg_proc.c:509
#10 0x3d1a98 in OidFunctionCall1 (functionId=1075185772, arg1=49153)
    at fmgr.c:1527
#11 0x1af530 in ProcedureCreate (
    procedureName=0x401075f8 "plperl_call_handler", procNamespace=11,
    replace=0 '\000', returnsSet=0 '\000', returnType=2280,
    languageObjectId=13, languageValidator=2247,

regards, tom lane
Re: [PATCHES] Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
"Greg Sabino Mullane"
Date:
Tom Lane reports:
>> The version I tested against is 5.8.8 - the latest stable release. The
>> 5.8 series started in 2003 from what I can see - if anyone has a
>> sufficiently old system that they can test on 5.6.2 that will be useful.

> I got around to trying it with a dusty 5.6.1 I have laying about on my
> HPPA machine, and the news is not good: CREATE LANGUAGE plperl dumps
> core deep inside libperl. With or without this patch.

Sounds like another good reason to start enforcing a minimum modern Perl
version. In the past, I advocated making it a minimum of 5.6, but now I
think a minimum of 5.8 is in order. The first version of 5.8 was released
in July of 2002, so I don't think we'd be upsetting very many people if
we did so. Plus, they'd be potentially dumping core anyway, and our energy
is better spent improving Pl/Perl itself at this point rather than
tweaking things for old versions of Perl. I don't even think I have a
pre 5.8 version around anymore. Would such a requirement cause any
problems with packagers? I imagine a perl 5.8 prereq is a common thing
these days...

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200712011322
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: [PATCHES] Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
Tom Lane
Date:
I wrote:
> I got around to trying it with a dusty 5.6.1 I have laying about on my
> HPPA machine, and the news is not good: CREATE LANGUAGE plperl dumps
> core deep inside libperl. With or without this patch.

> As best I can tell at the moment, I have not tested 5.6.1 with anything
> later than our 7.2 branch, so I don't know exactly where the breakage
> slipped in. It may be of long standing.

Actually, libperl seems to dump core in the same place in every PG
version, back to and including 7.2, so what seems more likely is that
this copy of perl is just plain broken. Since we didn't have any form
of regression test for plperl back then, it's entirely possible that
I never tested any further than compiling plperl with that setup.

So we still need someone to try it with a good copy of 5.6 ...

regards, tom lane
Re: [PATCHES] Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
Andrew Dunstan
Date:
Tom Lane wrote:
> I wrote:
>
>> I got around to trying it with a dusty 5.6.1 I have laying about on my
>> HPPA machine, and the news is not good: CREATE LANGUAGE plperl dumps
>> core deep inside libperl. With or without this patch.
>>
>
>> As best I can tell at the moment, I have not tested 5.6.1 with anything
>> later than our 7.2 branch, so I don't know exactly where the breakage
>> slipped in. It may be of long standing.
>>
>
> Actually, libperl seems to dump core in the same place in every PG
> version, back to and including 7.2, so what seems more likely is that
> this copy of perl is just plain broken. Since we didn't have any form
> of regression test for plperl back then, it's entirely possible that
> I never tested any further than compiling plperl with that setup.
>
> So we still need someone to try it with a good copy of 5.6 ...
>

OK, I have built a fresh copy of perl 5.6.2 and built and linked HEAD
against it. It passes the regression tests and the UTF8 test, and doesn't
dump core. This is on FC6/x86_64.

cheers

andrew
Re: [PATCHES] Re: [GENERAL] plperl and regexps with accented characters - incompatible?
From
Steve Singer
Date:
On Sat, 1 Dec 2007, Tom Lane wrote:

> I wrote:
>> I got around to trying it with a dusty 5.6.1 I have laying about on my
>> HPPA machine, and the news is not good: CREATE LANGUAGE plperl dumps
>> core deep inside libperl. With or without this patch.
>
>> As best I can tell at the moment, I have not tested 5.6.1 with anything
>> later than our 7.2 branch, so I don't know exactly where the breakage
>> slipped in. It may be of long standing.
>
> Actually, libperl seems to dump core in the same place in every PG
> version, back to and including 7.2, so what seems more likely is that
> this copy of perl is just plain broken. Since we didn't have any form
> of regression test for plperl back then, it's entirely possible that
> I never tested any further than compiling plperl with that setup.
>
> So we still need someone to try it with a good copy of 5.6 ...

I tested cvs head which includes the patch on Solaris 9/SPARC with perl
5.6.1 and it seems to work fine. Test output attached.

Steve

>
> regards, tom lane
>