Thread: Re: [HACKERS] [GENERAL] plperl and regexps with accented characters - incompatible?

From: Andrew Dunstan

Andrew Dunstan wrote:
>
>
> Andrew Dunstan wrote:
>>
>>
>> Greg Sabino Mullane wrote:
>>> Just as a followup, I reported this as a bug and it is being looked
>>> at and discussed:
>>>
>>> http://rt.perl.org/rt3//Public/Bug/Display.html?id=47576
>>>
>>> Appears there is no easy resolution yet.
>>>
>>>
>>>
>>
>> We might be able to do something with the suggested workaround. I
>> will see what I can do, unless you have already tried.
>>
>>
>
> OK, I have a fairly ugly manual workaround, that I don't yet
> understand, but seems to work for me.
>
> In your session, run the following code before you do anything else:
>
> CREATE OR REPLACE FUNCTION test(text) RETURNS bool LANGUAGE plperl as $$
> return shift =~ /\xa9/i ? 'true' : 'false';
> $$;
> SELECT test('a');
> DROP FUNCTION test(text);
>
> After that we seem to be good to go with any old UTF8 chars.
>
> I'm looking at automating this so the workaround can be hidden, but
> I'd rather understand it first.
>
> (Core guys: If we can hold RC1 for a bit while I get this fixed that
> would be good.)
>
>

The attached patch works for me to eliminate the errors. Please test ASAP.

cheers

andrew
Index: src/pl/plperl/plperl.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plperl/plperl.c,v
retrieving revision 1.132
diff -c -r1.132 plperl.c
*** src/pl/plperl/plperl.c    15 Nov 2007 22:25:17 -0000    1.132
--- src/pl/plperl/plperl.c    29 Nov 2007 05:32:22 -0000
***************
*** 149,154 ****
--- 149,156 ----
  static SV  *newSVstring(const char *str);
  static SV **hv_store_string(HV *hv, const char *key, SV *val);
  static SV **hv_fetch_string(HV *hv, const char *key);
+ static SV  *plperl_create_sub(char *proname, char *s, bool trusted);
+ static SV  *plperl_call_perl_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo);

  /*
   * This routine is a crock, and so is everyplace that calls it.  The problem
***************
*** 504,509 ****
--- 506,558 ----
      else
      {
          eval_pv(SAFE_OK, FALSE);
+         if (GetDatabaseEncoding() == PG_UTF8)
+         {
+
+             /*
+              * Fill in just enough information to set up this perl
+              * function in the safe container and call it.
+              * For some reason not entirely clear, it prevents errors that
+              * can arise from the regex code later trying to load
+              * utf8 modules.
+              */
+
+             plperl_proc_desc desc;
+             FunctionCallInfoData fcinfo;
+             FmgrInfo outfunc;
+             HeapTuple   typeTup;
+             Form_pg_type typeStruct;
+             SV *ret;
+             SV *func;
+
+             /* make sure we don't call ourselves recursively */
+             plperl_safe_init_done = true;
+
+             /* compile the function */
+             func = plperl_create_sub(
+                 "utf8fix",
+                 "return shift =~ /\\xa9/i ? 'true' : 'false' ;",
+                 true);
+
+
+             /* set up to call the function with a single text argument 'a' */
+             desc.reference = func;
+             desc.nargs = 1;
+             desc.arg_is_rowtype[0] = false;
+             fcinfo.argnull[0] = false;
+             fcinfo.arg[0] =
+                 DatumGetTextP(DirectFunctionCall1(textin,
+                                                   CStringGetDatum("a")));
+             typeTup = SearchSysCache(TYPEOID,
+                                      TEXTOID,
+                                      0, 0, 0);
+             typeStruct = (Form_pg_type) GETSTRUCT(typeTup);
+             fmgr_info(typeStruct->typoutput,&(desc.arg_out_func[0]));
+             ReleaseSysCache(typeTup);
+
+             /* and make the call */
+             ret = plperl_call_perl_func(&desc,&fcinfo);
+         }
      }

      plperl_safe_init_done = true;
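
The patch sets plperl_safe_init_done before compiling the warm-up function so that the compile path cannot re-enter the initialisation routine. Below is a minimal standalone sketch of that guard pattern. The names (safe_init, call_trusted_function, the two counters) are hypothetical stand-ins, not the real plperl.c symbols, and the sketch assumes that the call path triggers initialisation whenever the flag is unset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for plperl state; not the real plperl.c symbols. */
static bool safe_init_done = false;
static int  init_runs = 0;     /* times the initialisation body has run */
static int  warmup_calls = 0;  /* times a trusted function has been called */

static void safe_init(void);

/*
 * Stand-in for compiling/calling a function in the trusted interpreter.
 * Assumption: this path lazily runs initialisation if it has not happened.
 */
static void call_trusted_function(void)
{
    if (!safe_init_done)
        safe_init();
    warmup_calls++;
}

static void safe_init(void)
{
    /* Initialisation must never be entered a second time. */
    assert(!safe_init_done);
    init_runs++;

    /*
     * Set the flag BEFORE the warm-up call. Without this, the nested
     * call_trusted_function() would see the flag unset and recurse
     * back into safe_init().
     */
    safe_init_done = true;

    /* The warm-up call (the "utf8fix" function in the patch). */
    call_trusted_function();
}
```

The first call to call_trusted_function() runs the initialisation once, including the nested warm-up call; later calls skip it entirely.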

Andrew Dunstan <andrew@dunslane.net> writes:
> +              * Fill in just enough information to set up this perl
> +              * function in the safe container and call it.
> +              * For some reason not entirely clear, it prevents errors that
> +              * can arise from the regex code later trying to load
> +              * utf8 modules.

How many versions of Perl have you tried this against?

            regards, tom lane

From: hubert depesz lubaczewski

On Thu, Nov 29, 2007 at 12:39:30AM -0500, Andrew Dunstan wrote:
> The attached patch works for me to eliminate the errors. Please test ASAP.

tested, works for me:
#v+
# CREATE OR REPLACE FUNCTION test(TEXT) RETURNS bool language plperl as $$
return (shift =~ /[a-ząćęłńóśźżĄĆĘŁŃŚÓŹŻ0-9_-]+/i) || 0;
$$;
CREATE FUNCTION

# select test('depesz');
 test
------
 t
(1 row)

# select test('depesząćęł');
 test
------
 t
(1 row)

# select test('depesząćęł$');
 test
------
 t
(1 row)

# select test('dePEsząĆęł$');
 test
------
 t
(1 row)
#v-

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)


Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> +              * Fill in just enough information to set up this perl
>> +              * function in the safe container and call it.
>> +              * For some reason not entirely clear, it prevents errors that
>> +              * can arise from the regex code later trying to load
>> +              * utf8 modules.
>>
>
> How many versions of Perl have you tried this against?
>
>
>

Only one :-( I don't have a farm of perl versions hanging round. That's
one of the reasons I asked that people test it.

The version I tested against is 5.8.8 - the latest stable release. The
5.8 series started in 2003 from what I can see - if anyone has a
sufficiently old system that they can test on 5.6.2 that will be useful.
I spent an hour wrestling unsuccessfully with it this morning but I
don't have more time to spend on it. Systems older than 5.6 don't
matter, as we don't do any UTF8 mangling on those.

cheers

andrew




On Thu, 29 Nov 2007, Andrew Dunstan wrote:

> The version I tested against is 5.8.8 - the latest stable release. The
> 5.8 series started in 2003 from what I can see - if anyone has a
> sufficiently old system that they can test on 5.6.2 that will be useful.

I've got a 5.6.1 perl here, but it wasn't built shared, so I can't test
plperl.  I ran the test case Greg posted to the perl bug tracker and it
doesn't fail, so unless you're concerned that your change will break 5.6,
it doesn't look like 5.6 needs a fix.

Kris Jurka

From: Andrew Dunstan

I wrote:
>
>>
>> OK, I have a fairly ugly manual workaround, that I don't yet
>> understand, but seems to work for me.
>>
>> In your session, run the following code before you do anything else:
>>
>> CREATE OR REPLACE FUNCTION test(text) RETURNS bool LANGUAGE plperl as $$
>> return shift =~ /\xa9/i ? 'true' : 'false';
>> $$;
>> SELECT test('a');
>> DROP FUNCTION test(text);
>>
>> After that we seem to be good to go with any old UTF8 chars.
>>
>> I'm looking at automating this so the workaround can be hidden, but
>> I'd rather understand it first.
>>
>> (Core guys: If we can hold RC1 for a bit while I get this fixed that
>> would be good.)
>>
>>
>
> The attached patch works for me to eliminate the errors. Please test
> ASAP.
>
>

Given our time constraints I intend to apply this to HEAD and backpatch
it to 8.2 and 8.1, unless there's a strenuous objection. That will give
us some buildfarm coverage on it, although we don't seem to have any
perl 5.6.x on the buildfarm that I could see. We've had a positive test
report, no negative reports, and I'm fairly sure the patch is at worst
harmless.


cheers

andrew

Andrew Dunstan <andrew@dunslane.net> writes:
> The version I tested against is 5.8.8 -  the latest stable release. The
> 5.8 series started in 2003 from what I can see - if anyone has a
> sufficiently old system that they can test on 5.6.2 that will be useful.

I got around to trying it with a dusty 5.6.1 I have laying about on my
HPPA machine, and the news is not good: CREATE LANGUAGE plperl dumps
core deep inside libperl.  With or without this patch.

As best I can tell at the moment, I have not tested 5.6.1 with anything
later than our 7.2 branch, so I don't know exactly where the breakage
slipped in.  It may be of long standing.

Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.

warning: Can't find file postmaster referenced in dld_list.
Reading symbols from /usr/lib/libxnet.1...done.
Reading symbols from /usr/lib/libc.1...done.
Reading symbols from /usr/lib/libdld.1...done.
Reading symbols from /home/postgres/testversion/lib/plperl.sl...done.
Reading symbols from /opt/perl5.6.1/lib/5.6.1/PA-RISC2.0/CORE/libperl.sl...done.
Reading symbols from /usr/lib/libnsl_s.1...done.
Reading symbols from /usr/lib/libM.1...done.
Reading symbols from /usr/lib/libsec.1...done.
#0  0xc00a02fc in ?? () from /usr/lib/libc.1
(gdb) bt
#0  0xc00a02fc in ?? () from /usr/lib/libc.1
#1  0xc6fc3bb4 in ?? ()
   from /opt/perl5.6.1/lib/5.6.1/PA-RISC2.0/CORE/libperl.sl
#2  0xc6f5a99c in ?? ()
   from /opt/perl5.6.1/lib/5.6.1/PA-RISC2.0/CORE/libperl.sl
#3  0xc6f570a4 in ?? ()
   from /opt/perl5.6.1/lib/5.6.1/PA-RISC2.0/CORE/libperl.sl
#4  0xc6f56c88 in ?? ()
   from /opt/perl5.6.1/lib/5.6.1/PA-RISC2.0/CORE/libperl.sl
#5  0xc0d2b660 in plperl_init_interp () at plperl.c:429
#6  0xc0d2b2bc in _PG_init () at plperl.c:213
#7  0x3ce9f0 in internal_load_library (
    libname=0xf8 <Address 0xf8 out of bounds>) at dfmgr.c:296
#8  0x3ce4a4 in load_external_function (filename=0x4016086c "  \037(",
    funcname=0x40062924 "", signalNotFound=1 '\001', filehandle=0x7b03bb8c)
    at dfmgr.c:110
#9  0x1af7fc in fmgr_c_validator (fcinfo=0x4016086c) at pg_proc.c:509
#10 0x3d1a98 in OidFunctionCall1 (functionId=1075185772, arg1=49153)
    at fmgr.c:1527
#11 0x1af530 in ProcedureCreate (
    procedureName=0x401075f8 "plperl_call_handler", procNamespace=11,
    replace=0 '\000', returnsSet=0 '\000', returnType=2280,
    languageObjectId=13, languageValidator=2247,

            regards, tom lane

From: Greg Sabino Mullane


Tom Lane reports:

>> The version I tested against is 5.8.8 -  the latest stable release. The
>> 5.8 series started in 2003 from what I can see - if anyone has a
>> sufficiently old system that they can test on 5.6.2 that will be useful.

> I got around to trying it with a dusty 5.6.1 I have laying about on my
> HPPA machine, and the news is not good: CREATE LANGUAGE plperl dumps
> core deep inside libperl.  With or without this patch.

Sounds like another good reason to start enforcing a minimum modern Perl
version. In the past, I advocated making it a minimum of 5.6, but now I
think a minimum of 5.8 is in order. The first version of 5.8 was released
in July of 2002, so I don't think we'd be upsetting very many people if
we did so. Plus, they'd be potentially dumping core anyway, and our energy
is better spent improving PL/Perl itself at this point rather than tweaking
things for old versions of Perl. I don't even think I have a pre-5.8
version around anymore. Would such a requirement cause any problems with
packagers? I imagine a perl 5.8 prereq is a common thing these days...


- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200712011322
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
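
For illustration only, here is a minimal sketch of what a "minimum Perl 5.8" check could look like, written as standalone C. The helper name perl_version_at_least and the string-parsing approach are assumptions made for this example. A real build-time check in plperl would more likely test the PERL_REVISION, PERL_VERSION and PERL_SUBVERSION macros from Perl's own headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true if a version string of the form
 * "rev.ver" or "rev.ver.sub" (e.g. "5.8.8") is at least
 * min_rev.min_ver.min_sub. A missing subversion is treated as 0.
 */
static bool perl_version_at_least(const char *version,
                                  int min_rev, int min_ver, int min_sub)
{
    int rev = 0, ver = 0, sub = 0;

    assert(version != NULL);

    /* Require at least "rev.ver"; reject anything that does not parse. */
    if (sscanf(version, "%d.%d.%d", &rev, &ver, &sub) < 2)
        return false;

    /* Compare component by component, most significant first. */
    if (rev != min_rev)
        return rev > min_rev;
    if (ver != min_ver)
        return ver > min_ver;
    return sub >= min_sub;
}
```

With a minimum of 5.8.0, the 5.8.8 release tested in this thread would pass, while 5.6.1 and 5.6.2 would be rejected.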



I wrote:
> I got around to trying it with a dusty 5.6.1 I have laying about on my
> HPPA machine, and the news is not good: CREATE LANGUAGE plperl dumps
> core deep inside libperl.  With or without this patch.

> As best I can tell at the moment, I have not tested 5.6.1 with anything
> later than our 7.2 branch, so I don't know exactly where the breakage
> slipped in.  It may be of long standing.

Actually, libperl seems to dump core in the same place in every PG
version, back to and including 7.2, so what seems more likely is that
this copy of perl is just plain broken.  Since we didn't have any form
of regression test for plperl back then, it's entirely possible that
I never tested any further than compiling plperl with that setup.

So we still need someone to try it with a good copy of 5.6 ...

            regards, tom lane


Tom Lane wrote:
> I wrote:
>
>> I got around to trying it with a dusty 5.6.1 I have laying about on my
>> HPPA machine, and the news is not good: CREATE LANGUAGE plperl dumps
>> core deep inside libperl.  With or without this patch.
>>
>
>
>> As best I can tell at the moment, I have not tested 5.6.1 with anything
>> later than our 7.2 branch, so I don't know exactly where the breakage
>> slipped in.  It may be of long standing.
>>
>
> Actually, libperl seems to dump core in the same place in every PG
> version, back to and including 7.2, so what seems more likely is that
> this copy of perl is just plain broken.  Since we didn't have any form
> of regression test for plperl back then, it's entirely possible that
> I never tested any further than compiling plperl with that setup.
>
> So we still need someone to try it with a good copy of 5.6 ...
>
>
>

OK, I have built a fresh copy of perl 5.6.2 and built and linked HEAD
against it. It passes the regression tests and the UTF8 test, and
doesn't dump core. This is on FC6/x86_64.

cheers

andrew


On Sat, 1 Dec 2007, Tom Lane wrote:

> I wrote:
>> I got around to trying it with a dusty 5.6.1 I have laying about on my
>> HPPA machine, and the news is not good: CREATE LANGUAGE plperl dumps
>> core deep inside libperl.  With or without this patch.
>
>> As best I can tell at the moment, I have not tested 5.6.1 with anything
>> later than our 7.2 branch, so I don't know exactly where the breakage
>> slipped in.  It may be of long standing.
>
> Actually, libperl seems to dump core in the same place in every PG
> version, back to and including 7.2, so what seems more likely is that
> this copy of perl is just plain broken.  Since we didn't have any form
> of regression test for plperl back then, it's entirely possible that
> I never tested any further than compiling plperl with that setup.
>
> So we still need someone to try it with a good copy of 5.6 ...


I tested CVS HEAD, which includes the patch, on Solaris 9/SPARC with Perl
5.6.1, and it seems to work fine.


Test output attached.

Steve

