Thread: Re: Postgresql 8.1: plperl code works with LATIN1, fail

Re: Postgresql 8.1: plperl code works with LATIN1, fail

From
Matthias.Pitzl@izb.de
Date:
> In an 8.1.6 UTF-8 database this example returns false; in 8.2.1 it
> returns true.  See the following commit message and the related bug
> report regarding PL/Perl and UTF-8:
>
> http://archives.postgresql.org/pgsql-committers/2006-10/msg00277.php
> http://archives.postgresql.org/pgsql-bugs/2006-10/msg00077.php
>
> If you can't upgrade to 8.2 then you might be able to work around
> the problem by creating the function as plperlu and adding 'use utf8;'.

> --
> Michael Fuhr

Hello Michael!

As far as I know, 'use utf8;' normally just tells Perl that the source code
is written in UTF-8 and nothing more.
For converting data to and from UTF-8, the Encode module is usually used.
Or is this different for plperlu?

Greetings,
Matthias

Re: Postgresql 8.1: plperl code works with LATIN1, fail

From
Michael Fuhr
Date:
On Mon, Jan 29, 2007 at 01:34:47PM +0100, Matthias.Pitzl@izb.de wrote:
> > If you can't upgrade to 8.2 then you might be able to work around
> > the problem by creating the function as plperlu and adding 'use utf8;'.
>
> As far as I know, 'use utf8;' normally just tells Perl that the source code
> is written in UTF-8 and nothing more.

The string literals in the PL/Perl function body are UTF-8 but Perl
isn't treating them as such.  Isn't "use utf8" or "use encoding 'utf8'"
the way to tell Perl to do so?  The perluniintro manual page says this:

  Only one case remains where an explicit "use utf8" is needed: if
  your Perl script itself is encoded in UTF-8, you can use UTF-8
  in your identifier names, and in string and regular expression
  literals, by saying "use utf8".

Isn't that the situation here?  The PL/Perl function body is a
string encoded in the database's encoding, which in this case is
UTF-8.
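
To illustrate the point about literals, here is a minimal standalone
sketch in plain Perl (not PL/Perl itself). It assumes the file is saved
as UTF-8; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8, so the literal below is
# stored as the two bytes 0xC3 0xA4.

# Without "use utf8" Perl treats each source byte as one character.
my $bytes = do { no utf8; "ä" };

# With "use utf8" in scope, the same source bytes are read as a single
# character. The pragma is lexically scoped, hence the do-block.
my $chars = do { use utf8; "ä" };

printf "no utf8:  length=%d\n", length $bytes;   # 2
printf "use utf8: length=%d\n", length $chars;   # 1
```

The same difference affects anything that counts or classifies
characters, such as regular expressions applied to the literal.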

> For converting data to and from UTF-8, the Encode module is usually used.
> Or is this different for plperlu?

Isn't the Encode module used for doing explicit conversions?  I think
the goal is not to have to do so, i.e., to have PL/Perl treat string
literals as UTF-8 if the database encoding is UTF-8.  PostgreSQL 8.2
does so but earlier versions don't.
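
For comparison, an explicit conversion with Encode might look like the
following sketch. It is plain Perl with the octets written out as
escapes, so it does not depend on how the file is saved; inside the
database it would presumably need plperlu, since trusted PL/Perl does
not let you load modules.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 encoding of "a with diaeresis", given as explicit bytes.
my $octets = "\xC3\xA4";

# Explicit step: turn the octets into a character string.
my $text = decode('UTF-8', $octets);

printf "octets: length=%d\n", length $octets;   # 2
printf "text:   length=%d\n", length $text;     # 1

# The reverse step, e.g. before handing data back as bytes.
my $again = encode('UTF-8', $text);
print "round trip ok\n" if $again eq $octets;
```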

--
Michael Fuhr

Re: Postgresql 8.1: plperl code works with LATIN1, fail

From
merlyn@stonehenge.com (Randal L. Schwartz)
Date:
>>>>> "Michael" == Michael Fuhr <mike@fuhr.org> writes:

Michael> Isn't that the situation here?  The PL/Perl function body is a
Michael> string encoded in the database's encoding, which in this case is
Michael> UTF-8.

If that's always the case, then the embedded Perl interpreter should
be started in that mode, perhaps by adding "-Mutf8" to the arg list
of the embedded interpreter.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!