Thread: BUG #5010: perl iconv function returns ? character

BUG #5010: perl iconv function returns ? character

From
"Lampa"
Date:
The following bug has been logged online:

Bug reference:      5010
Logged by:          Lampa
Email address:      lampacz@gmail.com
PostgreSQL version: 8.4.0
Operating system:   Debian testing/unstable
Description:        perl iconv function returns ? character
Details:

See the difference (example is the best explanation):

psql -U postgres -p 5433
psql (8.4.0, server 8.3.7)
WARNING: psql version 8.4, server version 8.3.
         Some psql features might not work.
Type "help" for help.

postgres=# select my_ascii2('Bockaničová');
  my_ascii2
-------------
 Bockanicova
(1 row)

psql -U postgres -p 5432
psql (8.4.0)
Type "help" for help.

postgres=# select my_ascii2('Bockaničová');
  my_ascii2
-------------
 Bockani?ov?
(1 row)


function my_ascii2 is defined:
CREATE FUNCTION my_ascii2(text) RETURNS text AS $$
  use strict;
  use Text::Iconv;
  my $conv = Text::Iconv->new("UTF8", "ASCII//TRANSLIT");
  return $conv->convert($_[0]);
$$ LANGUAGE plperlu;

8.3.x version works perfectly, 8.4.0 problem

In more complicated queries (joins, conditions) that call my_ascii2, an
incorrect number of rows is returned.

Re: BUG #5010: perl iconv function returns ? character

From
Robert Haas
Date:
On Tue, Aug 25, 2009 at 8:15 AM, Lampa<lampacz@gmail.com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5010
> Logged by:          Lampa
> Email address:      lampacz@gmail.com
> PostgreSQL version: 8.4.0
> Operating system:   Debian testing/unstable
> Description:        perl iconv function returns ? character
> Details:
>
> See the difference (example is the best explanation):
>
> psql -U postgres -p 5433
> psql (8.4.0, server 8.3.7)
> WARNING: psql version 8.4, server version 8.3.
>          Some psql features might not work.
> Type "help" for help.
>
> postgres=# select my_ascii2('Bockaničová');
>   my_ascii2
> -------------
>  Bockanicova
> (1 row)
>
> psql -U postgres -p 5432
> psql (8.4.0)
> Type "help" for help.
>
> postgres=# select my_ascii2('Bockaničová');
>   my_ascii2
> -------------
>  Bockani?ov?
> (1 row)
>
>
> function my_ascii2 is defined:
> CREATE FUNCTION my_ascii2(text) RETURNS text AS $$ use strict; use
> Text::Iconv; my $conv = Text::Iconv->new("UTF8", "ASCII//TRANSLIT"); return
> $conv->convert($_[0]); $$ LANGUAGE plperlu;
>
> 8.3.x version works perfectly, 8.4.0 problem

I can't reproduce this on 8.4.0 or CVS HEAD.  I think that whatever
problem you have here is not a PostgreSQL bug.

> In more complicated queries (joins, conditions) that call my_ascii2, an
> incorrect number of rows is returned.

This may be related to whatever your other problem is.

...Robert

Re: BUG #5010: perl iconv function returns ? character

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Aug 25, 2009 at 8:15 AM, Lampa<lampacz@gmail.com> wrote:
>> function my_ascii2 is defined:
>> CREATE FUNCTION my_ascii2(text) RETURNS text AS $$ use strict; use
>> Text::Iconv; my $conv = Text::Iconv->new("UTF8", "ASCII//TRANSLIT"); return
>> $conv->convert($_[0]); $$ LANGUAGE plperlu;
>>
>> 8.3.x version works perfectly, 8.4.0 problem

> I can't reproduce this on 8.4.0 or CVS HEAD.  I think that whatever
> problem you have here is not a PostgreSQL bug.

I suspect that function will only work as desired in a database with
UTF8 server_encoding.  Maybe the problem is the 8.4 database is set up
with some other encoding?

            regards, tom lane

Re: BUG #5010: perl iconv function returns ? character

From
Lampa
Date:
Cluster is created with cs_CZ.UTF-8 collation.

                                  List of databases
   Name    |  Owner   | Encoding |  Collation  |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | cs_CZ.UTF-8 | cs_CZ.UTF-8 |
 template0 | postgres | UTF8     | cs_CZ.UTF-8 | cs_CZ.UTF-8 | =c/postgres
                                                             : postgres=CTc/postgres
 template1 | postgres | UTF8     | cs_CZ.UTF-8 | cs_CZ.UTF-8 | =c/postgres
                                                             : postgres=CTc/postgres
(3 rows)




2009/9/6 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Aug 25, 2009 at 8:15 AM, Lampa<lampacz@gmail.com> wrote:
>>> function my_ascii2 is defined:
>>> CREATE FUNCTION my_ascii2(text) RETURNS text AS $$ use strict; use
>>> Text::Iconv; my $conv = Text::Iconv->new("UTF8", "ASCII//TRANSLIT"); return
>>> $conv->convert($_[0]); $$ LANGUAGE plperlu;
>>>
>>> 8.3.x version works perfectly, 8.4.0 problem
>
>> I can't reproduce this on 8.4.0 or CVS HEAD.  I think that whatever
>> problem you have here is not a PostgreSQL bug.
>
> I suspect that function will only work as desired in a database with
> UTF8 server_encoding.  Maybe the problem is the 8.4 database is set up
> with some other encoding?
>
>             regards, tom lane
>



--
Lampa

Re: BUG #5010: perl iconv function returns ? character

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Aug 25, 2009 at 8:15 AM, Lampa<lampacz@gmail.com> wrote:
>> function my_ascii2 is defined:
>> CREATE FUNCTION my_ascii2(text) RETURNS text AS $$ use strict; use
>> Text::Iconv; my $conv = Text::Iconv->new("UTF8", "ASCII//TRANSLIT"); return
>> $conv->convert($_[0]); $$ LANGUAGE plperlu;
>>
>> 8.3.x version works perfectly, 8.4.0 problem

> I can't reproduce this on 8.4.0 or CVS HEAD.  I think that whatever
> problem you have here is not a PostgreSQL bug.

Hmm ... I can reproduce the problem on Fedora 11.  Given a UTF8-encoded
database (I don't think locale matters), 8.3.7 works as described, but
8.3.8 fails as described, as do 8.4.1 and HEAD.  Given that the only
difference in plperl.c between 8.3.7 and 8.3.8 is the addition of the
PERL_SYS_INIT3 call, I have to suppose that that's screwing up
Text::Iconv somehow.

I'd bet a small amount of money that this is somehow related to the
UTF8-specific code in plperl_safe_init(), which always struck me
as unexplained hocus-pocus.  Since the test function is plperlu,
plperl_safe_init() obviously can't be directly to blame; but I'm
thinking that what it's really doing is papering over some missed
initialization issue that affects plperlu functions too.

            regards, tom lane

Re: BUG #5010: perl iconv function returns ? character

From
Tom Lane
Date:
I wrote:
> Hmm ... I can reproduce the problem on Fedora 11.  Given a UTF8-encoded
> database (I don't think locale matters), 8.3.7 works as described, but
> 8.3.8 fails as described, as do 8.4.1 and HEAD.  Given that the only
> difference in plperl.c between 8.3.7 and 8.3.8 is the addition of the
> PERL_SYS_INIT3 call, I have to suppose that that's screwing up
> Text::Iconv somehow.

Huh ... belay that.  Diking out the PERL_SYS_INIT3 call doesn't make
the problem go away.

What I was actually comparing was the current Fedora 11 8.3.7 RPMs
with 8.3.8 built from source.  I would have said that the RPMs are
not built in any way significantly different from a straight
configure-and-build-from-source, but it appears that something in
the RPM build options makes this work.  Investigating ...

(Whether this has anything to do with the OP's problem on Debian
remains to be determined, but it's definitely busted on Fedora.)

            regards, tom lane

Re: BUG #5010: perl iconv function returns ? character

From
Devrim GÜNDÜZ
Date:
On Sun, 2009-09-06 at 12:52 -0400, Tom Lane wrote:
> I would have said that the RPMs are
> not built in any way significantly different from a straight
> configure-and-build-from-source, but it appears that something in
> the RPM build options makes this work.  Investigating ...

Could it be because of perl-Text-Iconv package?
--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr                  http://www.gunduz.org

Re: BUG #5010: perl iconv function returns ? character

From
Tom Lane
Date:
Devrim GÜNDÜZ <devrim@gunduz.org> writes:
> On Sun, 2009-09-06 at 12:52 -0400, Tom Lane wrote:
>> I would have said that the RPMs are
>> not built in any way significantly different from a straight
>> configure-and-build-from-source, but it appears that something in
>> the RPM build options makes this work.  Investigating ...

> Could it be because of perl-Text-Iconv package?

Well, you have to install that before you can test the problem at all,
but the working and non-working cases are using the same Text::Iconv
code.  I think I just figured it out though.  I had dismissed locale as
not being the critical difference, but that was foolish (and I paid for
it with an hour of wasted effort).  My RPM installation is working
because it defaults to en_US locale, and my source installation is not
working because it uses C locale.  If I switch to either en_US or cs_CZ
locale then Text::Iconv gives the expected result.

I now believe that the OP's actual problem is related to this:
http://archives.postgresql.org/pgsql-committers/2009-07/msg00098.php
He's probably ending up in C locale internally.  If so it'll be fixed
in 8.4.1.

The only observation not accounted for is Robert's statement that he
couldn't reproduce it in 8.4.0 --- but I think the behavior with the bug
is dependent on the postmaster's starting environment, so it would be
easy to fail to duplicate someone else's result.
        regards, tom lane
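
The locale dependence described above can be checked outside PostgreSQL.
What follows is a minimal sketch, assuming a glibc system with the iconv
command-line tool available and, for the second call, the cs_CZ.UTF-8 locale
installed. The helper name translit_under is hypothetical and does not come
from the thread.

```shell
#!/bin/sh
# translit_under is a hypothetical helper, not part of the thread.
# $1 = locale name to run iconv under, $2 = UTF-8 text to transliterate.
translit_under() {
    # iconv picks up its locale from LC_ALL in its environment; errors are
    # discarded so that only the converted text reaches stdout.
    printf '%s\n' "$2" | LC_ALL="$1" iconv -f UTF-8 -t ASCII//TRANSLIT 2>/dev/null
}

# Same input and same conversion request, run under two different locales.
translit_under C 'Bockaničová'
translit_under cs_CZ.UTF-8 'Bockaničová'
```

If the two printed lines differ, the difference comes from the locale the
conversion runs under rather than from the input string or the conversion
request. If cs_CZ.UTF-8 is not installed, both calls are likely to behave the
same, so the comparison is only meaningful on a system that has that locale.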