
Re: Perl DBI converts UTF-8 again to UTF-8 before sending it to the server - Mailing list pgsql-general

From	Christoph Moench-Tegeder
Subject	Re: Perl DBI converts UTF-8 again to UTF-8 before sending it to the server
Date	October 12, 2019 16:14:24
Msg-id	20191012131424.GA2452@elch.exwg.net
In response to	Re: Perl DBI converts UTF-8 again to UTF-8 before sending it to the server (Matthias Apitz <guru@unixarea.de>)
Responses	Re: Perl DBI converts UTF-8 again to UTF-8 before sending it to the server
List	pgsql-general


## Matthias Apitz (guru@unixarea.de):

> but when I now fetch the first row with:
> 
>    @row = $sth->fetchrow_array;
>    $HexStr = unpack("H*", $row[0]);
>    print "HexStr: " . $HexStr . "\n";
>    print "$row[0]\n";
> 
> The resulting column contains ISO data:

As expected: https://perldoc.perl.org/perluniintro.html
  Specifically, if all code points in the string are 0xFF or less, Perl
  uses the native eight-bit character set.
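A minimal sketch of that rule, assuming the fetched column is already a decoded Perl character string (the byte string below is made-up test data standing in for what the server sends, not output from the original script): once decoded, "ä" is one character with code point 0xE4, so unpack("H*") reports "e4" for it rather than the UTF-8 pair "c3a4". That is the <E4> showing up in the terminal output.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the UTF-8 encoded bytes of "Pädagogische" as a raw byte string
my $bytes = "P\xc3\xa4dagogische";

# Decoding turns the byte pair C3 A4 into one character, U+00E4 (ä)
my $chars = decode('UTF-8', $bytes);

# 13 bytes, 12 characters
printf "length: %d bytes, %d characters\n", length($bytes), length($chars);

# unpack('H*') works per character; every code point here is <= 0xFF,
# so ä comes out as the single hex pair "e4" instead of "c3a4"
print "bytes: ", unpack('H*', $bytes), "\n";
print "chars: ", unpack('H*', $chars), "\n";
```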

> P<E4>dagogische Hochschule Weingarten

And then Perl doesn't know that your terminal expects UTF-8 (it
just dumps the binary string here), because you didn't tell it:
"binmode(STDOUT, ':encoding(utf8)')" would fix that.
See: https://perldoc.perl.org/perlunifaq.html specifically "What if I
don't decode?", "What if I don't encode?" and "Is there a way to
automatically decode or encode?".
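A small sketch of what the output layer changes, assuming the same decoded string as in the mail. It prints into in-memory buffers instead of the terminal so the bytes can be compared: without a layer the ä goes out as the single byte E4, with an encoding layer it goes out as C3 A4. It spells the layer ':encoding(UTF-8)', Encode's strict variant; the ':encoding(utf8)' from the mail gives the same bytes for valid data.

```perl
use strict;
use warnings;

# Decoded character string, ä = U+00E4
my $s = "P\x{e4}dagogische Hochschule Weingarten";

# No encoding layer: characters <= 0xFF are written as single bytes
open(my $raw, '>', \my $raw_buf) or die "open raw: $!";
print {$raw} $s;
close $raw;

# Encoding layer: characters are encoded to UTF-8 on the way out
open(my $utf8, '>:encoding(UTF-8)', \my $utf8_buf) or die "open utf8: $!";
print {$utf8} $s;
close $utf8;

printf "raw:   %s\n", unpack('H*', substr($raw_buf, 0, 3));   # 50e464
printf "UTF-8: %s\n", unpack('H*', substr($utf8_buf, 0, 4));  # 50c3a464

# The fix for the real terminal, as suggested above
binmode(STDOUT, ':encoding(UTF-8)');
print "$s\n";
```

For the "automatically" question in perlunifaq, the open pragma (use open qw(:std :encoding(UTF-8));) sets such a layer on STDIN, STDOUT and STDERR and as the default for handles opened later in that scope.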

The whole PostgreSQL-DBI-UTF8 thingy is working: use "Tĳl Müller"
as test data (that's the Dutch "ij" digraph in there, a character
decidedly not in Latin-9 (ISO 8859-15) and therefore not "0xFF or less").
That will break your "unpack('H*')": it tries to unpack that wide
character into a hex byte and warns "Character in 'H' format wrapped in
unpack". Use "print(join(' ', unpack('U*', $row[0])))" to see that
the ĳ has code point 307 (decimal).
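To try this without a database, here is a minimal sketch that builds the test string from escapes (an assumption for the example: in the real script the value would come from $row[0] after fetchrow_array). It lists the code points with unpack('U*') and captures the warning that unpack('H*') emits for the wide character instead of letting it go to STDERR.

```perl
use strict;
use warnings;

# "Tĳl Müller" written with escapes: U+0133 is the Dutch ij digraph
my $name = "T\x{133}l M\x{fc}ller";

# One code point per character, as suggested in the mail
my @cps = unpack('U*', $name);
print join(' ', @cps), "\n";    # 84 307 108 32 77 252 108 108 101 114

# unpack('H*') expects byte-sized characters; collect its warning
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $hex = unpack('H*', $name);
    print "hex (wide char wrapped): $hex\n";
}
print "warning: $warnings[0]" if @warnings;
```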

Regards,
Christoph

-- 
Spare Space
