Thread: SELECT d02name::bytea FROM ... && DBI::Pg

SELECT d02name::bytea FROM ... && DBI::Pg

From
Matthias Apitz
Date:
Hello,

I can SELECT && print a column in hex with:

pos71=# select d02name::bytea from d02ben where d02bnr = '00001048313' ;
...
 \x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020 ...

but when I use the same in Perl DBI::Pg with:

...
$sth=$dbh->prepare( "select d02name::bytea from d02ben where d02bnr = '00001048313'");

$sth->execute or die "exec error\n".${DBI::errstr}."\n";

while ( (@row = $sth->fetchrow_array) )  {
   print "$row[0]\n";
}

It prints the UTF-8 string and not the hex string:

./utf8-01.pl                                                  
DBI is version 1.642, DBD::Pg is version 3.8.0
client_encoding=UTF8, server_encoding=UTF8
Pädagogische Hochschule Weingarten

I checked with strace that the PG server delivers to DBI:Pg the
hex string, so it must be converted somewhere (or in print ...) again to
UTF-8. See below.

Any ideas?

    matthias


recvfrom(3, "T\0\0\0

\0\1d02name\0\0\0\0\0\0\0\0\0\0\21\377\377\377\377\377\377\0\0D\0\0\0\376\0\1\0\0\0\364\\x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020C\0\0\0\rSELECT
1\0Z\0\0\0\5I", 16384, 0, NULL, NULL) = 308
write(1, "P\303\244dagogische Hochschule Weingarten                 


-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!



Re: SELECT d02name::bytea FROM ... && DBI::Pg

From
Francisco Olarte
Date:
Matthias:

On Thu, Oct 10, 2019 at 1:25 PM Matthias Apitz <guru@unixarea.de> wrote:
> I can SELECT && print a column in hex with:
> pos71=# select d02name::bytea from d02ben where d02bnr = '00001048313' ;
> ...
>  \x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020 ...
>
> but when I use the same in Perl DBI::Pg with:
> $sth=$dbh->prepare( "select d02name::bytea from d02ben where d02bnr = '00001048313'");
...
> It prints the UTF-8 string and not the hex string:

May be because perl does not need the contents in hex DBI converts it
to a native string, which is very similar to a "bytea" inside the db (
not in the wire protocol, but it does a similar thing for numbers ).

Notice once you have a string in perl it is trivial to convert it to hex:
$ perl -e 'print(unpack("H*", "1234ABCD"),"\n")'
3132333441424344
$ perl -e 'print(unpack("H*", "día año"),"\n")'
64c3ad612061c3b16f

Francisco Olarte.



Re: SELECT d02name::bytea FROM ... && DBI::Pg

From
Matthias Apitz
Date:
El día jueves, octubre 10, 2019 a las 06:46:02p. m. +0200, Francisco Olarte escribió:

> Matthias:
> 
> On Thu, Oct 10, 2019 at 1:25 PM Matthias Apitz <guru@unixarea.de> wrote:
> > I can SELECT && print a column in hex with:
> > pos71=# select d02name::bytea from d02ben where d02bnr = '00001048313' ;
> > ...
> >  \x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020 ...
> >
> > but when I use the same in Perl DBI::Pg with:
> > $sth=$dbh->prepare( "select d02name::bytea from d02ben where d02bnr = '00001048313'");
> ...
> > It prints the UTF-8 string and not the hex string:
> 
> May be because perl does not need the contents in hex DBI converts it
> to a native string, which is very similar to a "bytea" inside the db (
> not in the wire protocol, but it does a similar thing for numbers ).

Hmm. But *I* do need the content in hex to see if the varchar column
contains correct encoded UTF-8 data. We're on the way to port a huge
database application from Sybase to PostgreSQL and are facing any kind of
problems one can think of. Magically, sometimes strings, expected to be
coded in UTF-8, arrive in the Perl $variables coded in ISO-8859-1 and than cause
other problems when German Umlauts should be translated into HTML
encodings like ü etc. to be presented in the web browser.

Perl (and Java) sucks, it does magic things below the surface of
string (objects). That's why I like C :-)

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: SELECT d02name::bytea FROM ... && DBI::Pg

From
Francisco Olarte
Date:
Matthias:


On Thu, Oct 10, 2019 at 7:26 PM Matthias Apitz <guru@unixarea.de> wrote:
> Hmm. But *I* do need the content in hex to see if the varchar column
> contains correct encoded UTF-8 data. We're on the way to port a huge
> database application from Sybase to PostgreSQL and are facing any kind of
> problems one can think of. Magically, sometimes strings, expected to be
> coded in UTF-8, arrive in the Perl $variables coded in ISO-8859-1 and than cause
> other problems when German Umlauts should be translated into HTML
> encodings like ü etc. to be presented in the web browser.

This seems to be a perl problem ( I've had my share of them ). I
suppose you can convince perl to upgrade all your vas to utf-8, but I
do not remember how.

Anyway, if you want "the hex representation of the bytea equivalent of
a field ( of which I do not rememeber the original type ), why don't
you ask for it?

If you ask for bytea you are asking for a binary string. The fact it
is transmitted hex-encoded in the wire is an implementation detail.

psql is a text mode interface. To represent binary strings to you it
needs text, so it uses hex.

perl can do binary, so it puts the content in binary. After all your
bytea could have been a jpeg image for all perl/DBI knows.

But you can easily tell perl to pass a binary to hex, with unpack.

OTOH, if you want, you can tell postgres to send you hex, look for
encode/decode in the relevant manual pages, they are under binary
string functions, IIRC. The pg will build a text hex string, send it
on the wire by whatever mechanism it chooses and you'll get the hex
data from DBI. Do not ask pg+DBI for binary ( ::bytea ) and expect
text.

> Perl (and Java) sucks, it does magic things below the surface of
> string (objects). That's why I like C :-)

They have differing degrees of suckines. I've read the Java String
sources, and been horrified by them. Perl, OTOH, sucks for many
things, but has its points. And C, I try to use C++ for everything I
can ( many times using it as just a better ( for me ), C compiler,
I've been known for writting huge chunks of C++ but use malloc/stdio
and friends all around, but I've found programs have way less bugs if
using thighter types ). It's a taste question, and of course, I
wouldn't like to do my hundreds of <100 liners in perl for parsing
huge texts and extracting some data in C. But utf-8/latin-1, you're
right, you have umlauts, we have accendts and n-tildes, I've been
there and will be again. Enough off-topic anyway.

Francisco Olarte.



Re: SELECT d02name::bytea FROM ... && DBI::Pg

From
Christoph Moench-Tegeder
Date:
## Matthias Apitz (guru@unixarea.de):

> Hmm. But *I* do need the content in hex to see if the varchar column
> contains correct encoded UTF-8 data.

 select 'Hello'::bytea::text;

Regards,
Christoph

-- 
Spare Space



Re: SELECT d02name::bytea FROM ... && DBI::Pg

From
Andrew Gierth
Date:
>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes:

 Matthias> Hello,

 Matthias> I can SELECT && print a column in hex with:

 Matthias> pos71=# select d02name::bytea from d02ben where d02bnr = '00001048313' ;

If your intention is to see the raw bytes of the stored text value, for
example to check the encoding, then you should not do it that way.
Casting from text to bytea does NOT do what you think.

Instead use  convert_to(d02name,'SQL_ASCII')  (yes, always specify
SQL_ASCII regardless of what encoding you think it should be in, since
the actual meaning of SQL_ASCII is "no conversions"). For readability,
you may then want to wrap that as
encode(convert_to(d02name,'SQL_ASCII'),'escape')  which will keep the
ASCII characters but use \nnn escapes for non-ascii.

-- 
Andrew (irc:RhodiumToad)