Thread: SELECT d02name::bytea FROM ... && DBI::Pg
Hello, I can SELECT && print a column in hex with: pos71=# select d02name::bytea from d02ben where d02bnr = '00001048313' ; ... \x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020 ... but when I use the same in Perl DBI::Pg with: ... $sth=$dbh->prepare( "select d02name::bytea from d02ben where d02bnr = '00001048313'"); $sth->execute or die "exec error\n".${DBI::errstr}."\n"; while ( (@row = $sth->fetchrow_array) ) { print "$row[0]\n"; } It prints the UTF-8 string and not the hex string: ./utf8-01.pl DBI is version 1.642, DBD::Pg is version 3.8.0 client_encoding=UTF8, server_encoding=UTF8 Pädagogische Hochschule Weingarten I checked with strace that the PG server delivers to DBI:Pg the hex string, so it must be converted somewhere (or in print ...) again to UTF-8. See below. Any ideas? matthias recvfrom(3, "T\0\0\0 \0\1d02name\0\0\0\0\0\0\0\0\0\0\21\377\377\377\377\377\377\0\0D\0\0\0\376\0\1\0\0\0\364\\x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 16384, 0, NULL, NULL) = 308 write(1, "P\303\244dagogische Hochschule Weingarten -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!
Matthias: On Thu, Oct 10, 2019 at 1:25 PM Matthias Apitz <guru@unixarea.de> wrote: > I can SELECT && print a column in hex with: > pos71=# select d02name::bytea from d02ben where d02bnr = '00001048313' ; > ... > \x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020 ... > > but when I use the same in Perl DBI::Pg with: > $sth=$dbh->prepare( "select d02name::bytea from d02ben where d02bnr = '00001048313'"); ... > It prints the UTF-8 string and not the hex string: May be because perl does not need the contents in hex DBI converts it to a native string, which is very similar to a "bytea" inside the db ( not in the wire protocol, but it does a similar thing for numbers ). Notice once you have a string in perl it is trivial to convert it to hex: $ perl -e 'print(unpack("H*", "1234ABCD"),"\n")' 3132333441424344 $ perl -e 'print(unpack("H*", "día año"),"\n")' 64c3ad612061c3b16f Francisco Olarte.
El día jueves, octubre 10, 2019 a las 06:46:02p. m. +0200, Francisco Olarte escribió: > Matthias: > > On Thu, Oct 10, 2019 at 1:25 PM Matthias Apitz <guru@unixarea.de> wrote: > > I can SELECT && print a column in hex with: > > pos71=# select d02name::bytea from d02ben where d02bnr = '00001048313' ; > > ... > > \x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020 ... > > > > but when I use the same in Perl DBI::Pg with: > > $sth=$dbh->prepare( "select d02name::bytea from d02ben where d02bnr = '00001048313'"); > ... > > It prints the UTF-8 string and not the hex string: > > May be because perl does not need the contents in hex DBI converts it > to a native string, which is very similar to a "bytea" inside the db ( > not in the wire protocol, but it does a similar thing for numbers ). Hmm. But *I* do need the content in hex to see if the varchar column contains correct encoded UTF-8 data. We're on the way to port a huge database application from Sybase to PostgreSQL and are facing any kind of problems one can think of. Magically, sometimes strings, expected to be coded in UTF-8, arrive in the Perl $variables coded in ISO-8859-1 and than cause other problems when German Umlauts should be translated into HTML encodings like ü etc. to be presented in the web browser. Perl (and Java) sucks, it does magic things below the surface of string (objects). That's why I like C :-) matthias -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub
Matthias: On Thu, Oct 10, 2019 at 7:26 PM Matthias Apitz <guru@unixarea.de> wrote: > Hmm. But *I* do need the content in hex to see if the varchar column > contains correct encoded UTF-8 data. We're on the way to port a huge > database application from Sybase to PostgreSQL and are facing any kind of > problems one can think of. Magically, sometimes strings, expected to be > coded in UTF-8, arrive in the Perl $variables coded in ISO-8859-1 and than cause > other problems when German Umlauts should be translated into HTML > encodings like ü etc. to be presented in the web browser. This seems to be a perl problem ( I've had my share of them ). I suppose you can convince perl to upgrade all your vas to utf-8, but I do not remember how. Anyway, if you want "the hex representation of the bytea equivalent of a field ( of which I do not rememeber the original type ), why don't you ask for it? If you ask for bytea you are asking for a binary string. The fact it is transmitted hex-encoded in the wire is an implementation detail. psql is a text mode interface. To represent binary strings to you it needs text, so it uses hex. perl can do binary, so it puts the content in binary. After all your bytea could have been a jpeg image for all perl/DBI knows. But you can easily tell perl to pass a binary to hex, with unpack. OTOH, if you want, you can tell postgres to send you hex, look for encode/decode in the relevant manual pages, they are under binary string functions, IIRC. The pg will build a text hex string, send it on the wire by whatever mechanism it chooses and you'll get the hex data from DBI. Do not ask pg+DBI for binary ( ::bytea ) and expect text. > Perl (and Java) sucks, it does magic things below the surface of > string (objects). That's why I like C :-) They have differing degrees of suckines. I've read the Java String sources, and been horrified by them. Perl, OTOH, sucks for many things, but has its points. And C, I try to use C++ for everything I can ( many times using it as just a better ( for me ), C compiler, I've been known for writting huge chunks of C++ but use malloc/stdio and friends all around, but I've found programs have way less bugs if using thighter types ). It's a taste question, and of course, I wouldn't like to do my hundreds of <100 liners in perl for parsing huge texts and extracting some data in C. But utf-8/latin-1, you're right, you have umlauts, we have accendts and n-tildes, I've been there and will be again. Enough off-topic anyway. Francisco Olarte.
## Matthias Apitz (guru@unixarea.de): > Hmm. But *I* do need the content in hex to see if the varchar column > contains correct encoded UTF-8 data. select 'Hello'::bytea::text; Regards, Christoph -- Spare Space
>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes: Matthias> Hello, Matthias> I can SELECT && print a column in hex with: Matthias> pos71=# select d02name::bytea from d02ben where d02bnr = '00001048313' ; If your intention is to see the raw bytes of the stored text value, for example to check the encoding, then you should not do it that way. Casting from text to bytea does NOT do what you think. Instead use convert_to(d02name,'SQL_ASCII') (yes, always specify SQL_ASCII regardless of what encoding you think it should be in, since the actual meaning of SQL_ASCII is "no conversions"). For readability, you may then want to wrap that as encode(convert_to(d02name,'SQL_ASCII'),'escape') which will keep the ASCII characters but use \nnn escapes for non-ascii. -- Andrew (irc:RhodiumToad)