Thread: Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns

Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns

From
Mario Splivalo
Date:
I have this issue:

postgres=# select E'\xc5\x53\x94\x96\x83\x29';
ERROR:  invalid byte sequence for encoding "UTF8": 0xc553
HINT:  This error can also happen if the byte sequence does not match 
the encoding expected by the server, which is controlled by 
"client_encoding".
postgres=# show client_encoding ; client_encoding
----------------- UTF8
(1 row)

postgres=# show server_encoding ; server_encoding
----------------- UTF8
(1 row)

postgres=# select version();                                            version 

------------------------------------------------------------------------------------------------ PostgreSQL 8.3.5 on
i686-pc-linux-gnu,compiled by GCC gcc (GCC) 4.2.4 
 
(Ubuntu 4.2.4-1ubuntu3)
(1 row)




On postgres 8.2 this worked:

postgres=# select version();                                                    version 

----------------------------------------------------------------------------------------------------------------
PostgreSQL8.2.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 
 
20061115 (prerelease) (Debian 4.1.1-21)
(1 row)

postgres=# select E'\xc5\x53\x94\x96\x83\x29'; ?column?
---------- S)
(1 row)

postgres=# show client_encoding ; client_encoding
----------------- UTF8
(1 row)

postgres=# show server_encoding ; server_encoding
----------------- UTF8
(1 row)


I'm using the above mentioned string to store data into bytea column. I 
did pg_dump of the database on postgres 8.2, and then tried to restore 
it on postgres 8.3, and I got this error. The actuall line that produces 
error is like this:

INSERT INTO vpn_payins_bitfield (vpn_id, payload_pattern, encription, 
encription_key, charset, amount_width, shop_width, counter_width) VALUES 
(3, E'\\W*(\\w+)(?:\\W+(.*))?', 'RC4', 
E'\xc5\x53\x94\x96\x83\x29'::bytea, 'ABCDEGHIKLMOPTWX', 16, 8, 16);

The error is:
ERROR: invalid byte sequence for encoding "UTF8": 0xc553


Now, I see that I can type: "SELECT E'\xFF'" in pg8.2, but can't do that 
in pg8.3.

So, my question is, how do I specify hexadecimal value of C5 to be 
stored in bytea column, in an INSERT statement?
Mike


Mario Splivalo <mario.splivalo@megafon.hr> writes:
> I have this issue:
> postgres=# select E'\xc5\x53\x94\x96\x83\x29';
> ERROR:  invalid byte sequence for encoding "UTF8": 0xc553

This is expected since the string is not valid as text.

> I'm using the above mentioned string to store data into bytea column. I 
> did pg_dump of the database on postgres 8.2, and then tried to restore 
> it on postgres 8.3, and I got this error. The actuall line that produces 
> error is like this:

> INSERT INTO vpn_payins_bitfield (vpn_id, payload_pattern, encription, 
> encription_key, charset, amount_width, shop_width, counter_width) VALUES 
> (3, E'\\W*(\\w+)(?:\\W+(.*))?', 'RC4', 
> E'\xc5\x53\x94\x96\x83\x29'::bytea, 'ABCDEGHIKLMOPTWX', 16, 8, 16);

Exactly what version of pg_dump are you using?  What I get from pg_dump
doesn't look like that.  Bytea fields with -D look more like this:

INSERT INTO foo (f1) VALUES ('\\305S\\224\\226\\203)');
        regards, tom lane


Re: Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns

From
Mario Splivalo
Date:
Tom Lane wrote:
>> I'm using the above mentioned string to store data into bytea column. I 
>> did pg_dump of the database on postgres 8.2, and then tried to restore 
>> it on postgres 8.3, and I got this error. The actuall line that produces 
>> error is like this:
> 
>> INSERT INTO vpn_payins_bitfield (vpn_id, payload_pattern, encription, 
>> encription_key, charset, amount_width, shop_width, counter_width) VALUES 
>> (3, E'\\W*(\\w+)(?:\\W+(.*))?', 'RC4', 
>> E'\xc5\x53\x94\x96\x83\x29'::bytea, 'ABCDEGHIKLMOPTWX', 16, 8, 16);
> 
> Exactly what version of pg_dump are you using?  What I get from pg_dump
> doesn't look like that.  Bytea fields with -D look more like this:
> 
> INSERT INTO foo (f1) VALUES ('\\305S\\224\\226\\203)');

Yes, I mistakenly used pg8.2 pg_dump, when I use pg3.8 dump I get what 
you get. Btw, what is that S after 305? 305 octal is C5 hexadecimal. How 
do I enter hexadecimal C5 without UTF encoding errors?
Mike


Mario Splivalo <mario.splivalo@megafon.hr> writes:
> Tom Lane wrote:
>> Exactly what version of pg_dump are you using?  What I get from pg_dump
>> doesn't look like that.  Bytea fields with -D look more like this:
>> 
>> INSERT INTO foo (f1) VALUES ('\\305S\\224\\226\\203)');

> Yes, I mistakenly used pg8.2 pg_dump, when I use pg3.8 dump I get what 
> you get.

I was quoting the output of 8.2.latest pg_dump.  Maybe you have a very
old subrelease?  But no version of pg_dump would've put an explicit
cast to bytea in there.

> Btw, what is that S after 305?

Hex 53 is 'S' I believe.

> 305 octal is C5 hexadecimal. How 
> do I enter hexadecimal C5 without UTF encoding errors?

bytea only supports octal, so \\305 is the way to do it.  The way you
were doing it was guaranteed to fail on corner cases such as \0 and \
itself, because you were converting at the string-literal stage not
byteain().
        regards, tom lane


Re: Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns

From
Richard Huxton
Date:
Mario Splivalo wrote:
> Richard Huxton wrote:
>> Mario Splivalo wrote:
>>> I have this issue:
>>>
>>> postgres=# select E'\xc5\x53\x94\x96\x83\x29';
>>> ERROR:  invalid byte sequence for encoding "UTF8": 0xc553
>>
>> I think you want to be using octal escapes. That's text you're
>> generating above.
>>
>> CREATE TABLE bytea_test (b bytea);
>> INSERT INTO bytea_test (b) VALUES (E'\\305\\123\\224\\226');
>> SELECT * FROM bytea_test;
>>        b
>> ---------------
>>  \305S\224\226
>> (1 row)
> 
> That's true, but I'd still like to use hexadecimal notation. Manual
> states that I could say '\xC5', but then I get those encoding errors.

I think you're reading the "text" rather than "bytea" part of the manual.

4.1.2.1. String Constants
"...and \xhexdigits, where hexdigits represents a hexadecimal byte
value. (It is your responsibility that the byte sequences you create are
valid characters in the server character set encoding.)"

No mention of hex in the bytea section of the manual.

--  Richard Huxton Archonet Ltd


Richard Huxton <dev@archonet.com> writes:
> Mario Splivalo wrote:
>> That's true, but I'd still like to use hexadecimal notation.

You could use decode():

regression=# select decode('c5a4', 'hex'); decode  
----------\305\244
(1 row)
        regards, tom lane


Re: Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns

From
Mario Splivalo
Date:
Richard Huxton wrote:
> Mario Splivalo wrote:
>> Richard Huxton wrote:
>>> Mario Splivalo wrote:
>>>> I have this issue:
>>>>
>>>> postgres=# select E'\xc5\x53\x94\x96\x83\x29';
>>>> ERROR:  invalid byte sequence for encoding "UTF8": 0xc553
>>> I think you want to be using octal escapes. That's text you're
>>> generating above.
>>>
>>> CREATE TABLE bytea_test (b bytea);
>>> INSERT INTO bytea_test (b) VALUES (E'\\305\\123\\224\\226');
>>> SELECT * FROM bytea_test;
>>>        b
>>> ---------------
>>>  \305S\224\226
>>> (1 row)
>> That's true, but I'd still like to use hexadecimal notation. Manual
>> states that I could say '\xC5', but then I get those encoding errors.
> 
> I think you're reading the "text" rather than "bytea" part of the manual.
> 
> 4.1.2.1. String Constants
> "...and \xhexdigits, where hexdigits represents a hexadecimal byte
> value. (It is your responsibility that the byte sequences you create are
> valid characters in the server character set encoding.)"
> 
> No mention of hex in the bytea section of the manual.
> 

Hm, you're right. I guess that part of manual confuses me. Nevertheless, 
I'd still like to be able to enter hexadecimal values to the bytea 
fields. As Tom Lane suggested, I can do this:

test1=# select decode('C5', 'hex'); decode
-------- \305
(1 row)


But, if I want it other way around, I get the error:

test1=# select envode(E'\305', 'hex');
ERROR:  invalid byte sequence for encoding "UTF8": 0xc5
HINT:  This error can also happen if the byte sequence does not match 
the encoding expected by the server, which is controlled by 
"client_encoding".
test1=#

Is there a way to circumvent encoding when dealing with binary data? Or 
am I completely confused here and have no clue what am I talking about?
Mike


Re: Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns

From
Mario Splivalo
Date:
Tom Lane wrote:
> Mario Splivalo <mario.splivalo@megafon.hr> writes:
>> Tom Lane wrote:
>>> Exactly what version of pg_dump are you using?  What I get from pg_dump
>>> doesn't look like that.  Bytea fields with -D look more like this:
>>>
>>> INSERT INTO foo (f1) VALUES ('\\305S\\224\\226\\203)');
> 
>> Yes, I mistakenly used pg8.2 pg_dump, when I use pg3.8 dump I get what 
>> you get.
> 
> I was quoting the output of 8.2.latest pg_dump.  Maybe you have a very
> old subrelease?  But no version of pg_dump would've put an explicit
> cast to bytea in there.

mike@som:~$ pg_dump -V
pg_dump (PostgreSQL) 8.2.4
mike@som:~$

Since I need to have my servers running 24/7 with NO downtime I seldom 
choose to upgrade minor versions, unless there is a major bug that 
affects me. This upgrade from 8.2 to 8.3 is planned, and I have liberty 
of having 3-4 hours of downtime.

> 
>> Btw, what is that S after 305?
> 
> Hex 53 is 'S' I believe.

Still don't get it :) If I have hexadecimal value of C5, that is octal 
305, and I don't get where that S came from.

> 
>> 305 octal is C5 hexadecimal. How 
>> do I enter hexadecimal C5 without UTF encoding errors?
> 
> bytea only supports octal, so \\305 is the way to do it.  The way you
> were doing it was guaranteed to fail on corner cases such as \0 and \
> itself, because you were converting at the string-literal stage not
> byteain().

Ok, that makes sense. Since I just store that data into the database, 
maybe I could store them as strings (varchars), and then do the 
conversion on the client side (java).
Mike


Re: Using UTF strings in pg8.3 - storing hexadecimal values in bytea columns

From
Mario Splivalo
Date:
Mario Splivalo wrote:
> But, if I want it other way around, I get the error:
> 
> test1=# select envode(E'\305', 'hex');
> ERROR:  invalid byte sequence for encoding "UTF8": 0xc5
> HINT:  This error can also happen if the byte sequence does not match 
> the encoding expected by the server, which is controlled by 
> "client_encoding".
> test1=#
> 
> Is there a way to circumvent encoding when dealing with binary data? Or 
> am I completely confused here and have no clue what am I talking about?
> 

Now, this works:

test1=# select encode(decode('C5', 'hex'), 'hex'); encode
-------- c5
(1 row)


I see that my client encoding has a problem interpreting '\xC5':

test1=# select E'\xC5';
ERROR:  invalid byte sequence for encoding "UTF8": 0xc5
HINT:  This error can also happen if the byte sequence does not match 
the encoding expected by the server, which is controlled by 
"client_encoding".

So the only way is to use decode function, right?
Mike


Postgres version of all_ind_cols

From
"Bart van Houdt"
Date:
Hi all,

This might be a stupid question, but I wasn't able to find any information on it, so here it goes:
Oracle knows a view which contains information about indexed columns (all_ind_cols), is there something similar
availablein Postgres?
 

I want to be able to create an full overview of each column with the following information:
- index name
- table name the index is on
- column name the index is on
- column position in the index
- sort order

Any help would be much appreciated :)

Regards,

Bart van Houdt
Syfact International B.V.
Database developer

Re: Postgres version of all_ind_cols

From
Mario Splivalo
Date:
Bart van Houdt wrote:
> Hi all,
> 
> This might be a stupid question, but I wasn't able to find any information on it, so here it goes:
> Oracle knows a view which contains information about indexed columns (all_ind_cols), is there something similar
availablein Postgres?
 
> 
> I want to be able to create an full overview of each column with the following information:
> - index name
> - table name the index is on
> - column name the index is on
> - column position in the index
> - sort order
> 
> Any help would be much appreciated :)
> 

You should check this, for instance:
http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html#MONITORING-STATS-VIEWS

There is, also, information_schema in postgres, as well as pg_catalog, 
you may wish to check those.
Mike


Re: Postgres version of all_ind_cols

From
"Bart van Houdt"
Date:
Cool!
Thx :D

Bart van Houdt
Syfact International B.V.
Database developer

-----Original Message-----
From: Mario Splivalo [mailto:mario.splivalo@megafon.hr] 
Sent: dinsdag 11 november 2008 15:03
To: pgsql-sql@postgresql.org
Cc: Bart van Houdt
Subject: Re: [SQL] Postgres version of all_ind_cols

Bart van Houdt wrote:
> Hi all,
> 
> This might be a stupid question, but I wasn't able to find any information on it, so here it goes:
> Oracle knows a view which contains information about indexed columns (all_ind_cols), is there something similar
availablein Postgres?
 
> 
> I want to be able to create an full overview of each column with the following information:
> - index name
> - table name the index is on
> - column name the index is on
> - column position in the index
> - sort order
> 
> Any help would be much appreciated :)
> 

You should check this, for instance:
http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html#MONITORING-STATS-VIEWS

There is, also, information_schema in postgres, as well as pg_catalog, 
you may wish to check those.
Mike

Re: Postgres version of all_ind_cols

From
"A. Kretschmer"
Date:
am  Tue, dem 11.11.2008, um 14:57:20 +0100 mailte Bart van Houdt folgendes:
> Hi all,

Please, don't hijack other threads.


> 
> This might be a stupid question, but I wasn't able to find any information on it, so here it goes:
> Oracle knows a view which contains information about indexed columns (all_ind_cols), is there something similar
availablein Postgres?
 
> 
> I want to be able to create an full overview of each column with the following information:
> - index name
> - table name the index is on
> - column name the index is on
> - column position in the index
> - sort order
> 
> Any help would be much appreciated :)

Take a look at:
http://www.postgresql.org/docs/current/interactive/catalog-pg-index.html


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net