Re: Why don't I get a LATIN1 encoding here with SET ENCODING? - Mailing list pgsql-sql

From Craig Ringer
Subject Re: Why don't I get a LATIN1 encoding here with SET ENCODING?
Date
Msg-id 4AF10245.3050707@postnewspapers.com.au
Whole thread Raw
In response to Re: Why don't I get a LATIN1 encoding here with SET ENCODING?  (Bryce Nesbitt <bryce2@obviously.com>)
List pgsql-sql
Bryce Nesbitt wrote:
> Craig Ringer wrote:
>> Yes - but you are *not* presenting a Latin-1 character. You're
>> presenting four Latin-1 characters:
>>
>>   '\', '3', '7', '5'
> Well, then I have a different question.  If I can view a bytea column as
> so:
> 
>> select object from context_objects where context_key=100041;
>           object
> -----------------------------
> \005\007\021\007Article\003
> (1 row)
> 
> How can I modify the data and put it back into the column?

Presuming the working environment:



CREATE TABLE context_objects( id serial primary key, object bytea not null
);

-- to avoid E'' and double-escaping
SET standard_conforming_strings = on;

INSERT INTO context_objects (object)
VALUES ( '\005\007\021\007Article\003' ), ( 'test\375' );




I'd just SELECT the `object' value, receiving it in escape form from the
PostgreSQL protocol:

SELECT object FROM context_objects          object
-----------------------------
\005\007\021\007Article\003

... which is a string of 27 characters in the local client_encoding. So
in Perl (or whatever) I'd have a string of length 27:
 \005\007\021\007Article\003

I'd then convert that by parsing the escape sequences. (As Perl uses \0
as the octal escape prefix instead of just \, it's necessary to write a
simple loop for that. I don't do Perl voluntarily and haven't used it
for a while so I won't write one inline here, but it should be easy enough.)

Once you have the byte string (length 12 bytes) you manipulate it
however you want, then convert it back to octal escape form for sending
to the Pg server.

client_encoding has no effect on any of this so long as you're using a
client encoding that preserves the lower 128 characters, ie basically
anything except shift-jis.



Now, if your 'object' is actually a string in a single known text
encoding, not just a sequence of bytes, then another approach is
possible. First, stop using `bytea' for text. If the byte sequences are
all known to be latin-1 encoded text, for example, use:

ALTER TABLE context_objects ALTER COLUMN object TYPE text
USING ( convert_from(object, 'latin-1'));

--
-- Now retrieve the rows, which will be returned as text in
-- the client_encoding. Note that \375 is ý in latin-1.
--
testdb=> SELECT object FROM context_objects;           object
-----------------------------\x05\x07\x11\x07Article\x03testý
(2 rows)

(Note: If they're all of different encodings, but you know the
encodings, you can make the encoding param of convert_from a column
reference instead).

Now you have 'object' as server-side utf-8 encoded text that'll be
automatically converted to and from the specified client_encoding . If
you want to get the raw server-side-encoded byte sequence you can always
cast to bytea in your query to get the utf-8 byte sequence in octal
escape form:

testdb=> SELECT object::bytea FROM context_objects;          object
-----------------------------\005\007\021\007Article\003test\303\275


... but usually you'd just let the DB




> I'm trying to track down while a ten year old system no longer works
> after a Perl DBI upgrade.

Is it just Perl DBI? Or is it also Pg?

Could it be related to the standards_conforming_strings change that
affects string escaping ?

--
Craig Ringer


pgsql-sql by date:

Previous
From: Bryce Nesbitt
Date:
Subject: Re: Why don't I get a LATIN1 encoding here with SET ENCODING?
Next
From: Craig Ringer
Date:
Subject: Re: Why don't I get a LATIN1 encoding here with SET ENCODING?