Re: Why don't I get a LATIN1 encoding here with SET ENCODING? - Mailing list pgsql-sql
From | Craig Ringer |
---|---|
Subject | Re: Why don't I get a LATIN1 encoding here with SET ENCODING? |
Date | |
Msg-id | 4AF10245.3050707@postnewspapers.com.au Whole thread Raw |
In response to | Re: Why don't I get a LATIN1 encoding here with SET ENCODING? (Bryce Nesbitt <bryce2@obviously.com>) |
List | pgsql-sql |
Bryce Nesbitt wrote: > Craig Ringer wrote: >> Yes - but you are *not* presenting a Latin-1 character. You're >> presenting four Latin-1 characters: >> >> '\', '3', '7', '5' > Well, then I have a different question. If I can view a bytea column as > so: > >> select object from context_objects where context_key=100041; > object > ----------------------------- > \005\007\021\007Article\003 > (1 row) > > How can I modify the data and put it back into the column? Presuming the working environment: CREATE TABLE context_objects( id serial primary key, object bytea not null ); -- to avoid E'' and double-escaping SET standard_conforming_strings = on; INSERT INTO context_objects (object) VALUES ( '\005\007\021\007Article\003' ), ( 'test\375' ); I'd just SELECT the `object' value, receiving it in escape form from the PostgreSQL protocol: SELECT object FROM context_objects object ----------------------------- \005\007\021\007Article\003 ... which is a string of 27 characters in the local client_encoding. So in Perl (or whatever) I'd have a string of length 27: \005\007\021\007Article\003 I'd then convert that by parsing the escape sequences. (As Perl uses \0 as the octal escape prefix instead of just \, it's necessary to write a simple loop for that. I don't do Perl voluntarily and haven't used it for a while so I won't write one inline here, but it should be easy enough.) Once you have the byte string (length 12 bytes) you manipulate it however you want, then convert it back to octal escape form for sending to the Pg server. client_encoding has no effect on any of this so long as you're using a client encoding that preserves the lower 128 characters, ie basically anything except shift-jis. Now, if your 'object' is actually a string in a single known text encoding, not just a sequence of bytes, then another approach is possible. First, stop using `bytea' for text. If the byte sequences are all known to be latin-1 encoded text, for example, use: ALTER TABLE context_objects ALTER COLUMN object TYPE text USING ( convert_from(object, 'latin-1')); -- -- Now retrieve the rows, which will be returned as text in -- the client_encoding. Note that \375 is ý in latin-1. -- testdb=> SELECT object FROM context_objects; object -----------------------------\x05\x07\x11\x07Article\x03testý (2 rows) (Note: If they're all of different encodings, but you know the encodings, you can make the encoding param of convert_from a column reference instead). Now you have 'object' as server-side utf-8 encoded text that'll be automatically converted to and from the specified client_encoding . If you want to get the raw server-side-encoded byte sequence you can always cast to bytea in your query to get the utf-8 byte sequence in octal escape form: testdb=> SELECT object::bytea FROM context_objects; object -----------------------------\005\007\021\007Article\003test\303\275 ... but usually you'd just let the DB > I'm trying to track down while a ten year old system no longer works > after a Perl DBI upgrade. Is it just Perl DBI? Or is it also Pg? Could it be related to the standards_conforming_strings change that affects string escaping ? -- Craig Ringer