Re: How can I return a UTF8 string from a hex representation in a latin9 database? - Mailing list pgsql-general
From | Erik Wienhold |
---|---|
Subject | Re: How can I return a UTF8 string from a hex representation in a latin9 database? |
Date | |
Msg-id | q3w55c2ihmd62x2kpdw6sreazrolt2hsi5t4yjd6myoqw5y6sj@iimlufjzgkhw Whole thread Raw |
In response to | How can I return a UTF8 string from a hex representation in a latin9 database? ("PGUser2020" <pg@diorite.uk>) |
Responses |
Re: How can I return a UTF8 string from a hex representation in a latin9 database?
|
List | pgsql-general |
On 2023-11-19 21:37 +0100, PGUser2020 wrote: > I have a database for a legacy application that requires an 8 bit > database (i.e. the application itself won't function on a UTF8 > database). Technically speaking, UTF-8 is an 8-bit encoding. But I guess that application would then show mojibake if UTF-8 were stored. > Looking at ways to extend the functionality to be able to handle a few > specified fields in Unicode. > > Had the idea to store a UTF8 string as either hex pairs or Base64 > inside a VARCHAR field, which is fine. I can do that. Do you have to use existing Latin-9 text columns to store UTF-8? If not then I'd go with bytea instead of text (varchar) if possible and also supported by your client. Otherwise it may be difficult to distinguish between "normal" Latin-9 text and the hex- or base64-encoded UTF-8. Although bytea could also store anything, not just UTF-8, so you'd have to deal with invalid data anyway. > What needs to happen though, is to build a view, that will return the > decoded hex (or b64) as a UTF8 string to a client which has specified > client encoding UTF8. Is the same client sending and reading that data? If yes, why can't the client do the hex-encoding of the UTF-8 string and only send/read those encoded strings so that database won't event see UTF-8? Why must the database be involved in this custom encoding scheme instead of just storing BLOBs (either as bytea or some encoded text)? > I've tried various combinations of convert_from, and convert_to, and > convert, but I just can't seem to get it to return the string a UTF8 > select to the client. > > So if I have this data: > > select * from mytable; mycolumn > ------------------------------------------ > ceb120ceb220ceb320ceb420ceb520cf83cf84 > > Then: > > select convert_from(decode(mycolumn, 'hex')::bytea, 'utf-8') from mytable where usr='BATCH'; > > ERROR: character with byte sequence 0xce 0xb1 in encoding "UTF8" has no equivalent in encoding "LATIN9" > > So the database encoding is still relevant , this is expected by the > description of convert_from in the documentation of course. > > Is there some combination of functions I can use to have a client > select this column from this table in a LATIN9 database and get a UTF8 > string back? The client can disable encoding conversion by setting client_encoding to sql_ascii: latin9_test=# show server_encoding; server_encoding ----------------- LATIN9 (1 row) latin9_test=# set client_encoding to sql_ascii; SET latin9_test=# show client_encoding; client_encoding ----------------- SQL_ASCII (1 row) latin9_test=# select convert_from(decode('ceb120ceb220ceb320ceb420ceb520cf83cf84', 'hex'), 'sql_ascii'); convert_from --------------------- α β γ δ ε στ (1 row) Maybe that's also an option for your client. -- Erik
pgsql-general by date: