Thread: How can I return a UTF8 string from a hex representation in a latin9 database?
How can I return a UTF8 string from a hex representation in a latin9 database?
From
"PGUser2020"
Date:
Hello. Hopefully I can explain myself I have a database for a legacy application that requires an 8 bit database (i.e. the application itself won't function ona UTF8 database). Looking at ways to extend the functionality to be able to handle a few specified fields in Unicode. Had the idea to store a UTF8 string as either hex pairs or Base64 inside a VARCHAR field, which is fine. I can do that. What needs to happen though, is to build a view, that will return the decoded hex (or b64) as a UTF8 string to a client whichhas specified client encoding UTF8. I've tried various combinations of convert_from, and convert_to, and convert, but I just can't seem to get it to returnthe string a UTF8 select to the client. So if I have this data: select * from mytable; mycolumn ------------------------------------------ ceb120ceb220ceb320ceb420ceb520cf83cf84 Then: select convert_from(decode(mycolumn, 'hex')::bytea, 'utf-8') from mytable where usr='BATCH'; ERROR: character with byte sequence 0xce 0xb1 in encoding "UTF8" has no equivalent in encoding "LATIN9" So the database encoding is still relevant , this is expected by the description of convert_from in the documentation ofcourse. Is there some combination of functions I can use to have a client select this column from this table in a LATIN9 databaseand get a UTF8 string back? Any thoughts appreciated, thank you.
Re: How can I return a UTF8 string from a hex representation in a latin9 database?
From
Erik Wienhold
Date:
On 2023-11-19 21:37 +0100, PGUser2020 wrote: > I have a database for a legacy application that requires an 8 bit > database (i.e. the application itself won't function on a UTF8 > database). Technically speaking, UTF-8 is an 8-bit encoding. But I guess that application would then show mojibake if UTF-8 were stored. > Looking at ways to extend the functionality to be able to handle a few > specified fields in Unicode. > > Had the idea to store a UTF8 string as either hex pairs or Base64 > inside a VARCHAR field, which is fine. I can do that. Do you have to use existing Latin-9 text columns to store UTF-8? If not then I'd go with bytea instead of text (varchar) if possible and also supported by your client. Otherwise it may be difficult to distinguish between "normal" Latin-9 text and the hex- or base64-encoded UTF-8. Although bytea could also store anything, not just UTF-8, so you'd have to deal with invalid data anyway. > What needs to happen though, is to build a view, that will return the > decoded hex (or b64) as a UTF8 string to a client which has specified > client encoding UTF8. Is the same client sending and reading that data? If yes, why can't the client do the hex-encoding of the UTF-8 string and only send/read those encoded strings so that database won't event see UTF-8? Why must the database be involved in this custom encoding scheme instead of just storing BLOBs (either as bytea or some encoded text)? > I've tried various combinations of convert_from, and convert_to, and > convert, but I just can't seem to get it to return the string a UTF8 > select to the client. > > So if I have this data: > > select * from mytable; mycolumn > ------------------------------------------ > ceb120ceb220ceb320ceb420ceb520cf83cf84 > > Then: > > select convert_from(decode(mycolumn, 'hex')::bytea, 'utf-8') from mytable where usr='BATCH'; > > ERROR: character with byte sequence 0xce 0xb1 in encoding "UTF8" has no equivalent in encoding "LATIN9" > > So the database encoding is still relevant , this is expected by the > description of convert_from in the documentation of course. > > Is there some combination of functions I can use to have a client > select this column from this table in a LATIN9 database and get a UTF8 > string back? The client can disable encoding conversion by setting client_encoding to sql_ascii: latin9_test=# show server_encoding; server_encoding ----------------- LATIN9 (1 row) latin9_test=# set client_encoding to sql_ascii; SET latin9_test=# show client_encoding; client_encoding ----------------- SQL_ASCII (1 row) latin9_test=# select convert_from(decode('ceb120ceb220ceb320ceb420ceb520cf83cf84', 'hex'), 'sql_ascii'); convert_from --------------------- α β γ δ ε στ (1 row) Maybe that's also an option for your client. -- Erik
Re: How can I return a UTF8 string from a hex representation in a latin9 database?
From
"PGUser2020"
Date:
On 2023-11-20 02:59 +00:00 GMT, "Erik Wienhold" <ewie@ewie.name> wrote: > On 2023-11-19 21:37 +0100, PGUser2020 wrote: > > Technically speaking, UTF-8 is an 8-bit encoding. But I guess that > application would then show mojibake if UTF-8 were stored. > Yes sorry, I should have said single byte rather than 8 bit. There must be no possibility that a single character occupiesmore than one byte as the (e.g.) varchar(10) and char(5) fields overflow lengths otherwise. > > Do you have to use existing Latin-9 text columns to store UTF-8? If not > then I'd go with bytea instead of text (varchar) if possible and also > supported by your client. Otherwise it may be difficult to distinguish > between "normal" Latin-9 text and the hex- or base64-encoded UTF-8. > Although bytea could also store anything, not just UTF-8, so you'd have > to deal with invalid data anyway. > I do have to use existing columns yes, and they are varchar latin9 columns. > Is the same client sending and reading that data? If yes, why can't the > client do the hex-encoding of the UTF-8 string and only send/read those > encoded strings so that database won't event see UTF-8? Why must the > database be involved in this custom encoding scheme instead of just > storing BLOBs (either as bytea or some encoded text)? > So one of the external clients applications which is interacting with this database will do just that -- it will make a hexstring from its utf8 input and store that in a varchar > > The client can disable encoding conversion by setting client_encoding to > sql_ascii: > > latin9_test=# show server_encoding; > server_encoding > ----------------- > LATIN9 > (1 row) > > latin9_test=# set client_encoding to sql_ascii; > SET > latin9_test=# show client_encoding; > client_encoding > ----------------- > SQL_ASCII > (1 row) > > latin9_test=# select convert_from(decode('ceb120ceb220ceb320ceb420ceb520cf83cf84', 'hex'), 'sql_ascii'); > convert_from > --------------------- > α β γ δ ε στ > (1 row) > > Maybe that's also an option for your client. > It is very useful and exactly what I was looking for thanks. This technique should allow me to create a login, mask a table with a view containing this decode, and use search_path toget the view returned in preference to the base table.