Re: Convert from hex to string - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: Convert from hex to string |
Date | |
Msg-id | 56572995.2030400@aklaver.com Whole thread Raw |
In response to | Re: Convert from hex to string (Francisco Olarte <folarte@peoplecall.com>) |
List | pgsql-general |
On 11/25/2015 11:12 PM, Francisco Olarte wrote: > Hi Adrian: > > On Wed, Nov 25, 2015 at 9:33 PM, Adrian Klaver > <adrian.klaver@aklaver.com> wrote: >> I will grant you that working with encodings is like working with >> timestamps, explicit is better. The thing I am having a problem with is how >> not knowing the context of the bytea value is different in the implicit vs >> explicit case: > >> Granted the above is contrived and bound to fail, but the point is you need >> to know what created the bytea however it got there. Now if you are in >> charge of both ends of the process, then the above is your own fault. >> Otherwise, you are down to detective work on what encoding was used whether >> it was implicit or explicit. As the OP was working in a single context I am >> not seeing the issue in making use of that context to do the heavy lifting. >> For the use cases that you show I agree that a defined >> convert_to/encode/decode/convert_from chain is a best practice and something >> I had not really thought out, so thanks. > > Well, I tend to think as a programmer. So I was thinking of the > scenario where you want to get some text from the database and move it > around, and you control all the code. What I was trying to point is > that using explicit all the character sets that matter are in the > database, so I can move it around freely, as I'm the one defining the > queries, while in the implicit case I have to know, or get from the > catalogs, the encoding of the database. I do not know what context the > OP was working and wanted to point he was mixing types. Postgres has a > lot of them, specially to/from text, and I've found the hard way that > lots of implicit conversions are great for one shot programs or > interactive tests, but relying on implicit type conversions for real > production code, put in a source, causes a lot of problems. In a > single session case you can even use implicit conversion + encode and > then paste the result into a convert_from adding quotes and x and it's > going to work, but if you write down that in code you are going to be > confused if something fails later, things like: > > cdrs=> select encode('Año','hex'); > encode > ---------- > 41c3b16f > cdrs=> select convert_from('\x41c3b16f', 'UTF-8'); > convert_from > -------------- > Año > (1 row) > > seem like you are converting back and forth, but then: > > cdrs=> select convert_from(encode('Año','hex'), 'UTF-8'); > ERROR: function convert_from(text, unknown) does not exist > LINE 1: select convert_from(encode('Año','hex'), 'UTF-8'); > > And also, the encode() example will give different results depending > on database encoding. Using explicit charsets and correct types frees > me from it. After all, to make the full round trip a > covert-to+encode+decode+convert-from is needed, and making it explciti > makes things easier. > > > After all, not everybody knows that all values in the database pass > through a conversion to/from text to be sent to / received from psql, > and that you can use it if you just want the hex digits in the current > database encoding: > > cdrs=> select 'Año'::bytea; > bytea > ------------ > \x41c3b16f > (1 row) > > But I doubt using cast instead of encode can be recomended. > > And for the heavy lifting, you are just saving some keystrokes, which > IMO is a cheap price to pay for having a nearly self documenting > conversion chain. > > Anyway, I think people should be told to respect types, and people > should be teach that strings are sequences of characters, that to do > what people think is 'hex encoding' ( two digits per *byte*, no > delimiter ), you need to first transform the string to bytes, then > hex-encode that. Nearly all the encoding problems I see is because > people thing strings are sequences of bytes, which they ceased to be > when multibyte encodings where detected ( and even without them, in > Java and I think some of the windows NT API Strings where sequences of > 16 bits thingos, first UCS-2. then UTF16. Once people stop trying to > encode/decode strings directly normally they problems vanish. I am going to go over the above some more, but in the mean time thanks for the detailed explanations of a subject I am weak on. I am getting into this in dealing with the Python 2 --> 3 changes in its string/bytes handling. Wish the Python developers had listened to the apart about respecting types and not made str mean two entirely different things in 2 vs 3:) > > Francisco Olarte. > -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: