Re: Convert from hex to string - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Convert from hex to string
Date
Msg-id 56572995.2030400@aklaver.com
Whole thread Raw
In response to Re: Convert from hex to string  (Francisco Olarte <folarte@peoplecall.com>)
List pgsql-general
On 11/25/2015 11:12 PM, Francisco Olarte wrote:
> Hi Adrian:
>
> On Wed, Nov 25, 2015 at 9:33 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>> I will grant you that working with encodings is like working with
>> timestamps, explicit is better. The thing I am having a problem with is how
>> not knowing the context of the bytea value is different in the implicit vs
>> explicit case:
>
>> Granted the above is contrived and bound to fail, but the point is you need
>> to know what created the bytea however it got there. Now if you are in
>> charge of both ends of the process, then the above is your own fault.
>> Otherwise, you are down to detective work on what encoding was used whether
>> it was implicit or explicit. As the OP was working in a single context I am
>> not seeing the issue in making use of that context to do the heavy lifting.
>> For the use cases that you show I agree that a defined
>> convert_to/encode/decode/convert_from chain is a best practice and something
>> I had not really thought out, so thanks.
>
> Well, I tend to think as a programmer. So I was thinking of the
> scenario where you want to get some text from the database and move it
> around, and you control all the code. What I was trying to point is
> that using explicit all the character sets that matter are in the
> database, so I can move it around freely, as I'm the one defining the
> queries, while in the implicit case I have to know, or get from the
> catalogs, the encoding of the database. I do not know what context the
> OP was working and wanted to point he was mixing types. Postgres has a
> lot of them, specially to/from text, and I've found the hard way that
> lots of implicit conversions are great for one shot programs or
> interactive tests, but relying on implicit type conversions for real
> production code, put in a source, causes a lot of problems. In a
> single session case you can even use implicit conversion + encode and
> then paste the result into a convert_from adding quotes and x and it's
> going to work, but if you write down that in code you are going to be
> confused if something fails later, things like:
>
> cdrs=> select encode('Año','hex');
>    encode
> ----------
>   41c3b16f
> cdrs=> select convert_from('\x41c3b16f', 'UTF-8');
>   convert_from
> --------------
>   Año
> (1 row)
>
> seem like you are converting back and forth, but then:
>
> cdrs=> select convert_from(encode('Año','hex'), 'UTF-8');
> ERROR:  function convert_from(text, unknown) does not exist
> LINE 1: select convert_from(encode('Año','hex'), 'UTF-8');
>
> And also, the encode() example will give different results depending
> on database encoding. Using explicit charsets and correct types frees
> me from it. After all, to make the full round trip a
> covert-to+encode+decode+convert-from is needed, and making it explciti
> makes things easier.
>
>
> After all, not everybody knows that all values in the database pass
> through a conversion to/from text to be sent to / received from psql,
> and that you can use it if you just want the hex digits in the current
> database encoding:
>
> cdrs=> select 'Año'::bytea;
>     bytea
> ------------
>   \x41c3b16f
> (1 row)
>
> But I doubt using cast instead of encode can be recomended.
>
> And for the heavy lifting, you are just saving some keystrokes, which
> IMO is a cheap price to pay for having a nearly self documenting
> conversion chain.
>
> Anyway, I think people should be told to respect types, and people
> should be teach that strings are sequences of characters, that to do
> what people think is 'hex encoding' ( two digits per *byte*, no
> delimiter ), you need to first transform the string to bytes, then
> hex-encode that. Nearly all the encoding problems I see is because
> people thing strings are sequences of bytes, which they ceased to be
> when multibyte encodings where detected ( and even without them, in
> Java and I think some of the windows NT API Strings where sequences of
> 16 bits thingos, first UCS-2. then UTF16. Once people stop trying to
> encode/decode strings directly normally they problems vanish.

I am going to go over the above some more, but in the mean time thanks
for the detailed explanations of a subject I am weak on. I am getting
into this in dealing with the Python 2 --> 3 changes in its string/bytes
handling. Wish the Python developers had listened to the apart about
respecting types and not made str mean two entirely different things in
2 vs 3:)

>
> Francisco Olarte.
>


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Karsten Hilbert
Date:
Subject: Re: long transactions / backend memory usage
Next
From: krichy@tvnetwork.hu
Date:
Subject: Re: long transactions / backend memory usage