Re: Format base - Code contribution - Mailing list pgsql-hackers

From Miles Elam
Subject Re: Format base - Code contribution
Date
Msg-id CAPVvHdMtkMJ-7+X7koio9i4rsvgKww6GwEyofkSrBOJd-jhFpQ@mail.gmail.com
In response to Re: Format base - Code contribution  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
Hi Craig, thanks for the reply.

On Wed, Apr 25, 2018 at 8:03 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
Personally, I think this is a better candidate for being incorporated
directly rather than as a contrib. This sort of utility is much less
useful if you cannot rely on it being present.

I guess I've gotten used to the idea of contrib being a test bed for newer functionality (e.g., tsearch) without committing to a final API in core. I also can't imagine using PostgreSQL without pgcrypto being available. But this is a perception issue on my part. I'm now looking into where to put this in core.
 
I'm not convinced by the wisdom of adding int8 overloads, etc, with a
second argument. I'd rather this be named as a separate function. I
realise that many programming languages do this, but it's IMO less
discoverable this way, and might make our life harder if we later need
to overload these functions in a different way.

Totally fair observation. Easier for users in the short term, may be harder in the long term.
 
We already have to_hex. So to_base seems a reasonable choice. Then
adding a from_hex, from_base seems natural.

I have some misgivings about the existing to_hex now that I've had a chance to go over it. It follows the printf model with %x for integers. I feel this was a mistake. Hexadecimal, while enormously useful for bitwise analysis, is still an output for human eyes. The fact that a negative int value could be substantially different from a negative bigint value is problematic. I understand the underlying reason for it, but a cursory check in the mailing list archives shows more than a couple folks who got tripped up by it.
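To make the width dependence concrete, here is a rough Python sketch of the printf-style %x behavior described above. It is not PostgreSQL code: the helper name to_hex_like and its explicit bits parameter are made up for illustration, standing in for the integer vs. bigint distinction.

```python
def to_hex_like(value: int, bits: int) -> str:
    """Mimic printf-style %x: reinterpret a signed integer as unsigned
    at a fixed width (two's complement) and print it in hex.

    Hypothetical helper for illustration only, not a PostgreSQL API.
    """
    mask = (1 << bits) - 1
    return format(value & mask, "x")


# The same negative value prints differently depending on the assumed width.
print(to_hex_like(-1, 32))   # ffffffff
print(to_hex_like(-1, 64))   # ffffffffffffffff

# Non-negative values are unaffected by the width.
print(to_hex_like(255, 32))  # ff
print(to_hex_like(255, 64))  # ff
```

Only negative inputs change with the width, which is why the surprise tends to show up when a value has been silently promoted from one integer type to another.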

I do not think that base 10 output should be wildly different from base 16 (or base 8). I don't think anyone would consider it intuitive to print out, for example, 4294967295 for to_base(-1, 10), yet that's exactly what's done for base 16 with the current implementation of to_hex. I see these problems as apples and oranges. To be more precise, I consider the current to_hex to be wrong, but too late to fix. to_bitwise_hex, to_raw_hex, or similar would be more appropriate. In C, it's clear at all times what the size may be. Within an SQL query, things can become far more ambiguous.

Most modern, high-level languages will present 15 as hex F and -15 as hex -F, which is uniform no matter the underlying type size. All numeric types in PostgreSQL are signed. Getting a wildly different value because some smallint got silently converted into an integer is non-intuitive to say the least.
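A minimal Python sketch of the sign-and-magnitude behavior described above, for comparison. The name to_base comes from this thread, but the signature, the error handling, and the digit ordering (0-9, then A-Z, then a-z) are assumptions for illustration, not the submitted patch.

```python
# Assumed digit alphabet: 0-9, then A-Z, then a-z (62 symbols).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_base(value: int, base: int) -> str:
    """Render an integer as sign plus magnitude in the given base.

    Illustrative sketch only; the output does not depend on any
    underlying integer width.
    """
    if not 2 <= base <= len(DIGITS):
        raise ValueError(f"base must be between 2 and {len(DIGITS)}, got {base}")
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    n = abs(value)
    out = []
    while n:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return sign + "".join(reversed(out))


print(to_base(15, 16))   # F
print(to_base(-15, 16))  # -F
print(to_base(-1, 10))   # -1
```

Because the sign is emitted separately from the magnitude, the result for a negative value is the same whether the input started life as a smallint, an integer, or a bigint.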

So it would appear there should be a strict demarcation between to_hex and the proposed to_base.
 
Bonus points if you add
to/from base64 and oct while you're at it.

I can happily do it, but again, I think from_hex and from_oct should follow as inverses of to_hex and to_oct, separate from to_base/from_base, for the reasons given above. As for base64, that's another problematic one. To most folks, base64 means a binary-to-ASCII encoding of data. Again, that solves a different problem. I think it would be a good idea to avoid sending mixed messages to the user here, even if that means capping the base at 62 (0-9, A-Z, and a-z) and erroring out above that. I'd like to go to 64 if for no other reason than the power-of-2 affinity, but I don't think it should be done lightly at the expense of user confusion. On the bright side, encode/decode are both well established within PostgreSQL and clearly deal with bytea values rather than integer values.
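The distinction between integer base conversion and base64 encoding can be sketched in Python as follows. The from_base name comes from this thread, but its signature, the case-sensitive digit handling, and the error messages are assumptions for illustration. The base64 call uses Python's standard library and stands in for the bytea-oriented encode/decode.

```python
import base64

# Assumed digit alphabet: 0-9, then A-Z, then a-z (62 symbols).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
VALUES = {ch: i for i, ch in enumerate(DIGITS)}


def from_base(text: str, base: int) -> int:
    """Parse sign-plus-magnitude text in the given base (2 to 62).

    Illustrative sketch only. Digits are case-sensitive here, so 'f'
    is not accepted as a base-16 digit; that is an assumption.
    """
    if not 2 <= base <= 62:
        raise ValueError(f"base must be between 2 and 62, got {base}")
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if not digits:
        raise ValueError("no digits to parse")
    n = 0
    for ch in digits:
        d = VALUES.get(ch)
        if d is None or d >= base:
            raise ValueError(f"invalid digit {ch!r} for base {base}")
        n = n * base + d
    return -n if negative else n


# Integer base conversion: text in, integer out.
print(from_base("-F", 16))  # -15
print(from_base("z", 62))   # 61

# Base64 is a byte encoding, a different problem: bytes in, ASCII bytes out.
print(base64.b64encode(b"\x0f"))  # b'Dw=='
```

Rejecting anything above base 62 keeps the integer functions from being mistaken for the byte-oriented base64 encoding, at the cost of giving up the power-of-2 convenience of base 64.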

We don't seem to have a "from_hex" or "int8_from_hex", which is a
bewildering oversight really, and we don't accept literals:

Thanks for the illustration of PostgreSQL parser behavior. The flexibility of PostgreSQL can obviously be both a curse and a blessing. Hoping I can add to the blessings.

--
      Quidquid latine dictum sit, altum sonatur.      
    - Whatever is said in Latin sounds profound.
