Re: Pre-proposal: unicode normalized text - Mailing list pgsql-hackers

From Isaac Morland
Subject Re: Pre-proposal: unicode normalized text
Msg-id CAMsGm5c86VfCeqJe-2O32ph7RLEJ0xVL3XRahPTg6YJcxahzLw@mail.gmail.com
In response to Re: Pre-proposal: unicode normalized text  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Fri, 6 Oct 2023 at 15:07, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2023-10-06 at 13:33 -0400, Robert Haas wrote:
> > What I think people really want is a whole column in
> > some encoding that isn't the normal one for that database.
>
> Do people really want that? I'd be curious to know why.
>
> A lot of modern projects are simply declaring UTF-8 to be the "one true
> way". I am not suggesting that we do that, but it seems odd to go in
> the opposite direction and have greater flexibility for many encodings.

And even if they want it, we can give it to them by converting when we send data to, or accept data from, the client; just because they want to store ISO-8859-1 doesn't mean the actual bytes on disk need to be ISO-8859-1. By "client" I might mean the client end of the network connection, or I might mean the program that is calling into libpq.
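A minimal Python sketch of converting at the boundary, assuming a fixed UTF-8 storage encoding: the client's declared encoding is consulted only when data crosses between client and server. The names `accept_from_client`, `send_to_client`, and `STORAGE_ENCODING` are hypothetical and are not PostgreSQL or libpq APIs.

```python
# Hypothetical sketch: the on-disk encoding is fixed, and the client's declared
# encoding is only used when converting at the client boundary.
STORAGE_ENCODING = "utf-8"


def accept_from_client(data: bytes, client_encoding: str) -> bytes:
    """Decode bytes received from the client, then re-encode them for storage."""
    text = data.decode(client_encoding)  # raises UnicodeDecodeError if malformed
    return text.encode(STORAGE_ENCODING)


def send_to_client(stored: bytes, client_encoding: str) -> bytes:
    """Convert stored bytes into the client's declared encoding.

    Raises UnicodeEncodeError if the stored text contains characters that
    the client encoding cannot represent.
    """
    return stored.decode(STORAGE_ENCODING).encode(client_encoding)


# A client that "wants ISO-8859-1" sends and receives ISO-8859-1 bytes ...
client_bytes = "caf\u00e9".encode("iso-8859-1")   # b"caf\xe9"
on_disk = accept_from_client(client_bytes, "iso-8859-1")
assert on_disk == b"caf\xc3\xa9"                   # ... but UTF-8 is what is stored.
assert send_to_client(on_disk, "iso-8859-1") == client_bytes
```

A UTF-8 client reading the same value would get `on_disk` back unchanged; only the conversion step differs per client, not the stored bytes.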

If they try to submit data that cannot possibly be encoded in the stated encoding because the bytes they submit don't correspond to any string in that encoding, then that is unambiguously an error, just as trying to put February 30 in a date column is an error.
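A hypothetical validation step, sketched in Python. Python's `latin-1` codec maps every byte value to a character, so ISO-8859-1 input can never fail this particular check; the rejection case is therefore shown with UTF-8, where the byte 0xFF can never occur. `validate_client_text` is an illustrative name, not a PostgreSQL function.

```python
import datetime


def validate_client_text(data: bytes, client_encoding: str) -> str:
    """Hypothetical check: return the text if `data` is a valid string in
    `client_encoding`, otherwise raise ValueError."""
    try:
        # Python's default error handler is "strict", so malformed input raises.
        return data.decode(client_encoding)
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"invalid byte sequence for encoding {client_encoding!r}"
        ) from exc


# Well-formed UTF-8 decodes normally.
assert validate_client_text(b"caf\xc3\xa9", "utf-8") == "caf\u00e9"

# 0xFF never appears in well-formed UTF-8, so this input is rejected ...
try:
    validate_client_text(b"caf\xff", "utf-8")
except ValueError as err:
    print("rejected:", err)

# ... in the same way that an impossible date is rejected.
try:
    datetime.date(2023, 2, 30)
except ValueError as err:
    print("rejected:", err)
```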

Is there a single other data type where anybody is even discussing letting the client tell us how to write the data on disk?
