Re: Pre-proposal: unicode normalized text - Mailing list pgsql-hackers

From	Isaac Morland
Subject	Re: Pre-proposal: unicode normalized text
Date	October 6, 2023 22:15:16
Msg-id	CAMsGm5c86VfCeqJe-2O32ph7RLEJ0xVL3XRahPTg6YJcxahzLw@mail.gmail.com
In response to	Re: Pre-proposal: unicode normalized text (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

On Fri, 6 Oct 2023 at 15:07, Jeff Davis <pgsql@j-davis.com> wrote:

> On Fri, 2023-10-06 at 13:33 -0400, Robert Haas wrote:
> > What I think people really want is a whole column in
> > some encoding that isn't the normal one for that database.
>
> Do people really want that? I'd be curious to know why.
>
> A lot of modern projects are simply declaring UTF-8 to be the "one true
> way". I am not suggesting that we do that, but it seems odd to go in
> the opposite direction and have greater flexibility for many encodings.

And even if they want it, we can give it to them when we send/accept the data from the client; just because they want to store ISO-8859-1 doesn't mean the actual bytes on the disk need to be that. And by "client" maybe I mean the client end of the network connection, and maybe I mean the program that is calling into libpq.

If they try to submit data that cannot possibly be encoded in the stated encoding because the bytes they submit don't correspond to any string in that encoding, then that is unambiguously an error, just as trying to put February 30 in a date column is an error.
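To make the two points above concrete, here is a minimal sketch in Python rather than anything resembling actual PostgreSQL code. The function names `accept_from_client` and `send_to_client` and the choice of UTF-8 as the storage encoding are assumptions made purely for illustration. It shows bytes being validated against the encoding the client states, stored in a different internal form, and converted back on the way out.

```python
# Illustrative sketch only: these names are hypothetical, not PostgreSQL or libpq APIs.
STORAGE_ENCODING = "utf-8"  # assumed internal representation, independent of the client


def accept_from_client(raw: bytes, client_encoding: str) -> bytes:
    """Validate bytes against the client's stated encoding and return storage bytes."""
    try:
        # Strict decoding rejects byte sequences that are not valid in that encoding.
        text = raw.decode(client_encoding, errors="strict")
    except UnicodeDecodeError as exc:
        # Unambiguously an error, like February 30 in a date column.
        raise ValueError(
            f"invalid byte sequence for encoding {client_encoding!r}"
        ) from exc
    return text.encode(STORAGE_ENCODING)


def send_to_client(stored: bytes, client_encoding: str) -> bytes:
    """Convert stored bytes back to whatever encoding the client asked for."""
    return stored.decode(STORAGE_ENCODING).encode(client_encoding)


# The client speaks ISO-8859-1; what is stored is something else entirely.
stored = accept_from_client(b"caf\xe9", "iso-8859-1")
round_trip = send_to_client(stored, "iso-8859-1")
```

Here the client gets back exactly the ISO-8859-1 bytes it submitted, even though the stored form differs. Note that `send_to_client` can itself fail if the stored text contains a character the requested encoding cannot represent; how to handle that case is a separate question this sketch does not address.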

Is there a single other data type where anybody is even discussing letting the client tell us how to write the data on disk?
