Re: Pre-proposal: unicode normalized text - Mailing list pgsql-hackers

From	Isaac Morland
Subject	Re: Pre-proposal: unicode normalized text
Date	October 5, 2023 16:10:23
Msg-id	CAMsGm5fFwAa1=kUxwODk0dNhj7b57Q54BgdxL+Cep+tCHTWi-g@mail.gmail.com
In response to	Re: Pre-proposal: unicode normalized text (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Pre-proposal: unicode normalized text
List	pgsql-hackers

On Thu, 5 Oct 2023 at 07:32, Robert Haas <robertmhaas@gmail.com> wrote:

> But I do think that sometimes users are reluctant to perform encoding
> conversions on the data that they have. Sometimes they're not
> completely certain what encoding their data is in, and sometimes
> they're worried that the encoding conversion might fail or produce
> wrong answers. In theory, if your existing data is validly encoded and
> you know what encoding it's in and it's easily mapped onto UTF-8,
> there's no problem. You can just transcode it and be done. But a lot
> of times the reality is a lot messier than that.

In the case you describe, the users don’t have text at all; they have bytes, and a vague belief about what encoding the bytes might be in and therefore what characters they are intended to represent. The correct way to store that in the database is using bytea. Text types should be for when you know what characters you want to store. In this scenario, the implementation detail of what encoding the database uses internally to write the data on the disk doesn't matter, any more than it matters to a casual user how a table is stored on disk.
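
A minimal Python sketch of the bytes-versus-text distinction, assuming a hypothetical legacy value `raw` whose encoding is only guessed at. The helper `try_decode` and the sample bytes are illustrative and not taken from the thread. The same bytes yield different text, or no text at all, depending on which encoding is assumed:

```python
# Hypothetical legacy data: bytes with no reliable record of their encoding.
raw = b"caf\xe9 \x80"


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_utf8 = try_decode(raw, "utf-8")      # fails: 0xE9 is not followed by continuation bytes
as_latin1 = try_decode(raw, "latin-1")  # succeeds: 0x80 becomes the control character U+0080
as_cp1252 = try_decode(raw, "cp1252")   # succeeds: 0x80 becomes the euro sign U+20AC

for name, value in [("utf-8", as_utf8), ("latin-1", as_latin1), ("cp1252", as_cp1252)]:
    print(name, repr(value))
```

Until one of these interpretations is known to be the intended one, the only thing actually known is the byte sequence itself, which is what a bytea column records.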

Similarly, I don't believe we have a "YMD" data type which stores year, month, and day, without being specific as to whether it's Gregorian or Julian; if you have that situation, make a 3-tuple type or do something else. "Date" is for when you actually know what day you want to record.
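
To see why a bare year/month/day triple is ambiguous, here is a small Python sketch that converts the same triple to a Julian Day Number under each calendar, using the standard integer conversion formulas. The function names are hypothetical and chosen for illustration:

```python
def _shifted(year: int, month: int):
    """Shift the year so it starts in March, which puts the leap day at year end."""
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3


def gregorian_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date read as (proleptic) Gregorian."""
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045


def julian_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date read in the Julian calendar."""
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


ymd = (2000, 1, 1)
# The same triple names two different days, depending on the calendar.
print(gregorian_jdn(*ymd), julian_jdn(*ymd))
```

For the triple (2000, 1, 1) the two readings are 13 days apart, so the triple alone does not identify a day.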
