Re: Pre-proposal: unicode normalized text - Mailing list pgsql-hackers

From Isaac Morland
Subject Re: Pre-proposal: unicode normalized text
Date
Msg-id CAMsGm5fFwAa1=kUxwODk0dNhj7b57Q54BgdxL+Cep+tCHTWi-g@mail.gmail.com
In response to Re: Pre-proposal: unicode normalized text  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Pre-proposal: unicode normalized text
List pgsql-hackers
On Thu, 5 Oct 2023 at 07:32, Robert Haas <robertmhaas@gmail.com> wrote:
 
> But I do think that sometimes users are reluctant to perform encoding
> conversions on the data that they have. Sometimes they're not
> completely certain what encoding their data is in, and sometimes
> they're worried that the encoding conversion might fail or produce
> wrong answers. In theory, if your existing data is validly encoded and
> you know what encoding it's in and it's easily mapped onto UTF-8,
> there's no problem. You can just transcode it and be done. But a lot
> of times the reality is a lot messier than that.

In the case you describe, the users don't have text at all; they have bytes, plus a vague belief about what encoding those bytes are in and therefore what characters they are intended to represent. The correct way to store that in the database is bytea. Text types should be for when you know what characters you want to store. Once you do know the characters, the encoding the database uses internally to write the data to disk is an implementation detail that doesn't matter, any more than it matters to a casual user how a table is stored on disk.
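
To make the bytes-versus-characters distinction concrete, here is a small sketch in Python rather than SQL. The sample byte string and the helper name try_decode are made up for illustration and are not anything from PostgreSQL. It shows one byte sequence that yields different characters under two plausible encoding guesses and is not valid UTF-8 at all:

```python
# Hypothetical input: four bytes that arrived with no reliable encoding label.
raw = b"caf\xe9"


def try_decode(data: bytes, encoding: str):
    """Return the decoded string, or None if the bytes are invalid in that encoding.

    try_decode is an illustrative helper, not part of any library.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Two guesses that both "succeed" but disagree about the last character.
as_latin1 = try_decode(raw, "latin-1")  # last character is U+00E9
as_cp1251 = try_decode(raw, "cp1251")   # last character is U+0439

# A third guess that fails: 0xE9 starts a multi-byte UTF-8 sequence
# that is never completed, so decoding raises and we get None.
as_utf8 = try_decode(raw, "utf-8")

# The bytes themselves are unambiguous and can always be stored as-is.
stored = bytes(raw)
```

Until somebody decides which of those guesses is right, the only thing actually known is the value of raw, which is the argument for keeping such data in a bytes column rather than a text one.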

Similarly, I don't believe we have a "YMD" data type which stores year, month, and day, without being specific as to whether it's Gregorian or Julian; if you have that situation, make a 3-tuple type or do something else. "Date" is for when you actually know what day you want to record.
