Re: Pre-proposal: unicode normalized text - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Pre-proposal: unicode normalized text
Date	October 5, 2023 00:37:40
Msg-id	94a0e631b787d56fec1b17e03fe045b5e14907c1.camel@j-davis.com
In response to	Re: Pre-proposal: unicode normalized text (Isaac Morland <isaac.morland@gmail.com>)
Responses	Re: Pre-proposal: unicode normalized text
List	pgsql-hackers

On Wed, 2023-10-04 at 14:14 -0400, Isaac Morland wrote:
> Always store only UTF-8 in the database

What problem does that solve? I don't see our encoding support as a big
source of problems, given that database-wide UTF-8 already works fine.
In fact, some Postgres features work only with UTF-8.

I agree that we shouldn't add a bunch of bookkeeping and type system
support for per-column encodings without a clear use case, because that
would have a cost. But right now it's just a database-wide thing.

I don't see encodings as a major area to solve problems or innovate. At
the end of the day, encodings have little semantic significance, and
therefore limited upside and limited downside. Collations and
normalization get more interesting, but those are happening at a higher
layer than the encoding.
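
A minimal Python sketch of that layering, using only the standard
library and nothing Postgres-specific. It assumes "é" as the sample
character; the helper name round_trips is made up for illustration. The
point is that an encoding carries whatever code point sequence it is
given, while equivalence between sequences is decided by normalization.

```python
import unicodedata

# Two spellings of "é": precomposed (NFC) and base letter plus
# combining acute accent (NFD).
nfc = unicodedata.normalize("NFC", "e\u0301")
nfd = unicodedata.normalize("NFD", "\u00e9")


def round_trips(s, encoding):
    """Hypothetical helper: does s survive encode/decode unchanged?"""
    return s.encode(encoding).decode(encoding) == s


# Encoding layer: both spellings pass through UTF-8 and UTF-16 intact.
utf8_ok = round_trips(nfc, "utf-8") and round_trips(nfd, "utf-8")
utf16_ok = round_trips(nfc, "utf-16") and round_trips(nfd, "utf-16")

# Normalization layer: the raw code point sequences differ, and only
# normalizing both to the same form makes them compare equal.
raw_equal = nfc == nfd
normalized_equal = (
    unicodedata.normalize("NFC", nfc) == unicodedata.normalize("NFC", nfd)
)

print(len(nfc), len(nfd), utf8_ok, utf16_ok, raw_equal, normalized_equal)
```

Switching the encoding changes the bytes but not the comparison result,
which is why the interesting decisions sit above the encoding.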

> What about characters not in UTF-8?

Honestly I'm not clear on this topic. Are the "private use" areas in
Unicode enough to cover use cases for characters not recognized by
Unicode? Which encodings in Postgres can represent characters that
can't be automatically transcoded (without failure) to Unicode?
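
For reference, a small Python sketch of what the private use areas look
like from the Unicode side. It relies on the standard unicodedata
module, where private-use code points have general category "Co"; the
names PUA_RANGES and is_private_use are made up for illustration. It
says nothing about whether these areas are sufficient for any given
legacy encoding, which remains the open question.

```python
import unicodedata

# The three private use ranges defined by Unicode: one in the Basic
# Multilingual Plane, plus most of planes 15 and 16.
PUA_RANGES = [
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
]


def is_private_use(ch):
    """Hypothetical helper: True if ch is a private-use code point."""
    return unicodedata.category(ch) == "Co"


# Total number of code points available for private agreements.
pua_size = sum(hi - lo + 1 for lo, hi in PUA_RANGES)

# A private-use code point is still an ordinary scalar value, so it
# encodes to UTF-8 and decodes back like any other character.
sample = chr(0xE000)
sample_utf8 = sample.encode("utf-8")

print(pua_size, is_private_use(sample), is_private_use("A"), sample_utf8)
```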

Obviously if we have some kind of Unicode-based type, it would only
work with encodings whose character repertoire is a subset of Unicode.

Regards,
    Jeff Davis
