Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From	Jeremy Schneider
Subject	Re: Collation version tracking for macOS
Date	June 3, 2022 19:13:33
Msg-id	1874de62-6bec-4bc1-1d14-0a2730b125da@ardentperf.com
In response to	Re: Collation version tracking for macOS (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Collation version tracking for macOS
List	pgsql-hackers

On 6/3/22 9:21 AM, Tom Lane wrote:
> 
> According to that document, they changed it in macOS 11, which came out
> a year and a half ago.  Given the lack of complaints, it doesn't seem
> like this is urgent enough to mandate a post-beta change that would
> have lots of downside (namely, false-positive warnings for every other
> macOS update).

Sorry, I'm going to rant for a minute... it is my very strong opinion
that using language like "false positive" here is misguided and dangerous.

If a new version of a sort order is released, for example when they recently
updated backwards-secondary sorting in French [CLDR-2905] or matching of
v and w in Swedish and Finnish [CLDR-7088], it is very dangerous to use
language like "false positive" to describe a database where there just
didn't happen to be any rows with accented French characters at the
point in time where PostgreSQL magically changed which version of sort
order it was using from the 2010 French version to the 2020 French version.

No other piece of software that calls itself a database would do what
PostgreSQL is doing: just give users a "warning" after suddenly changing
the sort order algorithm (most users won't even read warnings in their
logs). Oracle, DB2, SQL Server and even MySQL carefully version
collation data, hardcode a pseudo-linguistic collation into the DB (like
PG does for timezones), and if they provide updates to linguistic sort
order (from Unicode CLDR) then they allow the user to explicitly specify
which version of French or German ICU sorting they want to use.
Different versions are treated as different sort orders; they are not
conflated.

I have personally seen PostgreSQL databases where an update to an old
version of glibc was applied (I'm not even talking 2.28 here) and it
resulted in data loss b/c crash recovery couldn't replay WAL records and
the user had to do a PITR. That's aside from the more common issues of
segfaults or duplicate records that violate unique constraints or wrong
query results like missing data. And it's not just updates - people can
set up a hot standby on a different version and see many of these
problems too.
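
To make the "wrong query results like missing data" failure concrete, here is a toy Python sketch. It is not PostgreSQL code: `key_v1` and `key_v2` are hypothetical stand-ins for an old and a new version of one collation that disagree about where "w" sorts relative to "v" (only loosely in the spirit of the Swedish/Finnish change mentioned above), and `btree_lookup` is a plain binary search standing in for a B-tree descent. The "index" is sorted under the old version and then searched under the new one.

```python
# Toy model (not PostgreSQL code) of an index built under one sort order
# and then searched under another. Both collations are hypothetical.


def key_v1(s: str) -> tuple[str, str]:
    # Hypothetical "old" collation: 'w' is treated as a variant of 'v'
    # at the primary level; the raw string only breaks ties.
    return (s.replace("w", "v"), s)


def key_v2(s: str) -> tuple[str, str]:
    # Hypothetical "new" collation: 'w' is a distinct letter after 'v'.
    return (s, s)


def btree_lookup(index: list[str], target: str, key) -> bool:
    # Lower-bound binary search, standing in for a B-tree descent.
    # It is only correct if `index` is sorted by the same `key`.
    lo, hi = 0, len(index)
    t = key(target)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < t:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == target


words = ["vb", "wa", "vc", "wb"]
# "Build" the index while the old collation is in effect.
index = sorted(words, key=key_v1)  # ['wa', 'vb', 'wb', 'vc']

if __name__ == "__main__":
    for w in words:
        # word, found with old collation, found with new collation
        print(w, btree_lookup(index, w, key_v1), btree_lookup(index, w, key_v2))
```

Run as a script, it prints each word followed by whether it is found under the old and the new version: every word is found under `key_v1`, but only "wb" is still found under `key_v2`, even though all four rows are physically in the index. A uniqueness check that relied on the same kind of descent would likewise miss the existing "wa" and let a duplicate in.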

Collation versioning absolutely must be first class and directly
controlled by users, and it's very dangerous to allow users - at all -
to take an index and then use a different version than what the index
was built with.

Not to mention all the other places in the DB where collation is used...
partitioning, constraints, and any other place where persisted data can
make an assumption about any sort of string comparison.
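
As a rough illustration of "different versions are treated as different sort orders", here is a hypothetical Python sketch. All names (`Collation`, `Index`, `check_usable`, `CollationVersionMismatch`) and the version labels are invented for illustration and are not PostgreSQL APIs. The version is part of the collation's identity, an index (or any other persisted structure) records the exact collation it was built with, and a mismatch is an error rather than a warning in the logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collation:
    # Identified by name *and* version, so "fr" from one release and
    # "fr" from a later release compare as different sort orders.
    name: str
    version: str  # hypothetical label, e.g. "cldr-2010"


class CollationVersionMismatch(Exception):
    """Raised instead of logging a warning when versions differ."""


@dataclass
class Index:
    name: str
    built_with: Collation  # recorded when the index was built

    def check_usable(self, current: Collation) -> None:
        # Refuse outright: the user must explicitly rebuild (or keep using
        # the old version) rather than silently mixing two sort orders.
        if current != self.built_with:
            raise CollationVersionMismatch(
                f"index {self.name!r} was built with {self.built_with}, "
                f"but the current collation is {current}"
            )


if __name__ == "__main__":
    idx = Index("customers_name_idx", Collation("fr", "cldr-2010"))
    idx.check_usable(Collation("fr", "cldr-2010"))  # same version: allowed
    try:
        idx.check_usable(Collation("fr", "cldr-2020"))
    except CollationVersionMismatch as e:
        print("refused:", e)
```

The design choice being illustrated is that the comparison uses the full (name, version) pair, so an upgrade of the underlying library cannot quietly change what an existing index means.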

It feels to me like we're still not really thinking clearly about this
within the PG community, and that the seriousness of this issue is not
fully understood.

-Jeremy Schneider

-- 
http://about.me/jeremy_schneider
