Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken - Mailing list pgsql-general

From Thomas Munro
Subject Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
Date
Msg-id CAEepm=0EfHkDyBQUB1YDH6rgd_EVQnvVNaTf_cUwVMGu8aNjQQ@mail.gmail.com
Whole thread Raw
In response to Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken  ("Maeldron T." <maeldron@gmail.com>)
List pgsql-general
On Wed, Dec 7, 2016 at 10:40 PM, Maeldron T. <maeldron@gmail.com> wrote:
> Anyway, ICU is turned on for PostgreSQL 9.6 even in the pkg version. Hurray.

Hmm, a curious choice, considering that FreeBSD finally has built-in
collations that work!

Using the port's ICU patch doesn't change anything about the risks
here as far as I know.  ICU continually pulls corrections and
improvements from CLDR, and FreeBSD's libc is now doing the same, so
actually both ICU and libc might be getting the very same stream of
collation changes, just at different speeds.

IMHO we can't continue to ignore this problem: we need to teach
Postgres to recognise when collations change.  That could mean
adopting like ICU and then tracking when every index is potentially
invalidated by a version change (see Peter Eisentraut's proposal), or
adopting non-portable implementation-specific techniques.  For the
latter, we'd probably not want to litter core Postgres with
assumptions about how each OS does things.  One fairly blunt approach
I came up with goes like this:

1.  Add a new optional GUC system_collation_version_command =
'/some/user/supplied/script.sh'.
2.  When postmaster starts, run it and remember the output in memory.
3.  When a database is created, store it for this database.
4.  When connecting to a database, complain loudly if version at
startup doesn't match the stored version.
5.  Update the stored value to the startup value when you REINDEX
DATABASE (ie that's how to clear the warning).

Then package mantainers could supply a script that know how to do the
right thing on this OS.  For example it could be the package version
string from the currently installed locales package, or an MD5 hash of
the contents of all files in /usr/share/locales/, or whatever suits.
The reason for only running the script at postmaster startup is that
there is a risk of libc caching data, so that a REINDEX would use old
data but running the command would see new files on disk, so we need
to make sure that a cluster restart is necessary after upgrading your
OS to clear the warning.

That's horribly blunt: it makes you reindex the whole database even if
you don't use a collation that changed, or don't even use btrees, etc.
You could do something more nuanced and complicated that works at the
level of individual locales and indexes (see the link I posted earlier
to check_pg_collations for some ideas), but the idea would be
basically the same.

--
Thomas Munro
http://www.enterprisedb.com


pgsql-general by date:

Previous
From: "Sinclair, Ian D (Ian)"
Date:
Subject: warning about oom_adj with PG 9.4 logger
Next
From: Rob Sargent
Date:
Subject: Re: high transaction rate