Re: Collation versioning - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation versioning
Date
Msg-id CA+hUKGLaR8zXhXhirNG5aT2EQP8ARFimHFai7iUgdYz_jNkocw@mail.gmail.com
Whole thread Raw
In response to Re: Collation versioning  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: Collation versioning
List pgsql-hackers
On Fri, Sep 28, 2018 at 9:30 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 16/09/2018 20:12, Douglas Doole wrote:
> > All this collation stuff is great, and I know users want it, but it
> > feels like were pushing them out of an airplane with a ripped parachute
> > every time the collation libraries change. Maybe they'll land safely or
> > maybe things will get very messy.
>
> At some point, a schema designer also needs to take some responsibility
> for making smart choices for longevity.  It is known that collations can
> change, and the sort of changes that can happen are also generally
> understood.  So if you want to use range partitioning on text fields,
> maybe you shouldn't, or at least choose the ranges conservatively.
> Similarly, maybe you shouldn't have timestamp range partition boundaries
> around DST changes or on the 29th of every month, and maybe you
> shouldn't partition float values at negative zero.  Some ideas are
> better than others.  We will help you recognize and fix breakage, but we
> can't prevent it altogether.

Since there's a chance of an "unconference" session on locale versions
tomorrow at PGCon, here's a fresh rebase of the patchset to add
per-database-object collation version tracking.  It doesn't handle
default collations yet (not hard AFAIK, will try that soon), but it
does work well enough to demonstrate the generate principal.  I won't
attach the CHECK support just yet, because it needs more work, but the
point of it was to demonstrate that pg_depend can handle this for all
kinds of database objects in one standard way, rather than sprinkling
collation version stuff all over the place in pg_index, pg_constraint,
etc, and I think it did that already.

postgres=# create table t (k text collate "en-x-icu");
CREATE TABLE
postgres=# create index on t(k);
CREATE INDEX
postgres=# select refobjversion from pg_depend where refobjversion != '';
 refobjversion
---------------
 153.72
(1 row)

Mess with it artificially (or install a different version of ICU):
postgres=# update pg_depend set refobjversion = '42' where
refobjversion = '153.72';
UPDATE 1

In a new session, we get a warning when first loading the index
because the version doesn't match:
postgres=# select * from t where k = 'x';
psql: WARNING:  index "t_k_idx" depends on collation 12711 version
"42", but the current version is "153.72"
DETAIL:  The index may be corrupted due to changes in sort order.
HINT:  REINDEX to avoid the risk of corruption.
 k
---
(0 rows)

The warning can be cleared for the indexes on that one table like so:
postgres=# reindex table t;
REINDEX

You can see that it's captured the new version:
postgres=# select refobjversion from pg_depend where refobjversion != '';
 refobjversion
---------------
 153.72
(1 row)

-- 
Thomas Munro
https://enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Indexing - comparison of tree structures
Next
From: David Rowley
Date:
Subject: Re: Confusing error message for REINDEX TABLE CONCURRENTLY