Re: Collation versioning - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Collation versioning
Date
Msg-id CAOBaU_Zm7J6Jd_Bw9oFz5JMdFWY5fX_oD9qPQ1qRhXj-wzrS3Q@mail.gmail.com
Whole thread Raw
In response to Re: Collation versioning  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Collation versioning  (Julien Rouhaud <rjuju123@gmail.com>)
Re: Collation versioning  (Thomas Munro <thomas.munro@gmail.com>)
Re: Collation versioning  (Thomas Munro <thomas.munro@gmail.com>)
Re: Collation versioning  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
Hello Thomas,

On Mon, Nov 4, 2019 at 4:58 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Fri, Nov 1, 2019 at 2:21 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > Are you planning to continue working on it?  For the record, that's
> > something needed to be able to implement a filter in REINDEX command
> > [1].
>
> Bonjour Julien,
>
> Unfortunately I haven't had time to work on it seriously, but here's a
> quick rebase to get the proof-of-concept back into working shape.
> It's nice to see progress in other bits of the problem-space.  I hope
> to have time to look at this patch set again soon, but if you or
> someone else would like hack on or think about it too, please feel
> free!

Thanks!  I already did some hack on it when looking at the code so I
can try to make some progress.

> Yes indeed this is exactly the same problem that you're trying to
> solve, approached from a different starting point.
>
> Here are some problems to think about:
>
> * We'd need to track dependencies on the default collation once we
> have versioning for that (see
> https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com).
> That is how most people actually consume collations out there in real
> life, and yet we don't normally track dependencies on the default
> collation and I don't know if that's simply a matter of ripping out
> all the code that looks like "xxx != DEFAULT_COLLATION_ID" in the
> dependency analysis code or if there's more to it.

This isn't enough.  What would remain is:

- teach get_collation_version_for_oid() to return the  default
collation name, which is simple
- have recordDependencyOnVersion() actually records the dependency,
which wouldn't happen as the default collation is pinned.

An easy fix would be to teach isObjectPinned() to ignore
CollationRelationId / DEFAULT_COLLATION_OID, but that's ugly and would
allow too many dependencies to be stored.  Not pinning the default
collation during initdb doesn't sound a good alternative either.
Maybe adding a force flag or a new DependencyType that'd mean "normal
but forced" would be ok?

> * Andres mentioned off-list that pg_depend rows might get blown away
> and recreated in some DDL circumstances.  We need to look into that.
> * Another is that pg_upgrade won't preserve pg_depend rows, so you'd
> need some catalog manipulation (direct or via new DDL) to fix that.
> * Some have expressed doubt that pg_depend is the right place for
> this; let's see if any counter-proposals appear.

When working on the REINDEX FILTER, I totally missed this thread and
wrote a POC saving the version in pg_index.  That's not ideal though,
as you need to record multiple version strings.  In my version I used
a json type, using the collprovider  as the key, but that's not enough
for ICU as each collation can have a different version string.  I'm
not a huge fan of using pg_depend to record the version, but storing a
collprovider/collname -> version per row in pg_index is definitely a
no go, so I don't have any better counter-proposal.

>
> > # reindex table t1;
> > WARNING:  01000: index "t1_val_idx" depends on collation 13330 version
> > "a153.97.35.8", but the current version is "153.97.35.8"
> > DETAIL:  The index may be corrupted due to changes in sort order.
> > HINT:  REINDEX to avoid the risk of corruption.
> > LOCATION:  index_check_collation_version, index.c:1263
>
> Duh.  Yeah, that's stupid and needs to be fixed somehow.

I don't have a clever solution for that either.



pgsql-hackers by date:

Previous
From: Kuntal Ghosh
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Next
From: vignesh C
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions