Re: Bogus collation version recording in recordMultipleDependencies - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Bogus collation version recording in recordMultipleDependencies
Msg-id CA+hUKGJRWsyKtkQw=TfNgWQryGb3Rz7j07LV9W=9_LCq79wYRQ@mail.gmail.com
In response to Re: Bogus collation version recording in recordMultipleDependencies  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bogus collation version recording in recordMultipleDependencies
Re: Bogus collation version recording in recordMultipleDependencies
List pgsql-hackers
On Sat, Apr 17, 2021 at 8:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Per the changes in collate.icu.utf8.out, this gets rid of
> a lot of imaginary collation dependencies, but it also gets
> rid of some arguably-real ones.  In particular, calls of
> record_eq and its siblings will be considered not to have
> any collation dependencies, although we know that internally
> those will look up per-column collations of their input types.
> We could imagine special-casing record_eq etc here, but that
> sure seems like a hack.
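A toy model may help make the record_eq point concrete. The sketch below is hypothetical Python. The Node and CompositeType classes and the RECORD_COMPARISONS set are illustrative assumptions, not PostgreSQL's parse nodes or the logic in find_expr_references(). A walker that only collects collations attached directly to expression nodes finds nothing in a record comparison. A walker that knows record_eq consults its inputs' per-column collations finds the real dependency, but only because it special-cases the comparison functions by name.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class CompositeType:
    name: str
    column_collations: List[Optional[str]]   # None for non-collatable columns

@dataclass
class Node:
    kind: str                                 # "var" or a function name
    type: Union[CompositeType, str]           # composite type or scalar type name
    collation: Optional[str] = None           # collation attached to this node
    args: List["Node"] = field(default_factory=list)

# Hypothetical special-case list: functions assumed to look up the
# per-column collations of their composite inputs at run time.
RECORD_COMPARISONS = {"record_eq", "record_lt"}

def direct_collations(node: Node) -> set:
    """Collations visible as explicit references in the expression tree."""
    found = {node.collation} if node.collation else set()
    for arg in node.args:
        found |= direct_collations(arg)
    return found

def behavioural_collations(node: Node) -> set:
    """Collations the result can depend on, including those a record
    comparison consults internally via its input row types."""
    found = {node.collation} if node.collation else set()
    for arg in node.args:
        found |= behavioural_collations(arg)
        if node.kind in RECORD_COMPARISONS and isinstance(arg.type, CompositeType):
            found |= {c for c in arg.type.column_collations if c}
    return found

# Comparing two values of a composite type whose first column uses an ICU collation.
pair = CompositeType("pair", ["en-US-x-icu", None])
expr = Node("record_eq", "bool", args=[Node("var", pair), Node("var", pair)])
print(direct_collations(expr))       # set()
print(behavioural_collations(expr))  # {'en-US-x-icu'}
```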

Thanks for looking into all this.  Hmm.

> I'm starting to have a bad feeling about 257836a75 overall.
> As I think I've complained before, I do not like anything about
> what it's done to pg_depend; it's forcing that relation to serve
> two masters, neither one well. ...

We did worry about (essentially) this question quite a bit in the
discussion thread, but we figured that you'd otherwise have to create
a parallel infrastructure that would look almost identical (for
example [1]).

> ...  We now see that the same remark
> applies to find_expr_references(), because the semantics of
> "which collations does this expression's behavior depend on" aren't
> identical to "which collations need to be recorded as direct
> dependencies of this expression", especially not if you'd prefer
> to minimize either list.  (Which is important.) ...

Bugs in the current analyser code aside, if we had a second catalog
and a second analyser for this stuff, then you'd still have the union
of both minimised sets in total, with some extra duplication because
you'd have some rows in both places that are currently handled by one
row, no?
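The arithmetic behind that argument can be sketched as follows. Model each collation dependency as an (object, collation) pair with two flags: one for "needed for DROP/cascade semantics" and one for "needs a recorded version". The flags and both catalog layouts are hypothetical, chosen only for illustration. A single catalog with an optional version column stores |D ∪ V| rows. Split catalogs store |D| + |V| = |D ∪ V| + |D ∩ V| rows, so every pair needed for both purposes is stored twice.

```python
from typing import List, Tuple

# (object, collation, needed_for_drop, needs_version): a hypothetical model
Dep = Tuple[str, str, bool, bool]

def rows_single_catalog(deps: List[Dep]) -> int:
    """One row per (object, collation) pair, with the version stored inline."""
    return len({(obj, coll) for obj, coll, drop, ver in deps if drop or ver})

def rows_two_catalogs(deps: List[Dep]) -> int:
    """Dependency rows in one catalog, version rows in a separate one."""
    depend = {(obj, coll) for obj, coll, drop, _ in deps if drop}
    versions = {(obj, coll) for obj, coll, _, ver in deps if ver}
    return len(depend) + len(versions)

deps = [
    ("idx1", "coll_a", True, True),   # needed for both purposes
    ("idx1", "coll_b", False, True),  # version tracking only
    ("idx2", "coll_a", True, True),   # needed for both purposes
]
print(rows_single_catalog(deps))  # 3
print(rows_two_catalogs(deps))    # 5: the two "both" pairs are stored twice
```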

> ... Moreover, for all
> the complexity it's introducing, it's next door to useless for
> glibc collations --- we might as well tell people "reindex
> everything when your glibc version changes", which could be done
> with a heck of a lot less infrastructure. ...

You do gain reliable tracking of which indexes remain to be rebuilt,
and warnings for common hazards like hot standbys with mismatched
glibc, so I think it's pretty useful.  As for the poverty of
information from glibc, I don't see why it should hold ICU, Windows
and FreeBSD users back.  In fact, I am rather hoping that by shipping this,
glibc developers will receive encouragement to add the trivial
interface we need to do better.
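To illustrate the "reliable tracking" point, here is a hypothetical sketch of the bookkeeping. Each index remembers the version of each collation it was built with. Comparing those versions against what the provider reports now yields the list of indexes still to be rebuilt. The same comparison is the basis for a warning on a standby whose glibc differs from the primary's. The function name, data layout, version strings and message text are made up for illustration and are not PostgreSQL's actual code. A provider that reports no version (None here) can't be checked, so those entries are skipped.

```python
from typing import Dict, List, Optional, Tuple

def indexes_needing_rebuild(
    recorded: Dict[str, Dict[str, str]],
    current: Dict[str, Optional[str]],
) -> List[Tuple[str, str, str, str]]:
    """recorded: index -> {collation: version seen when the index was built}
    current: collation -> version the provider reports now (None if unknown)
    Returns (index, collation, built_with, now) for each mismatch."""
    stale = []
    for index, deps in sorted(recorded.items()):
        for coll, built_with in sorted(deps.items()):
            now = current.get(coll)
            if now is not None and now != built_with:
                stale.append((index, coll, built_with, now))
    return stale

# Illustrative versions: glibc upgraded from 2.28 to 2.31, ICU unchanged.
recorded = {
    "people_name_idx": {"en_US.utf8": "2.28"},
    "tags_icu_idx": {"und-x-icu": "153.14"},
}
current = {"en_US.utf8": "2.31", "und-x-icu": "153.14"}

for index, coll, built_with, now in indexes_needing_rebuild(recorded, current):
    print(f"WARNING: {index} was built with {coll} version {built_with}, "
          f"but the current version is {now}; rebuild it")
```

After a rebuild, the recorded version would be updated to the current one, and the index drops off the list.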

> ... The situation on Windows
> looks pretty user-unfriendly as well, per the other thread.

That is unfortunate; it seems like such a stupid problem.  Restating
here for the sake of the list: when setting the default locale for your
template databases, initdb just needs to ask for the current
environment's locale in BCP 47 format ("en-US") rather than the
traditional format ("English_United States.1252").  Microsoft
explicitly tells us not to store the traditional format in databases,
and it doesn't work with the versioning API.  But since we're mostly
Unix hackers, we don't know how to do that.

> So I wonder if, rather than continuing to pursue this right now,
> we shouldn't revert 257836a75 and try again later with a new design
> that doesn't try to commandeer the existing dependency infrastructure.
> We might have a better idea about what to do on Windows by the time
> that's done, too.

It seems to me that there are two things that would be needed to
salvage this for PG14: (1) deciding that we're unlikely to come up
with a better idea than using pg_depend for this (following the
argument that it'd only create duplication to have a parallel
dedicated catalog), (2) fixing any remaining flaws in the dependency
analyser code.  I'll look into the details some more on Monday.

[1] https://www.postgresql.org/message-id/e9e22c5e-c018-f4ea-24c8-5b6d6fdacf30%402ndquadrant.com


