Thread: Bogus collation version recording in recordMultipleDependencies
I noticed some broken-looking logic in recordMultipleDependencies concerning how it records collation versions. It was a bit harder than I expected to demonstrate the bugs, but I eventually succeeded with u8=# create function foo(varchar) returns bool language sql return false; CREATE FUNCTION u8=# create collation mycoll from "en_US"; CREATE COLLATION u8=# CREATE DOMAIN d4 AS character varying(3) COLLATE "aa_DJ" CONSTRAINT yes_or_no_check CHECK (value = 'YES' collate mycoll or foo(value)); CREATE DOMAIN u8=# select objid, pg_describe_object(classid,objid,objsubid) as obj, pg_describe_object(refclassid,refobjid,refobjsubid)as ref, deptype, refobjversion from pg_depend where objid = 'd4'::regtype; objid | obj | ref | deptype | refobjversion -------+---------+-------------------+---------+--------------- 37421 | type d4 | schema public | n | 37421 | type d4 | collation "aa_DJ" | n | (2 rows) u8=# select objid, pg_describe_object(classid,objid,objsubid) as obj, pg_describe_object(refclassid,refobjid,refobjsubid)as ref, deptype, refobjversion from pg_depend where refobjid = 'd4'::regtype; objid | obj | ref | deptype | refobjversion -------+----------------------------+---------+---------+--------------- 37420 | type d4[] | type d4 | i | 37422 | constraint yes_or_no_check | type d4 | a | (2 rows) u8=# select objid, pg_describe_object(classid,objid,objsubid) as obj, pg_describe_object(refclassid,refobjid,refobjsubid)as ref, deptype, refobjversion from pg_depend where objid = 37422; objid | obj | ref | deptype | refobjversion -------+----------------------------+---------------------------------+---------+--------------- 37422 | constraint yes_or_no_check | type d4 | a | 37422 | constraint yes_or_no_check | collation mycoll | n | 2.28 37422 | constraint yes_or_no_check | function foo(character varying) | n | 2.28 37422 | constraint yes_or_no_check | collation "default" | n | (4 rows) (This is in a glibc-based build, with C as the database's default collation.) One question here is whether it's correct that the domain's dependency on collation "aa_DJ" is unversioned. Maybe that's intentional, but it seems worth asking. Anyway, there are two pretty obvious bugs in the dependencies for the domain's CHECK constraint: the version for collation mycoll leaks into the entry for function foo, and an entirely useless (because unversioned) dependency is recorded on the default collation. ... well, it's almost entirely useless. If we fix things to not do that (as per patch 0001 below), the results of the create_index regression test become unstable, because there's two queries that inquire into the dependencies of indexes, and their results change depending on whether the default collation has a version or not. I'd be inclined to just take out the portions of that test that depend on that question, but maybe somebody will complain that there's a loss of useful coverage. I don't agree, but maybe I'll be overruled. If we do feel we need to stay bug-compatible with that behavior, then the alternate 0002 patch just fixes the version-leakage-across-entries problem, while still removing the unnecessary assumption that C, POSIX, and DEFAULT are the only pinned collations. (To be clear: 0002 passes check-world as-is, while 0001 is not committable without some regression-test fiddling.) Thoughts? regards, tom lane diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c index 362db7fe91..1217c01b8a 100644 --- a/src/backend/catalog/pg_depend.c +++ b/src/backend/catalog/pg_depend.c @@ -73,7 +73,6 @@ recordMultipleDependencies(const ObjectAddress *depender, max_slots, slot_init_count, slot_stored_count; - char *version = NULL; if (nreferenced <= 0) return; /* nothing to do */ @@ -104,31 +103,22 @@ recordMultipleDependencies(const ObjectAddress *depender, slot_init_count = 0; for (i = 0; i < nreferenced; i++, referenced++) { - bool ignore_systempin = false; + char *version = NULL; if (record_version) { /* For now we only know how to deal with collations. */ if (referenced->classId == CollationRelationId) { - /* C and POSIX don't need version tracking. */ + /* These are unversioned, so don't waste cycles on them. */ if (referenced->objectId == C_COLLATION_OID || referenced->objectId == POSIX_COLLATION_OID) continue; version = get_collation_version_for_oid(referenced->objectId, false); - - /* - * Default collation is pinned, so we need to force recording - * the dependency to store the version. - */ - if (referenced->objectId == DEFAULT_COLLATION_OID) - ignore_systempin = true; } } - else - Assert(!version); /* * If the referenced object is pinned by the system, there's no real @@ -136,7 +126,7 @@ recordMultipleDependencies(const ObjectAddress *depender, * version. This saves lots of space in pg_depend, so it's worth the * time taken to check. */ - if (!ignore_systempin && isObjectPinned(referenced, dependDesc)) + if (version == NULL && isObjectPinned(referenced, dependDesc)) continue; if (slot_init_count < max_slots) diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c index 362db7fe91..174ce4b7df 100644 --- a/src/backend/catalog/pg_depend.c +++ b/src/backend/catalog/pg_depend.c @@ -73,7 +73,6 @@ recordMultipleDependencies(const ObjectAddress *depender, max_slots, slot_init_count, slot_stored_count; - char *version = NULL; if (nreferenced <= 0) return; /* nothing to do */ @@ -104,6 +103,7 @@ recordMultipleDependencies(const ObjectAddress *depender, slot_init_count = 0; for (i = 0; i < nreferenced; i++, referenced++) { + char *version = NULL; bool ignore_systempin = false; if (record_version) @@ -111,7 +111,7 @@ recordMultipleDependencies(const ObjectAddress *depender, /* For now we only know how to deal with collations. */ if (referenced->classId == CollationRelationId) { - /* C and POSIX don't need version tracking. */ + /* These are unversioned, so don't waste cycles on them. */ if (referenced->objectId == C_COLLATION_OID || referenced->objectId == POSIX_COLLATION_OID) continue; @@ -120,15 +120,16 @@ recordMultipleDependencies(const ObjectAddress *depender, false); /* - * Default collation is pinned, so we need to force recording - * the dependency to store the version. + * If we have a version, make sure we record it even if the + * collation is pinned. Also force recording a dependency on + * the "default" collation even if it hasn't got a version; + * this is an ugly kluge to ensure stability of regression + * test results. */ - if (referenced->objectId == DEFAULT_COLLATION_OID) + if (version || referenced->objectId == DEFAULT_COLLATION_OID) ignore_systempin = true; } } - else - Assert(!version); /* * If the referenced object is pinned by the system, there's no real
On Wed, Apr 14, 2021 at 01:18:07PM -0400, Tom Lane wrote: > > One question here is whether it's correct that the domain's dependency > on collation "aa_DJ" is unversioned. Maybe that's intentional, but it > seems worth asking. This is intentional I think, we should record collation version only for object that might break if the collation version is updated. So creating an index on that domain would record the collation version. > Anyway, there are two pretty obvious bugs in the dependencies for the > domain's CHECK constraint: the version for collation mycoll leaks > into the entry for function foo, and an entirely useless (because > unversioned) dependency is recorded on the default collation. Agreed. > (To be clear: 0002 passes check-world as-is, while 0001 is not > committable without some regression-test fiddling.) I'm probably missing something obvious but both 0001 and 0002 pass check-world for me, on a glibc box and --with-icu. > Thoughts? I think this is an open item, so I added one for now.
Julien Rouhaud <rjuju123@gmail.com> writes: > On Wed, Apr 14, 2021 at 01:18:07PM -0400, Tom Lane wrote: >> (To be clear: 0002 passes check-world as-is, while 0001 is not >> committable without some regression-test fiddling.) > I'm probably missing something obvious but both 0001 and 0002 pass check-world > for me, on a glibc box and --with-icu. 0001 fails for me :-(. I think that requires default collation to be C. regards, tom lane
On Thu, Apr 15, 2021 at 10:06:24AM -0400, Tom Lane wrote: > Julien Rouhaud <rjuju123@gmail.com> writes: > > On Wed, Apr 14, 2021 at 01:18:07PM -0400, Tom Lane wrote: > >> (To be clear: 0002 passes check-world as-is, while 0001 is not > >> committable without some regression-test fiddling.) > > > I'm probably missing something obvious but both 0001 and 0002 pass check-world > > for me, on a glibc box and --with-icu. > > 0001 fails for me :-(. I think that requires default collation to be C. Oh right, adding --no-locale to the regress opts I see that create_index is failing, and that's not the one I was expecting. We could change create_index test to create c2 with a C collation, in order to test that we don't track dependency on unversioned locales, and add an extra test in collate.linux.utf8 to check that we do track a dependency on the default collation as this test isn't run in the --no-locale case. The only case not tested would be default unversioned collation, but I'm not sure where to properly test that. Maybe a short leading test in collate.linux.utf8 that would be run on linux in that case (when getdatabaseencoding() != 'UTF8')? It would require an extra alternate file but it wouldn't cause too much maintenance problem as there should be only one test.
Julien Rouhaud <rjuju123@gmail.com> writes: > On Thu, Apr 15, 2021 at 10:06:24AM -0400, Tom Lane wrote: >> 0001 fails for me :-(. I think that requires default collation to be C. > Oh right, adding --no-locale to the regress opts I see that create_index is > failing, and that's not the one I was expecting. > We could change create_index test to create c2 with a C collation, in order to > test that we don't track dependency on unversioned locales, and add an extra > test in collate.linux.utf8 to check that we do track a dependency on the > default collation as this test isn't run in the --no-locale case. The only > case not tested would be default unversioned collation, but I'm not sure where > to properly test that. Maybe a short leading test in collate.linux.utf8 that > would be run on linux in that case (when getdatabaseencoding() != 'UTF8')? It > would require an extra alternate file but it wouldn't cause too much > maintenance problem as there should be only one test. Since the proposed patch removes the dependency code's special-case handling of the default collation, I don't feel like we need to jump through hoops to prove that the default collation is tracked the same as other collations. A regression test with alternative outputs is a significant ongoing maintenance burden, and I do not see where we're getting a commensurate improvement in test coverage. Especially since, AFAICS, the two alternative outputs would essentially have to accept both the "it works" and "it doesn't work" outcomes. So I propose that we do 0001 below, which is my first patch plus your suggestion about fixing up create_index.sql. This passes check-world for me under both C and en_US.utf8 prevailing locales. regards, tom lane diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c index 362db7fe91..1217c01b8a 100644 --- a/src/backend/catalog/pg_depend.c +++ b/src/backend/catalog/pg_depend.c @@ -73,7 +73,6 @@ recordMultipleDependencies(const ObjectAddress *depender, max_slots, slot_init_count, slot_stored_count; - char *version = NULL; if (nreferenced <= 0) return; /* nothing to do */ @@ -104,31 +103,22 @@ recordMultipleDependencies(const ObjectAddress *depender, slot_init_count = 0; for (i = 0; i < nreferenced; i++, referenced++) { - bool ignore_systempin = false; + char *version = NULL; if (record_version) { /* For now we only know how to deal with collations. */ if (referenced->classId == CollationRelationId) { - /* C and POSIX don't need version tracking. */ + /* These are unversioned, so don't waste cycles on them. */ if (referenced->objectId == C_COLLATION_OID || referenced->objectId == POSIX_COLLATION_OID) continue; version = get_collation_version_for_oid(referenced->objectId, false); - - /* - * Default collation is pinned, so we need to force recording - * the dependency to store the version. - */ - if (referenced->objectId == DEFAULT_COLLATION_OID) - ignore_systempin = true; } } - else - Assert(!version); /* * If the referenced object is pinned by the system, there's no real @@ -136,7 +126,7 @@ recordMultipleDependencies(const ObjectAddress *depender, * version. This saves lots of space in pg_depend, so it's worth the * time taken to check. */ - if (!ignore_systempin && isObjectPinned(referenced, dependDesc)) + if (version == NULL && isObjectPinned(referenced, dependDesc)) continue; if (slot_init_count < max_slots) diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index 830fdddf24..7f8f91b92c 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -2026,10 +2026,10 @@ REINDEX TABLE concur_reindex_tab; -- notice NOTICE: table "concur_reindex_tab" has no indexes to reindex REINDEX (CONCURRENTLY) TABLE concur_reindex_tab; -- notice NOTICE: table "concur_reindex_tab" has no indexes that can be reindexed concurrently -ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index +ALTER TABLE concur_reindex_tab ADD COLUMN c2 text COLLATE "C"; -- add toast index -- Normal index with integer column CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1); --- Normal index with text column +-- Normal index with text column (with unversioned collation) CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2); -- UNIQUE index with expression CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1)); @@ -2069,16 +2069,14 @@ WHERE classid = 'pg_class'::regclass AND obj | objref | deptype ------------------------------------------+------------------------------------------------------------+--------- index concur_reindex_ind1 | constraint concur_reindex_ind1 on table concur_reindex_tab | i - index concur_reindex_ind2 | collation "default" | n index concur_reindex_ind2 | column c2 of table concur_reindex_tab | a index concur_reindex_ind3 | column c1 of table concur_reindex_tab | a index concur_reindex_ind3 | table concur_reindex_tab | a - index concur_reindex_ind4 | collation "default" | n index concur_reindex_ind4 | column c1 of table concur_reindex_tab | a index concur_reindex_ind4 | column c2 of table concur_reindex_tab | a materialized view concur_reindex_matview | schema public | n table concur_reindex_tab | schema public | n -(10 rows) +(8 rows) REINDEX INDEX CONCURRENTLY concur_reindex_ind1; REINDEX TABLE CONCURRENTLY concur_reindex_tab; @@ -2098,16 +2096,14 @@ WHERE classid = 'pg_class'::regclass AND obj | objref | deptype ------------------------------------------+------------------------------------------------------------+--------- index concur_reindex_ind1 | constraint concur_reindex_ind1 on table concur_reindex_tab | i - index concur_reindex_ind2 | collation "default" | n index concur_reindex_ind2 | column c2 of table concur_reindex_tab | a index concur_reindex_ind3 | column c1 of table concur_reindex_tab | a index concur_reindex_ind3 | table concur_reindex_tab | a - index concur_reindex_ind4 | collation "default" | n index concur_reindex_ind4 | column c1 of table concur_reindex_tab | a index concur_reindex_ind4 | column c2 of table concur_reindex_tab | a materialized view concur_reindex_matview | schema public | n table concur_reindex_tab | schema public | n -(10 rows) +(8 rows) -- Check that comments are preserved CREATE TABLE testcomment (i int); @@ -2487,7 +2483,7 @@ WARNING: cannot reindex system catalogs concurrently, skipping all Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- c1 | integer | | not null | - c2 | text | | | + c2 | text | C | | Indexes: "concur_reindex_ind1" PRIMARY KEY, btree (c1) "concur_reindex_ind2" btree (c2) diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql index 8bc76f7c6f..51c9a12151 100644 --- a/src/test/regress/sql/create_index.sql +++ b/src/test/regress/sql/create_index.sql @@ -797,10 +797,10 @@ CREATE TABLE concur_reindex_tab (c1 int); -- REINDEX REINDEX TABLE concur_reindex_tab; -- notice REINDEX (CONCURRENTLY) TABLE concur_reindex_tab; -- notice -ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index +ALTER TABLE concur_reindex_tab ADD COLUMN c2 text COLLATE "C"; -- add toast index -- Normal index with integer column CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1); --- Normal index with text column +-- Normal index with text column (with unversioned collation) CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2); -- UNIQUE index with expression CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
On Fri, Apr 16, 2021 at 10:03:42AM -0400, Tom Lane wrote: > > Since the proposed patch removes the dependency code's special-case > handling of the default collation, I don't feel like we need to jump > through hoops to prove that the default collation is tracked the > same as other collations. A regression test with alternative outputs > is a significant ongoing maintenance burden, and I do not see where > we're getting a commensurate improvement in test coverage. Especially > since, AFAICS, the two alternative outputs would essentially have to > accept both the "it works" and "it doesn't work" outcomes. Fine by me, I was mentioning those if we wanted to keep some extra coverage for that by I agree it doesn't add much value. > So I propose that we do 0001 below, which is my first patch plus your > suggestion about fixing up create_index.sql. This passes check-world > for me under both C and en_US.utf8 prevailing locales. That's what I ended up with too, so LGTM!
Julien Rouhaud <rjuju123@gmail.com> writes: > On Fri, Apr 16, 2021 at 10:03:42AM -0400, Tom Lane wrote: >> So I propose that we do 0001 below, which is my first patch plus your >> suggestion about fixing up create_index.sql. This passes check-world >> for me under both C and en_US.utf8 prevailing locales. > That's what I ended up with too, so LGTM! Pushed, thanks for review! (and I'll update the open items list in a sec) regards, tom lane
I wrote: >> That's what I ended up with too, so LGTM! > Pushed, thanks for review! (and I'll update the open items list in a > sec) ... or maybe not just yet. Andres' buildfarm critters seem to have a different opinion than my machine about what the output of collate.icu.utf8 ought to be. I wonder what the prevailing LANG setting is for them, and which ICU version they're using. regards, tom lane
Hi, On 2021-04-16 12:55:28 -0400, Tom Lane wrote: > I wrote: > >> That's what I ended up with too, so LGTM! > > > Pushed, thanks for review! (and I'll update the open items list in a > > sec) > > ... or maybe not just yet. Andres' buildfarm critters seem to have > a different opinion than my machine about what the output of > collate.icu.utf8 ought to be. I wonder what the prevailing LANG > setting is for them, and which ICU version they're using. andres@andres-pg-buildfarm-valgrind:~/src/pgbuildfarm-client-stock$ grep calliph *.conf build-farm-copyparse.conf: animal => "calliphoridae", build-farm-copyparse.conf: build_root => '/mnt/resource/andres/bf/calliphoridae', andres@andres-pg-buildfarm-valgrind:~/src/pgbuildfarm-client-stock$ dpkg -l|grep icu ii icu-devtools 67.1-6 amd64 Development utilities for InternationalComponents for Unicode ii libicu-dev:amd64 67.1-6 amd64 Development files for InternationalComponents for Unicode ii libicu67:amd64 67.1-6 amd64 International Components for Unicode andres@andres-pg-buildfarm-valgrind:~/src/pgbuildfarm-client-stock$ locale LANG=C.UTF-8 LANGUAGE= LC_CTYPE="C.UTF-8" LC_NUMERIC="C.UTF-8" LC_TIME="C.UTF-8" LC_COLLATE="C.UTF-8" LC_MONETARY="C.UTF-8" LC_MESSAGES="C.UTF-8" LC_PAPER="C.UTF-8" LC_NAME="C.UTF-8" LC_ADDRESS="C.UTF-8" LC_TELEPHONE="C.UTF-8" LC_MEASUREMENT="C.UTF-8" LC_IDENTIFICATION="C.UTF-8" LC_ALL= Greetings, Andres Freund
I wrote: > ... or maybe not just yet. Andres' buildfarm critters seem to have > a different opinion than my machine about what the output of > collate.icu.utf8 ought to be. I wonder what the prevailing LANG > setting is for them, and which ICU version they're using. Oh, I bet it's "C.utf8", because I can reproduce the failure with that. This crystallizes a nagging feeling I'd had that you were misdescribing the collate.icu.utf8 test as not being run under --no-locale. Actually, it's only skipped if the encoding isn't UTF8, not the same thing. I think we need to remove the default-collation cases from that test too. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > On 2021-04-16 12:55:28 -0400, Tom Lane wrote: >> ... or maybe not just yet. Andres' buildfarm critters seem to have >> a different opinion than my machine about what the output of >> collate.icu.utf8 ought to be. I wonder what the prevailing LANG >> setting is for them, and which ICU version they're using. > LANG=C.UTF-8 I'd guessed that shortly later, but thanks for confirming. regards, tom lane
I wrote: > Oh, I bet it's "C.utf8", because I can reproduce the failure with that. > This crystallizes a nagging feeling I'd had that you were misdescribing > the collate.icu.utf8 test as not being run under --no-locale. Actually, > it's only skipped if the encoding isn't UTF8, not the same thing. > I think we need to remove the default-collation cases from that test too. Hmm ... this is more subtle than it seemed. I tried to figure out where the default-collation dependencies were coming from, and it's quite non-obvious, at least for some of them. Observe: u8de=# create table t1 (f1 text collate "fr_FR"); CREATE TABLE u8de=# create index on t1(f1) where f1 > 'foo'; CREATE INDEX u8de=# SELECT objid::regclass, refobjid::regcollation, refobjversion FROM pg_depend d LEFT JOIN pg_class c ON c.oid = d.objid WHERE refclassid = 'pg_collation'::regclass AND coalesce(relkind, 'i') = 'i' AND relname LIKE 't1_%'; objid | refobjid | refobjversion -----------+-----------+--------------- t1_f1_idx | "fr_FR" | 2.28 t1_f1_idx | "fr_FR" | 2.28 t1_f1_idx | "default" | 2.28 (3 rows) (The "default" item doesn't show up if default collation is C, which is what's causing the buildfarm instability.) Now, it certainly looks like that index definition ought to only have fr_FR dependencies. I dug into it and discovered that the reason we're coming up with a dependency on "default" is that the WHERE clause looks like {OPEXPR :opno 666 :opfuncid 742 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 14484 :args ( {VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 14484 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 23 } {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 28 :constvalue 7 [ 28 0 0 0 102 111 111 ] } ) :location 26 } So sure enough, the comparison operator's inputcollid is fr_FR, but the 'foo' constant has constcollid = "default". That will have exactly zero impact on the semantics of the expression, but dependency.c doesn't realize that and reports it as a dependency anyway. I feel like this is telling us that there's a fundamental misunderstanding in find_expr_references_walker about which collation dependencies to report. It's reporting all the leaf-node collations, and ignoring the ones that actually count semantically, that is the inputcollid fields of function and operator nodes. Not sure what's the best thing to do here. Redesigning this post-feature-freeze doesn't seem terribly appetizing, but on the other hand, this index collation recording feature has put a premium on not overstating the collation dependencies of an expression. We don't want to tell users that an index is broken when it isn't really. regards, tom lane
I wrote: > I feel like this is telling us that there's a fundamental > misunderstanding in find_expr_references_walker about which > collation dependencies to report. It's reporting all the > leaf-node collations, and ignoring the ones that actually > count semantically, that is the inputcollid fields of > function and operator nodes. > Not sure what's the best thing to do here. Redesigning > this post-feature-freeze doesn't seem terribly appetizing, > but on the other hand, this index collation recording > feature has put a premium on not overstating the collation > dependencies of an expression. We don't want to tell users > that an index is broken when it isn't really. I felt less hesitant to modify find_expr_references_walker's behavior w.r.t. collations after realizing that most of it was not of long standing, but came in with 257836a75. So here's a draft patch that redesigns it as suggested above. Along the way I discovered that GetTypeCollations was quite broken for ranges and arrays, so this fixes that too. Per the changes in collate.icu.utf8.out, this gets rid of a lot of imaginary collation dependencies, but it also gets rid of some arguably-real ones. In particular, calls of record_eq and its siblings will be considered not to have any collation dependencies, although we know that internally those will look up per-column collations of their input types. We could imagine special-casing record_eq etc here, but that sure seems like a hack. I"m starting to have a bad feeling about 257836a75 overall. As I think I've complained before, I do not like anything about what it's done to pg_depend; it's forcing that relation to serve two masters, neither one well. We now see that the same remark applies to find_expr_references(), because the semantics of "which collations does this expression's behavior depend on" aren't identical to "which collations need to be recorded as direct dependencies of this expression", especially not if you'd prefer to minimize either list. (Which is important.) Moreover, for all the complexity it's introducing, it's next door to useless for glibc collations --- we might as well tell people "reindex everything when your glibc version changes", which could be done with a heck of a lot less infrastructure. The situation on Windows looks pretty user-unfriendly as well, per the other thread. So I wonder if, rather than continuing to pursue this right now, we shouldn't revert 257836a75 and try again later with a new design that doesn't try to commandeer the existing dependency infrastructure. We might have a better idea about what to do on Windows by the time that's done, too. regards, tom lane diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 8d8e926c21..41093ea6ae 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -1835,8 +1835,17 @@ recordDependencyOnSingleRelExpr(const ObjectAddress *depender, * the datatype. However we do need a type dependency if there is no such * indirect dependency, as for example in Const and CoerceToDomain nodes. * - * Similarly, we don't need to create dependencies on collations except where - * the collation is being freshly introduced to the expression. + * Collations are handled primarily by recording the inputcollid's of node + * types that have them, as those are the ones that are semantically + * significant during expression evaluation. We also record the collation of + * CollateExpr nodes, since those will be needed to print such nodes even if + * they don't really affect semantics. Collations of leaf nodes such as Vars + * can be ignored on the grounds that if they're not default, they came from + * the referenced object (e.g., a table column), so the dependency on that + * object is enough. (Note: in a post-const-folding expression tree, a + * CollateExpr's collation could have been absorbed into a Const or + * RelabelType node. While ruleutils.c prints such collations for clarity, + * we may ignore them here as they have no semantic effect.) */ static bool find_expr_references_walker(Node *node, @@ -1876,29 +1885,6 @@ find_expr_references_walker(Node *node, /* If it's a plain relation, reference this column */ add_object_address(OCLASS_CLASS, rte->relid, var->varattno, context->addrs); - - /* Top-level collation if valid */ - if (OidIsValid(var->varcollid)) - add_object_address(OCLASS_COLLATION, var->varcollid, 0, - context->addrs); - /* Otherwise, it may be a type with internal collations */ - else if (var->vartype >= FirstNormalObjectId) - { - List *collations; - ListCell *lc; - - collations = GetTypeCollations(var->vartype); - - foreach(lc, collations) - { - Oid coll = lfirst_oid(lc); - - if (OidIsValid(coll)) - add_object_address(OCLASS_COLLATION, - lfirst_oid(lc), 0, - context->addrs); - } - } } /* @@ -1920,15 +1906,6 @@ find_expr_references_walker(Node *node, add_object_address(OCLASS_TYPE, con->consttype, 0, context->addrs); - /* - * We must also depend on the constant's collation: it could be - * different from the datatype's, if a CollateExpr was const-folded to - * a simple constant. - */ - if (OidIsValid(con->constcollid)) - add_object_address(OCLASS_COLLATION, con->constcollid, 0, - context->addrs); - /* * If it's a regclass or similar literal referring to an existing * object, add a reference to that object. (Currently, only the @@ -2013,10 +1990,6 @@ find_expr_references_walker(Node *node, /* A parameter must depend on the parameter's datatype */ add_object_address(OCLASS_TYPE, param->paramtype, 0, context->addrs); - /* and its collation, just as for Consts */ - if (OidIsValid(param->paramcollid)) - add_object_address(OCLASS_COLLATION, param->paramcollid, 0, - context->addrs); } else if (IsA(node, FuncExpr)) { @@ -2024,6 +1997,9 @@ find_expr_references_walker(Node *node, add_object_address(OCLASS_PROC, funcexpr->funcid, 0, context->addrs); + if (OidIsValid(funcexpr->inputcollid)) + add_object_address(OCLASS_COLLATION, funcexpr->inputcollid, 0, + context->addrs); /* fall through to examine arguments */ } else if (IsA(node, OpExpr)) @@ -2032,6 +2008,9 @@ find_expr_references_walker(Node *node, add_object_address(OCLASS_OPERATOR, opexpr->opno, 0, context->addrs); + if (OidIsValid(opexpr->inputcollid)) + add_object_address(OCLASS_COLLATION, opexpr->inputcollid, 0, + context->addrs); /* fall through to examine arguments */ } else if (IsA(node, DistinctExpr)) @@ -2040,6 +2019,9 @@ find_expr_references_walker(Node *node, add_object_address(OCLASS_OPERATOR, distinctexpr->opno, 0, context->addrs); + if (OidIsValid(distinctexpr->inputcollid)) + add_object_address(OCLASS_COLLATION, distinctexpr->inputcollid, 0, + context->addrs); /* fall through to examine arguments */ } else if (IsA(node, NullIfExpr)) @@ -2048,6 +2030,9 @@ find_expr_references_walker(Node *node, add_object_address(OCLASS_OPERATOR, nullifexpr->opno, 0, context->addrs); + if (OidIsValid(nullifexpr->inputcollid)) + add_object_address(OCLASS_COLLATION, nullifexpr->inputcollid, 0, + context->addrs); /* fall through to examine arguments */ } else if (IsA(node, ScalarArrayOpExpr)) @@ -2056,6 +2041,9 @@ find_expr_references_walker(Node *node, add_object_address(OCLASS_OPERATOR, opexpr->opno, 0, context->addrs); + if (OidIsValid(opexpr->inputcollid)) + add_object_address(OCLASS_COLLATION, opexpr->inputcollid, 0, + context->addrs); /* fall through to examine arguments */ } else if (IsA(node, Aggref)) @@ -2064,6 +2052,9 @@ find_expr_references_walker(Node *node, add_object_address(OCLASS_PROC, aggref->aggfnoid, 0, context->addrs); + if (OidIsValid(aggref->inputcollid)) + add_object_address(OCLASS_COLLATION, aggref->inputcollid, 0, + context->addrs); /* fall through to examine arguments */ } else if (IsA(node, WindowFunc)) @@ -2072,6 +2063,9 @@ find_expr_references_walker(Node *node, add_object_address(OCLASS_PROC, wfunc->winfnoid, 0, context->addrs); + if (OidIsValid(wfunc->inputcollid)) + add_object_address(OCLASS_COLLATION, wfunc->inputcollid, 0, + context->addrs); /* fall through to examine arguments */ } else if (IsA(node, SubscriptingRef)) @@ -2116,10 +2110,6 @@ find_expr_references_walker(Node *node, else add_object_address(OCLASS_TYPE, fselect->resulttype, 0, context->addrs); - /* the collation might not be referenced anywhere else, either */ - if (OidIsValid(fselect->resultcollid)) - add_object_address(OCLASS_COLLATION, fselect->resultcollid, 0, - context->addrs); } else if (IsA(node, FieldStore)) { @@ -2146,10 +2136,6 @@ find_expr_references_walker(Node *node, /* since there is no function dependency, need to depend on type */ add_object_address(OCLASS_TYPE, relab->resulttype, 0, context->addrs); - /* the collation might not be referenced anywhere else, either */ - if (OidIsValid(relab->resultcollid)) - add_object_address(OCLASS_COLLATION, relab->resultcollid, 0, - context->addrs); } else if (IsA(node, CoerceViaIO)) { @@ -2158,10 +2144,6 @@ find_expr_references_walker(Node *node, /* since there is no exposed function, need to depend on type */ add_object_address(OCLASS_TYPE, iocoerce->resulttype, 0, context->addrs); - /* the collation might not be referenced anywhere else, either */ - if (OidIsValid(iocoerce->resultcollid)) - add_object_address(OCLASS_COLLATION, iocoerce->resultcollid, 0, - context->addrs); } else if (IsA(node, ArrayCoerceExpr)) { @@ -2170,10 +2152,6 @@ find_expr_references_walker(Node *node, /* as above, depend on type */ add_object_address(OCLASS_TYPE, acoerce->resulttype, 0, context->addrs); - /* the collation might not be referenced anywhere else, either */ - if (OidIsValid(acoerce->resultcollid)) - add_object_address(OCLASS_COLLATION, acoerce->resultcollid, 0, - context->addrs); /* fall through to examine arguments */ } else if (IsA(node, ConvertRowtypeExpr)) @@ -2213,6 +2191,24 @@ find_expr_references_walker(Node *node, add_object_address(OCLASS_OPFAMILY, lfirst_oid(l), 0, context->addrs); } + foreach(l, rcexpr->inputcollids) + { + Oid inputcollid = lfirst_oid(l); + + if (OidIsValid(inputcollid)) + add_object_address(OCLASS_COLLATION, inputcollid, 0, + context->addrs); + } + /* fall through to examine arguments */ + } + else if (IsA(node, MinMaxExpr)) + { + MinMaxExpr *mmexpr = (MinMaxExpr *) node; + + /* minmaxtype will match one of the inputs, so no need to record it */ + if (OidIsValid(mmexpr->inputcollid)) + add_object_address(OCLASS_COLLATION, mmexpr->inputcollid, 0, + context->addrs); /* fall through to examine arguments */ } else if (IsA(node, CoerceToDomain)) diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c index ec5d122432..22f289d9f8 100644 --- a/src/backend/catalog/pg_type.c +++ b/src/backend/catalog/pg_type.c @@ -564,13 +564,19 @@ GetTypeCollations(Oid typeoid) } else if (typeTup->typtype == TYPTYPE_RANGE) { - Oid rangeid = get_range_subtype(typeTup->oid); + Oid rangecoll = get_range_collation(typeTup->oid); - Assert(OidIsValid(rangeid)); + if (OidIsValid(rangecoll)) + result = list_append_unique_oid(result, rangecoll); + else + { + Oid rangeid = get_range_subtype(typeTup->oid); - result = list_concat_unique_oid(result, GetTypeCollations(rangeid)); + Assert(OidIsValid(rangeid)); + result = list_concat_unique_oid(result, GetTypeCollations(rangeid)); + } } - else if (OidIsValid(typeTup->typelem)) + else if (IsTrueArrayType(typeTup)) result = list_concat_unique_oid(result, GetTypeCollations(typeTup->typelem)); diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out index 779405ef32..faf99f76b5 100644 --- a/src/test/regress/expected/collate.icu.utf8.out +++ b/src/test/regress/expected/collate.icu.utf8.out @@ -1958,7 +1958,7 @@ CREATE DOMAIN d_custom AS t_custom; CREATE COLLATION custom ( LOCALE = 'fr-x-icu', PROVIDER = 'icu' ); -CREATE TYPE myrange AS range (subtype = text); +CREATE TYPE myrange AS range (subtype = text, collation = "POSIX"); CREATE TYPE myrange_en_fr_ga AS range(subtype = t_en_fr_ga); CREATE TABLE collate_test ( id integer, @@ -2054,36 +2054,17 @@ ORDER BY 1, 2, 3; icuidx05_d_en_fr_ga_arr | "en-x-icu" | up to date icuidx05_d_en_fr_ga_arr | "fr-x-icu" | up to date icuidx05_d_en_fr_ga_arr | "ga-x-icu" | up to date - icuidx06_d_en_fr_ga | "default" | XXX - icuidx06_d_en_fr_ga | "en-x-icu" | up to date icuidx06_d_en_fr_ga | "fr-x-icu" | up to date - icuidx06_d_en_fr_ga | "ga-x-icu" | up to date - icuidx07_d_en_fr_ga | "default" | XXX - icuidx07_d_en_fr_ga | "en-x-icu" | up to date - icuidx07_d_en_fr_ga | "fr-x-icu" | up to date icuidx07_d_en_fr_ga | "ga-x-icu" | up to date - icuidx08_d_en_fr_ga | "en-x-icu" | up to date - icuidx08_d_en_fr_ga | "fr-x-icu" | up to date - icuidx08_d_en_fr_ga | "ga-x-icu" | up to date - icuidx09_d_en_fr_ga | "en-x-icu" | up to date - icuidx09_d_en_fr_ga | "fr-x-icu" | up to date - icuidx09_d_en_fr_ga | "ga-x-icu" | up to date - icuidx10_d_en_fr_ga_es | "en-x-icu" | up to date icuidx10_d_en_fr_ga_es | "es-x-icu" | up to date - icuidx10_d_en_fr_ga_es | "fr-x-icu" | up to date - icuidx10_d_en_fr_ga_es | "ga-x-icu" | up to date - icuidx11_d_es | "default" | XXX icuidx11_d_es | "es-x-icu" | up to date - icuidx12_custom | "default" | XXX icuidx12_custom | custom | up to date - icuidx13_custom | "default" | XXX icuidx13_custom | custom | up to date - icuidx14_myrange | "default" | XXX icuidx15_myrange_en_fr_ga | "en-x-icu" | up to date icuidx15_myrange_en_fr_ga | "fr-x-icu" | up to date icuidx15_myrange_en_fr_ga | "ga-x-icu" | up to date icuidx17_part | "en-x-icu" | up to date -(57 rows) +(38 rows) -- Validate that REINDEX will update the stored version. UPDATE pg_depend SET refobjversion = 'not a version' diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql index 7f40c56039..4c71f4d249 100644 --- a/src/test/regress/sql/collate.icu.utf8.sql +++ b/src/test/regress/sql/collate.icu.utf8.sql @@ -755,7 +755,7 @@ CREATE COLLATION custom ( LOCALE = 'fr-x-icu', PROVIDER = 'icu' ); -CREATE TYPE myrange AS range (subtype = text); +CREATE TYPE myrange AS range (subtype = text, collation = "POSIX"); CREATE TYPE myrange_en_fr_ga AS range(subtype = t_en_fr_ga); CREATE TABLE collate_test (
On Sat, Apr 17, 2021 at 8:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Per the changes in collate.icu.utf8.out, this gets rid of > a lot of imaginary collation dependencies, but it also gets > rid of some arguably-real ones. In particular, calls of > record_eq and its siblings will be considered not to have > any collation dependencies, although we know that internally > those will look up per-column collations of their input types. > We could imagine special-casing record_eq etc here, but that > sure seems like a hack. Thanks for looking into all this. Hmm. > I"m starting to have a bad feeling about 257836a75 overall. > As I think I've complained before, I do not like anything about > what it's done to pg_depend; it's forcing that relation to serve > two masters, neither one well. ... We did worry about (essentially) this question quite a bit in the discussion thread, but we figured that you'd otherwise have to create a parallel infrastructure that would look almost identical (for example [1]). > ... We now see that the same remark > applies to find_expr_references(), because the semantics of > "which collations does this expression's behavior depend on" aren't > identical to "which collations need to be recorded as direct > dependencies of this expression", especially not if you'd prefer > to minimize either list. (Which is important.) ... Bugs in the current analyser code aside, if we had a second catalog and a second analyser for this stuff, then you'd still have the union of both minimised sets in total, with some extra duplication because you'd have some rows in both places that are currently handled by one row, no? > ... Moreover, for all > the complexity it's introducing, it's next door to useless for > glibc collations --- we might as well tell people "reindex > everything when your glibc version changes", which could be done > with a heck of a lot less infrastructure. ... You do gain reliable tracking of which indexes remain to be rebuilt, and warnings for common hazards like hot standbys with mismatched glibc, so I think it's pretty useful. As for the poverty of information from glibc, I don't see why it should hold ICU, Windows, FreeBSD users back. In fact I am rather hoping that by shipping this, glibc developers will receive encouragement to add the trivial interface we need to do better. > ... The situation on Windows > looks pretty user-unfriendly as well, per the other thread. That is unfortunate, it seems like such a stupid problem. Restating here for the sake of the list: initdb just needs to figure out how to ask for the current environment's locale in BCP 47 format ("en-US") when setting the default for your template databases, not the traditional format ("English_United States.1252") that Microsoft explicitly tells us not to store in databases and that doesn't work in the versioning API, but since we're mostly all Unix hackers we don't know how. > So I wonder if, rather than continuing to pursue this right now, > we shouldn't revert 257836a75 and try again later with a new design > that doesn't try to commandeer the existing dependency infrastructure. > We might have a better idea about what to do on Windows by the time > that's done, too. It seems to me that there are two things that would be needed to salvage this for PG14: (1) deciding that we're unlikely to come up with a better idea than using pg_depend for this (following the argument that it'd only create duplication to have a parallel dedicated catalog), (2) fixing any remaining flaws in the dependency analyser code. I'll look into the details some more on Monday. [1] https://www.postgresql.org/message-id/e9e22c5e-c018-f4ea-24c8-5b6d6fdacf30%402ndquadrant.com
Thomas Munro <thomas.munro@gmail.com> writes: > I'll look into the details some more on Monday. Fair enough. Although there are only a few buildfarm members complaining, I don't really want to leave them red all weekend. I could either commit the patch I just presented, or revert ef387bed8 ... got a preference? regards, tom lane
On Sat, Apr 17, 2021 at 10:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > I'll look into the details some more on Monday. > > Fair enough. > > Although there are only a few buildfarm members complaining, I don't > really want to leave them red all weekend. I could either commit the > patch I just presented, or revert ef387bed8 ... got a preference? +1 for committing the new patch for now. I will look into to the record problem. More in a couple of days.
Thomas Munro <thomas.munro@gmail.com> writes: > On Sat, Apr 17, 2021 at 10:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Although there are only a few buildfarm members complaining, I don't >> really want to leave them red all weekend. I could either commit the >> patch I just presented, or revert ef387bed8 ... got a preference? > +1 for committing the new patch for now. I will look into to the > record problem. More in a couple of days. OK, done. regards, tom lane
On Fri, Apr 16, 2021 at 01:07:52PM -0400, Tom Lane wrote: > I wrote: > > ... or maybe not just yet. Andres' buildfarm critters seem to have > > a different opinion than my machine about what the output of > > collate.icu.utf8 ought to be. I wonder what the prevailing LANG > > setting is for them, and which ICU version they're using. > > Oh, I bet it's "C.utf8", because I can reproduce the failure with that. > This crystallizes a nagging feeling I'd had that you were misdescribing > the collate.icu.utf8 test as not being run under --no-locale. Actually, > it's only skipped if the encoding isn't UTF8, not the same thing. > I think we need to remove the default-collation cases from that test too. IIUC pg_regress --no-locale will call initdb --no-locale which force the locale to C, and in that case pg_get_encoding_from_locale() does force SQL_ASCII as encoding. But yes I clearly didn't think at all that you could set the various env variables to C.utf8 which can then run the collate.icu.utf8 or linux.utf8 :(
On Fri, Apr 16, 2021 at 10:24:21PM -0400, Tom Lane wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > On Sat, Apr 17, 2021 at 10:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Although there are only a few buildfarm members complaining, I don't > >> really want to leave them red all weekend. I could either commit the > >> patch I just presented, or revert ef387bed8 ... got a preference? > > > +1 for committing the new patch for now. I will look into to the > > record problem. More in a couple of days. > > OK, done. Thanks for the fixes! I'll also look at the problem.
On Sat, Apr 17, 2021 at 10:01:53AM +1200, Thomas Munro wrote: > On Sat, Apr 17, 2021 at 8:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Per the changes in collate.icu.utf8.out, this gets rid of > > a lot of imaginary collation dependencies, but it also gets > > rid of some arguably-real ones. In particular, calls of > > record_eq and its siblings will be considered not to have > > any collation dependencies, although we know that internally > > those will look up per-column collations of their input types. > > We could imagine special-casing record_eq etc here, but that > > sure seems like a hack. > > [...] > > > So I wonder if, rather than continuing to pursue this right now, > > we shouldn't revert 257836a75 and try again later with a new design > > that doesn't try to commandeer the existing dependency infrastructure. > > We might have a better idea about what to do on Windows by the time > > that's done, too. > > It seems to me that there are two things that would be needed to > salvage this for PG14: (1) deciding that we're unlikely to come up > with a better idea than using pg_depend for this (following the > argument that it'd only create duplication to have a parallel > dedicated catalog), (2) fixing any remaining flaws in the dependency > analyser code. I'll look into the details some more on Monday. So IIUC the issue here is that the code could previously record useless collation version dependencies in somes cases, which could lead to false positive possible corruption messages (and of course additional bloat on pg_depend). False positive messages can't be avoided anyway, as a collation version update may not corrupt the actually indexed set of data, especially for glibc. But with the infrastructure as-is advanced user can look into the new version changes and choose to ignore changes for a specific set of collation, which is way easier to do with the recorded dependencies. The new situation is now that the code can record too few version dependencies leading to false negative detection, which is way more problematic. This was previously discussed around [1]. Quoting Thomas: > To state more explicitly what's happening here, we're searching the > expression trees for subexpresions that have a collation as part of > their static type. We don't know which functions or operators are > actually affected by the collation, though. For example, if an > expression says "x IS NOT NULL" and x happens to be a subexpression of > a type with a particular collation, we don't now that this > expression's value can't possibly be affected by the collation version > changing. So, the system will nag you to rebuild an index just > because you mentioned it, even though the index can't be corrupted. > To do better than that, I suppose we'd need declarations in the > catalog to say which functions/operators are collation sensitive. We agreed that having possible false positive dependencies was acceptable for the initial implementation and that we will improve it in later versions, as otherwise the alternative is to reindex everything without getting any warning, which clearly isn't better anyway. FTR was had the same agreement to not handle specific AMs that don't care about collation (like hash or bloom) in [2], even though I provided a patch to handle that case ([3]) which was dropped later on ([4]). Properly and correctly handling collation version dependency in expressions is a hard problem and will definitely require additional fields in pg_proc, so we clearly can't add that in pg14. So yes we have to decide whether we want to keep the feature in pg14 with the known limitations (and in that case probably revert f24b15699, possibly improving documentation on the possibility of false positive) or revert it entirely. Unsurprisingly, I think that the feature as-is is already a significant improvement, which can be easily improved, so my vote is to keep it in pg14. And just to be clear I'm volunteering to work on the expression problem and all other related improvements for the next version, whether the current feature is reverted or not. [1]: https://www.postgresql.org/message-id/CA%2BhUKGK8CwBcTcXWL2kUjpHT%2B6t2hEFCzkcZ-Z7xXbz%3DC4NLCQ%40mail.gmail.com [2]: https://www.postgresql.org/message-id/13b0c950-80f9-4c10-7e0f-f59feac56a98%402ndquadrant.com [3]: https://www.postgresql.org/message-id/20200908144507.GA57691%40nol [4]: https://www.postgresql.org/message-id/CA%2BhUKGKHj4aYmmwKZdZjkD%3DCWRmn%3De6UsS7S%2Bu6oLrrp0orgsw%40mail.gmail.com
Julien Rouhaud <rjuju123@gmail.com> writes: > On Sat, Apr 17, 2021 at 10:01:53AM +1200, Thomas Munro wrote: >> It seems to me that there are two things that would be needed to >> salvage this for PG14: (1) deciding that we're unlikely to come up >> with a better idea than using pg_depend for this (following the >> argument that it'd only create duplication to have a parallel >> dedicated catalog), (2) fixing any remaining flaws in the dependency >> analyser code. I'll look into the details some more on Monday. > So IIUC the issue here is that the code could previously record useless > collation version dependencies in somes cases, ... > The new situation is now that the code can record too few version dependencies > leading to false negative detection, which is way more problematic. I'm not sure that an error in this direction is all that much more problematic than the other direction. If it's okay to claim that indexes need to be rebuilt when they don't really, then we could just drop this entire overcomplicated infrastructure and report that all indexes need to be rebuilt after any collation version change. But in any case you're oversimplifying tremendously. The previous code is just as capable of errors of omission, because it was inquiring into the wrong composite types, ie those of leaf expression nodes. The ones we'd need to look at are the immediate inputs of record_eq and siblings. Here are a couple of examples where the leaf types are unhelpful: ... where row(a,b,c)::composite_type < row(d,e,f)::composite_type; ... where function_returning_composite(...) < function_returning_composite(...); And even if we do this, we're not entirely in the clear in an abstract sense, because this only covers cases in which an immediate input is of a known named composite type. Cases dealing in anonymous RECORD types simply can't be resolved statically. It might be that that can't occur in the specific situation of CREATE INDEX expressions, but I'm not 100% sure of it. The apparent counterexample of ... where row(a,b) < row(a,c) isn't one because we parse that as RowCompareExpr not an application of record_lt. > We agreed that having possible false positive dependencies was acceptable for > the initial implementation and that we will improve it in later versions, as > otherwise the alternative is to reindex everything without getting any warning, > which clearly isn't better anyway. [ shrug... ] You have both false positives and false negatives in the thing as it stood before f24b15699. I'm not convinced that it's possible to completely avoid either issue via static analysis. I'm inclined to think that false negatives around record_eq-like functions are not such a problem for real index definitions, and we'd be better off with fewer false positives. But it's all judgment calls. regards, tom lane
Hi, On 2021-04-18 11:29:42 -0400, Tom Lane wrote: > I'm not sure that an error in this direction is all that much more > problematic than the other direction. If it's okay to claim that > indexes need to be rebuilt when they don't really, then we could just > drop this entire overcomplicated infrastructure and report that all > indexes need to be rebuilt after any collation version change. That doesn't ring true to me. There's a huge difference between needing to rebuild all indexes, especially primary key indexes which often are over int8 etc, and unnecessarily needing to rebuild indexes doing comparatively rare things. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2021-04-18 11:29:42 -0400, Tom Lane wrote: >> I'm not sure that an error in this direction is all that much more >> problematic than the other direction. If it's okay to claim that >> indexes need to be rebuilt when they don't really, then we could just >> drop this entire overcomplicated infrastructure and report that all >> indexes need to be rebuilt after any collation version change. > That doesn't ring true to me. There's a huge difference between needing > to rebuild all indexes, especially primary key indexes which often are > over int8 etc, and unnecessarily needing to rebuild indexes doing > comparatively rare things. It would not be that hard to exclude indexes on int8, or other cases that clearly have zero collation dependencies. And I think I might have some faith in such a solution. Right now I have zero faith that the patch as it stands gives trustworthy answers. I think that the real fundamental bug is supposing that static analysis can give 100% correct answers. Even if it did do so in a given state of the database, consider this counterexample: create type myrow as (f1 int, f2 int); create table mytable (id bigint, r1 myrow, r2 myrow); create index myindex on mytable(id) where r1 < r2; alter type myrow add attribute f3 text; myindex is recorded as having no collation dependency, but that is now wrong. regards, tom lane
On Mon, Apr 19, 2021 at 10:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think that the real fundamental bug is supposing that static analysis > can give 100% correct answers. Even if it did do so in a given state > of the database, consider this counterexample: > > create type myrow as (f1 int, f2 int); > create table mytable (id bigint, r1 myrow, r2 myrow); > create index myindex on mytable(id) where r1 < r2; > alter type myrow add attribute f3 text; > > myindex is recorded as having no collation dependency, but that is > now wrong. Is it really the case that static analysis of the kind that you'd need to make this 100% robust is fundamentally impossible? I find that proposition hard to believe. I'm not sure that you were making a totally general statement, rather than a statement about the patch/implementation, so perhaps I just missed the point. -- Peter Geoghegan
On Sun, Apr 18, 2021 at 4:23 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > So IIUC the issue here is that the code could previously record useless > collation version dependencies in somes cases, which could lead to false > positive possible corruption messages (and of course additional bloat on > pg_depend). False positive messages can't be avoided anyway, as a collation > version update may not corrupt the actually indexed set of data, especially for > glibc. This argument seems completely absurd to me. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Mon, Apr 19, 2021 at 10:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think that the real fundamental bug is supposing that static analysis >> can give 100% correct answers. > Is it really the case that static analysis of the kind that you'd need > to make this 100% robust is fundamentally impossible? I find that > proposition hard to believe. I didn't mean to imply that it's necessarily theoretically impossible, but given our lack of visibility into what a function or operator will do, plus the way that the collation feature was bolted on with minimal system-level redesign, it's sure pretty darn hard. Code like record_eq is doing a lot at runtime that we can't really see from static analysis. Anyway, given the ALTER TYPE ADD ATTRIBUTE counterexample, I'm definitely starting to lean towards "revert and try again in v15". I feel we'd be best off to consider functions/operators that operate on container types to be "maybe"s rather than certainly safe or certainly not safe. I think that such things appear sufficiently rarely in index specifications that it's not worth it to try to do an exact analysis of them, even if we were sure we could get that 100% right. But that doesn't seem to be an idea that can trivially be added to the current design. regards, tom lane
On Mon, Apr 19, 2021 at 11:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I didn't mean to imply that it's necessarily theoretically impossible, > but given our lack of visibility into what a function or operator > will do, plus the way that the collation feature was bolted on > with minimal system-level redesign, it's sure pretty darn hard. > Code like record_eq is doing a lot at runtime that we can't really > see from static analysis. It's worth pointing out that code like record_eq is not (or at least should not be) fundamentally unpredictable and unruly. The fact that record_eq does typecache lookups and whatnot seems to me to be an implementation detail. What record_eq is entitled to assume about collations could be formalized by some general high-level specification. It ought to be possible to do this, just as it ought to be possible for us to statically determine if a composite type is safe to use with B-Tree deduplication. Whether or not it's worth the trouble is another matter, but it might be if a single effort solved a bunch of related problems, not just the collation dependency problem. > Anyway, given the ALTER TYPE ADD ATTRIBUTE counterexample, I'm > definitely starting to lean towards "revert and try again in v15". The counterexample concerns me because it seems to indicate a lack of sophistication in how dependencies are managed with corner cases -- I don't think that it's okay to leave the behavior unspecified in a stable release. But I also think that we should consider if code like record_eq is in fact the real problem (or just the lack of any general specification that constrains code like it in useful ways, perhaps). This probably won't affect whether or not the patch gets reverted now, but it still matters. -- Peter Geoghegan
On Tue, Apr 20, 2021 at 5:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think that the real fundamental bug is supposing that static analysis > can give 100% correct answers. ... Well, the goal was to perform analysis to the extent possible statically since that would cover the vast majority of cases and is practically all you can do. Clearly there is always going to be a category of invisible dependencies inside procedural code in general (halting problem). We did think about the idea of using new declarations about functions/operators to know which ones actually care about collation, rather than assuming that they all do (bugs aside), as an optimisation, and then that mechanism could in theory also be used to say that functions that don't appear to depend on collations actually do internal, but that all seemed like vast overkill, so we left it for possible later improvements. The question on my mind is whether reverting the feature and trying again for 15 could produce anything fundamentally better at a design level, or would just fix problems in the analyser code that we could fix right now. For example, if you think there actually is a potential better plan than using pg_depend for this, that'd definitely be good to know about. > ... Even if it did do so in a given state > of the database, consider this counterexample: > > create type myrow as (f1 int, f2 int); > create table mytable (id bigint, r1 myrow, r2 myrow); > create index myindex on mytable(id) where r1 < r2; > alter type myrow add attribute f3 text; > > myindex is recorded as having no collation dependency, but that is > now wrong. Hrmph. Yeah. We didn't consider types that change later like this, and handling those correctly does seem to warrant some more thought and work than we perhaps have time for. My first thought is that we'd need to teach it to trigger reanalysis.
Thomas Munro <thomas.munro@gmail.com> writes: > ... The question > on my mind is whether reverting the feature and trying again for 15 > could produce anything fundamentally better at a design level, or > would just fix problems in the analyser code that we could fix right > now. Well, as I said, I think what we ought to do is treat any record-accepting functions/operators as "don't know, better assume it's collation dependent". What's not clear to me is how that concept could be shoehorned into the existing design. > For example, if you think there actually is a potential better > plan than using pg_depend for this, that'd definitely be good to know > about. I really dislike using pg_depend, for a couple of reasons: * You've broken the invariant that dependencies on pinned objects are never recorded. Now, some of them exist, for reasons having nothing to do with the primary goals of pg_depend. If that's not a sign of bad relational design, I don't know what is. I didn't look at the code, but I wonder if you didn't have to lobotomize some error checks in dependency.c because of that. (Perhaps some sort of special-case representation for the default collation would help here?) * pg_depend used to always be all-not-null. Now, most rows in it will need a nulls bitmap, adding 8 bytes per row (on maxalign=8 hardware) to what had been fairly narrow rows. By my arithmetic that's 13.3% bloat in what is already one of our largest catalogs. That's quite unpleasant. (It would actually be cheaper to store an empty-string refobjversion for non-collation entries; a single-byte string would fit into the pad space after deptype, adding nothing to the row width.) > Hrmph. Yeah. We didn't consider types that change later like this, > and handling those correctly does seem to warrant some more thought > and work than we perhaps have time for. My first thought is that we'd > need to teach it to trigger reanalysis. That seems like a nonstarter, even before you think about race conditions. regards, tom lane
On Tue, Apr 20, 2021 at 8:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:> > Thomas Munro <thomas.munro@gmail.com> writes: > > For example, if you think there actually is a potential better > > plan than using pg_depend for this, that'd definitely be good to know > > about. > > I really dislike using pg_depend, for a couple of reasons: > > * You've broken the invariant that dependencies on pinned objects > are never recorded. Now, some of them exist, for reasons having > nothing to do with the primary goals of pg_depend. If that's not > a sign of bad relational design, I don't know what is. I didn't > look at the code, but I wonder if you didn't have to lobotomize > some error checks in dependency.c because of that. (Perhaps > some sort of special-case representation for the default > collation would help here?) Hmm, OK, thanks, that's something to go back and think about. > * pg_depend used to always be all-not-null. Now, most rows in it > will need a nulls bitmap, adding 8 bytes per row (on maxalign=8 > hardware) to what had been fairly narrow rows. By my arithmetic > that's 13.3% bloat in what is already one of our largest > catalogs. That's quite unpleasant. (It would actually be > cheaper to store an empty-string refobjversion for non-collation > entries; a single-byte string would fit into the pad space > after deptype, adding nothing to the row width.) That seems like a good idea. > > Hrmph. Yeah. We didn't consider types that change later like this, > > and handling those correctly does seem to warrant some more thought > > and work than we perhaps have time for. My first thought is that we'd > > need to teach it to trigger reanalysis. > > That seems like a nonstarter, even before you think about race > conditions. Yeah, that runs directly into non-trivial locking problems. I felt like some of the other complaints could conceivably be addressed in time, including dumb stuff like Windows default locale string format and hopefully some expression analysis problems, but not this. I'll hold off reverting for a few more days to see if anyone has any other thoughts on that, because there doesn't seem to be any advantage in being too hasty about it.
On Mon, Apr 19, 2021 at 11:13:37AM -0700, Peter Geoghegan wrote: > On Sun, Apr 18, 2021 at 4:23 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > So IIUC the issue here is that the code could previously record useless > > collation version dependencies in somes cases, which could lead to false > > positive possible corruption messages (and of course additional bloat on > > pg_depend). False positive messages can't be avoided anyway, as a collation > > version update may not corrupt the actually indexed set of data, especially for > > glibc. > > This argument seems completely absurd to me. I'm not sure why? For glibc at least, I don't see how we could not end up raising false positive as you have a single glibc version for all its collations. If a user has say en_US and fr_FR, or any quite stable collation, most of the glibc upgrades (except 2.28 of course) won't corrupt your indexes.
On Tue, Apr 20, 2021 at 12:05:27PM +1200, Thomas Munro wrote: > > Yeah, that runs directly into non-trivial locking problems. I felt > like some of the other complaints could conceivably be addressed in > time, including dumb stuff like Windows default locale string format > and hopefully some expression analysis problems, but not this. I'll > hold off reverting for a few more days to see if anyone has any other > thoughts on that, because there doesn't seem to be any advantage in > being too hasty about it. I also feel that the ALTER TYPE example Tom showed earlier isn't something trivial to fix and cannot be done in pg14 :(
On Mon, Apr 19, 2021 at 6:45 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > This argument seems completely absurd to me. > > I'm not sure why? For glibc at least, I don't see how we could not end up > raising false positive as you have a single glibc version for all its > collations. If a user has say en_US and fr_FR, or any quite stable collation, > most of the glibc upgrades (except 2.28 of course) won't corrupt your indexes. If the versions differ and your index happens to not be corrupt because it just so happened to not depend on any of the rules that have changed, then a complaint about the collation versions changing is not what I'd call a false positive. You can call it that if you want, I suppose -- it's just a question of semantics. But I don't think you should conflate two very different things. You seem to be suggesting that they're equivalent just because you can refer to both of them using the same term. It's obvious that you could have an absence of index corruption even in the presence of a collation incompatibility. Especially when there is only 1 tuple in the index, say -- obviously the core idea is to manage the dependency on versioned collations, which isn't magic. Do you really think that's equivalent to having incorrect version dependencies? -- Peter Geoghegan
On Mon, Apr 19, 2021 at 07:27:24PM -0700, Peter Geoghegan wrote: > On Mon, Apr 19, 2021 at 6:45 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > This argument seems completely absurd to me. > > > > I'm not sure why? For glibc at least, I don't see how we could not end up > > raising false positive as you have a single glibc version for all its > > collations. If a user has say en_US and fr_FR, or any quite stable collation, > > most of the glibc upgrades (except 2.28 of course) won't corrupt your indexes. > > If the versions differ and your index happens to not be corrupt > because it just so happened to not depend on any of the rules that > have changed, then a complaint about the collation versions changing > is not what I'd call a false positive. You can call it that if you > want, I suppose -- it's just a question of semantics. But I don't > think you should conflate two very different things. You seem to be > suggesting that they're equivalent just because you can refer to both > of them using the same term. > > It's obvious that you could have an absence of index corruption even > in the presence of a collation incompatibility. Especially when there > is only 1 tuple in the index, say Yes, and technically you could still have corruption on indexes containing 1 or even 0 rows in case of collation provider upgrade, eg if you have a WHERE clause on the index that does depend on a collation. > -- obviously the core idea is to > manage the dependency on versioned collations, which isn't magic. Do > you really think that's equivalent to having incorrect version > dependencies? No I don't think that's equivalent. What I wanted to say that it's impossible to raise a WARNING only if the index can really be corrupted (corner cases like empty tables or similar apart) for instance because of how glibc report versions, so raising WARNING in some limited corner cases that definitely can't be corrupted (like because the index expression itself doesn't depend on the ordering), which clearly isn't the same thing, was in my opinion an acceptable trade-off in a first version. Sorry if that was (or still is) poorly worded. In any case it was proven that the current approach has way bigger deficiencies so it's probably not relevant anymore.
On Tue, Apr 20, 2021 at 1:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > On Tue, Apr 20, 2021 at 12:05:27PM +1200, Thomas Munro wrote: > > Yeah, that runs directly into non-trivial locking problems. I felt > > like some of the other complaints could conceivably be addressed in > > time, including dumb stuff like Windows default locale string format > > and hopefully some expression analysis problems, but not this. I'll > > hold off reverting for a few more days to see if anyone has any other > > thoughts on that, because there doesn't seem to be any advantage in > > being too hasty about it. > > I also feel that the ALTER TYPE example Tom showed earlier isn't something > trivial to fix and cannot be done in pg14 :( Just an idea: It might be possible to come up with a scheme where ALTER TYPE ADD ATTRIBUTE records versions somewhere at column add time, and index_check_collation_versions() finds and checks those when they aren't superseded by index->collation versions created by REINDEX, or already present due to other dependencies on the same collation. Of course, the opposite problem applies when you ALTER TYPE DROP ATTRIBUTE: you might have some zombie refobjversions you don't need anymore, but that would seem to be the least of your worries if you drop attributes from composite types used in indexes: create type myrow as (f1 int, f2 int); create table mytable (r1 myrow primary key); insert into mytable select row(generate_series(1, 10), generate_series(10, 1, -1))::myrow; select * from mytable; alter type myrow drop attribute f1; select * from mytable; select * from mytable where r1 = row(6); -- !!! reindex table mytable; select * from mytable where r1 = row(6);
Hi, On 2021-04-20 12:05:27 +1200, Thomas Munro wrote: > I'll hold off reverting for a few more days to see if anyone has any > other thoughts on that, because there doesn't seem to be any advantage > in being too hasty about it. I'm not really convinced that this is warranted, and that it isn't better addressed by reducing the scope of the feature: When using index collation versions to decide whether to reindex individual indexes it is important to not have any false negatives - otherwise the feature could trigger corruption. However, the feature has a second, IMO more crucial, aspect: Preventing silent corruption due to collation changes. There are regular reports of people corrupting their indexes (and subsequently constraints) due to collation changes (or collation differences between primary/replica). To be effective detecting such cases it is not required to catch 100% of all dangerous cases, just that a high fraction of cases is caught. And handling the composite type case doesn't seem like it'd impact the percentage of detected collation issues all that much. For one, indexes on composite types aren't all that common, and additing new columns to those composite types is likely even rarer. For another, I'd expect that nearly all databases that have indexes on composite types also have indexes on non-composite text columns - which'd be likely to catch the issue. Given that this is a regularly occurring source of corruption for users, and not one just negligent operators run into (we want people to upgrade OS versions), I think we ought to factor that into our decision what to do. Greetings, Andres Freund
On 4/21/21 4:28 PM, Andres Freund wrote: > Hi, > > On 2021-04-20 12:05:27 +1200, Thomas Munro wrote: >> I'll hold off reverting for a few more days to see if anyone has any >> other thoughts on that, because there doesn't seem to be any advantage >> in being too hasty about it. > I'm not really convinced that this is warranted, and that it isn't > better addressed by reducing the scope of the feature: > > When using index collation versions to decide whether to reindex > individual indexes it is important to not have any false negatives - > otherwise the feature could trigger corruption. > > However, the feature has a second, IMO more crucial, aspect: Preventing > silent corruption due to collation changes. There are regular reports of > people corrupting their indexes (and subsequently constraints) due to > collation changes (or collation differences between primary/replica). > To be effective detecting such cases it is not required to catch 100% of > all dangerous cases, just that a high fraction of cases is caught. > > And handling the composite type case doesn't seem like it'd impact the > percentage of detected collation issues all that much. For one, indexes > on composite types aren't all that common, and additing new columns to > those composite types is likely even rarer. For another, I'd expect that > nearly all databases that have indexes on composite types also have > indexes on non-composite text columns - which'd be likely to catch the > issue. > > Given that this is a regularly occurring source of corruption for users, > and not one just negligent operators run into (we want people to upgrade > OS versions), I think we ought to factor that into our decision what to > do. > Hi, this is an open item for release 14 . The discussion seems to have gone silent for a couple of weeks. Are we in a position to make any decisions? I hear what Andres says, but is anyone acting on it? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Thu, May 6, 2021 at 8:58 AM Andrew Dunstan <andrew@dunslane.net> wrote: > this is an open item for release 14 . The discussion seems to have gone > silent for a couple of weeks. Are we in a position to make any > decisions? I hear what Andres says, but is anyone acting on it? I'm going to revert this and resubmit for 15. That'll give proper time to reconsider the question of whether pg_depend is right for this, and come up with a non-rushed response to the composite type problem etc.
On 5/5/21 5:12 PM, Thomas Munro wrote: > On Thu, May 6, 2021 at 8:58 AM Andrew Dunstan <andrew@dunslane.net> wrote: >> this is an open item for release 14 . The discussion seems to have gone >> silent for a couple of weeks. Are we in a position to make any >> decisions? I hear what Andres says, but is anyone acting on it? > I'm going to revert this and resubmit for 15. That'll give proper > time to reconsider the question of whether pg_depend is right for > this, and come up with a non-rushed response to the composite type > problem etc. OK, thanks. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Thu, May 6, 2021 at 9:23 AM Andrew Dunstan <andrew@dunslane.net> wrote: > On 5/5/21 5:12 PM, Thomas Munro wrote: > > On Thu, May 6, 2021 at 8:58 AM Andrew Dunstan <andrew@dunslane.net> wrote: > >> this is an open item for release 14 . The discussion seems to have gone > >> silent for a couple of weeks. Are we in a position to make any > >> decisions? I hear what Andres says, but is anyone acting on it? > > I'm going to revert this and resubmit for 15. That'll give proper > > time to reconsider the question of whether pg_depend is right for > > this, and come up with a non-rushed response to the composite type > > problem etc. > > OK, thanks. Reverted. Rebasing notes: 1. Commit b4c9695e moved toast table declarations so I adapted to the new scheme, but commit 0cc99327 had taken the OIDs that pg_collation was previously using, so I had to pick some new ones from the temporary range for later reassignment. 2. It took me quite a while to figure out that the collversion column now needs BKI_DEFAULT(_null_), or the perl script wouldn't accept the contents of pg_collation.dat. 3. In a separate commit, I rescued a few sentences of text from the documentation about libc collation versions and reinstated them in the most obvious place, because although the per-index tracking has been reverted, the per-collation version tracking (limited as it is) is now back and works on more OSes than before.