Thread: Fuzzy thinking in is_publishable_class
is_publishable_class has a test "relid >= FirstNormalObjectId", which I think we should drop, for two reasons: 1. It makes the comment claiming that this function tests the same things as check_publication_add_relation a lie. 2. The comment about it claims that the purpose is to reject information_schema relations, but if that's so, it's ineffective. We consider it supported to drop and recreate information_schema, and have indeed recommended doing so for some minor-version upgrades. After that, the information_schema relations would no longer have OIDs recognizable to this test. So what is the motivation for this test? If there's an important reason for it, we need to find a less fragile way to express it. regards, tom lane
I wrote: > is_publishable_class has a test "relid >= FirstNormalObjectId", > which I think we should drop, for two reasons: > 1. It makes the comment claiming that this function tests the same > things as check_publication_add_relation a lie. > 2. The comment about it claims that the purpose is to reject > information_schema relations, but if that's so, it's ineffective. > We consider it supported to drop and recreate information_schema, > and have indeed recommended doing so for some minor-version > upgrades. After that, the information_schema relations would no > longer have OIDs recognizable to this test. > So what is the motivation for this test? If there's an important > reason for it, we need to find a less fragile way to express it. After further digging around, I wonder whether this test wasn't somehow related to the issue described in https://postgr.es/m/2321.1557263978@sss.pgh.pa.us That doesn't completely make sense, since the restriction on relkind should render it moot whether IsCatalogClass thinks that a toast table is a catalog table, but maybe there's a link? regards, tom lane
I wrote: > is_publishable_class has a test "relid >= FirstNormalObjectId", > which I think we should drop, for two reasons: > ... > So what is the motivation for this test? If there's an important > reason for it, we need to find a less fragile way to express it. I tried removing the FirstNormalObjectId check, and found that the reason for it seems to be "the subscription/t/004_sync.pl test falls over without it". That's because that test supposes that the *only* entry in pg_subscription_rel will be for the test table that it creates. Without the FirstNormalObjectId check, the information_schema relations also show up in pg_subscription_rel, confusing the script's simplistic status check. I'm of two minds what to do about that. One approach is to just define a "FOR ALL TABLES" publication as including the information_schema tables, in which case 004_sync.pl is wrong and we should fix it by adding a suitable WHERE restriction to its pg_subscription_rel check. However, possibly that would break some applications that are likewise assuming that no built-in tables appear in pg_subscription_rel. But, if what we want is the definition that "information_schema is excluded from publishable tables", I'm not satisfied with this implementation of that rule. Dropping/recreating information_schema would cause the behavior to change. We could, at the cost of an additional syscache lookup, check the name of the schema that a potentially publishable table belongs to and exclude information_schema by name. I don't have much idea about how performance-critical is_publishable_class is, so I don't know how acceptable that seems. regards, tom lane
On 2019-05-09 04:37, Tom Lane wrote: > I tried removing the FirstNormalObjectId check, and found that the > reason for it seems to be "the subscription/t/004_sync.pl test > falls over without it". That's because that test supposes that > the *only* entry in pg_subscription_rel will be for the test table > that it creates. Without the FirstNormalObjectId check, the > information_schema relations also show up in pg_subscription_rel, > confusing the script's simplistic status check. right > I'm of two minds what to do about that. One approach is to just > define a "FOR ALL TABLES" publication as including the information_schema > tables, certainly not > But, if what we want is the definition that "information_schema is > excluded from publishable tables", I'm not satisfied with this > implementation of that rule. Dropping/recreating information_schema > would cause the behavior to change. We could, at the cost of an > additional syscache lookup, check the name of the schema that a > potentially publishable table belongs to and exclude information_schema > by name. I don't have much idea about how performance-critical > is_publishable_class is, so I don't know how acceptable that seems. I would classify the tables in information_schema on the side of being a system catalog, meaning that they are not replicated and they are covered by whatever REINDEX SYSTEM thinks it should cover. It would also make sense to integrate both of these concepts more consistently with the user_catalog_table feature. Perhaps the information_schema tables could be made user catalogs. Really we should just have a single flag in pg_class that says "I'm a catalog", applicable both to built-in catalogs and to user-defined catalogs. I think we can get rid of the ability to reload the information_schema after initdb. That was interesting in the early phase of its development, but now it just creates complications. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 09/05/2019 04:37, Tom Lane wrote: > I wrote: >> is_publishable_class has a test "relid >= FirstNormalObjectId", >> which I think we should drop, for two reasons: >> ... >> So what is the motivation for this test? If there's an important >> reason for it, we need to find a less fragile way to express it. > > I tried removing the FirstNormalObjectId check, and found that the > reason for it seems to be "the subscription/t/004_sync.pl test > falls over without it". That's because that test supposes that > the *only* entry in pg_subscription_rel will be for the test table > that it creates. Without the FirstNormalObjectId check, the > information_schema relations also show up in pg_subscription_rel, > confusing the script's simplistic status check. > > I'm of two minds what to do about that. One approach is to just > define a "FOR ALL TABLES" publication as including the information_schema > tables, in which case 004_sync.pl is wrong and we should fix it by > adding a suitable WHERE restriction to its pg_subscription_rel check. > However, possibly that would break some applications that are likewise > assuming that no built-in tables appear in pg_subscription_rel. > I was and still am worried that including information_schema in "FOR ALL TABLES" will result in breakage or at least unexpected behavior in case user adjusts anything in the information_schema catalogs. IMHO only user created tables should be part of "FOR ALL TABLES" hence the FirstNormalObjectId check. The fact that information_schema can be recreated and is not considered system catalog by some commands but kind of is by others is more problem of how we added information_schema and it's definitely not ideal, we should either consider it system schema like pg_catalog is or consider it everywhere an user catalog. For me the latter makes little sense given that it comes with the database. > But, if what we want is the definition that "information_schema is > excluded from publishable tables", I'm not satisfied with this > implementation of that rule. Dropping/recreating information_schema > would cause the behavior to change. We could, at the cost of an > additional syscache lookup, check the name of the schema that a > potentially publishable table belongs to and exclude information_schema > by name. I don't have much idea about how performance-critical > is_publishable_class is, so I don't know how acceptable that seems. > I think we need a better way of identifying what's part of system and what's user created in general. The FirstNormalObjectId seems somewhat okay approximation, but then we have plenty of other ways for checking, maybe it's time to consolidate it into some extra column in pg_class? -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions https://www.2ndQuadrant.com/
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2019-05-09 04:37, Tom Lane wrote: >> But, if what we want is the definition that "information_schema is >> excluded from publishable tables", I'm not satisfied with this >> implementation of that rule. > ... It would also make sense to integrate both of these concepts more > consistently with the user_catalog_table feature. Perhaps the > information_schema tables could be made user catalogs. Really we should > just have a single flag in pg_class that says "I'm a catalog", > applicable both to built-in catalogs and to user-defined catalogs. I do not want to go there because (a) it means that you can't tell a catalog from a non-catalog without a catalog lookup, which has got obvious circularity problems, and (b) the idea that a user can add a catalog without hacking the C code is silly on its face. I would say that the actual important functional distinction between a catalog and a user table is whether the C code knows about it. Perhaps, for replication purposes, there's some value in having a third category of tables that are treated more nearly like catalogs than user tables in whether-to-replicate decisions. But let's not fuzz the issue by calling them catalogs. I think just calling it a "NO REPLICATE" property would be less confusing. > I think we can get rid of the ability to reload the information_schema > after initdb. That was interesting in the early phase of its > development, but now it just creates complications. We've relied on that more than once to allow minor-release updates of information_schema views, so I think losing the ability to do it is a bad idea. regards, tom lane
Petr Jelinek <petr.jelinek@2ndquadrant.com> writes: > I think we need a better way of identifying what's part of system and > what's user created in general. The FirstNormalObjectId seems somewhat > okay approximation, but then we have plenty of other ways for checking, > maybe it's time to consolidate it into some extra column in pg_class? I'd be on board with adding "bool relpublishable" or the like to pg_class. We'd also need infrastructure for setting that, of course, so it's not a five-minute fix. In the meantime I guess we have to leave the is_publishable_class test like it is. I am thinking though that the replication code's tests of type OIDs against FirstNormalObjectId are broken. The essential property that those are after, IIUC, is "will the remote server certainly have the same definition of this type as the local one?" That is *absolutely not guaranteed* for types defined in information_schema, because their OIDs aren't locked down and could plausibly be different across installations. I forget whether we load collations before or after information_schema, so this might or might not be a live bug today, but it's certainly something waiting to bite us on the rear. Actually --- that's for logical replication, isn't it? And we allow logical replication across versions, don't we? If so, it is a live bug. Only hand-assigned type OIDs should be trusted to hold still across major versions. In short I think we'd better s/FirstNormalObjectId/FirstGenbkiObjectId/ in logical/relation.c and pgoutput/pgoutput.c, and I think that's probably a back-patchable bug fix of some urgency. regards, tom lane
Hi, On 2019-05-09 09:30:50 +0200, Peter Eisentraut wrote: > On 2019-05-09 04:37, Tom Lane wrote: > > I'm of two minds what to do about that. One approach is to just > > define a "FOR ALL TABLES" publication as including the information_schema > > tables, > > certainly not Yea, that strikes me as a bad idea too. > It would also make sense to integrate both of these concepts more > consistently with the user_catalog_table feature. Perhaps the > information_schema tables could be made user catalogs. Really we should > just have a single flag in pg_class that says "I'm a catalog", > applicable both to built-in catalogs and to user-defined catalogs. Hm - I'm not convinced by that. There's some lower-level reasons why we can't easily replicate changes to system catalogs, but those don't exist for user catalog tables. And in fact, they can be replicated today. > I think we can get rid of the ability to reload the information_schema > after initdb. That was interesting in the early phase of its > development, but now it just creates complications. Yea, I'm far from convinced it's worth having that available. I wonder if we at least could have the reordering instructions not drop information_schema, so we'd have a stable oid for that. Or use some pg_upgrade style logic to recreate it. Or have NamespaceCreate() just hardcode the relevant oid for information_schema. Greetings, Andres Freund
On Thu, May 9, 2019 at 9:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I think we can get rid of the ability to reload the information_schema > > after initdb. That was interesting in the early phase of its > > development, but now it just creates complications. > > We've relied on that more than once to allow minor-release updates of > information_schema views, so I think losing the ability to do it is > a bad idea. +1 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2019-05-09 15:41, Tom Lane wrote: >> I think we can get rid of the ability to reload the information_schema >> after initdb. That was interesting in the early phase of its >> development, but now it just creates complications. > We've relied on that more than once to allow minor-release updates of > information_schema views, so I think losing the ability to do it is > a bad idea. In those cases we used CREATE OR REPLACE VIEW, which preserves OIDs. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services