Thread: Fuzzy thinking in is_publishable_class

Fuzzy thinking in is_publishable_class

From
Tom Lane
Date:
is_publishable_class has a test "relid >= FirstNormalObjectId",
which I think we should drop, for two reasons:

1. It makes the comment claiming that this function tests the same
things as check_publication_add_relation a lie.

2. The comment about it claims that the purpose is to reject
information_schema relations, but if that's so, it's ineffective.
We consider it supported to drop and recreate information_schema,
and have indeed recommended doing so for some minor-version
upgrades.  After that, the information_schema relations would no
longer have OIDs recognizable to this test.

So what is the motivation for this test?  If there's an important
reason for it, we need to find a less fragile way to express it.

            regards, tom lane



Re: Fuzzy thinking in is_publishable_class

From
Tom Lane
Date:
I wrote:
> is_publishable_class has a test "relid >= FirstNormalObjectId",
> which I think we should drop, for two reasons:

> 1. It makes the comment claiming that this function tests the same
> things as check_publication_add_relation a lie.

> 2. The comment about it claims that the purpose is to reject
> information_schema relations, but if that's so, it's ineffective.
> We consider it supported to drop and recreate information_schema,
> and have indeed recommended doing so for some minor-version
> upgrades.  After that, the information_schema relations would no
> longer have OIDs recognizable to this test.

> So what is the motivation for this test?  If there's an important
> reason for it, we need to find a less fragile way to express it.

After further digging around, I wonder whether this test wasn't
somehow related to the issue described in

https://postgr.es/m/2321.1557263978@sss.pgh.pa.us

That doesn't completely make sense, since the restriction on
relkind should render it moot whether IsCatalogClass thinks
that a toast table is a catalog table, but maybe there's a link?

            regards, tom lane



Re: Fuzzy thinking in is_publishable_class

From
Tom Lane
Date:
I wrote:
> is_publishable_class has a test "relid >= FirstNormalObjectId",
> which I think we should drop, for two reasons:
> ...
> So what is the motivation for this test?  If there's an important
> reason for it, we need to find a less fragile way to express it.

I tried removing the FirstNormalObjectId check, and found that the
reason for it seems to be "the subscription/t/004_sync.pl test
falls over without it".  That's because that test supposes that
the *only* entry in pg_subscription_rel will be for the test table
that it creates.  Without the FirstNormalObjectId check, the
information_schema relations also show up in pg_subscription_rel,
confusing the script's simplistic status check.

I'm of two minds what to do about that.  One approach is to just
define a "FOR ALL TABLES" publication as including the information_schema
tables, in which case 004_sync.pl is wrong and we should fix it by
adding a suitable WHERE restriction to its pg_subscription_rel check.
However, possibly that would break some applications that are likewise
assuming that no built-in tables appear in pg_subscription_rel.

But, if what we want is the definition that "information_schema is
excluded from publishable tables", I'm not satisfied with this
implementation of that rule.  Dropping/recreating information_schema
would cause the behavior to change.  We could, at the cost of an
additional syscache lookup, check the name of the schema that a
potentially publishable table belongs to and exclude information_schema
by name.  I don't have much idea about how performance-critical
is_publishable_class is, so I don't know how acceptable that seems.

            regards, tom lane



Re: Fuzzy thinking in is_publishable_class

From
Peter Eisentraut
Date:
On 2019-05-09 04:37, Tom Lane wrote:
> I tried removing the FirstNormalObjectId check, and found that the
> reason for it seems to be "the subscription/t/004_sync.pl test
> falls over without it".  That's because that test supposes that
> the *only* entry in pg_subscription_rel will be for the test table
> that it creates.  Without the FirstNormalObjectId check, the
> information_schema relations also show up in pg_subscription_rel,
> confusing the script's simplistic status check.

right

> I'm of two minds what to do about that.  One approach is to just
> define a "FOR ALL TABLES" publication as including the information_schema
> tables,

certainly not

> But, if what we want is the definition that "information_schema is
> excluded from publishable tables", I'm not satisfied with this
> implementation of that rule.  Dropping/recreating information_schema
> would cause the behavior to change.  We could, at the cost of an
> additional syscache lookup, check the name of the schema that a
> potentially publishable table belongs to and exclude information_schema
> by name.  I don't have much idea about how performance-critical
> is_publishable_class is, so I don't know how acceptable that seems.

I would classify the tables in information_schema on the side of being a
system catalog, meaning that they are not replicated and they are
covered by whatever REINDEX SYSTEM thinks it should cover.

It would also make sense to integrate both of these concepts more
consistently with the user_catalog_table feature.  Perhaps the
information_schema tables could be made user catalogs.  Really we should
just have a single flag in pg_class that says "I'm a catalog",
applicable both to built-in catalogs and to user-defined catalogs.

I think we can get rid of the ability to reload the information_schema
after initdb.  That was interesting in the early phase of its
development, but now it just creates complications.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fuzzy thinking in is_publishable_class

From
Petr Jelinek
Date:

On 09/05/2019 04:37, Tom Lane wrote:
> I wrote:
>> is_publishable_class has a test "relid >= FirstNormalObjectId",
>> which I think we should drop, for two reasons:
>> ...
>> So what is the motivation for this test?  If there's an important
>> reason for it, we need to find a less fragile way to express it.
> 
> I tried removing the FirstNormalObjectId check, and found that the
> reason for it seems to be "the subscription/t/004_sync.pl test
> falls over without it".  That's because that test supposes that
> the *only* entry in pg_subscription_rel will be for the test table
> that it creates.  Without the FirstNormalObjectId check, the
> information_schema relations also show up in pg_subscription_rel,
> confusing the script's simplistic status check.
> 
> I'm of two minds what to do about that.  One approach is to just
> define a "FOR ALL TABLES" publication as including the information_schema
> tables, in which case 004_sync.pl is wrong and we should fix it by
> adding a suitable WHERE restriction to its pg_subscription_rel check.
> However, possibly that would break some applications that are likewise
> assuming that no built-in tables appear in pg_subscription_rel.
> 

I was and still am worried that including information_schema in "FOR ALL
TABLES" will result in breakage or at least unexpected behavior in case
user adjusts anything in the information_schema catalogs.

IMHO only user created tables should be part of "FOR ALL TABLES" hence
the FirstNormalObjectId check.

The fact that information_schema can be recreated and is not considered
system catalog by some commands but kind of is by others is more problem
of how we added information_schema and it's definitely not ideal, we
should either consider it system schema like pg_catalog is or consider
it everywhere an user catalog. For me the latter makes little sense
given that it comes with the database.

> But, if what we want is the definition that "information_schema is
> excluded from publishable tables", I'm not satisfied with this
> implementation of that rule.  Dropping/recreating information_schema
> would cause the behavior to change.  We could, at the cost of an
> additional syscache lookup, check the name of the schema that a
> potentially publishable table belongs to and exclude information_schema
> by name.  I don't have much idea about how performance-critical
> is_publishable_class is, so I don't know how acceptable that seems.
> 

I think we need a better way of identifying what's part of system and
what's user created in general. The FirstNormalObjectId seems somewhat
okay approximation, but then we have plenty of other ways for checking,
maybe it's time to consolidate it into some extra column in pg_class?

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions
https://www.2ndQuadrant.com/



Re: Fuzzy thinking in is_publishable_class

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2019-05-09 04:37, Tom Lane wrote:
>> But, if what we want is the definition that "information_schema is
>> excluded from publishable tables", I'm not satisfied with this
>> implementation of that rule.

> ... It would also make sense to integrate both of these concepts more
> consistently with the user_catalog_table feature.  Perhaps the
> information_schema tables could be made user catalogs.  Really we should
> just have a single flag in pg_class that says "I'm a catalog",
> applicable both to built-in catalogs and to user-defined catalogs.

I do not want to go there because (a) it means that you can't tell a
catalog from a non-catalog without a catalog lookup, which has got
obvious circularity problems, and (b) the idea that a user can add
a catalog without hacking the C code is silly on its face.  I would
say that the actual important functional distinction between a catalog
and a user table is whether the C code knows about it.

Perhaps, for replication purposes, there's some value in having a
third category of tables that are treated more nearly like catalogs
than user tables in whether-to-replicate decisions.  But let's not
fuzz the issue by calling them catalogs.  I think just calling it
a "NO REPLICATE" property would be less confusing.

> I think we can get rid of the ability to reload the information_schema
> after initdb.  That was interesting in the early phase of its
> development, but now it just creates complications.

We've relied on that more than once to allow minor-release updates of
information_schema views, so I think losing the ability to do it is
a bad idea.

            regards, tom lane



Re: Fuzzy thinking in is_publishable_class

From
Tom Lane
Date:
Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
> I think we need a better way of identifying what's part of system and
> what's user created in general. The FirstNormalObjectId seems somewhat
> okay approximation, but then we have plenty of other ways for checking,
> maybe it's time to consolidate it into some extra column in pg_class?

I'd be on board with adding "bool relpublishable" or the like to pg_class.
We'd also need infrastructure for setting that, of course, so it's not
a five-minute fix.  In the meantime I guess we have to leave the
is_publishable_class test like it is.

I am thinking though that the replication code's tests of type OIDs
against FirstNormalObjectId are broken.  The essential property that
those are after, IIUC, is "will the remote server certainly have the
same definition of this type as the local one?"  That is *absolutely
not guaranteed* for types defined in information_schema, because
their OIDs aren't locked down and could plausibly be different across
installations.  I forget whether we load collations before or after
information_schema, so this might or might not be a live bug today,
but it's certainly something waiting to bite us on the rear.

Actually --- that's for logical replication, isn't it?  And we allow
logical replication across versions, don't we?  If so, it is a live
bug.  Only hand-assigned type OIDs should be trusted to hold still
across major versions.

In short I think we'd better s/FirstNormalObjectId/FirstGenbkiObjectId/
in logical/relation.c and pgoutput/pgoutput.c, and I think that's
probably a back-patchable bug fix of some urgency.

            regards, tom lane



Re: Fuzzy thinking in is_publishable_class

From
Andres Freund
Date:
Hi,

On 2019-05-09 09:30:50 +0200, Peter Eisentraut wrote:
> On 2019-05-09 04:37, Tom Lane wrote:
> > I'm of two minds what to do about that.  One approach is to just
> > define a "FOR ALL TABLES" publication as including the information_schema
> > tables,
> 
> certainly not

Yea, that strikes me as a bad idea too.


> It would also make sense to integrate both of these concepts more
> consistently with the user_catalog_table feature.  Perhaps the
> information_schema tables could be made user catalogs.  Really we should
> just have a single flag in pg_class that says "I'm a catalog",
> applicable both to built-in catalogs and to user-defined catalogs.

Hm - I'm not convinced by that. There's some lower-level reasons why we
can't easily replicate changes to system catalogs, but those don't exist
for user catalog tables. And in fact, they can be replicated today.


> I think we can get rid of the ability to reload the information_schema
> after initdb.  That was interesting in the early phase of its
> development, but now it just creates complications.

Yea, I'm far from convinced it's worth having that available. I wonder
if we at least could have the reordering instructions not drop
information_schema, so we'd have a stable oid for that. Or use some
pg_upgrade style logic to recreate it. Or have NamespaceCreate() just
hardcode the relevant oid for information_schema.

Greetings,

Andres Freund



Re: Fuzzy thinking in is_publishable_class

From
Robert Haas
Date:
On Thu, May 9, 2019 at 9:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I think we can get rid of the ability to reload the information_schema
> > after initdb.  That was interesting in the early phase of its
> > development, but now it just creates complications.
>
> We've relied on that more than once to allow minor-release updates of
> information_schema views, so I think losing the ability to do it is
> a bad idea.

+1

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fuzzy thinking in is_publishable_class

From
Peter Eisentraut
Date:
On 2019-05-09 15:41, Tom Lane wrote:
>> I think we can get rid of the ability to reload the information_schema
>> after initdb.  That was interesting in the early phase of its
>> development, but now it just creates complications.
> We've relied on that more than once to allow minor-release updates of
> information_schema views, so I think losing the ability to do it is
> a bad idea.

In those cases we used CREATE OR REPLACE VIEW, which preserves OIDs.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services